If the primitive restart index and the primitive type can
be handled by the cut index feature, then use the hardware
to handle the primitive restart feature.
The VBO module's software handling of primitive restart is
used as a fall back.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
When brw->prim_restart.enable_cut_index is set, the cut index
will be enabled when uploading index_buffer commands.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
For newer hardware we disable the VBO module's software handling
of primitive restart. We now handle primitive restarts in
brw_handle_primitive_restart.
The initial version of brw_handle_primitive_restart simply calls
vbo_sw_primitive_restart, and therefore still uses the VBO
module software primitive restart support.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The VBO module now can handle primitive restart in software
if required. Therefore this support is no londer required.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
If the PIPE_CAP_PRIMITIVE_RESTART screen param is not set, then enable
PrimitiveRestartInSoftware to enable software primitive restart
support in the VBO module.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
When PrimitiveRestartInSoftware is set, the VBO module will handle
primitive restart scenarios before calling the vbo->draw_prims
drawing function.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
If set, then the VBO module will handle all primitive
restart scenarios before calling the driver draw_prims.
Software primitive restart support is disabled by default.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
vbo_sw_primitive_restart implements primitive restart in software
by splitting primitive draws apart.
This is based on similar support in mesa/state_tracker/st_draw.c.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
It's an implied argument, and I don't think being explicit about it
helps.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The comment quotes spec saying that only scalar integers are allowed,
but we only checked for integer.
Fixes piglit switch-expression-const-ivec2.vert
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Total instructions: 261582 -> 261316
135/2147 programs affected (6.3%)
36752 -> 36486 instructions in affected programs (0.7% reduction)
This excludes a tropics shader that now gets 16-wide mode and throws
off the numbers. 5 shaders are hurt: two extra MOVs in 4 tropics
shaders it looks like because we don't split register names according
to independent webs, and one gstreamer shader where it looks like
try_rewrite_rhs_to_dst() is falling on its face.
This should also help avoid a regression in VSes from idr's ARB
programs to GLSL work.
By using the live variables code for determining interference, we can
handle coalescing in the presence of control flow, which the other
register coalescing path couldn't.
Total instructions: 207184 -> 206990
74/1246 programs affected (5.9%)
33993 -> 33799 instructions in affected programs (0.6% reduction)
There is a newerth shader that loses out, because of some extra MOVs
that now get their dead-code nature obscured by coalescing. This
should be fixed by doing better at dead code elimination.
Starting with LLVM 3.0, named structures are meant not for debugging, but
for recursive data types, previously also known as opaque types.
The recursive nature of these types leads to several memory management
difficulties. Given that we don't actually need recursive types, avoid
them altogether.
This is an attempt to address fdo bugs 41791 and 44466. The issue is
somewhat random so there's no easy way to check how effective this is.
No functional change. This patch replaces the
brw_blorp_params::exec() method with a global function
brw_blorp_exec() that performs the operation described by the params
data structure.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This patch enables MSAA for Gen6, by modifying intel_mipmap_tree to
understand multisampled buffers, adapting the rendering pipeline setup
to enable multisampled rendering, and adding multisample resolve
operations to brw_blorp_blit.cpp. Some preparation work is also
included for Gen7, but it is not yet enabled.
MSAA support is still fairly preliminary. In particular, the
following are not yet supported:
- Fully general blits between MSAA and non-MSAA buffers.
- Formats other than RGBA8, DEPTH24, and STENCIL8.
- Centroid interpolation.
- Coverage parameters (glSampleCoverage, GL_SAMPLE_ALPHA_TO_COVERAGE,
GL_SAMPLE_ALPHA_TO_ONE, GL_SAMPLE_COVERAGE, GL_SAMPLE_COVERAGE_VALUE,
GL_SAMPLE_COVERAGE_INVERT).
Fixes piglit tests "EXT_framebuffer_multisample/accuracy" on
i965/Gen6.
v2:
- In intel_alloc_renderbuffer_storage(), quantize the requested number
of samples to the next higher sample count supported by the
hardware. This ensures that a query of GL_SAMPLES will return the
correct value. It also ensures that MSAA is fully disabled on Gen7
for now (since Gen7 MSAA support doesn't work yet).
- When reading from a non-MSAA surface, ensure that s_is_zero is true
so that we won't try to read from a nonexistent sample.
This patch expands the "blorp" component to be able to perform blits
as well as HiZ resolves. The new blitting code is located in
brw_blorp_blit.cpp. This includes the necessary fragment shader code
to look up pixels in the source buffer (which is configured as a
texture) and output them to the destination buffer (which is
configured as the render target).
Most of the time the fragment shader code is simple and
straightforward, since it merely has to apply a coordinate offset,
read from the texture, and write to the render target. However, in
the case of blitting stencil buffers, things are more complicated,
since the GPU stores stencil data using W tiling, and W tiling is not
supported for textures or render targets. So, we set up the stencil
buffers as Y tiled, and emit fragment shader code that adjusts the
coordinates to account for the difference between W and Y tiling.
Furthermore, since a rectangular region in W tiling does not
necessarily correspond to a rectangular region in Y tiling, we widen
the rectangle primitive to the nearest tile boundary and have the
fragment shader "kill" any pixels that don't fall inside the actual
desired destination rectangle.
All of this is a necessary prerequisite for implementing MSAA, since
we'll need to be able to blit between multisample color, depth, and
stencil buffers and their non-multisampled counterparts, and none of
the existing blitting mechanisms support multisampling.
In addition, the new blitting code should speed up operations where we
previously fell back to software rasterization, such as blitting of
stencil buffers. The current fallback sequence is: first we try to do
a blit using the hardware blitting engine. If that fails we try to do
a blit using the render path. If that also fails then we do the blit
using a meta-op (which may or may not fall back to software
rasterization).
Note that blitting using the render path has some limitations at the
moment: it only supports a few formats, and it doesn't support
clipping or scissoring. These limitations will be addressed in future
patch series.
v2:
- Add the code that configures the WM program to
gen{6,7}_emit_wm_config() and gen7_emit_ps_config() rather than
creating separate ...enable() functions.
- Call intel_prepare_render before determining which miptrees we are
blitting from/to, because it may cause miptrees to be reallocated.
- Allow the blit to mirror X and/or Y coordinates.
- Disable blorp blits on Gen7 for now, since they aren't working yet.
This patch exposes the functions brw_get_surface_tiling_bits and
gen7_set_surface_tiling, so that they can be re-used when setting up
surface states in gen6_blorp.cpp and gen7_blorp.cpp.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
This patch splits up the gen6_blorp_exec and gen7_blorp_exec
functions, which were very long, into simple component functions.
With a few exceptions, there is one function per state packet.
This will allow blit functionality to be added without significantly
complicating the code.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
v2: Rename the functions gen{6,7}_emit_wm_disable() to
gen{6,7}_emit_wm_config() (since the WM is not actually disabled
during HiZ ops; it simply doesn't have a program). Also, on gen7,
split out the configration of 3DSTATE_PS to a separate function
gen7_emit_ps_config().
This patch groups together the parameters used by the HiZ functions
into a new data structure, brw_hiz_resolve_params, rather than passing
each parameter individually between the HiZ functions. This data
structure is a subclass of brw_blorp_params, which represents the
parameters of a general-purpose blit or resolve operation. A future
patch will add another subclass for blits.
In addition, this patch generalizes the (width, height) parameters to
a full rect (x0, y0, x1, y1), since blitting operations will need to
be able to operate on arbitrary rectangles. Also, it renames several
of the HiZ functions to reflect the expanded role they will serve.
v2: Rename brw_hiz_resolve_params to brw_hiz_op_params. Move
gen{6,7}_blorp_exec() functions back into gen{6,7}_blorp.h.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
When drawing to a FBO, the viewport wasn't always set correctly. It
was fine in the usual case of the viewport dims matching the surface
dims but broken otherwise. In particular, this was happening because
the viewport scale is negative for FBO rendering.
The piglit fbo-viewport test exercises this.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
We can save one instruction by lowering it to:
SUB_INT tmp, 0, src
MAX_INT dst, src, tmp
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
This patch adds .gitignore files to ignore the makefiles generated by
the gallium pipe loader and the clover OpenCL state tracker.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Previously, I tried implementing this in the i965 driver, but did so
in a way that violated the intent of the spec, and broke Tropics.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This will be convenient when I want to comment out optimization code
to see the raw program being optimized, but more importantly will let
the interference check be used during optimization.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
We could do more by handling abs/negate and non-GRF sources, but this is
a good start. Improves tropics performance 0.30% +/- .17% (n=43).
shader-db results:
Total instructions: 208032 -> 207184
60/1246 programs affected (4.8%)
23286 -> 22438 instructions in affected programs (3.6% reduction)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
When I had a bug causing the backend to never finish optimizing, it
also sent me deep into swap. This avoids extra memory allocation per
trip through optimization, and thus may reduce the peak memory
allocation of the driver even in the success case.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Total instructions: 18210 -> 17836
49/163 programs affected (30.1%)
12888 -> 12514 instructions in affected programs (2.9% reduction)
This reduces Lightsmark's "Scale down filter" shader from 395
instructions to 283, a whopping 28%. It also reduces register pressure
significantly: the SIMD8 program now uses 29 registers instead of 101,
giving us more than enough room for a SIMD16 program.
v2: Add && !inst->conditional_mod to the "skip some instructions" check.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
This lets you omit some ampersands and is more idiomatic C++. Using
const also marks the function as not altering either register (which
was obvious, but nice to enforce).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
This allows us to calculate the triangle's area using fixed point,
previously it was cacluated in floating point space. It was possible
that a triangle which had negative area in floating point space had
a positive area in fixed point space.
Fixes fdo 40920.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
As pointed out by Marek, if we have only one cb, we may as well add this
single register write here rather than adding it in the draw loop.
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Vertex and index buffers are never used by hardware, only by Draw.
SWTCL chipsets usually have very little memory, so this might help
with stability and reliability.
Instead of having to hack the code to enable these debugging options,
set them through the MESA_DEBUG env var.
Reviewed-by: Eric Anholt <eric@anholt.net>
This flag has been around for a while but it wasn't actually used anywhere.
Now, setting this flag causes a glFlush() to be issued after each
drawing call (including glBegin/End, glDrawElements, glDrawArrays,
glDrawPixels, glCopyPixels and glBitmap).
This was being done in the _mesa_Flush/Finish() calls but if there
was an internal call to _mesa_flush/finish() the FLUSH_VERTICES()
wouldn't happen. Looks like only the intel and radeon drivers made
such calls in MakeCurrent().
When glColorMaterial() is used to latch glColor commands to a material
attribute, glMaterial calls to change that material should become no-ops.
This failed to work properly when the glMaterial call was inside a
display list.
This removes the Material function from the vbo_attrib_tmp.h template
file. We have separate/different implementations for the "save" and
"exec" cases now.
NOTE: This is a candidate for the 8.0 branch.
_mesa_material_bitmask() will record a GL error and return 0 if
face or mode are illegal. Return early in that case.
NOTE: This is a candidate for the 8.0 branch.
Some code relies on the existing of an invalid texture target. It seems
safer to bring it back than to deal with unintended consequences.
This partially reverts commit a4ebb04214.
Reviewed-by: Brian Paul <brianp@vmware.com>
Contains the following patches squashed in:
commit 9fff1dc0875f7c9591550fa3ebbe1ba7a18483fa
Author: Tom Stellard <thomas.stellard@amd.com>
Date: Tue Mar 20 23:20:03 2012 +0100
configure.ac: Build gallium loader when OpenCL is enabled
commit 542111cb02957418c6a285cb6ef2924e49adc66e
Author: Tom Stellard <thomas.stellard@amd.com>
Date: Tue Mar 20 23:30:29 2012 +0100
configure.ac: Add sw/null to GALLIUM_WINSYS_DIRS for gallium loader
commit 876f8de46062dde76b6075be3b6628f969b16648
Author: Tom Stellard <thomas.stellard@amd.com>
Date: Thu Feb 9 11:26:05 2012 -0500
configure.ac: Require gcc > 4.6.0 for clover
commit 99049d50fa3d9a23297ae658189c19c89dca1766
Author: Tom Stellard <thomas.stellard@amd.com>
Date: Tue Mar 20 23:32:06 2012 +0100
configure.ac: Require Gallium drm loader when gallium loader is enabled
No longer silently exclude this when building OpenCL drivers
for nouveau and r600.
Add a test program that tries to exercise some of the language
features commonly used by compute programs at the Gallium API level:
- Correctness of the values returned by the grid parameters.
- Proper functioning of resource LOADs and STOREs.
- Subroutine calls.
- Argument passing to the compute parameter through the INPUT
memory space.
- Mapping of buffer objects to the GLOBAL memory space.
- Proper functioning of the PRIVATE and LOCAL memory spaces.
- Texture sampling and constant buffers.
- Support for multiple kernels in the same program.
- Indirect resource indexing.
- Formatted resource loads and stores (i.e. with channel conversion
and scaling) using several different formats.
- Proper functioning of work-group barriers.
- Atomicity and semantics of the atomic opcodes.
As of now all of them seem to pass on my nvA8.
It simplifies things slightly, and besides, it makes possible to
execute the trivial tests on a hardware device instead of being
limited to software rendering.
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com>
This target generates pipe driver modules intended to be consumed by
auxiliary/pipe-loader. Most of it was taken from the "gbm" target --
the duplicated code will be replaced with references to this target in
a future commit.
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com>
The goal is to have a uniform interface to create winsys and
pipe_screen instances for any driver, exposing the device enumeration
capabilities that might be supported by the operating system (for now
there's a "drm" back-end using udev and a "sw" back-end that always
returns the same built-in devices).
The typical use case of this library will be:
>
> struct pipe_loader_device devs[n];
> struct pipe_screen *screen;
>
> pipe_loader_probe(&devs, n);
>[pick some device from the array...]
>
> screen = pipe_loader_create_screen(dev, library_search_path);
>[do something with screen...]
>
> screen->destroy(screen);
> pipe_loader_release(&devs, N);
>
A part of the code was taken from targets/gbm/pipe_loader.c, which
will be removed and replaced with calls into this library by a future
commit.
Add a shader cap for specifying the preferred shader representation.
Right now the only supported value is TGSI, other enum values will be
added as they are needed.
This is mainly to accommodate AMD's LLVM compiler back-end by letting
it bypass the TGSI representation for compute programs. Other drivers
will keep using the common TGSI instruction set.
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
This change will be useful to implement function parameter passing on
top of TGSI. As we don't have a proper stack, a register-based
calling convention will be used instead, which isn't necessarily a bad
thing given that GPUs often have plenty of registers to spare.
Using the same register space for local temporaries and
inter-procedural communication caused some inefficiencies, because in
some cases the register allocator would lose the freedom to merge
temporary values together into the same physical register, leading to
suboptimal register (and sometimes, as a side effect, instruction)
usage.
The LOCAL declaration modifier specifies that the value isn't intended
for parameter passing and as a result the compiler doesn't have to
give any guarantees of it being preserved across function boundaries.
Ignoring the LOCAL flag doesn't change the semantics of a valid
program in any way, because local variables are just supposed to get a
more relaxed treatment. IOW, this should be a backwards-compatible
change.
Normal resource access (e.g. the LOAD TGSI opcode) is supposed to
perform a series of conversions to turn the texture data as it's found
in memory into the target data type.
In compute programs it's often the case that we only want to access
the raw bits as they're stored in some buffer object, and any kind of
channel conversion and scaling is harmful or inefficient, especially
in implementations that lack proper hardware support to take care of
it -- in those cases the conversion has to be implemented in software
and it's likely to result in a performance hit even if the pipe_buffer
and declaration data types are set up in a way that would just pass
the data through.
Add a declaration flag that marks a resource as typeless. No channel
conversion will be performed in that case, and the X coordinate of the
address vector will be interpreted in byte units instead of elements
for obvious reasons.
This is similar to D3D11's ByteAddressBuffer, and will be used to
implement OpenCL's constant arguments. The remaining four compute
memory spaces can also be understood as raw resources.
This texture type was already referred to by the documentation but it
was never defined. Define it as 0 to match the pipe_texture_target
enumeration values.
Move Interpolate, Centroid and CylindricalWrap from tgsi_declaration
to a separate token -- they only make sense for FS inputs and we need
room for other flags in the top-level declaration token.
This commit splits the current concept of resource into "sampler
views" and "shader resources":
"Sampler views" are textures or buffers that are bound to a given
shader stage and can be read from in conjunction with a sampler
object. They are analogous to OpenGL texture objects or Direct3D
SRVs.
"Shader resources" are textures or buffers that can be read and
written from a shader. There's no support for floating point
coordinates, address wrap modes or filtering, and, unlike sampler
views, shader resources are global for the whole graphics pipeline.
They are analogous to OpenGL image objects (as in
ARB_shader_image_load_store) or Direct3D UAVs.
Most hardware is likely to implement shader resources and sampler
views as separate objects, so, having the distinction at the API level
simplifies things slightly for the driver.
This patch introduces the SVIEW register file with a declaration token
and syntax analogous to the already existing RES register file. After
this change, the SAMPLE_* opcodes no longer accept a resource as
input, but rather a SVIEW object. To preserve the functionality of
reading from a sampler view with integer coordinates, the
SAMPLE_I(_MS) opcodes are introduced which are similar to LOAD(_MS)
but take a SVIEW register instead of a RES register as argument.
Define an interface that exposes the minimal functionality required to
implement some of the popular compute APIs. This commit adds entry
points to set the grid layout and other state required to keep track
of the usual address spaces employed in compute APIs, to bind a
compute program, and execute it on the device.
Reviewed-by: Marek Olšák <maraeo@gmail.com>
This patch renames the gen6_hiz.h and gen7_hiz.h files to correspond
to the renames of the corresponding .cpp files (see previous commit).
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
This patch converts the files gen6_hiz.c and gen7_hiz.c to C++, in
preparation for expanding the HiZ code to support arbitrary blits.
The new files are called gen6_blorp.cpp and gen7_blorp.cpp to reflect
the expanded role that this code will serve--"blorp" stands for "BLit
Or Resolve Pass".
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Previous to this patch, gen6_hiz.c contained two implicit type casts
from void * to a a non-void pointer type. This is allowed in C but
not in C++. This patch makes the type casts explicit, so that
gen6_hiz.c can be converted into a C++ file.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
In C++, if a struct is defined inside another struct, or its name is
first seen inside a struct or function, the struct is nested inside
the namespace of the struct or function it appears in. In C, all
structs are visible from toplevel.
This patch explicitly moves the decalartions of intel_batchbuffer to
toplevel, so that it does not get nested inside a namespace when
header files are included from C++.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
These declarations are necessary to allow C++ code to call C code
without causing unresolved symbols (which would make the driver fail
to load).
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Fix wrong cube/3D texture layout for the tailing levels whose width or
height is smaller than the align unit.
From 965 B-spec http://intellinuxgraphics.org/VOL_1_graphics_core.pdf at
page 135:
All of the LOD=0 q-planes are stacked vertically, then below that,
the LOD=1 qplanes are stacked two-wide, then the LOD=2 qplanes are
stacked four-wide below that, and so on.
Thus we should always inrease pack_x_nr, which results to the pitch of LODn
may greater than the pitch of LOD0. So we should refactor mt->total_width
when needed.
This would fix the following webgl test case on all gen4 platforms:
conformance/textures/texture-size-cube-maps.html
NOTE: This is a candidate for stable release branches.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
This points to the object with the function body, allowing us to map
from a built-in prototype to the actual body with IR code to execute.
Signed-off-by: Olivier Galibert <galibert@pobox.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
- copy_masked_offset copies part of a constant into another,
assign-like.
- copy_offset copies a constant into (a subset of) another,
funcall-return like.
These methods are to be used to trace through assignments and function
calls when computing a constant expression.
Signed-off-by: Olivier Galibert <galibert@pobox.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net> [v1]
The method is used to get a reference to an ir_constant * within the
context of evaluating an assignment when calculating a
constant_expression_value.
Signed-off-by: Olivier Galibert <galibert@pobox.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net> [v1]
We were looping over all the vector components, but only dealing with
the first one. This was masked by the fact that constant expression
handling on built-ins went through custom code for the lessThan()
/function/ rather than the ir_binop_less expression operator.
NOTE: This is a candidate for all release branches.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Olivier Galibert <galibert@pobox.com>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
The vbo module recomputes its states if _NEW_ARRAY is set, so it shouldn't use
the same flag to notify the driver. Since we've run out of bits in NewState
and NewState is for core Mesa anyway, we need to find another way.
This patch is the first to start decoupling the state flags meant only
for core Mesa and those only for drivers.
The idea is to have two flag sets:
- gl_context::NewState - used by core Mesa only
- gl_context::NewDriverState - used by drivers only (the flags are defined
by the driver and opaque to core Mesa)
It makes perfect sense to use NewState|=_NEW_ARRAY to notify the vbo module
that the user changed vertex arrays, and the vbo module in turn sets
a driver-specific flag to notify the driver that it should update its vertex
array bindings.
The driver decides which bits of NewDriverState should be set and stores them
in gl_context::DriverFlags. Then, Core Mesa can do this:
ctx->NewDriverState |= ctx->DriverFlags.NewArray;
This patch implements this behavior and adapts st/mesa.
DriverFlags.NewArray is set to ST_NEW_VERTEX_ARRAYS.
Core Mesa only sets NewDriverState. It's the driver's responsibility to read
it whenever it wants and reset it to 0.
Reviewed-by: Brian Paul <brianp@vmware.com>
In the future we'd like to treat vertex arrays as a state and
not as a parameter to the draw function. This is the first step
towards that goal. Part of the goal is to avoid array re-validation
for every draw call.
This commit adds:
const struct gl_client_array **gl_context::Array::_DrawArrays.
The pointer is changed in:
* vbo_draw_method
* vbo_rebase_prims - unused by gallium
* vbo_split_prims - unused by gallium
* st_RasterPos
Reviewed-by: Brian Paul <brianp@vmware.com>
Replacing "float equal to 1.0f" with "int not equal to 0".
This should help for further optimization of boolean computations.
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
We're using float as default type, so basically for every instruction that
wants other types for dst/src operands we need to perform the bitcast
to/from default float. Currently bitcast produces no-op MOV instruction,
will be eliminated later.
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
In i965 Gen7, Mesa has for a long time used the "depth coordinate
offset X/Y" settings (in 3DSTATE_DEPTH_BUFFER) to cause the GPU to
render to miplevels other than 0. Unfortunately, this doesn't work,
because these offsets must be aligned to multiples of 8, and miplevels
in the depth buffer are only guaranteed to be aligned to multiples of
4. When the offsets aren't aligned to a multiple of 8, the GPU
sometimes hangs.
As a temporary measure, to avoid GPU hangs, this patch smashes the 3
LSB's of "depth coordinate offset X/Y" to 0. This results in
incorrect rendering to mipmapped depth textures, but that seems like a
reasonable stopgap while we figure out a better solution.
Avoids GPU hangs in piglit test "depthstencil-render-miplevels" at
texture sizes that are not powers of 2.
Reviewed-by: Chad Verace <chad.versace@linux.intel.com>
In i965 Gen6, Mesa has for a long time used the "depth coordinate
offset X/Y" settings (in 3DSTATE_DEPTH_BUFFER) to cause the GPU to
render to miplevels other than 0. Unfortunately, this doesn't work,
because these offsets must be aligned to multiples of 8, and miplevels
in the depth buffer are only guaranteed to be aligned to multiples of
4. When the offsets aren't aligned to a multiple of 8, the GPU
sometimes hangs.
As a temporary measure, to avoid GPU hangs, this patch smashes the 3
LSB's of "depth coordinate offset X/Y" to 0. This results in
incorrect rendering to mipmapped depth textures, but that seems like a
reasonable stopgap while we figure out a better solution.
(Note that we have only ever observed this GPU hang on Gen6 when HiZ
is enabled, so another possible stopgap would be to disable HiZ).
Avoids GPU hangs in piglit test "depthstencil-render-miplevels" at
texture sizes that are not powers of 2.
Reviewed-by: Chad Verace <chad.versace@linux.intel.com>
When the user attaches a texture to one of the depth/stencil
attachment points (GL_STENCIL_ATTACHMENT or GL_DEPTH_ATTACHMENT), we
check to see if the same texture is also attached to the other
attachment point, and if so, we re-use the existing texture
attachment. This is necessary to ensure that if the user later
queries what is attached to GL_DEPTH_STENCIL_ATTACHMENT, they will not
receive an error.
If, however, the user attaches buffers to the two different attachment
points using different parameters (e.g. a different miplevel), then we
can't re-use the existing texture attachment, because it is pointing
to the wrong part of the texture. This might occur as a transitory
condition if, for example, if the user attached miplevel zero of a
texture to GL_STENCIL_ATTACHMENT and GL_DEPTH_ATTACHMENT, rendered to
it, and then later attempted to attach miplevel one of the same
texture to GL_STENCIL_ATTACHMENT and GL_DEPTH_ATTACHMENT.
This patch causes Mesa to check that GL_STENCIL_ATTACHMENT and
GL_DEPTH_ATTACHMENT use the same attachment parameters before
attempting to share the texture attachment.
On i965 Gen6, fixes piglit tests
"texturing/depthstencil-render-miplevels 1024 depth_stencil_shared"
and "texturing/depthstencil-render-miplevels 1024
stencil_depth_shared".
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
When rendering to a miplevel other than 0 within a color, depth,
stencil, or HiZ buffer, we need to tell the GPU to render to an offset
within the buffer, so that the data is written into the correct
miplevel. We do this using a coarse offset (in pages), and a fine
adjustment (the so-called "tile_x" and "tile_y" values, which are
measured in pixels).
We have always computed the coarse offset and fine adjustment using
intel_renderbuffer_tile_offsets() function. This worked fine for
color and combined depth/stencil buffers, but failed to work properly
when HiZ and separate stencil were in use. It failed to work because
there is only one set of fine adjustment controls shared by the HiZ,
depth, and stencil buffers, so we need to choose tile_x and tile_y
values that are compatible with the tiling of all three buffers, and
then compute separate coarse offsets for each buffer.
This patch fixes the HiZ and separate stencil case by replacing the
call to intel_renderbuffer_tile_offsets() with calls to two functions:
intel_region_get_tile_masks(), which determines how much of the
adjustment can be performed using offsets and how much can be
performed using tile_x and tile_y, and
intel_region_get_aligned_offset(), which computes the coarse offset.
intel_region_get_tile_offsets() is still used for color renderbuffers,
so to avoid code duplication, I've re-worked it to use
intel_region_get_tile_masks() and intel_region_get_aligned_offset().
On i965 Gen6, fixes piglit tests
"texturing/depthstencil-render-miplevels 1024 X" where X is one of
(depth, depth_and_stencil, depth_stencil_single_binding, depth_x,
depth_x_and_stencil, stencil, stencil_and_depth, stencil_and_depth_x).
On i965 Gen7, the variants of
"texturing/depthstencil-render-miplevels" that contain a stencil
buffer still fail, due to another problem: Gen7 seems to ignore the 3
LSB's of the tile_y adjustment (and possibly also tile_x).
v2: Removed spurious comments. Added assertions to check
preconditions of intel_region_get_aligned_offset().
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
This patch removes ARB_framebuffer_object from the GLES1 and GLES2
extension lists in intel_extensions_es.c.
Fixes a crash in the Android browser on Ice Cream Sandwich.
The Android browser crashed because it did the following, which is legal
in GLES2 but not in ARB_framebuffer_object.
glGenFramebuffers(1, &fb);
glBindFramebuffer(GL_FRAMEBUFFER, fb);
// render render render...
glDeleteFramebuffers(1, &fb);
// go do other stuff...
glBindFramebuffer(GL_FRAMEBUFFER, fb);
// This bind unexpectedly failed, and the app panics.
The semantics of glBindFramebuffer specified by ARB_framebuffer_object (a
desktop GL extension) and GLES2 specs are incompatible. The ideal solution
to fix this is to create separate API entry points for glBindFramebuffer,
one for GL and the other for GLES2. But, until that work is complete,
disabling ARB_framebuffer_object in GLES2 contexts safely fixes the problem.
Likewise, the semantics of glBindFramebuffer in ARB_framebuffer_object and
of glBindFramebufferOES in OES_framebuffer_object (a GLES1 extension) are
incompatible. Even though the functions have different names, the semantic
difference still results in a bug because both API calls are implemented
by a single function, _mesa_BindFramebufferEXT, which handles the semantic
difference incorrectly. Again, disabling ARB_framebuffer_object in GLES1
contexts safely fixes this problem.
According to the ARB_framebuffer_object spec, the extension is an
amalgamation of
EXT_framebuffer_object
EXT_framebuffer_blit
EXT_packed_depth_stencil
EXT_framebuffer_multisample
By disabling this extension, however, no functionality is removed from
GLES1 and GLES2 contexts because 1) the first three extensions are
explicitly enabled in Intel's ES extension lists and 2) no functionality
of the last extension is exposed in an ES context.
Note: This is a candidate for the 8.0 branch.
See-also: http://www.mail-archive.com/mesa-dev@lists.freedesktop.org/msg21006.html
CC: Charles Johnson <charles.f.johnson@intel.com>
CC: Sean Kelley <sean.v.kelley@intel.com>
Reviewed-by: Ian Romanick <idr@freedesktop.org>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
When primitive restart is enabled, and glArrayElement is called
with the restart index value, then call glPrimitiveRestartNV.
NOTE: This is a candidate for the 8.0 branch.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Brian Paul<brianp@vmware.com>
The signal.h include was missed in the commit
bc16c73407 which leads to broken
compilations under Linux.
Signed-off-by: José Fonseca <jose.r.fonseca@gmail.com>
When doing the var->assigned change in
f2475ca424, I overzealously indented the
second block of code into the "if (var)" test. Revert these blocks to
the way they were before, just taking advantage of "var" to avoid
re-calling variable_referenced().
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=49066
I only considered var->assigned for FragColor and FragData, but
ignored when it was false for out vars. Fixes piglit
write-gl_FragColor-and-not-user-output.frag
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=49068
It seems silly that GL lets you allocate these given that they're
framebuffer attachment incomplete, but the webgl conformance tests
actually go looking to see if the getters on 0-width/height
depth/stencil renderbuffers return good values. By failing out here,
they all got smashed to 0, which turned out to be correct for all the
getters they tested except for GL_RENDERBUFFER_INTERNAL_FORMAT. Now,
by succeeding but not making a miptree, that one also returns the
expected value.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
The index is also used for GL_ARB_blend_func_extended. Cloning in
i965 was dropping a non-ARB_explicit_attrib_location index.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Dave Airlie <airlied@redhat.com>
When glTexImage or glCopyTexImage is called with internalFormat being a
generic compressed format (like GL_COMPRESSED_RGB) we need to do the same
error checks as for specific compressed formats. In particular, check if
the texture target is compatible with the format. None of the texture
compression formats we support so far work with GL_TEXTURE_1D, for example.
See also https://bugs.freedesktop.org/show_bug.cgi?id=49124
NOTE: This is a candidate for the 8.0 branch.
Otherwise it fails like so:
CC egl_dri2.lo
In file included from egl_dri2.h:40:0,
from egl_dri2.c:42:
../../../../../../src/egl/wayland/wayland-drm/wayland-drm.h:8:41:
fatal error: wayland-drm-server-protocol.h: No such file or directory
compilation terminated.
This new gbm entry point allows writing data into a gbm bo. The bo has
to be created with the GBM_BO_USE_WRITE flag, and it's only required to
work for GBM_BO_USE_CURSOR_64X64 bos.
The gbm API is designed to be the glue layer between EGL and KMS, but there
was never a mechanism initialize a buffer suitable for use with KMS
hw cursors. The hw cursor bo is typically not compatible with anything EGL
can render to, and thus there's no way to get data into such a bo.
gbm_bo_write() fills that gap while staying out of the efficient
cpu->gpu pixel transfer business.
Reviewed-by: Ander Conselvan de Oliveira <conselvan2@gmail.com>
At this point, in order for OpenCL to work correctly with r600g, OpenCL
specific intrinsics need to be defined in the LLVM tree. So, we need
to check for these intrinsics in the LLVM include directory to make sure
not to re-define them.
This is a pseudo instruction that enables the LLVM backend to encode
instructions and pass it through r600_bytecode_build()
Signed-off-by: Tom Stellard <thomas.stellard@amd.com>
This moves the alpha test control to derived state and disables alpha
testing for integer fbs.
fbo-blending test in piglit gets further when we do this (not a pass
but less fail).
v2: drop the fb_sx_alpha_test_control
Signed-off-by: Dave Airlie <airlied@redhat.com>
- Move to lp_bld_const where it belongs
- Rename to lp_build_const_string
- take the length from the argument (and don't count the zero terminator twice)
- bitcast the constant to generic i8 *
Allows the creation of const aos masks which have the mask swizzled
to match the correct format.
Updated existing mask creation code to use the swizzled version where
necessary (tgsi register masks and llvmpipe aos blending).
Signed-off-by: José Fonseca <jfonseca@vmware.com>
With this feature enabled, the LLVM backend will dump the MachineIntrs
prior to emitting code. The mesa env variable R600_DUMP_SHADERS will enable
this feature in the backend.
Fix uninitialized scalar field defect reported by Coverity.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
We can't delete MASK_WRITE instructions from the program, because this
will cause instructions being masked by MASK_WRITE to be marked dead and
then deleted in the dce pass.
Enabled MESA_FORMAT_RGBX8888_REV for RGBX. Android software
requires RGBX8888 format to be supported for software rendering.
That requires EGL to be capable of generating images from this
format.
Signed-off-by: Sean V Kelley <sean.v.kelley@linux.intel.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Add new format __DRI_IMAGE_FORMAT_XBGR8888 to __DRI_IMAGE.
HAL_PIXEL_FORMAT_RGBX_8888 now maps to __DRI_IMAGE_FORMAT_XBGR8888.
Signed-off-by: Sean V Kelley <sean.v.kelley@linux.intel.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Only images created with intel_create_image() had the field properly
set. Set it also on intel_dup_image(), intel_create_image_from_name()
and intel_create_image_from_renderbuffer().
GL_ARB_texture_storage says:
The commands eglBindTexImage, wglBindTexImageARB, glXBindTexImageEXT or
EGLImageTargetTexture2DOES are not permitted on an immutable-format
texture.
They will generate the following errors:
- EGLImageTargetTexture2DOES: INVALID_OPERATION
- eglBindTexImage: EGL_BAD_MATCH
- wglBindTexImage: ERROR_INVALID_OPERATION
- glXBindTexImageEXT: BadMatch
Fixing the EGL and GLX cases requires extending the DRI interface,
since setTexBuffer2 doesn't currently return any error information.
Reviewed-by: Brian Paul <brianp@vmware.com>
Adapted drivers: i915, llvmpipe, r300, r600, radeonsi, softpipe.
User index buffers have been disabled in nv30, nv50, nvc0 and svga to keep
things working.
This reduces CPU overhead in st_draw_vbo and removes a lot of unnecessary code
in that function which was required only to comply with the gallium interface,
but wasn't any useful really.
Adapted drivers: i915, llvmpipe, r300, softpipe.
No changes required in: r600, radeonsi.
User vertex buffers have been disabled in nv30, nv50, nvc0 and svga to keep
things working.
This is required for any serious constant buffer support.
Constant buffer offsets on ATI and NVIDIA DX10 and DX11 GPUs must be
a multiple of 256.
In OpenGL, this can be queried via GL_UNIFORM_BUFFER_OFFSET_ALIGNMENT.
As noted in commit be4e46b21a,
this was missing before.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
A little analysis shows that the worst-case value for "nr" is 17:
- base_mrf = 2 ... 2
- header present (say gen == 5) ... 4
- aa_dest_stencil_reg (stencil test) ... 5
- SIMD16 mode: += 4 * reg_width ... 13
- source_depth_to_render_target ... 15
- dest_depth_reg ... 17
This resulted in us setting base_mrf to 2 and mlen to 15. In other
words, we'd try to use m2..m16. But m16 doesn't exist pre-Gen6. Also,
the instruction scheduler data structures use arrays of size 16, so this
would cause us to access them out of bounds.
While the debugger system routine may need m0 and m1, we don't use it
today, so the simplest solution is just to move base_mrf back to 1.
That way, our worst case message fits in m1..m15, which is legal.
An alternative would be to fail on SIMD16 in this case, but that seems
a bit unfortunate if there's no real need to reserve m0 and m1.
Fixes new piglit test shaders/depth-test-and-write on Ironlake.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=48218
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
To ensure that the alloca is at the top of the function body, otherwise
LLVM will not eliminate them, causing stack misalignment on 32bits.
Reviewed-by: James Benton <jbenton@vmware.com>
This is taken from the ogl-math project, with Inverse renamed to adj
(since it's not actually the inverse), transposed, and our types
plugged in. There are potential CSE opportunities in this code
(particularly for hardware with RCP but not DIV), but we should be
doing CSE anyway, so don't hand-optimize.
Fixes piglit inverse tests.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
This takes advantage of the builtin compiler to generate IR into a
string, the same way we read GLSL for function prototypes for our
profiles.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
I keep getting lost in the Makefile trying to figure out what to edit
to work on builtin_compiler or glsl_compiler.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
It appears that when using 'ld' with the offset bits, address bounds
checking happens before the offset is applied, so parts of the drawing
in piglit texelFetchOffset() with a negative texcoord go black.
It appears that when using 'ld' with the offset bits, address bounds
checking happens before the offset is applied, so parts of the drawing
in piglit texelFetchOffset() with a negative texcoord go black.
This couldn't be split because it would break bisecting.
Summary:
* r300g,r600g: stop using u_vbuf
* r300g,r600g: also report that the FIXED vertex type is unsupported
* u_vbuf: refactor for use in the state tracker
* cso: wire up u_vbuf with cso_context
* st/mesa: conditionally install u_vbuf
This adds the ability to initialize u_vbuf_caps before creating u_vbuf itself.
It will be useful for determining if u_vbuf should be used or not.
Also adapt r300g and r600g.
This fixes an assertion failure since:
commit 81afdd20f3
vbo: don't check twice whether it's valid to render
FLUSH_CURRENT may set _NEW_CURRENT_ATTRIB.
Reviewed-by: Brian Paul <brianp@vmware.com>
If the source region for a glCopyPixels is completely outside the
source buffer bounds, no-op the copy. Fixes a failed assertion.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
The LLVM backend can now be enabled for r600g by using the
--enable-r600-llvm-compiler configure flag. If you configure with this
flag, you can still use the default compiler by setting the envrionment
variable R600_USE_LLVM=0
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Otherwise HAVE_LLVM won't be included in the $(DEFINES) variable for
Automake generated Makefiles.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
This shaves 2k off the final dri.so, and removes lots of pointless
NULL, 0 passing.
most like pointless - but it looked nicer to me.
Signed-off-by: Dave Airlie <airlied@redhat.com>
The __glapi_gentable_set_remaining_noop() routine treats the _glapi_struct
as an array of _glapi_get_dispatch_table_size() pointers, so we have to
allocate _glapi_get_dispatch_table_size()*sizeof(void*) bytes rather
than sizeof(struct _glapi_struct) bytes.
Reviewed-by: Jeremy Huddleston <jeremyhu@apple.com>
Alexandre Demers sent me some cayman results with no major problems.
I'll rip out the env var in a week or so.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Full piglit run on my rv610 with no regressions.
This only leaves cayman, however my cayman is resisting my attempt
to get through a full piglit run.
Signed-off-by: Dave Airlie <airlied@redhat.com>
I've done a piglit run on rv740 and confirmed no regressions.
We don't get GL3 on r700 due to transform feedback being busted still.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This cap is used by u_blitter to decide if it can use integers
in vertex data.
fixes some crashes with glsl130 in piglit
Signed-off-by: Dave Airlie <airlied@redhat.com>
I've done a piglit run on my SUMO machine and I see no regressions.
Lots of things to fix (skip->fail), but hey maybe we can fix them
if we can see them.
I'll try and work my way across r600,700,cayman sometime if nobody
else gets to them.
Signed-off-by: Dave Airlie <airlied@redhat.com>
The field wasn't actually used before and it's not used now either.
But this is a more logical place for it and will hopefully allow
doing smarter draw/array validation (per array object) in the future.
Reviewed-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Our previous live interval analysis just said that anything in a loop
was live for the whole loop. If you had to spill a reg in a loop,
then we would consider the unspilled value live across the loop too,
so you never made progress by spilling. Eventually it would consider
everything in the loop unspillable and fail out.
With the new analysis, things completely deffed and used inside the
loop won't be marked live across the loop, so even if you
spill/unspill something that used to be live across the loop, you
reduce register pressure. But you usually don't even have to spill
any more, since our intervals are smaller than before.
This fixes assertion failure trying to compile the shader for the
"glyphy" text rasterier and piglit glsl-fs-unroll-explosion.
Improves Unigine Tropics performance 1.3% +/- 0.2% (n=5), by allowing
more shaders to be compiled in 16-wide mode.
This takes the fs_inst list generated by the visitor, and generates a
list of basic blocks with edges between them. This is a building
block for data-flow analysis.
We were checking for these at link time previously, which is not as
early as mandated, and would actually fail to detect conflicting
writes if dead code removal removed some writes.
Fixes failures in piglit
glsl-*/compiler/fragment-outputs/write-gl_Frag*
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This will be used for some compile-and-link-time error checking, where
currently we've been doing error checking only at link time.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This runs optimization-test and produces the usual automake test
output, which may be interesting to automated build systems.
This doesn't convert the tests to be individually exposed to the
automake runner, because automake doesn't like wildcards (due to being
nonportable in make, not that we care).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This is the reason the declaration member existed in the reference
visitor, but I didn't copy the code from structure splitting that
avoided setting it.
This wasn't currently a problem, because we don't allow splitting of
in/out variables. But that would be nice to change some day.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This was carried over from structure splitting, without thinking about
whether the name still made sense in this context.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Fixes encoding of VOP3 shader instructions.
The shift was wrong for source registers 2 and 3, and the resulting value was
only 32 bits, so the shift in SICodeEmitter::VOPPostEncode() didn't work as
intended.
If a non-default array object was bound at context destruction time
we'd try to unreference the array object after it was already deleted
in _mesa_free_varray_data(). Now do the unref first.
Fixes a regression from commit 86f53e6d6b.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
It's not nice when you have several variables pointing to the same array
and you wanna ask your editor "where is this used" and you only get an answer
for one of the four currval, legacy_currval, generic_currval, mat_currval,
which is quite useless, because you never see the whole picture.
Let's get rid of the additional pointers.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
It's already done in _mesa_validate_Draw* and it's not needed to do it again
unless I am missing something.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
This is a frequently-updated state and _NEW_ARRAY already causes revalidation
of the vbo module. It's kinda counter-productive to recompute arrays
in the vbo module if _NEW_ARRAY is set and then set _NEW_ARRAY again.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
This moves the RebindArrays flag into the vbo module, consolidates the code,
and adds missing vbo_draw_method calls.
Also with this change, the vertex arrays are not needlessly recalculated twice.
The issue with the old code was:
- If recalculate_input_bindings updates vp_varying_inputs, _NEW_ARRAY is set.
- _mesa_update_state is called and the vp_varying_inputs change causes
regeneration of the fixed-function shaders, which also sets _NEW_PROGRAM.
- The occurence of either _NEW_ARRAY or _NEW_PROGRAM sets
the recalculate_inputs flag to TRUE again.
- The new code sets the flag to FALSE after the second _mesa_update_state,
because there can't possibly be any change which would require recalculating
the arrays.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Vinson reported that we failed to initialize this, which would lead to
all kinds of crashes if we actually used it. Since we don't use it,
we may as well just delete the broken code.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Now that we use separate binding tables for WM, VS, and GS, and have
BRW_MAX_VS_SURFACES and BRW_MAX_GS_SURFACES macros, we really shouldn't
have an unqualified BRW_MAX_SURFACES macro. It's confusing.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
They had a number of issues:
- A paragraph states that we use a single binding table, but we don't.
- We labelled the WM binding table diagram as SOL/WM.
- The WM diagram had an "Only relevant to the WM" comment. Duh.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
This change uses the array object factory for gl_array_objects. This
prevents crashes when deriving from gl_array_object.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Froehlich <Mathias.Froehlich@web.de>
We don't normally clear immediately after drawing something. But as it
was, the drawing would incorrectly appear after the clear.
Fixes piglit clear-varray-2.0 failure.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Deletes a lot of pointless duplication, as well as some run-time effort.
Conveniently, GLSL 1.40 no longer needs a .vert variant, since it
doesn't define any built-ins specific to the vertex shader stage.
ARB_texture_rectangle and OES_EGL_image_external also only need a single
profile, since the .vert and .frag variants were identical.
I didn't bother with EXT_texture_array and OES_texture_3D because
they're so tiny that the savings would be miniscule.
Cuts the generated builtin_function.cpp from 1.7MB to 1.0MB (41%).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Acked-by: Ian Romanick <ian.d.romanick@intel.com>
The built-in subsystem uses "profiles," or GLSL shaders containing
prototypes for all built-ins supported within a particular language
version (or extension) and shader stage.
Since profiles were stage-specific, we had to cut and paste almost all
the prototypes between (e.g.) 110.vert and 110.frag. Naturally, this
led to sundry cut and paste bugs, where someone fixed an issue in .frag
but neglected to update .vert, or vice-versa. Geometry shaders would
have only made this worse.
This patch introduces support for a new '.glsl' profile suffix which
contains prototypes common to all shader stages. The existing '.frag'
and '.vert' profiles need only contain the few stage-specific built-ins.
Not only does this remove duplication, it makes built-in setup slightly
faster: we don't need to re-read the common prototypes and function
bodies for both the vertex and fragment shader stage.
Internally, this was trivial. We already create a list of gl_shader
objects to search through for built-ins: one for the core language
version/stage, and additional shaders for any extensions in use. This
patch simply adds another shader to the list: core/common, core/stage,
and extensions.
The next patch will update the profiles to remove the duplication.
It's separated out purely to make review easier.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Acked-by: Ian Romanick <ian.d.romanick@intel.com>
Accelerates a few glReadPixels cases for WebGL.
See https://bugs.freedesktop.org/show_bug.cgi?id=48545
v2: Per Jose, use bit twiddling for the swizzle case instead of ubyte
arrays (it's about 44% faster).
Note: This is a candidate for the 8.0 branch.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
The GLSL 1.30 -> 4.10 specs all erroneously say "vec2" for a few
overloads of textureProjGradOffset, while most overloads and all other
texturing functions use ivec types.
The GLSL 4.20 specification corrects these to "ivec2", but doesn't
mention this as being a conscious change in behavior. Nor does the
ARB_shading_language_420pack extension. So presumably it was a typo.
At any rate, our builtin functions all use ivec already, so the fact
that these prototypes use plain vecs will only lead to applications
dying in a fire when trying to use them.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This reverts commit 4ec449a6ed.
I meant to not push this one. Review found that a link error is not
mandated: it should link, but you get undefined rendering if you rely
on a missing stage.
page 42/55 section 2.11 "Vertex Shaders":
"If the program object has no vertex shader, or no program object
is currently in use, the results of vertex shader execution are
undefined."
(and similar for page 160/173 section 3.9 "Fragment Shaders" for FS,
and page 45/58 section 2.11.2 "Program Objects" for program being 0)
It turns out the commit was broken anyway, because it was missing a
"goto done", so linkstatus got smashed back to true later and the
error just showed up as a warning in the infolog.
All I know of that needs finishing in Mesa is to enable the extension
in a GL3.1 core context on i965 -- we're not going to expose it in
non-3.1 core contexts.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Fixes the new piglit texelFetch() tests on these. Note that the rest
of the new functions are not tested (same as the non-2DRect versions
of most of them).
The non-integer versions were already reserved in 1.30, but apparently
these were forgotten.
Fixes piglit glsl-1.40/compiler/reserved/
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Prevents this error with Automake 1.9:
src/gallium/drivers/Makefile.am: C objects in subdir but
`AM_PROG_CC_C_O' not in `configure.ac'
autoreconf: automake failed with exit status: 1
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
radeonsi and r600 have duplicate symbols, so it's not possible to
statically link both. Remove the newcomer, radeonsi, until duplicate
symbols are fixed.
Most things that work on Fermi should work on Kepler too.
There are a few performance optimizations left to do, like better
placement of texture barriers and adding scheduling data to the
shader instructions (without them, a thread group will be masked
for 32 cycles after each single instruction issue).
Edit: Don't do it for the main function of (graphics) shaders,
its inputs and outputs always go through TGSI_FILE_INPUT/OUTPUT.
This prevents all TEMPs from counting as live out and reduces
register pressure.
The point is to keep an independent dictionary for each function.
The array that was being used as dictionary has been converted into a
"bimap" for two different reasons: first, because having an almost
empty instance of an array with as many entries as registers there are
in the program, once for every function, would be wasteful, and
second, because we want to be able to map Value pointers back to
locations at some point.
The reason is that several passes (regalloc, function argument
binding, inlining) are going to require the callees of a function to
be processed before the caller.
Instruction attributes like WriteALUResult and ALUResultCompare
were being discarded during the some of the local transformations.
This fixes the following piglit tests:
glsl1-inequality (vec2, pass)
loopfunc
fs-any-bvec2-using-if
fs-op-ne-bvec2-bvec2-using-if
fs-op-ne-ivec2-ivec2-using-if
fs-op-ne-mat2-mat2-using-if
fs-op-ne-vec2-vec2-using-if
fs-op-ne-mat2x3-mat2x3-using-if
fs-op-ne-mat2x4-mat2x4-using-if
https://bugs.freedesktop.org/show_bug.cgi?id=45921
NOTE: This is a candidate for the stable branches.
The loop registers weren't being cleared, so any shader that was
executed after a shader containing loops was at risk of having a loop
randomly inserted into it.
This fixes over one hundred piglit tests, although these test
only failed during full piglit runs and would pass if
run individually. The exact number of piglit tests that this patch
fixes will vary depending on the version of piglit and the order the
tests are run.
NOTE: This is a candidate for the stable branches.
This lets us significantly shorten p->instructions->push_tail(ir), and
will be used in a few more places.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Now we can fold a bunch of our expression setup in ff_fragment_shader
into single-line, parseable commits.
v2: Make it actually work. I wasn't setting num_components in the
mask structure, and not setting up a mask structure is way easier.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Having to explicitly dereference is irritating and bloats the code,
when the compiler can detect and do the right thing.
v2: Use a little shim class to produce the automatic dereference
generation at compile time as opposed to runtime, while also
allowing compile-time type checking.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The C++ constructors with placement new, while functional, are
extremely verbose, leading to generation of simple GLSL IR expressions
like (a * b + c * d) expanding to many lines of code and using lots of
temporary variables. By creating a new ir_builder.h that puts simple
generators in our namespace and taking advantage of ralloc_parent(),
we can generate much more compact code, at a minor runtime cost.
v2: Replace ir_instruction usage with just ir_rvalue.
v3: Drop remaining missed as_rvalue() in v2.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The primary motivation for this rewrite was to have a maintainable driver
going forward, as nvfx was quite horrible in a lot of ways.
The driver is heavily based on the design of the nv50/nvc0 3d drivers we
already have, and uses the same common buffer/fence code. It also passes
a HEAP more piglit tests than nvfx did, supports a couple more features,
and a few more to come still probably.
The CPU footprint of this driver is far far less than nvfx, and translates
into far greater framerates in a lot of applications (unless you're using
a CPU that's way way newer than the GPUs of these generations....)
Basically, we once again have a maintained driver for these chipsets \o/
Feel free to report bugs now!
This driver hasn't been maintained properly for a very long time, and for
many very good reasons. It's horrible.
A new driver supporting these chipsets will appear with the commits that
port vieux/nv50/nvc0 to libdrm_nouveau-2.0.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
TEXTURED_TRIANGLE and MULTITEX_TRIANGLE are both a bit special in that if
you use any other graph object in the meantime they'll forget their state
and spew a lovely METHOD_CNT error at you when you try to draw.
The pre-newlib driver has a flush_notify() hook which does this state
re-emit, and a number of random workarounds like extra flushes and state
dirtying after various operations to solve this issue.
I'm taking a slightly different approach to things instead, which has the
nice side-effect of removing the divergent code-paths for ttri/mtri, the
flush/dirty workarounds and the need for flush_notify. Also gives a few
FPS boost in OA, yay.
This is just a function to tell if a certain blend mode requires dual sources.
v2: move to inlines as per Brian's suggestion
Signed-off-by: Dave Airlie <airlied@redhat.com>
This adds the blend mode mapping, it also uses the var->index in the
glsl to tgsi convertor - this is the other half of my using 4 in the GLSL
compiler.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This adds index support to the GLSL compiler.
I'm not 100% sure of my approach here, esp without how output ordering
happens wrt location, index pairs, in the "mark" function.
Since current hw doesn't ever have a location > 0 with an index > 0,
we don't have to work out if the output ordering the hw requires is
location, index, location, index or location, location, index, index.
But we have no hw to know, so punt on it for now.
v2: index requires layout - catch and error
setup explicit index properly.
v3: drop idx_offset stuff, assume index follow location
Signed-off-by: Dave Airlie <airlied@redhat.com>
Add implementations of the two API functions,
Add a new strings to uint mapping for index bindings
Add the blending mode validation for SRC1 + SRC_ALPHA_SATURATE
Add get for MAX_DUAL_SOURCE_DRAW_BUFFERS
v2:
Add check in valid_to_render to address case in spec ERRORS.
v3:
Add index to ir.h so this patch compiles on its own
fixup comment
v4: fixup Brian's comments
The GLSL patch will setup the indices.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This commit adds initial support for acceleration
on SI chips. egltri is starting to work.
The SI/R600 llvm backend is currently included in mesa
but that may change in the future.
The plan is to write a single gallium driver and
use gallium to support X acceleration.
This commit contains patches from:
Tom Stellard <thomas.stellard@amd.com>
Michel Dänzer <michel.daenzer@amd.com>
Alex Deucher <alexander.deucher@amd.com>
Vadim Girlin <vadimgirlin@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The following commits were squashed in:
======================================================================
radeonsi: Remove unused winsys pointer
This was removed from r600g in commit:
commit 96d882939d
Author: Marek Olšák <maraeo@gmail.com>
Date: Fri Feb 17 01:49:49 2012 +0100
gallium: remove unused winsys pointers in pipe_screen and pipe_context
A winsys is already a private object of a driver.
======================================================================
radeonsi: Copy color clamping CAPs from r600
Not sure if the values of these CAPS are correct for radeonsi, but the
same changed were made to r600g in commit:
commit bc1c836938
Author: Marek Olšák <maraeo@gmail.com>
Date: Mon Jan 23 03:11:17 2012 +0100
st/mesa: do vertex and fragment color clamping in shaders
For ARB_color_buffer_float. Most hardware can't do it and st/mesa is
the perfect place for a fallback.
The exceptions are:
- r500 (vertex clamp only)
- nv50 (both)
- nvc0 (both)
- softpipe (both)
We also have to take into account that r300 can do CLAMPED vertex colors only,
while r600 can do UNCLAMPED vertex colors only. The difference can be expressed
with the two new CAPs.
======================================================================
radeonsi: Remove PIPE_CAP_OUTPUT_READ
This CAP was dropped in commit:
commit 04e3240087
Author: Marek Olšák <maraeo@gmail.com>
Date: Thu Feb 23 23:44:36 2012 +0100
gallium: remove PIPE_SHADER_CAP_OUTPUT_READ
r600g is the only driver which has made use of it. The reason the CAP was
added was to fix some piglit tests when the GLSL pass lower_output_reads
didn't exist.
However, not removing output reads breaks the fallback for glClampColorARB,
which assumes outputs are not readable. The fix would be non-trivial
and my personal preference is to remove the CAP, considering that reading
outputs is uncommon and that we can now use lower_output_reads to fix
the issue that the CAP was supposed to workaround in the first place.
======================================================================
radeonsi: Add missing parameters to rws->buffer_get_tiling() call
This was changed in commit:
commit c0c979eebc
Author: Jerome Glisse <jglisse@redhat.com>
Date: Mon Jan 30 17:22:13 2012 -0500
r600g: add support for common surface allocator for tiling v13
Tiled surface have all kind of alignment constraint that needs to
be met. Instead of having all this code duplicated btw ddx and
mesa use common code in libdrm_radeon this also ensure that both
ddx and mesa compute those alignment in the same way.
v2 fix evergreen
v3 fix compressed texture and workaround cube texture issue by
disabling 2D array mode for cubemap (need to check if r7xx and
newer are also affected by the issue)
v4 fix texture array
v5 fix evergreen and newer, split surface values computation from
mipmap tree generation so that we can get them directly from the
ddx
v6 final fix to evergreen tile split value
v7 fix mipmap offset to avoid to use random value, use color view
depth view to address different layer as hardware is doing some
magic rotation depending on the layer
v8 fix COLOR_VIEW on r6xx for linear array mode, use COLOR_VIEW on
evergreen, align bytes per pixel to a multiple of a dword
v9 fix handling of stencil on evergreen, half fix for compressed
texture
v10 fix evergreen compressed texture proper support for stencil
tile split. Fix stencil issue when array mode was clear by
the kernel, always program stencil bo. On evergreen depth
buffer bo need to be big enough to hold depth buffer + stencil
buffer as even with stencil disabled things get written there.
v11 rebase on top of mesa, fix pitch issue with 1d surface on evergreen,
old ddx overestimate those. Fix linear case when pitch*height < 64.
Fix r300g.
v12 Fix linear case when pitch*height < 64 for old path, adapt to
libdrm API change
v13 add libdrm check
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
======================================================================
radeonsi: Remove PIPE_TRANSFER_MAP_PERMANENTLY
This was removed in commit:
commit 62f44f670b
Author: Marek Olšák <maraeo@gmail.com>
Date: Mon Mar 5 13:45:00 2012 +0100
Revert "gallium: add flag PIPE_TRANSFER_MAP_PERMANENTLY"
This reverts commit 0950086376.
It was decided to refactor the transfer API instead of adding workarounds
to address the performance issues.
======================================================================
radeonsi: Handle PIPE_VIDEO_CAP_PREFERED_FORMAT.
Reintroduced in commit 9d9afcb5ba.
======================================================================
radeonsi: nuke the fallback for vertex and fragment color clamping
Ported from r600g commit c2b800cf38.
======================================================================
radeonsi: don't expose transform_feedback2 without kernel support
Ported from r600g commit 15146fd1bc.
======================================================================
radeonsi: Handle PIPE_CAP_GLSL_FEATURE_LEVEL.
Ported from r600g part of commit 171be75522.
======================================================================
radeonsi: set minimum point size to 1.0 for non-sprite non-aa points.
Ported from r600g commit f183cc9ce3.
======================================================================
radeonsi: rework and consolidate stencilref state setting.
Ported from r600g commit a2361946e7.
======================================================================
radeonsi: cleanup setting DB_SHADER_CONTROL.
Ported from r600g commit 3d061caaed.
======================================================================
radeonsi: Get rid of register masks.
Ported from r600g commits
3d061caaed13b646ff40754f8ebe73f3d4983c5b..9344ab382a1765c1a7c2560e771485edf4954fe2.
======================================================================
radeonsi: get rid of r600_context_reg.
Ported from r600g commits
9344ab382a1765c1a7c2560e771485edf4954fe2..bed20f02a771f43e1c5092254705701c228cfa7f.
======================================================================
radeonsi: Fix regression from 'Get rid of register masks'.
======================================================================
radeonsi: optimize r600_resource_va.
Ported from r600g commit 669d8766ff.
======================================================================
radeonsi: remove u8,u16,u32,u64 types.
Ported from r600g commit 78293b99b2.
======================================================================
radeonsi: merge r600_context with r600_pipe_context.
Ported from r600g commit e4340c1908.
======================================================================
radeonsi: Miscellaneous context cleanups.
Ported from r600g commits
e4340c1908a6a3b09e1a15d5195f6da7d00494d0..621e0db71c5ddcb379171064a4f720c9cf01e888.
======================================================================
radeonsi: add a new simple API for state emission.
Ported from r600g commits
621e0db71c5ddcb379171064a4f720c9cf01e888..f661405637bba32c2cfbeecf6e2e56e414e9521e.
======================================================================
radeonsi: Also remove sbu_flags member of struct r600_reg.
Requires using sid.h instead of r600d.h for the new CP_COHER_CNTL definitions,
so some code needs to be disabled for now.
======================================================================
radeonsi: Miscellaneous simplifications.
Ported from r600g commits 38bf276348 and
b0337b679a.
======================================================================
radeonsi: Handle PIPE_CAP_QUADS_FOLLOW_PROVOKING_VERTEX_CONVENTION.
Ported from commit 8b4f7b0672.
======================================================================
radeonsi: Use a fake reloc to sleep for fences.
Ported from r600g commit 8cd03b933c.
======================================================================
radeonsi: adapt to get_query_result interface change.
Ported from r600g commit 4445e170be.
clang warns on these:
stroker.c:626:19: warning: implicit conversion from enumeration
type 'VGPathCommand' to different enumeration type 'VGPathSegment'
[-Wconversion]
No change in the underlying value.
Reviewed-by: Brian Paul <brianp@vmware.com>
Noticed by clang:
brw_wm_surface_state.c:330:30: warning: initializer overrides prior
initialization of this subobject [-Winitializer-overrides]
[MESA_FORMAT_Z24_S8] = 0,
^
brw_wm_surface_state.c:326:30: note: previous initialization is here
[MESA_FORMAT_Z24_S8] = 0,
^
No functionality change, since the array is declared static so
it was zero-initialized by default.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Silences a clang warning:
format_pack.c:2546:30: warning: implicit conversion from 'int' to
'GLubyte' (aka 'unsigned char') changes value from 65535 to 255
[-Wconstant-conversion]
d[i] = d[i] ? 0xffff : 0x0;
~ ^~~~~~
Reviewed-by: Brian Paul <brianp@vmware.com>
Noticed by clang:
egl_st.c:57:50: warning: field precision should have type 'int',
but argument has type 'size_t' (aka 'unsigned long') [-Wformat]
ret = util_snprintf(path, sizeof(path), "%.*s/%s" UTIL_DL_EXT,
~~^~
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Brian Paul <brianp@vmware.com>
C still treats array arguments exactly like pointer arguments.
By sheer coincidence, this still worked fine on 64-bit
machines where 2 * sizeof(float) == sizeof(void*), but not
on 32-bit.
Noticed by clang:
text.c:76:51: warning: sizeof on array function parameter will
return size of 'const VGfloat *' (aka 'const float *') instead of
'const VGfloat [2]' [-Wsizeof-array-argument]
memcpy(glyph->glyph_origin, glyphOrigin, sizeof(glyphOrigin));
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Brian Paul <brianp@vmware.com>
Noticed by clang:
eglimage.c:48:28: warning: argument to 'sizeof' in 'memset' call is
the same expression as the destination; did you mean to dereference
it? [-Wsizeof-pointer-memaccess]
memset(attrs, 0, sizeof(attrs));
~~~~~ ^~~~~
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Brian Paul <brianp@vmware.com>
Most of the 256 values in the 'generic_to_slot' table were supposed to
be initialized with the default value 0xff, but were left at zero
(from CALLOC_STRUCT()) instead.
Noticed by clang:
u_linkage.h:60:31: warning: argument to 'sizeof' in 'memset' call is the same expression as the destination;
did you mean to provide an explicit length? [-Wsizeof-pointer-memaccess]
memset(table, 0xff, sizeof(table));
~~~~~ ^~~~~
Also fix a signed/unsigned comparison and a comment typo here.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Brian Paul <brianp@vmware.com>
container_of() can legally return anything, even invalid addresses
that cause segfaults, when 'sample' is an uninitialized pointer.
Bug exposed by clang.
NOTE: This is a candidate for the 8.0 branch.
Fix uninitialized scalar field defect reported by Coverity.
NOTE: This is a candidate for the 8.0 branch.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Commit 272bc48976 removed the damage implementation for the
wl_buffer_interface because that has been removed from git master of
Wayland. However this breaks building with the 0.85 branch of Wayland
because it would end up initialising the struct incorrectly.
For the time being it's quite convenient for some compositors to track
the 0.85 branch of Wayland because the protocol is stable but they
will also want to track the master branch of Mesa so that they can use
the gbm surface changes.
This patch adds a compile-time check for the version of Wayland so
that it can work with either Wayland master or the 0.85 branch.
krh: Edited to also account for API changes in 6802eaa68, which
removes the timestamp argument from wl_resource_destroy().
The upstream of gtest has decided that the intended usage model is for
projects to import the source and use it, which is reflected in their
recent removal of the gtest-config tool.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
By making a bool fs_reg only have a defined low bit (matching CMP
output), instead of being a full 0 or 1 value, we reduce the ANDs
generated in logic chains like:
if (v_texcoord.x < 0.0 || v_texcoord.x > texwidth ||
v_texcoord.y < 0.0 || v_texcoord.y > 1.0)
discard;
My concern originally when writing this code was that we would end up
generating unnecessary ANDs on bool uniforms, so I put the ANDs right
at the point of doing the CMPs that otherwise set only the low bit.
However, in order to use a bool, we're generating some instruction
anyway (e.g. moving it so as to produce a condition code update), and
those instructions can often be turned into an AND at that point. It
turns out in the shaders I have on hand, none of them regress in
instruction count:
Total instructions: 262649 -> 262545
39/2148 programs affected (1.8%)
14253 -> 14149 instructions in affected programs (0.7% reduction)
This change (before the previous two) produced a .23% +/- .11%
performance improvement in Unigine Tropics at 1024x768 on IVB.
Total instructions: 269270 -> 262649
614/2148 programs affected (28.6%)
179386 -> 172765 instructions in affected programs (3.7% reduction)
v2: Move some of the logic of finding the instruction that produced
the result of an expression tree to a helper.
This should fit in well with our lower_mat_op_to_vec code: now, in
addition to having expressions on each column of a matrix, we also
split the columns to separate variables so they can be tracked
individually by the copy propagation, dead code, and other passes.
This optimizes out some more code generation in unigine and gstreamer
shaders.
Total instructions: 269342 -> 269270
14/2148 programs affected (0.7%)
2226 -> 2154 instructions in affected programs (3.2% reduction)
I've had this code laying around almost done for a long time. The
idea is like opt_structure_splitting, that we've got a bunch of
transforms at the GLSL IR level that only understand scalars and
vectors, which just skip complicated dereferences. While driver
backends may manage some optimization after they split matrices up
themselves, it would be better to bring all of our optimization to
bear on the problem.
While I wasn't expecting changes quite yet, a few programs end up
winning: a gstreamer convolution shader, and the Humus dynamic
branching demo:
Total instructions: 269430 -> 269342
3/2148 programs affected (0.1%)
1498 -> 1410 instructions in affected programs (5.9% reduction)
The Android build was broken by
commit ca760181b4
Author: Kristian Høgsberg <krh@bitplanet.net>
Date: Fri Mar 16 12:55:40 2012 -0400
shared-glapi: Convert to automake
The offending change was that it redefined the filepaths in sources.mak
like this:
- FOO_FILES := bar.c
+ FOO_FILES := $(TOP)/src/mapi/mapi/bar.c
This broke the build because source filepaths in Android makefiles must be
relative to the makefile.
Ideally, this could be fixed by reverting the change in sources.mak and
making shared-glapi's Makefile.am use $(addprefix $(TOP)/src/mapi/mapi,
$(FOO_FILES)). However, automake doesn't understand builtin GNU make
functions, such as addprefix. So, it seems that automake and Android can
no longer share sources.mak.
Fix the build by duplicating the source lists from sources.mak into
Android.mk.
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Keep a reference to any newly allocated aux buffers to avoid
re-allocating for every st_framebuffer_validate() (i.e. leaking).
Signed-off-by: Mandeep Singh Baines <msb@chromium.org>
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
When using a separate stencil buffer, i965 requires that the pitch of
the buffer (in the 3DSTATE_STENCIL_BUFFER command) be specified as 2x
the actual pitch.
Previously this was accomplished by doubling the "cpp" and "pitch"
values stored in the intel_region data structure, and halving the
height. However, this was confusing, and it led to a subtle (but
benign) bug: since a stencil buffer is W-tiled, its true height must
be aligned to a multiple of 64; we were accidentally aligning its faux
height to a multiple of 64, causing memory to be wasted.
Note that for window system stencil buffers, the DDX also doubles the
cpp and pitch values. To facilitate fixing this DDX server bug in the
future, we fix the cpp and pitch values we receive from the X server
only if cpp has the "incorrect" value of 2.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
v2: Clarify comments about the DDX.
This is a related fix for the Wayland change:
commit 83685c506e76212ae4e5cb722205d98d3b0603b9
Author: Kristian Høgsberg <krh@bitplanet.net>
Date: Mon Mar 26 16:33:24 2012 -0400
Remove wl_buffer.damage and simplify shm implementation
Apparently, this should also fix a memory leak. When wl_buffer.damage
was removed from Wayland and Mesa was not fixed, wl_buffer.destroy ended
up in the (empty) damage function instead of calling
wl_resource_destroy().
Spotted during build as:
CC wayland-drm-protocol.lo
wayland-drm.c:80:2: warning: initialization from incompatible pointer type
wayland-drm.c:82:1: warning: excess elements in struct initializer
wayland-drm.c:82:1: warning: (near initialization for 'drm_buffer_interface')
Signed-off-by: Pekka Paalanen <ppaalanen@gmail.com>
Fixes uninitialized member defects reported by Coverity.
NOTE: This is a candidate for the 8.0 branch.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
This was hacked in in one place for EGL image stuff, but the right
thing to do was just to provide the mapping from the mesa format to
the native hardware format, which includes render target support.
This turns out to be required for GL_ARB_texture_buffer_object, which
sees data in this layout.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
It turns out this field *is* used, and it's the stride between samples
from the buffer. Discovered during TBO debugging.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
There was a function full of unused mappings from the GLenum to
datatype/comps, but that wasn't all the information a driver would
want, which includes the other fields that a gl_format has. Given
that all the texture buffer formats were represented in gl_format,
just use that as our description.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We have to skip some work that wants to look at texture images, since
buffer textures don't have any of that complexity.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
All that should be needed is that it exists. Fixes segfaults on first
_mesa_update_context() with a samplerBuffer-using shader active but
without a particular buffer texture enabled.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Fix texelFetch(sampler2DRect) and textureSize(samplerBuffer)
generation to not reference a LOD at the same time because it's easier
than not fixing it.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The samplerBuffer type will be undefined in !glsl 1.40, and the
keyword is marked as reserved. The [iu]samplerBuffer types are not
marked as reserved pre-1.40, so they don't have separate tokens and
fall through to normal type handling.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We're supposed to just immediately call it. Fixes piglit
GL_ARB_texture_buffer_object/dlist
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This is set correctly in gl.spec, but was missed in Mesa. As a
result, only one of the two was hooked up in Mesa.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We have lexer recognition of a bunch of our types based on the
handling. This code was mapping those recognized tokens to an enum
and then to a string of their name. Just drop the enums and provide
the string directly in the parser.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Nothing actually relied on them being mutable, and there was at least
one cast which discarded const qualifiers. The next patch would have
introduced many more.
Casting away const qualifiers should be avoided if at all possible.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
In "release" builds, Mesa would print this message if the MESA_DEBUG
variable was set. Make it so for debug builds as well.
I build debug builds all the time, but I'm not debugging this.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
While ir_to_mesa contains code that attempts to support functions, I
honestly doubt it's been tested and have little confidence that it
works.
The comment in visit(ir_function *ir) doesn't inspire confidence:
/* Ignore function bodies other than main() -- we shouldn't see calls to
* them since they should all be inlined before we get to ir_to_mesa.
*/
Furthermore, hardware drivers such as i915, i965, and (AFAICT) r200
don't support the BGNSUB/ENDSUB/CAL opcodes anyway. Only swrast does.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This never worked. brwProgramStringNotify also explicitly rejects
programs that use CAL and RET. So there's no need for this to exist.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
When SPRITE_POINT_ENABLE bit is set, the texture coord would be
replaced, and this is only needed when we called something like
glTexEnvi(GL_POINT_SPRITE, GL_COORD_REPLACE, GL_TRUE).
And more, we currently handle varying inputs as texture coord,
we would be careful when setting this bit and set it just when
needed, or you will find the value of varying input is not right
and changed.
Thus we do set SPRITE_POINT_ENABLE bit only when all enabled tex
coord units need do CoordReplace. Or fallback is needed to make
sure the rendering is right.
With handling the bit setup at i915_update_sprite_point_enable(),
we don't need the relative code at i915Enable then.
This patch would _really_ fix the webglc point-size.html test case and
of course, not regress piglit point-sprite and glean-pointSprite
testcase.
NOTE: This is a candidate for stable release branches.
v2: fallback just when all enabled tex coord units need do
CoordReplace (Eric)
v3: move the sprite point validate code at I915InvalidateState (Eric)
v4: sprite point enable bit update based on _NEW_PROGRAM, too
add relative _NEW-state comments to show what state is being used(Eric)
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Fix 'set but not used' warnings; gl_version, gl_versions_profiles and
glx_extensions variables are used just only HAVE_XCB_GLX_CREATE_CONTEXT
is defined. Thus those warnings are shown when that macro isn't defined.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Fixes clang error:
tgsi/tgsi_dump.c:72:12: error: no member named '__printf_chk' in 'struct dump_ctx'
ctx->printf( ctx, "%u", e );
~~~ ^
/usr/include/bits/stdio2.h:109:3: note: expanded from macro 'printf'
__printf_chk (__USE_FORTIFY_LEVEL - 1, __VA_ARGS__)
^
Idea stolen from:
http://www.mail-archive.com/pld-cvs-commit@lists.pld-linux.org/msg210998.html
Reviewed-by: Brian Paul <brianp@vmware.com>
Add the maximum base vertex offset to max_index for computing the
buffer size. Fixes a failed assertion in the u_upload_mgr.c code with
the VMware svga driver.
Fixes https://bugs.freedesktop.org/show_bug.cgi?id=48141
v2: incorporate Marek's suggestions.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Return 0 for features we don't support. Added debug_printf()
warnings when we fail to handle a new PIPE_CAP_x case. That will
alert us to interfaces changes in the future. We don't want to
just ignore new PIPE_CAPs and possibly miss something important.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Before, we weren't clamping the vertex colors produced by ARB vertex
programs. This could result in some rendering being too bright (in
ETQW, for example).
Also add cases for PIPE_CAP_VERTEX_COLOR_CLAMPED and
PIPE_CAP_FRAGMENT_COLOR_CLAMPED with comments to be complete.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
We already program all the sampler state correctly, we just didn't give
the GPU a pointer to it for the VS stage. Thus, any texturing other
than texelFetch() wouldn't work.
Fixes piglit test vs-textureLod-miplevels and 99 of oglconform's
glsl-bif-tex subtests.
NOTE: This is a candidate for the 8.0 branch.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Tested with lp_test_arit with 100% passes and piglit tests with 100%
pass for log but some tests still fail for pow.
Signed-off-by: José Fonseca <jfonseca@vmware.com>
This reduces a little of CPU overhead.
The idea is to translate pipe vertex buffers directly into the CS
and not using any intermediate representations.
Framerate in Torcs:
before: 32.2
after: 34.6
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
st/mesa doesn't allow src_offset to be greater than stride and the maximum
stride r600 supports is 2047.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
llvm-3.1svn r153860 makes MCInstrInfo available to the MCInstPrinter.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Limits maximum loop iterations in a TGSI shader to prevent infinite
loops from occurring, any iteration in any loop counts towards this
limit
Signed-off-by: José Fonseca <jfonseca@vmware.com>
Fixes Coverity resource leak defects.
NOTE: This is a candidate for the 8.0 branch.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Variables have types, expression trees have types, but statements don't.
Rather than have a nonsensical field that stays NULL in the base class,
just move it to where it makes sense.
Fix up a few places that lazily used ir_instruction even though they
actually knew the particular subclass.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Previously, set_callee() performed some assertions about the type of the
ir_call; protecting the bare pointer ensured these checks would be run.
However, ir_call no longer has a type, so the getter and setter methods
don't actually do anything useful. Remove them in favor of accessing
callee directly, as is done with most other fields in our IR.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Aside from ir_call, our IR is cleanly split into two classes:
- Statements (typeless; used for side effects, control flow)
- Values (deeply nestable, pure, typed expression trees)
Unfortunately, ir_call confused all this:
- For void functions, we placed ir_call directly in the instruction
stream, treating it as an untyped statement. Yet, it was a subclass
of ir_rvalue, and no other ir_rvalue could be used in this way.
- For functions with a return value, ir_call could be placed in
arbitrary expression trees. While this fit naturally with the source
language, it meant that expressions might not be pure, making it
difficult to transform and optimize them. To combat this, we always
emitted ir_call directly in the RHS of an ir_assignment, only using
a temporary variable in expression trees. Many passes relied on this
assumption; the acos and atan built-ins violated it.
This patch makes ir_call a statement (ir_instruction) rather than a
value (ir_rvalue). Non-void calls now take a ir_dereference of a
variable, and store the return value there---effectively a call and
assignment rolled into one. They cannot be embedded in expressions.
All expression trees are now pure, without exception.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Most of the time, we just want to read an ir_dereference, so there's no
need to have these in separate functions. However, the next patch will
want to read an ir_dereference_variable directly.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
When translating a call from AST to HIR, we need to decide whether it
can be evaluated to a constant before emitting any code (namely, the
temporary declaration, assignment, and call.)
Soon, ir_call will become a statement taking a dereference of where to
store the return value, rather than an rvalue to be used on the RHS of
an assignment. It will be more convenient to try evaluation before
creating a call. ir_function_signature seems like a reasonable place.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Currently, ir_call can be used as either a statement (for void
functions) or a value (for non-void functions). This is rather awkward,
as it's the only class that can be used in both forms.
A number of places use ir_call::get_error_instruction() to construct a
generic value of error_type. If ir_call is to become a statement, it
can no longer serve this purpose.
Unfortunately, none of our classes are particularly well suited for
this, and creating a new one would be rather aggrandizing. So, this
patch introduces ir_rvalue::error_value(), a static method that creates
an instance of the base class, ir_rvalue. This has the nice property
that you can't accidentally try and access uninitialized fields (as it
doesn't have any). The downside is that the base class is no longer
abstract.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
generate_call() and ast_function_expression::hir() both tried to verify
that 'out' and 'inout' parameters used l-values. Irritatingly, it
turned out that this was not redundant; both checks caught -some- cases.
This patch combines the two into a single "complete" function that does
all the parameter mode checking. It also adds a comment clarifying why
AST-level checking is necessary in the first place.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
We used to have one big function, match_signature_by_name, which found
a matching signature, performed out-parameter conversions, and generated
the ir_call. As the code for matching against built-in functions became
more complicated, I split it internally, creating generate_call().
However, I left the same awkward interface. This patch splits it into
three functions:
1. match_signature_by_name()
This now takes a name, a list of parameters, the symbol table, and
returns an ir_function_signature. Simple and one purpose: matching.
2. no_matching_function_error()
Generate the "no matching function" error and list of prototypes.
This was complex enough that I felt it deserved its own function.
3. generate_call()
Do the out-parameter conversion and generate the ir_call. This
could probably use more splitting.
The caller now has a more natural workflow: find a matching signature,
then either generate an error or a call.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Function calls may have side effects that alter variables used inside
the loop. In the fragment shader, they may even terminate the shader.
This means our analysis about loop-constant or induction variables may
be completely wrong.
In general it's impossible to determine whether they actually do or not
(due to the halting problem), so we'd need to perform conservative
static analysis. For now, it's not worth the complexity: most functions
will be inlined, at which point we can unroll them successfully.
Fixes Piglit tests:
- shaders/glsl-fs-unroll-out-param
- shaders/glsl-fs-unroll-side-effect
NOTE: This is a candidate for release branches.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Certain applications don't call SwapBuffers before exiting. Yet, we'd
really like to see a bitmap containing the final rendered image even if
they choose never to present it.
In particular, Piglit tests (at least with -auto -fbo) fall into this
category. Many of them failed to dump any images at all.
Dumping one final image at context destruction time seems to work.
We may wish to pursue a more elegant solution later.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Fixes a Coverity resource leak defect.
NOTE: This is a candidate for the 8.0 branch.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
These can be used to implement EXT_texture_swizzle without baking
state-dependent swizzle instructions into the shader and forcing
recompiles.
For now, just set them to pass-through mode, so everything continues to
work as it did on Ivybridge. We can optimize this later.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
We only need one sample, since we don't support multisampling yet.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Apparently this needs to be the same as in 3DSTATE_WM.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Getting HiZ working means updating all the state packets for resolves
and clears. It's not worth doing until we get the basics working.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
For now, these all return 0, as I don't yet want to enable Haswell
support. Eventually they will be filled in with proper PCI IDs.
Also add an is_haswell field similar to is_g4x to make it easy to
distinguish Gen7 and Gen7.5.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
According to the BSpec ISA volume's "Accumulator Register" section:
"[DevIVB] SIMD16 execution on dwords is not allowed when accumulator is
explicit source or destination operand."
Fixes piglit tests:
- fs-multiply-const-ivec4
- fs-multiply-const-uvec4
- fs-multiply-ivec4-const
- fs-multiply-uvec4-const
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
This replaces the cryptic void* parameter with a union.
(based on union r600_query_result)
Users of this can still pass uint64* in it, but that cannot work for every
query type, obviously. Most importantly, the code now documents what should
be expected from get_query_result.
This also adds pipe_query_data_pipeline_statistics as per the D3D11 docs.
v2: fix indentation, add comments and use the doxygen style
Reviewed-by: Brian Paul <brianp@vmware.com>
This option allows targets to link against the LLVM shared library
instead of the static libs. With LLVM 2.9, his saves ~11 MB for each of
the r300 target libraries.
Pass a dri2_loader extension to the dri driver when gbm creates the dri
screen. The implementation jumps through pointers in the gbm device
so that an EGL on GBM implementation can provide the real implementations.
The idea here is to be able to create an egl window surface from a
gbm_surface. This avoids the need for the surfaceless extension and
lets the EGL platform handle buffer allocation, while keeping the user
in charge of somehow presenting the buffers (using kms page flipping,
for example).
gbm_surface_lock_front_buffer() locks a surface's front buffer and
returns a gbm bo representing it. This bo should later be returned
to the gbm surface using gbm_surface_release_buffer().
The function that counts the number of TGSI immediates also needs to
emit the immediates. This fixes assorted failures when using polygon
stipple with fragment shaders that have their own immediates.
NOTE: This is a candidate for the 8.0 branch.
They aren't winsys of their own,
just help dealing with them.
v2: add some more comments in vl_winsys.h
Signed-off-by: Christian König <deathsimple@vodafone.de>
"Use -no-undefined to assure libtool that the library has no unresolved
symbols at link time, so that libtool will build a shared library on
platforms that require that all symbols are resolved when the library is linked."
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
This is a regression introduced by commit cdcfd5, which forget to
increase the map_refcount for successfully-mapped region. Thus caused a
wrong non-blanced map_refcount.
This would fix the regression found in the two following webglc testcase
on Pineview platform:
texture-npot.html
gl-max-texture-dimensions.html
Cc: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
createDrawable may return NULL value, we should check it, or it will
make a segment failed.
[minor-indent-issue-fixed-by: Yuanhan Liu]
Signed-off-by: Wang YanQing <udknight@gmail.com>
Reviewed-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
This extension just permits GL_UNPACK_ROW_LENGTH, GL_UNPACK_SKIP_ROWS
and GL_UNPACK_SKIP_PIXELS to be passed to glPixelStore on GLES2 so it
is trivial to implement.
Also fixes the usage of GL_IMPLEMENTATION_COLOR_READ_FORMAT_OES,
which may be set to a BGRA format e.g. for a MESA_FORMAT_ARGB8888 fb.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The extension is already exposed for GLES1, but the APIspec
doesnt allow the usage of GL_BGRA_EXT in glTex(Sub)Image2D.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Noticed this was missing when writing the "glapi: sort ARB extensions
by number" commit, which at least shows it was effective.
Reviewed-by: Brian Paul <brianp@vmware.com>
Noticed it was missing based on the lack of a descriptive enum
name from this bug's error message:
https://bugs.freedesktop.org/show_bug.cgi?id=44039
This moves two enums out of GL3x.xml. Though since this and
GL_ARB_texture_compression_rgtc are both strict subsets of GL3,
both extensions should have had all their enums in that file
to begin with, not just two of them.
Reviewed-by: Brian Paul <brianp@vmware.com>
And add comments to fill in for extensions that aren't there.
Noticed the comment about "ARB extensions sorted by extension number"
didn't extend to the <xi:include> directives when it became clear
GL_ARB_texture_rg was missing, going by the error message seen here:
https://bugs.freedesktop.org/show_bug.cgi?id=44039
This makes it easier to notice in the future if an extension is missing
when it shouldn't be.
Reviewed-by: Brian Paul <brianp@vmware.com>
A later error prints this properly, fix this case to do the same.
v2: remove attribute as per Ian's suggestion
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This adds the xml file covering ARB_blend_func_extended.
v2: fix SRC1_ALPHA
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This also seems like a bad idea. There were too many instances for me
to thoroughly scan the code as I did with the last two patches, but a
quick scan indicated that most callers newly allocate a variable,
dereference it, or NULL-check. In some cases, it wasn't clear that the
value would be non-NULL, but they didn't check for error_type either.
At any rate, not checking for this is a bug, and assertions will trigger
it earlier and more reliably than returning error_type.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
The constructor currently returns a ir_dereference_variable of error
type when provided NULL, but that's about to change in the next commit.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Providing a NULL pointer to the ir_dereference_record() constructor
seems like a bad idea. Currently, if provided NULL, it returns a
partially constructed value of error type. However, none of the callers
are prepared to handle that scenario.
Code inspection shows that all callers do one of the following:
- Already NULL-check the argument prior to creating the dereference
- Already deference the argument (and thus would crash if it were NULL)
- Newly allocate the argument.
Thus, it should be safe to simply assert the value passed is not NULL.
This should also catch issues right away, rather than dying later.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Providing a NULL pointer to the ir_dereference_array() constructor seems
like a bad idea. Currently, if provided NULL, it returns a partially
constructed value of error type. However, none of the callers are
prepared to handle that scenario.
Code inspection shows that all callers do one of the following:
- Already NULL-check the argument prior to creating the dereference
- Already deference the argument (and thus would crash if it were NULL)
- Newly allocate the argument.
Thus, it should be safe to simply assert the value passed is not NULL.
This should also catch issues right away, rather than dying later.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
So if anything goes wrong we won't display a random image.
v2: flush before using the surface with the decoder.
Signed-off-by: Christian König <deathsimple@vodafone.de>
If you ran g-s in 16-bpp we'd do a bunch of memory corruption.
now it just misrenders for some other reasons.
applies to stable.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
ir_validate.cpp: In member function ‘virtual ir_visitor_status ir_validate::visit_leave(ir_swizzle*)’:
ir_validate.cpp:458:66: warning: narrowing conversion of ‘ir->ir_swizzle::mask.ir_swizzle_mask::x’ from ‘unsigned int’ to ‘int’ inside { } is ill-formed in C++11 [-Wnarrowing]
ir_validate.cpp:458:66: warning: narrowing conversion of ‘ir->ir_swizzle::mask.ir_swizzle_mask::y’ from ‘unsigned int’ to ‘int’ inside { } is ill-formed in C++11 [-Wnarrowing]
ir_validate.cpp:458:66: warning: narrowing conversion of ‘ir->ir_swizzle::mask.ir_swizzle_mask::z’ from ‘unsigned int’ to ‘int’ inside { } is ill-formed in C++11 [-Wnarrowing]
ir_validate.cpp:458:66: warning: narrowing conversion of ‘ir->ir_swizzle::mask.ir_swizzle_mask::w’ from ‘unsigned int’ to ‘int’ inside { } is ill-formed in C++11 [-Wnarrowing]
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
valgrind complained about an uninitialised value being used in
glsl_parser_extras.cpp, and this was the one it was giving out about.
Just initialise the value in the fakectx.
Signed-off-by: Dave Airlie <airlied@redhat.com>
for some reason when I configure --with-dri-drivers="" the src/mesa/drivers/dri
Makefile tries to call the am--refresh target in the toplevel Makefile,
we don't have one, and I'm not sure what it should look like.
This makes things continue on.
Signed-off-by: Dave Airlie <airlied@redhat.com>
piglit glx-tfp segfaults on llvmpipe when run vs a 16-bit radeon screen,
it now fails instead of segfaulting, much prettier.
Signed-off-by: Dave Airlie <airlied@redhat.com>
When a GL LD_PRELOAD library like apitrace was used,
glXGetProcAddress() would return the preload's symbols instead of
libGL's symbol, leading to infinite recursion when the returned
function was called. This didn't hit apitrace on most apps because
who calls glXGetProcAddress() on the global functions.
The -Bsymbolic, which was present in mklib before automake conversion,
causes the glxcmds.c:GLX_functions table to be resolved at link time,
so that LD_PRELOADs don't affect it any more.
Fixes crashes when running wine under apitrace.
Tested-by: Matt Turner <mattst88@gmail.com>
Tested-by: Marek Olšák <maraeo@gmail.com>
The default was 32 for the EmitNoLoops=0 case. This allows the oZone3D
soft shadows test to work properly with the vmware driver. Jose reported
that SM3 supports up to 255 loop iterations.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Instead of the hard-coded value of 32. Note that MaxUnrollIterations
defaults to 32 so there's no net change. But the gallium state tracker
can override this.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Improves Unigine Tropics performance at 1024x768 by 2.06236% +/-
0.50272% (n=11).
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Unigine Tropics uses INVALIDATE_BUFFER and not UNSYNCHRONIZED to reset
the buffer object when its streaming wraps. Don't penalize it by
flushing the batch at the wrap point, just allocate a new BO and get
to using it.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Use -no-undefined to assure libtool that the library has no unresolved
symbols at link time, so that libtool will build a shared library on
platforms that require that all symbols are resolved when the library
is linked.
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
The force-enable option is dropped, now that the hardware we were
concerned about has HiZ on by default. Now, instead of doing
INTEL_HIZ=0 to test disabling hiz, you can set hiz=false.
v2: Disable separate stencil on gen6 when HIZ is turned off.
(previously, this had to be done manually in addition).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1)
This was a debug option during gen6 transform feedback bringup (and a
similar one existed during gen4 bringup). However, it looks like
we're done with that, and we don't anticipate it being used again,
either for geometry shaders or transform feedback.
Suggested by: Kenneth Graunke <kenneth@whitecape.org>
This was added in the i915/i965 merge from the i915 driver, but I
don't recall it ever being used since then.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
If you want to test the graphics driver, you want to test it under the
conditions that users will see, not some set of additional fallbacks.
If you want to test swrast, run the swrast driver (or no_rast=true)
instead.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
To avoid redundancies, this patch also removes .deps, .libs, and *.la
from .gitignore files in subdirectories.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
As Eric pointed out, we know the cube faces are square at this point
so we only need to test the texture widths for consistency.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The max size was 16Kx16K so a 4 byte/pixel, six-sided cube would require
6 GBytes of memory. If mipmapped, 8 GB. Reduce the max size to 4K to
make the total size more reasonable.
Fixes a crash with the new piglit max-texture-size test.
Reviewed-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Per the spec, only nearest filtering is supported for integer textures.
Otherwise, the texture is incomplete.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Instead of gl_texture_object::_Complete there are now two fields:
_BaseComplete and _MipmapComplete. The former indicates whether the base
texture level is valid. The later indicates whether the whole mipmap is
valid.
With sampler objects, a single texture can appear to be both complete and
incomplete at the same time. See the GL_ARB_sampler_objects spec for more
details. To implement this we now check if the texture is complete with
respect to a sampler state.
Another benefit of this is we no longer need to invalidate a texture's
completeness state when we change the minification/magnification filters
with glTexParameter().
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Merge the mipmap level checking code that was separate cases for 1D,
2D, 3D and CUBE before.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Move the simple MaxLevel < BaseLevel test earlier to be closer to where
we error-check BaseLevel. Also, use the local baseLevel var in more places.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
To make the no-change case faster, as we do for the other object-reference
functions.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
We want to start emitting an INVALID_OPERATION from here for transform
feedback. Note that this forced dlist.c to almost not use this
function, since it wants different behavior during dlist compile.
Just pull the non-TF, non-GS test out for compile, because:
1) TF doesn't matter in that case because there's no drawing.
2) I don't think we're going to see GSes and display lists in the same
context, if we don't do GL_ARB_compatibility.
Reviewed-by: Brian Paul <brianp@vmware.com>
This fixes a build problem where EGL links to libgbm.la, which encodes
a relative path to it's libglapi.so dependency. The relative path
breaks when the linker tries to resolve it from src/egl/main instead
of src/gbm. Typically we silently fall back to the system
libglapi.so, which is wrong and breaks when there isn't one.
Morale of the story: don't mix mklib and libtool.
Although some hardware support NPOT cubemap, but it seems we don't know
the right layout for NPOT cubemap. Thus seems we need do fallback for
other platforms as well.
See comments inline the code for more detailed info.
v2: give a more detailed info about why we need fallback for other
platfroms as well.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=46666
NOTE: This is a candidate for stable release branches.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
If we failed to allocate a memory resource for the texture we'd crash
when we tried to map it. Now we propogate the NULL back up to the
texstore code and generate GL_OUT_OF_MEMORY.
Fixes a crash with the upcoming piglit max-texture-size test.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
From the GLSL 1.30 spec:
The discard keyword is only allowed within fragment shaders. It
can be used within a fragment shader to abandon the operation on
the current fragment. This keyword causes the fragment to be
discarded and no updates to any buffers will occur. Control flow
exits the shader, and subsequent implicit or explicit derivatives
are undefined when this control flow is non-uniform (meaning
different fragments within the primitive take different control
paths).
v2: Don't emit the final HALT if no other HALTs were emitted.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1)
By setting lod to 0 in the builtin function implementation, we avoid
needing to update all the visitors to ignore LOD in this case, when
the hardware drivers actually want to ask for LOD 0 for rectangular
textures.
Fixes piglit spec/GLSL-1.40/textureSize-*Rect.
v2: Change style of looking for substrings.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This is the one builtin function claimed to be dropped due to the
ARB_compatibility split.
Fixes piglit spec/GLSL-1.40/compiler/ftransform.vert
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This makes the process slightly more debuggable, though it would be
nice if the build just failed immediately instead.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Mostly this is a matter of removing variables that have been moved to
the compatibility profile. There's one addition: gl_InstanceID is
present in the core now.
This fixes the new piglit tests for GLSL 1.40 builtin variables.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
llvm-3.1svn r152620 refactored the OProfile profiling code.
createOProfileJITEventListener was moved from the llvm namespace to the
llvm::JITEventListener namespace.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
This avoids extra if statements in the common case of just comparing
two expressions that don't involve assignments or function calls,
along with simplifying the handling of constant expressions. Reduces
i965 instructions generated in unigine tropics and sanctuary,
yofrankie, warsow, gstreamer shaders, and the weston compositor.
shader-db results:
Total instructions: 213052 -> 212752
38/1246 programs affected (3.0%)
14309 -> 14009 instructions in affected programs (2.1% reduction)
The error was removed in:
commit 719909698c
Author: Ian Romanick <ian.d.romanick@intel.com>
Date: Tue Oct 18 16:01:49 2011 -0700
mesa: Rewrite the way uniforms are tracked and handled
The GL_ARB_robustness spec doesn't say the implementation
should truncate the output, so just return after setting
the required error like it did before the above commit.
Also fixup an old comment and add an assert.
NOTE: This is a candidate for the 8.0 branch.
Handle the special case of glFramebufferTextureLayer() for which we pass
teximage = 0 internally in framebuffer_texture(). This patch makes failing
piglit test fbo-array, fbo-depth-array to pass.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=47126
V4: Removed the duplicated code.
Note: This is a candidate for the stable branches.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
This will replace the soon-to-be-removed _DD_NEW_SEPARATE_SPECULAR flag.
Note: there's a similar composite _MESA_NEW_NEED_EYE_COORDS flag set already.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Just use the corresponding _NEW_x flags intead. The _DD_NEW_x flags
will be removed in a following patch.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
The computed stencil.clear and depth.clear values aren't used anywhere.
Those fields have been removed too.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Set the close on exec flag when opening dri character devices, so they
will be closed and free any resouces allocated in exec.
Signed-off-by: David Fries <David@Fries.net>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This issue might recur on other OSes. If so then it might be better
to remove the C-preprocessor magic, and use fully qualified defines
instead.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This state is needed for deciding whether or not to log
application messages with IDs that haven't been specifically
passed to glDebugMessageControlARB yet.
State for each individual ID number ever passed to
glDebugMessageControlARB (per-context) still needs to be added.
Unfortunately, Unigine Heaven 3.0 still needs this.
NOTE: This is a candidate for the 8.0 branch.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
min_index/max_index are merely conservative guesses, so we can't
make buffer overflow detection based on their values.
Tested-by: Jakob Bornecrantz <jakob@vmware.com>
There are several cases in which we need to explicity "rebase" colors
(ex: set G=B=0) when getting GL_LUMINANCE textures:
1. If the luminance texture is actually stored as rgba
2. If getting a luminance texture, but returning rgba
3. If getting an rgba texture, but returning luminance
Fixes https://bugs.freedesktop.org/show_bug.cgi?id=46679
Also fixes the new piglit getteximage-luminance test.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Based on a patch submitted by Vic Lee. The other part of his patch
which checked the fs pointer wasn't needed.
This fixes a crash when clear() is called before any VS or FS is set.
But this can only happen when the driver is used without the Mesa
state tracker.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
This gets xine working with VDPAU.
v2: some minor bugfixes.
v3: create the resource with the subsampled
format to avoid tilling problems
Signed-off-by: Christian König <deathsimple@vodafone.de>
These will be used by glReadPixels() and glGetTexImage() to fix issues
with reading GL_LUMINANCE and other formats.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Before, we were only counting top-level instructions. But if we have
an assignment of a giant expression tree (such as the ones eventually
generated by glsl-fs-unroll), we were counting the same as an
assignment of a variable deref.
glsl-fs-unroll-explosion now fails in a reasonable amount of time on
i965 because the unrolling didn't go ridiculously far.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
I will use SX_MISC instead.
This reverts commit 734792e83f.
Conflicts:
src/gallium/drivers/r600/evergreen_hw_context.c
src/gallium/drivers/r600/evergreen_state.c
src/gallium/drivers/r600/r600_hw_context.c
src/gallium/drivers/r600/r600_pipe.h
draw module calls back into the driver and sets certain parts
of the state to whatever it needs, unfortunately unless you
get the ordering of calls to draw just right you'll end up
reseting your own driver state. That's what was happening to us
draw module would under certain conditions reset our own driver
state.
Reviewed-by: Brian Paul <brianp@vmware.com>
This fixes the libGLU.so.* build when a system libGL.so is not present
since it is relying on the lib/ to build against until it gets
converted to automake.
Tested-by: Stéphane Marchesin <marcheu@chromium.org>
Reviewed-by: Stéphane Marchesin <marcheu@chromium.org>
That is by making the dri extension variables static in gbm_dri.c.
The image_lookup_extension is provided by egl_dri2 when using x11 or wayland
platforms, when using the drm platform, gbm_dri has a wrapper for it.
Both use the same variables name image_lookup_extension.
Since -fvisibility=hidden was (probably by mistake) removed when converting to
automake, the "image_lookup_extension" symbol from egl_dri2.c became exported
in libEGL.so, so "image_lookup_extension" from gbm_dri.c was ignored.
This resulted in calling incorrect callbacks.
We cant make the image_lookup_extension static in egl_dri2.c right now,
since its used across multiple files.
Bugzilla: https://bugs.freedesktop.org/attachment.cgi?id=58099
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
If the texture is a 1D array, don't remove the border pixel from the
height. Similarly for 2D array textures and the depth direction.
Simplify the function by assuming the border is always one pixel.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
This patch add the support of gl_PointCoord gl builtin variable for
platform gen4 and gen5(ILK).
Unlike gen6+, we don't have a hardware support of gl_PointCoord, means
hardware will not calculate the interpolation coefficient for you.
Instead, you should handle it yourself in sf shader stage.
But badly, gl_PointCoord is a FS instead of VS builtin variable, thus
it's not included in c.vue_map generated in VS stage. Thus the current
code doesn't aware of this attribute. And to handle it correctly, we
need add it to c.vue_map manually to let SF shader generate the needed
interpolation coefficient for FS shader. SF stage has it's own copy of
vue_map, thus I think it's safe to do it manually.
Since handling gl_PointCoord for gen4 and gen5 platforms is somehow a
little special, I added a lot of comments and hope I didn't overdo it ;)
v2: add a /* _NEW_BUFFERS */ comment to note the state flag dependency
and also add the _NEW_BUFFERS dirty mask (Eric).
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=45975
Piglit: glsl-fs-pointcoord and fbo-gl_pointcoord
NOTE: This is a candidate for stable release branches.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
llvm-3.1svn r152043 changes createMCInstPrinter to take an additional
MCRegisterInfo argument.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
This add clipdistance support like the non-llvm draw paths,
if we have a clip distance we compare with it instead of doing
the dot4.
We also have to put the have_clipvertex bit into the emitted
vertex header.
Fixes vs-clip-distance-all-planes-enabled, vs-clip-distance-const-reject,
vs-clip-distance-enables, vs-clip-distance-implicitly-sized,
vs-clip-distance-in-param, vs-clip-distance-uint-index.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
This fixes the rest of the piglit clipvertex tests.
v2: fixup comments.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
We incorrectly setup clipmask for gl_ClipVertex, this fixes the clipmask
setup.
v2: fix comment
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
fix comment
This is just a simple text file containing a list of goals for gallivm/llvmpipe
and some info on what is required to get there along with some info on who
is looking at things.
v2: add EXT_texture_array.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
_mesa_max_texture_levels() is also used to test valid texture target
in _mesa_GetTexLevelParameteriv(). GL_TEXTURE_CUBE_MAP is not allowed
as texture target in glGetTexLevelParameter(). So, this should throw
GL_INVALID_ENUM error.
Few other functions which use _mesa_max_texture_levels() like
getcompressedteximage_error_check() and getteximage_error_check()
also don't accept GL_TEXTURE_CUBE_MAP.
Above fix makes piglit fbo-cubemap test to fail. This is because of
incorrect texture target passed to _mesa_max_texture_levels() in
framebuffer_texture(). Fixing that as well
Note: This is a candidate for the stable branches
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
"Use -no-undefined to assure libtool that the library has no
unresolved symbols at link time, so that libtool will build a shared
library on platforms require that all symbols are resolved when the
library is linked."
If I had a dollar for every time I wrote this patch, I'd have about
$10 :-)
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
There's even a comment in the code containing the right swizzling
computations!
Previously this has not been noticed because we need to manually
enabled swizzling on snb/ivb (kernel 3.4 will do that) and we
don't use the separate stencil on ilk (where the bios enables
swizzling). This fixes
piglit ./bin/fbo-stencil readpixels GL_DEPTH32F_STENCIL8 -auto
on recent drm-intel-next kernels.
Also remove the comment about ivb, it's stale now.
Swizzling detection is done by allocating a temporary x-tiled
buffer object. Unfortunately kernels before v3.2 lie on snb/ivb
because they claim that swizzling is enable, but it isn't. The
kernel commit that fixes this for backport to pre-v3.2 is
commit acc83eb5a1e0ae7dbbf89ca2a1a943ade224bb84
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date: Mon Sep 12 20:49:16 2011 +0200
drm/i915: fix swizzling on gen6+
But if the kernel doesn't lie, this now works on swizzling and
not swizzling machines.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This replaces the previously used wl_display_destroy.
wl_display_destroy was povided by wayland-client.so and
wayland-server.so, to resolve that conflict its renamed client-side.
Otherwise streamout with rasterizer discard will make the kernel upset
if the state tracker doesn't set a depth-stencil-alpha state.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Unused by the current stack and APIs, therefore untestable.
It was used to facilitate the transition to integers.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
For polygons, we have been using face culling with success, but that doesn't
work for points and lines.
Setting the point size and line width to 0 fixes it.
Also improve it even more by setting SCREEN_SCISSOR to a zero area.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Implement it right using STRMOUT_CONFIG.RAST_STREAM. This fixes rasterizer
discard with points and lines.
This also adds another derived state. It's a combination of rasterizer discard
and streamout enable.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
We must use VPORT_SCISSOR, because that's the only one we can use for multiple
scissor rectangles in ARB_viewport_array.
R700 can use the VPORT_SCISSOR_ENABLE bit, but R600 doesn't have that and must
emit a 8192x8192 rectangle if scissor is disabled.
This commit also cleanups magic numbers in create_rs_state.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
VPORT_SCISSOR is the OpenGL scissor. How do I know? Because there are
16 of them just like GL4.1 has multiple scissor rectangles.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Also use XXX in the other ones, because it's the most used word for that
purpose in Mesa.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Timer queries should be able to measure the time spent in u_blitter as well.
Queries are split into two groups: the timer ones and the others (streamout,
occlusion), because we should only suspend non-timer queries for u_blitter,
and later if the non-timer queries are suspended, the context flush should
only suspend and resume the timer queries.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
And rename or inline functions where appropriate.
There is no reason to keep this stuff in r600_hw_context.c.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
The current code would ignore the point size specified by gl_PointSize
builtin variable in vertex shader on Pineview. This patch servers as
fixing that.
This patch fixes the following issues on Pineview:
webglc: https://cvs.khronos.org/svn/repos/registry/trunk/public/webgl/sdk/tests/conformance/rendering/point-size.html
piglit: glsl-vs-point-size
NOTE: This is a candidate for stable release branches.
v2: pick Eric's nice tip for fixing this issue in hardware rendering.
v3: the last arg of EMIT_ATTR specify the size in _byte_. (Eric)
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
This fixes the egl_gallium.so driver build when no system libEGL.so is
present, since it's relying on the lib/ to build against until it gets
converted to automake.
We were looking at the size of batch.map for how big the batchbuffer
was, but on 865 we just use a single-page batchbuffer due to hardware
limits.
v2: Removed check for sizeof map < bo->size, since that's always false.
[change by anholt]
NOTE: This is a candidate for release branches.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=41495
In order to prevent an overflow of the batch buffer when emitting
triangles, we need to limit the initial primitive to fit within the
current batch. To do we need to measure the remaining space and thence
compute the maximum number of vertices that fit into that space.
Reported-by: Kurt Roeckx <kurt@roeckx.be>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=41495
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Eric Anholt <eric@anholt.net>
NOTE: This is a candidate for release branches.
The hardware, like i915, uses an inclusive bounds on min and max for
the drawing rectangle, but we were providing a number for exclusive.
The number of bits used by the hardware only covers this value going
up to the maximum size, so when we programmed 2048 as the maximum
inclusive X, it saw a maximum X of 0 and clipped all rendering. This
caused rendering failures in gnome-shell.
Fixes piglit fbo-maxsize.
v2: dropped changes to the blitter, which does use an exclusive x2, y2.
[change by anholt]
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=45558
Reviewed-by: Eric Anholt <eric@anholt.net>
NOTE: This is a candidate for release branches.
Michel pointed out that my assumption of a global
index namespace is incorrect and breaks r300g.
Signed-off-by: Christian König <deathsimple@vodafone.de>
This reverts commit d5a6c17254.
llvm-3.1svn r151687 makes MemoryObject accessor members const again.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
This should speed things up a bit, but also shows
some bugs with the kernel implementation.
v2: require xcb-dri2 version 1.8
Signed-off-by: Christian König <deathsimple@vodafone.de>
While the ARB_map_buffer_range extension spec says nothing about these
queries -- they were added in GL 3.0 --, it seems like this could be an
error in the extension spec. This is one of the extensions, like
ARB_framebuffer_object, that "back ports" OpenGL 3.0 functionality to
previous versions. These extensions are supposed to provide identical
functionality to OpenGL 3.0. The other cases of mismatches have been
determined to be bugs in the extension specs.
And tools like apitrace rely on such queries to function properly.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Signed-off-by: José Fonseca <jfonseca@vmware.com>
Acked-by: Brian Paul <brianp@vmware.com>
Acked-by: Ian Romanick <ian.d.romanick@intel.com>
We currently don't support gl_PrimitiveID, and I believe asking the
hardware to generate it results in vertex cache invalidations.
This could result in slowdowns for applications that use gl_InstanceID,
which would be counter-productive. Just turn it off for now.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
visit(ir_variable *) sets dst_reg::writemask to the appropriate channel
for system values. Unfortunately, visit(ir_dereference_variable *) then
calls swizzle_for_size, which for a float, sets the swizzle to .x.
This works for gl_VertexID, since we store it in the .x component (see
brw_draw_upload.c:732 - VID), but fails for gl_InstanceID (IID) since we
store it in the .y channel.
To fix this, avoid calling swizzle_for_size on ir_var_system_values.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Originally ARB_draw_instanced only specified that ARB decorated name.
Since no vendor actually implemented that behavior and some apps use
the undecorated name, the extension now specifies that both names are
available.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
When you called them in a display list compile before, you would just
end up calling through NULL.
Fixes piglit GL_ARB_draw_instanced/dlist.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Kernels prior to 271d81b84171d84723357ae6d172ec16b0d8139c (March 2011)
don't support relocations outside of the target buffer object. Rather
than guarding this with a I915_PARAM_HAS_RELAXED_DELTA check, just
smash the bound to 0xfffff001 like we do on Ironlake.
This effectively gives us no upper bound check, just like we did prior
to commit 271d81b84171d84723357ae6d172ec16b0d8139c.
Daniel Vetter would also like to mention that this relies on the guard
page at the end of the GTT.
NOTE: This is a candidate for release branches.
Fixes a regression since 271d81b84171d84723357ae6d172ec16b0d8139c.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=46766
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The drivers/ walk-through-subdirs makefile is converted as well so I
didn't need to keep EGL_DRIVERS_DIRS along with the per-driver
HAVE_EGL_DRIVER_WHATEVER.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The default case code was set up in a separate way, while this makes
it more normal. I wanted to add code to the explicit x11 platform and
default x11 platform cases in the next commit.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
All users of the shine table outside of the tnl module
are gone. Move the implementation into the tnl module and
prefix the public functions with _tnl.
Reviewed-by: Alex Deucher <alexdeucher@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Froehlich <Mathias.Froehlich@web.de>
Since the shine tables are now only used in the tnl lighting stage, where
they are validated through the tnl driver function NotifyMaterialChange
called in tnl/t_vb_light.c, we can not omit calling
_mesa_validate_all_lighting_tables (which only validates the shine tables)
in main/light.c.
Reviewed-by: Alex Deucher <alexdeucher@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Froehlich <Mathias.Froehlich@web.de>
Use direct computation of pow for computing the shininess
in _tnl_RasterPos. Since the _tnl_RasterPos function is still
used by plenty drivers that do only need the shine table for
_tnl_RasterPos but do not make use of swtnl computations, this
enables pushing down the shine table computation and validation
into the tnl module, which will happen in a followup change.
Reviewed-by: Alex Deucher <alexdeucher@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Froehlich <Mathias.Froehlich@web.de>
Since the shine tables are implicitly invalidated by having
a different shininess value than the current one, we can
omit the explicit invalidation of the shine table.
Reviewed-by: Alex Deucher <alexdeucher@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Froehlich <Mathias.Froehlich@web.de>
This lets us use the resource_copy_region() path when blitting from
R8G8B8A8 to R8G8B8x8, for example.
v2: be smarter when src_format==dst_format
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Assertions of the form assert(a && b) should be written as separate assertions
so that you can actually tell which part is false when there's a failure.
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Move structs, enums, etc so they're in more logical order. In particular,
the shader and transform feedback-related structs/enums were pretty
scattered around.
After biasing we need to clamp to be sure we don't exceed the number of
levels in the mipmap. This fixes an assertion at svga_sampler_view.c:70
v2: simplify the biasing, clamping code per Jose's suggestion.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
We need to allocate new space every time to avoid blocking on the last
HiZ op completing. There are two easy ways to do this:
brw_state_batch() and intel_upload_data(). brw_state_batch() is
simpler and avoids another buffer allocation.
Improves Unigine Tropics performance 0.376416% +/- 0.148722% (n=7).
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The ralloc string appending functions were originally intended for
simple, non-hot-path uses like printing to an info log.
Cuts Unigine Tropics load time by around 20% (6 seconds).
v2: Avoid strlen() on every newline, too.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> [v1]
Acked-by: José Fonseca <jfonseca@vmware.com> [v1]
Both callers of rewrite_tail immediately compute the new total string
length by adding the (known) length of the existing string plus the
length of the newly appended text. Unfortunately, callers generally
won't know the length of the new text, as it's printf-formatted.
Since ralloc already computes this length, it makes sense to add it in
and save the caller the effort. This simplifies both existing callers,
but more importantly, will allow for cheap-appending in the next commit.
v2: The link_uniforms code needs both the old and new length.
Apply the obvious fix (which sadly makes it less of a cleanup).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> [v1]
Acked-by: José Fonseca <jfonseca@vmware.com> [v1]
This adds support for all the opcodes needed for native integer
support with GLSL 1.20 enabled, and some of the ones for GLSL1.30
support.
I've split them between non-cpu and cpu along the same lines
Tom's code did for the other ones I think, but I'm open to review
on which ones should go where.
With instance ids fixed I get no regressions on my box here
with LLVM 2.8, will test with later LLVMs as well.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Backends usually advertise a SVGA3D_DEVCAP_MAX_POINT_SIZE between 63 and
256, so an hardcoded max point size of 80 is often incorrect.
This limitation does not apply for anti-aliased points (as they are done
via draw module) but we still advertise the same limit for both, because
all others pipe drivers do.
Reviewed-by: Brian Paul <brianp@vmware.com>
Mesa has a fast path for the generic fallback when using glReadPixels
for RGBA data which uses memcpy. However it was really difficult to
hit this case because it would not be used if any transferOps are
enabled. Any type apart from floating point or non-normalized integer
types (so any of the common types) would force enabling clamping so
the fast path could not be used. This patch makes it ignore clamping
when determining whether to use the fast path if the data type of the
buffer is an unsigned normalized type because in that case clamping
will not have any effect anyway.
https://bugs.freedesktop.org/show_bug.cgi?id=46631
NOTE: This is a candidate for the 8.0 branch.
Signed-off-by: Brian Paul <brianp@vmware.com>
postpone unreferences until end of function, as the ones in use will
get naturally dereferenced.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Can't see any reason this wouldn't be better off as an inline.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Some backends may advertise more temps than SVGA3D_TEMPREG_MAX, but the
driver is hardwired to only support up to the value defined by
SVGA3D_TEMPREG_MAX, so clamp to it.
Reviewed-by: Brian Paul <brianp@vmware.com>
r600g is the only driver which has made use of it. The reason the CAP was
added was to fix some piglit tests when the GLSL pass lower_output_reads
didn't exist.
However, not removing output reads breaks the fallback for glClampColorARB,
which assumes outputs are not readable. The fix would be non-trivial
and my personal preference is to remove the CAP, considering that reading
outputs is uncommon and that we can now use lower_output_reads to fix
the issue that the CAP was supposed to workaround in the first place.
KILP instruction inside IF blocks were being lowered to an unconditional
KIL. Since r300 doesn't support branching, when the IF's were lowered
to conditional moves, the KIL would always be executed. This is not a
problem with the mesa state tracker, because the GLSL compiler handles
lowering IF's, but this bug was appearing in the VDPAU state tracker,
which does not use the GLSL compiler.
Note: This is a candidate for the stable branches.
The xvmc state tracker is completely seperate and
doesn't shares code or anything else with the
xorg state tracker.
Signed-off-by: Christian König <deathsimple@vodafone.de>
This patch allows the Mac OS X SCons build to complete. The assembly
sources contain psuedo-ops that not are supported on Mac OS X.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
We were inverting the meaning of the stencil op flags: in svga/d3d
the normal incr/decr wraps and the SAT ops clamp.
This fixes piglit failures (at least stencil-twoside and stencil-wrap).
We should backport this everywhere we can.
Reviewed-by: Brian Paul <brianp@vmware.com>
Two of the switch cases used PIPE_FORMAT_ tokens instead of SVGA3D_ tokens.
As it happens, the token values are equal for these formats so there's no
net change.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com>
We always mapped the query buffer in begin_query, causing stalls
if the buffer was busy.
This commit reworks it such that the query buffer is only mapped
in get_query_result as it's supposed to be.
The query buffer is no longer treated as a ring buffer. Instead, the results
are just appended and when the buffer is full, we create a new one. One query
can have more than one query buffer, though that's a very rare case.
Begin_query releases all query buffers.
Reviewed-by: Jerome Glisse <jglisse@redhat.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
From http://www.opengl.org/registry/specs/ARB/seamless_cube_map.txt:
Accepted by the <cap> parameter of Enable, Disable and IsEnabled,
and by the <pname> parameter of GetBooleanv, GetIntegerv, GetFloatv
and GetDoublev:
TEXTURE_CUBE_MAP_SEAMLESS 0x884F
This caused a change in enums.c, which is manually built from the .xml
files.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The linked list of memory allocations was not protected by a mutex.
This lead to sporadic failures with multi-threaded apps.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This fixes another case of faulting when freeing a pipe_sampler_view
that belongs to a previously destroyed context.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Basically, instead of immediately freeing deleted surfaces, hang onto
them in a cache to do quick re-allocation. This helps when surfaces
are frequently destroyed and then reallocated a bit later.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
There was a SVGA_HOST_SURFACE_CACHE_BYTES symbol, but it was never
used.
Now when we go to add a newly deleted surface to the cache we check
if the cache size would be exceeded. If so, try to free the least
recently "unused" surfaces until the cache is smaller. If we can't
do that, simply don't cache the newly deleted surface. The alternative
involves flushing and waiting and we don't want to do that.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Before, if shader translation failed for any reason we'd keep trying
to translate the shader over and over again during state validation.
The dummy fragment shader emits solid red so that might be visual
clue that translation is failing.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
The assertion recently added in dst_register() was invalid because that
function is also (suprisingly) used to declare constant registers.
Move the assertion to the callers where we're really creating temp
registers and add some code to prevent emitting invalid temp register
indexes for release builds.
Also, update the comment for get_temp(). It didn't return -1 if it
ran out of registers and none of the callers checked for that.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
And assert on the register index in dst_register(). The dest can
only be an output or temp reg and there's more of the later.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Commit 980f6f1 (mesa: move gl_texture_image::Width/Height/DepthScale
fields to swrast) moved the initialization of the Width, Height, and
DepthScale fields to _swrast_alloc_texture_image_buffer(). However,
i915 doesn't call this function because it performs its own buffer
allocation. As a result, the Width, Height, and DepthScale fields
weren't getting initialized properly, and some operations requiring
swrast would fail.
This patch ensures that Width, Height, and DepthScale are properly
initialized by separating the code that sets them into a new function,
_swrast_init_texture_image(), which is called by
intel_alloc_texture_image_buffer() as well as
_swrast_alloc_texture_image_buffer(). It also moves the
initialization of _IsPowerOfTwo into this function.
Fixes piglit test fbo/fbo-cubemap on i915.
Partially fixes https://bugs.freedesktop.org/show_bug.cgi?id=41216
This is a candidate for the 8.0 branch.
Reviewed-and-tested-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
GBM needs the buffer format in order to communicate with DRM and clients
for things like scanout.
So track the DRI format requested in the various back ends and use it to
return the DRI format back to GBM when requested. GBM will then map
this into the GBM surface type (which is in turn based on the DRM fb
format list).
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
These were rotting in an internal branch, but contain nothing confidential,
and would be much more useful if kept up-to-date with latest gallium
interface changes.
Several authors including Keith Whitwell, Zack Rusin, and Brian Paul.
In the gen6 GS case, we were under-counting and so other state would
get smashed. In the VS case, we were over-counting, so everything was
fine.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Kenneth Graunke <kenneth@whitecape.org>
This was copy and paste from the VS where I had similar code. We're
only looking at things derived from BRW_NEW_VERTEX_PROGRAM in this
block.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Kenneth Graunke <kenneth@whitecape.org>
This is a state which is derived from other states and is actually the first
state which doesn't correspond to any gallium state.
There are two state flags:
bool occlusion_query_enabled
bool flush_depthstencil_enabled
Additional flags can be added later if needed, e.g. bool hiz_enabled.
The emit function will have to figure out the register values by itself.
It basically just emits the registers when the state changes.
This commit also adds a few helper functions for writing registers directly
into a command stream.
This is the first pure command buffer. It contains CS initialization
packets and emits invariant state (i.e. the registers which never or rarely
change).
The affected registers are removed from *_hw_context.c, so that both ways
of emitting commands can co-exist.
v2: emit context_control in cayman's start_cs too
Suggested by José.
We don't provide shader caching in CSO. Most of the time the api provides
object semantics for shaders anyway, and the cases where it doesn't
(eg mesa's internall-generated texenv programs), it will be up to
the state tracker to implement their own specialized caching.
Improves VS state change microbenchmark performance by 7.08729% +/-
1.22289% (n=10) on gen7, because we don't upload the 64 dwords of
unused binding table any more.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This is a step toward making the samplers/binding tables reflect
sampler uniform mappings instead of embedding those in the programs.
No significant performance difference on the microbenchmark (n=10).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We always say no. Improves VS state change microbenchmark performance
7.68747% +/- 1.40826% (n=10).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
With this and the previous patch, 640x480 nexuiz is running 0.169118%
+/- 0.0863696% faster (n=121). On a VS state change microbenchmark,
performance is increased 8.28645% +/- 0.460478% (n=52).
v2: Fix CACHE_NEW_VS comment.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This reduces recomputation of state based on non-clipping-related
transform changes, and is a step toward removing VUE map
recomputation.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
For a 1D texture array, the border only applies to the width. For a 2D
texture array the border applies to the width and height but not the depth.
Sucha cases were not handled correctly in _mesa_init_teximage_fields().
Note: This is a candidate for stable branches
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Tracing function entry/exits is a bit pointless
when VDPAU_TRACE=1 does the same thing.
v2: use WARN instead of ERR for application problems
Signed-off-by: Christian König <deathsimple@vodafone.de>
Like TGSI_OPCODE_ARL, destination should be an integer.
This fixes invalid LLVM IR on an internal state tracker (currently Mesa
never emits this opcode).
In the future consider making ADDR register also a integer-as-float array,
like all other register kinds, or simply replace ADDR & ARR/ARL with
integer temp and instructions.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Fixes this GCC warning.
native_drm.c:153:1: warning: ‘drm_display_authenticate’ defined but not
used [-Wunused-function]
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Avoid setting dirty state flags when enabling or disabling a vertex
attribute arrays when there's no change.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
If you're resorting to the dummy shader, you've probably already turned
off SIMD16 mode. But if you didn't, it would die in a fire.
We could either fail to compile in SIMD16 mode...or just fix it.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
The dummy FB write failed to specify EOT and a message length, causing
the GPU to hang. Now we can enjoy "everyone's favorite color" again.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Fixes this GCC warning.
mask.c: In function ‘mask_layer_fill’:
mask.c:387:12: warning: variable ‘alpha_color’ set but not used
[-Wunused-but-set-variable]
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Fixes these GCC warnings.
glx_api.c: In function ‘choose_visual’:
glx_api.c:678:8: warning: variable ‘trans_value’ set but not used
[-Wunused-but-set-variable]
glx_api.c:677:8: warning: variable ‘trans_type’ set but not used
[-Wunused-but-set-variable]
glx_api.c:663:8: warning: variable ‘min_ci’ set but not used
[-Wunused-but-set-variable]
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Now that we have a index_range_invalid flag, we can just use that rather
than calling vbo_validated_drawrangeelements directly and returning.
NOTE: This is a candidate for release branches.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
This failed to take basevertex into account:
If basevertex < 0:
(end + basevertex) might actually be in-bounds while 'end' is not.
We would have clamped in this case when we probably shouldn't.
This could break application drawing.
If basevertex > 0:
'end' might be in-bounds while (end + basevertex) might not.
We would have failed to clamp in this place. There's a comment
indicating the TNL module depends on max_index being in-bounds;
if so, it would likely break horribly.
Rather than trying to clamp correctly in the face of basevertex, simply
delete the clamping code and indicate that we don't have a valid range.
This causes _tnl_vbo_draw_prims to use vbo_get_minmax_indices() to
compute the actual bounds, which is much safer.
NOTE: This is a candidate for release branches.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
The application supplied [start, end] range is merely a conservative
hint of the ranges of index values inside the index buffer. There is no
requirement that all vertices in the range [start, end] be referenced.
Passing an 'end' value larger than the maximum legal index is perfectly
acceptible; applications can legally pass 0xffffffff when they don't
have a tighter bound readily available.
Thus, the warning doesn't indicate a correctness issue; it could only
indicate a performance issue. However, it does not even do that.
glDrawRangeElements is designed to optimize non-VBO vertex data uploads
by providing an upper bound on the size of buffers a driver would need
to allocate. With VBOs, the data is already in an uploaded buffer, so
the range doesn't help.
The clincher is: we only know _MaxElement for VBOs. For user-space
arrays, we just set it to 2,000,000,000 (see mesa/main/varray.h:63.)
So we can only check this in the case where it is not useful.
Many applications, including the Unigine demos, currently trigger this
warning, which suggests the applications are buggy when they're actually
fine. Eliminating the warning should confuse users less while not
actually losing any benefit to application developers.
NOTE: This is a candidate for release branches.
Suggested-by: Jose Fonseca <jfonseca@vmware.com>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
There's a serious trap for drivers: RenderTexture() does not indicate
that the texture is currently bound to the draw buffer, despite
FinishRenderTexture() signaling that the texture is just now being
unbound from the draw buffer.
We were acting as if RenderTexture() *was* the start of rendering and
that we could make texturing incoherent with the current contents of
the renderbuffer. This caused intel oglconform sRGB
Mipmap.1D_textures to fail, because we got a call to TexImage() and
thus RenderTexture() on a texture bound to a framebuffer that wasn't
the draw buffer, so we skipped validating the new image into the
texture object used for rendering.
We can't (easily) make RenderTexture() indicate the start of drawing,
because both our driver and gallium are using it as the moment to set
up the renderbuffer wrapper used for things like MapRenderbuffer().
Instead, postpone the setup of the workaround render target miptree
until update_renderbuffer time, so that we no longer need to skip
validation of miptrees used as render targets. As a bonus, this
should make GL_NV_texture_barrier possible.
(This also fixes a regression in the gen4 small-mipmap rendering since
3b38b33c16, which switched
set_draw_offset from image->mt to irb->mt but didn't move the irb->mt
replacement up before set_draw_offset).
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=44961
NOTE: This is a candidate for the 8.0 branch.
Infer from the operand the type of value to store.
MOV is untyped but we use the float store path.
v2: make MOV use float store path.
I've had to squash merge the ARL fix to be stored
as an integer in here to avoid regressions in a number
of piglit tests.
From now on ARL stores to an integer just like HW does.
Signed-off-by: Dave Airlie <airlied@redhat.com>
The infers the type of data required using the opcode,
and casts the input to the appropriate type.
So far this only handles non-indirect constant and temporaries.
v2: as per Jose suggestion, fetch immediates via floats
Signed-off-by: Dave Airlie <airlied@redhat.com>
These are used inside the action handlers for the integer opcodes.
v2: use uint_bld/int_bld, drop higher level uint_bld.
Signed-off-by: Dave Airlie <airlied@redhat.com>
For now just pass the current context, but when we want to
store int or unsigned we need to pass those later.
Signed-off-by: Dave Airlie <airlied@redhat.com>
These two functions produce the src/dst types for an opcode.
MOV is special since it can be used to mov float->float and int->int,
so just return VOID.
v2: use a new enum for the opcode type as per Jose's suggestion.
Signed-off-by: Dave Airlie <airlied@redhat.com>
If the texture format is integer, the incoming user data must also be
integer (and similarly for non-integer textures).
NOTE: This is a candidate for the stable branches.
Reviewed-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
I recently discovered this text in the BSpec. It seems wise to comply,
though I haven't observed it to fix anything yet.
Fixes a regression in glean/fbo since 28cfa1fa21.
NOTE: This is a candidate for stable release branches.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=45221
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Fixes (with the previous commit) piglit GL_ARB_multisample/pushpop.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
In the table of of push/pop attributes, this one doesn't fall under
the enable group.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
This fixes build errors like
In file included from glapi_dispatch.c:91:
../../../src/mapi/glapi/glapitemp.h:4641: error: no previous prototype for
'glDrawBuffersNV'
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Tested-by: Lucas Stach <dev@lynxeye.de>
Similar to the previous commit. Also fix incorrect setting of the
sampler view's state after it's created. We need to specify the
first/last_level fields in the template instead.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Rather than the one in st_texture_object. This sampler view really has
no connection to the one used for rendering.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
And remove needless & 0xff in _mesa_pack_uint_24_8_depth_stencil_row().
As suggested by José.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Previously, this function only handled 2D textures.
The fallback texture is used when we try to sample from an incomplete
texture object. GLSL says sampling an incomplete texture should return
(0,0,0,1).
v2: use a 1-texel texture image, per José.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Added in _mesa_pack_uint_24_8_depth_stencil_row(). This could be hit
by something like glDrawPixels(GL_DEPTH_STENCIL, GL_UNSIGNED_INT_24_8)
into a MESA_FORMAT_Z32_FLOAT_X24S8 buffer.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
The st_renderbuffer_alloc_storage() function is used to allocate both
window-system buffers and user-created renderbuffers. The later kind
are never directly displayed so don't set PIPE_BIND_DISPLAY_TARGET for
those surfaces.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Commit dc7f449d1a introduced a new method
for avoiding MOVs: try to rewrite the destination of the instruction
that produced the RHS so it writes into the LHS.
Unfortunately, this is not safe for swizzled texturing operations, as
they return a set of four contiguous registers. Consider the following:
(assign (x)
(var_ref vec_ctor_x)
(swiz x (tex vec4 (var_ref m_sampY) (var_ref m_cordY) 0 1 ())))
In this case, the source and destination registers are equal, since
reg_offset is 0 for both. Yet, this is only a partial move: the texture
operation generates four registers, and the LHS only covers one.
Fixes color distortion in XBMC when using GLSL shaders.
NOTE: This is a candidate for the 8.0 branch (with the previous commit).
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=44333
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Certain instructions write more than one register. Texturing, for
example, returns 4 registers. (We set rlen to 4 even for TXS and float
shadow sampling.) Some math functions return 2. Most return 1.
The next commit introduces a use of this function.
NOTE: This is a candidate for the 8.0 branch (dependency of a fix).
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reallocate/resize decompress FBO only if texture image width/height is
greater than existing decompress FBO width/height.
This is a candidate for stable branches.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
A filter strength of zero or one doesn't make any
sense. Thanks to Andy Furniss for pointing this out.
Signed-off-by: Christian König <deathsimple@vodafone.de>
The virtual address but follow the alignment requirement of the
tiled surface. The bo from handle case is not properly fix. Need
bigger change for a proper fix. Work around that by enforcing 1M
alignment for those bo.
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Commit 2e5a1a2 (intel: Convert from GLboolean to 'bool' from
stdbool.h.) converted the "specoffset" local variable (in
intel_tris.c) from a GLboolean to a bool. However, GLboolean was the
wrong type for specoffset--it should have been a GLuint (to match the
declaration of specoffset in struct intel_context).
This patch changes specoffset to the proper type.
Fixes piglit test general/two-sided-lighting-separate-specular.
This is a candidate for stable branches.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=45917
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
It turns out the same messages work on gen7, we were just being paranoid.
Fixes the penumbra shadows mode of Lightsmark since the register
allocation fix.
NOTE: This is a candidate for release branches.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We just abort later, but at least this should result in more
informative bug reports.
NOTE: This is a candidate for release branches.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
r300g is able to sleep until a fence completes rather than busywait because
it creates a special buffer object and relocation that stays busy until the
CS containing the fence is finished.
Copy the idea into r600g, and use it to sleep if the user asked for an
infinite wait, falling back to busywaiting if the user provided a timeout.
Note: this is a candidate for the stable branches.
Signed-off-by: Simon Farnsworth <simon.farnsworth@onelan.co.uk>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This patch adds the pixel store operations in decompress_texture_image().
decompress_texture_image() is used in glGetTexImage() for compressed
textures with unsigned, normalized values.
It also fixes the failures in intel oglconform pxstore-gettex due to
following sub test cases:
- Test all mipmaps with byte swapping enabled
- Test all small mipmaps with all allowable alignment values
- Test subimage packing for all mipmap levels
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=40864
Note: This is a candidate for stable branches
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
X86Target is a variable, and therefore isn't defined at compile time. So
LLVM_NATIVE_ARCH == X86Target
is translated into
0 == 0
and since X86 is first, we always pick it.
Therefore we replace the logic with PIPE_ARCH_*.
https://bugs.freedesktop.org/show_bug.cgi?id=45420
Fixes a regression from commit 660ed923de.
The basic idea is to look at the format of the dest renderbuffer and
choose either GLubyte or GLfloat for colors. The previous code used
_mesa_format_to_type_and_comps() which could return a bunch types other
than ubyte/float.
Determine the datatype at renderbuffer mapping time to avoid frequent
calls to the format query functions.
NOTE: This is a candidate for the 8.0 branch.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=45578
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=45577
Fix build with llvm-3.1svn.
llvm-3.1svn r149918 changed BufferMemoryObject::getExtent and
BufferMemoryObject::readByte from const member functions to non-const
member functions in include/llvm/Support/MemoryObject.h.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Ironlake appears to check our pointer against the General State Base
Address upper bound, rather than ignoring the zero bound as it ought.
Unfortunately, since we leave GSBA set to zero, there is no logical
upper bound. Set it to the maximum possible value, which should work
since our virtual addresses only go up to 2GB.
+94 piglits.
NOTE: This is a candidate for stable release branches.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=28924
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Improves nexuiz performance 0.65% +/- .10% (n=5) on my gen6, and .39%
+/- .11% (n=10) on gen7. No statistically significant performance
difference on warsow (n=5, but only one shader has MADs).
v2: Add support for MADs in 16-wide by using compression control.
v3: Don't generate MADs when it will force an immediate to be moved to a temp.
(it's not clear whether this is a win or not, but it should result in less
questionable change to codegen compared to v2).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v2)
Our only instruction with a 3rd source so far was linterp, and that
value was never register-allocated.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
WGL_ARB_pixel_format establishes the existence of pixel formats which
are invisible to GDI.
However we still need to pass a valid pixelformat to GDI, so that
context creation/binding works.
The actual WGL_TYPE_RGBA_FLOAT_ARB implementation is from Brian Paul.
The mapping from TEXTURE_x_INDEX to GL_TEXTURE_x was broken in
alloc_proxy_textures() because the elements in the targets[] array
were in the wrong order.
This didn't actually cause any failures since we never really use the
proxy texture's Target field. But let's get it right.
NOTE: This is a candidate for the 8.0 branch.
Use the float tables instead. Pixel maps are seldom used so this
shouldn't be a big deal. Next, we can get rid of the gl_pixelmap::Map8
array.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
There's a mismatch in row strides for compressed textures between
what Driver.MapTextureImage() returns and what the software fetch-texel
functions use. Move it down a layer. The next step would be to fix
this in the fetch-texel functions.
Just use pow() instead. Spot lights aren't too common and fixed-function
lighting isn't as important as it used to me.
This saves 32KB per context. Each table was 4KB and there's 8 lights.
This is a shader based median filter, generally
used for noise reduction, it could still need some
improvements, but should usually work out of the box.
Signed-off-by: Christian König <deathsimple@vodafone.de>
The wm max threads is in the same dword as the dispatch enable. The
hardware gets super angry if you set max threads to 0, even if you
aren't dispatching threads.
Avoid unrollong loops that are either nested loops or
where the loop body times the unroll count is huge.
The change is far from being perfect but it extends the
loop unrolling decision heuristic by some additional
safeguard. In particular this cuts down compilation of
a shader precomputing atmospheric scattering integral
tables containing two nesting levels in a loop from
something way beyond some minutes (I never waited for
it to finish) to some fractions of a second.
This fixes piglit tests glsl-fs-unroll-explosion and
glsl-vs-unroll-explosion on r600g.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
width, height parameter in glTexImage2D() includes: texture image
width + 2 * border (if any). So when doing the texture size check
in _mesa_test_proxy_teximage() width and height should not exceed
maximum supported size for target texture type + 2 * border.
i.e. 1 << (ctx->Const.MaxTextureLevels - 1) + 2 * border
Texture border is anyway stripped out before it is given to intel
or gallium drivers.
This patch fixes Intel oglconform test case:
max_values negative.textureSize.textureCube
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=44970
Note: This is a candidate for mesa 8.0 branch.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Ian Romanick <idr@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
If we have no more enabled samplers and we've reset all the previously
used ones, no need to keep going around this loop.
(just moved some stuff around to clean it up a bit).
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
From what I can see we were taking the debug path all the time,
when we probably only want it for enable debug path.
Signed-off-by: Dave Airlie <airlied@redhat.com>
We don't want our VBOs mapped when we're drawing. This change checks
if the vertex store VBO is mapped before we execute a list, unmaps it,
then remaps it after drawing. This situation pops up when building a
nested display list in GL_COMPILE_AND_EXECUTE mode.
Reviewed-by: Eric Anholt <eric@anholt.net>
Something has gone wrong if swrast is requested but cannot be
loaded. The user really should be made aware of this, (and instructed
to set LIBGL_DEBUG for more details).
The wording of this error message is updated from "reverting to
indirect rendering" to the more objectively descriptive "failed to
load driver: swrast". The former wording makes assumptions about what
the calling code will decide to do next, rather than simply describing
what went wrong within the current function. The new wording is
consistent with the critical errors recently added for hardware
drivers that fail to load.
Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Something has gone wrong if we were asked to load a driver of a
specific name, but it failed to load for some reason. The user really
should be made aware of this, (and instructed to set LIBGL_DEBUG for
more details).
Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Sometimes an error is so sever that we want to print it even when the
user hasn't specifically requested debugging by setting LIBGL_DEBUG.
Add a CriticalErrorMessageF macro to be used for this case. (The error
message can still be slienced with the existing LIBGL_DEBUG=quiet).
For critical error messages we also direct the user to set the
LIBGL_DEBUG environment variable for more details.
Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
The description of ErrorMessageF was misleading in the case of
LIBGL_DEBUG being unset, (the previous comment could be understood to
mean the error should be printed, but the code does not print in this
case).
InfoMessageF previously had no comment at all.
Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
The build was broken by the line below, added in commit 4f82fed4.
s_expression.cpp:26: #include <limits>
Mesa's half of the fix is to add 'external/astl/include' to the include
path. The other half of the fix requires implementing
numeric_limits<float>::infinity() in astl, for which I have patches
submitted upstream for review.
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Outputs should be treated in the same way as
inputs and temporaries here.
Signed-off-by: Christian König <deathsimple@vodafone.de>
Reviewed-by: Alex Deucher <alexdeucher@gmail.com>
If we had no vertex textures or samplers previously and we have none now,
don't bother doing the enables dance.
I was profiling nexuiz on noop and noticed these two functions in the
profile, this drops their usage from 0.86% to 0.03% and 0.23% to 0.03%
for texture and samplers.
Signed-off-by: Dave Airlie <airlied@redhat.com>
We were doing saturate-based clamping on the [0,width] or [0,height]
coordinate, which meant only the first pixel was addressable.
Fixes piglit ARB_texture_rectangle/texwrap-RECT-bordercolor
NOTE: This is a candidate for the 8.0 release branch.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
We should be able to merge self-move instruction into the MRF move
anyway, and this simplifies things for the next commit.
NOTE: This is a candidate for the 8.0 release branch.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The HiZ op was implemented as a meta-op. This patch reimplements it by
emitting a special HiZ batch. This fixes several known bugs, and likely
a lot of undiscovered ones too.
==== Why the HiZ meta-op needed to die ====
The HiZ op was implemented as a meta-op, which caused lots of trouble. All
other meta-ops occur as a result of some GL call (for example, glClear and
glGenerateMipmap), but the HiZ meta-op was special. It was called in
places that Mesa (in particular, the vbo and swrast modules) did not
expect---and were not prepared for---state changes to occur (for example:
glDraw; glCallList; within glBegin/End blocks; and within
swrast_prepare_render as a result of intel_miptree_map).
In an attempt to work around these unexpected state changes, I added two
hooks in i965:
- A hook for glDraw, located in brw_predraw_resolve_buffers (which is
called in the glDraw path). This hook detected if a predraw resolve
meta-op had occurred, and would hackishly repropagate some GL state
if necessary. This ensured that the meta-op state changes would not
intefere with the vbo module's subsequent execution of glDraw.
- A hook for glBegin, implemented by brwPrepareExecBegin. This hook
resolved all buffers before entering
a glBegin/End block, thus preventing an infinitely recurring call to
vbo_exec_FlushVertices. The vbo module calls vbo_exec_FlushVertices to
flush its vertex queue in response to GL state changes.
Unfortunately, these hooks were not sufficient. The meta-op state changes
still interacted badly with glPopAttrib (as discovered in bug 44927) and
with swrast rendering (as discovered by debugging gen6's swrast fallback
for glBitmap). I expect there are more undiscovered bugs. Rather than play
whack-a-mole in a minefield, the sane approach is to replace the HiZ
meta-op with something safer.
==== How it was killed ====
This patch consists of several logical components:
1. Rewrite the HiZ op by replacing function gen6_resolve_slice with
gen6_hiz_exec and gen7_hiz_exec. The new functions do not call
a meta-op, but instead manually construct and emit a batch to "draw"
the HiZ op's rectangle primitive. The new functions alter no GL
state.
2. Add fields to brw_context::hiz for the new HiZ op.
3. Emit a workaround flush when toggling 3DSTATE_VS.VsFunctionEnable.
4. Kill all dead HiZ code:
- the function gen6_resolve_slice
- the dirty flag BRW_NEW_HIZ
- the dead fields in brw_context::hiz
- the state packet manipulation triggered by the now removed
brw_context::hiz::op
- the meta-op workaround in brw_predraw_resolve_buffers (discussed
above)
- the meta-op workaround brwPrepareExecBegin (discussed above)
Note: This is a candidate for the 8.0 branch.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Paul Berry <stereotype441@gmail.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=43327
Reported-by: xunx.fang@intel.com
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=44927
Reported-by: chao.a.chen@intel.com
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
If size is small (such as 1),
pitch = ROUND_DOWN_TO(MIN2(size, (1 << 15) - 1), 4);
makes pitch = 0. Then
height = size / pitch;
causes a division-by-zero exception. If pitch is zero, set height to
1 and avoid the division.
This fixes piglit's bin/getteximage-formats test and glean's
bufferObject test.
NOTE: This is a candidate for the 8.0 release branch.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=44971
There are cases where a buffer can be mapped while another buffer is
flushed. This can happen in the CopyPixels meta-op path for piglit's
fbo-mipmap-copypix. After some discussion with Eric, it seems this
assertion is no longer necessary, and it has always been too strict.
NOTE: This is a candidate for the 8.0 branch.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=43328
Cc: Eric Anholt <eric@anholt.net>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
This was only used by glReadPixels and glDrawPixels. Now those
functions do the corresponding error checks.
Signed-off-by: Brian Paul <brianp@vmware.com>
Basically the same story as the previous commit. But we were
already calling _mesa_source_buffer_exists() in ReadPixels().
Yeah, we were calling it twice.
Signed-off-by: Brian Paul <brianp@vmware.com>
The _mesa_error_check_format_type() function does two things: check
that format/type is legal and check that the destination (or source
buffer for glReadPixels) actually exists. Just move the relevant
parts of that into _mesa_DrawPixels().
We'll do a similar change in glReadPixels then get rid of the function
altogether.
Signed-off-by: Brian Paul <brianp@vmware.com>
This replaces the _mesa_is_legal_format_and_type() function.
According to the spec, some invalid format/type combinations to
glDrawPixels, ReadPixels and glTexImage should generate
GL_INVALID_ENUM but others should generate GL_INVALID_OPERATION.
With the old function we didn't make that distinction and generated
GL_INVALID_ENUM errors instead of GL_INVALID_OPERATION. The new
function returns one of those errors or GL_NO_ERROR.
This will also let us remove some redundant format/type checks in
follow-on commit.
v2: add more checks for ARB_texture_rgb10_a2ui at the top of
_mesa_error_check_format_and_type() per Ian.
Signed-off-by: Brian Paul <brianp@vmware.com>
Tiled surface have all kind of alignment constraint that needs to
be met. Instead of having all this code duplicated btw ddx and
mesa use common code in libdrm_radeon this also ensure that both
ddx and mesa compute those alignment in the same way.
v2 fix evergreen
v3 fix compressed texture and workaround cube texture issue by
disabling 2D array mode for cubemap (need to check if r7xx and
newer are also affected by the issue)
v4 fix texture array
v5 fix evergreen and newer, split surface values computation from
mipmap tree generation so that we can get them directly from the
ddx
v6 final fix to evergreen tile split value
v7 fix mipmap offset to avoid to use random value, use color view
depth view to address different layer as hardware is doing some
magic rotation depending on the layer
v8 fix COLOR_VIEW on r6xx for linear array mode, use COLOR_VIEW on
evergreen, align bytes per pixel to a multiple of a dword
v9 fix handling of stencil on evergreen, half fix for compressed
texture
v10 fix evergreen compressed texture proper support for stencil
tile split. Fix stencil issue when array mode was clear by
the kernel, always program stencil bo. On evergreen depth
buffer bo need to be big enough to hold depth buffer + stencil
buffer as even with stencil disabled things get written there.
v11 rebase on top of mesa, fix pitch issue with 1d surface on evergreen,
old ddx overestimate those. Fix linear case when pitch*height < 64.
Fix r300g.
v12 Fix linear case when pitch*height < 64 for old path, adapt to
libdrm API change
v13 add libdrm check
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Refine 80aa78142d "dri: make sure to build libdricommon.la"
so we don't build libdricommon if we aren't building a dri driver which needs it (i.e.
if we are just building swrast)
In particular, this restores the ability to build the swrast dri driver without having to
have a xf86drm.h
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
in check_index_bounds the comparison needs to be "greater equal" since
contrary to the name _MaxElement is the count of the array (this matches
similar code in vbo_exec_DrawRangeElementsBaseVertex).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Fixes the build of builtin_compiler on my 32-bit build where xcb-dri2
is in a custom prefix but the custom prefix flags weren't available.
It shouldn't have been in LIBS anyway.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This checks for advertised LLC support by the GPU instead of relying on
the GPU generation for detection.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Rely on libdrm HAS_LLC parameter to verify if hardware supports it. In
case the libdrm version does not supports this check, fallback to older
way of detecting it which assumed that GPUs newer than GEN6 have it.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Was previously being done in a state-tracker, but in a way which was
difficult for some drivers to optimize. Push down to this level and
make it the individual drivers responsibility.
FBOs differ from textures in a significant way. With textures, we can
strip the border and get correct rendering except when the application
fetches texels outside [0,1].
With an FBO, the pixel at (0,0) is in the border. The
ARB_framebuffer_object spec says:
"If the attached image is a texture image, then the window
coordinates (x[w], y[w]) correspond to the texel (i, j, k), from
figure 3.10 as follows:
i = (x[w] - b)
j = (y[w] - b)
k = (layer - b)
where <b> is the texture image's border width..."
Since the border doesn't exist, we can never render any pixels in the
correct location. Just mark these FBOs FRAMEBUFFER_UNSUPPORTED.
NOTE: This is a candidate for the 8.0 branch.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=42336
Ever since xserver commit 531869448d07e00ae241120b59f3aaaa5709d59c,
the server no longer sends invalidate events to clients, unless they
have performed a GetBuffers request since the drawable was last
invalidated.
If the drawable gets invalidated immediately after the GetBuffers
request was processed by the X server, it's possible that Xlib
will process the invalidate event while waiting for the GetBuffers
reply. So the server, thinking the client knows that the buffers
are invalid, is waiting for another GetBuffers request before
sending any more invalidate events. The client, on the other hand,
believes the buffers to be valid, and thus is expecting to receive
another invalidate event before it has to send another GetBuffers
request. The end result is that the client never again sends
a GetBuffers request.
To avoid this problem, take a snapshot of the lastStamp before
doing GetBuffers, and retry if the snapshot and the current
lastStamp no longer match after the GetBuffers reply has been
processed.
Signed-off-by: Ville Syrjälä <syrjala@sci.fi>
Signed-off-by: Dave Airlie <airlied@redhat.com>
The error message I chose matches gcc's error. Fixes piglit
switch-case-duplicated.vert.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Otherwise, the upcoming error messages said the location was 0:0(0).
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
It's not quite spelled out in the spec text, but the grammar indicates
that only constant values are allowed as switch() case labels (and
only constant values make sense, anyway).
Fixes piglit glsl-1.30/compiler/switch-statement/switch-case-uniform-int.vert.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This stuffs them all in a struct for sanity. Fixes piglit
glsl-1.30/execution/switch/fs-uniform-nested.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
For all the extension entrypoints using the get_buffer() helper, they
wanted the same error handling. In some cases, the error was doing
the same error return whether target was a bad enum, or a user buffer
wasn't bound.
(Actually, GL_ARB_map_buffer_range doesn't specify the error for a zero
buffer being bound for MapBufferRange, though it does for
FlushMappedBufferRange. This appears to be an oversight).
Fixes piglit GL_ARB_copy_buffer/negative-bound-zero.
Reviewed-by: Brian Paul <brianp@vmware.com>
Even though it should be safe to use them for one frame, better be sure.
Suggested by Michael Dänzer.
NOTE: This is a candidate for the 8.0 stable branch.
Signed-off-by: Lauri Kasanen <cand@gmx.com>
This prevents a possible lapse of the depth buffer - the situation where
the app and pp have different depth buffers.
NOTE: This is a candidate for the 8.0 stable branch.
Signed-off-by: Lauri Kasanen <cand@gmx.com>
In commit 6ecee54a9a a call to
talloc_reference was replaced with a call to talloc_steal. This was in
preparation for moving to ralloc which doesn't support reference
counting.
The justification for talloc_steal within token_list_append in that
commit is that the tokens are being copied already. But the copies are
shallow, so this does not work.
Fortunately, the lifetime of these tokens is easy to understand. A
token list for "replacements" is created and stored in a hash table
when a function-like macro is defined. This list will live until the
macro is #undefed (if ever).
Meanwhile, a shallow copy of the list is created when the macro is
used and the list expanded. This copy is short-lived, so is unsuitable
as a new parent.
So we can just let the original, longer-lived owner continue to own
the underlying objects and things will work.
This fixes bug #45082:
"ralloc.c:78: get_header: Assertion `info->canary == 0x5A1106'
failed." when using a macro in GLSL
https://bugs.freedesktop.org/show_bug.cgi?id=45082
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
NOTE: This is a candidate for stable release branches.
This test cases exposes a bug as described in this bug report:
"ralloc.c:78: get_header: Assertion `info->canary == 0x5A1106'
failed." when using a macro in GLSL
https://bugs.freedesktop.org/show_bug.cgi?id=45082
Clearly, some memory is getting (incorrectly) freed on the first macro
invocation, leading to problems with the second macro invocation.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The trick here is that flex always chooses the rule that matches the most
text. So with a input text of "two:" which we want to be lexed as an
IDENTIFIER token "two" followed by an OTHER token ":" the previous OTHER
rule would match longer as a single token of "two:" which we don't want.
We prevent this by forcing the OTHER pattern to never match any
characters that appear in other constructs, (no letters, numbers, #,
_, whitespace, nor any punctuation that appear in CPP operators).
Fixes bug #44764:
GLSL preprocessor doesn't replace defines ending with ":"
https://bugs.freedesktop.org/show_bug.cgi?id=44764
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
NOTE: This is a candidate for stable release branches.
GL_RG_INTEGER only has two components, not three. I'll be surprised
if anyone ever tries to glReadPixels(..., GL_SHORT, GL_RG_INTEGER,
...). This was found by inspection.
NOTE: This is a candidate for the 8.0 branch.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
With 0963990 the flag was only set when Bind created the object. In
all cases where ::ARBsemantics could be true, this path never
happened. Instead, add a _Used flag to track whether a VAO has ever
been bound. On the first Bind, set the _Used flag, and set the
ARBsemantics flag to the correct value.
NOTE: This is a candidate for release branches.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=45423
This is a hack, and it will result in incorrect rendering. However,
it does eliminate spurious warnings in several piglit CopyPixels tests
that involve floating-point depth buffers.
The real solution is to add a zf field to SWspan to store float Z
values. When a float depth buffer is involved, swrast should also
populate the zf field. I'll consider this post-8.0 work.
NOTE: This is a candidate for the 8.0 branch.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Now that the draw module avoids flushing, it may flush precisely when
binding a NULL shader, so care must be taken when restoring the original
fragment shader.
Reviewed-by: Brian Paul <brianp@vmware.com>
When GLAPIENTRY is __stdcall (ie Windows), the stack is popped by the
callee making the number/type of arguments significant, therefore
using a generic no-op causes stack corruption for many entry-points.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Brian Paul <brianp@vmware.com>
width, height parameter in glTexImage2D() includes: texture image
width + 2 * border (if any). So when doing the texture size check
in _mesa_test_proxy_teximage() width and height should not exceed
maximum supported size for target texture type.
i.e. 1 << (ctx->Const.MaxTextureLevels - 1)
Texture border is anyway stripped out before it is given to intel
or gallium drivers.
This patch fixes Intel oglconform test case: max_values
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=44970
Note: This is a candidate for mesa 8.0 branch.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Ian Romanick <idr@freedesktop.org>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
In certain situations API's will call pipe->clear which doesn't
require fragment shader, but then we'd try to verify the pipeline
and assume fragment shader was always set. This was leading to
crash when API would just call simple clear's before anything else.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
The node_attrsz[] array is initially copied from the node->attrsz[]
array but some values get rewritten. Thereafter, we need to use the
node_attrsz[] values.
Fixes a bug when replaying a display list that uses generic vertex
array[16] (at least).
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The warnings were:
nv50_pc_regalloc.c: In function ‘pass_generate_phi_movs’:
nv50_pc_regalloc.c:423:41: warning: array subscript is above array bounds
codegen/nv50_ir_peephole.cpp: In member function ‘bool nv50_ir::MemoryOpt::replaceStFromSt(nv50_ir::Instruction*, nv50_ir::MemoryOpt::Record*)’:
codegen/nv50_ir_peephole.cpp:1475:18: warning: array subscript is above array bounds
codegen/nv50_ir_peephole.cpp:1475:18: warning: array subscript is above array bounds
codegen/nv50_ir_peephole.cpp:1475:18: warning: array subscript is above array bounds
codegen/nv50_ir_peephole.cpp:1475:18: warning: array subscript is above array bounds
And add some assertions to catch this sooner in debug builds.
This fixes a dangling texture object pointer bug hit via wglShareLists().
When we push the GL_TEXTURE_BIT state we may push references to the default
texture objects which are owned by the gl_shared_state object. We don't
want to accidentally delete that shared state while the attribute stack
references shared objects. So keep a reference to it.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
This cleans up the reference counting of shared context state.
The next patch will use this to fix an actual bug.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
This also significantly improves the RV670 flush by using the CB1 flush
*always* and also DEST_BASE_0_ENA, which appears to magically fix some tests.
I am not entirely sure, but it's possible that RV670 flushing is fixed
completely.
v2: fix cayman by flushing texture cache instead of vertex cache
Thanks to Dave Airlie for testing Cayman.
Commit 99476561 (automake: src/glsl and src/glsl/glcpp) changed the
build system so that src/glsl/glsl_test is not built by default. This
inadvertently broke "make check", since the tests in
src/glsl/tests/lower_jumps (which are run by "make check") rely on
glsl_test.
This patch ensures that "make check" builds glsl_test before running
any tests.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Fixes these GCC warnings.
osmesa.c: In function ‘osmesa_renderbuffer_storage’:
osmesa.c:417: warning: comparison is always false due to limited range of data type
osmesa.c:423: warning: comparison is always false due to limited range of data type
osmesa.c:431: warning: comparison is always false due to limited range of data type
osmesa.c:437: warning: comparison is always false due to limited range of data type
osmesa.c:447: warning: comparison is always false due to limited range of data type
osmesa.c:453: warning: comparison is always false due to limited range of data type
osmesa.c:463: warning: comparison is always false due to limited range of data type
osmesa.c:466: warning: comparison is always false due to limited range of data type
osmesa.c:476: warning: comparison is always false due to limited range of data type
osmesa.c:479: warning: comparison is always false due to limited range of data type
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Signed-off-by: Brian Paul <brianp@vmware.com>
Success was (tests-passed AND valgrind-tests-passed) but this meant that
if the valgrind tests weren't run it would be considered a failure.
The logic is now (tests-passed AND (!valgrind OR valgrind-tests-passed))
which lets us return success if the valgrind tests aren't run.
Reviewed-by: Eric Anholt <eric@anholt.net>
Tested-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Matt Turner <mattst88@gmail.com>
Needed for automake. Using AC_PROG_PATH(bison/flex) causes automake to
fail to build .y and .l files.
It is up to the builder to use bison/flex instead of yacc/lex.
Reviewed-by: Eric Anholt <eric@anholt.net>
Tested-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Matt Turner <mattst88@gmail.com>
Exporting a publicly visible class with a generic name like
"variable_entry" via ir_variable_refcount.h is kind of mean.
Many IR transformers would like to define their own "variable_entry"
class. If they accidentally include this header, the compiler/linker
may get confused and try to instantiate the wrong variable_entry class,
leading to bizarre runtime crashes.
The hope is that renaming this one will allow .cpp files to safely
declare and use their own file-scope "variable_entry" classes.
This avoids crashes caused by converting src/glsl to automake.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-and-tested-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
This uses point size clamping to force point size to a particular value,
making the vertex shader output irrelevant.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
We don't set the other bits anywhere else except the other DSA states,
which are mutually-exclusive with this one.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
This fixes the gl_PointSize transform feedback test.
Point size clamping should happen at the rasterizer stage,
i.e. after the vertex and geometry shaders and transform feedback.
Drivers are expected to do this by themselves.
Simplifies the general case code in the ubyte-valued texture format
functions. More consolidation to come in subsequent commits.
Reviewed-by: Eric Anholt <eric@anholt.net>
Specifially, this being present works around a bug in Unigine
Sanctuary on i965 which previously resulted in bad rendering.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This can be used to work around broken application behavior, like in
Unigine where it attempts to use texture arrays without declaring
either "#extension GL_EXT_texture_array : enable" or "#version 130".
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
While typing out the new decode, I added a fallback mode for dumping
when we fail to re-map the BO after execution. This should get us a
minimal dump when trying to dump a batch that results in a GPU hang.
We were allocating registers into the MRF hack region, resulting in
sparkly renering in a few of the scenes. We could do better
allocation by making an MRF class, having MRFs conflict with the
corresponding GRFs, and tracking the live intervals of the "MRF"s and
setting up the conflicts. But this is way easier for the moment.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
After the removal of the dri driver link test, this should help avoid
the original problem that it was designed to catch: The warning about
a missing prototype due to typoing a function name scrolling by in the
Mesa build spew, and you not noticing until you try to run an
application and it falls back to swrast.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The envvar works for R100 and R200 too, and the classic R300 driver
doesn't even exist anymore.
"RADEON_NO_TCL" is already mentioned in the code and is the same envvar
used for the R300g driver.
lp_bld_tgsi_soa.c has been adapted to use this new interface, but
lp_bld_tgsi_aos.c has only been partially adapted, since nothing in
gallium currently uses it.
v2:
- Rename lp_bld_tgsi_action.[ch] => lp_bld_tgsi_action.[ch]
- Initialize tgsi_info in lp_bld_tgsi_aos.c
- Fix copyright dates
Prior commit 576161289d,
the parameter format was bpp, thus both 24bit and 32bit formats were
requested with format set to 32. Handle 24bit seperately now.
Fixes RGBX formats in wayland platform for egl_dri2 (EGL_ALPHA_SIZE=0).
Note: This is a candidate for the 8.0 branch.
This just copies what the LUMINANCE_ALPHA bits do.
Fixes piglit tests on softpipe complaining about missing unpack.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Cayman needs some of the MUL instructions spread across a full slot
of vectors.
It also no longer has RECIP_UINT, the recommendation is to replace it
with a U2F + RECIP_IEEE + MUL + F2U.
Signed-off-by: Dave Airlie <airlied@redhat.com>
The warning is absolutely useless. It doesn't actually say that there are
uninitialized variables. It points out the fact that there are missing
initializers and that variables are initialized to zero implicitly, which is
exactly what we want and what we commonly make use of.
C90 and C99 require all unspecified variables in the initializer list to be set
to zero.
The check for ctx->API was unnecessary, because OES extensions are not exposed
in desktop GL.
Also require renderbuffer support for ARB_texture_rgb10_a2ui,
as per the spec.
Tested by comparing old and new glxinfo with softpipe and r600g.
v2: fix bugs
v3: rename need_only_one -> need_at_least_one
rename num_elements -> num_mappings
add comments
use const when appropriate
Reviewed-by: Brian Paul <brianp@vmware.com>
This change is not exactly equivalent (sometimes we checked for non-zero,
sometimes if >0 or >1), but the behavior shouldn't change, because all drivers
report 0 for unsupported CAPs.
Exposing CAP_STREAM_OUTPUT_PAUSE_RESUME without CAP_MAX_STREAM_OUTPUT_BUFFERS
is a driver bug and st/mesa does no checking if the latter is supported as
well. Drivers must report CAPs consistently.
v2: make the array const
v2: handle the cap in r300 and r600 as well
Additional info for r600g:
The env var R600_GLSL130=1 enables GLSL 1.3.
Along with R600_STREAMOUT=1, it enables full GL 3.
Fix an access to uninitialized memory pointed out by valgrind in
glsl_to_tgsi_visitor::simplify_cmp(void).
Note: This is a candidate for the 8.0 branch.
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Fix this GCC warning.
draw_pipe_clip.c: In function ‘interp’:
draw_pipe_clip.c:122:13: warning: variable ‘clip_dist’ set but not used
[-Wunused-but-set-variable]
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Dave Airlie <airlied@redhat.com>
When rendering to FBO, rendering is inverted. At the same time, we would
also make sure the point sprite origin is inverted. Or, we will get an
inverted result correspoinding to rendering to the default winsys FBO.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=44613
NOTE: This is a candidate for stable release branches.
v2: add the simliar logic to ivb, too (comments from Ian)
simplify the logic operation (comments from Brian)
v3: pick a better comment from Eric
use != for the logic instead of ^ (comments from Ian)
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This simplifies the code quite a bit, consolidates some cases and
possibly catches more cases for the memcpy path.
More such changes will follow. Do just a few at a time to help bisect
any possible regressions.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This will let us use memcpy in more situations. We can also remove
the checks for byte spapping that happen before the calls to
_mesa_format_matches_format_and_type().
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
In a recent commit,
commit 1c0f1dd42a
Author: Chad Versace <chad.versace@linux.intel.com>
swrast: Fix fixed-function fragment processing
I defined a new function,_swrast_fragment_program, but neglected
to #include s_fragprog.h for clients of that function.
Note: This is a candidate for the 8.0 branch.
Reported-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
The evergreen+ CB no longer supports the following formats
compared to 6xx/7xx:
- COLOR_4_4
- COLOR_3_3_2
- COLOR_6_5_5
- COLOR_8_24_FLOAT
- COLOR_24_8_FLOAT
- COLOR_11_11_10
- COLOR_11_11_10_FLOAT
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
On i965, _mesa_ir_link_shader is never called. As a consequence, the
current fragment program (ctx->FragmentProgram->_Current) exists but is
invalid because it has no instructions. Yet swrast continued to attempt to
use the empty program.
To avoid using the empty program, this patch 1) defines a new function,
_swrast_use_fragment_program, which checks if the current fragment program
exists and differs from the fixed function fragment program, and, when
appropriate, 2) replaces checks of the form
if (ctx->FragmentProgram->_Current == NULL)
with
if (_swrast_use_fragment_program(ctx))
Fixes the following oglconform regressions on i965/gen6:
api-fogcoord(basic.allCases.log)
api-mtexcoord(basic.allCases.log)
api-seccolor(basic.allCases.log)
api-texcoord(basic.allCases.log)
blend-separate(basic.allCases)
colorsum(basic.allCases.log)
The tests were ran with the GLXFBConfig:
visual x bf lv rg d st colorbuffer sr ax dp st accumbuffer ms cav
id dep cl sp sz l ci b ro r g b a F gb bf th cl r g b a ns b eat
----------------------------------------------------------------------------
0x021 24 tc 0 32 0 r y . 8 8 8 8 . . 0 24 8 0 0 0 0 0 0 None
(Note: I originally believed that the hunk in
_swrast_update_fragment_program was unnecessary. But it is required to fix
blend-separate.)
Note: This is a candidate for the 8.0 branch.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=43327
Reveiwed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <idr@freedesktop.org>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Color clamping should be enabled in glGetTexImage if texture dataType is
GL_UNSIGNED_NORMALIZED and format is GL_LUMINANCE or GL_LUMINANCE_ALPHA
Fixes 2 Intel oglconform test cases: pxconv-gettex and pxtrans-gettex
https://bugs.freedesktop.org/show_bug.cgi?id=40864
NOTE: This is a candidate for the 8.0 branch
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
This was losing bits of precision. Fixes (with the previous commits):
piglit EXT_texture_integer/getteximage-clamping
piglit EXT_texture_integer/getteximage-clamping GL_ARB_texture_rg
oglc advanced.mipmap.upload
Regresses oglc negative.typeFormatMismatch.teximage from fail to
abort, because it's been hitting texstore for a format/type combo that
shouldn't happen.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Brian Paul <brianp@vmware.com>
In the core, we always treat spans of int/uint data as uint, so this
extract function was truncating storage of integer pixel data to a n
int texture to (0, max_int) instead of (min_int, max_int). There is
probably missing code for handling truncation on conversion between
pixel formats, still, but this does improve things.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Brian Paul <brianp@vmware.com>
Mostly fixes piglit EXT_texture_integer/getteximage-clamping. The
remaining failure involves precision loss on storing of int32 texture
data (something I knew was an issue, but wasn't trying to test).
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Brian Paul <brianp@vmware.com>
This cut and paste is pretty awful. I'm tempted to do a lot of this
using preprocessor tricks for customizing the parameter type from a
template function, but that's just a different sort of hideous.
Fixes 8 Intel oglconform int-textures cases.
NOTE: This is a candidate for the 8.0 branch.
v2: Add alpha formats, too.
Reviewed-by: Brian Paul <brianp@vmware.com>
Otherwise, when you asked for the _BaseFormat of an rb wrapping a
GL_RGB texture, you got GL_RGBA because that's what we were storing
the texture data as.
NOTE: This is a candidate for the 8.0 branch.
Most of this function was just calling
intel_renderbuffer_update_wrapper(), which was called immediately
afterwards in the only caller.
NOTE: This is a candidate for the 8.0 branch.
Fixes piglit ARB_copy_buffer-overlap, on swrast, which previously
assertion failed.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
A pure swrast-allocated buffer gets an irb of NULL, so we segfaulted
in the clear-accum test. Just look at the swrast renderbuffer pointer
for handling swrast rbs.
From the extension spec:
Added to section 5.4, as part of the discussion of which commands
are not compiled into display lists:
"Certain commands, when called while compiling a display list, are
not compiled into the display list but are executed immediately.
These are: ..., RenderbufferStorageMultisampleEXT..."
Fixes piglit EXT_framebuffer_multisample/dlist.
Reviewed-by: Brian Paul <brianp@vmware.com>
Noticed when handling a similar problem in EXT_framebuffer_multisample.
From the EXT_framebuffer_object spec:
Added to section 5.4, as part of the discussion of which commands
are not compiled into display lists:
"Certain commands, when called while compiling a display list, are
not compiled into the display list but are executed immediately.
These are: ..., GenFramebuffersEXT, BindFramebufferEXT,
DeleteFramebuffersEXT, CheckFramebufferStatusEXT,
GenRenderbuffersEXT, BindRenderbufferEXT, DeleteRenderbuffersEXT,
RenderbufferStorageEXT, FramebufferTexture1DEXT,
FramebufferTexture2DEXT, FramebufferTexture3DEXT,
FramebufferRenderbufferEXT, GenerateMipmapEXT..."
Reviewed-by: Brian Paul <brianp@vmware.com>
Constants array is always assumed to be RGBA, which means we need to
swizzle the constant elements into place to match the AoS ordering
(e.g., BGRA) that was passed to lp_build_tgsi_aos().
Signed-off-by: José Fonseca <jfonseca@vmware.com>
Should avoid dangling pointer derreference with
glean --run results --overwrite --quick --tests texSwizzle
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Brian Paul <brianp@vmware.com>
We just prefix the $CLANG environment variable in configure.ac with acv_mesa_
Found by: tinderbox
Signed-off-by: Jeremy Huddleston <jeremyhu@apple.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This was horribly broken and has cost everyone more time than we were
ever going to save using it. It might have been fixable, but the
problem it was originally trying to solve can be better solved with
-Werror=missing-prototypes and -Werror=implicit-function-declaration.
Also, it was always producing a big scary warning about how the link
test was non-portable.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=44928
Substantially increases performance in GLBenchmark PRO:
- 320x240 => 3.28x
- 1920x1080 => 1.47x
- 2560x1440 => 1.27x
The LD message ignores the sampler unit index and SAMPLER_STATE pointer,
instead relying on hard-wired default state. Thus, there's no need to
worry about running out of sampler units or providing SAMPLER_STATE;
this small patch should be all that's required.
NOTE: This is a candidate for release branches.
(It requires the preceding commit to compile.)
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
brw_SAMPLE is full of complex workarounds for original Broadwater
hardware, and I'd rather avoid all that for my next Ivybridge patch.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This function releases the buffer that contains user-space vertex data.
The buffer_offset field points into that buffer. So reset the
buffer_offset to zero when we release the buffer so that subsequent
draws don't inadvertantly get a bad offset.
Fixes error messages / failed assertions (in the draw module's bounds/size
checking code) when running piglit's polygon-mode test.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
-fvisibility=hidden was preventing them from being exported, which
combined with shared-glapi was causing undefined symbol errors at
runtime.
We don't want to make these functions part of the ABI, and given
how simple they are, we simply inline them.
From c998f732d42da5e962fe5da294493132c3e8dc5f Mon Sep 17 00:00:00 2001
From: Lucas Stach <dev@lynxeye.de>
Date: Tue, 24 Jan 2012 09:46:32 +0100
Subject: [PATCH] nvfx: fix nv3x fallout from state validation changes
Apparently nv3x needs some curde hacks to work properly. This
is clearly not the right fix, but it's the behaviour of the old
code and fixes regressions seen by users.
This has the drawback that when creating configure for
distribution, wayland needs to be available for the packager.
Also the the macros has the wayland prefix hardcoded, so
we cant copy it in mesa right now.
In bad applications like ipers which does a lot of draw calls with
no state changes this helps to greatly reduce time spent in prepare.
In ipers around 7% of CPU was spent in various prepare functions,
after this commit no prepare function show on the profile.
This commit also has the added benefit of now grouping all pipelined
drawing into a single draw call if the driver uses vbuf_render.
Reviewed-by: Stéphane Marchesin <marcheu@chromium.org>
Tested-by: Stéphane Marchesin <marcheu@chromium.org>
Previously, max_vs_entries was set to 128 for GT1, and 256 for GT2,
based on the PRM (see Vol2, part1, p28). However, Bspec section 1.6.5
indicates that the maximum number of VS entries is 256 for GT1.
No piglit regressions on GT1.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
When storing data in a buffer of type DYNAMIC_DRAW, we don't create a
drm_intel_bo for it; instead we store the data in system memory and
defer allocation of the GPU buffer until it is needed. Therefore, in
brw_update_sol_surface(), we can't just consult the "buffer" field of
the intel_buffer_object structure; we need to call
intel_bufferobj_buffer() to ensure that the deferred allocation
occurs.
This parallels a similar fix for gen7 (see commit ba6f4c9).
Fixes piglit test EXT_transform_feedback/buffer-usage on gen6.
This is a candidate for the 8.0 release branch.
Reviewed-by: Eric Anholt <eric@anholt.net>
It always had the same value as ctx->Extensions.EXT_framebuffer_sRGB.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Strictly speaking, it's not legal to expose EXT_texture_integer without
EXT_gpu_shader4. It might be even dangerous (apps can assume EXT_gpu_shader4
is available without checking for it).
The check in compute_version is removed as well, because that's already
covered by GLSLVersion >= 130.
Reviewed-by: Brian Paul <brianp@vmware.com>
- use OR to combine bind flags
- combine both conditionals into one
- move the ARB_fbo enable where it belongs
Reviewed-by: Brian Paul <brianp@vmware.com>
For ARB_color_buffer_float. Most hardware can't do it and st/mesa is
the perfect place for a fallback.
The exceptions are:
- r500 (vertex clamp only)
- nv50 (both)
- nvc0 (both)
- softpipe (both)
We also have to take into account that r300 can do CLAMPED vertex colors only,
while r600 can do UNCLAMPED vertex colors only. The difference can be expressed
with the two new CAPs.
A current incomplete framebuffer was incorrectly used as a
st_framebuffer. When accessing st_framebuffer childs bad things happen:
e.g. st_framebuffer::iface was used to check whether its an incomplete
fb, instead we need to compare st_framebuffer::Base against
mesa_get_incomplete_framebuffer.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=44919
Note: This is a candidate for the 8.0 branch.
Reviewed-by: Brian Paul <brianp@vmware.com>
Fixes Intel oglconform negative.typeFormatMismatch.copyteximage.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Brian Paul <brianp@vmware.com>
This is part of fixing Intel oglconform
negative.typeFormatMismatch.copyteximage.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Brian Paul <brianp@vmware.com>
This code is unprepared for handling integer (particularly, the
baseFormat of the TexFormat comes out as GL_RGBA, not GL_RGBA_INTEGER,
so the direct call of Driver.ReadPixels crashes due to the int vs
non-int error checking not having happened). I'm frankly tempted to
convert this code to MapRenderbuffer/MapTexImage rather than doing it
as meta ops, now that we have that support.
Improves the remaining crash in Intel oglconform for int-textures to
just a rendering failure.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Brian Paul <brianp@vmware.com>
This aborts and crashes in intel oglconform's int-textures into being
just rendering failures. Clamping isn't handled yet.
v2: Add missing "break".
v3: Drop the int/uint distinction, since they don't need different clamping.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Brian Paul <brianp@vmware.com> (v2)
Similarly to how we handle this in texstore, we have to remap height
to depth so that we MapTextureImage each image layer individually.
Fixes part of Intel oglconform's int-textures advanced.fbo.rtt
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Brian Paul <brianp@vmware.com>
This is a step toward fixing Intel oglconform's
int-textures advanced.fbo.rtt.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Brian Paul <brianp@vmware.com>
This doesn't result in correct rendering -- GL requires that logic ops
work, while the hardware specs say it doesn't do them. I'm not sure
how we would want to handle this.
NOTE: This is a candidate for the 8.0 branch.
When we're actually rendering into a texture, map the texture image
instead of the corresponding renderbuffer. Before, we just copied
a pointer from the texture image to the renderbuffer. This change
will make the code usable by hardware drivers.
ctx->Driver.MapTexture() always points to _swrast_map_texture().
We're already reaching into swrast from t_vb_program.c anyway.
This will let us remove the ctx->Driver.Map/UnmapTexture() functions.
These are temporary, actually, but they'll make follow-on work easier to
implement in a step-by-step manner. Eventually the Map and RowStrideBytes
fields will go into a new swrast_renderbuffer type, but adding that type
now would involve touching a _lot_ of code that'll eventually be removed.
The fields marked as obsolete will go away completely at some point.
That field is only used by swrast code so there's no reason to mess
with it in the gallium state tracker.
This also lets us remove the unused st_format_data() type function and
related code.
When ARB VAOs are used, glPopClientAttrib does not resurrect a deleted
VAO or VBO. This difference between the two spec is, unfortunately,
not very well spelled out in the specs.
Fixes oglc vao(advanced.pushPop.deleteVAO) and
vao(advanced.pushPop.deleteVBO) tests.
NOTE: This is a candidate for release branches.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
There are more differences between Apple and ARB than just requiring
that all arrays be stored in VBOs. Additional uses will be added in
following commits.
Also, set the flag at Bind time instead of Gen time. The ARB_vao spec
specifies that behavior.
NOTE: This is a candidate for release branches.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
This is a hack to work around drivers such as i965 that:
- Set _MaintainTexEnvProgram to generate GLSL IR for
fixed-function fragment processing.
- Don't call _mesa_ir_link_shader to generate Mesa IR from the
GLSL IR.
- May use swrast to handle glDrawPixels.
Since _mesa_ir_link_shader is never called, there is no Mesa IR to
execute. Instead do regular fixed-function processing.
Even on platforms that don't need this, the software fixed-function
code is much faster than the software shader code.
NOTE: This is a candidate for the 8.0 branch.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=44749
At least one place, the _mesa_need_secondary_color function in
state.h, uses this to make decisions. The next patch in this series
will add another dependency. Ideally, this field would go away and be
replace by a flag or something.
NOTE: This is a candidate for the 8.0 branch.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
When rowstride was negatie, unsigned promotion caused a segfault here:
299│ if (rb->Format == MESA_FORMAT_S8) {
300│ const GLuint rowStride = rb->RowStride;
301│ for (i = 0; i < count; i++) {
302│ if (x[i] >= 0 && y[i] >= 0 && x[i] < w && y[i] < h) {
303├> stencil[i] = *(map + y[i] * rowStride + x[i]);
304│ }
305│ }
306│ }
Fixes segfault in oglconform
separatestencil-neu(NonPolygon.BothFacesBitmapCoreAPI),
though test still fails.
Note: This is a candidate for the stable branches.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=43327
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
i965 processes assignments of whole structures using
vec4_visitor::emit_block_move, a recursive function which visits each
element of a structure or array (to arbitrary nesting depth) and
copies it from the source to the destination. Then it increments the
source and destination register numbers so that further recursive
invocations will copy the rest of the structure. In addition, it sets
the swizzle field for the source register to an appropriate value of
swizzle_for_size(...) for the size of each element being copied, so
that later optimization passes won't be fooled into thinking that
unused vector elements are live.
This all works fine. However, emit_block_move also contains an
assertion to verify, before setting the swizzle field for the source
register, that the source register doesn't already contain a
nontrivial swizzle. The intention is to make sure that the caller of
emit_block_move hasn't already done some swizzling of the data before
the call, which emit_block_move would then counteract when it
overwrites the swizzle field. But the assertion is at the lowest
level of nesting of emit_block_move, which means that after the first
element is copied, instead of checking the swizzle field set by the
caller, it checks the swizzle field used when moving the previous
element. That means that if the structure contains elements of
different vector sizes (which therefore require different swizzles),
the assertion will erroneously fire.
This patch moves the assertion from emit_block_move to the calling
function, vec4_visitor::visit(ir_assignment *). Since the caller is
non-recursive, the assertion will only happen once, and won't be
fooled by emit_block_move's modification of the swizzle field.
This patch also reverts commit fe006a7 (i965/vs: Fix swizzle related
assertion), which attempted to fix the bug by making the assertion
more lenient, but only worked properly for structures, arrays, and
matrices in which each constituent vector is the same size.
This fixes the problem described in comment 9 of
https://bugs.freedesktop.org/show_bug.cgi?id=40865. Unfortunately, it
doesn't fix the whole bug, since the test in question is also failing
due to lack of register spilling support in the VS.
Fixes piglit test vs-assign-varied-struct. No piglit regressions on
Sandy Bridge.
This is a candidate for the 8.0 release branch.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=40865#c9
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This is similar to a commit that did the same for the FS.
Shaves several more instructions off of the VS in Lightsmark, but no
statistically significant performance difference (n=5).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Shaves a few instructions off of the VS in Lightsmark, but no
statistically significant performance difference on gen7 (n=5).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
AC_CHECK_LIB has this nasty behavior, like the cflags tests, of
automatically putting the tested value into the global LIBS on
success. This caused -lexpat to end up in LIBS, but without the
--with-expat dir, so my 32-bit build on a 64 system using expat from a
custom prefix could only find the system expat and fail to link on the
one current consumer of the LIBS variable: the dri driver test link.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
While reading through the simulator, I found some interesting code that
looks like it checks the sampler default color pointer against the bound
set in STATE_BASE_ADDRESS. On failure, it appears to program it to the
base address itself.
So I decided to try programming a legitimate bound, and lo and behold,
border color worked.
+92 piglits on Sandybridge. Also fixes Lightsmark on Ivybridge.
NOTE: This is a candidate for stable release branches.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=28924
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=38868
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Since we now always build shared glapi, this exposes the fact that libOSMesa was
underlinked when glapi was built shared.
Fix this by doing the same thing as drivers/X11/Makefile already does, ensuring
that the library is linked with the shared glapi library.
(I'm not clear why we link with both glapi.a and glapi.so, so this may be all wrong)
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Refine "always build shared dricore" so we don't build it if we don't need
it because we aren't actually building any dri drivers because of --disable-driglx-direct
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Looks insane, but it does appear we need a full slot per input/output.
This fixes another 180 or so piglit tests.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Adds all the easier lowhanging opcodes.
Fixes ~3000 piglit tests with GLSL1.30 enabled on cayman.
This just leaves the mul/div/mod ops to fix up.
Signed-off-by: Dave Airlie <airlied@redhat.com>
"If set, forces degamma on XYZ if format is
FMT_8_8_8_8, FMT_BC1, FMT_BC2, or FMT_BC3"
Don't claim support for sRGB on any other formts.
This fixes glean texture_srgb.
Signed-off-by: Dave Airlie <airlied@redhat.com>
It doesn't pass the piglit test, but it seems to be a lot closer
than it was before. I need to track down if there is another problem.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Due to the changes for multiple kcache banks support, now we are assigning
final SRCx_SEL values for kcache access at the later stage, when building the
bytecode. So we need to take into account kcache banks to distinguish
the constants with the same address but different bank index.
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Same fix as previously done by Dave Airlie for r600/r700
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Clip planes are uploaded as a constant buffer and used by the vertex
shader to produce corresponding clip distances for hw clipping.
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Add support for multiple kcache banks (constant buffers).
Lock the required lines only.
Allow up to 4 kcache line sets in the alu clause by using ALU_EXTENDED on eg+.
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Fix this GCC warning on non-debug builds.
glsl_types.cpp: In member function 'gl_texture_index
glsl_type::sampler_index() const':
glsl_types.cpp:157: warning: control reaches end of non-void function
NOTE: This is a candidate for the 8.0 branch.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Enable it in the evergreen_context_draw if needed.
Same as already done in the r600_context_draw for r6xx/r7xx.
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
BURST_COUNT is clipped with ARRAY_SIZE, so set it to the max value
to avoid clipping.
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
libglapi.so, libGL.so, libGLESv2.so, libGLESv1_CM.so must all
come from the same version of Mesa or bad things may happen.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Matt Turner <mattst88@gmail.com>
When the framebuffer has separate depth and stencil buffers, and HiZ is
not enabled on the depth buffer, mark the framebuffer as unsupported. This
happens when trying to create a framebuffer with Z16/S8 because we haven't
enabled HiZ on Z16 yet.
Fixes gles2conform test stencil8.
Note: This is a candiate for the 8.0 branch.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=44948
Reviewed-and-tested-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed--by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
This loosens the format validation in glBlitFramebuffer. When blitting
depth bits, don't require an exact match between the depth formats; only
require that the two formats have the same number of depth bits and the
same depth datatype (float vs uint). Ditto for stencil.
Between S8_Z24 buffers, the EXT_framebuffer_blit spec allows
glBlitFramebuffer to blit the depth and stencil bits separately. So I see
no reason to prevent blitting the depth bits between X8_Z24 and S8_Z24 or
the stencil bits between S8 and S8_Z24. However, we of course don't want
to allow blitting from Z32 to Z32_FLOAT.
Fixes Piglit fbo/fbo-blit-d24s8 on Intel drivers with separate stencil
enabled.
The problem was that, on Intel drivers with separate stencil, the default
framebuffer has separate depth and stencil buffers with formats X8_Z24 and
S8. The test attempts to blit the depth bits from a S8_Z24 buffer into the
default framebuffer.
v2: Check that depth datatypes match.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=44665
Note: This is a candidate for the 8.0 branch.
Reported-by: Xunx Fang <xunx.fang@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
The nvc0 gallium driver is advertising 128 MAX_INTERLEAVED_COMPS
which made it always assert in the linker when TFB was used since
the Outputs array was smaller than that maximum.
v2: added assertions
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
So it appears R600s (except rv670) do AR handling different using a different
opcode. This patch fixes up r600g to work properly on r600.
This fixes ~100 piglit tests here (in GLSL1.30 mode) on rv610.
v3: add index_mode as per the docs.
This still fails any dst relative tests for some reason I can't quite see yet,
but it passes a lot more tests than without.
v4: add a nop after dst.rel this could be improved using a second pass,
where we only insert nops if two instructions are sure to collide.
The docs say r600, rv610, rv630 needs this, and not rv670, rs780, rs880,
need AMD to confirm rv620, rv635.
v5: add is_nop_inst.
NOTE: This is a candidate for stable branches.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Use a bitmask approach to compute gl_array_object::_MaxElement.
To make this work correctly depending on the shader type actually used,
make use of the newly introduced typed bitmask getters.
With this change I gain about 5% draw time on some osgviewer examples.
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Brian Paul <brianp@vmware.com>
Depending on the installed shader type, different arrays are used
from gl_array_object. Provide helper functions that compute
the bitmask of these arrays that are finally enabled for a given
shader type. The will be used in a followup change.
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Brian Paul <brianp@vmware.com>
Commit ede60bc467 (glsl: Add isinf() and
isnan() builtins) uses "+INF" in the .ir file to represent infinity.
This worked on C99-compliant compilers, since the s-expression reader
uses strtod() to read numbers, and C99 requires strtod() to understand
"+INF". However, it didn't work on non-C99-compliant compilers such
as MSVC.
This patch modifies the s-expression reader to explicitly check for
"+INF" rather than relying on strtod() to support it.
This is a candidate for the 8.0 branch.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=44767
Tested-by: Morgan Armand <morgan.devel@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
To fix failed assertions when calling glCopyBufferSubData().
svga_texture() asserts that the resource is a texture. Simply move the
calls to svga_texture() after the code that handles non-texture copies
so that we don't call it with non-texture resources.
Fixes glean bufferObject failure.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Two assignments to num_immediates were missing in
get_pixel_transfer_visitor() and get_bitmap_visitor().
The uninitialized value led to valgrind errors and crashes in some
cases.
Added new assertions to catch future problems in this area. Also
changed num_immediates to unsigned to avoid signed/unsigned
comparison warnings.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
The default access flags for OpenGL ES (via GL_OES_map_buffer) and
desktop OpenGL are different. The code previously tried to handle
this, but the decision was made at compile time. Since the same
driver binary can be used for both OpenGL ES and desktop OpenGL, the
decision must be made at run-time.
This should fix bug #44433. It appears that the test case does
various map and unmap operations and inspects the state of the buffer
object around each. When it sees that GL_BUFFER_ACCESS does not match
its expectations, it fails.
NOTE: This is a candidate for release branches.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=44433
If we don't find an exact PIPE_FORMAT_x for a GL_(COMPRESSED)_RED/RG format,
try uncompressed formats. We were already doing this for the RGB(A) formats.
Fixes piglit arb_texture_compression-internal-format-query test.
NOTE: This is a candidate for the stable branches.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
msg_type moved by a bit, so the message type was being disassembled
incorrectly. In particular, render target writes were showing up as
"OWORD block write".
NOTE: This is a candidate for stable release branches.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Compared to sampler_gen5, simd_mode shifted by a bit and msg_type grew
by a bit. So we were printing slightly incorrect numbers.
NOTE: This is a candidate for stable release branches.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Both the VF and VS share space in the URB. First, the VF stores
attributes (shader inputs) there. The VS then reads the attributes,
executes, and reuses the space to store varyings (shader outputs).
Thus, we need to calculate the amount of URB space necessary for inputs,
outputs, and pick whichever is greater.
The old VS backend correctly did this (brw_vs_emit.c:408), but the new
VS backend only considered outputs.
Fixes vertex scrambling in GLBenchmark PRO on Ivybridge.
NOTE: This is a candidate for the 8.0 branch.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=41318
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
In the following scenario:
- CreateContext C1
- MakeCurrent C1
- DestroyContext C1 (does not actually destroy the first context, postponed
until the next MakeCurrent)
- CreateContext C2
- MakeCurrent C2
MakeCurrent will call flush on a half destroyed context, leading to crashes.
Since the other paths (destroy and makecurrent) already flush the context,
there is no need to flush here, so we remove this useless flush front call.
This fixes GPU crashes with Chrome and gallium drivers.
v2: Don't flag the format as being HiZ ready (there's DRI2 handshake
pain to go through).
Fixes piglit gl-3.0-required-sized-texture-formats
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
This is required for Z16 support for texturing, which is the first
thing to have a horizontal alignment of 8. Renderbuffers don't need
it, since they're always set up as the only mip level, but do it for
completeness anyway.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
This field is actually set up above.
NOTE: This is a candidate for the 8.0 branch, to avoid conflicts.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
I copy-and-pasted the thing I was allocating for as the context, so
the first time it would be NULL (root of a ralloc context) and they'd
chain off each other from then on.
NOTE: This is a candidate for the 8.0 branch.
The legal range for the device is apparently [-16.0, +15.0].
Limiting the range to [-15, +15] fixes piglit's lodbias test.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
The interaction between the mipmap lod min/max limits and the texture
base/max level limits is kind of tricky. Changing the base level
didn't work as expected before.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
This makes lod clamping more consistent with other drivers.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Update the dd.h docs to indicate that GL_MAP_INVALIDATE_RANGE_BIT
can be used with GL_MAP_WRITE_BIT when mapping renderbuffers and
texture images.
Pass the flag when mapping texture images for glTexImage, glTexSubImage,
etc. It's up to drivers whether to actually make use of the flag.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
To try to use less tex memory and maybe get better performance.
Spotted by Roland Scheidegger.
NOTE: This is a candidate for the 8.0 and 7.11 branches.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
The i965 driver advertises GL_ARB_texture_float and GL_ARB_texture_rg
support but the ctx->TextureFormatSupported[] table entries for
MESA_FORMAT_R_FLOAT32 and MESA_FORMAT_RGBA_FLOAT32 are false on gen 4
hardware. So the case for GL_R32F would fail and we'd print an
implementation error.
This patch adds more Mesa tex format options for GL_R32F and other R/G
formats so we fall back to 16-bit formats when 32-bit formats aren't
available.
Eric made the same fix in commit 6216a5b4 for the non R/G formats.
v2: try 16-bit formats before 32-bit formats and try RG formats before
RGBA where possible.
This should fix https://bugs.freedesktop.org/show_bug.cgi?id=44039
NOTE: This is a candidate for the 8.0 and 7.11 branches.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
This enables linear gradients if we need a linear,
it also sets the flat shade flag for color/constant interpolations.
Signed-off-by: Dave Airlie <airlied@redhat.com>
When I originally implemented the hack to use GRFs 111+ as fake MRFs, I
did so purely to avoid rewriting all the code that dealt with MRFs.
However, it turns out that a similar hack is actually required.
Newly discovered language in the BSpec indicates that SEND instructions
with EOT set "should" use g112-g127 as their source registers. Based on
assertions in the simulator, this is actually a requirement on certain
platforms.
Since we're faking MRFs already, we may as well use the officially
sanctioned range. My guess is that we avoided this issue because we
seldom use m0: URB writes in the new VS backend start at m1, and RT
writes in the new FS backend start at m2.
NOTE: This is a candidate for stable release branches.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Now that we no longer generate Mesa IR from GLSL IR, it's impossible to
use the old vertex shader backend for GLSL programs. There's simply no
Mesa IR to codegen from.
Any attempt to do so would result in immediate GPU hangs, presumably due
to the driver uploading an empty program with no EOT message.
NOTE: This is a candidate for the 8.0 branch.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
According to Table 6.8 (Page 348) in the OpenGL 3.0 specification,
glGetVertexAttribiv supports GL_VERTEX_ATTRIB_ARRAY_INTEGER.
NOTE: This is a candidate for the 8.0 branch.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Fixes the following OGLConform tests on gen5:
depth-stencil(misc.state_on.depth_int)
fbo_db_ARBfp(basic.OnlyDepthBuffDrawBufferRender)
The problem was that, if the depth buffer's Mesa format was X8_Z24, then
we emitted the hardware format D24_UNORM_X8. But, on gen5, D24_UNORM_S8
must be emitted.
This bug was introduced by:
commit d84a180417
Author: Eric Anholt <eric@anholt.net>
i965: Base HW depth format setup based on MESA_FORMAT, not bpp.
v2: Deref 'intel' directly. Move the branch for newer chipset to top.
Quote the PRM. As requested by Ken.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=43408
Note: This is a candidate for the 8.0 branch.
Reported-by: Xunx Fang <xunx.fang@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
The original R600 requires the UNCACHED_FIRST_INST bit
to be set in the PS.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Note: this is candidate for the stable branches.
With the conversion to automake in commit
e326480e4e, several additional build
artifacts are created:
src/mesa/drivers/dri/i965/.deps/
src/mesa/drivers/dri/i965/.libs/
src/mesa/drivers/dri/i965/Makefile
src/mesa/drivers/dri/i965/Makefile.in
src/mesa/drivers/dri/i965/i965_dri.la
src/mesa/drivers/dri/i965/i965_symbols_test
This patch adds all of these files to .gitignore.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
TestMipMaps() function in src/OGLconform/textureNPOT.c calls glTexImage2D()
with width = 0. Texture with zero size skips miptree allocation due to a
condition in function _mesa_store_teximage3d(). While calling glGetTexImage()
it results in assertion failure in intel_map_texture_image() due to null mt
pointer.
This patch fixes the issue by detecting the zero size texture early in
glGetTexImage and glGetCompressedTexImage functions. In such a case function
simply returns doing nothing.
Verified that below mentioned bug is fixed by this patch.
https://bugs.freedesktop.org/show_bug.cgi?id=42334
NOTE: This is a candidate for stable branches
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
This does introduce a warning by the automake build system, that the
missing-symbols test build is non-portable. That's true -- Mac OS X
can't take something built as a loadable module and just link it as a
library. Of course, we aren't building this on OS X at all, so it
would be nice to be able to suppress it, but I haven't found a way.
Still, the build is going to be much quieter than we have ever had
before, so I think this is a fair tradeoff until we find a way to shut
that warning up.
v2: Put a link in /lib to avoid transition pains for people.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> (v1)
Reviewed-by: Matt Turner <mattst88@gmail.com> (v1)
Nothing works if HiZ is enabled and the DDX is incapable of HiZ (that is,
the DDX version is < 2.16).
The problem is that the refactoring that eliminated
intel_renderbuffer::stencil_rb broke the recovery path in
intel_verify_dri2_has_hiz(). Specifically, it broke line
intel_context.c:1445, which allocates the region for
DRI_BUFFER_DEPTH_STENCIL. That allocation was creating a separate stencil
miptree, despite the buffer being a packed depthstencil buffer. Havoc
ensued.
This patch introduces a bool flag that prevents allocation of that stencil
miptree.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=44103
Tested-by: Ian Romanick <idr@freedesktop.org>
Note: This is a candidate for the 8.0 branch.
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Fix this GCC warning with non-LLVM builds.
sp_screen.c: In function ‘softpipe_get_shader_param’:
sp_screen.c:141:28: warning: unused variable ‘sp_screen’ [-Wunused-variable]
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Calling glXSwapBuffers with no bound context causes segmentation
fault in function intelDRI2Flush. All the gl calls should be
ignored after setting the current context to null. So the contents
of framebuffer stay unchanged. But the driver should not seg fault.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=44614
Reported-by: Yi Sun <yi.sun@intel.com>
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Yi Sun <yi.sun@intel.com>
Fix this GCC warning.
lp_test_round.c: In function ‘test_round’:
lp_test_round.c:126:13: warning: variable ‘packed’ set but not used
[-Wunused-but-set-variable]
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Fix this GCC 4.6 warning with 64-bit builds.
u_debug_stack.c: In function ‘debug_backtrace_capture’:
u_debug_stack.c:45:17: warning: variable ‘frame_pointer’ set but not
used [-Wunused-but-set-variable]
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
The i915 GPU can't do A8 dst, so we abuse GREEN8 buffers for that
purpose. However, things get hairy as we start to do blending,
because then GL_DST_*_ALPHA should be replaced with GL_DST_*_COLOR.
This is what we do here.
Fixes piglt fbo-alpha.
v2: select the colors in the pixel shader
v3: fix rs state creation for pre-evergreen
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Create the video buffers in the format the driver preffers.
This temporary creates problems with decoder less VDPAU video playback.
Signed-off-by: Christian König <deathsimple@vodafone.de>
Add a second extened constructor that takes plane
textures for the video buffer. Also provide a
function for texture templates.
Signed-off-by: Christian König <deathsimple@vodafone.de>
This requires GLSL 1.30 enabled, which requires integer types enabled,
so don't bother doing an INT to FLT conversion on it.
We should probably remove the instance id flt->int conversion when
turning on native integers.
this passes the three piglit tests with GLSL 1.30 forced on.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This commit rewrites a lot of the state_fb code to support
rendering to targets not aligned to 64 byte.
This allows us to drop the render temporaries as unaligned
targets are the only use-case where they are really needed. The
temporaries code was used for a lot of things more, but apparently
those also work without temps.
There is one regression in piglit fbo-clear-formats, but this will
be fixed with the use of real hardware clears and doesn't matter in
practice as no real application tries to scissor clear a 2x2 pixel
render target.
Signed-off-by: Lucas Stach <dev@lynxeye.de>
There are 3 changes:
1) stride is specified for each buffer, not just one, so that drivers don't
have to derive it from the outputs
2) new per-output property dst_offset, which specifies the offset
into the buffer in dwords where the output should be stored,
so that drivers don't have to compute the offsets manually;
this will also be useful for gl_SkipComponents
from ARB_transform_feedback3
3) register_mask is removed, instead, there is start_component
and num_components; register_mask with non-consecutive 1s
doesn't make much sense (some hardware cannot do packing of components)
Christoph Bumiller: fixed nvc0.
v2: resolve merge conflicts in Draw and clean it up
Virtual address space put the userspace in charge of their GPU
address space. It's up to userspace to bind bo into the virtual
address space. Command stream can them be executed using the
IB_VM chunck.
This patch add support for this configuration. It doesn't remove
the 64K ib size limit thought this limit can be extanded up to
1M for IB_VM chunk.
v2: fix rendering
v3: fix rendering when using index buffer
v4: make vm conditional on kernel support add basic va management
v5: catch the case when we already have va for a bo
v6: agd5f: update on top of ioctl changes
v7: agd5f: further ioctl updates
v8: indentation cleanup + fix non cayman
v9: rebase against lastest mesa + improvement from Marek & Michel
v10: fix cut/paste bug
v11: don't rely on updated radeon_drm.h
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Make the comments precise. Explain why each branch is needed and correct.
Document the potential pitfall in the true-branch.
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
When using Mesa with a GLES API, calling _mesa_FramebufferRenderbuffer
with GL_DRAW_FRAMEBUFFER will report a 'user error' because
get_framebuffer_target validates that this enum from the framebuffer
blit extension is only used on GL. To work around it this patch makes
it use the GL_FRAMEBUFFER enum instead in that case.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=43418
Note: This is a candidate for the 8.0 branch.
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
The gl_renderbuffer::Format field wasn't always set properly. This
didn't matter much in the past but with the recent swrast/renderbuffer
mapping changes, core Mesa will be directly touching OSMesa colorbuffers
so using the right MESA_FORMAT_x value is important.
Unfortunately, there aren't MESA_FORMATs for all the possible OSmesa
format/type combinations, such as GL_FLOAT / OSMESA_ARGB. If anyone
runs into these we can add new Mesa formats.
v2: add warnings for unsupported formats, fix ARGB_REV mix-up.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
We always access pull constant buffers using the message types "OWord
Block Read" or "OWord Dual Block Read". According to the Sandy Bridge
PRM, Vol 4 Part 1, pages 214 and 218, when using these messages:
"the surface pitch is ignored, the surface is treated as a
1-dimensional surface. An element size (pitch) of 16 bytes is
used to determine the size of the buffer for out-of-bounds
checking if using the surface state model."
Previously we were setting the pitch for pull constant buffers to the
size of the whole constant buffer--this made no sense and would have
led to incorrect behavior if it were not for the fact that the pitch
is ignored.
For clarity, this patch sets the pitch for pull constant buffers to 16
bytes, consistent with the hardware's behavior.
v2: Clarify the meaning of the ignored values by writing them as (16 - 1).
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Commit 9bdc44a528 (i965: Replace struct
with bit shifting for WM pull constant surfaces) accidentally
introduced off-by-one errors into the calculation of the surface
width, height, and depth. This patch restores the correct
computation.
The reason this wasn't noticed by Piglit tests is that the size of our
constant surfaces is always less than 2^20, therefore the off-by-one
error was causing the "depth" field of the surface to be set to all
1's. The hardware interpreted this as an extremely large surface, so
overflow checking was effectively disabled.
No Piglit regressions on Sandy Bridge.
NOTE: This is a candidate for the 7.11 and 8.0 branches.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Flat SHADE_MODEL still overrides any non-flat interpolation
qualifier, but pulling that state out of the rasterizer cso
isn't really worth the effort, is it ?
NOTE: This is a candidate for the 8.0 branch.
This fixes accum buffer operations. The accumulation buffer is the
only malloc-based renderbuffer for the intel drivers.
v2: apply x/y offset to returned pointer
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: Eric Anholt <eric@anholt.net>
Fixes piglit EXT_framebuffer_multisample/negative-copypixels.
Reviewed-by: Brian Paul <brianp@vmware.com>
NOTE: This is a candidate for the 8.0 branch.
Fixes piglit EXT_framebuffer_multisample/negative-copyteximage.
Reviewed-by: Brian Paul <brianp@vmware.com>
NOTE: This is a candidate for the 8.0 branch.
Fixes piglit EXT_framebuffer_multisample-negative-readpixels.
Reviewed-by: Brian Paul <brianp@vmware.com>
NOTE: This is a candidate for the 8.0 branch.
Fixes piglit EXT_framebuffer_multisample/renderbuffer-samples.
Reviewed-by: Brian Paul <brianp@vmware.com>
NOTE: This is a candidate for the 8.0 branch.
Previously, we were saying that everything from the starting tile to
region width+height was part of the limits of our depthbuffer, even if
the tile was near the bottom of the depthbuffer. This mean that our
range was not clipping to buffer buonds if the start tile was anything
but the start of the buffer.
In bebc91f0f3, this was changed to
saying that we're just rendering to a region of the size of the
renderbuffer. This is great -- we get a range that should actually
match what we want. However, the hardware's range checking occurs
after the X/Y offset addition, so we were clipping out rendering to
small depth mip levels when an X/Y offset was present. Just add
tile_x/y to the width in that case -- the WM won't produce negative
x/y values pre-offset, so we just need to get the left/bottom sides of
the region to cover our buffer.
Fixes the following Piglit regressions on gen7:
spec/ARB_depth_buffer_float/fbo-clear-formats
spec/ARB_depth_texture/fbo-clear-formats
spec/EXT_packed_depth_stencil/fbo-clear-formats
NOTE: This is a candidate for the 8.0 branch.
The array holds GLuint values so remove the float cast.
Note, however, that to compute the average of four GLuints we really
want to do (a+b+c+d)/4 but that could overflow. This change doesn't
address that for now.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
In the first case, the newImage[] array contains GLuint values.
In the second case, the parameter type is GLuint, but the maxDepth
value is never used in this case (GL_FLOAT_32_UNSIGNED_INT_24_8_REV).
Pass ~OU just to be safe.
NOTE: This is a candidate for the 8.0 branch.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
We include both imports.h and u_math.h in the state tracker. This
leads to multiple, conflicting definitions of ffs() with MSVC.
Use FFS_DEFINED to skip the ffs() in u_math.h.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
include mesa headers before gallium headers to avoid problem with
ffs() being defined in u_math.h and then again in imports.h
The next commit will add some #ifdefs to prevent multiple definitions
of ffs().
Call ffs() and ffsll() everywhere. Define our own ffs(), ffsll()
functions when the platform doesn't have them.
v2: remove #ifdef _WIN32, __IBMC__, __IBMCPP_ tests inside ffs()
implementation. The #else clause was recursive.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Alexander von Gluck <kallisti5@unixzen.com>
Introduce vbo_get_minmax_indices() function to handle the min/max index
computation for nr_prims(>= 1). The old code just compute the first
prim's min/max index; this would results an error rendering if user
called functions like glMultiDrawElements(). This patch servers as
fixing this issue.
As when nr_prims = 1, we can pass 1 to paramter nr_prims, thus I made
vbo_get_minmax_index() static.
v2: per Roland's suggestion, put the indices address compuation into
vbo_get_minmax_index() instead.
Also do comination if possible to reduce map/unmap count
v3: per Brian's suggestion, use a pointer for start_prim to avoid
structure copy per loop.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Instead, do the uniform setting and input / output mapping directly in
brw_link_shader. Hurray for not generating Mesa IR! However, once
the i965 driver stops calling _mesa_ir_link_shader, UsesClipDistance
and UsesKill are no longer set.
Ideally gen6_upload_vs_push_constants should use the
gl_shader_program, but I don't see a way to propagate the information
there. The other alternative, since this is the only usage, is to
move gl_vertex_program::UsesClipDistance to brw_vertex_program.
The compile (and precompile) stages use UsesKill to determine the
cache key for the shader. This is then used to determine whether or
not to compile the shader. Calculating this data during compilation
is too late.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Eric Anholt <eric@anholt.net>
This previously enabled some optimizations in the fragment shader
(interpolation, etc.) if some input components were always 0.0 or
1.0. However, this data was generated by analyzing Mesa IR. The
next patch in this series removes generation of Mesa IR for GLSL
paths. When we detect that case, just set the used mask to ~0 and
circumvent the optimizations.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
It used to be done in ir_to_mesa, and that was kind of a bad place.
I didn't change st_glsl_to_tgsi because there is some strange stuff
happening in the code that generates glDrawPixels shaders. It looked
like this would break horribly if I touched anything.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Track the calculated data in gl_shader_program instead of the
individual assembly shaders.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Rather than looking at the settings in individual assembly programs,
look at the settings in the top-level uniform values. The old code
was flawed because examining each shader stage in isolation could
allow inconsitent usage across stages (e.g., bind unit 0 to a
sampler2D in the vertex shader and sampler1DShadow in the fragment
shader).
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Previously the fixed-function fragment shader was tracked as a
gl_program. This means that it shows up in the driver as a Mesa IR
program instead of as a GLSL IR program. If a driver doesn't generate
Mesa IR from the GLSL IR, that program is empty. If the program is
empty there is either no rendering or a GPU hang.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Poking directly at the backing resources works only by luck. Core
Mesa code should only know about the gl_uniform_storage structure.
Soon other code that looks at samplers will use the gl_uniform_storage
structures instead of the data in the gl_program.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
It looks like AC_PROG_SED was added in 2.59b, and wasn't in the
original 2.59 in the original 2.59. Presumably that's why, though
it could've been an oversight.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Matt Turner <mattst88@gmail.com>
The gen7_urb atom depends on CACHE_NEW_VS_PROG and CACHE_NEW_GS_PROG,
causing gen7_upload_urb() to be called when switching to a new VS
program.
In addition to partitioning the URB space between the VS and GS,
gen7_upload_urb() also allocated space for VS and PS push constants.
Unfortunately, this meant that whenever CACHE_NEW_VS was flagged, we'd
reallocate the space for the PS push constants. According to the BSpec,
after sending 3DSTATE_PUSH_CONSTANT_ALLOC_PS, we must reprogram
3DSTATE_CONSTANT_PS prior to the next 3DPRIMITIVE.
Since our URB allocation for push constants is entirely static, it makes
sense to split it out into its own atom that only subscribes to
BRW_NEW_CONTEXT. This avoids reallocating the space and trashing
constants.
Fixes a rendering artifact in Extreme Tuxracer, where instead of a snow
trail, you'd get a bright red streak (affectionately known as the
"bloody penguin bug").
This also explains why adding VS-related dirty bits to gen7_ps_state
made the problem disappear: it made 3DSTATE_CONSTANT_PS be emitted after
every 3DSTATE_PUSH_CONSTANT_ALLOC_PS packet.
NOTE: This is a candidate for the 7.11 branch.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=38868
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Both dri2_create_context_attribs and drisw_create_context_attribs call
dri2_convert_glx_attribs, expecting it to fill in *api on success.
However, when num_attribs == 0, it was returning true without setting
*api, causing the caller to use an uninitialized value.
Tested-by: Vadim Girlin <vadimgirlin@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
svga_sampler_view contains a pointer to a pipe_resource (base class of
svga_texture) and svga_texture contains a pointer to an svga_sampler_view.
This circular dependency prevented the objects from ever being freed when
they pointed to each other. Make the svga_sampler_view::texture pointer
a "weak reference" (no reference counting) to break the dependency.
This is safe to do because the pipe_resource/texture always has a longer
lifespan than the sampler view so when svga_sampler_view stops referencing
the texture, the texture's refcount never hits zero.
Fixes a memory leak seen with google earth and other apps.
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com>
We were naively emitting each component at a time, even if we were
emitting the same value to multiple channels. Improves on a codegen
regression from the old VS to the new VS on some unigine shaders
(because we emit constant vecs/matrices as immediates instead of
loading them as push constants, so we had over 4x the instructions for
using them).
shader-db results:
Total instructions: 58594 -> 58540
11/870 programs affected (1.3%)
765 -> 711 instructions in affected programs (7.1% reduction)
WGL_ARB_extensions_string states that wglGetExtensionsStringARB should
return NULL for invalid HDCs. And some applications rely on it.
Reviewed-By: "Keith Whitwell" <keithw@vmware.com>
It caused an X protocol error in some (rare) situations.
This is a follow-on to the previous commits which fixes a bug reported
by Wayne E. Robertz.
NOTE: This is a candidate for the 7.11 branch.
Reviewed-by: Adam Jackson <ajax@redhat.com>
This is the same fix as the previous commit, except it's for the gallium
glx/xlib state tracker.
NOTE: This is a candidate for the 7.11 branch.
Reviewed-by: Adam Jackson <ajax@redhat.com>
as we do in Fake_glXChooseVisual(). This registers the MesaGLX
extension on the display so we can clean up buffers, etc. when
the display connection is closed.
Fixes a bug reported by Wayne E. Robertz.
NOTE: This is a candidate for the 7.11 branch.
Reviewed-by: Adam Jackson <ajax@redhat.com>
Prevent any state from carrying over to a new translation in cases
where we assume that data is still zero from initial calloc (these
would require us to do individual zeroing before translation which
would be more code).
When creating an EGLImage from a struct wl_buffer * this ensures
that we create an XRGB8888 image if the wayland buffer doesn't have an
alpha channel. To determine if a wl_buffer has a valid alpha channel
this patch adds an internal wayland_drm_buffer_has_alpha() function.
It's important to get the internal format for an EGLImage right so that
if a GL texture is later created from the image then the GL driver will
know if it should sample the alpha from the texture or flatten it to
a constant of 1.0.
This avoids needing fragment program workarounds in wayland compositors
to manually ignore the alpha component of textures created from wayland
buffers.
krh: Edited to use wl_buffer_get_format() instead of wl_buffer_has_alpha().
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
No functional change. In the function
__indirect_glAreTexturesResident(), the variable cmdlen is only used
if USE_XCB is not defined. This patch avoids a compile warning in the
event that USE_XCB is defined.
v2: just move cmdlen declaration inside the #else part.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Previous to this patch, we didn't do the limit check for
MAX_TRANSFORM_FEEDBACK_INTERLEAVED_COMPONENTS until the end of the
store_tfeedback_info() function, *after* storing all of the transform
feedback info in the gl_transform_feedback_info::Outputs array. This
meant that the limit check wouldn't prevent us from overflowing the
array and corrupting memory.
This patch moves the limit check to the top of tfeedback_decl::store()
so that there is no risk of overflowing the array. It also adds
assertions to verify that the checks for
MAX_TRANSFORM_FEEDBACK_INTERLEAVED_COMPONENTS and
MAX_TRANSFORM_FEEDBACK_SEPARATE_COMPONENTS are sufficient to avoid
array overflow.
Note: strictly speaking this patch isn't necessary, since the maximum
possible number of varyings is MAX_VARYING (16), whereas the size of
the Outputs array is MAX_PROGRAM_OUTPUTS (64), so it's impossible to
have enough varyings to overflow the array. However it seems prudent
to do the limit check before the array access in case these limits
change in the future.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
On drivers that set gl_shader_compiler_options::LowerClipDistance (for
example i965), we need to handle transform feedback of gl_ClipDistance
specially, to account for the fact that the hardware represents it as
an array of vec4's rather than an array of floats.
The previous way this was accounted for (translating the request for
gl_ClipDistance[n] to a request for a component of
gl_ClipDistanceMESA[n/4]) doesn't work when performing transform
feedback on the whole unsubscripted array, because we need to keep
track of the size of the gl_ClipDistance array prior to the lowering
pass. So I replaced it with a boolean is_clip_distance_mesa, which
switches on the special logic that is needed to handle the lowered
version of gl_ClipDistance.
Fixes Piglit tests "EXT_transform_feedback/builtin-varyings
gl_ClipDistance[{1,2,3,5,6,7}]-no-subscript".
Reviewed-by: Eric Anholt <eric@anholt.net>
The function tfeedback_decl::num_components() was not correctly
accounting for transform feedback of whole arrays and gl_ClipDistance.
The bug was hard to notice in tests, because it only affected the
checks for MAX_TRANSFORM_FEEDBACK_SEPARATE_COMPONENTS and
MAX_TRANSFORM_FEEDBACK_INTERLEAVED_COMPONENTS.
This patch fixes the computation, and adds an assertion to verify
num_components() even when MAX_TRANSFORM_FEEDBACK_SEPARATE_COMPONENTS
and MAX_TRANSFORM_FEEDBACK_INTERLEAVED_COMPONENTS are not exceeded.
The assertion requires keeping track of components_so_far in
tfeedback_decl::store(); this will be useful in a future patch to fix
non-multiple-of-4-sized gl_ClipDistance.
Reviewed-by: Eric Anholt <eric@anholt.net>
This just fixes up the enables for native integers and EXT_texture_integer
support in st/mesa.
It also set the MaxClipPlanes to 8.
We should consider exposing caps for MCP vs MCD, but since core
mesa doesn't care yet maybe we can wait for now.
v2: use 32-bit formats as per Marek's mail.
v3: add calim's fix for INT_DIV_TO_MUL_RCP disabling.
Signed-off-by: Dave Airlie <airlied@redhat.com>
It doesn't look like the GLSL compiler will produce sign op
for an unsigned anyways (seems insane anyways).
Signed-off-by: Dave Airlie <airlied@redhat.com>
This adds integer version of SSG that GLSL 1.30 can produce.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
This enables fragment clamping in softpipe, it passes more
tests than it did previously with no regressions, There are still
a couple of failures in the SNORM types to investigate.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This fixes a number of texelFetch swizzle tests, and consoldiates
the swizzle handling in a new function.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Add support for using the clipdistance instead of clip plane.
Passes all piglit clipdistance tests.
v2: fixup some comments from Brian in review.
Signed-off-by: Dave Airlie <airlied@redhat.com>
softpipe always clipped using the position vector, however for unclipped
vertices it stored the position in window coordinates, however when position
and clipping are separated, we need to store the clip-space position and
the clip-space vertex clip, so we can interpolate both separately.
This means we have to take the clip space position and store it to use later.
This allows softpipe to pass all the clip-vertex piglit tests.
v2: fix llvm draw regression, the structure being passed into llvm needed
updating, remove some hardcoded ints that should have been enums while there.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This required changing the system value semantics, so we stored
a system value per vertex, instance id is the only other system
value we currently support, so I span it across the channels.
This passes the 3 vertexid-* piglit tests + lots of instanceid tests.
Signed-off-by: Dave Airlie <airlied@redhat.com>
If draw isn't using llvm we can support vertex texture and integers,
These will be fixed up later, but for now allow this check to happen
at run-time.
v2: since 3e22c7a253 we can ask draw for a non-llvm
context. Just track if ask and set the vars accordingly. This probably isn't perfect but should cover the cases we care about.
v3: use debug option, restructure to store in screen, as suggested by Jakob.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Mesa shouldn't call into the drivers if there are no renderbuffers
bound to the attachments for the buffers to be cleared.
Fixes a number of the clearbuffer-* tests on softpipe.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This fixes the test to allow cube/depth combinations on GL3
or EXT_gpu_shader4.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
We're not quite ready to actually support it in the implementation,
but at least this allows GL 3.0 API-reliant applications to hopefully
run successfully, though they won't get multisampling.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The EXT_texture_array required only 64, but GL 3.0 required 256.
Since we're already exposing values that can get us way beyond our
ability to map the single object directly, go ahead and expose all the
way to hardware limits.
Tested with new piglit EXT_texture_array/maxlayers on gen7.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We were doing the kill of the updated channels, then adding our copy
to the list of available stuff to copy. But if the copy was updating
its own source channels, we didn't notice, breaking this code:
R0.xyzw = arg0 + arg1;
R0.xyzw = R0.wwwx;
gl_FragColor.xyzw = clamp(R0.xyzw, 0.0, 1.0);
Fixes piglit glsl-copy-propagation-self-2.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This patch modifies all batches needed for HiZ. The batch length for
3DSTATE_HIER_DEPTH_BUFFER is also corrected from 4 to 3.
Performance +6.7% on Citybench.
num-frames: 400
resolution: 1918x1031
avg-hiz-off: 127.90 fps
avg-hiz-on: 136.50 fps
kernel: git://people.freedesktop.org/~anholt/linux.git branch=gen7-reset-sol sha=23360e4
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
It is unwise to use a stencil region's size to determine its
renderbuffer's size, because at region creation we fudge the width and
height to accomodate interleaved rows. (See the comment for MESA_FORMAT_S8
in intel_miptree_create()). Most users of stencil_region->{width,height}
should be converted to use stencil_rb->{Width,Height}.
We have already done the replacement in several locations. This patch
continues the replacement in {brw,gen7}_emit_depthbuffer(). To make those
functions look consistent, I've also done the equivalent replacement for
the depth buffer.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
It was named GEN6_WM_DEPTH_RESOLVE. Luckily, this caused no conflict,
because the value is identical for gen6 and gen7.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Don't create clip outputs if no clip planes are enabled.
Move clip validation after program validation: we were calling
linkage validation in case the VP needed rebuilding before the
FP was validated.
The vertex program needs to be built first because when
ClipDistance is used we'll want to only enable those outputs that
are also written.
These initialization functions weren't initializing all the fields so
some had undefined values. The callers of these functions sometimes use
a structure assignment to initialize new objects from these templates
so we'd just propagate the undefined values. That made for some confusing
info when debugging, plus it could lead to bugs.
v2: fix surf pointer mix-up: "&surf" -> "surf"
Jakob Bornecrantz <jakob@vmware.com>
This code isn't used anymore in preference for DRI2 client side swap buffers
throttling or throttling done inside the xa or xorg driver.
Signed-off-by: Jakob Bornecrantz <jakob@vmware.com>
Reviewed-by Brian Paul <brianp@vmware.com>
This peice of code has been here since the inital commit (c5c5cd71) and the
code that used instance_id_index was removed in (caede752) by José.
Signed-off-by: Jakob Bornecrantz <jakob@vmware.com>
Reviewed-by Brian Paul <brianp@vmware.com>
So the targets can drop the sw_wrapper winsys when no sw driver is being used.
Signed-off-by: Jakob Bornecrantz <jakob@vmware.com>
Reviewed-by Brian Paul <brianp@vmware.com>
This replaces the current code with an implementation compatible with
the new gallium interface. I've left some of the remains of the interface
intact so llvmpipe keeps building correctly, and I'll take a look at fixing
llvmpipe up later.
v2: fixup as per Brian's review
Signed-off-by: Dave Airlie <airlied@redhat.com>
This introduces an unspecified interpolation paramter that is only allowed for
color semantics, so a specified GLSL interpolation will override the ShadeModel
specified interpolation, but not vice-versa.
This fixes a lot of the interpolation tests in piglit.
v2: rename from unspecified to color
Signed-off-by: Dave Airlie <airlied@redhat.com>
This could lead to incorrect code when fixed regs are involved.
Surprisingly, the increased freedom actually leads to lower
register usage in some cases. Still want to find a better way
to treat constraints though ...
Always set position to insert before the current instruction,
the previous behaviour led to confusion (bug in checkPredicate
for BBs with only a single conditional branch).
Conflicts:
src/gallium/auxiliary/tgsi/tgsi_strings.c
src/mesa/state_tracker/st_atom_clip.c
commit d919791f2742e913173d6b335128e7d4c63c0840
Author: Christoph Bumiller <e0425955@student.tuwien.ac.at>
Date: Fri Jan 6 17:59:22 2012 +0100
d3d1x: adapt to new clip state
commit cfec82bca3fefcdefafca3f4555285ec1d1ae421
Author: Christoph Bumiller <e0425955@student.tuwien.ac.at>
Date: Fri Jan 6 14:16:51 2012 +0100
gallium/docs: update for clip state changes
commit c02bfeb81ad9f62041a2285ea6373bbbd602912a
Author: Christoph Bumiller <e0425955@student.tuwien.ac.at>
Date: Fri Jan 6 14:21:43 2012 +0100
tgsi: add TGSI_PROPERTY_PROHIBIT_UCPS
commit d4e0a785a6a23ad2f6819fd72e236acb9750028d
Author: Brian Paul <brianp@vmware.com>
Date: Thu Jan 5 08:30:00 2012 -0700
tgsi: consolidate TGSI string arrays in new tgsi_strings.h
There was some duplication between the tgsi_dump.c and tgsi_text.c
files. Also use some static assertions to help catch errors when
adding new TGSI values.
v2: put strings in tgsi_strings.c file instead of the .h file.
Reviewed-by: Dave Airlie <airlied@redhat.com>
commit c28584ce0d8c62bd92c8f140729d344f88a0b3cd
Author: Christoph Bumiller <e0425955@student.tuwien.ac.at>
Date: Fri Jan 6 12:48:09 2012 +0100
gallium: extend user_clip_plane_enable to apply to clip distances
commit f1d5016c07f786229ed057effbe55fbfd160b019
Author: Marek Olšák <maraeo@gmail.com>
Date: Fri Jan 6 02:39:09 2012 +0100
nvfx: adapt to new clip state
commit 6f6fa1c26bd19f797c1996731708e3569c9bfe24
Author: Marek Olšák <maraeo@gmail.com>
Date: Fri Jan 6 01:41:39 2012 +0100
st/mesa: fix DrawPixels with GL_DEPTH_CLAMP
commit c86ad730aa1c017788ae88a55f54071bf222be12
Author: Christoph Bumiller <e0425955@student.tuwien.ac.at>
Date: Tue Jan 3 23:51:30 2012 +0100
nv50: adapt to new clip state
commit 3a8ae6ac243bae5970729dc4057fe02d992543dc
Author: Christoph Bumiller <e0425955@student.tuwien.ac.at>
Date: Tue Jan 3 23:32:36 2012 +0100
nvc0: adapt to new clip state
commit 6243a8246997f8d2fcc69ab741a2c2dea080ff11
Author: Marek Olšák <maraeo@gmail.com>
Date: Thu Dec 29 01:32:51 2011 +0100
draw: initalize pt.user.planes in draw_init
This fixes a crash in glean/fpexceptions.
commit e3056524b19b56d473f4faff84ffa0eb41497408
Author: Marek Olšák <maraeo@gmail.com>
Date: Mon Dec 26 06:26:55 2011 +0100
svga: adapt to new clip state
commit c5bfa8b37d6d489271df457229081d6bbb51b4b7
Author: Marek Olšák <maraeo@gmail.com>
Date: Sun Dec 25 14:11:51 2011 +0100
r600g: adapt to new clip state
commit f11890905362f62627c4a28a8255b76eb7de7df2
Author: Marek Olšák <maraeo@gmail.com>
Date: Sun Dec 25 14:10:26 2011 +0100
r300g: adapt to new clip state
commit e37465327c79a01112f15f6278d9accc5bf3103f
Author: Marek Olšák <maraeo@gmail.com>
Date: Sun Dec 25 12:39:16 2011 +0100
draw: adapt to new clip state
This adds a regression in the LLVM clipping path. Can anybody see anything
wrong with the code? It works for every other case, just glean/fpexceptions
crashes when doing the "Infinite clip plane test".
commit b474d2b18c72d965eefae4e427c269cba5ce6ba2
Author: Marek Olšák <maraeo@gmail.com>
Date: Sun Dec 25 13:14:59 2011 +0100
u_blitter: don't save/set/restore clip state
commit 9dd240ea91f523a677af45e8d0adb9e661e28602
Author: Marek Olšák <maraeo@gmail.com>
Date: Sun Dec 25 13:11:56 2011 +0100
gallium: don't cso_save/set/restore clip state
The enable bits are in the rasterizer state.
commit a4f7031179f5f4ad524b34b394214b984ac950f6
Author: Marek Olšák <maraeo@gmail.com>
Date: Sun Dec 25 12:58:55 2011 +0100
gallium: default depth_clip to 1
depth_clip = !depth_clamp
commit fe21147a00ab90e549d63fe12ee4625c9c2ffcc3
Author: Marek Olšák <maraeo@gmail.com>
Date: Mon Dec 26 06:14:19 2011 +0100
trace,util: update state logging to new clip state
Also dump the other missing flags.
commit 2a3b96e84ac872dcc5bc1de049fe76bb58d64b23
Author: Marek Olšák <maraeo@gmail.com>
Date: Sun Dec 25 10:43:43 2011 +0100
st/mesa: adapt to new clip state
commit b7b656a42fca19d7c85267f42649a206a85a2c72
Author: Marek Olšák <maraeo@gmail.com>
Date: Sat Dec 17 15:45:19 2011 +0100
gallium: move state enable bits from clip_state to rasterizer_state
This brings the code in sync with gen6_sf_state.c; presumably the
mistake was a botched rebase on initial Ivybridge bring-up patches.
Found by diffing batch buffer dumps and noticing the random values.
Thanks to Eric for catching the obvious mistake.
NOTE: This is a candidate for the 7.11 branch.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
In f3e9ccb3b, I renamed gen6_upload_wm_constants to
gen6_upload_wm_push_constants, but neglected to update this comment.
I don't think there ever was a gen7_prepare_wm_constants function; it
was probably a search and replace error. Of course, "prepare" functions
died a while back as well.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
CACHE_NEW_SAMPLER doesn't cover max_wm_threads, but it does cover
brw->sampler.count. BRW_NEW_PS_BINDING_TABLE is obvious, but it's
probably worth adding a comment anyway.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
The dirty bit was already correctly in place, but there was no comment.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Also, annotate the use of _NEW_POINT as long as we're adding a comment.
NOTE: This is a candidate for the 7.11 branch.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
According to a comment in gen6_sf_state, calls to get_attr_override need
both _NEW_PROGRAM and _NEW_LIGHT. Since Gen7 reuses the same function,
the same dirty bits should apply.
NOTE: This is a candidate for the 7.11 branch.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
The BRW_NEW_CURBE_OFFSETS dirty bit is only flagged by the
brw_curbe_offsets state atom which is only used on Gen4-5.
Since it's never flagged, there's no reason to depend on it.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
The BRW_NEW_URB_FENCE dirty bit is only flagged by the
brw_recalculate_urb_fence state atom which isn't used on Gen6+.
Since it's never flagged, there's no reason to depend on it.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This brings the dirty bits in line with the comments.
This does /not/ need to be cherry-picked to stable branches because the
access requiring _NEW_BUFFERS was added in master as part of HiZ.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
The trick was to produce an assignment in the IR along the lines of:
(assign (xyzw) (var_ref R0) (swiz wwww (var_ref R0) ))
which occurs only rarely even in code that looks like it should do
this, because of the assignment temporaries generated in ast_to_hir.
From the IR above, this optimization pass would then propagate
references of R0 into R0.wwww (seems reasonable), but without this
patch, a later reference of R0.wwww would see R0 first, turning that
into R0.wwww.wwww, which triggered opt_swizzle_swizzle, and then we
looped back to this code to do it again. Avoid that by skipping over
the usual ir_rvalue visitor's ir_swizzle hook, so that we get
handle_rvalue() on the ir_swizzle itself, not its referenced value.
Looking at only the swizzle will always optimize away at least as much
as looking at the swizzle's refererenced value.
We now still claim to propagate r0.w into r0.w, but at least we don't
trigger the loop.
v2: Rewrite commit message (changes by anholt)
Fixes piglit glsl-copy-propagation-self-1
Fixes https://bugs.freedesktop.org/show_bug.cgi?id=34006
The r300 driver requires LLVM when building and other drivers that
depend on it for all TNL, like i915g will be a lot slower without it.
Signed-off-by: Jakob Bornecrantz <wallbraker@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
The optimization was supposed to turn an attribute component that was
always 1.0 into a mov of 1.0. But by leaving loop this patch removes
out of that test, we applied the projection correction to the 1.0 and
got some other value, breaking openarena once it was converted to
using the new compiler backend.
Originally this hunk was separate from the former loop to make the
generated instructions slightly better pipelined. We now have
automatic instruction scheduling to handle that, and the generated
instruction sequence looked the same to me after this change (except
for the bugfix).
Previous to this patch, if the client requested transform feedback
using a subscript, but the variable was not an array
(e.g. "gl_FrontColor[0]"), we would produce a bogus error message like
"Transform feedback varying gl_FrontColor[0] found, but it's an array
([] expected)".
Changed the error message to e.g. "Transfrorm feedback varying
gl_FrontColor[0] requested, but gl_FrontColor is not an array."
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
We were only comparing the number of depth and stencil bits but the
extension spec actually says the formats must match:
The error INVALID_OPERATION is generated if BlitFramebufferEXT is
called and <mask> includes DEPTH_BUFFER_BIT or STENCIL_BUFFER_BIT
and the source and destination depth or stencil buffer formats do
not match.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Commit acf82657f4 supposedly enabled
SIMD16 dispatch, but neglected to set the "16 Pixel Dispatch Enable"
bit, so nothing actually got enabled.
Furthermore, it neglected to set up the Dispatch GRF Start Register for
kernel 2, which is the SIMD16 program.
Increases performance in Nexuiz by ~15% at 800x600 (n=3).
NOTE: This is a candidate for the 7.11 branch.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
$ dict invarient
No definitions found for "invarient", perhaps you mean:
gcide: Invariant
wn: invariant
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Replace target, level parameters with gl_texture_image.
Add gl_renderbuffer parameter to indicate source buffer for the copy.
This removes some redundant code in the drivers to find the source
renderbuffer and the destination texture image (which we already had
in _mesa_CopyTexSubImage).
Signed-off-by: Brian Paul <brianp@vmware.com>
We were comparing 32-bit Z buffer values against 16-bit fragment values.
Need to do scaling like for the 24-bit case.
Triangle Z testing was OK since it didn't hit this code path.
If the assertion was hit, it probably meant that we were unable to allocate
or map a vertex buffer. Instead of dying in a debug build, issue a warning
and continue.
We need to pass the pre-projection matrix clip planes into the driver,
instead of the post for the case we have a vertex shader that writes clip
vertex.
Signed-off-by: Dave Airlie <airlied@redhat.com>
translate signed/unsigned integers to coresponding uint/sint r32g32b32a32 types.
This fixes a bunch of piglit tests.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Brian mentioned that mesa-demos/reflect was broken on softpipe,
by my previous commit. The problem was were blindly translating none
to perspective, when color/pntc at least need it linear.
this is the final version that fixes the reflect regression.
Signed-off-by: Dave Airlie <airlied@redhat.com>
The IR for mix(float, float, bool) was missing a write mask, causing the
IR reader to die horribly. Furthermore, I neglected to add any of the
new prototypes to the 1.30 profiles.
Fixes oglconform's glsl-bif-com advanced.mix test cases.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=44477
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The various l-value errors this was designed to catch are now caught
by other means. Marking the temporaries as read-only now just
prevents sensible error messages from being generated. It's
0:0(0): error: function parameter 'out p' references the read-only variable '_post_incdec_tmp'
versus
0:13(5): error: function parameter 'out p' references a post-decrement operation
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
These were used by swrast to make a combined depth+stencil buffer look
like separate depth and stencil buffers. But that's no longer needed
after rewriting the depth/stencil code in swrast.
Reviewed-by: Eric Anholt <eric@anholt.net>
These functions updated the gl_renderbuffer::_DepthBuffer and
_StencilBuffer fields. But those fields are no longer used.
Reviewed-by: Eric Anholt <eric@anholt.net>
Everything about this that we have tests for works except for the
deprecated metaops. The conclusion we came to on IRC sounded like we
were OK with turning it on as long as core functionality works. The
remaining failures (copypixels, drawpixels) should just be a matter of
finishing the MapRenderbuffer for them.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Fixes on i965:
ARB_depth_buffer_float/fbo-depthstencil-GL_DEPTH32F_STENCIL8-blit
ARB_depth_buffer_float/fbo-stencil-GL_DEPTH32F_STENCIL8-blit
Reviewed-by: Brian Paul <brianp@vmware.com>
We were converting our ubyte stencil value to a float. Just write it
as a uint, which overwrites the X24 part of X24S8 with 0 but shouldn't
matter.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
I'm so surprised that gcc didn't catch this that I feel like I must be
misreading. srcMap is what we initialize (along with dstMap) from
this map value right after this check.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
They were meaning to do the same thing of memcpying rows, so just
write the code once.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
I'm going to reuse this function from glBlitFramebuffer() handling,
which wants to do the same thing.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
The last major issue (intervening-read) is fixed, so let's turn this
on for real. The only other known issue is a hardware limitation for
tesselation with flat shading.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
We need the kernel to reset our pointers to 0 in between. Note that
the initialization of function pointer had to move to after
InitContext since we didn't have intel->gen set up yet.
Fixes piglit EXT_transform_feedback/immediate-reuse
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The new kernel patch I submitted makes the interface opt-in, so all
batchbuffers aren't preceded by the 4 MI_LOAD_REGISTER_IMMs. This
requires the updated i915_drm.h present in libdrm 2.4.30.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The existing glsl_to_tgsi::remove_output_read pass did not work properly
when indirect addressing was involved; this commit replaces it with a
lowering pass that occurs before TGSI code generation.
Fixes varying-array related piglit tests.
Signed-off-by: Vincent Lejeune <vljn@ovi.com>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This is similar to Gallium's existing glsl_to_tgsi::remove_output_read
lowering pass, but done entirely inside the GLSL compiler.
Signed-off-by: Vincent Lejeune <vljn@ovi.com>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This fixes
draw-elements-base-vertex user_varrays
draw-elements-instanced-base-vertex user_varrays
for softpipe with no llvm support (DRAW_USE_LLVM=false)
I'm not sure if this is the correct answer, but these tests were showing
a max_index of 7, then trying to fetch up to 43, maybe it should be fixing
max_index earlier somewhere to take care of this.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This patch silences these GCC warnings.
warning: unused variable 'texelBytes'
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
It is not explicitly stated in the GL 3.0 spec that transform feedback
can be performed on a whole varying array (without supplying a
subscript). However, it seems clear from context that this was the
intent. Section 2.15 (TransformFeedback) says this:
When writing varying variables that are arrays, individual array
elements are written in order.
And section 2.20.3 (Shader Variables), says this, in the description
of GetTransformFeedbackVarying:
For the selected varying variable, its type is returned into
type. The size of the varying is returned into size. The value in
size is in units of the type returned in type.
If it were not possible to perform transform feedback on an
unsubscripted array, the returned size would always be 1.
This patch fixes the linker so that transform feedback on an
unsubscripted array is supported.
Fixes piglit tests "EXT_transform_feedback/builtin-varyings
gl_ClipDistance[{4,8}]-no-subscript" and
"EXT_transform_feedback/output_type *[2]-no-subscript".
Note: on back-ends that set
gl_shader_compiler_options::LowerClipDistance (for example i965),
tests "EXT_transform_feedback/builtin-varyings
gl_ClipDistance[{1,2,3,5,6,7}]" still fail. I hope to address this in
a later patch.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
With the addition of unit tests in commit
3ef3ba4d2e, several additional build
artifacts are created:
bin/depcomp
bin/missing
tests/Makefile
tests/Makefile.in
tests/glx/Makefile
tests/glx/Makefile.in
tests/glx/.deps/
tests/glx/.gitignore
This patch adds all of these files to .gitignore.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Previously we were using
gl_transform_feedback_object::Buffers[i]->Name to service an indexed
get request for GL_TRANSFORM_FEEDBACK_BUFFER_BINDING. However, if no
buffer has been bound, gl_transform_feedback_object::Buffers[i] is
NULL, so this was causing a segfault.
This patch switches to using
gl_transform_feedback_object::BufferNames[i], which is equal to
gl_transform_feedback_object::Buffers[i]->Name if
gl_transform_feedback_object::Buffers[i] is not NULL, and 0 if it is
NULL.
Fixes piglit test "EXT_transform_feedback/get-buffer-state
indexed_binding".
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
On drivers that set gl_shader_compiler_options::LowerClipDistance (for
example i965), references to gl_ClipDistance (a float[8] array) will
be converted to references to gl_ClipDistanceMESA (a vec4[2] array).
This patch modifies the linker so that requests for transform feedback
of gl_ClipDistance are similarly converted.
Fixes Piglit test "EXT_transform_feedback/builtin-varyings
gl_ClipDistance".
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
When using transform feedback, there are three circumstances in which
it is useful for Mesa to instruct a driver to stream out just a
portion of a varying slot (rather than the whole vec4):
(a) When a varying is smaller than a vec4, Mesa needs to instruct the
driver to stream out just the first one, two, or three components of
the varying slot.
(b) In the future, when we implement varying packing, some varyings
will be offset within the vec4, so Mesa will have to instruct the
driver to stream out an arbitrary contiguous subset of the components
of the varying slot (e.g. .yzw or .yz).
(c) On drivers that set gl_shader_compiler_options::LowerClipDistance,
if the client requests that an element of gl_ClipDistance be streamed
out using transform feedback, Mesa will have to instruct the driver to
stream out a single component of one of the gl_ClipDistance varying
slots.
Previous to this patch, only (a) was possible, since
gl_transform_feedback_info specified only the number of components of
the varying slot to stream out. This patch adds
gl_transform_feedback_info::ComponentOffset, which indicates which
components should be streamed out.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Previously, on i965 Gen6 and above, we weren't allocating space for
gl_ClipVertex in the VUE, since the VS was automatically converting it
to clip distances. This prevented transform feedback from being able
to capture gl_ClipVertex.
This patch goes aheads and allocates space for gl_ClipVertex in the
VUE on Gen6 and above. The old behavior is retained on Gen5 and
below, since (a) transform feedback is not yet supported on those
platforms, and (b) those platforms don't currently support
gl_ClipVertex anyhow.
Note: this constitutes a slight waste of VUE space for shaders that
use gl_ClipVertex and don't use transform feedback to capture it.
However, that seems preferable to making the VUE map (and all of the
state that depends on it) dependent on transform feedback settings.
Fixes Piglit test "EXT_transform_feedback/builtin-varyings
gl_ClipVertex".
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
On i965 Gen6 and above, gl_PointSize is stored in component W of the
first VUE slot (which corresponds to VERT_RESULT_PSIZ in the VUE map).
Normally we store varying floats in component X of a VUE slot, so we
need special case logic for gl_PointSize.
For Gen6, we do this with a ".wwww" swizzle in the GS. For Gen7, we
shift the component mask by 3 to select the W component.
Fixes Piglit test "EXT_transform_feedback/builtin-varyings
gl_PointSize".
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Commit 9d36c96d6e (mesa: Fix
glGetTransformFeedbackVarying()) accidentally added an extra memset()
call to the store_tfeedback_info() function, causing
prog->LinkedTransformFeedback.NumBuffers to be erased.
This patch removes the extra memset and rearranges the other
operations in store_tfeedback_info() to be in the correct order.
Fixes piglit tests "EXT_transform_feedback/api-errors *unbound*"
Reviewed-by: Eric Anholt <eric@anholt.net>
The src/dst arrays would overlap but dst was less than src so a simple
version of memcpy() would do the right thing. But this isn't guaranteed
when memcpy() is optimized.
Fixes demos/copypix when the dest region was clipped by the left side of
the window.
Reviewed-by: Adam Jackson <ajax@redhat.com>
This is useful for apps which don't print FPS.
Only enabled in SwapBuffers.
v2: track state per drawable, use libGL prefix
Reviewed-by: Michel Dänzer <michel@daenzer.net>
Do it after we check whether inst_end != -1.
Also move the code structure at the beginning of r300_fragment_shader_code
to detect underflows easily with valgrind.
Improves performance from cca 1 fps to 23 fps in Cogs.
This new codepath is not always used, instead, there is a heuristic which
determines whether to use it. Using translate for uploads is generally
slower than what we have had already, it's a win only in a few cases.
This is for GL_ARB_vertex_type_2_10_10_10_rev.
I just took the code from u_format_table.c. It's based on pack_rgba_float.
I had no other choice. The u_format hooks are not exactly compatible
with translate. The cleanup of it is left for future work.
Reviewed-by: Dave Airlie <airlied@redhat.com>
The conversion is limited to only a few cases, because converting to any other
type shouldn't happen in any driver.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Fetching int as float and vice versa is not allowed.
Fetching unsigned int as signed int and vice versa is not allowed either.
Doing conversions like that isn't allowed for samplers in OpenGL.
The three hooks could be consolidated into one fetch hook, which would fetch
uint as uint32, sint as sint32, and everything else as float. The receiving
parameter would be void*. This would be useful for implementing vertex fetches
for shader model 4.0, which has untyped registers.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Please see the diff for further info.
This paves the way for moving user buffer uploads out of drivers and should
allow to clean up the mess in u_upload_mgr in the meantime.
For now only allowed for buffers on r300 and r600.
Acked-by: Christian König <deathsimple@vodafone.de>
We don't wanna convert per-instance or constant (zero-stride) attribs into
ordinary vertex attribs.
More importantly, the translation of instance attribs now finally works.
To match what transfer_map returns. Really, subtracting the offset leads
to bugs if someone expects it to work exactly like transfer_map.
Reviewed-by: Brian Paul <brianp@vmware.com>
The current implementation was totally broken -- it was looking in an
unpopulated structure for varyings, and trying to do so using the
current list of varying names, not the list used at link time.
v2: Fix leaking of memory into the program per re-link.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
There was some duplication between the tgsi_dump.c and tgsi_text.c
files. Also use some static assertions to help catch errors when
adding new TGSI values.
v2: put strings in tgsi_strings.c file instead of the .h file.
Reviewed-by: Dave Airlie <airlied@redhat.com>
This reverts commit 5a478976ae.
It broke the build. DRI drivers were no longer being installed by
`make install` (and probably not being built at all). It appears to be
due to a few small, subtle mistakes, and the fix isn't clear enough to
simply commit without going through review. In the meantime, revert it.
All other xorg modules require at least 2.60 (released in 2006), so we
may as well increase it to match. It's also doubtful anyone tests the
build with 2.59 (from 2003), so it may not even work anyway.
Adds two missing '|| srcFormat == GL_RG_INTEGER' in assertions and a
bunch of missing pixel converions cases.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Commit 0ed11e3331 fixed a "use after free"
bug by getting the next pointer before deleting the current node.
Unfortunately, it also made "next" never get updated if i->need != need.
Fixes infinite loops in piglit tests fbo-depth-array and fbo-depthtex.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
textureSize() returns an int, ivec2, or ivec3, but never an ivec4.
Creating the destination register as an ivec4 triggered later failures,
even though the register did hold the proper values.
For example, piglit test vs-textureSize-compare calls textureSize on a
2D texture and compares the result to an expected value. Unfortunately,
our generated code also tried to compare the third and fourth components
which were undefined, and failed.
Fixes piglit test vs-textureSize-compare as well as 19 subcases of
oglconform's glsl-bif-tex-size test.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=44339
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Commit d45814c925 totally added a data
dependency on _NEW_TEXTURE, even including the comment, but didn't
actually add the dirty bit.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
From the EXT_transform_feedback spec:
The error INVALID_OPERATION is also generated by BeginTransformFeedbackEXT
if no binding points would be used, either because no program object is
active or because the active program object has specified no varying
variables to record.
...
The error INVALID_VALUE is generated by BindBufferRangeEXT or
BindBufferOffsetEXT if <offset> is not word-aligned.
Fixes Piglit tests:
- EXT_transform_feedback/api-errors no_prog_active
- EXT_transform_feedback/api-errors interleaved_no_varyings
- EXT_transform_feedback/api-errors separate_no_varyings
- EXT_transform_feedback/api-errors bind_offset_offset_1
- EXT_transform_feedback/api-errors bind_offset_offset_2
- EXT_transform_feedback/api-errors bind_offset_offset_3
- EXT_transform_feedback/api-errors bind_offset_offset_5
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
From the EXT_transform_feedback spec:
The error INVALID_OPERATION is generated by
BeginTransformFeedbackEXT if any transform feedback buffer object
binding point used in transform feedback mode does not have a
buffer object bound.
This required adding a new NumBuffers field to the
gl_transform_feedback_info struct, to keep track of how many transform
feedback buffers are required by the current program.
Fixes Piglit tests:
- EXT_transform_feedback/api-errors interleaved_unbound
- EXT_transform_feedback/api-errors separate_unbound_0_1
- EXT_transform_feedback/api-errors separate_unbound_0_2
- EXT_transform_feedback/api-errors separate_unbound_1_2
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Other parts of the compiler assume that expressions will have
well-formed types or the error type. Just using the type of the thing
being operated on can cause expressions like ~3.14 or ~false to not
have a well-formed type. This could then result in an assertion
failure in the context epxression handler.
If there is an error processing the expression, set the type of the IR
expression to error.
Fixes piglit's bit-not-0[789].frag tests.
NOTE: This is a candidate for the 7.11 branch.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=42755
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: Vinson Lee <vlee@vmware.com>
Detect whether a new enough version of XCB is installed at configure
time. If it is not, don't enable the extension and don't build the
unit tests.
v2: Move the AM_CONDIATION outside the case-statement so that it is
invoked even for non-GLX builds. This prevents build failures with
osmesa, for example.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Robert Hooker <robert.hooker@canonical.com>
The core Mesa code does the equivalent memory allocation, image mapping,
storing and unmapping. We just need to call prep_teximage() first to
handle the 'surface_based' stuff.
The other change is to always use the level=0 mipmap image when accessing
individual mipmap level images that are stored in resources/buffers.
Apparently, we were always using malloc'd memory for individual mipmap
images, not resource buffers, before.
Signed-off-by: Brian Paul <brianp@vmware.com>
This was disabled a year ago due to not having a story for handling
the blitter at the time. We're fine with using the blitter now.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The new assert in intelEmitCopyBlit() gets angry if we don't align to
dwords. Rather than make the assert have a special case for height ==
1 on the assumption that the hardware doesn't use it in that case,
just supply a correct pitch.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=43214
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We didn't consume these flags in any way that would produce a
functional difference, but we might have some day.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We don't want to match the visual against the default screen. If the
drawable is on a non-default screen then the appropriate visual might not
exist on the default screen. Conversely, if the same visual is
available on multiple screens then simply selecting for the right VID is
sufficient, since the server has promised that the same visual is
compatible with multiple screens.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Adam Jackson <ajax@redhat.com>
There are a couple scenarios where the source could be zero and the
operand could be either SRC_ALPHA or ONE_MINUS_SRC_ALPHA. For
example, if the source was ZERO. This would result in something like
(0).w, and a later call to ir_validate would get angry.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=42517
Coverity reported a read from pointer after free defect in
src/mesa/drivers/dri/intel/intel_mipmap_tree.c. Bug# 44205
In intel_miptree_all_slices_resolve() function, i = i->next was
executing after freeing i. I have defined a temporary variable
(next) to store the value of i->next before freeing i
Reported-by: Vinson Lee <vlee@vmware.com>
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
In the interest of softpipe preferring correctness over speed and passing more
piglit tests, set this to off by default. For speed you really want llvmpipe.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This patch remove the 32bits limitation. As a side effect, it bring the support for the GL_ARB_depth_buffer_float extension.
No regression have been found on piglit, and all tests for GL_ARB_depth_buffer_float pass successfully.
Signed-off-by: Dave Airlie <airlied@redhat.com>
If glUniform1i and friends are going to dump data directly in
driver-allocated, the pointers have to be updated when the storage
moves. This should fix the regressions seen with commit 7199096.
I'm not sure if this is the only place that needs this treatment. I'm
a little uncertain about the various functions in st_glsl_to_tgsi that
modify the TGSI IR and try to propagate changes about that up to the
gl_program. That seems sketchy to me.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
v2:
Revalidate when shader_program is not NULL.
Update the pointers for all _LinkedShaders.
Init glsl_to_tgsi_visitor::shader_program to NULL in the
get_pixel_transfer_visitor & get_bitmap_visitor.
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
A lot of tests in 'make check' will fail under these circumstances,
but at least the build should work.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This adds a new tests directory at the top-level and some extra build
infrastructure. The tests use the Google C++ Testing Framework, and
they will only be built if configure can detect its availability. The
tests are automatically wired-in to run with 'make check'.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Chad Versace <chad.versace@linux.intel.com>
Using 'new' as a function parameter name prevents including
glxclient.h the unit tests (future patch) that use the Google C++
Testing Framework.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
This extension is only enabled if the underlying driver advertises
support for OpenGL ES 2.0. This happens either through the getAPIMask
function in version 2 of the DRI2 extension or implicity through
version 2 of the DRISW extension.
Since there is no OpenGL ES 2.0 protocol, this extension is marked as
only available with direct-rendering.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
This also enables GLX_ARB_create_context and
GLX_ARB_create_context_profile if the driver supports DRI_DRISW
version 3 or greater.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
This converts all of the GLX data from glXCreateContextAttribsARB to
the values expected by the DRI driver interfaces.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
This adds the function and modifies dri2CreateNewContextForAPI to call
it. At this point only version 2 of the DRI2 API is advertised to the
loader.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Note that these extensions are not automatically enabled for screens
capable of direct-rendering.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
This also enables GLX_ARB_create_context and
GLX_ARB_create_context_profile if the driver supports DRI_DRI2 version
3 or greater.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
__glX_send_client_info only supports XCB, so use that instead of
__glXClientInfo when USE_XCB is defined.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
This function picks the correct client-info protocol (based on the
server's GLX version and set of extensions) and sends it to the
server.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
This code was generating the gcc warning:
variable ‘clearValue’ set but not used [-Wunused-but-set-variable]
Reviewed-by: Brian Paul <brianp@vmare.com>
The were always zero. When doing a sub-texture replacement we account
for the dstX/Y/Zoffsets when we map the texture image. So no need to
pass them into the texstore code anymore.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Buffers for shader based decoding can now be
released without its component still being around.
Signed-off-by: Christian König <deathsimple@vodafone.de>
Acked-by: Maarten Lankhorst <m.b.lankhorst@gmail.com>
Changing pipe_resource was wrong, because it can be used by other contexts
at the same time. This fixes the last possible race condition in r300g
that I know of.
This also fixes blitting NPOT compressed textures. Random pixels sometimes
appeared at the right-hand edge of the texture.
Finally, this removes r300_texture_desc::stride_in_pixels. It makes little
sense with sampler views and surfaces being able to override width0, height0,
and the format entirely.
This fixes a regresssion (broken cube maps) caused by the
ctx->Driver.TexImage parameter simplification commit. The target var
is always GL_TEXTURE_CUBE_MAP at this point so the Face field was always
getting set to zero.
These field assignments aren't needed anyway since core Mesa sets them.
This fixes the latc fetches for llvmpipe, fixes
fbo-generatemipmap-formats GL_ARB_texture_compression
fbo-generatemipmap-formats GL_ATI_texture_compression_3dc
fbo-generatemipmap-formats GL_EXT_texture_compression_latc
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Dave Airlie <airlied@gmail.com>
As with previous commits, the target, level and texObj info can be
obtained through the texImage pointer.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
As with TexSubImage(), the target, level and texObj values can be obtained
through the texImage pointer.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
There's no need to pass the target, level and texObj parameters since
they can be easily obtained from the texImage pointer.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Since the move to Map/UnmapTextureImage, the core mesa routines are
equivalent to what the state tracker was doing.
The TexImage functions can be replaced too, but there's a few differences
that will need to be handled.
inv_swizzles is used in lp_tile_soa.py to create lp_tile_soa.c, we overwrite swizzles if they are already set.
This results in the i8 format getting alpha instead of red, and the l8 format
getting blue instead of red.
Fixes fbo-alphatest-formats, fbo-alphatest-formats ARB_texture_float,
and fbo-alphatest-formats EXT_texture_snorm on llvmpipe.
Signed-off-by: Dave Airlie <airlied@redhat.com>
introduce vbo_sizeof_ib_type() function to return the index data type
size. I see some place use switch(ib->type) to get the index data type,
which is sort of duplicate.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
On failure, intel_miptree_create() needs to *release* the miptree, not
just free it, so that the stencil_mt gets released too.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
This saves a couple of instructions on most programs with control
flow. More interestingly, 6 shaders from unigine sanctuary now fit
into 16-wide without register spilling.
We were printing out the line triggering the flush, but a variety of
different causes just printed the line number for intel_flush()'s call
of intel_batchbuffer_flush(). Plumb the line numbers from the caller
of intel_flush() on through.
Since the refactor in d7b33309fe, depth
in the miptree changed from 1 to 6, so we always decided it didn't
match, and we would relayout to something that would still not
"match".
Improves performance 23.8% (+/- 1.1%, n=4)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=43329
Fixes this GCC warning.
arrayobj.c: In function '_mesa_update_array_object_max_element':
arrayobj.c:310: warning: implicit declaration of function 'ffsll'
Signed-off-by: Vinson Lee <vlee@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
bitset.h is still used by classic nouveau -- see `git grep '\<BITSET_'`
-- and the state stored is too big to fit in 64bit integers (it requires
approximately 87 bits), so there is no obvious alternative here.
This effecively reverts commit 196800d798.
Since commit 82b9661894 and
34eae1c72a vbo support
is mandatory for all drivers. So, remove the remaining
FEATURE_ARB_vertex_buffer_object guards.
Signed-off-by: Mathias Froehlich <Mathias.Froehlich@web.de>
Consider the following code:
MOV A.x, B.x
MOV B.x, C.x
After the first line, cur_value[A][0] == B, indicating that A.x's
current value came from register B.
When processing the second line, we update cur_value[B][0] to C.
However, for drect copies, we fail to reset cur_value[A][0] to NULL.
This is necessary because the value of A is no longer the value of B.
Fixes Counter-Strike: Source in Wine (where the menu rendered completely
black in DX9 mode), completely white textures in Civilization V, and the
new Piglit test glsl-vs-copy-propagation-1.shader_test.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=42032
Tested-by: Matt Turner <mattst88@gmail.com>
Tested-by: Christopher James Halse Rogers <christopher.halse.rogers@canonical.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
In this code, 'i' loops over the number of virtual GRFs, while 'j' loops
over the number of vector components (0 <= j <= 3).
It can't possibly be correct to see if bit 'i' is set in the destination
writemask, as it will have values much larger than 3. Clearly this is
supposed to be 'j'.
Found by inspection.
Tested-by: Matt Turner <mattst88@gmail.com>
Tested-by: Christopher James Halse Rogers <christopher.halse.rogers@canonical.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
In Android IceCreamSandwich, SurfaceFlinger requires GL_OES_image_external
for basic compositing tasks. Without the extension, SurfaceFlinger fails
to start.
Despite the incompleteness of the extension's implementation introduced by
this patch, it is good enough to enable SurfaceFlinger and to unblock the
people who need to begin testing Mesa on IceCreamSandwich.
To enable the extension, set the environment variable
MESA_EXTENSION_OVERRIDE="+GL_OES_EGL_image_external". Ideally, Android
should set this in init.rc.
WARNING: This implementation of GL_OES_EGL_image_external is not complete.
Some of it is even incorrect. When we begin to really implement
GL_OES_EGL_image_external, much of the patch will need reverting.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
If the meta flag MESA_META_TEXTURE is present, then disable the texture
target GL_TEXTURE_EXTERNAL_OES.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
just noticed this in passing, not sure it actually fixes any issus.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Brian Paul <brianp@vmare.com>
Now the gl_array_object's layout matches the one used in
recalculate_input_bindings. Make use of this and remove the
bind_array_obj function.
Signed-off-by: Mathias Froehlich <Mathias.Froehlich@web.de>
Reviewed-by: Brian Paul <brianp@vmare.com>
This fixes a regression seen with the isosurf demo when switching between
glBegin/End and glDrawArrays (do it several times). The problem was the
driver wasn't getting _NEW_ARRAY when the arrays were subtly changed:
(vertex3f, normal3f) vs. (normal3f, vertex3f).
This patch fixes that by signaling _NEW_ARRAY whenever we transition
between glBegin/End and glDrawArrays mode and display lists.
The patch also fixes up the initialization of the map_vp_none[] array
to stop putting strange values in the last five elements of the array.
v2: remove DRAW_ELEMENTS, don't distinguish between glDrawArrays and
glDrawElements
v3: add DRAW_DISPLAY_LIST for the display list case, just to be safe.
Reviewed-by: Mathias Froehlich <Mathias.Froehlich@web.de>
Tested-by: Mathias Froehlich <Mathias.Froehlich@web.de>
Remove gl_light::_dli and gl_light::_sli.
Both are only used for a value previously used in
color indexed rendering. Also both variables are only used
and never written.
Signed-off-by: Mathias Froehlich <Mathias.Froehlich@web.de>
Here is the final patch to enable dynamic eu instruction store size:
increase the brw eu instruction store size dynamically instead of just
allocating it statically with a constant limit. This would fix something
that 'GL_MAX_PROGRAM_INSTRUCTIONS_ARB was 16384 while the driver would
limit it to 10000'.
v2: comments from ken, do not hardcode the eu limit to (1024 * 1024)
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
A single next_insn may change the base address of instruction store
memory(p->store), so call it first before referencing the instruction
store pointer from an index.
This the final prepare work to enable the dynamic store size.
v2: comments from Ken, define emit_endif as bool type
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
If dynamic instruction store size is enabled, while after the brw_JMPI()
and before the brw_land_fwd_jump() function, the eu instruction store
base address(p->store) may change. Thus, the safe way to reference the
jmp instruction is by index instead of by the instruction address.
v2: comments from Eric, don't change the prototype of brw_JMPI
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
If dynamic instruction store size is enabled, while after
the brw_IF/ELSE() and before the brw_ENDIF() function, the
eu instruction store base address(p->store) may change.
Thus let if_stack just store the instruction index. This is
somehow more flexible and safe than store the instruction
memory address.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This partially reverts commit 363ff84475.
It caused severe performance drops in Nexuiz. Reported by Phoronix.
Tested by me on r300g and by IRC people on r600g.
When updating SOL indices, we were accidentally putting the starting
index in dword 1 and the SVBI number to increment in dword 2--these
should be reversed. Usually both of these values are zero, so we
didn't see any problem. However, if a transform feedback operation
spans multiple batch buffers, the starting index will be nonzero.
Fixes piglit test "EXT_transform_feedback/intervening-read output".
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
When rendering triangle strips, vertices come down the pipeline in the
order specified, even though this causes alternate triangles to have
reversed winding order. For example, if the vertices are ABCDE, then
the GS is invoked on triangles ABC, BCD, and CDE, even though this
means that triangle BCD is in the reverse of the normal winding order.
The hardware automatically flags the triangles with reversed winding
order as _3DPRIM_TRISTRIP_REVERSE, so that face culling and two-sided
coloring can be adjusted to account for the reversed order.
In order to ensure that winding order is correct when streaming
vertices out to a transform feedback buffer, we need to alter the
ordering of BCD to BDC when the first provoking vertex convention is
in use, and to CBD when the last provoking vertex convention is in
use.
To do this, we precompute an array of indices indicating where each
vertex will be placed in the transform feedback buffer; normally this
is SVBI[0] + (0, 1, 2), indicating that vertex order should be
preserved. When the primitive type is _3DPRIM_TRISTRIP_REVERSE, we
change this order to either SVBI[0] + (0, 2, 1) or SVBI[0] + (1, 0,
2), depending on the provoking vertex convention.
Fixes piglit tests "EXT_transform_feedback/tessellation
triangle_strip" on Gen6.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The code for storing 1D, 2D and 3D tex images (whole or sub-images) was
all pretty similar. This consolidates those six paths.
v2: rework switch statement to catch unexpected targets
Reviewed-by: José Fonseca <jfonseca@vmware.com>
For 1D arrays, map each slice separately. Note that this was handled
correctly in _mesa_store_teximage2d() but not here.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
This gets rid of another renderbuffer->PutRow() call and _DepthBuffer
usage. We always work with 32-bit uint Z values now.
Reviewed-by: Eric Anholt <eric@anholt.net>
Use Map/UnmapRenderbuffer() for the special, optimized cases we care about.
Note that we're dropping some seldom-used cases in the new fast-path
code: as CI->RGB conversion and zooming.
Reviewed-by: Eric Anholt <eric@anholt.net>
We don't want to call these functions where we'll be using
Map/UnmapRenderbuffer(). So push them further down in the drawpixels
cases so that we can switch over to Map/UnmapRenderbuffer() step by step.
Reviewed-by: Eric Anholt <eric@anholt.net>
Stop using deprecated renderbuffer PutRow() function. Note that we
aren't using Map/UnmapRenderbuffer() yet because this call is inside
a swrast_render_start/finish() pair.
v2: use _mesa_pack_uint_24_8_depth_stencil_row(), per Eric.
Hopefully glCopyPixels(GL_DEPTH_STENCIL) will be handled by the
fast copy function. Otherwise, just do the copy with separate
depth + stencil copies. That's effectively what the removed code
did anyway.
Reviewed-by: Eric Anholt <eric@anholt.net>
The functions that read depth/stencil values understand all (packed)
depth/stencil buffer formats now so there's no reason to use the
wrappers.
Also, improve the format checks in fast_copy_pixels() to catch mismatched
depth/stencil cases.
v2: fix the test for combined depth+stencil buffers, per Eric.
Stop using the deprecated renderbuffer Get/Put Row/Values functions.
Consolidate code paths, etc. The file is nearly half the size it used
to be!
Reviewed-by: Eric Anholt <eric@anholt.net>
Use format pack/unpack functions instead of deprecated renderbuffer
GetRow/PutRow functions.
v2: use get_stencil_address(), s/destVals/newVals/
Reviewed-by: Eric Anholt <eric@anholt.net>
The former was only used for clearing buffers. The later wasn't used
anywhere! Remove them and all implementations of those functions.
Reviewed-by: Eric Anholt <eric@anholt.net>
Another step toward getting rid of the renderbuffer PutRow/etc functions.
v2: fix assorted depth/stencil clear bugs found by Eric
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Fixed the build failure, fixed a warning where attributs and error arguments had
been
inverted and fixed another call that was missing an argument.
Signed-off-by: José Fonseca <jfonseca@vmware.com>
Fixes almost all of the transform feedback piglit tests. Remaining
are a few tests related to tesselation for
quads/trifans/tristrips/polygons with flat shading.
v2: Incorporate Paul's feedback (squash with previous, state flag note,
static assert, update FINISHME)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1)
Reviewed-by: Paul Berry <stereotype441@gmail.com>
The code was relying on gs.prog_data's copy of the
number-of-verts-per-prim, which segfaulted on gen7 since it doesn't
make a GS program. We can easily calculate that value right here.
v2: Fix svbi_0_starting_index regression.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Although there is not much documentation of this fact, there are in
fact two separate VF caches:
- an "index-based" cache (described in the Sandy Bridge PRM, vol 2
part 1, section 2.1.2 "Vertex Cache"). This cache stores URB
handles of vertex shader outputs; its purpose is to avoid redundant
invocations of the vertex shader when drawing in random access mode
(e.g. glDrawElements()), and the same vertex index is specified
multiple times. It is automatically invalidated between
3D_PRIMITIVE commands and between instances within a single
3D_PRIMITIVE command.
- an "address-based" cache (mentioned briefly in vol 2 part 1, section
1.7.4 "PIPE_CONTROL Command"). This cache stores the data read from
vertex buffers; its purpose is to avoid redundant memory accesses
when doing instanced drawing or when multiple 3D_PRIMITIVE commands
access the same vertex data. It needs to be manually invalidated
whenever new data is written to a buffer that is used for vertex
data.
Previous to this patch, it was not necessary for Mesa to explicitly
invalidate the address-based cache, because there were no reasonable
use cases in which the GPU would write to a vertex data buffer during
a batch, and inter-batch flushing was taken care of by the kernel.
However, with transform feedback, there is now a reasonable use case:
vertex data is written to a buffer using transform feedback, and then
that data is immediately re-used as vertex input in the next drawing
operation. To make this use case work, we need to flush the
address-based VF cache between transform feedback and the next draw
operation. Since we are already calling
intel_batchbuffer_emit_mi_flush() when transform feedback completes,
and intel_batchbuffer_emit_mi_flush() is intended to invalidate all
caches, it seems reasonable to add VF cache invalidation to this
function.
As with commit 63cf7fad13 (i965: Flush
pipeline on EndTransformFeedback), this is not an ideal solution. It
would be preferable to only invalidate the VF cache if the next draw
call was about to consume data generated by a previous draw call in
the same batch. However, since we don't have the necessary dependency
tracking infrastructure to figure that out right now, we have to
overzealously invalidate the cache.
Fixes Piglit test "EXT_transform_feedback/immediate-reuse".
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
After creating new binding table entries for transform feedback, we
need to set the dirty flag BRW_NEW_SURFACES, so that a new binding
table pointer will be sent to the hardware. Otherwise the new binding
table entries will not take effect.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The surface states tracked by BRW_NEW_WM_SURFACES are no longer used
for just WM. They are also used for vertex texturing and transform
feedback. To avoid confusion, this patch renames BRW_NEW_WM_SURFACES
to BRW_NEW_SURFACES.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
X8 depth formats weren't supported until Ironlake (Gen 5).
Fixes GPU hangs introduced in d84a180417.
One example test case was "fbo-missing-attachment-blit from".
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Fixes piglit tests "EXT_transform_feedback/generatemipmap buffer" and
"EXT_transform_feedback/generatemipmap prims_written" on i965 Gen6.
Reviewed-by: Brian Paul <brianp@vmare.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Although i965 gen6 does not yet support ARB_transform_feedback2 or
NV_transform_feedback2, it needs to support pause/resume functionality
so that meta-ops will work correctly.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
When transform feedback is paused, it is legal to change programs or
to perform drawing operations using a drawing mode that doesn't match
the transform feedback mode.
Reviewed-by: Brian Paul <brianp@vmare.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
If a client calls BeginTransformFeedback(), then
PauseTransformFeedback(), then EndTransformFeedback(), we need to make
sure that the transform feedback object is not left in a "paused"
state, otherwise the next call to BeginTransformFeedback() will leave
transform feedback paused.
Reviewed-by: Brian Paul <brianp@vmare.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
During meta-operations (such as _mesa_meta_GenerateMipmap()), we need
to be able to draw even if GL_RASTERIZER_DISCARD is enabled. This
patch causes _mesa_meta_begin() to save the state of
GL_RASTERIZER_DISCARD and disable it (so that drawing can be done
during the meta-op), and causes _mesa_meta_end() to restore it.
Fixes piglit test "EXT_transform_feedback/generatemipmap discard" on
i965 Gen6.
Reviewed-by: Brian Paul <brianp@vmare.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This won't be used in the client-side libGL, but the xserver has to
generate a different protocol error depending on the reason context
creation failed.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Chia-I Wu <olv@lunarg.com>
There seems to have been two different ways to communicate the
profile. There were flags and there were profiles. I've opted to
remove the profile flags and use ST_PROFILE_DEFAULT (compatibility
profile) and ST_PROFILE_OPENGL_CORE (core profile) consistently
instead.
Also change the values of the ST_CONTEXT_FLAG_DEBUG and
ST_CONTEXT_FLAG_FORWARD_COMPATIBLE flags to match the WGL and GLX
values.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Chia-I Wu <olv@lunarg.com>
If the server returned BadContext, the error would just get droped on
the floor.
Fixes the piglit test glx-import-context-single-process
NOTE: This is a candidate for the 7.11 branch, but it also requires
the previous patch.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Only initialize vlc in MPEG2 decoding once for all slices,
add more sanity checks to vlc decoding functions, support
multiple vlc input buffer, improve documentation of the
vlc functions.
v2: also implement multiple inputs for the vlc functions
v3: some bug fixes for buffer size and alignment corner cases
v4: rework of the patch, some more improvements
Signed-off-by: Maarten Lankhorst <m.b.lankhorst@gmail.com>
Signed-off-by: Christian König <deathsimple@vodafone.de>
Double free and array overflow, even if only 2 members are
used the last one needs to be set to NULL explicitly.
Signed-off-by: Maarten Lankhorst <m.b.lankhorst@gmail.com
Hi all
This fixes a memory leak of 32 bytes on exit.
From 924f8fdccb41b011f372bc57252005bcdb096105 Mon Sep 17 00:00:00 2001
From: Lauri Kasanen <curaga@operamail.com>
Date: Thu, 22 Dec 2011 21:28:33 +0200
Subject: [PATCH] gallivm: Close a memory leak
As reported by "valgrind --leak-check=full glxgears".
Signed-off-by: Lauri Kasanen <curaga@operamail.com>
Signed-off-by: José Fonseca <jfonseca@vmware.com>
In the case where a front and back output are specified, the draw code will
copy the back output into the front color slot and everything is happy.
However if no front is specified then the draw code will do a bad copy (separate patch), but also the frag shader won't pick up the color as there there is
no write to COLOR from the vertex shader just BCOLOR.
This patch fixes that problem so if it can't find a vertex shader output
for the front color slot, it will go and lookup and use one for the back color
slot.
Signed-off-by: Dave Airlie <airlied@redhat.com>
fixing these makes piglit fbo-integer pass on softpipe.
modified to re-order things, haven't addressed Eric's concerns,
can't find anything in spec that mentions sign extensions, it does say
integers aren't clamped or modified.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This makes it easier to keep track of which dirty bits correspond to
which pieces of context, since it makes _NEW_RASTERIZER_DISCARD
correspond with ctx->RasterDiscard.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Previously we were storing the RasterDiscard flag (for
GL_RASTERIZER_DISCARD) in gl_context::TransformFeedback. This was
confusing, because we use the _NEW_TRANSFORM flag (not
_NEW_TRANSFORM_FEEDBACK) to track state updates to it, and because
rasterizer discard has effects even when transform feedback is not in
use.
This patch makes RasterDiscard a toplevel element in gl_context rather
than a subfield of gl_context::TransformFeedback.
Note: We can't put RasterDiscard inside gl_context::Transform, since
all items inside gl_context::Transform need to be pieces of state that
are saved and restored using PushAttrib and PopAttrib.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Previously, we only enabled transform feedback when
MESA_GL_VERSION_OVERRIDE was 3.0 or greater, since transform feedback
support was not completely finished, so it didn't make sense to
advertise support for it unless absolutely necessary.
Now that transform feedback is fully implemented on gen6, we can
enable this extension unconditionally.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This patch adds software-based PRIMITIVES_GENERATED and
TRANSFORM_FEEDBACK_PRIMITIVES_WRITTEN queries that work by keeping
track of the number of primitives that are sent down the pipeline, and
adjusting as necessary to account for the way each primitive type is
tessellated.
In the long run we'll want to replace this with a hardware-based
implementation, because the software approach won't work with geometry
shaders or primitive restart. However, at the moment, we don't have
the necessary kernel support to implement a hardware-based query (we
would need the kernel to save GPU registers when context switching, so
that drawing performed by another process doesn't get counted).
Fixes Piglit tests EXT_transform_feedback/query-primitives_generated-*
and EXT_transform_feedback/query-primitives-written-*.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Previously, i965 only supported two query types: GL_TIME_ELAPSED_EXT
and GL_SAMPLES_PASSED_ARB, and it distinguished between the two using
if/else statements that compared query->Base.Target to
GL_TIME_ELAPSED_EXT.
This patch changes the if/else statements to switch statements so that
we can add more query types without having to have a chain of
else-ifs.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We don't currently have kernel support for saving GPU registers on a
context switch, so if multiple processes are performing transform
feedback at the same time, their SVBI registers will interfere with
each other. To avoid this situation, we keep a software shadow of the
state of the SVBI 0 register (which is the only register we use), and
re-upload it on every new batch.
The function that updates the shadow state of SVBI 0 is called
brw_update_primitive_count, since it will also be used to update the
counters for the PRIMITIVES_GENERATED and
TRANSFORM_FEEDBACK_PRIMITIVES_WRITTEN queries.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This is needed by i965 to ensure that transform feedback counters are
not incremented during meta-ops.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This function computes the number of primitives that will be generated
when the given drawing operation is performed. It accounts for the
tessellation that is performed on line strips, line loops, triangle
strips, triangle fans, quads, quad strips, and polygons, so it is
suitable for implementing the primitive counters needed by transform
feedback.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
It isn't necessary to call FLUSH_VERTICES from bind_buffer_range,
because transform feedback buffers are not allowed to be changed when
transform feedback is active.
Thanks to Marek Olšák for pointing out this bug.
Reviewed-by: Marek Olšák <maraeo@gmail.com>
This patch enables rasterizer discard functionality (a part of
transform feedback) in Gen6, by generating an alternate GS program
when rasterizer discard is active. Instead of forwarding vertices
down the pipeline, the alternate GS program uses a URB Write message
to deallocate the URB entry that was allocated by FF sync and
terminate the thread.
Note: parts of the Sandy Bridge PRM seem to imply that we could do
this more efficiently, by clearing the GEN6_GS_RENDERING_ENABLE bit,
and not allocating a URB entry at all. However, it's not clear how we
are supposed to terminate the thread if we do that. Volume 2 part 1,
section 4.5.4, says "GS threads must terminate by sending a URB_WRITE
message with the EOT and Complete bits set.", and my experiments so
far confirm that.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
A common use case for transform feedback is to perform one draw
operation that writes transform feedback output to a buffer, followed
by a second draw operation that consumes that buffer as vertex input.
Since vertex input is consumed at an earlier pipeline stage than
writing transform feedback output, we need to flush the pipeline to
ensure that the transform feedback output is completely written before
the data is consumed.
In an ideal world, we would do some dependency tracking, so that we
would only flush the pipeline if the next draw call was about to
consume data generated by a previous draw call in the same batch.
However, since we don't have that sort of dependency tracking
infrastructure right now, we just unconditionally flush the buffer
every time glEndTransformFeedback() is called. This will cause a
performance hit compared to the ideal case (since we will sometimes
flush the pipeline unnecessarily), but fortunately the performance hit
will be confined to circumstances where transform feedback is in use.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Previous to this patch, the function intel_batchbuffer_emit_mi_flush()
was a bit of a misnomer. On Gen4+, when not using the blit engine, it
didn't actually flush the pipeline--it simply generated a PIPE_CONTROL
command with the necessary bits set to flush GPU caches. This was
usually sufficient, since in most situations where
intel_batchbuffer_emit_mi_flush() was called, all we really care about
was ensuring cache coherency.
However, with the advent of OpenGL 3.0, there are two cases in which
data output by one stage of the pipeline might be consumed, in a later
draw operation, by an earlier stage of the pipeline:
(a) When using textures in the vertex shader.
(b) When using drawing with a vertex buffer that was previously
generated using transform feedback.
This patch addresses case (a) by changing
intel_batchbuffer_emit_mi_flush() so that on Gen6+, it sets the
PIPE_CONTROL_CS_STALL bit (this forces the pipeline to actually
flush). (Case (b) will be addressed by the next patch in the series).
This is not an ideal solution--in a perfect world, the driver would
have some buffer dependency tracking so that we would only have to
flush the pipeline in the two cases above. Until that dependency
tracking is implemented, however, it seems prudent to have
intel_batchbuffer_emit_mi_flush() actually flush the pipeline, so that
we get correct rendering, at the expense of a (hopefully small)
performance hit.
The change is only applied to Gen6+, since at the moment only Gen6+
supports the OpenGL 3.0 features that make a full pipeline flush
necessary.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This patch advertises support for EXT_transform_feedback on Intel
Gen6.
Since transform feedback support is not completely finished yet, for
now we only advertise support for it when MESA_GL_VERSION_OVERRIDE is
3.0 or greater (since transform feedback is required by GL version
3.0).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This patch adds basic transform feedback capability for Gen6 hardware.
This consists of several related pieces of functionality:
(1) In gen6_sol.c, we set up binding table entries for use by
transform feedback. We use one binding table entry per transform
feedback varying (this allows us to avoid doing pointer arithmetic in
the shader, since we can set up the binding table entries with the
appropriate offsets and surface pitches to place each varying at the
correct address).
(2) In brw_context.c, we advertise the hardware capabilities, which
are as follows:
MAX_TRANSFORM_FEEDBACK_INTERLEAVED_COMPONENTS 64
MAX_TRANSFORM_FEEDBACK_SEPARATE_ATTRIBS 4
MAX_TRANSFORM_FEEDBACK_SEPARATE_COMPONENTS 16
OpenGL 3.0 requires these values to be at least 64, 4, and 4,
respectively. The reason we advertise a larger value than required
for MAX_TRANSFORM_FEEDBACK_SEPARATE_COMPONENTS is that we have already
set aside 64 binding table entries, so we might as well make them all
available in both separate attribs and interleaved modes.
(3) We set aside a single SVBI ("streamed vertex buffer index") for
use by transform feedback. The hardware supports four independent
SVBI's, but we only need one, since vertices are added to all
transform feedback buffers at the same rate. Note: at the moment this
index is reset to 0 only when the driver is initialized. It needs to
be reset to 0 whenever BeginTransformFeedback() is called, and
otherwise preserved.
(4) In brw_gs_emit.c and brw_gs.c, we modify the geometry shader
program to output transform feedback data as a side effect.
(5) In gen6_gs_state.c, we configure the geometry shader stage to
handle the SVBI pointer correctly.
Note: ordering of vertices is not yet correct for triangle strips
(alternate triangles are improperly oriented). This will be addressed
in a future patch.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This patch stores the geometry shader VUE map from a local variable in
compile_gs_prog() to a field in the brw_gs_compile struct, so that it
will be available while compiling the geometry shader. This is
necessary in order to support transform feedback on Gen6, because the
Gen6 geometry shader code that supports transform feedback needs to be
able to inspect the VUE map in order to find the correct vertex data
to output.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The Sandy Bridge PRM, volume 4, part 2, section 5.3.10 ("5.3.10
Register Region Restrictions") contains the following restriction on
the execution size and operand width of instructions:
"3. ExecSize must be equal to or greater than Width."
When emitting an IF instruction in single program flow mode on Gen6+,
we use an ExecSize of 1, therefore the Width of each operand must also
be 1.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
In _mesa_BindBufferRange(), we need to verify that the offset and size
specified by the client do not exceed the size of the underlying
buffer. We were accidentally doing this check using ">=" rather than
">", so we were generating a bogus error if the client specified an
offset and size that fit exactly in the underlying buffer.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This patch adds two new fields to the gl_transform_feedback_info
struct:
- BufferStride records the total number of components (per vertex)
that transform feedback is being instructed to store in each buffer.
- Outputs[i].DstOffset records the offset within the interleaved
structure of each transform feedback output.
These values are needed by the i965 gen6 and r600g back-ends, so it
seems better to have the linker provide them rather than force each
back-end to compute them independently.
Also, DstOffset helps pave the way for supporting
ARB_transform_feedback3, which allows the transform feedback output to
contain holes between attributes by specifying
gl_SkipComponents{1,2,3,4} as the varying name.
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Mapping to software and uploading again clearing is killing performance.
Signed-off-by: Maarten Lankhorst <m.b.lankhorst@gmail.com>
Signed-off-by: Christian König <deathsimple@vodafone.de>
Valgrind complains about a definitely lost block allocated in
intelNewTextureImage(). This leak was apparently created by
6e0f9001fe, "mesa: move
gl_texture_image::Data, RowStride, ImageOffsets to swrast", as it
removes the free() from _mesa_delete_texture_image().
Put the free() back, fixes a Valgrind error.
Signed-off-by: Pekka Paalanen <ppaalanen@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
There is no point in having them when we distribute eglext.h.
As for unofficial extensions, there is a chance that we might remove some of
them evetually. Keeping the #ifdef's for now should make that easier.
Update to revision 15052.
EGL_MESA_drm_image is now official. But apparently we have our own extension
to it and we need this in eglmesaext.h:
#ifdef EGL_MESA_drm_image
/* Mesa's extension to EGL_MESA_drm_image... */
#ifndef EGL_DRM_BUFFER_USE_CURSOR_MESA
#define EGL_DRM_BUFFER_USE_CURSOR_MESA 0x0004
#endif
#endif
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Previously, we advertised 0 VS texture units. Now that we have proper
support for using the sampling engine in the VS, we can advertise 16,
which is conveniently the number required for OpenGL 3.0.
v2: Enable on Gen4. I hacked up my tests to not use flat ivec varyings
and they pass.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This should avoid state-dependent FS recompiles when samplers that are
only used by the VS change.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The idea is to reuse this for the VS and (in the future) GS as well.
v2: Include yuvtex data since we're not dropping GL_MESA_ycbycr.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net> [v1]
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The visit() half computes the values to put in the header based on the
IR and simply stuffs that in the vec4_instruction; the emit() half uses
this to set up the message header. This works out well since emit() can
use brw_reg directly and access individual DWords without kludgery.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
We'll want to reuse this for the VS, and it's complex enough that I'd
rather not cut and paste it.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This translates the GLSL compiler's IR into vec4_instruction IR,
generating code to load coordinates, LOD info, shadow comparitors, and
so on into the appropriate message registers.
It turns out that the SIMD4x2 parameters are identical on Gen 5-7, and
the Gen4 code is similar enough that, unlike in the FS, it's easy enough
to support all generations in a single function.
v2: Load zeros for missing coordinates (fixing vs-texelFetch-sampler1D
and 2D on G45), and fix G45 message length for shadow comparisons.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
This is the part that takes the vec4_instruction IR and turns it into
actual Gen ISA.
v2: Add Gen4 messages, don't retype m0 to UW.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Prior to Ironlake, cube maps were stored as 3D textures. In recent
refactoring, we removed a separate "layers" parameter in favor of using
depth. Unfortunately, depth was getting minified, which is only correct
for actual 3D textures.
Fixes piglit tests:
- bugs/crash-cubemap-order
- fbo/fbo-cubemap
- texturing/cubemap
Also changes texturing/cubemap npot from abort to fail.
This hasn't seen a full test run since Piglit on Mesa master hangs
GM45 a lot.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
All of the extensions require that both libGL and either the server or
the direct rendering driver (or both) enable the extension before it's
advertised. It seems safe to assume that none of the other components
on OS X will enable these extensions, so all the #ifdef blocks here
just clutter the code.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Cc: Jeremy Huddleston <jeremyhu@apple.com>
There are a few unsupported extensions (e.g., the ATI and NV float
extensions) that are still in the list. There is some small chance
that these may be supported some day.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
__glXInitialize calls AllocAndFetchScreenConfigs.
AllocAndFetchScreenConfigs unconditionally sends a glXQuerySeverString
request to the server. This request is only supported with GLX 1.1 or
later, so we were already implicitly incompatible with GLX 1.0
servers. How many more similar bugs lurk in the code that nobody has
noticed in years?
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Previously the share_xid was only set in the glXImportContextEXT path,
and it was left set to None in all of the other create-context paths.
Fixes the piglit test glx-query-context-info-ext.
NOTE: This is a candidate for the 7.11 branch.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Send the DestroyContext protocol immediately when glXDestroyContext is
called, and never call it when glXFreeContextEXT is called. In both
cases, either destroy the client-side structures or, if the context is
current, set xid to None so that the client-side structures will be
destroyed later.
I believe this restores the behavior of the original SGI code. See
src/glx/x11 around commit 5df82c8. The spec doesn't say anything
about glXDestroyContext not really destroying imported contexts (it
acts like glXFreeContextEXT instead), but that's what the original
code did. Note that glXFreeContextEXT on a non-imported context does
not destroy it either.
Fixes the piglit test glx-free-context.
NOTE: This is a candidate for the 7.11 branch.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Fixes the piglit test glx-get-context-id.
NOTE: This is a candidate for the 7.11 branch.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
The primary problem was that the number of reply bytes read is clamped
to sizeof(propList), but the loop that processes the properties tries
to examine all of the properties sent by the server. If the server
sends 47,000 properties, we only read 3 but process all 47,000.
NOTE: This is a candidate for the 7.11 branch.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Each of the DRI, DRI2, and DRISW backends contain code like the
following in their create-context routine:
if (shareList) {
pcp_shared = (struct dri2_context *) shareList;
shared = pcp_shared->driContext;
}
This assumes that the glx_context *shareList is actually the correct
derived type. However, if shareList was created as an
indirect-rendering context, it will not be the expected type. As a
result, shared will contain garbage. This garbage will be passed to
the driver, and the driver will probably segfault. This can be
observed with the following GLX code:
ctx0 = glXCreateContext(dpy, visinfo, NULL, False);
ctx1 = glXCreateContext(dpy, visinfo, ctx0, True);
Create-context is the only case where this occurs. All other cases
where a context is passed to the backend, it is the 'this' pointer
(i.e., we got to the backend by call something from ctx->vtable).
To work around this, check that the shareList->vtable->destroy method
is the same as the destroy method of the expected type. We could also
check that shareList->vtable matches the vtable or by adding a "tag"
to glx_context to identify the derived type.
NOTE: This is a candidate for the 7.11 branch.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
This is not exposed generally yet because some of the swrast paths hit
in piglit (drawpixels, copypixels, blit) aren't yet converted to
MapRenderbuffer.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This is a little more unusual than the separate MESA_FORMAT_S8_Z24
support, because in addition to storing the real stencil data in a
MESA_FORMAT_S8 miptree, we also make the Z miptree be
MESA_FORMAT_Z32_FLOAT instead of the requested format.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
With separate stencil GL_DEPTH32F_STENCIL8, the miptree will have a
really different format (MESA_FORMAT_Z32_FLOAT) from the teximage
(MESA_FORMAT_Z32_FLOAT_X24S8).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The format handling here is tricky, because we're not actually
generating a Z32_FLOAT_X24S8 miptree, so we're guessing the format
that GL wants based on seeing Z32_FLOAT with a separate stencil.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
All the operations were just trying to get at irb->wrapped_depth->mt,
which is the same as irb->mt now.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
gen7 only supports the non-packed formats, even if you associate a
real separate stencil buffer -- otherwise it's as if the depth test
always fails.
This requires a little bit of care in the match_texture_image case,
since the miptree format no longer matches the texture image format.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This little bit of logic was duplicated, which isn't much, but I was
going to need to duplicate a bit of additional logic in the next
commit.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
There were only two places it was really used at this point, which was
in the batchbuffer emit of the separate stencil packets for gen6/7.
Just write in the ->stencil_mt reference in those two places and ditch
all this flailing around with allocation and refcounts.
v2: Fix separate stencil on gen7.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
this mentions which channels are used for slice and depth comparison values.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
The 4th texcoord is used in this case for the comparison.
This fixes piglit glsl-fs-shadow2DArray* on softpipe.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
The code didn't handle the case where front wasn't specified in the vertex
shader outputs, but back was.
In that case we were doing a copy from back to non-existant front,
this code checks we have existant front/backs and only does the copy when
they both exist.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Sets rgba layer as zeroth layer if a custom background_surface is specified.
Signed-off-by: Maarten Lankhorst <m.b.lankhorst@gmail.com>
Signed-off-by: Christian König <deathsimple@vodafone.de>
It's harmless to add support for attributes we don't support,
since they require a feature enabled for them to affect
something. As long as they aren't enabled, nothing happens.
This enables support for custom colorspaces and background colors.
Signed-off-by: Maarten Lankhorst <m.b.lankhorst@gmail.com>
Signed-off-by: Christian König <deathsimple@vodafone.de>
Currently only validating, since nothing else can be done with it yet
Signed-off-by: Maarten Lankhorst <m.b.lankhorst@gmail.com>
v2: removed check_video_surface
Signed-off-by: Christian König <deathsimple@vodafone.de>
This sample compare was always doing linear, and this makes the
glsl-fs-shadow1DArray test render like the Intel driver.
fix wrong 0->j from initial patch
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
This is the first part of a fix to piglit glsl-fs-shadow1DArray
also fix the passing of unused r[2] in the normal 1D case.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
The piglit draw-pixel-with-texture was asserting in the glsl->tgsi code,
due to 0 texture target, this makes sure the texture target is copied over
correctly when we copy instructions around.
v2: drive-by fix bitmap on the way past.
This avoids the assertion, have to contemplate fixing things as per the spec
later.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
This will be especially useful for loading texturing parameters, where I
need to (for example) reference m3.xz<D>.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Copy and pasted from fs_inst::is_tex(), but without TXB.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
We'll be reusing most of these for the VS shortly. The one exception is
TXB (texturing with LOD bias), which is explicitly forbidden in the VS.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Fixes a regression since d2235b0f46,
in my new textureSize sampler(1DArrayShadow|2DShadow|2DArrayShadow)
piglit tests, though I'm not honestly sure how this ever worked.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Noticed a "warning: array subscript is above array bounds" given at one of
the existing sanity-check asserts. Turns out all the arrays of strings
haven't matched the corresponding enum values in a while, if ever.
I didn't know the proper names for any of these and couldn't find
them in the base specs aside from "result.pointsize" in
ARB_vertex_program, so I just filled in the enum's value
as was done with other slots.
Also add four STATIC_ASSERT()s to be sure and catch future additions
or bumps to MAX_VARYING/etc again, and some more non-static asserts
where there weren't any before.
(Note, the fragment enum that corresponded to result.color(half) was removed in
8d475822e6e19fa79719c856a2db5b6a205db1b9.)
Reviewed-by: Brian Paul <brianp@vmware.com>
llvm-3.1svn r145714 moved global variables into a new TargetOptions
class. TargetMachine constructor now needs a TargetOptions object as
well.
Signed-off-by: Vinson Lee <vlee@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
This helper function is used during mipmap generation to prepare space
for the destination mipmap levels.
This improves/fixes two things:
1. If the texture object was created with glTexStorage2D, calling
_mesa_TexImage2D() to allocate the new image would generate
INVALID_OPERATION since the texture is marked as immutable.
2. _mesa_TexImage2D() always frees any existing texture image memory
before allocating new memory. That's inefficient if the existing
image is the right size already.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Namely:
- EXT_transform_feedback
- ARB_transform_feedback2
- ARB_transform_feedback_instanced
The old interface was not useful for OpenGL and had to be reworked.
This interface was originally designed for OpenGL, but additional
changes have been made in order to make st/d3d1x support easier.
The most notable change is the stream-out info must be linked
with a vertex or geometry shader and cannot be set independently.
This is due to limitations of existing hardware (special shader
instructions must be used to write into stream-out buffers),
and it's also how OpenGL works (stream outputs must be specified
prior to linking shaders).
Other than that, each stream output buffer has a "view" into it that
internally maintains the number of bytes which have been written
into it. (one buffer can be bound in several different transform
feedback objects in OpenGL, so we must be able to have several views
around) The set_stream_output_targets function contains a parameter
saying whether new data should be appended or not.
Also, the view can optionally be used to provide the vertex
count for draw_vbo. Note that the count is supposed to be stored
in device memory and the CPU never gets to know its value.
OpenGL way | Gallium way
------------------------------------
BeginTF = set_so_targets(append_bitmask = 0)
PauseTF = set_so_targets(num_targets = 0)
ResumeTF = set_so_targets(append_bitmask = ~0)
EndTF = set_so_targets(num_targets = 0)
DrawTF = use pipe_draw_info::count_from_stream_output
v2: * removed the reset_stream_output_targets function
* added a parameter append_bitmask to set_stream_output_targets,
each bit specifies whether new data should be appended to each
buffer or not.
v3: * added PIPE_CAP_STREAM_OUTPUT_PAUSE_RESUME for ARB_tfb2,
note that the draw-auto subset is always required (for d3d10),
only the pause/resume functionality is limited if the CAP is not
advertised
v4: * update gallium/docs
v5: * compactified struct pipe_stream_output_info, updated dump/trace
It's like DrawArrays, but the count is taken from a transform feedback
object.
This removes DrawTransformFeedback from dd_function_table and adds the same
function to GLvertexformat (with the function parameters matching GL).
The vbo_draw_func callback has a new parameter
"struct gl_transform_feedback_object *tfb_vertcount".
The rest of the code just validates states and forwards the transform
feedback object into vbo_draw_func.
Xa doesn't support it yet. Trying to do that would cause a segfault.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com>
When doing format conversion copies between a format without an
alpha channel and a format with an alpha channel, make sure the
destination alpha is set to 1.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com>
Backends indicate that they support this extension by returning
EGL_TRUE when native_display::get_param() is called with
NATIVE_PARAM_PRESENT_REGION and NATIVE_PARAM_PRESERVE_BUFFER.
native_present_control is extended to include the region that should
be presented. When native_present_control::num_rects is zero,
the whole surface is to be presented.
Signed-off-by: Fredrik Höglund <fredrik@kde.org>
The comment said they deserved to be in emit_depthbuffer, and at this
point they were all there already.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Now that we have miptrees for everything, we can more easily test for
!has_separate_stencil completeness. Also, test for whether the
stencil rb is the wrong kind of format for separate stencil, or if we
are trying to do packed to different images of a single miptree.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Now there's the thing that CALLOCs and sets up window system vtable,
and the thing that CALLOCs and sets up user renderbuffer vtable. The
user renderbuffer vtable gets replaced later by
intel_renderbuffer_update_wrapper for wrapped renderbuffers (things
with name == ~0).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We were doing it in the caller in the renderbuffer code, but it was
missed in the separate stencil creation for textures. Apparently our
testing was using renderbuffers or pre-aligned sizes.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This used to be needed because irb->mt would be unset for fake packed
depth/stencil, but no longer.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The cool part was that in the "fbo-depthstencil -drawpixels
GL_DEPTH24_STENCIL8 32F_24_8_REV" testcase, the shifting happened to
end up with a value awfully close to the expected value, except for
every other pixel being 0 (the stencil value, shifted away to
nothing).
Reviewed-by: Brian Paul <brianp@vmware.com>
vlVdpPresentationQueueDisplay shouldn't scale, so
use size of destination surface as source rectangle.
Based on work of Maarten Lankhorst <m.b.lankhorst@gmail.com>
Signed-off-by: Christian König <deathsimple@vodafone.de>
Correctly use destination_rect and destination_video_rect
in the mixer, and also use a dirty area tracking for output surfaces.
Based on work of Maarten Lankhorst <m.b.lankhorst@gmail.com>
Signed-off-by: Christian König <deathsimple@vodafone.de>
Take viewport and scissors into account and make
the dirty area a parameter instead of a member.
Signed-off-by: Christian König <deathsimple@vodafone.de>
Invalid shaders containing the character % at an unexpected location
would cause Bison to call yyerror with a message of:
syntax error, unexpected '%'
Bison expects yyerror() to take a string, while _mesa_glsl_error() is a
printf-style function. This hit the classic printf string escape issue:
_mesa_glsl_error(loc, state, "unexpected '%'"); // invalid!
_mesa_glsl_error(loc, state, "%s", "unexpected '%'"); // correct.
This caused assertion failures after ralloc_asprintf_append called
vsnprintf to determine the length of the text that would be printed:
vsnprintf would see the invalid format and return -1, an invalid length.
The solution is to define a proper yyerror() wrapper function that calls
_mesa_glsl_error with the "%s". Since we compile with -p "_mesa_glsl",
yyerror is defined as:
#define yyerror _mesa_glsl_error
So we have to #undef yyerror in order to be able to declare it.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=43564
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Paul Berry <stereotype441@gmail.com>
The versions in the xserver and in libGL have diverged enough that the
xserver doesn't want these.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
glext.h doesn't have GL_MIN_PROGRAM_TEXEL_OFFSET_EXT or
GL_MAX_PROGRAM_TEXEL_OFFSET_EXT. Using them in the XML causes code to
be generated for the xserver that won't compile. Use the names that
exist instead.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
That file was removed from the xserver with commit:
commit a80780a7638f847c3be20e5e0c7fe85e83d9bdd1
Author: Adam Jackson <ajax@redhat.com>
Date: Wed Nov 17 09:03:06 2010 -0500
glx: Remove swap barrier and hyperpipe support
Never implemented in any open source driver. The implementation
assumed explicit DDX driver knowledge of how the client-side driver
worked, since at the time the server's GL renderer was not a DRI driver.
But now, it is, so any implementation of these should be done with
additional DRI driver API, like the swap control extension.
Reviewed-by: Julien Cristau <jcristau@debian.org>
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Signed-off-by: Adam Jackson <ajax@redhat.com>
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
This is only temporary until a better solution is available.
v2: print warnings and add gallium CAPs
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Since gl_framebuffer::_DepthBuffer and _StencilBuffer are only used
by swrast, do the validation of those fields in swrast too.
The main/depthstencil.[ch] code is no longer used and will be removed
next.
Reviewed-by: Eric Anholt <eric@anholt.net>
These files are copies of main/depthstencil.[ch] with s/mesa/swrast/.
The main/depthstencil.[ch] will go away soon.
Reviewed-by: Eric Anholt <eric@anholt.net>
These functions update the gl_framebuffer::_DepthBuffer and _StencilBuffer
fields, possibly creating renderbuffer wrappers that make a shared
depth+stencil accessible as depth-only or stencil only.
This stuff is only used by swrast now and will be moved there next.
Reviewed-by: Eric Anholt <eric@anholt.net>
We're just looking at the depth/stencil renderbuffers to do error
checking. We don't need to look at the depth/stencil wrappers to do
that. Also, remove pointless readRb = depthRb = NULL assignments.
Reviewed-by: Eric Anholt <eric@anholt.net>
We never want to use the depth/stencil buffer wrappers so always just
use the attachment renderbuffers. This is a step toward removing the
_DepthBuffer, _StencilBuffer fields.
Reviewed-by: Eric Anholt <eric@anholt.net>
GLfloat doesn't have enough precision to exactly represent 0xffffff
and 0xffffffff. (and a reciprocal of those, if I am not mistaken)
If -ffast-math is enabled, using GLfloat causes assertion failures in:
- fbo-blit-d24s8
- fbo-depth-sample-compare
- fbo-readpixels-depth-formats
- glean/depthStencil
For example:
fbo-depth-sample-compare: main/format_unpack.c:1769:
unpack_float_z_Z24_X8: Assertion `dst[i] <= 1.0F' failed.
Reviewed-by: Brian Paul <brianp@vmware.com>
fixes the following build error since
c83fb4d45f:
/usr/include/strings.h:46:13: error: expected declaration specifiers or
‘...’ before numeric constant
/usr/include/strings.h:46:13: error: conflicting types for ‘memset’
In file included from
../../../../src/gallium/winsys/g3dvl/xlib/xsp_winsys.c:34:0:
../../../../src/gallium/auxiliary/util/u_inlines.h: In function
‘pipe_buffer_create’:
../../../../src/gallium/auxiliary/util/u_inlines.h:189:4: error: too
many arguments to function ‘memset’
/usr/include/strings.h:46:13: note: declared here
bzero is defined in X11 as: #define bzero(b,len) memset(b,0,len)
including strings.h after the X11 header results in preprocessor
replacing 'bzero' in strings.h and generating unbuildable code.
Signed-off-by: Tobias Droste <tdroste@gmx.de>
This fixes the segfault, and seems to put this closer to where other
properties are being set. Hopefully it still conforms.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This just adds the correct checks and asserts in the right places. This doesn't
fix all the tests that I've sent to piglit, need to add int paths to go alongside the uint paths that don't go via float to fix it up properly.
I'm not sure how much of that could be templated/shared will have a look
once I write it the long way.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
It sets the wrong values (GL_XXX_LEFT instead of GL_XXX), and no other
Mesa driver does this, given that Mesa sets the right draw/read buffers
provided the Mesa visual has the doublebuffer flag filled correctly
which is the case.
Reviewed-by: Brian Paul <brianp@vmware.com>
This avoids forming invalid pointers needlessly, which even if
never dereferenced is undefined behavior. It also makes
_mesa_validate_pbo_access() more comprehensible.
Reviewed-by: Brian Paul <brianp@vmware.com>
NULL as an error indicator is meaningless, since it will return NULL
on success anyway if the caller passes in zero as the image's address
and asks to calculate the offset of the first pixel. For example,
_mesa_validate_pbo_access() does this.
This also matches the code in the non-GL_BITMAP codepath, which
already has an assert like this.
v2: Per Brian Paul's review, remove the function call entirely
and tighten the assert to only accept the two formats compatible with
GL_BITMAP. They always have one component per pixel.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The bug, reported to me by Vadim Girlin on IRC, was causing overzealous
elimination of code in parallel if statements such as the following:
if (x) {
r = false;
}
if (y) {
r = true;
}
Before this commit, the assignment inside the first if block would be
misdetected as dead code and removed.
Number of fragment shader variants is not very representative of the
memory used by LLVM, neither is number of shader instructions, as often
texture sampling constitutes most of the generated code.
This change adds an additional trim criteria: least recently used
fragment shader variants will be freed until the total number of LLVM IR
instruction falls below a specified threshold.
Reviewed-by: Brian Paul <brianp@vmware.com>
u_simple_list.h uses a sentinel element, and not a NULL element. So
ensure list is not empty when reducing the list of shader variants.
Something I noticed while trying to free variants more aggressively.
Reviewed-by: Brian Paul <brianp@vmware.com>
In a few places we need to allocate space for some number of generic
pixels. Use this new define instead of a magic number like 16 or
4 * sizeof(GLuint).
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Copying these files is the first step in moving the software buffer
code from main/renderbuffer.c to swrast/s_renderbuffer.c
Reviewed-by: Eric Anholt <eric@anholt.net>
Implemented in terms of renderbuffer mapping/unmapping and format
packing/unpacking functions.
The swrast and state tracker code for implementing accumulation are
unused and will be removed in the next commit.
v2: don't use memcpy() in _mesa_clear_accum_buffer()
v3: don't allocate MAX_WIDTH arrays, be more careful with mapping flags
Reviewed-by: Eric Anholt <eric@anholt.net>
Change and document the interpretation of the color conversion matrix
in order to make the function more versatile and to simplify the
generated shader.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com>
In Gen6, transform feedback is accomplished by having the geometry
shader send vertex data to the data port using "Streamed Vertex Buffer
Write" messages, while simultaneously passing vertices through to the
rest of the graphics pipeline (if rendering is enabled).
This patch adds a geometry shader program that simply passes vertices
through to the rest of the graphics pipeline. The rest of transform
feedback functionality will be added in future patches.
To make the new geometry shader easier to test, I've added an
environment variable "INTEL_FORCE_GS". If this environment variable
is enabled, then the pass-through geometry shader will always be used,
regardless of whether transform feedback is in effect.
On my Sandy Bridge laptop, I'm able to enable INTEL_FORCE_GS with no
Piglit regressions.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Eric Anholt <eric@anholt.net>
R02_PRIM_END and R02_PRIM_START don't actually refer to bits in DWORD
2 of R0 (as the name, and comments in the code, would seem to
indicate). Actually they refer to bits in DWORD 2 of the header for
URB_WRITE messages.
This patch renames the defines to reflect what they actually mean. It
also addes a define URB_WRITE_PRIM_TYPE_SHIFT, which previously was
just hardcoded in .c files.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Prior to this patch, in the Gen4 and Gen5 GS, we used GRF 0 (called
"R0" in the code) as a staging area to prepare the message header for
the FF_SYNC and URB_WRITE messages. This cleverly avoided an
unnecessary MOV operation (since the initial value of GRF 0 contains
data that needs to be included in the message header), but it made the
code confusing, since GRF 0 could no longer be relied upon to contain
its initial value once the GS started preparing its first message.
This patch avoids confusion by using a separate register ("header") as
the staging area, at the cost of one MOV instruction.
Worse yet, prior to this patch, the GS would completely overwrite the
contents of GRF 0 with the writeback data it received from a completed
FF_SYNC or URB_WRITE message. It did this because DWORD 0 of the
writeback data contains the new URB handle, and that neds to be
included in DWORD 0 of the next URB_WRITE message header. However,
that caused the rest of the message header to be corrupted either with
undefined data or zeros. Astonishingly, this did not produce any
known failures (probably by dumb luck). However, it seems really
dodgy--corrupting FFTID in particular seems likely to cause GPU hangs.
This patch avoids the corruption by storing the writeback data in a
temporary register and then copying just DWORD 0 to the header for the
next message. This costs one extra MOV instruction per message sent,
except for the final message.
Also, this patch moves the logic for overriding DWORD 2 of the header
(which contains PrimType, PrimStart, PrimEnd, and some other data that
we don't care about yet). This logic is now in the function
brw_gs_overwrite_header_dw2() rather than in brw_gs_emit_vue(). This
saves one MOV instruction in brw_gs_quads() and brw_gs_quad_strip(),
and paves the way for the Gen6 GS, which will need more complex logic
to override DWORD 2 of the header.
Finally, the function brw_gs_alloc_regs() contained a benign bug: it
neglected to increment the register counter when allocating space for
the "temp" register. This turned out not to have any effect because
the temp register wasn't used on Gen4 and Gen5, the only hardware
models (so far) to require a GS program. Now, all the registers
allocated by brw_gs_alloc_regs() are actually used, and properly
accounted for.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
When the GS is not in use, the entire URB space is available for the
VS. When the GS is in use, we split the URB space 50/50.
The 50/50 split is probably not optimal--we'll probably want tune this
for performance in a future patch. For example, in most situations,
it's probably worth allocating more than 50% of the space to the VS,
since VS space is used for vertex caching. But for now this is good
enough.
Based on previous work by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
We never filled this in before because we didn't care.
I'm skeptical these are correct; my sources indicate that both the VS
and GS # of entries are 256 on both GT1 and GT2.
I'm also loathe to change it and break stuff.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Normally when outputting instructions in SPF (single program flow)
mode, we convert IF and ELSE instructions to conditional ADD
instructions applied to the IP register. On platforms prior to Gen6,
flow control instructions cause an implied thread switch, so this is a
significant savings.
However, according to the SandyBridge PRM (Volume 4 part 2, p79):
[Errata DevSNB{WA}] - When SPF is ON, IP may not be updated by
non-flow control instructions.
So we have to disable this optimization on Gen6.
On later platforms, there is no significant benefit to converting flow
control instructions to ADDs, so for the sake of consistency, this
patch disables the optimization on later platforms too.
The reason we never noticed this problem before is that so far we
haven't needed to use SPF mode on Gen6. However, later patches in
this series will introduce a Gen6 GS program which uses SPF mode.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Previously, GS generation code contained a lookup table that mapped
primitive types POLYGON, TRISTRIP, and TRIFAN to TRILIST, mapped
LINESTRIP to LINELIST, and left all other primitives unchanged. This
was silly, because we never generate a GS program for those primitive
types anyhow.
This patch removes the unnecessary lookup table.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This patch adds a new bit to the ctx->NewState bitfield,
_NEW_TRANSFORM_FEEDBACK, to track state changes that affect
ctx->TransformFeedback. This bit can be used by driver back-ends to
avoid expensive recomputations when transform feedback state has not
been modified.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
When driCreateScreen calls driConvertConfigs to try to convert the
configs for swrast, it fails and returns NULL. Instead of checking,
it just clobbers psc->base.configs. Then, when the application asks
for the FBconfigs, there aren't any.
Instead, make the caller responsible for freeing the old modes lists
if both calls to driConvertConfigs succeed.
Without the second fix, glxinfo fails unless you run it with
LIBGL_ALWAYS_INDIRECT:
$ glxinfo
name of display: :0.0
Error: couldn't find RGB GLX visual or fbconfig
$ LIBGL_ALWAYS_INDIRECT=1 glxinfo
name of display: :0.0
display: :0 screen: 0
direct rendering: No (LIBGL_ALWAYS_INDIRECT set)
server glx vendor string: NVIDIA Corporation
server glx version string: 1.4
[...]
Signed-off-by: Aaron Plattner <aplattner@nvidia.com>
Reviewed-and-tested-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Brian Paul <brianp@vmware.com>
This patch fixes the samplerCubeShadow support in GLSL shader compiler.
shader compiler was picking the 'r' texture coordinate for shadow comparison
when the expected behaviour is to use 'q' texture coordinate in case of cube
shadow maps.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Fixes piglit tests fbo-array, fbo-depth-array, fbo-generatemipmap-array,
and array-texture, as well as the array variants of my new textureSize
and texelFetch tests.
Not a candidate for 7.11 because EXT_texture_array wasn't supported.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Fixes many crashes on Ivybridge due to upload_sf_state calling
brw_depthbuffer_format without an actual depth buffer. This was a
recent regression on master.
+3992 piglits on Ivybridge.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This evolved over several commits, and I also wanted to document some
new information about how we handle formats.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Now that all RBs have miptrees, and miptree mapping covered these last
two code paths, consistently use them.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Right now the fake packed d/s RBs are creating two sub-renderbuffers
with their own storage, and the hardware setup and the mapping code
have been explicitly referencing them. By setting miptrees on them,
we'll be able to make our renderbuffer code for fake packed
depth/stencil more consistent with all our other renderbuffers.
The interesting new behavior here is that there is now a mt with a
non-depthstencil format (X8Z24) that has a stencil_mt field
associated. This looks like it should be safe, and we'll need to be
able to do this for floating point depth/stencil as well.
Before, we had an uncached read of S8 to untile, then a RMW (so
uncached penalty) of the packed S8Z24 to store the value, then the
consumer would uncached read that once per pixel. If data was written
to the map, we would then have to uncached read the written data back
out and do the scatter to the tiled S8 buffer (also uncached access
penalties, since WC couldn't actually combine). So 3 or 5 uncached
accesses per pixel in the ROI (and we we were ignoring the ROI, so it
was the whole image).
Now we get an uncached read of S8 to untile, and an uncached read of
Z. The consumer gets to do cached accesses. Then if data was
written, we do streaming Z writes (WC success), and scattered S8
tiling writes (uncached penalty). So 2 or 3 uncached accesses per
pixel in the ROI.
This should be a performance win, to the extent that anybody is doing
software accesses of packed depth/stencil buffers.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
We don't gripe about void * arithmetic for our driver, and this
prevents silly casting when assigning the result of mapping to
non-byte types.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
We're going to want to reuse this logic in mapping of fake packed
miptrees wrapping separate depth/stencil miptrees.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
This code will be incrementally moving to a model like intel_fbo.c's
renderbuffer mapping with helper functions, as I move that code here.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
This will be used for things like packed depth/stencil temporaries and
making LLC-cached temporary mappings using blits.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
This will let us share teximage mapping logic with renderbuffer
mapping, which has an intel_mipmap_tree but not a gl_texture_image.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
This required is_hiz_depth_format to start returning true on S8_Z24 as
well, since that's the format we have here. The two previous callers
are only calling it on non-depthstencil formats.
This avoids us needing to have HiZ working on a new Z format
immediately upon exposing the format (particularly painful for
Z32_FLOAT_X24S8, which means all the fake packed depth/stencil paths).
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Some hardware can't reinterpret the format of hardware buffers and thus
the X server needs to know the format when the buffer is created.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Michel Daenzer <michel@daenzer.net>
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com>
See intel_vertical_texture_alignment_unit() in intel_tex_layout.c;
certain surface types require setting this to VALIGN_4.
Analogous to commit dd0e46c410 on Gen6.
Fixes piglit test fbo-generatemipmap-formats with the
GL_ARB_depth_texture and GL_EXT_packed_depth_stencil arguments.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
The code forces single program flow to be enabled on Ironlake, or
equivalently, disables multiple program flow. The comment was reversed.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This moves the detiling to the fbo mapping, r200 depth is always tiled,
and we can't detile it with the blitter.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This could have been split up better, but the driver is just broken now,
so bisecting the brokenness is going to be painful no matter what.
This adds renderbuffer mapping/unmapping along with texture image allocation.
It drops all the old texture upload paths, some of which could possible be
reimplemented with the blitter later.
It also redoes the span code paths to use its own set of image mapping handlers,
along with removing the tiling decode paths for the color buffers, since
we now hope to use the blitter for this.
Signed-off-by: Dave Airlie <airlied@redhat.com>
I think there is a missing state update or flush somewhere, and every
so often PP_CNTL goes to the kernel with a texture enabled but no texture.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Previously a zero writemask would result in dst_chan == -1, meaning an
unnecessary MOV with the destination register dictated by undefined
memory contents would be emitted before returning. This caused
intermittent GPU hangs, e.g. with glean/texCombine.
Reviewed-by: Eric Anholt <eric@anholt.net>
Anything of less than (bw, bh) size is possible when you consider
rectangular textures, and this code is (now) safe for those. Even for
power-of-two textures, width could be 4 for FXT1 while not being
aligned to block size.
Fixes piglit compressedteximage GL_COMPRESSED_RGB_FXT1_3DFX
Reviewed-by: Brian Paul <brianp@vmware.com>
Generally this code works with width and height aligned to compressed
blocks, but at the 2x2 and 1x1 levels of a square texture (or height <
bh in general), we were skipping uploading our single row of blocks.
Fixes piglit compressedteximage GL_COMPRESSED_RGBA_S3TC_DXT5_EXT.
Reviewed-by: Brian Paul <brianp@vmware.com>
Since the MapTextureImage changes on Intel, nwn had corruption in the
scrollbar at the load game menu, and corrupted ground textures in the
starting zone. Heroes of Newerth's intro screen was also thoroughly
garbled. A new piglit test "compressedteximage" was created to
regression test this.
The issue was this code now seeing dstRowStride aligned to hardware
requirements instead of a temporary buffer that gets uploaded to
hardware later. The existing code was just trying to memcpy
srcRowStride * height / bh, while the glCompressedTexSubImage2D()
storage code nearby did the correct walking by blockheight rows at a
time. Just reuse the subimage upload instead of duplicating that
logic.
v2: Update comment at the top of the function (suggestion by Joel
Forsberg)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=41451
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1)
We checked if srcType == GL_UNSIGNED_BYTE earlier so there was no
way to reach this code. This was left-over code from the GLchan
removal work.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
If mode is not GL_POINT/LINE/FILL we'll have already reported the
error earlier in the function and returned so we can never get here.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
We shouldn't call _mesa_error() if the target is a proxy texture.
Errors are handled later in the function.
Fixes a Coverity warning.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Attempting to move an MRF to a MRF is not only pointless, it will fail
because MRFs are read-only, resulting in garbage in your register.
If we already set up a MRF source, there's nothing to resolve anyway.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
The BITSET64_{TEST,SET,CLEAR}_RANGE macros only work on ranges
wither in the lower 32 or in the upper 32 bits of the bitset.
This change extends these macros to work on arbitrary ranges
possibly crossing the bitset word boundary.
Signed-off-by: Mathias Froehlich <Mathias.Froehlich@web.de>
Reviewed-by: Brian Paul <brianp@vmware.com>
The format is defined by GL_OES_compressed_ETC1_RGB8_texture.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Add support for GL_OES_compressed_ETC1_RGB8_texture to core mesa. There is no
driver support yet.
Unlike desktop GL compressed texture formats, GLES compressed texture formats
usually can only be used with glCompressedTexImage2D. All other gl*Tex*Image*
functions are updated to check for that.
Reviewed-by: Brian Paul <brianp@vmware.com>
The format is defined by GL_OES_compressed_ETC1_RGB8_texture. These routines
will be used in the following commit.
Reviewed-by: Brian Paul <brianp@vmware.com>
In swrast_map_renderbuffer negative strides lead to
render buffer map pointers that are off by 2^32.
Make sure that intermediate negative values are not
converted to an unsigned.
Signed-off-by: Mathias Froehlich <Mathias.Froehlich@web.de>
Reviewed-by: Brian Paul <brianp@vmware.com>
Fixes these GCC warnings.
u_vbuf.c: In function ‘u_vbuf_draw_begin’:
u_vbuf.c:839:20: warning: ‘max_index’ may be used uninitialized in this function [-Wuninitialized]
u_vbuf.c:838:20: warning: ‘min_index’ may be used uninitialized in this function [-Wuninitialized]
Signed-off-by: Vinson Lee <vlee@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
We weren't doing the necessary byte swap.
v2: use same arithmetic as unpack_ARGB1555() to be consistent.
Reviewed-by: Michel Dänzer <michel@daenzer.net>
In the refactor for handling user-defined out params, we failed to set
up the new color output tracking when there was no color drawbuffer in
place but alpha testing was on. Just always set up at least one when
handling gl_FragColor, since we won't make use of its value unless we
need to.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=42806
In 6d874d0ee1, I checked whether a
register that had been stored was BAD_FILE (as opposed to a legitimate
GRF), but actually the unset register was ARF NULL because it had been
memset to 0. Finding BAD_FILE for unset values in debugging was my
intention with that file, so make it the case more often by
rearranging the enum. There was only one place we relied on the magic
enum register_file to hardware register file correspondance anyway.
It is useful to have this option for shader-db, and it was also good
at the time where we were rejecting shaders due to various internal
limits we hadn't supported yet. However, at this point the precompile
step takes extra time (since not all NOS is known at link time) and
spews misleading debug in the common case of debugging a real app.
This is left in place for VS, where we still have a couple of codegen
failure paths that result in link failure through precompile. Those
need to be fixed.
shader-db can still get at the debug info it wants using
"shader_precompile=true" driconf option. Long term, we can probably
build a good-enough app for shader-db to trigger real codegen.
When new MESA_FORMAT_x enums are added we need to add a new entry in
the table of texture fetch functions. In the past this has been
missed if swrast isn't actually tested. Using a static assertion
should help with that.
This can be used to check that tables have the right number of entries,
etc. at compile-time. This will hopefully catch things that are missed
if particular drivers aren't tested, for example.
v2: Simplify the macro to omit the extra line number info (the compiler
already indicates the line number). And wrap the macro for readability.
There was only one consumer of this API, meta.c, which was intending
to ask "is this format just stencil index (and nothing else)?".
Instead, if one tried to glDrawPixels of GL_DEPTH_STENCIL-type
formats, it would just try to draw the stencil parts. Nothing good
came of this.
This function looks rather silly at this point, but I'm leaving it in
place to be the obvious parallel API to _mesa_is_depth_format(). Note
that if you want the old behavior, you should use it as
(_mesa_is_stencil_format() || _mesa_is_depthstencil_format()) like is
commonly done for depth-related tests.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Asking for the datatype of MESA_FORMAT_Z32_FLOAT_X24S8 is a bit funny
-- there's a float depth channel, and a stencil channel that doesn't
have a particular GLenum associated with its type, so what's the
correct response?
Because there is no query for stencil, just make this format's
datatype be that of the depth channel. It fixes the depth query (and
thus a failure in piglit gl-3.0-required-sized-formats), and none of
the other consumers of the _mesa_get_format_datatype() API care.
v2: Add a comment for why the DataType is this way for this format.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We were already doing it through the shader (layered underneath
GL_EXT_texture_swizzle) in the shadow compare case. This avoids
having per-format logic for switching out the surface format dependent
on the depth mode.
v2: Also do the swizzling for DEPTH_STENCIL. oops.
Reviewed-by: Ian Romanick <idr@freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
I tripped over this bug in the next commit, relying on our
EXT_texture_swizzle to do some shadow sampler-related swizzling. If a
writemask was masking out a channel of the destination that was a live
channel of the texture swizzle, it would read undefined values.
Fixes piglit ARB_fragment_program_shadow/masked.
Reviewed-by: Ian Romanick <idr@freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This will make handling new formats (like actually exposing Z32F)
easier and more reliable.
v2: Remove the check for hiz buffer -- the MESA_FORMAT should really
be giving us the value we want even for hiz.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Complicates Gallium3D development and doesn't seem to have active users.
Signed-off-by: Kai Wasserbäch <kai@dev.carbon-project.org>
Signed-off-by: José Fonseca <jfonseca@vmware.com>
For the non-separate-stencil-only case, we've been using a NULL
surface for depth, so we didn't have to care. However, to support
separate stencil with no depthbuffer, we have to make the depth
surface non-NULL or the stencil test always fails thanks to separate
stencil inheriting the surface type of depth.
Fixes hiz-depth-stencil-test-d0-s8.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Page 77 (page 91 of the PDF) says about glGetActiveAttrib:
"The returned attribute name can be the name of a generic
attribute or a conventional attribute (which begin with the prefix
"gl_", see the OpenGL Shading Language specification for a
complete list)."
Page 261 (page 275 of the PDF) says about glGetProgramiv:
"If pname is ACTIVE_ATTRIBUTES, the number of active attributes in
program is returned."
It doesn't say anything about built-in vs. user-defined attributes.
From the language around glGetActiveAttrib and the lack of an
exclusion of built-in attributes, which exists other places (e.g.,
around glBindAttribLocation), we can infer that GL_ACTIVE_ATTRIBUTES
should include the active attribute count. It should also be included
in the values returned by glGetActiveAttrib.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=43138
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Tested-by: Yi Sun <yi.sun@intel.com>
To each switch statement in s_texfilter.c, add a break statement to the
default case.
Eliminates the Eclipse static analysis warning: No break at the end of
this case.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Replace the distinct struct gl_client_array members in gl_array_object by
an array of gl_client_arrays indexed by VERT_ATTRIB_*.
Renumber the vertex attributes slightly to keep the old semantics of the
distinct array members. Make use of the upper 32 bits in VERT_BIT_*.
Update all occurances of the distinct struct members with the array
equivalents.
Signed-off-by: Mathias Froehlich <Mathias.Froehlich@web.de>
Reviewed-by: Eric Anholt <eric@anholt.net>
Make gl_program::InputsRead a 64 bits bitfield.
Adapt the intel and radeon driver to handle a 64 bits
InputsRead value.
Signed-off-by: Mathias Froehlich <Mathias.Froehlich@web.de>
Reviewed-by: Eric Anholt <eric@anholt.net>
Introduce a set of defines for VERT_ATTRIB_* and VERT_BIT_*
that will be used in the followup patches.
Signed-off-by: Mathias Froehlich <Mathias.Froehlich@web.de>
Reviewed-by: Eric Anholt <eric@anholt.net>
According opengl spec 4.2.pdf table 6.12 (Vertex Array Object State) at
page 515, the element buffer object is listed in vertex array object.
So, move the ElementArrayBufferObj inside gl_array_object to make
element buffer object per-vao.
This would fix most of(3 left) intel oglc vao test fail
NOTE: this is a candidate for the 7.11 branch.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
The position of the red and green bits was misstated in the comments.
Arguably, the names of these formats should be changed to "GR" to reflect
the component ordering and to be consistent with other formats.
And warn loudly in case people want to use it. Too many tester report
gpu hangs on irc and we rootcause this ...
Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
If alpha test is enabled and there's no color buffers we still need the
fragment shader to emit a color.
v2: add _NEW_COLOR flag in _mesa_update_state_locked()
Fixes piglit fbo-alphatest-nocolor-ff failures with Gallium drivers.
Reviewed-by: Eric Anholt <eric@anholt.net>
Tested-by: Eric Anholt <eric@anholt.net> (i965)
The source array elements are 8-bytes (float + uint) so we need
to multiply the src index by 2 to get the right array stride.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This format is used in the ARB_texture_rgb10_a2ui spec.
It adds core mesa support, texformat + texstore support, format_unpack
and fbobject.c (all patches from list merged + fixed up).
also fixes some whitespace issues.
Parts were:
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
These codepaths were missing the cases for BGR_INTEGER/BGRA_INTEGER.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
After reading ARB_texture_rgb10_a2ui it appears the packed formats
for integer types are only specified via this extension, and not via
the original ones. So condition the checks on this.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
MESA_FORMAT_RGBX8888_REV is one of the opaque pixel formats used on Android.
Thanks to texture-from-pixmap, drivers may actually see texture images with
this format on Android.
MESA_FORMAT_RGBX8888 is added only for completeness.
Reviewed-by: Brian Paul <brianp@vmware.com>
[olv: Move the new formats after MESA_FORMAT_ARGB8888_REV in gl_format. I
accidentally moved them to the wrong place when preparing the patch.]
GLX functions are sometimes directly available in the current binary. In such
cases, we do not need any alternate library loaded using dlopen. Otherwise,
dlopen may find the wrong libGL library and get functions that conflicts with
the current loaded ones.
For example, on Debian Sid with nvidia binary drivers, using mesa's libEGL with
GLX driver leads to wrong glXGetFBConfigs symbol loaded (or loaded twice?),
which leads to "GLX: failed to create any config" error message as the
glXGetFBConfigs symbol seems to return garbage. If the binary is linked with
nvidia's libGL, the GLX symbols are already available.
Without this patch, convert_fbconfig (src/egl/drivers/glx/egl_glx.c:233) fails
for every config found, after glXGetFBConfigAttrib(... GLX_RENDER_TYPE, ...)
call, as the value returned has GLX_COLOR_INDEX_BIT and not GLX_RGBA_BIT.
[olv: initialize handle, prepend egl_glx to the commit log]
Also fix up Makefiles to use the default mesa compilation flags.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Jakob Bornecrants <jakob@vmware.com>
Also remove some unused variables in the st/xa makefile.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com>
With ICS (Android 4.0), several headers and structs are renamed. Define
ANDROID_VERSION so that we can choose a different path depending on the
platform version.
I've tested only softpipe and llvmpipe. r600g is also reported to work.
Enable the bit 3DSTATE_DEPTH_BUFFER.Tiled_Surface. From the Sandybridge
PRM, Volume 2, Part 1, Section 7.5.5.1.1 3DSTATE_DEPTH_BUFFER, Bit 1.27
Tiled Surface:
[DevGT+]: This field must be set to TRUE.
Fixes GPU hangs on the following Piglit tests:
hiz-stencil-test-fbo-d0-s8
hiz-stencil-read-fbo-d0-s8
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
v2: Guard against rb->mt being NULL, since we may enter the draw
regions path before intel_prepare_render() has been called to set
them.
Reviewed-by: Chad Versace <chad.versace@linux.intel.com> (v1)
This is a no-op change on gen6, but should result in some
actually-unsupported formats on gen4 no longer being chosen (like
RGBA_FLOAT32 now being RGBA_FLOAT16).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
GL 3.0 specifies GL_RGB10_A2 as a required sized format for rendering
and texturing.
This introduces two piglit regressions: one due to fbo-mipmap-copypix
hitting swrast GetRow (we want to convert swrast to MapRenderbuffer),
and one due to fbo-blending-formats being too picky while leaving
dithering on.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Now that all the rest of the driver is driven off of the surface
formats table, all we really need to do is add the mapping from
MESA_FORMAT to BRW_SURFACEFORMAT. However, we also add format
override for I16/L16 render targets at the same time, so that existing
users of I16 that were getting promoted to I32 and then getting the
I32->R32 override still get FBO support.
Fixes failures in piglit gl-3.0-required-sized-texture-formats, and
will prevent regressions in ARB_texture_float on gen4 when moving to
fully table-driven texture format setup.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Until GL 3.0, there isn't any requirement on the actual sizes of
channels chosen. By falling back to 16 here, we can correctly support
ARB_texture_float on original i965 hardware, which can't correctly
filter 32-bit floats.
Not all i965 hardware can do RGB float16, and this will at least save
half the memory and have expected behavior in terms of precision.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This should be a no-op change. The initializers are reordered to
match the ordering of the enum, since there isn't a clearly sensible
ordering, but "the order they were added to the driver, sort of" is
definitely not one.
Also, the unsupported formats are explicitly initialized to 0, so it's
more obvious what we aren't claiming to support.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This is currently duplicated with intel_context.c's setup of the
formats table, and sets true for exactly the same set of formats on
gen6.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
I've never seen a use for the thread ID value, but knowing the format
being rendered is kind of a big deal.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We are already testing this if appropriate in
intel_validate_framebuffer (FBO completeness), so no need to avoid
attaching the texture to the renderbuffer here.
This causes MESA_FORMAT_R11_G11_B10_FLOAT to now be renderable as a texture
attachment on i965.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We don't want to go writing GetRow/PutRow for every format required by
GL 3.0, when it's very hard to get those functions called, and in
every case we want to make swrast do direct mapping through
MapRenderbuffer anyway.
This causes MESA_FORMAT_R11_G11_B10_FLOAT to be considered complete on gen6.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This moves any chipset-dependent logic we want for render target
format choices to init time as well. There is still logic left at
state update for SRGB handling, where format choices change based on
GL state.
The brw_render_target_supported() function should now return correct
results, instead of relying on the limited results from
intel_span_supports_format() to avoid lying about FBO completeness.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This will be used to drive chosing formats and determining framebuffer
completeness, instead of the bunch of ad-hoc checks we have had until
now.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Ian Romanick <ian.d.romanick@intel.com>
The formats.c code's "datatype" value is "what does this value mean",
i.e. unorm or snorm or float, and is the return value from the
GL_TEXTURE_RED_TYPE class of queries. The depth formats were marked
as GL_UNSIGNED_INT, which is what we use for integer, and not what we
should be returning from the glGetTexLevelParameter.
In texstore, we were inappropriately using it as an argument to
_mesa_unpack_depth_span() that was expecting a value like
GL_UNSIGNED_INT or GL_UNSIGNED_SHORT. Just hardcode
_mesa_unpack_depth_span()'s arguments for now, though it looks like
the consumers of that interface would be happier with using
MESA_FORMAT.
Reviewed-by: Brian Paul <brianp@vmware.com>
The GL_TEXTURE_WHATEVER_SIZE entrypoints were checking if the
specified base type of the texture allowed that channel to be present
before reporting the size of the channel, so that GL_RGB didn't end up
with an alpha size if the hardware driver had to store it that way.
The GL_TEXTURE_WHATEVER_TYPE entrypoints weren't checking it, so you
would end up with strange responses from the GL involving 0-bit
floating-point alpha components in GL_RGB32F, even though it says
GL_NONE as expected for other 0-sized channels.
Make _TYPE check _BaseFormat the same as _SIZE, which results in
fixing most of the GL_RGB* testcases of gl-3.0-required-sized-formats
pass on i965.
v2: Add a default case with a warning (suggestion by Brian Paul)
Reviewed-by: Brian Paul <brianp@vmware.com> (v1)
The motivation behind this is to add some self-documentation in the code
about how each CAP can be used.
The idea is:
- enum pipe_cap is only valid in get_param
- enum pipe_capf is only valid in get_paramf
Which CAPs are floating-point have been determined based on how everybody
except svga implemented the functions. svga have been modified to match all
the other drivers.
Besides that, the floating-point CAPs are now prefixed with PIPE_CAPF_.
Regresses one Piglit test: bugs/fdo10370.
I'm not enabling HiZ for gen7 yet because it causes a mysterious
performance regression.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
For depthstencil renderbuffers, we were using separate stencil only if the
hardware required it. Since the performance gains from HiZ is so high, we
should always use separate stencil if the hardware supports it.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
I implemented functions for horizontal/vertical alignment units separately
because I find it easier to read that way...especially with all the
corner-cases.
[chad] Corrected the vertical alignment calculation by checking for
depthstencil formats.
v2:
- Fix typos in intel_horizontal_texture_alignment_unit():
s/height/width/ and s/VALIGN/HALIGN.
- Remove special case for compressed formats in
intel_get_texture_alignment unit(). Compressed formats are already
handled in the halign and valign functions.
- Replace check ``_mesa_is_depth_format(...) ||
_mesa_is_depthstencil_format(...)`` with explcitit checks against
GL_DEPTH_COMPONENT and GL_DEPTH_STENCIL.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
This allows us to replace all the calls to
intel_get_texture_alignment_unit() with a single call at miptree creation.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
When a depth texture is first attached to framebuffer, allocate a HiZ
miptree for it.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
After brw_try_draw_prims() emits a batch, mark that the depth buffer needs
a depth resolve if the buffer was written to and if it has an accompanying
HiZ buffer.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Resolve all buffers that will be mapped by intelSpanRenderStart. This
comprises resolving the depth buffer of each enabled texture and of the
read and draw buffers.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Factor the mapping loops from intelSpanRenderStart() into
intel_span_map_buffers(). This in preparation for the next commit,
which resolves the buffers before mapping.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Before emitting primitives in brw_try_draw_prims(), resolve the depth
buffer's HiZ buffer and resolve the depth buffer of each enabled depth
texture.
v2: [anholt] The driver no longer validates drm bo's, so update a comment
to reflect that.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
To do so, we must resolve all buffers on entering a glBegin/glEnd block.
For the detailed explanation, see the Doxygen comments in this patch.
v2:
- Fix typo: s/enusure/ensure/.
- In brwPrepareExecBegin(), do the same resolves as done by
brw_predraw_resolve_buffers().
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
A lot of the state manipulation is handled by the meta-op state setup.
However, some batches need manual intervention.
v2:
Do not special-case the 3DSTATE_DEPTH_STENCIL.Depth_Test_Enable bit
for HiZ in gen6_upload_depth_stencil(). The HiZ meta-op sets
ctx->Depth.Test, just read the value from that.
v3:
Add a new dirty flag, BRW_STATE_HIZ, for brw_tracked_state. Flag it
immediately before and after executing the HiZ operation in
gen6_resolve_slice(). Add the flag to the the dirty bits for the
following state packets:
gen6_clip_state
gen6_depth_stencil_state
gen6_sf_state
gen6_wm_state
v4:
- Add BRW_NEW_STATE_HIZ to the dirty bit table in brw_state_upload.c.
This is needed for INTEL_DEBUG=state.
- Align brw dirty bit for gen6_depth_stencil_state.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Some state batches also need to be manipulated. That's done in the next
commit.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
brw_context::hiz contains state needed to perform HiZ meta-ops and
indicates if a HiZ operation is currently in progress.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Add the following functions:
intel_renderbuffer_resolve_hiz
intel_renderbuffer_resolve_depth
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Add functions that
- set a miptree slice as needing a resolve
- resolve a single slice of a miptree
- resolve all slices of a miptree
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
This is a map of miptree slices to needed resolves, implemented as
a linked list. A future commit will embed such a list in
intel_mipmap_tree.
If you think I'm crazy to put a list in a miptree, read the Doxygen in
this patch for intel_resolve_map.
v2: [anholt] Move Doxygen from functin prototypes to definitions.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Now that intel_renderbuffer::region has been replaced with a miptree, the
HiZ functions region parameter must be replaced with a miptree parameter.
Change the return type from bool to void.
Rename the 'depth' parameter to 'layer', because it will correspond to
irb->mt_layer.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Remove the following functions:
i830_hiz_resolve_noop
i915_hiz_resolve_noop
brw_hiz_resolve_noop
My original strategy for how intel->vtbl.resolve_*buffer was used has
substantially changed. The above functions are no longer called in the
current strategy.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
This is required to correctly implement HiZ for mipmapped and
multi-layered textures.
v2: Accomodate refcount fixes in intel_process_dri2_buffer_*() that were
introduced in v2 of commit
intel: Replace intel_renderbuffer::region with a miptree [v2]
Reviewed-by: Eric Anholt <eric@anholt>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
For depthstencil textures using separate stencil, we embedded a stencil
buffer in intel_texture_image. The intention was that the embedded stencil
buffer would be the golden copy of the texture's stencil bits. When
necessary, we scattered/gathered the stencil bits between the texture
miptree and the embedded stencil buffer.
This approach had a serious deficiency for mipmapped or multi-layer
textures. Any given moment the embedded stencil buffer was consistent with
exactly one miptree slice, the most recent one to be scattered. This
permitted tests of type A to pass, but broke tests of type B.
Test A:
1. Create a depthstencil texture.
2. Upload data into (level=x1,layer=y1).
3. Read and test stencil data at (level=x1, layer=y1).
4. Upload data into (level=x2,layer=y2).
5. Read and test stencil data at (level=x2, layer=y2).
Test B:
1. Create a depthstencil texture.
2. Upload data into (level=x1,layer=y1).
3. Upload data into (level=x2,layer=y2).
4. Read and test stencil data at (level=x1, layer=y1).
5. Read and test stencil data at (level=x2, layer=y2).
v2:
Only allocate stencil miptree if intel->must_use_separate_stencil,
because we don't make the conversion from must_use_separate_stencil to
has_separate_stencil until commit
intel: Use separate stencil whenever possible
v3:
Don't call ChooseNewTexture in intel_renderbuffer_wrap_miptree() in
order to determine the renderbuffer format. Instead, pass the format as
a param to that function.
CC: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
This is in preparation for properly implementing glFramebufferTexture*()
for mipmapped depthstencil textures. The FIXME comments deleted by this
patch give a rough explanation of what was broken.
This refactor does the following:
- In intel_update_wrapper() and intel_wrap_texture(), change the
parameters to prepare to remove functions' dependency on
gl_texture_image.
- Move the call to intel_renderbuffer_set_draw_offsets() from
intel_render_texture() into intel_udpate_wrapper().
Each time I encounter those functions, I dislike their vague names.
(Update which wrapper? What is wrapped? What is the wrapper?). So, while
I was mucking around, I also renamed the functions.
v2:
In addition to the ``GLenum internal_format`` parameter to
intel_wrap_miptree(), add a ``gl_format format`` parameter. This
removes the need to recalculate for the true format from
internal_format with ChooseNewTextureFormat, which was just weird.
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
This is a small helper function that asserts that a given level and layer
are valid for a miptree. I will be extensively using it in the future
miptree HiZ functions.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Since the renderbuffer tracks the miptree level and layer that it wraps,
the 'tex_image' and 'zoffset' params are no longer needed to calculate the draw
offsets.
Not only are they no longer needed, but their presence would prevent
calculating the renderbuffer draw offsets in situations where there were
no texture image. Such situations will occur during the HiZ meta-op and
during scatter/gather of separate stencil textures.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
TODO: Make v2 for kwg.
Add two fields to intel_renderbuffer:
mt_level
mt_layer
Multiple renderbuffers may simultaneously wrap a single texture and each
provide a different view into that texture. [Consider
glFramebufferTextureLayer()]. The new fields indicate which slice of the
miptree is wrapped by the renderbuffer.
The buffer resolve operations, to be introduced in the future, require
these fields in order to resolve the correct slice in the miptree.
To add the fields, it was necessary to replace the type of some function
parameters from gl_texture_image to gl_renderbuffer_attachment.
v2: [kwg] Replace confusing condition `CubeMapFace > 0` with the more
sensible `Target == GL_TEXTURE_CUBE_MAP`.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
For all texture targets except GL_TEXTURE_CUBE_MAP, the 'nr_images' and
'depth' fields of intel_mipmap_level were identical. In the exceptional
case, nr_images == 6 and depth == 1.
It is simple to determine if a texture is a cube or not, so the presence
of two fields here was not helpful. Worse, it was confusing. When we
eventually implement GL_ARB_texture_cube_map_array, this mess would have
become even more confusing.
This patch removes 'nr_images' and assigns to 'depth' a consistent
meaning: depth is the number of 2D slices at each miplevel. The exact
semantics of depth varies according to the texture target:
- For GL_TEXTURE_CUBE_MAP, depth is 6.
- For GL_TEXTURE_2D_ARRAY, depth is the number of array slices. It is
identical for all miplevels in the texture.
- For GL_TEXTURE_3D, it is the texture's depth at each miplevel. Its
value, like width and height, varies with miplevel.
- For other texture types, depth is 1.
As a consequence, parameters were removed from the following function
signatures:
intel_miptree_set_level_info
Remove 'nr_images'.
i945_miptree_layout
brw_miptree_layout_texture
brw_miptree_layout_texture_array
Remove 'slices'.
v2:
- Replace "It's" with "Its".
- Remove all hunks in intel_fbo.c. The hunks were spurious and sneaked
in during a rebase.
- Remove unneeded hunk in intel_tex_map_image_for_swrast(). It was
a little refactor of the for-loop's upper bound.
v4:
In intel_miptree_get_image_offset(), document the conditions under
which different if-branches are taken.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
They're not supported by hw directly, but it's easy to emulate
them with a shader swizzling fixup.
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
[danvet: The important thing is to write a 1 to the unused alpha
channel, the ddx is relying on this for render accel.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
If the gallium driver doesn't support PIPE_FORMAT_R16G16B16A16_SNORM
the call to st_choose_renderbuffer_format() would fail and we'd generate
an GL_OUT_OF_MEMORY error. We'd never get to the subsequent code that
handles software/malloc-based renderbuffers.
Add a special-case check for PIPE_FORMAT_R16G16B16A16_SNORM which is used
for software-based accum buffers. This could be fixed in other ways but
it would be a much larger patch. st_renderbuffer_alloc_storage() could
be reorganized in the future.
This fixes accum buffer allocation for the svga driver.
Note: This is a candidate for the 7.11 branch.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Extract the body of the inner loop into a new function,
intel_miptree_copy_slice().
This is in preparation for adding support for separate stencil and HiZ to
intel_miptree_copy_teximage(). When copying a slice of a depthstencil
miptree that uses separate stencil, we will also need to copy the
corresponding slice of the stencil miptree. The easiest way to do this
will be to call intel_miptree_copy_slice() recursively. Analogous
reasoning applies to copying a slice of a depth miptree with HiZ.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Add a new field, intel_mipmap_level::slice, and move the offset fields
into it. Also add some much needed documentation for these fields.
Before this patch, a separate array was allocated for the
intel_mipmap_level::{x,y}_offsets. This was just silly; it incurred an
extra call to malloc and diminished memory locality.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Essentially, this patch just globally substitutes `irb->region` with
`irb->mt->region` and then does some minor cleanups to avoid segfaults
and other problems.
This is in preparation for
1. Fixing scatter/gather for mipmapped separate stencil textures.
2. Supporting HiZ for mipmapped depth textures.
As a nice benefit, this lays down some preliminary groundwork for easily
texturing from any renderbuffer, even those of the window system.
A future commit will replace intel_mipmap_tree::hiz_region with a miptree.
v2:
- Return early in intel_process_dri2_buffer_*() if region allocation
fails.
- Fix double semicolon.
- Fix miptree reference leaks in the following functions:
intel_process_dri2_buffer_with_separate_stencil()
intel_image_target_renderbuffer_storage()
v3:
- [anholt] Fix check for hiz allocation failure. Replace
``if (!irb->mt)` with ``if(!irb->mt->hiz_region)``.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Move the following inline functions:
intel_get_rb_region
intel_framebuffer_has_hiz
A future commit will replace the renderbuffer's region with a miptree.
This small refactor will eliminate the need for intel_fbo.h to include
intel_mipmap_tree.h on that commit. I'd like to avoid the situation where
each header transitively includes every other header.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
The only user of intel_framebuffer_get_hiz_region() was
intel_framebuffer_has_hiz(). So I folded the body of the former into the
latter.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
A great refactor thrashing begins after this commit for HiZ and separate
stencil. Removing code for texture HiZ will make that refactoring easier,
because then we don't have to maintain that code during the refactor.
To disable HiZ for textures, I've removed the hook in
intel_update_wrapper() that allocates a HiZ buffer when attaching a depth
texture to a framebuffer.
HiZ was broken for textures anyway, so there's no regression here.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
The function gathered the stencil buffer into the depth buffer only when
the map mode contained the read bit. But we must do the gather even if the
map mode is write-only. If we do not, then, when the depth buffer's stencil
bits are scattered into the stencil buffer by intel_unmap_renderbuffer(),
some of the scattered stencil bits would be invalid.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
1. Don't map the depthstencil buffer twice
Place a guard in intel_renderbuffer_map() to prevent a renderbuffer
from being mapped twice. This happened if a single buffer was attached to
the framebuffer's depth and stencil attachment points. (Interestingly,
because intel_map_renderbuffer_gtt() is idempotent, the double mapping did
not cause bugs for depthstencil buffers *without* separate stencil).
2. Stop overriding gl_framebuffer::_DepthBuffer,_StencilBuffer
Normally, if a depthstencil buffer is attached to the framebuffer's
depth attachment point, then _mesa_update_framebuffer() installs
a wrapper depth renderbuffer at gl_framebuffer::_DepthBuffer. Ditto for
the stencil attachment point and gl_framebuffer::_StencilBuffer
A depthstencil intel_renderbuffer with separate stencil contains hidden
depth and stencil renderbuffers, which are the *real* renderbuffers. In
order to force swrast to work, we were installing, in
brw_update_draw_buffer(), the hidden renderbuffers at
gl_framebuffer::_DepthBuffer and _StencilBuffer, thus overriding the
behavior of _mesa_update_framebuffer(). However, now that
intel_renderbuffer_map() is implemented with MapRenderbuffer(),
overriding _mesa_update_framebuffer's introduces bugs. This patch
removes the override code.
Fixes several Piglit tests on gen7.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
The special stencil span accessors, as set by intel_span_init_funcs.
perform software W detiling. Since intel_renderbuffer_map() now uses
MapRenderbuffer, rb->Data points to an *untiled* stencil buffer.
Fixes several Piglit tests on gen7.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
It's intended to indicate whether the driver/hardware supports reading
of the values written into shader outputs.
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
texture_combine converts the result rgba to CHAN_TYPE from FLOAT. At the
same time, make sure the span->array->ChanType is changed, too.
v2: pick a nicer comment from Brian
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Parameter n and rgbaChan are both from structure span, thus using span
as paramter to simplify the prototype. Function texture_combine is only
used by _swrast_texture_span, so I guess it's safe to do so.
This patch is mainly for the next patch.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
And update r300g.
This is different from util_draw_max_index in how it obtains vertex elements
and that it doesn't have to call util_format_description due to additional
precomputed data in vertex elements.
This forks vbo_get_minmax_index. We need to know the index range when
translating non-native vertices into native ones. There is no other way
around it.
Previously we were mapping/unmapping the index buffer each time we
found the restart index in the buffer. This is bad when the restart
index is frequently used. Now just map the index buffer once, scan
it to produce a list of sub-primitives, unmap the buffer, then draw
the sub-primitives.
Also, clean up the logic of testing for indexed primitives and calling
handle_fallback_primitive_restart(). Don't call it for non-indexed
primitives.
v2: per Jose, only map the relevant part of the index buffer with
pipe_buffer_map_range()
Reviewed-by: José Fonseca <jfonseca@vmware.com>
On original gen4, the surface format didn't determine the return data
type from sampling like it does on g45 and later.
Fixes GL_EXT_texture_integer/texture_integer_glsl130
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Merge may produce incorrect order of operations for r600-eg:
x: inst1 R0.x, ... ; //from current group
...
t: inst0 R0.x, ... ; //from previous group, same destination
Result of inst1 will be lost.
So compare destinations and don't allow this.
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
Previously a vertex shader that used no samplers would get updated (by
calling the driver's ProgramStringNotify) when a sampler in the
fragment shader was updated. This was discovered while investigating
some spurious code generation for shaders in Cogs. The behavior in
Cogs is especially pessimal because it ping-pongs sampler uniform
settings:
glUniform1i(sampler1, 0);
glUniform1i(sampler2, 1);
draw();
glUniform1i(sampler1, 1);
glUniform1i(sampler2, 0);
draw();
glUniform1i(sampler1, 0);
glUniform1i(sampler2, 1);
draw();
// etc.
ProgramStringNotify is still too big of a hammer. Applications like
Cogs will still defeat the shader cache. A lighter-weight mechanism
that can work with the shader cache is needed. However, this patch at
least restores the previous behavior.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
In slow_read_depth_stencil_pixels_separate() we might have separate
depth and stencil buffers or a combined buffer. In the later case,
don't map the buffer twice. This function is used when the depth
scale/bias pixel transfer values are not the defaults.
Fixes http://bugs.freedesktop.org/show_bug.cgi?id=42963
Reviewed-by: José Fonseca <jfonseca@vmware.com>
glspec doesn't say that we should skip the attenuation and spot
calculation for infinite light(Ppli.w == 0). Instead, it gives a same
formula to do the light calculation for both finite light and infinite
light(see page 62 of glspec 2.1.pdf)
Also from the formula (2.4) at page 62 of glspec 2.1.pdf, we can skip
attenuation calculation if Ppli.w == 0.
This would fix all the intel oglc l_sed fail subcases and introduces no
intel oglc regressions.
v2: fix an wrong intendation(comments from Brian).
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Brian Paul <brianp@vmware.com>
Make sure all lighting tables are updated before using the table to
calculate something, say using _SpotExpTable to calculate
_VP_inf_spot_attenuation.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Fixes regression in piglit:
ARB_color_buffer_float/GL_RGBA16F-getteximage
ARB_color_buffer_float/GL_RGBA16F-readpixels
ARB_color_buffer_float/GL_RGBA32F-getteximage
ARB_color_buffer_float/GL_RGBA32F-readpixels
Reviewed-by: Brian Paul <brianp@vmware.com>
Fixes some spurious GL errors in the upcoming
gl-3.0-required-sized-formats piglit test.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
intelAllocateBuffer() was oblivious to separate stencil buffers. This
patch fixes it to allocate a non-tiled stencil buffer with special pitch,
just as the DDX does.
Without this, any app that attempted to create an EGL surface with stencil
bits would crash. Of course, this affected only environments that used the
builtin DRI2 backend, such as Android and Wayland.
Fixes GLBenchmark2.1 on Android on gen7.
Note: This is a candidate for the 7.11 branch.
Tested-by: Louie Tsaie <louie.tsai@intel.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
I changed the dimensions of the stencil buffer's region, as allocated by
the DDX, at xf86-video-intel commit
commit 3e55f3e88b40471706d5cd45c4df4010f8675c75
dri: Do not tile stencil buffer
But I forgot to make the analogous update to the Intel DRI2 glue in Mesa.
This patch makes that update.
Surprisingly, the mismatch did not cause any bugs. But the mismatch, if
left unfixed, *would* create bugs in the next commit.
Note: This is a candidate for the 7.11 branch.
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
When calculating the y offset needed for detiling window system stencil
buffers, replace the term
region->height * 2 + region->height % 2 - 1
with
rb->Height - 1 .
The two terms are incidentally equivalent due to some out-of-date,
incorrect code in the Intel DRI2 glue for DDX. (See
intel_process_dri2_buffer_with_separate_stencil(), line ``buffer_height /=
2;``).
Note: This is a candidate for the 7.11 branch (only the intel_span.c hunk).
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Rather than redefining the BYTE/SHORT_TO_FLOAT macros, just define new
ones with different names. These macros preserve zero when converting.
Reviewed-by: Eric Anholt <eric@anholt.net>
Don't assert/die if a VBO is too small. Return zero instead. For
debug builds, emit a warning message since this is an unusual situation
that might indicate that there's a bug in the app.
Note that util_draw_max_index() now returns max_index+1 instead of
max_index. This lets us return zero to indicate that one of the VBOs
is too small to draw anything.
Fixes a failure with the new piglit vbo-too-small test.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
The swrast ReadPixels code has no dependencies on swrast since moving
to Map/UnmapRenderbuffer(). We'll be able to remove s_readpix.c and
remove the state tracker's glReadPixels code next.
Acked-by: Eric Anholt <eric@anholt.net>
This was only used by the xlib driver to add an alpha channel to the
front/window color buffer. This was no longer going to work well with
the move to direct mapping of renderbuffers.
Reviewed-by: Eric Anholt <eric@anholt.net>
Seldom used and this won't work when we move to using Map/UnmapRenderbuffer
everywhere. This will let us remove a bunch of core Mesa code too.
Reviewed-by: Eric Anholt <eric@anholt.net>
For a depthstencil buffer with separate stencil,
intel_renderbuffer::region is null. (The regions are kept in hidden depth
and stencil buffers). Since the region is null, intel_map_renderbuffer()
assumed there was no data and returned a null map pointer, which in turn
was dereferenced (!) by MapRenderbuffer's caller.
This patch fixes intel_map_renderbuffer() to map the hidden depth buffer
through the GTT and return that as the mapped pointer. Also, the stencil
bits are scattered and gathered when needed.
Fixes the following Piglit tests on gen7:
fbo/fbo-readpixels-depth-formats
hiz/hiz-depth-read-fbo-d24s8
hiz/hiz-stencil-read-fbo-d24s8
EXT_packed_depth_stencil/fbo-clear-formats
EXT_packed_depth_stencil/fbo-depth-GL_DEPTH24_STENCIL8-blit
EXT_packed_depth_stencil/fbo-depth-GL_DEPTH24_STENCIL8-drawpixels
EXT_packed_depth_stencil/fbo-depth-GL_DEPTH24_STENCIL8-readpixels
EXT_packed_depth_stencil/fbo-depthstencil-GL_DEPTH24_STENCIL8-readpixels-24_8
EXT_packed_depth_stencil/fbo-depthstencil-GL_DEPTH24_STENCIL8-readpixels-FLOAT-and-USHORT
EXT_packed_depth_stencil/fbo-stencil-GL_DEPTH24_STENCIL8-readpixels
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
If a window system stencil buffer had a region with odd height, then the
calculated y offset needed for software detiling was off by one. The bug
existed in intel_{map,unmap}_renderbuffer_s8() and in the intel_span.c
accessors.
Fixes the following Piglit tests on gen7:
general/depthstencil-default_fb-readpixels-24_8
general/depthstencil-default_fb-readpixels-FLOAT-and-USHORT
Fixes SIGABRT in the following Piglit tests on gen7:
general/depthstencil-default_fb-blit
general/depthstencil-default_fb-copypixels
general/depthstencil-default_fb-drawpixels-24_8
general/depthstencil-default_fb-drawpixels-FLOAT-and-USHORT
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
When gathering the temporary buffer's pixles into the gem buffer, we had
the two buffers juxtaposed. Oops.
Fixes the following Piglit tests on gen7:
general/GL_SELECT - alpha-test enabled
general/GL_SELECT - depth-test enabled
general/GL_SELECT - no test function
general/GL_SELECT - scissor-test enabled
general/GL_SELECT - stencil-test enabled
Fixes SIGABRT in Piglit tests EXT_framebuffer_object/fbo-stencil-* on
gen7.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
The function already implements 3 cases (map through GTT, blit to
a temporary, and detile stencil buffer to temporary), and a 4th will be
added soon: scatter/gather for depthstencil buffers using separate
stencil. For sanity's sake, this factors each case out into its own
function.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Don't call set_unfiform_initializers if link failed, or it would trigger
a GL_INVALID_OPERATION error. That's not an expected behavior of
glLinkProgram function.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Previously, we would fail to compile the following shader due to a bug
in lazy built-in importing:
#version 130
void main() {
float f = abs(5.0);
int i = abs(5);
}
The first call, abs(5.0), would fail to find a local signature, look
through the built-ins, and import "float abs(float)".
The second call, abs(5), would find the newly imported float signature
in the local shader, and settle for that. Unfortunately, it failed to
search the built-ins for the correct/exact signature, "int abs(int)".
Thus, abs(5) ended up being a float, causing a bizarre type error when
we tried to assign it to an int.
Fixes piglit test builtin-overload-matching.frag.
This is /not/ a candidate for stable branches, as it should only be
possible to trigger this bug using GLSL 1.30's built-in functions that
take integer arguments. Plus, the changes are fairly invasive.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
match_function_by_name performs two fairly separate tasks:
1. Hunt down the appropriate ir_function_signature for the callee.
2. Generate the actual ir_call (assuming we found the callee).
Both of these are complicated. The first has to handle exact/inexact
matches, lazy importing of built-in prototypes, different scoping rules
for 1.10, 1.20+, and ES. Not to mention printing a user-friendly error
message with pretty-printed "maybe you meant this" candidate signatures.
The second has to deal with void/non-void functions, pre-call implicit
conversions for "in" parmeters, and post-call "out" call conversions.
Trying to do both in one function is just too unwieldy. Time to split.
This patch purely moves the code to generate an ir_call into a separate
function and reindents it. Otherwise, the code is identical.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
When matching function signatures across multiple linked shaders, we
often want to see if the current shader has _any_ match, but also know
whether or not it was exact. (If not, we may want to keep looking.)
This could be done via the existing mechanisms:
sig = f->exact_matching_signature(params);
if (sig != NULL) {
exact = true;
} else {
sig = f->matching_signature(params);
exact = false;
}
However, this requires walking the list of function signatures twice,
which also means walking each signature's formal parameter lists twice.
This could be rather expensive.
Since matching_signature already internally knows whether a match was
exact or not, we can just return it to get that information for free.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
We need something that looks like a compiler and not like some hacker
put some functions together. /rant
This is a band-aid for these two problems:
- The R600 and EG control-flow instructions appear in switch statements
next to each other, causing conflicts when adding new instructions.
- The ALU control-flow instructions are bitshifted by 3 (from CF_INST 26:29
to CF_INST 23:29, as is defined by r600 ISA) even for EG, where CF_INST
is 22:29.
To fix this mess, the 'inst' field is bitshifted to the left either by 22, 23,
or 26 (directly in the definitions), such that it can be just or'd when making
bytecode without any shifting. All switch statements have been divided into
two, one for R600 and the other for EG.
Of course, there is a better way to do this, but that is left for future
work.
Tested on RV730 and REDWOOD with no regressions.
v2: minor cleanup as per Alex's comment.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
This is also done in ir_to_mesa and st_glsl_to_tgsi, but that code
will be removed soon.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
If they were disabled on entry, and we enabled one (like for
BlitFramebuffer), we wouldn't disable it on the way out. Retain the
attempted optimization here (don't keep calling to set each bit for
changes that won't matter) by just setting the bits directly with
appropriate flushing.
Fixes misrendering on the second draw of piglit fbo-blit.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
It's unlikely that we changed the object but no other texture
parameter, but be correct anyway. Noticed by inspection.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Also, actually update const_storage_size, therefore avoiding to
unnecessarily reallocate aligned_constant_storage every single time
draw_vs_set_constants() is called.
Reviewed-by: Brian Paul <brianp@vmware.com>
Although textureSize is represented as an ir_texture with op == ir_txs,
it doesn't have a coordinate, so normalizing it doesn't make sense.
Fixes crashes in oglconform glsl-bif-tex-size basic.samplerCube.* tests.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This patch solves three bugs.
1. When a texture was attached to the GL_DEPTH_STENCIL_ATTACHMENT point,
Mesa attached the texture only to the depth attachment point
gl_framebuffer::Attachment[BUFFER_DEPTH]
and failed to attach it to the stencil attachment point
gl_framebuffer::Attachment[BUFFER_STENCIL]
2. When a texture was attached to the GL_DEPTH_ATTACHMENT point and then
later attached to the GL_STENCIL_ATTACHMENT point, Mesa created two
separate renderbuffer wrappers. This caused a GL error in
glGetFramebufferAttachmentParameteriv().
3. Same as 2, but with depth and stencil juxtaposed.
Fixes Piglit test ARB_framebuffer_object/same-attachment-glFramebufferTexture2D-GL_DEPTH_STENCIL
Note: This is a candidate for the stable branches.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
The compiler setup for these VF-uploaded attributes looks a little
cheesy with mixing system values and real VBO-sourced attributes. It
would be nice if we could just compute the ATTR[] map to GRF index up
front and use it at visit time instead of using ir->location in the
ATTR file. However, we don't know the reg_offset at
visit(ir_variable *) time, so we can't do the mapping that early.
Fixes piglit vertexid test.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
We only allow 16 vec4s of attributes in our GLSL/ARB_vp programs, and
1 more element will get used for gl_VertexID/gl_InstanceID. So it
should never have been possible to hit this fallback, unless there was
another bug. If you do hit this, you're probably using gl_VertexID
and falling back to swrast won't work for you anyway.
This also updates the limits for gen6+.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This used to be script-generated, but now it's just a bunch of static
variables in a .h file for no good reason.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
v2: check if pipe_buffer_map() returns NULL, and return NULL from
svga_vbuf_render_map_vertices(). Per Jose's suggestion.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Previously, if we failed to allocate a VBO (either for display list
compilation or immediate mode rendering) we'd eventually segfault
when trying to map the non-existant buffer or in a glVertex/Color/etc
call when we hit a null pointer.
Now we don't try to map non-existant buffers and if we do fail to
allocate a VBO we plug in no-op functions for glVertex/Color/etc
so we don't segfault.
None of the code in api_noop.c was used anymore. The new vbo_noop.c
functions are true no-ops. They'll be used to no-op glBegin/End functions
when we run out of VBO memory.
Only a handful of functions from api_noop.c are actually used by
the VBO module. Move them to the VBO module. With this change,
none of the code in api_noop.c is actually used anymore.
NEW_COLOR is only needed on Gen4-5 as brw_update_renderbuffer_surfaces
only uses ctx->Color when intel->gen < 6.
This should reduce unnecessary state updates.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Constant expressions which called GLSL's equal() and notEqual()
built-ins on bvecs would hit an assertion failure; we simply forgot to
implement them for booleans.
NOTE: This is a candidate for stable release branches.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
These simply don't exist in the 1.30 specification---none of the Offset
variants allow samplerCube. This must have been a cut and paste error
from textureGrad, which /does/ allow cubemaps.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Due to a cut and paste error, these were accidentally misnamed
textureProj() rather than textureProjOffset().
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
From the GLSL 1.30 spec, section 8.7 "Texture Lookup Functions":
"In all functions below, the bias parameter is optional for fragment
shaders. The bias parameter is not accepted in a vertex shader."
This was a cut and paste mistake.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
brw_wm_samplers actually enables any active samplers regardless of what
pipeline stage is using them, so it doesn't make much sense for it to be
WM-specific. So, rename it to "brw_samplers."
To properly generalize it, move sampler_count and sampler_offset from
brw_context::wm to a new brw_context::sampler that can be shared without
looking strange.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Like for the WM pull constants, we can merge the former prepare/emit
stages into one tracked state atom. Furthermore, the code that used to
handle the binding table was removed in the last commit, leaving some
rather silly looking short functions that can easily be folded in.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Although the hardware supports separate binding tables for each pipeline
stage, we don't see much advantage over a single shared table.
Consider the contents of the binding table:
- Textures (16)
- Draw buffers (8)
- Pull constant buffers (1 for VS, 1 for WM)
OpenGL's texture bindings are global: the same set of textures is
available to all shader targets. So our binding table entries for
textures would be exactly the same in every table.
There are only two pull constant buffers (not many), and although draw
buffers aren't interesting to the VS, it shouldn't hurt to have them in
the table. The hardware supports up to 254 binding table entries, and
we currently only use 26.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
First, the texturing setup code is relevant for all pipeline stages,
while renderbuffer surfaces are only used by the WM.
Secondly, renderbuffer and texture setup depends on a different set of
dirty bits. There's no reason to walk the array of textures when
changing draw buffers, or vice-versa.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
These were only split for historical reasons: brw_wm_constants used to
be the "prepare" step, while brw_wm_constant_surface was "emit". Now
that both happen at emit time, it makes sense to combine them.
Call the newly combined state atom "brw_wm_pull_constants" to indicate
help distinguish it from the Gen6+ atoms that handle push constants.
Finally, remove the BRW_NEW_WM_CONSTBUF dirty bit entirely now that it's
never flagged nor used.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
When reading the "brw_wm_constants" and "gen6_wm_constants" atoms
side-by-side, I initially failed to notice the crucial difference:
the Gen6 atoms are for Push Constants, while brw_wm_constants handles
Pull Constants. (Gen4/5 Push Constants are handled by "brw_curbe.")
Renaming these should clarify the code and save me from constant
confusion over the fact that "gen6_wm_constants" isn't just a newer
version of "brw_wm_constants."
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
This code is fairly fragile, as it depends on the ordering of the
entries in the binding table, which will change soon.
Also, stop listening on the BRW_NEW_WM_CONSTBUF dirty bit as it's no
longer required.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
These fields control how many entries the hardware prefetches into the
state cache, so they only impact performance, not correctness. However,
it's not clear how to use this in a way that's beneficial.
According to the documentation, kernels "using a large number" of
entries may wish to program this to zero to avoid thrashing the cache;
it's unclear how many is too many. Also, Ironlake's WM was missing this
feature entirely---the count had to be zero.
The dirty bit tracking to handle this complicates the surface state
and binding table setup; removing it should simplify things and make
future refactoring easier. So just set 0 for the number of entries
rather than trying to compute and track it.
Appears to have no impact on Nexuiz and OpenArena on Sandybridge.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
The comment states that brw_update_vs_constant_surface produces a
CACHE_NEW_SURF_BIND dirty bit, but it doesn't. In fact, that bit
no longer even exists.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
brw_vs_surfaces _produces_ the BRW_NEW_NR_VS_SURFACES dirty bit, so it
makes no sense for it to subscribe to it.
Fixes an assertion failure in many piglit tests when INTEL_DEBUG is set:
brw_state_upload.c:484: void brw_upload_state(struct brw_context *):
Assertion `!check_state(&examined, &generated)' failed.
One such piglit test is vs-uniform-array-mat2-col-rd.shader_test.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Comparing brw_upload_vs_pull_constants and brw_upload_wm_pull_constants,
it became evident that something was amiss: the VS code had both
CACHE_NEW_VS_PROG and BRW_NEW_VERTEX_PROGRAM, while the WM code was
missing the CACHE_NEW_WM_PROG flag.
Not observed to fix anything, but likely necessary.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Now that we have vtable entries in place, we should use them. This
allows us to drop the cut and pasted Gen7 brw_tracked_state atoms as
they now do exactly the same thing as their brw_wm_surface_state
counterparts.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Gen7+ SURFACE_STATE is different from Gen4-6, so we need separate
per-generation functions for creating and updating it. However, the
usage is the same, and callers just want to utilize the appropriate
functions with minimal pain. So, put them in the vtable.
Since these take a brw_context pointer and are only used on Gen4, just
add a forward declaration. This is the simplest (if not cleanest)
solution. It would be nicer to have a i965-specific vtable, but that's
a refactor for another day.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
These two are fairly unique types so add specific cases for decoding them.
Passes piglit fbo-clear-format and fbo-generatemipmap-format tests for these
two extensions.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This ports the softpipe NV_conditional_render support to llvmpipe.
This passes the nv_conditional_render-* piglit tests.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This adds a new function r600_need_cs_space. Currently, it's easy to overflow
the CS - queries are not counted in. I guess that's not the only case where
the driver may crap out.
This is the inverse operation to _mesa_pack_rgba_span_int. The 16-bit
code isn't done because of lack of testing and not being sure how sign
extension/clamping should be handled between, say, 16-bit int and
32-bit int or uint.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This requires using a new fragment shader to get the integer color
output, and a new vertex shader because #version has to match between
the two.
v2: Clarify that there's no need for BindFragDataLocation.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1)
We're missing support for the software paths still, but basic
rendering is working.
v2: Override RGB_INT32/UINT32 to not be renderable, since the hardware
can't do it but we do allow texturing from it now. Drop the
DataType override, since the _mesa_problem() isn't in that path
any more.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1)
Before, I was tracking the ir_variable * found for gl_FragColor or
gl_FragData[]. Instead, when visiting those variables, set up an
array of per-render-target fs_regs to copy the output data from. This
cleans up the color emit path, while making handling of multiple
user-defined out variables easier.
v2: incorporate idr's feedback about ir->location (changes by Kenneth Graunke)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
When rendering to integer color buffers, we need to be careful to use
MRFs of the correct type when emitting color writes.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Previously, brw_type_for_base_type returned UD for array variables,
similar to structures. For structures, each field may have a different
type, so every field access must explicitly override the register's type
with that field's type. We chose to return UD in this case since it was
the least common, so errors would be more obvious.
For arrays, it makes far more sense to return the type corresponding to
an element of the array. This allows normal array access to work
without the hassle of explicitly overriding the register's type.
This should obsolete a bunch of type overrides throughout the code.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v2: s/GL_TRUE/true/, and re-enable RGB_INT32 based on discussion
yesterday about required RB formats vs texture formats.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1)
This will let the feature be incrementally developed, hidden behind
the flag we're all using as we work on GL 3.0 support.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
While not required by any particular spec version, mplayer was asking
for L16 and hoping for actual L16 without checking. The 8 bits
allocated led to 10-bit planar video data stored in the lower 10 bits
giving only 2 bits of precision in video. While it was an amusing
effect, give them what they actually wanted instead.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=41461
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We want to be able to support some formats for texturing that we can't
render to, which means that some choices for RenderbufferStorage end
up being incomplete (for example, L8 currently). For these, where we
don't render to them, we don't want to have to make up an rb->DataType
that's only used for GetRow()/PutRow().
Commit 1401b96b (radeon: cleanup radeon shared code after r300 and
r600 classic drivers removal) removed the file
src/mesa/drivers/dri/radeon/server/radeon.h, but it left behind the
symlink which was used to share that file into the
src/mesa/drivers/dri/r200/server directory.
This patch removes the dangling symlink.
Reviewed-by: Alex Deucher <alexdeucher@gmail.com>
This patch modifies the GLSL linker to assign additional slots for
varying variables used by transform feedback, and record the varying
slots used by transform feedback for use by the driver back-end.
This required modifying assign_varying_locations() so that it assigns
a varying location if either (a) the varying is used by the next stage
of the GL pipeline, or (b) the varying is required by transform
feedback. In order to avoid duplicating the code to assign a single
varying location, I moved it into its own function,
assign_varying_location().
In addition, to support transform feedback in the case where there is
no fragment shader, it is now possible to call
assign_varying_locations() with a consumer of NULL.
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Tested-by: Marek Olšák <maraeo@gmail.com>
This just validates the input parameters so far.
Fixes piglit's bindfragdata-invalid-parameters test.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Up until now modifying the GLSL compiler has been pretty straightforward.
This is where things get interesting. But still pretty straightforward.
Switch statements can be thought of a series of if/then/else statements.
Case labels are compared with the value of a test expression and the case
statements are executed if the comparison is true.
There are a couple of aspects of switch statements that complicate this simple
view of the world. The primary one is that cases can fall through sequentially
to subsequent case, unless a break statement is encountered, in which case,
the switch statement exits completely.
But break handling is further complicated by the fact that a break statement
can impact the exit of a loop. Thus, we need to coordinate break processing
between switch statements and loop statements.
The code generated by a switch statement maintains three temporary state
variables:
int test_value;
bool is_fallthru;
bool is_break;
test_value is initialized to the value of the test expression at the head of
the switch statement. This is the value that case labels are compared against.
is_fallthru is used to sequentially fall through to subsequent cases and is
initialized to false. When a case label matches the test expression, this
state variable is set to true. It will also be forced to false if a break
statement has been encountered. This forcing to false on break MUST be
after every case test. In practice, we defer that forcing to immediately after
the last case comparison prior to executing a case statement, but that is
an optimization.
is_break is used to indicate that a break statement has been executed and is
initialized to false. When a break statement is encountered, it is set to true.
This state variable is then used to conditionally force is_fallthru to to false
to prevent subsequent case statements from executing.
Code generation for break statements depends on whether the break statement is
inside a switch statement or inside a loop statement. If it inside a loop
statement is inside a break statement, the same code as before gets generated.
But if a switch statement is inside a loop statement, code is emitted to set
the is_break state to true.
Just as ASTs for loop statements are managed in a stack-like
manner to handle nesting, we also add a bool to capture the innermost switch
or loop condition. Note that we still need to maintain a loop AST stack to
properly handle for-loop code generation on a continue statement. Technically,
we don't (yet) need a switch AST stack, but I am using one for orthogonality
with loop statements, in anticipation of future use. Note that a simple
boolean stack would have sufficed.
We will illustrate a switch statement with its analogous conditional code that
a switch statement corresponds to by examining an example.
Consider the following switch statement:
switch (42) {
case 0:
case 1:
gl_FragColor = vec4(1.0, 2.0, 3.0, 4.0);
case 2:
case 3:
gl_FragColor = vec4(4.0, 3.0, 2.0, 1.0);
break;
case 4:
default:
gl_FragColor = vec4(0.0, 0.0, 0.0, 0.0);
}
Note that case 0 and case 1 fall through to cases 2 and 3 if they occur.
Note that case 4 and the default case must be reached explicitly, since cases
2 and 3 break at the end of their case.
Finally, note that case 4 and the default case don't break but simply fall
through to the end of the switch.
For this code, the equivalent code can be expressed as:
int test_val = 42; // capture value of test expression
bool is_fallthru = false; // prevent initial fall through
bool is_break = false; // capture the execution of a break stmt
is_fallthru |= (test_val == 0); // enable fallthru on case 0
is_fallthru |= (test_val == 1); // enable fallthru on case 1
is_fallthru &= !is_break; // inhibit fallthru on previous break
if (is_fallthru) {
gl_FragColor = vec4(1.0, 2.0, 3.0, 4.0);
}
is_fallthru |= (test_val == 2); // enable fallthru on case 2
is_fallthru |= (test_val == 3); // enable fallthru on case 3
is_fallthru &= !is_break; // inhibit fallthru on previous break
if (is_fallthru) {
gl_FragColor = vec4(4.0, 3.0, 2.0, 1.0);
is_break = true; // inhibit all subsequent fallthru for break
}
is_fallthru |= (test_val == 4); // enable fallthru on case 4
is_fallthru = true; // enable fallthru for default case
is_fallthru &= !is_break; // inhibit fallthru on previous break
if (is_fallthru) {
gl_FragColor = vec4(0.0, 0.0, 0.0, 0.0);
}
The code generate for |= and &= uses the conditional assignment capabilities
of the IR.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We now tie the grammar to the ctors of the ASTs they reference.
This requires that we actually have definitions of the ctors.
In addition, we also need to define "print" and "hir" methods for the AST
classes. The Print methods are pretty simple to flesh out. However, at this
stage of the development, we simply stub out the "hir" methods and flesh
them out later.
Also, since actual class instances get returned by the productions in the
grammar, we also need to designate the type of the productions that
reference those instances.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The grammar is modified to support switch statements. Rather than follow the
grammar in the appendix, which allows case labels to be placed ANYWHERE
as a regular statement, we follow the development of the grammar as
described in the body of the GLSL spec.
In this variation, the switch statement has a body which consists of a list
of case statements. A case statement is preceded by a list of case labels and
ends with a list of statements.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Data structures for switch statement and case label are created that parallel
the structure of other AST data.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
It is needed for nv50's new shader backend. With this change, both u_math.h
and imports.h in core mesa define the same function. I have to #undef log2f
here to avoid the conflict. Not sure if there is a better way to deal with
the situation.
Acked-by: José Fonseca <jfonseca@vmware.com>
Switch all of the code in ir_to_mesa, st_glsl_to_tgsi, glUniform*,
glGetUniform, glGetUniformLocation, and glGetActiveUniforms to use the
gl_uniform_storage structures in the gl_shader_program.
A couple of notes:
* Like most rewrite-the-world patches, this should be reviewed by
applying the patch and examining the modified functions.
* This leaves a lot of dead code around in linker.cpp and
uniform_query.cpp. This will be deleted in the next patches.
v2: Update the comment block (previously a FINISHME) in _mesa_uniform
about generating GL_INVALID_VALUE when an out-of-range sampler index
is specified.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Tested-by: Tom Stellard <thomas.stellard@amd.com>
_mesa_ir_link_shader needs to be called before cloning the IR tree so
that the var->location field for uniforms is set.
WARNING: This change breaks several integer division related piglit
tests. The tests break because _mesa_ir_link_shader lowers integer
division to an RCP followed by a MUL. The fix is to factor out more
of the code from ir_to_mesa so that _mesa_ir_link_shader does not need
to be called at all by the i965 driver. This will be the subject of
several follow-on patches.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Tested-by: Tom Stellard <thomas.stellard@amd.com>
These were both useful debugging aids while developing this code.
log_uniform will be used to keep the MESA_GLSL=uniform behavior.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Tested-by: Tom Stellard <thomas.stellard@amd.com>
Connects all of the gl_program_parameter structures with the correct
gl_uniform_storage structures.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Tested-by: Tom Stellard <thomas.stellard@amd.com>
These functions are used to create and destroy the connections between
a uniform and the storage used by the driver to hold its value.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Tested-by: Tom Stellard <thomas.stellard@amd.com>
This function propagates the values from the backing storage of a
gl_uniform_storage structure to the driver supplied data locations.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Tested-by: Tom Stellard <thomas.stellard@amd.com>
v2: Remane class count_uniform_size based on feedback from Eric:
"Maybe just "count_uniform_size"? "usage" makes me think "way it's
dereferenced" or something."
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Tested-by: Tom Stellard <thomas.stellard@amd.com>
v2: Update a comment block about the different treatment of
location=-1 based on feedback from Ken.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Tested-by: Tom Stellard <thomas.stellard@amd.com>
Prepend _mesa_uniform_ to the names and rework the calling
convention. The calling convention was changed for a couple reasons.
1. Having a single variable named 'location' have completely different
meanings at different places in the function is confusing. Before
calling split_location_offset the location is the encoded value
returned by glGetUniformLocation. After calling split_location_offset
it's the index of the uniform in the gl_uniform_list::Uniforms array.
2. In a later commit the original value of 'location' is needed after
split_location_offset has been called.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Tested-by: Tom Stellard <thomas.stellard@amd.com>
v2: Update some comments based on feedback from Eric Anholt.
v3: Remove gl_uniform_storage::dirty field. Make
gl_uniform_storage::initialized be bool, and make
gl_uniform_storage::sampler be uint8_t.
v4: Include stdbool.h after Tom Stellard noticed a build failure that
was introduced by the changes in v2. Oops.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Tested-by: Tom Stellard <thomas.stellard@amd.com>
There are cases where we might want to internally query the location
of a uniform in a shader that failed linking.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Tested-by: Tom Stellard <thomas.stellard@amd.com>
Some C code will want access to the glsl_base_type and
glsl_sampler_dim enums in the near future.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Tested-by: Tom Stellard <thomas.stellard@amd.com>
The spec says "Only ClearBufferiv should be used to clear
stencil buffers." and "Only ClearBufferfv should be used to clear
depth buffers." However, on the following page it also says:
"The result of ClearBuffer is undefined if no conversion between
the type of the specified value and the type of the buffer being
cleared is defined (for example, if ClearBufferiv is called for a
fixed- or floating-point buffer, or if ClearBufferfv is called
for a signed or unsigned integer buffer). *This is not an error.*"
Emphasis mine.
Fixes problems with piglit's clearbuffer-invalid-drawbuffer test.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
This fixes a regression from the recent glReadPixels changes found
with the piglit hiz tests.
Use either MESA_FORMAT_RGBA8888 or MESA_FORMAT_RGBA8888_REV for color
buffers depending on endian-ness. Before, the gl_renderbuffer::Format
field was MESA_FORMAT_RGBA8888 but the data was really stored as
MESA_FORMAT_RGBA8888_REV when using a little endian machine.
Getting this right matters now that we can access renderbuffer data
without going through the span functions (namely glReadPixels() +
MapRenderbuffer()).
These vars will just get overwritten when we call _mesa_add_renderbuffer()
anyway. We only need to set the InternalFormat field when we create the
software renderbuffer.
Reviewed-by: Eric Anholt <eric@anholt.net>
This fixes the glReadPixels() regression for reading from the front/back
color buffers.
Note, we only allow one mapping of an XImage/Pixmap renderbuffer
at any time. That might need to be revisited in the future.
One of the points of GL_ARB_texture_storage is to make it impossible
to have malformed mipmap stacks. If we know the texture object is
immutable, we can skip a bunch of size checking.
Commit a73c65c534 had a typo which
accidentally enabled the workaround-free Gen7 code on Gen6.
Fixes GPU hangs in anything using pow() or integer division/modulus.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
According to the documentation, Ivybridge's math instruction works in
SIMD16 mode for the fragment shader, and no longer forbids align16 mode
for the vertex shader.
The documentation claims that SIMD16 mode isn't supported for INT DIV,
but empirical evidence shows that it works fine. Presumably the note
is trying to warn us that the variant that returns both quotient and
remainder in (dst, dst + 1) doesn't work in SIMD16 mode since dst + 1
would be sechalf(dst), trashing half your results. Since we don't use
that variant, we don't care and can just enable SIMD16 everywhere.
The documentation also still claims that source modifiers and
conditional modifiers aren't supported, but empirical evidence and
study of the simulator both show that they work just fine.
Goodbye workarounds. Math just works now.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
It seems line loop stipple in hardware needs something I don't know, it might
need a proper geometry shader who knows.
Signed-off-by: Dave Airlie <airlied@redhat.com>
SPI semantic indices for PS/VS are now static, so we don't
need to update spi config for every shaders combination. We can move
the functionality of r600_spi_update to r600(evergreen)_pipe_shader_ps.
Flatshade state is now controlled by the global FLAT_SHADE_ENA flag
instead of updating FLAT_SHADE for all inputs.
Sprite coord still requires the update of spi setup when
sprite_coord_enable is first changed from zero (enabled), and then
only when it's changed to other non-zero value (enabled for other input).
Change to zero (disabling) and back to the same value is handled via
global SPRITE_COORD_ENA.
New field "sprite_coord_enable" added to "struct r600_pipe_shader"
to track current state for the pixel shader. It's checked in the
r600_update_derived_state.
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
There is no need to duplicate semantic mapping which is done in hw, so get
rid of r600_find_vs_semantic_index.
TGSI name/sid pair is mapped to the 8-bit semantic index for SPI.
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
gbm_gallium does not depend on DRI, but its build rules depend on DRI_LIB_DEPS
being set. Output an error when the user enables gbm_gallium but disables
DRI. This is just a workaround.
If the VS has outputs that aren't consumed by the FS we were mapping
them all to one unused VS output index, but that's illegal. Instead,
map unused VS outputs to unique indexes.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
The code expects the geometry shader to be NULL.
We don't have geometry shaders now, but it's good to be prepared.
v2: check for support in the cso context
SPI semantic indices for PS/VS are now static, so we don't
need to update spi config for every shaders combination. We can move
the functionality of r600_spi_update to r600(evergreen)_pipe_shader_ps.
Flatshade state is now controlled by the global FLAT_SHADE_ENA flag
instead of updating FLAT_SHADE for all inputs.
Sprite coord still requires the update of spi setup when
sprite_coord_enable is first changed from zero (enabled), and then
only when it's changed to other non-zero value (enabled for other input).
Change to zero (disabling) and back to the same value is handled via
global SPRITE_COORD_ENA.
New field "sprite_coord_enable" added to "struct r600_pipe_shader"
to track current state for the pixel shader. It's checked in the
r600_update_derived_state.
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
There is no need to duplicate semantic mapping which is done in hw, so get
rid of r600_find_vs_semantic_index.
TGSI name/sid pair is mapped to the 8-bit semantic index for SPI.
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
The readpixels microbenchmark in mesa-demos goes from 47Mpix/sec at
1000x1000 to 450Mpix/sec. The 10x10 sizes stay about the same.
Reviewed-by: Brian Paul <brianp@vmware.com>
Renderbuffer mapping handles flushing the batchbuffer if required, so
all we need to do is make sure any pending rendering has reached the
batchbuffer.
Reviewed-by: Brian Paul <brianp@vmware.com>
There doesn't appear to be any particular reason for this -- it's not
like the width is changing between the deref and the use.
Reviewed-by: Brian Paul <brianp@vmware.com>
In all of piglit, only two tests hit it (reading to RGBA float, where
GetRow would drop floats into place from R, RG, or RGB). Mostly this
is because _ColorReadClamp has been causing transferOps to always be
set, skipping any fast-paths anyway.
Reviewed-by: Brian Paul <brianp@vmware.com>
This may be a bit slower than before because we're switching from
per-format compiled loops in GetRow to
_mesa_unpack_rgba_block_unpack's loop around a callback to unpack a
pixel. The solution there would be to make _mesa_unpack_rgba_block
fold the span loop into the format handlers.
(On the other hand, function call overhead will hardly matter if
MapRenderbuffer means the driver gets the data into cacheable memory
instead of uncached).
The adjust_colors code should no longer be required, since the unpack
function does the 565 to float conversion in a single pass instead of
converting it (poorly) through 8888 as apparently happened in the
past.
Reviewed-by: Brian Paul <brianp@vmware.com>
This should be useful in making more generic fast paths in the pixel
paths.
v2: Add note about PACK_SWAP_BYTES, and fix up for endianness by
synchronizing with memcpy_texture paths in texstore.c.
Reviewed-by: Brian Paul <brianp@vmware.com>
This introduces two new span helper functions we'll want to use in
several places as we move to MapRenderbuffer, which pull out integer
depth and stencil values from a renderbuffer mapping based on the
renderbuffer format.
v2: Use format_unpack helper for stencil read.
v3: Clean up comment after conversion to format_unpack.
Reviewed-by: Brian Paul <brianp@vmware.com>
This also makes it handle 24/8 vs 8/24, fixing piglit
depthstencil-default_fb-readpixels-24_8 on i965. While here, avoid
incorrectly fast-pathing if packing->SwapBytes is set.
v2: Move the unpack code to format_unpack.c, fix BUFFER_DEPTH typo
v3: Fix signed/unsigned comparison.
Reviewed-by: Brian Paul <brianp@vmware.com>
This avoids going through the wrapper that has to rewrite the data for
packed depth/stencil. This isn't done in _swrast_read_stencil_span
because we don't want to map/unmap for each span.
v2: Move the unpack code to format_unpack.c.
v3: Fix signed/unsigned comparison.
Reviewed-by: Brian Paul <brianp@vmware.com>
i965's MUL instruction can't take an immediate value as its first
argument. So normally, if constant propagation wants to propagate a
constant into the first argument of a MUL instruction, it swaps the
order of the two arguments.
This doesn't work for 32-bit integer (and unsigned integer)
multiplies, because the MUL operation is asymmetric in that case (it
multiplies 16 bits of one operand by 32 bits of the other).
Fixes piglit tests {vs,fs}-multiply-const-{ivec4,uvec4}.
Reviewed-by: Eric Anholt <eric@anholt.net>
If we're drawing sprites and the fragment shader needs both auto-
generated texcoords and user-defined varying vars we need to use
this fallback path.
The reason is when we enable auto texcoord generation, it gets
enabled for all texcoord sets. And that clobbers the user-defined
varying vars.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
If we use the draw-module for wide point/line/etc drawing we'll need
a fragment shader too (like we pass in the vertex shader).
This fixes sprite point rendering when forcing the swtnl path.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
The state tracker may generate shaders that use generic vs outputs /
fs inputs like:
DCL IN[0], GENERIC[0]
DCL IN[1], GENERIC[10]
DCL IN[2], GENERIC[11]
This patch remaps 0, 10, 11 to small integers like 1, 2, 3 so that we
stay inside the SVGA3D limit (8).
The remapping is done to both the vertex shader outputs and the
fragment shader inputs. The same mapping must be used for a vs/fs
pair.
Note that 'union svga_compile_key' is now 'struct svga_compile_key'
because we needed to add the register remapping table. The change in
size isn't really significant though (it's not a search key).
Also, add assertions when building up SVGA3D src/dst registers to we
don't try to store too large of value for the bitfield size.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Missed this back in the arb_robustness branch
<6b329b9274b18c50f4177eef7ee087d50ebc1525>.
NOTE: This is a candidate for the 7.11 branch.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The returned value should be a texture target index, not a bit.
I spotted this from seeing a new compiler warning caused by the increase
in the number of texture targets. This has been broken for a long time.
Note: This is a candidate for the 7.11 branch.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This requires tracking a couple extra fields in ir_variable:
* A flag to indicate that a variable had an initializer.
* For non-const variables, a field to track the constant value of the
variable's initializer.
For variables non-constant initalizers, ir_variable::has_initializer
will be true, but ir_variable::constant_initializer will be NULL. The
linker can use the values of these fields to check adherence to the
GLSL 4.20 rules for shared global variables:
"If a shared global has multiple initializers, the initializers
must all be constant expressions, and they must all have the same
value. Otherwise, a link error will result. (A shared global
having only one initializer does not require that initializer to
be a constant expression.)"
Previous to 4.20 the GLSL spec simply said that initializers must have
the same value. In this case of non-constant initializers, this was
impossible to determine. As a result, no vendor actually implemented
that behavior. The 4.20 behavior matches the behavior of NVIDIA's
shipping implementations.
NOTE: This is candidate for the 7.11 branch. This patch also needs
the preceding patch "glsl: Refactor generate_ARB_draw_buffers_variables
to use add_builtin_constant"
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=34687
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Paul Berry <stereotype441@gmail.com>
Incrementing "td" before initializing it is
pointless and just leads to an uninitialized
variable warning with MSVC.
Signed-off-by: Christian König <deathsimple@vodafone.de>
This is an OpenGL ES specific extension. External textures are textures that
may be sampled from, but not be updated (no glTexSubImage* and etc.). The
image data are taken from an EGLImage.
Reviewed-by: Brian Paul <brianp@vmware.com>
Acked-by: Jakob Bornecrantz <jakob@vmware.com>
GL_TEXTURE_RECTANGLE_NV (and soon GL_TEXTURE_EXTERNAL_OES) is special. Handle
it in its own if-block. There should be no functional change.
Reviewed-by: Brian Paul <brianp@vmware.com>
Acked-by: Jakob Bornecrantz <jakob@vmware.com>
This extension introduces a new sampler type: samplerExternalOES.
texture2D (and texture2DProj) can be used to do a texture look up in an
external texture.
Reviewed-by: Brian Paul <brianp@vmware.com>
Acked-by: Jakob Bornecrantz <jakob@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
3-bit fields are used store texture target in several places. That will fail
when TEXTURE_EXTERNAL_INDEX, which happends to be the 9th texture target, is
added. Make them 4-bit fields.
Reviewed-by: Brian Paul <brianp@vmware.com>
Acked-by: Jakob Bornecrantz <jakob@vmware.com>
remove another long if condition test. I don't feel a strong need of
this patch. But for it make the code a little simpler(I do think so),
I send it out.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
glRenderbufferStorage man page says:
GL_INVALID_VALUE is generated if either of width or height is negative,
or greater than the value of GL_MAX_RENDERBUFFER_SIZE.
NOTE: this is a candidate for the 7.11 branch
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
EXT_framebuffer_object bspec says:
Get Value Type Get Command Initial Value
------------------------------- ------ ----------- -----------
RENDERBUFFER_INTERNAL_FORMAT_EXT Z+ GetRenderbufferParameterivEXT RGBA
NOTE: this is a candidate for the 7.11 branch
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
The ARB_texture_swizzle spec says:
The error INVALID_OPERATION is generated if TexParameteri,
TexParameterf, TexParameteriv, or TexParameterfv, parameter <pname>
is TEXTURE_SWIZZLE_R, TEXTURE_SWIZZLE_G, TEXTURE_SWIZZLE_B,
or TEXTURE_SWIZZLE_A, and <param> is not RED, GREEN, BLUE, ALPHA,
ZERO, or ONE.
The error INVALID_OPERATION is generated if TexParameteriv, or
TexParameterfv, parameter <pname> TEXTURE_SWIZZLE_RGBA, and the four
consecutive values pointed to by <param> are not all RED, GREEN, BLUE,
ALPHA, ZERO, or ONE.
So, the GL_TEXTURE_SWIZZLE* pname is legal for glTexParameterf(v)
NOTE: this is a candidate for the 7.11 branch
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
When a vertex shader input attribute is declared with an integral type
(e.g. ivec4), we need to ensure that the generated vertex shader code
addresses the vertex attribute register using the proper register
type. (Previously, we assumed all vertex shader input attributes were
floating-point).
In addition, when uploading vertex data that was specified with
VertexAttribIPointer, we need to instruct the vertex fetch unit to
convert the data to signed or unsigned int, rather than float. And
when filling in the implied w=1 on a vector with less than 4
components, we need to fill it in with the integer representation of 1
rather than the floating-point representation of 1.
Fixes piglit tests vs-attrib-{ivec4,uvec4}-precision.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This patch ensures that gl_client_array::Integer is properly set to
GL_TRUE for vertex attributes specified using glVertexAttribIPointer,
and to GL_FALSE for vertex attributes specified using
glVertexAttribPointer, so that the vertex attributes can be
interpreted properly by driver back-ends.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
When converting an expression like "++x" to GLSL IR we were failing to
account for the possibility that x might be an unsigned integral type.
As a result the user would receive a bogus error message "Could not
implicitly convert operands to arithmetic operator".
Fixes piglit tests {vs,fs}-{increment,decrement}-uint.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
TODO: check if GetImage works with passing the pitch as width, similar to PutImage,
which avoids the extra copy, ala dri_sw_displaytarget_display() in src/gallium/winsys/sw/dri/dri_sw_winsys.c
This is a cleanup of commit 02f1b50987.
Update tex buffer using a dri_drawable hook from implemented in sw/drisw.c.
This saves us the duplication of dri_drawable.c.
CC: Stuart Abercrombie <sabercrombie@chromium.org>
CC: Stéphane Marchesin <marcheu@chromium.org>
Certain exports (position, point size, etc.) are treated
specially by the shader and not counted as generic exports.
Note the exports and any relevant related state bits.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This fixes issues with the code playing fast and loose with types of
buffers, and as a bonus avoids the wrappers that were previously used
to pull bits out of packed depth/stencil buffers.
Reviewed-by: Brian Paul <brianp@vmware.com>
Some of the return values were u32, some were 24 bits, and z16
returned 16 bits. The caller would have to do all the work of
interpreting the format all over again. However, there are no callers
of this function at this point.
Reviewed-by: Brian Paul <brianp@vmware.com>
Perhaps the easiest implementation, nouveau can directly map buffers
even if tiled, and uses separate surfaces for its texture
renderbuffers so we don't have to worry about that offset.
Reviewed-by: Brian Paul <brianp@vmware.com>
Unlike intel, we do a blit to/from GTT memory in order to
untile/retile the renderbuffer data, since we don't have fence
registers for accessing it.
(There is software tiling code in radeon_tile.c, but it's unused and
doesn't support macro tiling)
Reviewed-by: Brian Paul <brianp@vmware.com>
v2: Add separate stencil S8 W-tile swizzling/deswizzling. Tested for
the swizzling case with env INTEL_SEPARATE_STENCIL=1 INTEL_HIZ=1
./bin/hiz-depth-stencil-test-fbo-d24-s8
v3: Apply Chad's fix for S8 window system buffers.
Reviewed-by: Chad Versace <chad@chad-versace.us>
Mesa core's is generic for things like osmesa.
For swrast_dri.so, we have to do Y flipping. The front-buffer path
isn't actually tested, though, because both before and after it fails
with a BadMatch in XGetImage.
Reviewed-by: Brian Paul <brianp@vmware.com>
This reverts commit abaebcee78.
The assertion I made was that "the zero-copy code in validation" would
zero copy. Of course, I deleted that check back in January because
the two sites that would trigger it (glTexImage() and this one) both
immediately bound their mt to the object, making the other check
pointless.
Removes two extra blits in glx-tfp. Also fixed the Android home
screen, which wasn't rendering because the extra copy broke the
relationship between the texture and the eglimage.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=42152
Tested-by: Chad Versace <chad@chad-versace.us>
Tested-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
This also fixes the build error due to missing link_uniforms.cpp in the source
lists.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Chad Versace <chad@chad-versace.us>
[olv: the missing link_uniforms.cpp was added before this patch is committed]
With the hope that Android.mk and SConscript can share the file to reduce
future breakage.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Chad Versace <chad@chad-versace.us>
MinGW uses MSVC's runtime DLLs for most of C runtime's functions, and
there has same semantics for vsnprintf.
Not sure how this worked until now -- maybe one of the internal
vsnprintf implementations was taking precedence.
Previously, the vertex and fragment shader back-ends assumed that all
varyings were floats. In GLSL 1.30 this is no longer true--they can
also be of integral types provided that they have an interpolation
qualifier of "flat".
This required two changes in each back-end: assigning the correct type
to the register that holds the varying value during shader execution,
and assigning the correct type to the register that ties the varying
value to the rest of the graphics pipeline (the message register in
the case of VS, and the payload register in the case of FS).
Fixes piglit tests fs-int-interpolation and fs-uint-interpolation.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
This function is similar to get_base_type(), but when called on
arrays, it returns the scalar type composing the array. For example,
glsl_type(vec4[]) => float_type.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
i965 graphics hardware has two floating point modes: ALT and IEEE. In
ALT mode, floating-point operations never generate infinities or NaNs,
and MOV instructions translate infinities and NaNs to finite values.
In IEEE mode, infinities and NaNs behave as specified in the IEEE 754
spec.
Previously, we used ALT mode for all vertex and fragment programs,
whether they were GLSL programs or ARB programs. The GLSL spec is
sufficiently vague about how infs and nans are to be handled that it
was unclear whether this mode was compliant with the GLSL 1.30 spec or
not, and it made it very difficult to test the isinf() and isnan()
functions.
This patch changes i965 GLSL programs to use IEEE floating-point mode,
which is clearly compliant with GLSL 1.30's inf/nan requirements. In
addition to making the Piglit isinf and isnan tests pass, this paves
the way for future support of the ARB_shader_precision extension.
Unfortunately we still have to use ALT floating-point mode when
executing ARB programs, because those programs require 0^0 == 1, and
i965 hardware generates 0^0 == NaN in IEEE mode.
Fixes piglit tests "isinf-and-isnan fs_fbo", "isinf-and-isnan vs_fbo",
and {fs,vs}-{isinf,isnan}-{vec2,vec3,vec4}.
The implementations are as follows:
isinf(x) = (abs(x) == +infinity)
isnan(x) = (x != x)
Note: the latter formula is not necessarily obvious. It works because
NaN is the only floating point number that does not equal itself.
Fixes piglit tests "isinf-and-isnan fs_basic" and "isinf-and-isnan
vs_basic".
This patch adds the extension '.ir' to all the files in
src/glsl/builtins/ir/, and changes generate_builtins.py so that it no
longer globs on '*' to find the files to build. This prevents
spurious files (such as EMACS' infamous *~ backup files) from breaking
the build.
The implementation of ir_binop_nequal in constant_expression_value()
appears to have been copy-and-pasted from the implementation of
ir_binop_equal, but with all instances of '==' changed to '!='. This
is correct except for one minor flaw: one of those '==' operators was
in an assertion checking that the types of the two arguments were
equal. That one needs to stay an '=='.
Fixes piglit tests {fs,vs}-inline-notequal.
svga keeps a small queue of similar primitive draws in order to coalesce
them into a single draw primitive command.
But the buffers referred in primitives not yet emitted were being ignored
in the considerations to flush or not the context.
This fixes piglit vbo-map-remap, vbo-subdata-sync, vbo-subdata-zero, and
Seeker.
Based on investigation and patch from Brian Paul.
Reviewed-By: Brian Paul <brianp@vmware.com>
Forgot to destroy the pipe context on xa context destroy.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
pb_debug_manager_dump was trying to take a lock already
held by all callers.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Signed-off-by: Jos Fonseca <jfonseca@vmware.com>
This drops all the old drmSupports* checks since KMS does them all, and it
also drop R300_CLASS and R600_CLASS.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
It caught one possible bug I recall in my time working on the driver,
and we haven't been setting it for non-fixed-function since the new FS
backend came along. The bug it caught was likely a confusion about
sampler mappings, which we have tests for these days.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Paul Berry <stereotype441@gmail.com>
This was the last prepare() function, and it's the first state atom,
so it must be ready to move.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Paul Berry <stereotype441@gmail.com>
I don't really want to touch this impenetrable code in this series, so
just call the one function from the other, since no other atom cares
about them.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Paul Berry <stereotype441@gmail.com>
While other units need to know about our constant buffer offsets,
nothing else cared about which particular BO other than the emit() half.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Paul Berry <stereotype441@gmail.com>
This rearranges the code a bit, and makes the upload of the binding
table take only as many surfaces as there are in use.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Paul Berry <stereotype441@gmail.com>
These produce BRW_NEW_SURFACES (used by binding table emit()) and
BRW_NEW_NR_WM_SURFACES (used by WM unit emit()). Fixes a bug where
with no texturing and no color buffer, we wouldn't consider the null
renderbuffer in nr_surfaces. This was harmless because nr_surfaces is
only used for the prefetch info in the unit state.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Paul Berry <stereotype441@gmail.com>
These produce BRW_NEW_SURFACES (used by binding table emit()) and
BRW_NEW_NR_WM_SURFACES (used by WM unit emit()). Fixes a bug where
with no texturing and no color buffer, we wouldn't consider the null
renderbuffer in nr_surfaces. This was harmless because nr_surfaces is
only used for the prefetch info in the unit state.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Paul Berry <stereotype441@gmail.com>
This is part of a series trying to eliminate the separate prepare()
hook in state upload. The prepare() hook existed to support the
check_aperture in between calculating state updates and setting up the
batch, but there should be no reason for that any more.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Paul Berry <stereotype441@gmail.com>
As we move state to emit() time from prepare() time, a couple of the
places that flag fallbacks will move here.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Paul Berry <stereotype441@gmail.com>
We were doing the BO validate step in prepare() (brw_validate_state())
hooks of atoms so that we could check_aperture before emitting the
relocation trees during brw_upload_state() that would actually make
the batchbuffer reference too much memory to be executed. Now that
all relocations occur in the batchbuffer, we can instead
check_aperture after emitting our state into the batchbuffer, and
easily roll back, flush, and retry if we happened to go over the
limits.
This will let us remove the whole prepare() vs emit() split in our
state atoms, which is a source of tricky dependencies and duplicated
code.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Paul Berry <stereotype441@gmail.com>
This will be used to avoid the prepare() step in the i965 driver's
state setup. Instead, we can just speculatively emit the primitive
into the batchbuffer, then check if the batch is too big, rollback and
flush, and replay the primitive.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Paul Berry <stereotype441@gmail.com>
This could have broken always_flush_cache on i965, since
reserved_space doesn't reflect the size of the workaround flushes, and
we might run out of space. This should make always_flush_cache more
useful on pre-i965, anyway (since the point is to flush around each
draw call, even within a batchbuffer).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Paul Berry <stereotype441@gmail.com>
Replace pipe->flush() with pipe->texture_barrier() in
the texture upload path for the staging texture.
This should be enough to get data out of the gpu
caches ready to be read for texture fetch.
It's not useful for anything.
The rest of the patch is just a cleanup resulting
from some of the variables being no longer used.
There are no piglit regressions.
With the recent changes to interpolation stuff, we can now get the value
direct from the program instead of just being fail.
fixes some of the glsl-1.30 interpolation tests with softpipe
Signed-off-by: Dave Airlie <airlied@redhat.com>
DRI2 supports this now - and already enables it explicitly - but drisw
does not and should not. Otherwise toolkits like clutter will only ever
SwapBuffers once and wait forever for an event that's not coming.
Signed-off-by: Adam Jackson <ajax@redhat.com>
Previously check_resources could fail, but we'd still try to optimize
the shader, do device-specific code generation, etc. In some cases,
this could explode (especially in the device-specific code
generation). I haven't found that I could trigger this with the
current code. When too many samplers were used with the new uniform
handling code, I observed several crashes deep down in the driver.
NOTE: This is candidate for the 7.11 branch.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=41609
Cc: Eric Anholt <eric@anholt.net>
Reviewed-and-tested-by: Kenneth Graunke <kenneth@whitecape.org>
Previously a shader like
int X;
struct X { int i; };
void main() { gl_Position = vec4(0.0); }
would generate two error message:
0:2(19): error: struct `X' previously defined
0:2(20): error: incomplete declaration
The first one is the real error, and the second is spurious.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Other parts of the code already caught things like 'float x[4][2]'.
However, nothing caught 'float [4] x[2]'.
Fixes piglit test array-multidimensional-new-syntax.vert.
NOTE: This is candidate for the 7.11 branch.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The idea here is to set up the message header with the Sampler State
pointer which the hardware provides as part of the PS Thread Payload in
register g0.
Unfortunately, the existing code
fs_reg(GRF, 0, BRW_REGISTER_TYPE_UD))
actually references "virtual GRF 0" rather than the hardware g0. This
is just some arbitrary GRF temporary which will get register allocated.
So, we ended up setting up the header with garbage.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Fixes the 1000000.0 overflow cases of piglit
GL_EXT_packed_float/pack.c
Reviewed-by: Marek Ol ák <maraeo@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
From the GL_EXT_packed_float spec:
For an RGBA color, if <type> is not one of FLOAT,
UNSIGNED_INT_5_9_9_9_REV_EXT, or UNSIGNED_INT_10F_11F_11F_REV_EXT,
or if the CLAMP_READ_COLOR_ARB is TRUE, or CLAMP_READ_COLOR_ARB
is FIXED_ONLY_ARB and the selected color (or texture) buffer is
a fixed-point buffer, each component is first clamped to [0,1].
Then the appropriate conversion formula from table 4.7 is applied
the component."
(but we previously resolved that the CLAMP_READ_COLOR bit is not
relevant to glGetTexImage())
This fixes most of the cases in piglit GL_EXT_packed_float/pack.
Reviewed-by: Marek Ol ák <maraeo@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This is only used in the code for packing to INF, and resulted in an
extra bit set that was set anyway, so it was harmless except for the
confusion caused.
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
From page 22 (28 of PDF) of GLSL 1.30 spec:
It is an error to provide a literal integer whose magnitude is too
large to store in a variable of matching signed or unsigned type.
Unsigned integers have exactly 32 bits of precision. Signed integers
use 32 bits, including a sign bit, in two's complement form.
Fixes piglit int-literal-too-large-0[123].frag.
v2: Take care with INT_MIN, use stroull, and make it a function.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
There's no sense in building a broken driver. Previously, there was
the potential of building a DRI1-only driver that would work for DRI1
and fail on DRI2 because the newer libdrm code wasn't present. Now
the radeon build system should be matching intel and nouveau.
This can probably be reduced even further by moving this logic to the
scissor state update or just removing the logic entirely, but I don't
trust myself in radeon quite that much.
It's past time, and it was going to get in the way of the renderbuffer
mapping refactor. We dropped all the other DRI1 drivers for this
release, and I can't imagine anybody supporting DRI1 radeon classic in
a new release of Mesa.
Diff produced by treating kernel_mm as true, deleting the DRI1 paths
that produce kernel_mm false, and deleting code.
It's past time, and it was going to get in the way of the renderbuffer
mapping refactor. We dropped all the other DRI1 drivers for this
release, and I can't imagine anybody supporting DRI1 radeon classic in
a new release of Mesa.
Cleanup of the resulting dead code to follow.
Acked-by: Alex Deucher <alexander.deucher@amd.com>
These are effectively doing type->get_base_type()->base_type, which is
equivalent to type->base_type. Just use that, as it's simpler.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
And not all existing queries. The only reason we have that list is to be able
to suspend and resume the active ones.
This reduces looping over queries when suspending and resuming.
The queries no longer have to track some of their states.
We weren't setting TEX_SEM_WAIT on instructions that read the value of a
TEX instruction and also wrote the same register as the TEX instruction.
This is the sequence we were miscompiling:
1: TEX temp[0], input[2].xy__, 2D[0]
...
16: src0.xyz = temp[22], src1.xyz = temp[0], src2.xyz = temp[19]
MAD temp[0].xyz, src0.xxx, src1.xyz, src2.xxx
https://bugs.freedesktop.org/show_bug.cgi?id=42090
This required the following changes:
- WM setup now makes the appropriate set of barycentric coordinates
(perspective vs. noperspective) available to the fragment shader,
based on whether the shader requires perspective interpolation,
noperspective interpolation, both, or neither.
- The fragment shader backend now uses the appropriate set of
barycentric coordiantes when interpolating, based on the
interpolation mode returned by
ir_variable::determine_interpolation_mode().
- SF setup now uses gl_fragment_program::InterpQualifier to determine
which attributes are to be flat shaded (as opposed to the old logic,
which only flat shaded colors).
- CLIP setup now ensures that the clipper outputs non-perspective
barycentric coordinates when they are needed by the fragment shader.
Fixes the remaining piglit tests of interpolation qualifiers that were
failing:
- interpolation-flat-*-smooth-none
- interpolation-flat-other-flat-none
- interpolation-noperspective-*
- interpolation-smooth-gl_*Color-flat-*
Reviewed-by: Eric Anholt <eric@anholt.net>
The name was misleading. The actual effect of the bit is to cause
the clipper to emit *non-perspective* barycentric coordinate
information (which is only needed when doing noperspective
interpolation).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This patch changes how fs_visitor::emit_general_interpolation()
decides what kind of interpolation to do. Previously, it used the
shade model to determine how to interpolate colors, and used smooth
interpolation on everything else. Now it uses
ir_variable::determine_interpolation_mode(), so that it respects GLSL
1.30 interpolation qualifiers.
Fixes piglit tests interpolation-flat-*-smooth-{distance,fixed,vertex}
and interpolation-flat-other-flat-{distance,fixed,vertex}.
Reviewed-by: Eric Anholt <eric@anholt.net>
This patch modifies the fragment shader back-end so that instead of
using a single delta_x/delta_y register pair to store barycentric
coordinates, it uses an array of such register pairs, one for each
possible intepolation mode.
When setting up the WM, we intstruct it to only provide the
barycentric coordinates that are actually needed by the fragment
shader--that is computed by brw_compute_barycentric_interp_modes().
Currently this function returns just
BRW_WM_PERSPECTIVE_PIXEL_BARYCENTRIC, because this is the only
interpolation mode we support. However, that will change in a later
patch.
Reviewed-by: Eric Anholt <eric@anholt.net>
This patch modifies the special case in
fs_visitor::split_virtual_grfs() that prevents splitting from being
applied to the delta_x/delta_y register pair (this register pair needs
to remain contiguous so that it can be used by the PLN instruction).
When gen>=6, this register pair is in a fixed location, not a virtual
register, so it was in no danger of being split. And
split_virtual_grfs' attempt not to split it was preventing some other
unrelated register from being split.
Reviewed-by: Eric Anholt <eric@anholt.net>
This function determines how a variable should be interpolated based
both on interpolation qualifiers and the current shade model.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Previously, we treated the 'smooth' qualifier as equivalent to no
qualifier at all. However, this is incorrect for the built-in color
variables (gl_FrontColor, gl_BackColor, gl_FrontSecondaryColor, and
gl_BackSecondaryColor). For those variables, if there is no qualifier
at all, interpolation should be flat if the shade model is GL_FLAT,
and smooth if the shade model is GL_SMOOTH.
To make this possible, I added a new value to the
glsl_interp_qualifier enum, INTERP_QUALIFIER_NONE.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This patch makes GLSL interpolation qualifiers visible to drivers via
the array InterpQualifier[] in gl_fragment_program, so that they can
easily be used by driver back-ends to select the correct interpolation
mode.
Previous to this patch, the GLSL compiler was using the enum
ir_variable_interpolation to represent interpolation types. Rather
than make a duplicate enum in core mesa to represent the same thing, I
moved the enum into mtypes.h and renamed it to be more consistent with
the other enums defined there.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Without this it's possible to wind up in a draw call with the
glBegin/End VBO still in a mapped state. This is a problem for
the SVGA3D driver and probably not good for other HW drivers.
Now that texture borders are gone, we never need to allocate our
textures through non-miptrees, which simplifies some irritating paths.
v2: Remove the !mt support case from intel_map_texture_image()
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1)
Reviewed-by: Brian Paul <brianp@vmware.com>
This replaces software rendering of textures with the deprecated
1-pixel border (which is always bad, since mipmapping is rather broken
in swrast, and GLSL 1.30 is unsupported) with hardware rendering that
just pretends there was never a border (so you have potential seams on
apps that actually intentionally used the 1-pixel borders, but correct
rendering otherwise).
This doesn't regress any piglit tests on gen6 (since the texwrap
border/bordercolor cases already failed due to broken border color
handling), but regresses texwrap border cases on original gen4 since
those end up sampling the border color instead of the border pixels.
It's a small price to pay for not thinking about texture borders any
more.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
We wanted to reuse this in the Intel driver.
v2: Move the flag to ctx->Const
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1)
Reviewed-by: Brian Paul <brianp@vmware.com>
The intel driver (and gallium, it looks like, though it doesn't use
these texstore functions at this point) doesn't bother making storage
for textures with 0 width, height, or depth. This avoids them having
to deal with returning a mapping for that nonexistent data.
Fixes assertion failures with an upcoming intel driver change.
Reviewed-by: Brian Paul <brianp@vmware.com>
This can be useful if you want to create a bunch of temporary strings
with a common prefix. For example, when iterating over uniform
structure fields, one might want to create temporary strings like
"pallete.primary", "palette.outline", and "pallette.shadow".
This could be done by overwriting the '.' with a null-byte and calling
ralloc_asprintf_append, but that incurs the cost of strlen("pallete")
every time...when this is already known.
These new functions allow you rewrite the tail of the string, given a
starting index. If the starting index is the length of the string, this
is equivalent to appending.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Consider the following vertex shader and fragment shader:
// vertex shader
varying vec4 v;
uniform vec4 u;
void main() { gl_Position = vec4(0.0); v = u; }
// fragment shader
void main() { gl_FragColor = vec4(0.0); }
Since the fragment shader does not use 'v', it is demoted from a
varying to a simple global variable. Once that happens, the
assignment to 'v' is useless, and it should be removed. In addition,
'u' is no longer active, and it should also be removed.
Performing extra dead code elimination after demoting shader inputs
and outputs takes care of this. This elimination must occur before
assigning uniform locations, or the declaration of 'u' cannot be
removed.
This change *breaks* the piglit test getuniform-01, but that test is
already incorrect. The test uses a vertex shader that assigns to a
user-defined varying, but it has no fragment shader. Since Mesa does
not support ARB_separate_shader_objects (we only support the EXT
version), the linker correctly eliminates the user-defined varying.
The cascading effect is that the uniform queried by the C code of the
test is also (correctly) eliminated.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=41980
Tested-by: Brian Paul <brianp@vmware.com>
Cc: Bryan Cain <bryancain3@gmail.com>
Cc: Vinson Lee <vlee@vmware.com>
Cc: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
These should be useful for doing transform feedback on Sandybridge.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
These are correct to the best of my knowledge, gleaned from a variety of
internal sources. Sadly, the Sandybridge PRM has incorrect limits.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
The inconsistency between vs_max_threads and max_vs_entries was rather
annoying. I could never seem to remember which one was reversed, which
made it harder to find quickly. "Max __ Threads" seems more natural.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
According to the docs for 3DSTATE_PS (Gen7+) and 3DSTATE_WM (Gen6),
there is a platform dependent value for the minimum number of pixel
shader threads. It may also vary based on whether WIZ Hashing is on.
For example, Ivybridge requires at least 4 threads if WIZ hashing is
disabled, and 8 if it's enabled. Programming it to use less threads is
illegal. Sandybridge appears to have similar restrictions.
So on newer platforms, INTEL_DEBUG=sing will probably just hang the GPU.
Rather than try to patch it up for newer platforms and extend it to
support geometry shaders, just remove it as it isn't that useful anyway.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
For GL_RGB_SCALE and GL_ALPHA_SCALE targets, the API wrapper code
attempts to ensure the parameter is 1.0, 2.0, or 4.0.
This is unnecessary: set_combiner_scale in texenv.c (called by
_mesa_TexEnvfv) already checks this and raises an appropriate error.
It's also incorrect: For glTexEnvx, the API validation code directly
compares the GLfixed input parameter with a floating point constant,
prior to converting fixed-point to floating point.
Fixes an issue in the OpenGL ES 1.1 conformance suite.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Kill the code paths taken when src_mt is null. It is never null, otherwise
there would be a segfault on line 4 of this function:
GLuint width = src_mt->level[level].width;
(Some interleaved lines in the diff make the real diff non-obvious. All
I did was delete some code and then left-shifted what remained to correct
the indentation.)
Reviewed-by: Eric Anholt <eric@aholt.net>
Signed-off-by: Chad Versace <chad@chad-versace.us>
A driver trying to set up builtin uniforms is faced with a problem:
How do I walk the ir_variable structure (representing an array of
structs, or array of matrices, or struct, or whatever), and set up
driver structures so that dereference of that uniform gets the
corresponding ParameterValues[] entry. The rule in general is that
each corresponding vector-sized field of an array of structs is one
builtin uniform state slot. i965 relied on another invariant: each
state slot has a number of unique channel swizzles corresponding to
the number of elements in the field's vector, to avoid needing to walk
the glsl_type in parallel to get at vector_elements.
All of the builtin uniforms followed this behavior, except for
gl_NormalMatrix. That's a mat3 (so 3 vec3s), but it was swizzled as 3
vec4s.
Fixes piglit glsl-fs-normalmatrix.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
This is the new name for gl_MaxVaryingFloats now that non-float
varyings exist. Fixes piglit
glsl-1.30/execution/maximums/gl_MaxVaryingFloats
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
In commit 3e5d3626, Eric added a homebrew workaround to fix GPU hangs in
the Mesa "engine" demo and oglc's api-texcoord test.
Unfortunately, his PIPE_CONTROL contains a Depth Stall, which
necessitates the post-sync non-zero workaround,
Fixes GPU hangs in Civilization 4, PlaneShift, and 3DMMES.
Hopefully Heroes of Newerth as well, though I haven't tested that.
NOTE: This is candidate for the 7.11 branch.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=40324
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=41096
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-and-tested-by: Eric Anholt <eric@anholt.net>
I had a colleague hitting issues compiling with an old gcc3.2
system. These patches got them through.
NOTE: This is a candidate for the 7.11 branch.
Reviewed-by: Brian Paul <brianp@vmware.com>
Use VRAM for static and immutable buffers. This restores the
recently removed r600g winsys behaviour for memory locations.
This also improoves rendering times on the gpu for some
OpenSceneGraph based test cases by about 15%.
Signed-off-by: Marek Olšák <maraeo@gmail.com>
Make sure we do not run into the classic ABA problem on buffer object bind,
reusing this name and may be never rebind since we get an new name
that was just deleted and never rebound in between.
The explicit rebinding to the debault object in the current context
prevents the above in the current context, but another context
sharing the same objects might suffer from this problem.
Minor var renaming and comments edited by Brian.
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Signed-off-by: Brian Paul <brianp@vmware.com>
Buffer objects may be shared across contexts.
Rework the array attrib push/pop implementation
to be thread safe. Make use of more library functions
for this purpose.
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Brian Paul <brianp@vmware.com>
In the past, swrast_texture_image::Data has been overloaded. It could
either point to malloc'd memory storing texture data, or it could point
to a current mapping of GPU memory.
Now, Buffer always points to malloc'd memory (if we're not using GPU
memory) and Data always points to mapped memory. The next step would
be to rename Data -> Map.
This change also involves adding swrast functions for mapping textures
and renderbuffers prior to rendering to setup the Data pointer. Plus,
corresponding functions to unmap texures and renderbuffers. This is
very much like similar code in the dri drivers.
Only swrast and the drivers that fall back to swrast need these fields now.
This removes the last of the fields related to software rendering from
gl_texture_image.
In lp_build_stencil_op() the incoming 'stencil' var is a 2-element array.
There's a front-face writemask and a back-face writemask but we're ignoring
the later. This patch doesn't fix anything but at least points out the
problem.
v2: Avoid the C99 rounding functions, because I don't trust
get/setting the C99 rounding mode from inside our library not having
other side effects. Instead, open-code roundEven() behavior around
Mesa's IROUND, which we're already testing for C99 rounding mode
safety.
Fixes glsl-1.30/compiler/built-in-functions/round*
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
- Fix _GNUC__ typo in both checks
- Fix logic error in check for gcc < 3.4 that breaks for gcc 2.x & older
Without this fix, builds with gcc 3.4.x end up depending on undefined
_mesa_bitcount instead of gcc's __builtin_popcount.
NOTE: This is a candidate for the stable branches.
Signed-off-by: Alan Coopersmith <alan.coopersmith@oracle.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Should fall back to shader based decoding (g3dvl) for now.
This is probably broken on systems that support xvmc, because
nouveau_video_buffer_create has no way to know for what api
the buffer is created, so I think this call might need a
separate argument as workaround.
Signed-off-by: Maarten Lankhorst <m.b.lankhorst@gmail.com>
The phase instance counts are not necessarily redeclared so with
the separation of declarations and instructions we wouldn't know
which instance count applies to which phase.
Correct linkage requires examining the signature itself, it cannot
be reconstructed from declarations only since unused registers may
have been omitted from them.
We don't want to clutter the code or handicap new hardware for
the sake of ancient GPUs on which d3d1x won't ever be used,
much less be fully compliant, anyway.
We were mis-computing the size of the user-space vertex buffer in
some circumstances. This led to a failed assertion at u_inlines.h:222
when using the VMware svga driver.
For example, if we had arrays such as:
array[0]: element_offset = 12, stride = 24
array[1]: element_offset = 0, stride = 24
We'd mistakenly compute 'bytes' to be 12 bytes too small.
I've reorganized the function too. By time it's called, we know that
we've got interleaved arrays either all in one VBO or all in user memory
and the stride is equal for all arrays.
Move the code that lived inside the attr==0 test after the loop.
In the loop we compute the true vertex size. That size factors into the
pipe->redefine_user_buffer() call later. Using the vertex size instead
of array[0]'s element_offset fixes the failed assertion.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Setting MaxIfDepth to UINT_MAX effectively means "don't lower anything."
Explicitly checking for this common case allows us to avoid walking the
IR, computing nesting levels, and so on.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Bryan Cain <bryancain3@gmail.com>
Commit 488fe51cf8 converted the EmitNoIfs
flag to MaxIfDepth, an unsigned integer saying "flatten if-statements
nested beyond this depth."
Unfortunately, i965 left this initialized to 0, which made ir_to_mesa
attempt to flatten all if-statements. We didn't notice right away
because we usually throw away ir_to_mesa's code in favor of the native
VS and FS backends...but this still creates a lot of unnecessary work.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Now that this is identical to gen6_wm_constants, just use that instead.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This makes it match gen6_prepare_wm_push_constants. For some reason, it
had been using AUB_TRACE_NO_TYPE.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
We definitely want CACHE_NEW_WM_PROG, not CACHE_NEW_VS_PROG.
NOTE: This is a candidate for the 7.11 branch.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
The DDX may allocate a buffer with a too small size.
Instead of failing, let's pretend everything's alright.
Such bugs should be fixed in the DDX, of course.
NOTE: This is a candidate for the stable branches.
The condmod instruction ends up generating garbage condition codes,
because apparently the comparison happens on the accumulator value (33
bits for UD), not the truncated value that would be written.
Fixes vs-op-neg-*
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The condmod instruction ends up generating garbage condition codes,
because apparently the comparison happens on the accumulator value (33
bits for UD), not the truncated value that would be written.
Fixes fs-op-neg-*
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
When there is no ARB_vertex_program program enabled, the Current
pointer points at a default program, so we were always using
VERTEX_PROGRAM_TWO_SIDE, even for fixed function lighting.
Fixes piglit two-sided-lighting*
Reviewed-by: Brian Paul <brianp@vmware.com>
From the GL 2.1 specification, page 114 (page 128 of the PDF):
"The version of PixelStore that takes a floating-point value
may be used to set any type of parameter; if the parameter is
boolean, then it is set to FALSE if the passed value is 0.0
and TRUE otherwise, while if the parameter is an integer, then
the passed value is rounded to the nearest integer."
Fixes piglit roundmode-pixelstore.
Note: This is a candidate for the 7.11 branch.
Reviewed-by: Brian Paul <brianp@vmware.com>
Simply generate GL_INVALID_OPERATION error at display list mode. As
explained by Brian, we are going to access PBO data at compile time.
No need to defer the error at execution time.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Using IROUND() to convert a float depth value to a 32-bit uint Z value.
didn't work (it returns a signed value). Just use a cast instead
Fixes piglit fbo-depth-array failure with swrast.
Note: this is a candidate for the 7.11 branch.
This code isn't really relevant since the kernel takes care not
to destroy busy GMR buffers.
Also with the advent of fence objects, the code was incorrect since
it didn't refcount fence handles.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com>
Wrap _mesa_unpack_bitmap to handle the case that data is stored in pixel
buffer object.
This would make calling Bitmap with data stored in PBO by display list work.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
v2: quote the spec; explicitly exclude the GL_BITMAP case to make code
more readable. (comments from Ian)
v3: Cast the offset by GLintptr to remove the compile warning(comments
from Brian).
I also found that I should use _mesa_sizeof_packed_type() instead,
as it includes packed pixel type, like GL_UNSIGNED_SHORT_5_6_5.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
The patch(based on the reading of the emulator) came from while I was
trying to fix the oglc pbo texImage.1PBODefaults fail. This case
generates a texture with the width and height equal to window's width
and height respectively, then try to texture it on the whole window.
So, it's exactly one texel for one pixel. And, the min filter and mag
filter are GL_LINEAR. It runs with swrast OK, as expected. But it failed
with i965 driver.
Well, you can't tell the difference from the screen, as the error is
quite tiny. From my digging, it seems that there are some tiny error
happened while getting tex address. This will break the one texel for
one pixel rule in this case. Thus the linear result is taken, with tiny
error.
This patch would fix all oglc pbo subcase fail with the same issue on
both ILK, SNB and IVB.
v2: comments from Ian, make the address_round filed assignment consistent.
(the sampler is alread memset to 0 by the xxx_update_samper_state
caller, so need to assign 0 first)
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Generate the program parameters list by walking the IR instead of by
walking the list of linked uniforms. This simplifies the code quite a
bit, and is probably a bit more correct. The list of linked uniforms
should really only be used by the GL API to interact with the
application.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: Bryan Cain <bryancain3@gmail.com>
Cc: Eric Anholt <eric@anholt.net>
Having a few of these includes or forward declarations inside the
'extern "C"' block can cause problems later. Specifically, it
prevents C++ linkage functions from being added to ir_to_mesa.h and
makes G++ angry if 'struct foo' is seen both inside and outside an
'extern "C"'.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Fold _mesa_get_active_uniform into its only caller in the process.
More changes are coming soon.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This simplificiation was enabled by the earlier refactors that
eliminated the references to the assembly shaders stored in the
gl_shader_program structure.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Finding this bit in the documentation proved challenging. It wasn't in
the SEND instruction's message descriptor section, nor the data port
message descriptor section. It turns out to be part of the Render
Target Write message's control bits, and in the documentation is named
"Last Render Target Select".
Shaders that use Multiple Render Targets should set this bit on the last
RT write, but not on any prior ones.
The GPU does update the Pixel Scoreboard appropriately, but doesn't
document this bit as directly causing a scoreboard clear.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
After printing the details of a specific message, we always print out
the message length and response length with nice "mlen" and "rlen"
labels.
For Gen5+ URB writes, we were dumping mlen and rlen a second time:
urb 0 urb_write interleave used complete mlen 5, rlen 0 mlen 5 rlen 0
Also, for Gen6 data port messages, we were including mlen and rlen in
the tuple of undecipherable integers.
Both of these are completely redundant. So, remove them.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Every brw_set_???_message function had duplicated code, per-generation,
to set the Message Descriptor and Extended Message Descriptor bits
(SFID, message length, response length, header present, end of thread).
However, these fields are actually specified as part of the SEND
instruction itself; individual types of messages don't even specify
them (except for header present, but that's in the same bit location).
Since these are exactly the same regardless of the message type, just
create a function to set them, using the generic message structs. This
not only shortens the code, but hides a lot of the per-generation
complexity (like the SFID being in destreg__conditionalmod) in one spot.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
The existing code asserted that eot == 0, as it doesn't make sense for
a thread to sample a texture as the last thing it does.
It doesn't make much sense to pass around a dead parameter either.
Especially for a function which already has a long parameter list.
So, remove the parameter and just set EOT to 0.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
When reading the data port code, it was not clear to me what these
values meant, nor where I could find them in the documentation.
Especially since the latest BSpec and older PRMs document them in
radically different places...neither of which are near the descriptions
of individual messages.
Cite the documentation, and rename them to SFID to signify that these
are Shared Function IDs that one can read about in the GPU overview,
rather than arbitrary bitfields. While we're add it, make them an enum.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Currently, we use the Render Cache for scratch access (read/write data)
and the Sampler Cache for all read only data (pull constants).
Reversing the condition here is clearer: if the caller requested the
Render Cache, use that. Otherwise, they requested the Data Cache
(which does not exist on Gen6) or Sampler Cache, so use the Sampler
Cache.
This should not change behavior in any way.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Using the constant cache for reads isn't going to work for scratch
reads (variably-indexed arrays or register spills), as these aren't
constant at all.
Also, in the new VS backend, use the proper message number for OWord
Dual Block Write messages. It's now 10, instead of 9.
+205 piglits.
NOTE: This is a candidate for the 7.11 branch.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
While reviewing some compiler cleanups I'd sent out, Paul noticed that
tree grafting wasn't taking "out" parameters into account.
Further investigation revealed that it isn't strictly necessary: ir_call
ends basic blocks, and tree grafting currently only operates on basic
blocks. So calls already kill grafts.
However, just to be safe, this patch makes "out" parameters explicitly
kill grafts. Paul and I both prefer this. It's a bit clearer.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
The 'mode' param is a bitset of GL_MAP_READ_BIT, GL_MAP_WRITE_BIT.
A future commit will perform buffer resolves in intel_region_map(). So,
even though the access mode is irrelevant to the GTT, the extra
information allows us to intelligently avoid unneccessary buffer resolves.
Signed-off-by: Chad Versace <chad@chad-versace.us>
Add the following to the vtbl:
hiz_resolve_depthbuffer
hiz_resolve_hizbuffer
For all drivers for which HiZ is not enabled, the methods are set to be
no-ops. If HiZ is enabled, the methods are currently to set to empty
stubs.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad@chad-versace.us>
intel_context::gen field is set by intelInitContext(). So, by calling
intelInitContext() before initializing the vtable, we can can construct
different vtables for different gens.
Specifically, this allows us to set the HiZ operations to be no-ops for
contexts for which HiZ is not enabled.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad@chad-versace.us>
During anholt's MapTextureImage refactoring, the call to
intel_tex_image_s8z24_create_renderbuffers was missplaced. It needs to
occur *after* the miptree is allocated.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad@chad-versace.us>
Don't dereference the color buffer if one isn't attached.
This fixes the following Piglit tests in my experimental HiZ branch:
glean/logicOp
glean/paths
Note: This is a candidate for the stable branches.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad@chad-versace.us>
This is necessary because i965 will need to call vbo_bind_array() when
cleaning up after a buffer resolve meta-op.
Detailed Explanation
--------------------
The vbo module tracks vertex attributes separately from the gl_context.
Specifically, the vbo module maintins vertex attributes in
vbo_exec_context::array::inputs, which is synchronized with
gl_context::Array::ArrayObj::VertexAttrib by vbo_bind_array().
vbo_draw_arrays() calls vbo_bind_array() to perform the synchronization
before calling the real draw call, vbo_context::draw_arrays.
Intel hardware accomplishes buffer resolves with a meta-op. Frequently,
that meta-op must be performed within glDraw* in the moment immediately
before the draw occurs (The hardware designers hate us...). After
performing the meta-op, but before calling vbo_bind_array(), the
gl_context's vertex attributes will have been restored to their original
state (that is, their state before the meta-op began), but the vbo
module's vertex attribute are those used in the last meta-op. Therefore we
must manually synchronize the two with vbo_bind_array() before continuing
with the original draw command (that is, the one requested with glDraw*).
See brw_predraw_resolve_buffers(), which will be added in a future commit.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Chad Versace <chad@chad-versace.us>
This hook allows the driver to prepare for a glBegin/glEnd.
i965 will use the hook to avoid avoid recursive calls to FLUSH_VERTICES
during a buffer resolve meta-op.
Detailed Justification
----------------------
When vertices are queued during a glBegin/glEnd block, those vertices must
of course be drawn before any rendering state changes. To enusure this,
Mesa calls FLUSH_VERTICES as a prehook to such state changes. Therefore,
FLUSH_VERTICES itself cannot change rendering state without falling into
a recursive trap.
This precludes meta-ops, namely i965 buffer resolves, from occuring while
any vertices are queued. To avoid that situation, i965 must satisfy the
following condition: that it queues no vertex if a buffer needs resolving.
To satisfy this, i965 will use the PrepareExecBegin hook to resolve all
buffers on entering a glBegin/glEnd block.
--------
v2: Don't add dd_function_table::CleanupExecEnd. Anholt and I discovered
that hook to be unnecessary.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Chad Versace <chad@chad-versace.us>
In some cases, Intel hardware requires that depth and stencil buffers be
separate. To accommodate swrast, i965 resorts to hackery that causes
a segfault in the fastpaths of draw_depth_stencil_pixels() and
read_depth_stencil_pixels().
The hack is that i965 sets framebuffer->Attachment[BUFFER_DEPTH].Renderbuffer
and framebuffer->Attachment[BUFFER_STENCIL].Renderbuffer to a dummy
renderbuffer for which the GetRow accessors and friends are null. The real
buffers are located at framebuffer->_DepthBuffer and framebuffer->_Stencilbuffer.
To fix the segault, this patch skips the fastpath if
framebuffer->Attachment[BUFFER_DEPTH].Renderbuffer->GetRow is null.
Note: This is a candidate for the 7.11 branch.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Chad Versace <chad@chad-versace.us>
When i965 uses (in the near future) meta-ops to perform buffer resolves,
the meta-op stack exceeds depth 2. I bumped it to 8 because... 8 is bigger
than 2, but not too big.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Chad Versace <chad@chad-versace.us>
If this flag is set, then _mesa_meta_begin/end will save/restore the state of
GL_SELECT and GL_FEEDBACK render modes.
Intel's future buffer resolve meta-ops will require this, since buffer resolves
may occur when the GL_RENDER_MODE is GL_SELECT.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Chad Versace <chad@chad-versace.us>
This is required in order for meta-ops to save/restore the GL_RENDER_MODE
state.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Chad Versace <chad@chad-versace.us>
I initially produced the patch using this bash command:
for file in {intel,i915,i965}/*.{c,cpp,h}; do [ ! -h $file ] && sed -i
's/GLboolean/bool/g' $file && sed -i 's/GL_TRUE/true/g' $file && sed -i
's/GL_FALSE/false/g' $file; done
Then I manually added #include <stdbool.h> to fix compilation errors,
and converted a few functions back to GLboolean that were used in core
Mesa's function pointer table to avoid "incompatible pointer" warnings.
Finally, I cleaned up some whitespace issues introduced by the change.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Chad Versace <chad@chad-versace.us>
Acked-by: Paul Berry <stereotype441@gmail.com>
It was previously under gpu_shader4, but I'm pretty sure everyone's
going to be doing GLSL 1.30 first (since gpu_shader4 is basically 1.30
plus a bunch of extra stuff).
Fixes piglit glsl-1.30/texel-offset-limits.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
When saving the active program in _mesa_meta_begin, it was actually
saving the fragment program instead. This means that if the
application binds a program that only has a vertex shader then when
the meta saved state is restored it will forget the bound program.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=41969
Reviewed-by: Chad Versace <chad@chad-versace.us>
This is a step towards providing a direct route for drivers accepting
GLSL IR for codegen. Perhaps more importantly, it runs the fixed
function fragment program through the GLSL IR optimization. Having
seen how easy it is to make ugly fixed function texenv code that can
do unnecessary work, this may improve real applicatinos.
On converting fixed function programs to generate GLSL, the linker
became cranky that we were trying to make something that wasn't a
linked vertex+fragment program. Given that the Mesa GLES2 drivers
also support desktop GL with EXT_sso, just telling the linker to shut
up seems like the easiest solution.
As pointed out by Michel Dänzer, gcc -lstdc++ doesn't work on all systems,
because it may require other libraries which are only pulled in implicitly
by g++. And libstdc++ is available only with GNU compiler.
Use c++ compiler for linking and remove redundant LDFLAGS += -lstdc++
all over the tree.
In addition to setting up the flags correctly, this renames the
generated libraries to ensure they get 'Mangled' in the name.
This is very useful for distros and the like, where mangled Mesa
and non-mangled GL libraries typically need to be installed
side-by-side.
Reviewed-by: Dan Nicholson <dbn.lists@gmail.com>
Scalar instruction that need to write to the xyz components of a
register must reserve the RGB instruction slot for a REPL_ALPHA
instruction. With this commit, the scheduler will attempt to free
the RGB slot by moving the write to the w component of a register.
Introuduce a simple function called copy_data to do the image data copy
stuff for all the save_CompressedTex*Image function. The function check
the NULL data case to avoid some potential segfault. This also would
make the code a bit simpler and less redundance.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Fix is in {read,draw}_depth_stencil_pixels(). If depthRb == stencilRb,
then it is redundant to check depthRb->x *and* stencilRb->x.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Chad Versace <chad@chad-versace.us>
For glReadPixels, the user supplied pixels have format
GL_UNSIGNED_INT_24_8. But, when the depthstencil buffer's format was
MESA_FORMAT_S8_Z24, the fastpath read from the buffer without reordering
the depth and stencil bits. To fix this, this patch just skips the
fastpath when the format is not MESA_FORMAT_Z24_S8.
The problem and fix for glWritePixels is analagous.
Fixes the Piglit tests below on i965/gen6 and causes no regressions.
general/depthstencil-default_fb-drawpixels-24_8
general/depthstencil-default_fb-readpixels-24_8
EXT_packed_depth_stencil/fbo-depthstencil-GL_DEPTH24_STENCIL8-drawpixels-24_8
EXT_packed_depth_stencil/fbo-depthstencil-GL_DEPTH24_STENCIL8-readpixels-24_8
Note: This is a candidate for the stable branches.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Chad Versace <chad@chad-versace.us>
This is required for an accurate implementation of d3d1x's
CheckFormatSupport query.
It also seems generally useful for state trackers, which could
choose alternative rendering paths or formats if blending would
come at a significant performance loss.
The texture semaphore allows for prefetching of texture data. On my
RV515, this increases the FPS of Lightsmark by 33% (This is with the
reg_rename pass enabled, which is enabled in the next commit).
There is a new env variable now called RADEON_TEX_GROUP, which allows
you to specify the maximum number of texture lookups to do at once.
The default is 8, but different values could produce better results
for various application / card combinations.
We no longer emit full instructions immediately after they have been
merged. Instead merged instructions are added to the ready list and
the scheduler can commit them whenever it wants.
This is supported by the pseudo-code on pages 27 and 28 (pages 41 and
42 of the PDF) of the OpenGL 2.1 spec. The last part of the
implementation of ArrayElement is:
if (generic attribute array 0 enabled) {
if (generic vertex attribute 0 array normalization flag is set, and
type is not FLOAT or DOUBLE)
VertexAttrib[size]N[type]v(0, generic vertex attribute 0 array element i);
else
VertexAttrib[size][type]v(0, generic vertex attribute 0 array element i);
} else if (vertex array enabled) {
Vertex[size][type]v(vertex array element i);
}
Page 23 (page 37 of the PDF) of the same spec says:
"Setting generic vertex attribute zero specifies a vertex; the
four vertex coordinates are taken from the values of attribute
zero. A Vertex2, Vertex3, or Vertex4 command is completely
equivalent to the corresponding VertexAttrib* command with an
index of zero."
Fixes piglit test attribute0.
NOTE: This is a candidate for stable branches.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Don't allow any "CPU" buffers to be allocated by the pb_fenced
buffer manager, since we can't protect against failures during
buffer validation.
Also, add an extra slab buffer manager to allocate buffers from
the kernel if there is a failure to allocate from our big buffer pool.
The reason we use a slab manager for this, is to avoid allocating
many very small buffers from the kernel.
v2: Increased VMW_MAX_BUFFER_SIZE and fixed some comments.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com>
Returns a configuration that makes the dri state-tracker-manager
throttle.
Also disable kernel-based throttling.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com>
Hooks up throttling if there is a configuration function present and
it indicates that throttling is desired.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com>
Adds a possibility for the state tracker manager to query the
target for a specific configuration.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com>
But don't hook it up just yet until we figure out a good way to do that.
Also, we should, in the future, add driconf options to control what
throttling reasons should be honored, and the number of outstanding
swaps allowed.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com>
The X server has limited throttle support on the server side,
but doing this in the client has some benefits:
1) X server throttling is per client. Client side throttling can be done
per drawable.
2) It's easier to control the throttling based on what client is run,
for example using "driconf".
3) X server throttling requires drm swap complete events.
So implement a dri2 throttling extension intended to be used by direct
rendering clients.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com>
Reviewed-by: Michel Dänzer <michel@daenzer.net>
This change releases the stw_framebuffer::mutex past creation of
the pbuffer stw_framebuffer. Without this change the pbuffers
lock is never released. Since on win32 mutexes are recursive, this
does not hurt as long as all actions on a context are done from
the same thread. But if, for example, context creation happens in
a different thread than usage, every access to the context will
block for ever.
Signed-off-by: José Fonseca <jfonseca@vmware.com>
In commit 018ea68d87, when I
de-compacted clip planes on Gen6+, I updated both the old and new VS
back-ends to reflect the change in how clip planes are stored, but I
failed to change the code in gen6_vs_state.c that uploads clip plane
constants when using the old VS back-end.
As a result, if the set of enabled clip planes wasn't contiguous
starting with 0, then clipping would not occur properly. This patch
corrects gen6_vs_state.c to upload clip plane constants in the new
de-compacted form.
This only affects the old VS back-end (which is used for
fixed-function and ARB vertex programs, not for GLSL vertex shaders).
Fixes Piglit test fixed-clip-enables.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=41603
Reviewed-by: Eric Anholt <eric@anholt.net>
This fixes a bug where we'd wind up emitting an invalid instruction like
MOVE R[0]., R[1]; - note the empty/zero writemask. If we don't write to
any dest register channels, cull the instruction.
v2: simply change/fix the existing test for instruction culling.
Instead of the renderbuffer pointer. In the future, attaching a texture
may not mean the renderbuffer pointer gets set too.
Plus, remove some commented-out assertions.
v2: add a 'reading' parameter to distinguish between reading and writing
to the renderbuffer (we don't want to check if _ColorReadBuffer is null
when we're about to draw). Eric found this mistake.
These functions were only called in framebuffer.c where they were defined.
Remove the unneeded attIndex parameter too.
Reviewed-by: Eric Anholt <eric@anholt.net>
What I would prefer to assert is that, for each region that is currently
mapped, no batch is emitted that uses that region's bo. However, it's much
easier to implement this big hammer.
Observe that this requires that the batch flush in intel_region_map() be
moved to within the map_refcount guard.
v2: Add comments (borrowed from anholt's reply) explaining why the
assertion is a good idea.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Chad Versace <chad@chad-versace.us>
When updating a register reference to reflect the fact that we were
taking its absolute value, the fragment shader back-end failed to
clear the negate flag, resulting in abs(-x) getting computed as
-abs(x).
I also found (and fixed) a similar problem in brw_eu.h, but I'm not
aware of an actual manifestation of that problem.
Fixes piglit test glsl-fs-abs-neg-with-intermediate.
brw_set_compression_control took a GLboolean as an argument, then
promptly used a switch statement to compare it with various enumeration
values. Clearly it's not actually a boolean.
Introduce a new enumeration type, enum brw_compression, and use that.
Found by converting GLboolean to bool; clang then gave warnings about
switching on a boolean and ultimately duplicated case errors.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Chad Versace <chad@chad-versace.us>
Neither OES_framebuffer_object nor EXT_framebuffer_object allow
querying the window system FBO.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Previously GL_DEPTH_BUFFER and GL_STENCIL_BUFFER were (incorrectly)
allowed for both. Those enums don't even really exist! Now GL_DEPTH
and GL_STENCIL are only allowed for the window system FBO.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This adds support to the clear and tile caches for integer storage
and clearing, avoiding any floating paths.
Signed-off-by: Dave Airlie <airlied@redhat.com>
these are never USCALED, always UINT in reality.
taken from some work by Christoph Bumiller
v2: fixup formatting of table + tabs
Signed-off-by: Dave Airlie <airlied@redhat.com>
Previously it was getting set in draw_set_mapped_constant_buffer() but
if there were no shader constants, that function wasn't called. So the
pt.user.planes field was null and we died when we tried to access the
clip planes in the LLVM-generated code.
Fixes https://bugs.freedesktop.org/show_bug.cgi?id=41663
Note: This is a candidate for the 7.11 branch.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Instead of 12 use DRAW_TOTAL_CLIP_PLANES. The max number of user-defined
clip planes was increased to 8 so the total number of planes is 14.
This doesn't fix any specific bug, but clearly the old code was wrong.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
For example, GL_TRIANLGES is converted to _3DPRIM_TRILIST.
The conversion is necessary because HiZ and MSAA resolve operations emit
a 3DPRIM_RECTLIST, which cannot be conveyed by GLenum.
As a consequence, brw_gs_prog_key.primitive is also converted.
v2
----
- [anholt] Split brw_set_prim into brw/gen6 variants in previous commit,
since not much code is really shared between the two.
- [anholt] Replace switch statements with table lookups, since this is
a hot path.
Reviewed-by: Eric Anholt <eric@anho.net>
Signed-off-by: Chad Versace <chad@chad-versace.us>
The "slight optimization to avoid the GS program" in brw_set_prim() is not
used by Gen 6, since Gen 6 doesn't use a GS program. Also, Gen 6 doesn't use
reduced primitives.
Also, document that intel_context.reduced_primitive is only used for Gen < 6
Reviewed-by: Eric Anholt <eric@anho.net>
Signed-off-by: Chad Versace <chad@chad-versace.us>
now that we have integer texture types I can drop this workaround so that
copies of values is done properly (as floats would fail on some corner cases).
Signed-off-by: Dave Airlie <airlied@redhat.com>
glDeleteProgram should only be able to remove the one refcount for the
user's reference to the program from the hash table (even though that
ref does live on in the hash table until the last other ref is
removed).
Fixes piglit ARB_shader_objects/delete-repeat.
Reviewed-by: Chad Versace <chad@chad-versace.us>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
PIPE_CONTROL reported time stamp are 64 bits value incrementing every
80 ns, and only the low 32 bits are active (high 32 are always 0).
v2: Cleaned up whitespace, function arguments (anholt).
Fixes piglit EXT_timer_query/time-elapsed
Signed-off-by: Zou Nan hai <nanhai.zou@intel.com>
Signed-off-by: Eric Anholt <eric@anholt.net>
The rest of the linker/glsl translation code checks for NULL, so I suppose we should check here too. Fixes crash on exit with i915g instanced drawing.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
If there is not enough space in pushbuffer for fence emission
(nouveau_fence_emit -> nv50_screen_fence_emit -> MARK_RING),
the pushbuffer is flushed, which through flush_notify ->
nv50_default_flush_notify -> nouveau_fence_update marks currently
emitting fence as flushed. But actual emission is done after this mark.
So later when there is a need to wait on this fence and pushbuffer
was not flushed in between, fence wait will never finish causing
application to hang.
To fix this, introduce new fence state between AVAILABLE and EMITTED,
set it before emission and handle it everywhere.
Additionally obtain fence sequence numbers after possible flush in
MARK_RING, because we want to emit fences in correct order.
Reviewed-by: Christoph Bumiller <e0425955@student.tuwien.ac.at>
Note: This is a candidate for the 7.11 branch.
We need add a new set of fragment shader variants, along with new vertex
elements for signed and unsigned clears.
The new fragment shader variants are due to the integers values requiring
CONSTANT interpolation. The new vertex element descriptions are for passing
the clear color as an unsigned or signed integer value.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This adds the various mesa->gallium and gallium->mesa format conversions
along with the GL->gallium texture choosers for integers.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This add support for unsigned/signed integer types via adding a 'pure' bit
in the format description table. It adds 4 new u_format get/put hooks,
for get/put uint and get/put sint so that accessors can get native access
to the integer bits. This is used to avoid precision loss via float converting
paths.
It doesn't add any float fetchers for these types at the moment, GL doesn't
require float fetching from these types and I expect we'll introduce a lot
of hidden bugs if we start allowing such conversions without an API mandating
it.
It adds all formats from EXT_texture_integer and EXT_texture_rg.
0 regressions on llvmpipe here with this.
(there is some more follow on code in my gallium-int-work branch, bringing
softpipe and mesa to a pretty integer clean state)
v2: fixup python generator to get signed->unsigned and unsigned->signed
fetches working.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This fixes up the integer format choosing to pick the closest mesa format
then the most likely fallback.
(the formatting in this file needs cleaning in another patch).
Signed-off-by: Dave Airlie <airlied@redhat.com>
This just adds a simple packing for GL_UNSIGNED_INT/GL_INT destination formats.
This is enough for at least the gallium drivers to pack both unsigned and signed types for read pixels.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Most of these functions used three spaces for the first level of
indentation, but four spaces for the next level. One used tabs and then
three spaces. Some used 3/4 in a then block but 3/3 in the else block.
Normally I try to avoid field days like this, but since the functions
were so inconsistent, even internally, it was making it difficult to
edit without introducing spurious whitespace changes.
So, just get it over with. git diff -b shows 0 lines changed.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
i915 and i830 hardware doesn't have HiZ, so remove all HiZ related
assertions from *update_draw_buffer().
I've removed the dead format checks completely rather than replace them
with more appropriate checks. This doesn't reduce "assertion coverage",
however, because when I added these HiZ related assertions in c8fdf66
there were no pre-existing checks there.
Signed-off-by: Chad Versace <chad@chad-versace.us>
Silences a warning about comparing to an unsigned variable. It looks like
the result of swizzle_for_size() is always assigned to unsigned vars.
Reviewed-by: Chad Versace <chad@chad-versace.us>
This fixes failures found with the new piglit texsubimage test.
Two things were broken:
1. The dxt code doesn't handle sources images where width != row stride.
Check for that and take the _mesa_make_temp_ubyte_image() path to get
an image where width = rowstride.
2. If we don't take the _mesa_make_temp_ubyte_image() path we need to
take the source image unpacking parameters into account in order to
get the proper starting memory address of the source texels.
Note: This is a candidate for the 7.11 branch.
Docs say that default shader input color input need to be spec
as ARGB8888. And a clear rect prim essentially uses this value
instead of default diffuse. Depth on the other hands is an ieee
32 bit float. Clear stencil is U8.
Completely different are the clear values for zone init prims.
These are speced in the actual output pixel layout (and need
to be repeated for 16 bit formats).
Clear up the confusion by adding some comments.
v2: Retain the target swizzling support added by Stephan Marchesin.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Previously, if the user enabled a non-consecutive set of clip planes
(e.g. 0, 1, and 3), the driver would compact them down to a
consecutive set starting at 0. This optimization was of dubious
value, and complicated the implementation of gl_ClipDistance.
This patch changes the driver so that with Gen6 and later chipsets, we
no longer compact the clip planes. However, we still discard any clip
planes beyond the highest number that is in use, so performance should
not be affected for applications that use clip planes consecutively
from 0.
With chipsets previous to Gen6, we still compact the clip planes,
since the pre-Gen6 clipper thread relies on this behavior.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
The only remaining uses of brw_vs_prog_key::nr_userclip only occurred
when using clip planes (as opposed to gl_ClipDistance). This patch
renames the value to nr_userclip_planes and sets it to zero when
gl_ClipDistance is in use. This avoids unnecessary VS recompiles.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Previously, brw_compute_vue_map required an argument indicating the
number of clip planes in use, but all it did with it was check if it
was nonzero.
This patch changes brw_compute_vue_map to take a boolean instead.
This allows us to avoid some unnecessary recompilation of the Gen4/5
GS and SF threads.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Previous to this patch, setup_uniform_clipplane_values() was setting
up clip plane uniforms based on ctx->Transform.ClipPlanesEnabled, a
piece of state not stored in the vertex shader cache key. As a
result, a change to this piece of state might not trigger a necessary
vertex shader recompile.
The patch adds a field to the vertex shader cache key,
userclip_planes_enabled, to store the current value of
ctx->Transform.ClipPlanesEnabled. Also, it changes
setup_uniform_clipplane_values() to read from this new field, so that
it's manifestly clear that the vertex shader isn't depending on state
not stored in the cache key.
Note: when the vertex shader uses gl_ClipDistance, the VS backend
doesn't need to know which clip planes are in use, so we leave the
field as zero in that case to avoid unnecessary recompiles.
Fixes Piglit test vs-clip-vertex-enables.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
No functional change. This patch rearranges the struct
brw_vs_prog_key so that the two fields related to clipping are
together, and documents those fields. This should make the patches
that follow easier to comprehend, since they add additional
clipping-related fields to this structure.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
The i965 driver already had a function to count bits in a 64-bit uint
(brw_count_bits()), but it was buggy (it only counted the bottom 32
bits) and it was clumsy (it had a strange and broken fallback for
non-GCC-like compilers, which fortunately was never used). Since Mesa
already has a _mesa_bitcount() function, it seems better to just
create a _mesa_bitcount_64() function rather than special-case this in
the i965 driver.
This patch creates the new _mesa_bitcount_64() function and rewrites
all of the old brw_count_bits() calls to refer to it.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Fixes the ES1 conformance 'userclip' test, which broke when we increased
MAX_CLIP_PLANES to 8. Core Mesa already validates incoming values
against MAX_CLIP_PLANES; we just need the ES wrapper to pass everything
through.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
It's required for ES 1.0 and 1.1, and isn't specified for ES 2.
While the comment says Mesa depends on it internally, removing it from
ES2 doesn't seem to regress any Piglit or ES2 conformance tests.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
In particular, drivers don't enable this in ES 1.1 contexts.
Prior to this, none of the OpenGL ES 1.1 conformance tests passed.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Since core Mesa no longer depends on gl_texture_image::Data pointing to
mapped texture buffers we don't have to mess with it all over the place
in the state tracker. Now Data is only used to point to malloc'd memory
that holds images which don't fit in the texture object's mipmap buffer.
These were used to find the start of a 3D image slice (or 2D array texture
slice) given a base address. Instead, use a simple array of address of
image slices instead.
This is a step toward getting rid of the gl_texture_image::ImageOffsets
field.
Reviewed-by: Eric Anholt <eric@anholt.net>
This patch implements proper support for gl_ClipVertex by causing the
new VS backend to populate the clip distance VUE slots using
VERT_RESULT_CLIP_VERTEX when appropriate, and by using the
untransformed clip planes in ctx->Transform.EyeUserPlane rather than
the transformed clip planes in ctx->Transform._ClipUserPlane when a
GLSL-based vertex shader is in use.
When not using a GLSL-based vertex shader, we use
ctx->Transform._ClipUserPlane (which is what we used prior to this
patch). This ensures that clipping is still performed correctly for
fixed function and ARB vertex programs. A new function,
brw_select_clip_planes() is used to determine whether to use
_ClipUserPlane or EyeUserPlane, so that the logic for making this
decision is shared between the new and old vertex shaders.
Fixes the following Piglit tests on i965 Gen6:
- vs-clip-vertex-const-accept
- vs-clip-vertex-const-reject
- vs-clip-vertex-different-from-position
- vs-clip-vertex-equal-to-position
- vs-clip-vertex-homogeneity
- vs-clip-based-on-position
- vs-clip-based-on-position-homogeneity
- clip-plane-transformation clipvert_pos
- clip-plane-transformation pos_clipvert
- clip-plane-transformation pos
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad@chad-versace.us>
Before this patch, clip planes didn't work properly in Mesa when using
vertex shaders, because Mesa assigned both gl_ClipVertex and
gl_Position to the same gl_vert_result (VERT_RESULT_HPOS). As a
result, backends couldn't distinguish between the two variables, so
any shader that wrote different values to them would fail to work
properly.
This patch paves the way for proper support of gl_ClipVertex by
creating a new enumerated value in gl_vert_result for it
(VERT_RESULT_CLIP_VERTEX). After this patch, a back-end may add
support for gl_ClipVertex using the following algorithm:
- If using a user-supplied GLSL vertex shader:
- If the bit corresponding to VERT_RESULT_CLIP_VERTEX is set in
gl_program::OutputsWritten:
- Clip using the vertex shader output VERT_RESULT_CLIP_VERTEX and
the clip planes defined in gl_context::Transform.EyeUserPlane.
- Else:
- Clip using the vertex shader output VERT_RESULT_HPOS and the
clip planes defined in gl_context::Transform.EyeUserPlane.
- Else (either using fixed function or an ARB vertex program):
- Clip using the vertex shader output VERT_RESULT_HPOS and the clip
planes defined in gl_context::Transform._ClipUserPlane (*)
where (*) represents the normal Mesa behavior before this patch.
An example of implementing the above algorithm can be found in the
patch that follows this one, which implements gl_ClipVertex in i965
Gen6.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The previous change was not effective for lines, because there is no
4 planes 4x4 block rasterization path: it is handled by the 16x16 block
case too, and the 16x16 block was not being budged as it should.
This fixes assertion failures on line rasterization.
llvmpipe has a few special rasterization paths for triangles contained in
16x16 blocks, but it allows the 16x16 block to be aligned only to a 4x4
grid.
Some 16x16 blocks could actually intersect the tile
if the triangle is 16 pixels in one dimension but 4 in the other, causing
a buffer overflow.
The fix consists of budging the 16x16 blocks back inside the tile.
This just adds the entries to the table and fixes the asserts up.
The int32 one is definitely wrong, since it uses a float temp
which will lose precision, but its no worse than now.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This is taken from reading EXT_texture_integer + EXT_texture_rg in combination,
Comments on necessity of each format, naming of formats and bugs in the
formats tables please.
Is there any formats I've missed?
Eric looked over this to make sure its consistent at least.
As I've changed the ordering of things in the format table, the follow
patches are required to avoid regression.
Signed-off-by: Dave Airlie <airlied@redhat.com>
As per Brian's suggestion we can generate this table at first start
to make sure its correct. This is a sad workaround for compilers which
don't support named initialiser. (its 2011).
Signed-off-by: Dave Airlie <airlied@redhat.com>
Commit d1fda903 (radeon: Drop mapping we were doing around
glGetTexImage()) removed the common Radeon source file
radeon_tex_getimage.c, and pulled it out of the r200, r300, r600, and
radeon makefiles. But it left behind the symlinks that were being
used to share that file among the four directories.
This patch removes the dangling symlinks.
Reviewed-by: Brian Paul <brianp@vmware.com>
We want quad/pixel Z values to be interpolated exactly the same for
multi-pass algorithms. Because of how the optimized Z-test code is
written, we can't cull the first quad in a run even if it's totally
killed. See the comment for more info.
NOTE: This is a candidate for the 7.11 branch.
Instead of relying on the mirror in the Mesa IR assembly shader, just
use the variables actually stored in the GLSL IR. This will be a bit
slower, but nobody cares about the performance of glGetActiveAttrib.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This just folds get_active_attrib into _mesa_GetActiveAttribARB
and moves the resulting function function to the other source file.
More changes are coming soon.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This currently mirrors the state tracking
gl_shader_program::Attributes, but I'm working towards eliminating
that.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This just folds bind_attrib_location into _mesa_BindAttribLocationARB
and moves the resulting function function to the other source file.
More changes are coming soon.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
hash_table_replace doesn't use get_node to avoid having to hash the key twice.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This allows querying the linked shader itself rather than the Mesa IR.
This is the first step towards removing gl_program::Attributes.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The symbol table in the linked shaders may contain references to
variables that were removed (e.g., unused uniforms). Since it may
contain junk, there is no possible valid use. Delete it and set the
pointer to NULL.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
All drivers in Mesa have supported this extension for eons. This
extension is an optional features in desktop OpenGL (via
GL_ARB_draw_buffers) and OpenGL ES 2.x (via GL_NV_draw_buffers).
The extension is not usable in OpenGL ES 1.x. There is no
glDrawBuffers* entry point in OpenGL ES 1.x contexts, and glGet*v
generate errors when MAX_DRAW_BUFFERS or DRAW_BUFFERi is queried.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
This also moves ATI_draw_buffers. This is to facilitate enabling
NV_draw_buffers in OpenGL ES 2.0.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Us poor souls who cross compile mesa want to be able to specify which pkg-config to pick, or at least just change one place.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Until now, we've been treating 1D arrays as a single slice, and each
array slice is actually just a row of the 2D texture. While swrast
still stores them this way, hardware drivers think that 1D arrays have
actual separate slices not stored as contiguous rows.
Reviewed-by: Brian Paul <brianp@vmware.com>
The path for ->Data was failing to be called for the FBO draw offset
fallback, and also had mismatched compressed texture support code.
This drops the intel_prepare_render() in the blit path. We aren't
copying to/from a GL_FRONT buffer, so it doesn't matter.
Too many separate functions each called from one location (in
different files). This code should all die soon when swrast starts
using MapTextureImage.
Before, we were only allocating these from our TexImage, so if the
texture image was set up in any other way (non-accelerated
glGenerateMipmaps()), they'd be missing or wrong.
Now that we can zero-copy generate the mipmaps into brand new
glTexImage()-generated storage using MapTextureImage(), we no longer
need to allocate image->Data in mipmap generate. This requires
deleting the drivers' old overrides of the miptree tracking after
calling _mesa_generate_mipmap at the same time, or the drivers
promptly lose our newly-generated data.
Reviewed-by: Eric Anholt <eric@anholt.net>
Both POW and INT DIV need a message length of 2; previously, we only
checked for POW.
Also, BRW_MATH_FUNCTION_INT_DIV_QUOTIENT_AND_REMAINDER has a response
length of 2; previously, we only checked for SINCOS. We don't use this
message, but in case we ever decide to, we may as well fix it now.
While we're at it, just move these computations into
brw_set_math_message, since they're entirely based on the function.
This fixes it for both brw_math and the old backend's brw_math_16.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Drivers implementing GLSL 1.30 want to do integer modulus, and until we
can stop generating code via ir_to_mesa, it's easier to make it silently
generate rubbish code. Multiply will do.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Classic compiler mistake. In the example below, the OMOD optimization
was combining instructions 4 and 10, but since there was an instruction
(#8) in between them that wrote to the same registers as instruction 10,
instruction 11 was reading the wrong value.
Example of the mistake:
Before OMOD:
4: MAD temp[0].y, temp[3]._y__, const[0]._x__, const[0]._y__;
...
8: ADD temp[2].x, temp[1].x___, -temp[4].x___;
...
10: MUL temp[2].x, const[1].y___, temp[0].y___;
11: FRC temp[5].x, temp[2].x___;
After OMOD:
4: MAD temp[2].x / 8, temp[3]._y__, const[0]._x__, const[0]._y__;
...
8: ADD temp[2].x, temp[1].x___, -temp[4].x___;
...
11: FRC temp[5].x, temp[2].x___;
https://bugs.freedesktop.org/show_bug.cgi?id=41367
Source swizzles for transcendent instructions were being stored in the X
channel regardless of what channel the instruction was writing.
This was causing problems for some helper functions that were expecting
source swizzles to occupy channels corresponding to the instruction's
writemask. This commit makes transcendent instructions follow the same
convention as normal instructions for representing source swizzles.
Previous behavior:
LG2 temp[0].y, input[0].x___;
Current behavior:
LG2 temp[0].y, input[0]._x__;
From the EXT_transform_feedback spec:
Primitives can be optionally discarded before rasterization by calling
Enable and Disable with RASTERIZER_DISCARD_EXT. When enabled, primitives
are discared right before the rasterization stage, but after the optional
transform feedback stage. When disabled, primitives are passed through to
the rasterization stage to be processed normally. RASTERIZER_DISCARD_EXT
applies to the DrawPixels, CopyPixels, Bitmap, Clear and Accum commands as
well.
And the GL 3.2 spec says it applies to ClearBuffer* as well.
Reviewed-by: Brian Paul <brianp@vmware.com>
This reverts commit d631c19db4.
The commit was broken, and ended up returning false all the time
because nobody in the world binds every single possible vertex array.
On further reflection, we don't want to discount stride == 0: This
function is just used for deciding to calculate whether to compute the
bonuds on the index, and there's no sense in computing index bounds
when stride == 0.
For the separate question of "how much data do I upload for this
vertex element?", the i965 driver was fixed to upload the data.
Fixes a regression of about 2x in 3DMMES, and most importantly, makes
Hammerfight playable.
Commit d631c19db4 avoided this problem
by forcing the driver to get the min/max index, but that commit was
broken, so just fix the driver problem (confusion between "do I need
to upload any data?" and "do I need the index bounds in order to
upload any data?").
Generally we're using fragment programs in all our drivers, so wasting
4MB for code that's never called is pretty lame. Reduces i965 memory
allocation for a short shader program from 21,932,128B to 17,737,816B.
As innocuous as it seemed, ebca47a basically broke the world (e.g.,
>200 piglit regressions). In vec4_visitor::emit_block_move,
src->swizzle was expected to be BRW_SWIZZLE_NOOP before setting it to
a swizzle that would replicate the existing channels of the source
type to a vec4 (e.g., .xyyy for a vec2).
The original assertion seems to have been a little bogus. In addition
to being BRW_SWIZZLE_NOOP, src->swizzle might already be a swizzle
that would replicate the existing channels of the source type to a
vec4. In other words, it might already have the value that we're
about to assign to it.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
If GL_NV_texture_env_combine4 is not supported, setting the fourth
combiner term would generate a GL error.
Of course, I noticed this right after committing the previous patch
to use a loop in the first place. <sigh>
Note that GL_EXT_texture_env_combine is always supported so the first
three combiner terms are always accepted.
There's four combiner terms (not 3) with GL_NV_texture_env_combine4.
Use a loop to make the code a little more compact.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The drivers don't need to care about the domains. All they need to set
are the bind and usage flags. This simplifies the winsys too.
This also fixes on r600g:
- fbo-depth-GL_DEPTH_COMPONENT32F-copypixels
- fbo-depth-GL_DEPTH_COMPONENT16-copypixels
- fbo-depth-GL_DEPTH_COMPONENT24-copypixels
- fbo-depth-GL_DEPTH_COMPONENT32-copypixels
- fbo-depth-GL_DEPTH24_STENCIL8-copypixels
I can't explain it.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
I have moved 'last_flush' and 'binding' from r600_bo to winsys/radeon.
The other members are now part of r600_resource.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
We were checking whether render_condition is set. That was not reliable,
because it's always set with trace and noop regardless of driver support.
Reviewed-by: Brian Paul <brianp@vmware.com>
This removes:
- PIPE_CAP_MAX_TEXTURE_IMAGE_UNITS
- PIPE_CAP_MAX_VERTEX_TEXTURE_UNITS
in favor of the that new per-shader cap.
Reviewed-by: Brian Paul <brianp@vmware.com>
All drivers support it (well, except Cell). The boolean option is going away
from core Mesa too.
This is a follow-up to Ian Romanick's patch
"mesa: Remove ARB_texture_mirrored_repeat extension enable flag".
Reviewed-by: Brian Paul <brianp@vmware.com>
This is from a Coverity defect report.
src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
1314 void
1315 vec4_visitor::emit_block_move(dst_reg *dst, src_reg *src,
1316 const struct glsl_type *type, bool
predicated)
...
1351 /* Do we need to worry about swizzling a swizzle? */
->1352 assert(src->swizzle = BRW_SWIZZLE_NOOP);
1353 src->swizzle = swizzle_for_size(type->vector_elements);
Reported-by: Vinson Lee <vlee@vmware.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=40158
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
As written, this test correctly raises an error for #elif being used
with an undefined macro (and not as an argument to "defined"). If the
preceding #if were '#if 1' then this diagnositc would correctly be
hidden. That allows code such as the following to not raise an error:
#ifndef MAYBE_UNDEFINED
#elif MAYBE_UNDEFINED < 5
...
#endif
So this test case is working as expected already. We add it here just
to improve test coverage.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Carl Worth <cworth@cworth.org>
The specification reserves any macro name containing two consecutive
underscores, (anywhere within the name). Previously, we only raised
this error for macro names that started with two underscores.
Fix the implementation to check for two underscores anywhere, and also
update the corresponding 086-reserved-macro-names test.
This also fixes the following two piglit tests:
spec/glsl-1.30/preprocessor/reserved/double-underscore-02.frag
spec/glsl-1.30/preprocessor/reserved/double-underscore-03.frag
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Carl Worth <cworth@cworth.org>
This is as simple as abstracting one existing block of code into a
function call and then adding a single call to that function for the
case of a non-function-like macro.
This fixes the recently-added 097-paste-with-non-function-macro test
as well as the following piglit tests:
spec/glsl-1.30/preprocessor/concat/concat-01.frag
spec/glsl-1.30/preprocessor/concat/concat-02.frag
Also, the concat-04.frag test now passes for the right reason. The
test is intended to fail the compilation, but before this commit it
was failing compilation (and hence passing the test) for the wrong
reason.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Carl Worth <cworth@cworth.org>
Apparently we never implemented this, (but we've got a GLSL 1.30 test
in piglit that is exercising this case).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Carl Worth <cworth@cworth.org>
There was already a loop here to look for multiple token pastes, but
it was mistakenly incrementing the iterator counter after performing
one paste.
Instead, leave the loop iterator in place to coalesce as many tokens
as necessary into one.
This fixes the recently add 096-paste-twice test as well as the
following piglit test:
spec/glsl-1.30/preprocessor/concat/concat-03.frag
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Carl Worth <cworth@cworth.org>
This is something that piglit is exercising that currently fails.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Carl Worth <cworth@cworth.org>
The GL spec says that luminance values are returned as (l, 0, 0, 1),
L/A values as (l, 0, 0, a) and intensity values as (i, 0, 0, 1).
Use the pixel transfer scale controls to implement that.
This fixes a few failures in the new piglit getteximage-formats
test when getting a compressed L or L/A image.
If color material mode is enabled, constant buffer entries related
to the material coefficients will depend on glColor. So add
_NEW_CURRENT_ATTRIB to the bitset returned for material-related
constants in _mesa_program_state_flags().
This fixes a bug exercised by the new piglit draw-arrays-colormaterial
test.
Note: This is a candidate for the 7.11 branch.
This hasn't been needed so far since none of the core Mesa code paths
that call ctx->Driver.AllocTextureImageBuffer() are used with the
state tracker. That will change in upcoming patches.
Note that this function duplicates some code seen in the st_TexImage()
function. That can be cleaned up later.
The target, level and texObj can be obtained through the texImage
parameter. We could make similar changes for the TexImage() hooks too.
Reviewed-by: Eric Anholt <eric@anholt.net>
This fixes a build error introduced with commit
"winsys/svga: Update to vmwgfx kernel module 2.1"
if both the svga driver and the xorg state tracker was enabled
at the same time.
If needed we can re-add a minimal target for basic functionality.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
When an FBO is rendering to a texture (rather than a renderbuffer),
Gallium sets up an internal renderbuffer to handle the rendering, and
copies over enough texture state to make this work.
InternalFormat was missed out, causing glTexCopyImage to take a slow
path unnecessarily.
Fixes https://bugs.freedesktop.org/show_bug.cgi?id=41263
Signed-off-by: Simon Farnsworth <simon.farnsworth@onelan.co.uk>
Signed-off-by: Brian Paul <brianp@vmware.com>
Introduces fence objecs and a size limit on query buffers.
The possibility to map the fifo from user-space is gone, and
replaced by an ioctl that reads the 3D capabilities.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Jakob Bornecranz <jakob@vmware.com>
Don't store references to these on the surface but on the context.
References to transfers are still stored on the surface since we allow
only a single map of a surface at a time.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
All drivers remaining in Mesa support this extension. This extension
is either required or optional features in desktop OpenGL, OpenGL ES
1.x, and OpenGL ES 2.x.
This extension was previously not supported on mach64, mga, and savage
(Savage3D and other pre-Savage4).
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
All drivers remaining in Mesa support this extension. This extension
is either required or optional features in desktop OpenGL, OpenGL ES
1.x, and OpenGL ES 2.x.
This extension was previously not supported on i810, mach64, mga,
savage, sis, and tdfx (Voodoo Banshee and Voodoo3).
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
All drivers remaining in Mesa support this extension. This extension
is either required or optional features in desktop OpenGL, OpenGL ES
1.x, and OpenGL ES 2.x.
This extension was previously not supported on mach64.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
All drivers remaining in Mesa support this extension. This extension
is either required or optional features in desktop OpenGL, OpenGL ES
1.x, and OpenGL ES 2.x.
This extension was previously not supported on mach64, mga, or r128.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
All drivers remaining in Mesa support this extension. This extension
is either required or optional features in desktop OpenGL, OpenGL ES
1.x, and OpenGL ES 2.x. The existing support is already partially
broken in Mesa (e.g., querying GL_TEXTURE_ENV_MODE in OpenGL ES 2.x).
This patch does not change the situation in any way.
It looks like the only hardware supported by Mesa that cannot do
ARB_texture_env_combine is pre-NV10 NVIDA chips. It appears that
these chips cannot do the GL_SUBTRACT mode. Based on looking at older
copies of nvOpenGLspecs.pdf found on the net, NVIDIA never supported
ARB_texture_env_combine on those chips either.
This extension was previously not supported on mach64, mga (G200),
r128, savage, sis, and tdfx (Voodoo Banshee and Voodoo3).
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
All drivers remaining in Mesa support this extension. This extension
is either required or optional features in desktop OpenGL, OpenGL ES
1.x, and OpenGL ES 2.x. The existing support is already partially
broken in Mesa (e.g., querying GL_TEXTURE_ENV_MODE in OpenGL ES 2.x).
This patch does not change the situation in any way.
This extension was previously not supported on mach64, mga (G200),
savage (Savage3D and other pre-Savage4), sis, and tdfx (Voodoo
Banshee).
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
All drivers remaining in Mesa support this extension. This extension
is either required or optional features in desktop OpenGL, OpenGL ES
1.x, and OpenGL ES 2.x. The existing support is already partially
broken in Mesa (e.g., querying GL_CLIENT_ACTIVE_TEXTURE in OpenGL ES
2.x). This patch does not change the situation in any way.
This extension was previously not supported on i810, mga (G200), or
tdfx (Voodoo Banshee).
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We can use the core Mesa code for glGetTexImage() since it handles the
image mapping/unmapping now. We'll keep the decompress_with_blit() path
in the hope that it's faster than core Mesa's software decompression code.
Fixes https://bugs.freedesktop.org/show_bug.cgi?id=41312
The formula we were previously using for asinh:
asinh x = ln(x + sqrt(x * x + 1))
is numerically unstable: when x is a large negative value, the quantity
x + sqrt(x * x + 1)
is a small positive value (on the order of 1/(2|x|)). Since the
logarithm function is very sensitive in this range, any error in the
computation of the square root manifests as a large error in the
result.
This patch changes to the equivalent formula:
asinh x = sign(x) * ln(abs(x) + sqrt(x * x + 1))
which is only slightly more expensive to compute, and is numerically
stable for all x.
Fixes piglit tests
spec/glsl-1.30/execution/built-in-functions/[fv]s-asinh-*.
Reviewed-by: Chad Versace <chad@chad-versace.us>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Fixes the glsl-1.30/compiler/built-in-functions/trunc-* tests under 1.30.
Reviewed-by: Chad Versace <chad@chad-versace.us>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Somehow we managed to get the unsigned int vectors, but not scalar.
Fixes _mesa_problem complaints in piglit's uint tests.
Reviewed-by: Chad Versace <chad@chad-versace.us>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Bitshifts are one of the rare places that GLSL allows mixed base types
without an implicit conversion occurring.
Reviewed-by: Chad Versace <chad@chad-versace.us>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
For hardware drivers, we only have ir_to_mesa called for the purposes
of potential swrast fallbacks (basically never on a 1.30 driver),
which we don't really care about. This will allow 1.30 to be
implemented without rewriting swrast for it.
Reviewed-by: Chad Versace <chad@chad-versace.us>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
On pre-GEN6 chips, the VUE slots set aside for clip distance aren't
actually used, so there is no reason for the clipper to waste time
interpolating them.
When commit 62bad54727 changed the enum
value used to represent these VUE slots, that caused the clipper to
start interpolating them as an accidental side effect. This patch
reverts to the old clipper behavior.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This patch corrects two errors in the computation of the psiz/flags
VUE slot on pre-GEN5 when using the new VS backend:
- The clip flags (which should be stored in the w component of the
first VUE slot) were being accidentally duplicated in all other
components of that VUE slot, causing partially clipped triangles to
sometimes disappear completely.
- The OR instruction wasn't being stored in "inst", causing the
BRW_PREDICATE_NORMAL flag to be applied to the wrong instruction.
This patch fixes regressions in clipping behavior when using shaders
on GEN4-5.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This constructor was storing its argument in the wrong field of the
"imm" enum, resulting in it being converted to a float when it should
have remained an unsigned integer. This was preventing clipping from
working properly on pre-GEN6.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
In pre-GEN6, when using clip planes, both the vertex shader and the
clipper need access to the client-supplied clip planes, since the
vertex shader needs them to set the clip flags, and the clipper needs
them to determine where to insert new vertices.
With the old VS backend, we used a clever optimization to avoid
placing duplicate copies of these planes in the CURBE: we used the
same block of memory for both the clipper and vertex shader constants,
with the clip planes at the front of it, and then we instructed the
clipper to read just the initial part of this block containing the
clip planes.
This optimization was tricky, of dubious value, and not completely
working in the new VS backend, so I've removed it. Now, when using
the new VS backend, separate parts of the CURBE are used for the
clipper and the vertex shader. Note that this doesn't affect the
number of push constants available to the vertex shader, it simply
causes the CURBE to occupy a few more bytes of URB memory.
The old VS backend is unaffected. GEN6+, which does clipping entirely
in hardware, is also unaffected.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Now that i965 supports 8 clip planes instead of 6, the size of the
brw_vs_compile::userplane array needs to be increased to 8. Changed
the array size to MAX_CLIP_PLANES so that if the number changes again
in the future, this array size won't be missed.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
When using user-defined clipping planes, the i965 driver compacts the
array of clipping planes so that disabled clipping planes do not
appear in it--this saves precious push constant space and makes it
easier to generate the pre-GEN6 clip program. As a result, when
enabling clipping planes in GEN6+ hardware, we always enable clipping
planes 0 through n-1 (where n is the number of clipping planes
enabled), regardless of which clipping planes the user actually
requested.
However, we can't do this when using gl_ClipDistance, because it would
be prohibitively complex to compact the gl_ClipDistance array inside
the user-supplied vertex shader. So, when enabling clipping planes in
GEN6+ hardware, if gl_ClipDistance is in use, we need to pass the
user-supplied enable flags directly through to the hardware rather
than just enabling the first n planes.
Fixes Piglit test vs-clip-distance-enables.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Since the i965 driver supports 8 clipping planes now, we need 4 bits
to store the number of user clipping planes, not 3.
In theory this isn't strictly necessary, since brw_clip.h is only used
on pre-GEN6, and pre-GEN6 only advertises support for 6 clipping
planes, but it seems wise to err on the safe side.
In the process I removed the pad0 element of struct
brw_clip_prog_key--it doesn't seem necessary because the compiler
automatically inserts padding if needed.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Override the context's GLSL version if the environment variable
MESA_GLSL_VERSION_OVERRIDE is set. Valid values for
MESA_GLSL_VERSION_OVERRIDE are integers, such as "130".
MESA_GLSL_VERSION_OVERRIDE has the same behavior as INTEL_GLSL_VERSION,
except that it applies to all drivers, not just Intel's. Since the former
supercedes the latter, this patch disables the latter.
Reviewed-by: Dave Airlie <airlied@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad@chad-versace.us>
Again, the check was needlessly specific: this works fine on Gen7.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
The check was designed to forbid it on old generations (Gen5/Ironlake),
not on new ones. It just works on Gen7/Ivybridge.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
The mesa core code uses MapTextureImage() like we need now.
v2: Drop mapping around _mesa_generate_mipmap for compressed, since
the whole path ends up going through MapTextureImage(), and the
meta decompression code ended up causing us to lose track of the
region that was originally mapped and assertion fail.
v2: Changes by Brian to MapTexImage in the decompression path.
v3: Changes by anholt to fix srcRowStride for decompression of NPOT.
Tested-by: Brian Paul <brianp@vmware.com> (v2)
EXT_texture_integer also specifies border color should be a color
union, the values are used according to the texture sampler format.
(update docs)
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
It is necessary to manually set the GL version to 3.0 in order to run
Piglit tests that use glGetUniform*().
This patch allows one to override the version of the OpenGL context by
setting the environment variable MESA_GL_VERSION_OVERRIDE.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Chad Versace <chad@chad-versace.us>
This is a follow-up to commit
2d686fe911, which added decoding of
GL_CLIP_DISTANCE[67] to the _mesa_set_enable() function. This patch
makes the following additional fixes:
- Uses GL_CLIP_DISTANCEi enums consistently within enable.c rather
than the deprecated GL_CLIP_PLANEi enums.
- Generates an error if the user tries to access a clip flag that is
unsupported by the hardware.
- Applies the same change to _mesa_IsEnabled(), so that querying clip
flags using glIsEnabled() works properly.
- Applies corresponding changes to get.c, so that querying clip flags
using glGet*() works properly.
Fixes piglit test clip-flag-behavior.
Reviewed-by: Brian Paul <brianp@vmware.com>
We get called for TexImage higher up, and in a relatively normal way
(pixels == NULL is common for FBO setup).
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
There's nothing in our normal texture path we need for this. We don't
PBO upload blit it. We don't need to worry about flushing because
MapTextureImage handles it. hiz scattergather doesn't apply, but MTI
handles it too.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
It's totally gratuitous -- the image's miptree will be checked for
binding to the object later, anyway, with zero-copy or blitting as
appropriate.
Tested-by: Kristian Høgsberg <krh@bitplanet.net>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
_mesa_reference_renderbuffer already short-circuits equality, and
intel_miptree_release does nothing on NULL.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This mirrors the structure Eric used in the new VS backend, and seems
simpler. In particular, the math1/math2 split will avoid having to
figure out how many operands there are, as this is already known by the
caller.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
_swrast_choose_texture_sample_func() handles null texture object pointers
and will return the "null" sampler function which returns (0,0,0,1). This
fixes a minor regression from ce82914f5a
All drivers remaining in Mesa support this extension. This extension
is required in desktop OpenGL. The existing support is already partially
broken in Mesa (e.g., using format=GL_ABGR for glTexImage2D in OpenGL ES 2.x).
This patch does not change the situation in any way.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
All drivers remaining in Mesa support this extension. This extension
is either required or optional features in desktop OpenGL, OpenGL ES
1.x, and OpenGL ES 2.x.
EXT_texture_format_BGRA8888 is mostly a subset of EXT_bgra. The only
difference seems to be that EXT_texture_format_BGRA8888 allows GL_BGRA
as an internal format to glTexImage2D and friends.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This extension is always enabled, and drivers do not have
to option to disable it.
I kept this one separate from the others because I was a little
uncertain about the changes to get.c.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Mesa has never any portion of this extension, and neither has any
other vendor.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The year 2006 apparently came from the "Last Modified Date" in the
spec header. however, the revision history at the bottom say "2/22/00
mjk - added NVIDIA Implementation Details." From that we can safely
infer that the spec is from at least 2000, and it may even be older.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The following extensions are always enabled, and drivers do not have
to option to disable them:
GL_ARB_multisample
GL_ARB_texture_compression
GL_ARB_vertex_buffer_object / GL_OES_mapbuffer
GL_EXT_copy_texture
GL_EXT_multi_draw_arrays / GL_SUN_multi_draw_arrays
GL_EXT_polygon_offset
GL_EXT_subtexture
GL_EXT_texture_edge_clamp / GL_SGIS_texture_edge_clamp
GL_EXT_vertex_array
GL_SGIS_generate_mipmap
This set was picked because the are all either required or optional
features in desktop OpenGL, OpenGL ES 1.x, and OpenGL ES 2.x. The
existing support for some is already partially broken in Mesa (e.g.,
proxy texture targets in OpenGL ES). This patch does not change the
situation in any way.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This extension is enabled by default in _mesa_init_extensions, so
drivers don't need to enable it again.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This extension is enabled by default in _mesa_init_extensions, so
drivers don't need to enable it again.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Fixes OpenArena on Gen7. Technically, adding only the first depth stall
fixes it, but the documentation says to do all three, and the Windows
driver seems to do it.
Not observed to fix anything on Gen6 yet.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=38863
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
It seems that GT1/GT2 sorts of variations are here to stay, and more
special cases will likely be required in the future. Checking by PCI ID
via the IS_xxx_GTx macros is cumbersome; introducing a new 'gt' field
analogous to intel->gen will make this easier.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Seeing as they were only used once (in the same function they were
defined), having them as context members seemed rather pointless.
Remove them entirely (rather than using local variables) since the
chipset generation checks are actually just as straightforward.
While we're at it, clean up the remainder of the if-tree that set them.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
At one point, the documentation said that max thread count in 3DSTATE_PS
was at bit offset 23, but it's actually 24 on Ivybridge. Not only did
this halve our thread count, it caused us to write 1 into a bit 23, which
is marked as MBZ (must be zero). Furthermore, it made us write an even
number into this field, which is apparently not allowed. Apparently we
were just lucky it worked.
NOTE: This is a candidate for the 7.11 branch.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
- first determine the buffer range to upload for each buffer by walking over
vertex elements
- take buffer_offset into account
- take src_offset into account
- take src_format into account in more places
- don't just blindly upload (stride*count) bytes
NOTE: This is a candidate for the 7.11 branch.
It can now override both buffer offsets and strides in additions to resources.
Overriding buffer offsets was kinda hackish and could cause issues with
non-native vertex formats.
intel_image->mt might be NULL, say with border width set. It then would
trigger a segfault at intel_map/unmap_texture_image function.
This would fix the oglc misctest(basic.textureBorderIgnore) fail.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Fence emission can flush the push buffer, which through flush_notify
unreferences recently emitted fence. If ref count is increased after
fence emission, unreference deletes the fence, which causes SIGSEGV.
Backtrace:
nouveau_fence_del
nouveau_fence_ref
nouveau_fence_next
nouveau_pushbuf_flush
MARK_RING
nv50_screen_fence_emit
nouveau_fence_emit
nv50_flush
This bug manifested as an assertion failure in nouveau_fence.c, because
SIGSEGV handler tried to shutdown the application and used messed up
fence.
This issue was reported by Maxim Levitsky.
Note: This is a candidate for the 7.11 branch.
Without this we'd miss the last update in a sequence like {COLOR0, COLOR1},
{COLOR0}, {COLOR0, COLOR1}. I originally had a patch for this that called
updated_drawbuffers() when the buffer count changed, but later realized that
was wrong. The ARB_draw_buffers spec explicitly says "The draw buffer for
output colors beyond <n> is set to NONE.", and this is queryable state.
This fixes piglit arb_draw_buffers-state_change.
NOTE: This is a candidate for the 7.11 branch.
Signed-off-by: Henri Verbeet <hverbeet@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
This patch fix a "Unresolved Symbols" run time error when using G3DVL
through the VDPAU state tracker, by linking the vdpau targets with librt.
Reported by Arkadiusz Miśkiewicz.
Caused by this commit :
commit e911dbb563
Author: Emeric Grange <emeric.grange@gmail.com>
Date: Mon Sep 12 23:39:33 2011 +0200
Signed-off-by: Emeric Grange <emeric.grange@gmail.com>
This should bring g3dvl back to work until we figured out
how SCALED types should really work.
Signed-off-by: Christian König <deathsimple@vodafone.de>
There is no guarantee that the tokens TGSI will persist beyond the
create_fs_state. The pipe driver (and therefore the draw module) is
responsible for making copies of the TGSI tokens when it needs them.
Reviewed-by: Brian Paul <brianp@vmware.com>
i915_miptree_layout, i945_miptree_layout, and brw_miptree_layout always
just return GL_TRUE, so there's really no point to it. Change them to
void functions and remove the (dead) error checking code.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
For some reason I thought subexpressions were chained off the top-level
one. This isn't the case, so just create a temporary context and free
it. All of this memory would be eventually freed, but now is freed
much sooner.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Very simple shaders don't actually use GLSL built-ins. For example:
- gl_Position = gl_ModelViewProjectionMatrix * gl_Vertex;
- gl_FragColor = vec4(0.0);
Both of the shaders used by _mesa_meta_glsl_Clear() also qualify.
By waiting to initialize the built-ins until the first time we need to
look for a signature, we can avoid the overhead entirely in these cases.
Makes piglit run roughly 18% faster (255 vs. 312 seconds).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Previously, we conditionally set up the SF pipline stage with a
urb_entry_read_offset of 2 when clipping was in use, and 1 otherwise,
causing the clip distance VUE slots to be skipped if present. This
was an extremely minor savings (it saved the SF unit from reading 2
vec4s out of the URB, but it didn't affect any computation, since we
only instruct the SF unit to perform interpolation on VUE slots that
are actually used by the fragment shader).
GLSL 1.30 requires an interpolated version of gl_ClipDistance to be
available for reading in the fragment shader, so we need the SF's
urb_entry_read_offset to be 1 when the fragment shader reads from
gl_ClipDistance.
This patch just unconditionally sets the urb_entry_read_offset to 1 in
all cases; this is sufficient to make gl_ClipDistance available to the
fragment shader when it is needed, and the performance loss should be
negligible when it isn't.
Reviewed-by: Eric Anholt <eric@anholt.net>
When gl_ClipDistance is in use, the contents of the gl_ClipDistance
array just need to be copied directly into the clip distance VUE
slots, so we re-use the code that copies all other generic VUE slots
(this has been extracted to its own method). When gl_ClipDistance is
not in use, the vertex shader needs to calculate the clip distances
based on user-specified clipping planes.
This patch also removes the i965-specific enum values
BRW_VERT_RESULT_CLIP[01], since we now have generic Mesa enums that
serve the same purpose (VERT_RESULT_CLIP_DIST[01]).
Reviewed-by: Eric Anholt <eric@anholt.net>
When the vertex shader writes to gl_ClipDistance, we do clipping based
on clip distances rather than user clip planes, so don't waste push
constant space storing user clip planes that won't be used.
Reviewed-by: Eric Anholt <eric@anholt.net>
i965 requires gl_ClipDistance to be formatted as an array of 2 vec4's
(as opposed to an array of 8 floats), so enable the lowering pass that
performs this conversion.
Reviewed-by: Eric Anholt <eric@anholt.net>
In order to support 8 clip distances, we need to properly decode when
the user sets the GL_CLIP_DISTANCE6 and GL_CLIP_DISTANCE7 enable
flags.
For clarity, this patch changes the names GL_CLIP_PLANE[0-5] in the
switch statement to the equivalent names GL_CLIP_DISTANCE[0-5], since
the GL_CLIP_PLANE names are deprecated.
Reviewed-by: Eric Anholt <eric@anholt.net>
Tested-by: Brian Paul <brianp@vmware.com>
This patch assigns enumerated values for gl_ClipDistance in the
gl_vert_result and gl_frag_attrib enums, so that driver back-ends can
assign gl_ClipDistance to the appropriate hardware registers. It also
adjusts the functions _mesa_vert_result_to_frag_attrib() and
_mesa_frag_attrib_to_vert_result() (which translate between the two
enums) to correctly translate the new enumerated values.
Reviewed-by: Eric Anholt <eric@anholt.net>
Tested-by: Brian Paul <brianp@vmware.com>
GLSL 1.30 requires us to use gl_ClipDistance for clipping if the
vertex shader contains a static write to it, and otherwise use
user-defined clipping planes. Since the driver needs to behave
differently in these two cases, we need a flag to record whether the
shader has written to gl_ClipDistance.
The new flag is called UsesClipDistance. We initially store it in
gl_shader_program (since that is the data structure that is available
when we check to see whethe gl_ClipDistance was written to), and we
later copy it to a flag with the same name in gl_vertex_program, since
that is a more convenient place for the driver to access it (in i965,
at least).
Reviewed-by: Eric Anholt <eric@anholt.net>
Tested-by: Brian Paul <brianp@vmware.com>
In i965 GEN6+ (and I suspect most other hardware), gl_ClipDistance
needs to be laid out as a pair of vec4's (the first containing clip
distances 0-3, and the second containing clip distances 4-7).
However, it is declared in GLSL as an array of 8 floats.
This lowering pass acts at the GLSL level, modifying the declaration
of gl_ClipDistance so that it is an array of vec4's rather than an
array of floats, and renaming it to gl_ClipDistanceMESA. In addition,
it modifies all accesses to the array so that they access the
appropiate component of one of the vec4's.
Since some hardware may not internally represent gl_ClipDistance as a
pair of vec4's, this lowering pass is optional. To enable it, set the
LowerClipDistance flag in gl_shader_compiler_options to true.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This patch fixes a bug in ir_hirearchical_visitor: when traversing an
exec_list representing the formal or actual parameters of a function,
it modified base_ir to point to each parameter in turn, rather than
leaving it as a pointer to the enclosing statement. This was a
problem, since base_ir is used by visitor classes to locate the
statement containing the node being visited (usually so that
additional statements can be inserted before or after it). Without
this fix, visitors might attempt to insert statements into parameter
lists.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Only the XYZ components are checked to be negative by SVGA3DOP_TEXKILL.
GL_ARB_fp requires all four components be checked. Emit a second texkill
for W if needed.
We only need to do the divide by Q step for TXP instructions.
This fixes the incorrectly rendered soft shadow test in Lightsmark.
Along with the previous texture swizzle commit, this also fixes all
the piglit glsl-fs-shadow2d-XX.shader_test failures.
This exposes the GL_EXT_texture_swizzle extension and allows the various
depth texture modes to be implemented properly. This, plus a follow-on
texture/shadow change fixes quite a few piglit GLSL shadow sampler test
failures.
Emit the SVGA3D_RS_POINTSPRITEENABLE render state.
When sprite_coord_mode=PIPE_SPRITE_COORD_LOWER_LEFT emit extra frag
shader code to invert the Y coordinate of the incoming texcoord.
Accurately describe what operations are supported when a format caps
entry is not advertised by the host, and which formats are never
supported, instead of making ad-hoc and often incorrect assumptions.
It is sometimes useful to examine the first frame or and early frame of a
quickly executing and non-repeating application, this chain introduces a new
environment variable that is checked when creating contexts. If
GALLIUM_RBUG_START_BLOCKED is set, then each context that is created is started
in a blocked state. This allows time to connect rbug before anything is
rendered in the context.
There is already comments show how to detect a null texture. Fix the
code to match the comments.
This would fix the oglc divzero(basic.texQOrWEqualsZero) and
divzero(basic.texTrivialPrim) test case fail.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Fix the constant interpolation enable bit mask for flat light mode.
FRAG_BIT_COL0 attribute bit might be 0, in which case we need to
shift one more bit right.
This would fix the oglc specularColor test fail on both Sandybridge and
Ivybridge.
v2: move the constant interp bitmask setup code into for(; attr <
FRAG_ATTRIB_MAX; attr++) loop suggested by Eric.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Signed-off-by: Xiang, Haihao <haihao.xiang@intel.com>
Since the blit gets sequenced after other batchbuffer rendering like
normal, there's no need to push things out early.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
All that matters here is the format of the texture, not the
internalformat (which might mean various different pixel formats). In
one case, the pbo upload for MESA_FORMAT_YCBCR would have swapped the
channels for MESA_FORMAT_YCBCR_REV.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This also improves the debugging output in the failure paths so you
get more than just "failed", and don't get spammed with "failed" when
you didn't even have a PBO to try.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
There were notes about the possibility of slowdowns due to zcopy from
a PBO due to thrashing around of the region. Slowdowns are even more
likely now that textures are generally tiled, which a zcopy wouldn't
get. Additionally, there were no checks on the buffer size to ensure
that the hardware-required rounding was present, which could result in
GPU hangs on large zcopy PBOs.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This doesn't cover support for this format as a renderbuffer yet. The
spec allows implementations to not support it, though it is something
we do want to support.
Only one failure in piglit on gen6, which is texwrap with bordercolor
(as usual).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
AFAIK, there are few users of this extension and I can see a couple
reasons why this is probably broken in Mesa anyway.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The pipe_sampler_view::format field should be prefered over the resource/
texture format. The former is used to override the texture format for
sRGB decode enable/disable, etc.
Also, use new util_format_is_srgb() helper to catch all sRGB formats.
This fixes the piglit tex-srgb test for GL_EXT_texture_sRGB_decode.
This fixes a regression from a8cf4b6acf
The problem occured when two successive glDrawArrays calls accessed
subsequent elements in user-space arrays. The user-space array
from the first call wasn't being grown to accomodate the second
draw call's elements.
Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com>
While the program won't successfully link in the end, this avoids
possible assertion failure in the driver during linking if
this->result isn't initialized with something already.
Fixes piglit:
vertex-program-two-side enabled front back front2 back2
vertex-program-two-side enabled front back
vertex-program-two-side enabled front2 back2
We now raise an GL_INVALID_ENUM in glBegin() if mode is illegal, as was
done in Yuanhan Liu's original patch.
Take geometry shaders support into account too.
Reviewed-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Checking if the paints are opaque in renderer_validate_blend() does not
work. We could be drawing images. Remove the check from
renderer_validate_blend() and take image drawing into consideration in
blend_use_shader().
The bug was introduced by 3f0a966807,
which affects the lookup demo.
vg_context_is_object_valid() checks if a handle is valid by checking if
the handle is a valid key of the object hash table. However, the keys
of the object hash table were object pointers.
Fix vg_context_add_object() to use the handles as the keys so that
vg_context_is_object_valid() works. This bug was introduced by
99c67f27d3.
Use _mesa_set_enable() to avoid a redudant context lookup.
Need to disable the texture target in decompress_texture_image() so the
unit isn't still enabled after glGetTexImage() returns. Arguably, the
meta restore code should do this, but it doesn't.
Reviewed-by: Eric Anholt <eric@anholt.net>
If we're generating a mipmap for an sRGB texture we need to bypass
sRGB->linear conversion. Otherwise the destination mipmap level
(drawn with a textured quad) will have the wrong colors.
If we can't turn of sRGB->linear conversion (GL_EXT_texture_sRGB_decode)
we need to use the software fallback for mipmap generation.
Note: This is a candidate for the 7.11 branch.
The 1-bit alpha channel was incorrectly encoded. Previously, any non-zero
alpha value for the ubyte alpha value would set A=1. Instead, use the
most significant bit of the ubyte alpha to determine the A bit. This is
consistent with the other channels and other OpenGL implementations.
Note: This is a candidate for the 7.11 branch.
Reviewed-by: Michel Dänzer <michel@daenzer.net>
builtin_stubs.cpp is only supposed to be used for builtin_compiler. It
contains a stub version of _mesa_glsl_initialize_functions() that does
nothing.
libglsl.a already contains builtin_function.cpp, the generated file that
contains a version of _mesa_glsl_initialize_functions() that actually
initializes all the built-in functions.
By mistakenly linking to builtin_stubs, glsl_compiler and glsl_test are
unable to compile any shaders that use built-in functions.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Since Mesa is now capable of supporting up to 8 clipping planes
instead of 6, this patch updates Gallium internals to support 8
clipping planes as well.
Reviewed-by: Brian Paul <brianp@vmware.com>
draw_pipe_clip.c contained an ifdef to ensure that its local
definition of MAX_CLIPPED_VERTICES would not take effect if the global
MAX_CLIPPED_VERTICES (defined in src/mesa/main/config.h) was already
defined. This was unnecessary because draw_pipe_clip.c doesn't
directly or indirectly include src/mesa/main/config.h. Removed the
ifdef to reduce confusion.
Reviewed-by: Brian Paul <brianp@vmware.com>
This will allow drivers to increase ctx->Const.MaxClipPlanes to 8,
which is required for GLSL-1.30 compliance.
No driver behavior should be affected. However, many data structures
use MAX_CLIP_PLANES as an array size, so these arrays will get
slightly larger.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Previously this value was set to MAX_CLIP_PLANES, which is defined to
be 6. But MAX_CLIP_PLANES needs to be increased to 8 to support
GLSL-1.30-compliant drivers. This patch hard-codes the default value
of ctx->Const.MaxClipPlanes to 6, so that when MAX_CLIP_PLANES is
increased, it won't affect drivers that do not support 8 clip planes.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This patch removes the assertion "MAX_CLIP_PLANES == 6" from the i965
driver. This assertion is unnecessary; nothing in the driver requires
MAX_CLIP_PLANES to be 6.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
To support GLSL 1.30, we will need to increase MAX_CLIP_PLANES to 8.
To avoid breaking drivers that do not yet support 8 clip planes, this
patch modifies the Mesa core code that pertains to clipping to use
ctx->Const.MaxClipPlanes rather than MAX_CLIP_PLANES, since
ctx->Const.MaxClipPlanes will remain 6 for drivers that only support 6
clip planes.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
We generate silly code for array access, and it's easier to generally
support the cleanup than to specifically avoid the bad code in each
place we might generate it.
Removes 4.6% of instructions from 41.6% of shaders in shader-db,
particularly savage2/hon and unigine.
v2: Fixes by Ken: Make is_zero/one member functions, and fix a
progress flag.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
_NEW_WINDOW_POS wasn't a real Mesa state flag, but we were missing
_NEW_BUFFERS to update the stipple offset when FBO binding or window
size changed, and _NEW_POLYGON to update when stippling gets enabled.
Fixes oglconform's tristrip test.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Because we skip the pattern upload when stippling is disabled, we need
to check again when it might have been turned on.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Trigger GL_INVALID_ENUM error if the face paramter is not a valid value.
Trigger GL_INVALID_VALUE error if the GL_SHININESS value is out side
[0, ctx->Constant.MaxShiniess].
v2: fix the max shininess value.
v3: suggested by Brian, move the face check into glMaterialfv function
to reduce code duplicate. Also, refactor the error message.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Signed-off-by: Brian Paul <brianp@vmware.com>
The null platform has no window or pixmap surface (but pbuffer surface).
And the only valid display is EGL_DEFAULT_DISPLAY. It is useful for
offscreen rendering. It works everywhere becase no window system is
required.
All of the extensions actually supported by Mesa have been remapped by
remap.c for a long time. Emitting all of these data structures is
just clutter.
Drivers that need additional functions remapped, should add
'offset="assign"' to the function definition in the .xml file.
The changes to remap_helper.h are in a follow-on ~8700 line patch that
would surely be rejected by the mailing list.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chia-I Wu <olv@lunarg.com>
Since GL_EXT_blend_logic_op is removed, _mesa_rgba_logicop_enabled(ctx)
just returns ctx->Color.ColorLogicOpEnabled. That seems kind of silly.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Since GL_EXT_blend_logic_op is removed, _LogicOpEnabled and
ColorLogicOpEnabled always have the same value.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Support is removed for four reasons:
1. The implementation was broken with respect to separate blend
equations. The GL_EXT_blend_equation_separate spec says:
"If EXT_blend_logic_op and EXT_blend_equation_separate are both
supported, the logic op blend equation should be supported separately
for RGB and alpha as with the other blend equation modes."
But Mesa's implementation of GL_LOGIC_OP specifically forbids this.
2. No hardware supported by Mesa can support separate blend equations
involving GL_LOGIC_OP.
3. No applications could be found that use this extension.
4. No other Linux OpenGL drivers support this extension.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Cc: Brian Paul <brianp@vmware.com>
The last user of this function was driInitExtensions, and that function
was removed in a previous commit.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
From the NV_conditional_render spec:
BeginQuery sets the active query object name for the query type given by
<target> to <id>. If BeginQuery is called with an <id> of zero, if the
active query object name for <target> is non-zero, if <id> is the active
query object name for any query type, or if <id> is the active query
object for condtional rendering (Section 2.X), the error INVALID OPERATION
is generated.
Fixes piglit nv_conditional_render-begin-while-active.
Reviewed-by: Brian Paul <brianp@vmware.com>
From the NV_conditional_render spec:
BeginQuery sets the active query object name for the query type given by
<target> to <id>. If BeginQuery is called with an <id> of zero, if the
active query object name for <target> is non-zero, if <id> is the active
query object name for any query type, or if <id> is the active query
object for condtional rendering (Section 2.X), the error INVALID OPERATION
is generated.
Fixes piglit nv_conditional_render-begin-zero.
Reviewed-by: Brian Paul <brianp@vmware.com>
st_glsl_to_tgsi.cpp was completely ignored by makedepend because it was
not included in ALL_SOURCES, which caused that the file was not recompiled
when certain header files were changed (like glsl/ir.h).
The first part of this commit is just a consolidation.
The second part is the fix.
When copy propagating a value into an instruction that negates its
argument, we need to invert the sense of the value's "negate" flag, so
that -(+x) becomes -x and -(-x) becomes +x.
Previously, we were always setting the value's "negate" flag to true
in this circumstance, so that both -(+x) and -(-x) turned into -x.
Fixes Piglit test vs-double-negative.shader_test.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This code was really broken before. A lot of the error checks were
done much later (too late), and some of the error checks would fail.
The underlying problem is that Mesa doesn't ever keep compressed paletted
textures in their original format. The textures are immediately
converted to some RGB or RGBA format.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=39991
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Tested-by: Jin Yang <jin.a.yang@intel.com>
Accroding the man page, GL_INVALID_VALUE would generated if access has any
bits set other than those valid defined bits.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Signed-off-by: Brian Paul <brianp@vmware.com>
According the man page, GL_INVALID_OPERATION should generated if
glPixelZoom is executed between the execution of glBegin and the
corresponding execution of glEnd.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Signed-off-by: Brian Paul <brianp@vmware.com>
According the man page, GL_INVALID_OPERATION should be generated if
glIsEnabled is executed betwwen the execution of glBegin and the
correspoding execution of glEnd.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Signed-off-by: Brian Paul <brianp@vmware.com>
Fix error handling while calling glTexEnv with invalid texture
environment parameters.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Signed-off-by: Brian Paul <brianp@vmware.com>
According to the man page, it should trigger a GL_INVALID_OPERATION
while calling some glGet* functions inside glBegin and glEnd.
This patch dose handle the following functions:
glGetBooleanv
glGetFloatv
glGetIntegerv
glGetInteger64v
glGetDoublev
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Signed-off-by: Brian Paul <brianp@vmware.com>
According man page, trigger error when calling glEvalMesh1/2D inside
glBegin/glEnd.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Signed-off-by: Brian Paul <brianp@vmware.com>
We've had a hack to fix this in Gentoo on Solaris for a while.
Signed-off-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Brian Paul <brianp@vmware.com>
This moves the gallium interface for clears from using a pointer to 4 floats to a pointer to a union of float/unsigned/int values.
Notes:
1. the value is opaque.
2. only when the value is used should it be interpretered according to
the surface format it is going to be used with.
3. float clears on integer buffers and vice-versa are undefined.
v2: fixed up vega and graw, dropped hunks that shouldn't have been in
patch.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Do it during swrast state validation since the FetchTexel() functions
are only called from swrast now and not core Mesa.
Remove assertions in mipmap.c since they're no longer appropriate.
Pass an explicit surface format as we do with pipe_put_tile_rgba_format().
This fixes the piglit fbo-srgb-blit test. With GL_EXT_framebuffer_sRGB we
override the resource's format with an explicit format (linear vs. sRGB).
We need to do so both when getting and putting tiles.
Fixes https://bugs.freedesktop.org/show_bug.cgi?id=40402
Reviewed-by: Dave Airlie <airlied@redhat.com>
We could constant interpolated values now and set have_perspective
if nothing else is set to avoid a GPU hang.
Signed-off-by: Dave Airlie <airlied@redhat.com>
TGSI CONSTANT interpolation is just flat, and we just read the values
direct from the LDS into the GPR without doing any interpolation on them.
This is needed to pass integer types into the fragment shader.
Signed-off-by: Dave Airlie <airlied@redhat.com>
If we get a scaled type assume its a real integer type (as textures are).
Also fixup the blend bypass and blend clamp flags on evergreen as per the
docs.
Signed-off-by: Dave Airlie <airlied@redhat.com>
LLVM 3.0svn added SubtargetInfo as additional parameter to
createMCDisassembler() and createMCInstPrinter().
See revision 139237 of LLVM.
Signed-off-by: Tobias Droste <tdroste@gmx.de>
Signed-off-by: Brian Paul <brianp@vmware.com>
If we're drawing to a luminance, luminance/alpha or intensity surface
we have to adjust (rebase) the fragment/quad colors before writing them
to the tile cache. The tile cache always stores RGBA colors but if
we're caching a L/A surface (for example) we need to be sure that R=G=B
so that subsequent reads from the surface cache appear to return L/A
We previously had a special case for RGB (no alpha) surfaces. This
change generalizes that for the other base formats.
Fixes https://bugs.freedesktop.org/show_bug.cgi?id=40408, but sRGB
formats are still failing. That'll be addressed in a later patch.
When compiling glDrawPixels, glTexImage(), etc. and we're copying
the user's image we need to be careful about GL error checking.
Previously, we were incorrectly generating GL_OUT_OF_MEMORY in
unpack_image() if width <= 0 or height <= 0 or for invalid format/type
values. We now check those arguments in unpack_image() and return NULL
if there's a bad value. The command will get compiled with the
arguments as-is and image=NULL. Later, when the command is executed the
correct errors will be generated.
This issue was reported by Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
I'm not 100% sure about this, it may need a version check or it might
be completely wrong.
added multisample ones as well.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The array_lvalue field was attempting to enforce the restriction that
whole arrays can't be used on the left-hand side of an assignment in
GLSL 1.10 or GLSL ES, and can't be used as out or inout parameters in
GLSL 1.10.
However, it was buggy (it didn't work properly for built-in arrays),
and it was clumsy (it unnecessarily kept track on a
variable-by-variable basis, and it didn't cover the GLSL ES case).
This patch removes the array_lvalue field completely in favor of
explicit checks in ast_parameter_declarator::hir() (this check is
added) and in do_assignment (this check was already present).
This causes a benign behavioral change: when the user attempts to pass
an array as an out or inout parameter of a function in GLSL 1.10, the
error is now flagged at the time the function definition is
encountered, rather than at the time of invocation. Previously we
allowed such functions to be defined, and only flagged the error if
they were invoked.
Fixes Piglit tests
spec/glsl-1.10/compiler/qualifiers/fn-{out,inout}-array-prohibited*
and
spec/glsl-1.20/compiler/assignment-operators/assign-builtin-array-allowed.vert.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Prevents lockups with piglit tests draw-elements and draw-vertices using large
numbers of vertices.
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Alex Deucher <alex.deucher@amd.com>
If we are called via the legacy DRI interface, and we don't support
legacy DRI (InitScreen is NULL), print a debug message, so it is easy
to see why the driver fails to initialize.
See https://bugs.freedesktop.org/show_bug.cgi?id=40437
Also includes loading of shared shader library code (used for f64
and integer division) and setting up the immediate array buffer
which is appended to the code.
Per the GL spec, clamp incoming colors prior to blending depending on
whether the destination buffer stores normalized (non-float) values.
Note that the constant blend color needs to be clamped too (we always
get the unclamped color from Mesa).
Fixes https://bugs.freedesktop.org/show_bug.cgi?id=40412
This introduces an UNCLAMPED_FLOAT_TO_UBYTE x 4 inline function, as
suggested by Brian. It uses it in a few places I noticed from previous
color changes, and also some core mesa places. I haven't updated other places
yet.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This introduces a new gl_color_union union and moves the current
ClearColorUnclamped to use it, it removes current ClearColor completely and
renames CCU to CC, then all drivers are modified to expected unclamped floats instead.
also fixes st to use translated color in one place it wasn't.
Signed-off-by: Dave Airlie <airlied@redhat.com>
The EXT_texture_integer issues says:
Should pixel transfer operations be defined for the integer pixel
path?
RESOLVED: No. Fragment shaders can achieve similar results
with more flexibility. There is no need to aggrandize this
legacy mechanism.
v2: fix comments, fix unpack paths, use same comment/code
v3: fix last comment
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Since TGSI now has a UARL opcode that takes an integer as the source, it is
no longer necessary to hack around the lack of an integer ARL opcode using I2F.
UARL is only emitted when native integers are enabled; ARL is still used
otherwise.
Reviewed-by: Brian Paul <brianp@vmware.com>
==27540== Invalid read of size 4
==27540== at 0x96277B7: _mesa_make_extension_string (string3.h:144)
==27540== by 0x9604E78: _mesa_make_current (context.c:1514)
==27540== by 0x9602A8B: st_api_make_current (st_manager.c:789)
==27540== by 0x45406E7: ???
==27540== Address 0xad35b30 is 3,688 bytes inside a block of size 3,691 alloc'd
==27540== at 0x4025315: calloc (vg_replace_malloc.c:467)
==27540== by 0x9627641: _mesa_make_extension_string (extensions.c:910)
==27540== by 0x9604E78: _mesa_make_current (context.c:1514)
==27540== by 0x9602A8B: st_api_make_current (st_manager.c:789)
==27540== by 0x45406E7: ???
And:
==28351== Invalid write of size 2
==28351== at 0x4C087CC: _mesa_make_extension_string (string3.h:144)
==28351== by 0x4BE6198: _mesa_make_current (context.c:1514)
==28351== by 0x4BD4CAB: st_api_make_current (st_manager.c:789)
==28351== Address 0x48dd1f3 is 19 bytes inside a block of size 20 alloc'd
==28351== at 0x4025315: calloc (vg_replace_malloc.c:467)
==28351== by 0x4C08711: _mesa_make_extension_string (extensions.c:778)
==28351== by 0x4BE6198: _mesa_make_current (context.c:1514)
==28351== by 0x4BD4CAB: st_api_make_current (st_manager.c:789)
==28351==
==28351== Invalid read of size 4
==28351== at 0x4C087EC: _mesa_make_extension_string (extensions.c:806)
==28351== by 0x4BE6198: _mesa_make_current (context.c:1514)
==28351== by 0x4BD4CAB: st_api_make_current (st_manager.c:789)
==28351== Address 0x48dd1f4 is 0 bytes after a block of size 20 alloc'd
==28351== at 0x4025315: calloc (vg_replace_malloc.c:467)
==28351== by 0x4C08711: _mesa_make_extension_string (extensions.c:778)
==28351== by 0x4BE6198: _mesa_make_current (context.c:1514)
==28351== by 0x4BD4CAB: st_api_make_current (st_manager.c:789)
The first part adds 2, because ' ' and '\0' may be written at the end
of the buffer.
These two functions were nearly the same with lots of duplicated code.
Now pass in a boolean 'elts' flag and use a few conditionals to implement
the linear vs. indexed cases.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
==5715== Invalid read of size 4
==5715== at 0x4AA590B: _mesa_make_extension_string (extensions.c:908)
==5715== by 0x4A83198: _mesa_make_current (context.c:1514)
==5715== by 0x4A71CAB: st_api_make_current (st_manager.c:789)
==5715== Address 0x4795730 is 0 bytes inside a block of size 1 alloc'd
==5715== at 0x4025315: calloc (vg_replace_malloc.c:467)
==5715== by 0x4AA5B4C: _mesa_make_extension_string (extensions.c:772)
==5715== by 0x4A83198: _mesa_make_current (context.c:1514)
==5715== by 0x4A71CAB: st_api_make_current (st_manager.c:789)
This fixes piglit/fbo-generatemipmap-array.
It looks like SQ_TEX_SAMPLER_WORD0_0.TEX_ARRAY_OVERRIDE should be set
for array textures in order to disable filtering between slices,
which adds a dependency between sampler views and sampler states.
This patch reworks sampler state updates such that they are postponed until
draw time. TEX_ARRAY_OVERRIDE is updated according to bound sampler views.
This also consolidates setting the texture state between vertex and
pixel shaders.
The only purpose this call served in the DRI swrast driver was to
initialize the remap table. Core Mesa already does the dispatch
offset remapping for every function that could possibly ever be
supported. There's no need to continue using that cruft in the
driver.
Core Mesa already does the dispatch offset remapping for every
function that could possibly ever be supported. There's no need to
continue using that cruft in the driver.
Since the call to _mesa_enable_imaging_extensions (via
driInitExtensions) is removed, EXT_blend_color, EXT_blend_logic_op,
and EXT_blend_minmax are no longer advertised. These all resulted in
software fallbacks, so their loss will not be mourned.
EXT_blend_subtract is, however, explicitly added to the list.
GL_FUNC_SUBTRACT is fully accelerated, but GL_FUNC_REVERSE_SUBTRACT
(still) results in a software fallback.
Cc: Alex Deucher <alexdeucher@gmail.com>
Cc: Dave Airlie <airlied@redhat.com>
Core Mesa already does the dispatch offset remapping for every
function that could possibly ever be supported. There's no need to
continue using that cruft in the driver.
Since the call to _mesa_enable_imaging_extensions (via
driInitExtensions) is removed, EXT_blend_color is explicitly added to
the list.
EXT_blend_logic_op is removed from the list of extensions because
blend factors and separate blend equations are not handled correctly.
Cc: Alex Deucher <alexdeucher@gmail.com>
Cc: Dave Airlie <airlied@redhat.com>
Core Mesa already does the dispatch offset remapping for every
function that could possibly ever be supported. There's no need to
continue using that cruft in the driver.
Since the call to _mesa_enable_imaging_extensions (via
driInitExtensions) is removed, EXT_blend_color is explicitly added to
the list.
EXT_blend_logic_op is removed from the list of extensions because
blend factors and separate blend equations are not handled correctly.
Based on feedback from Roland Scheidegger.
Cc: Dave Airlie <airlied@redhat.com>
Cc: Alex Deucher <alexdeucher@gmail.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Corbin Simpson <MostAwesomeDude@gmail.com>
Core Mesa already does the dispatch offset remapping for every
function that could possibly ever be supported. There's no need to
continue using that cruft in the driver.
Since the call to _mesa_enable_imaging_extensions (via
driInitExtensions) is removed, EXT_blend_color is explicitly added
with a dependency on the drmSupportsBlendColor flag.
EXT_blend_logic_op is removed from the list of extensions because
blend factors and separate blend equations are not handled correctly.
Based on feedback from Roland Scheidegger.
Cc: Alex Deucher <alexdeucher@gmail.com>
Cc: Dave Airlie <airlied@redhat.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Core Mesa already does the dispatch offset remapping for every
function that could possibly ever be supported. There's no need to
continue using that cruft in the driver.
Since the call to _mesa_enable_imaging_extensions (via
driInitExtensions) is removed, EXT_blend_color, EXT_blend_minmax, and
EXT_blend_subtract are explicitly added to the list.
EXT_blend_logic_op is removed from the list of extensions because
blend factors and separate blend equations are not handled correctly.
Cc: Ben Skeggs <bskeggs@redhat.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Cc: Viktor Novotný <noviktor@seznam.cz>
Core Mesa already does the dispatch offset remapping for every
function that could possibly ever be supported. There's no need to
continue using that cruft in the driver.
EXT_blend_logic_op is removed from the list of extensions because
blend factors and separate blend equations are not handled correctly.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
It's not clear if these are acceptable cases so issue a one-time warning
in debug builds when we hit them.
Fixes segfault in piglit fbo-mipmap-copypix test.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
The test was of an enum, attIndex, which should be unsigned. The
explicit check for < 0 was replaced with a cast to unsigned in an
assertion that attIndex is less than the size of the array it will be
used to index.
Reviewed-by: Eric Anholt <eric@anholt.net>
Trivially silence the compiler by adding '(void) foo;' for each unused
parameter. These parameters could not be removed. They are part of
interface used elsewhere in Mesa, and some of the other customers
actually use these parameters.
The internalFormat, format, and type parameters were not used by
either try_pbo_upload or try_pbo_zcopy, so remove them. The width
parameter was also not used by try_pbo_zcopy (because it doesn't
actually copy anything), so remove it too.
Eric Anholt notes:
The current structure of this code is so hateful I can't bring
myself to say anything about whether changing the current code is
good or bad.
I have a dream that one call would try to make a surface
(miptree/region) out of the PBO, then we'd see about whether it
matches up nicely and zero-copy/blit using that. That would be
reusable for texsubimage, which is currently awful in this
respect.
At some point we should revisit this code with pitchforks and torches.
The depth0 parameter was not used in intel_miptree_create_for_region,
so remove it. All of the places that call this function, pass 1 for
that parameter, and the place where it looks like it should have been
used (the call to intel_miptree_create_internal) also had 1 hard
coded.
Reviewed-by: Eric Anholt <eric@anholt.net>
The GLenum target parameter was not used in intel_copy_texsubimage, so
remove it. Also remove the GLenum internalFormat parameter. Each
caller just copied this out of the intel_texture_image that is already
passed to intel_copy_texsubimage.
Reviewed-by: Eric Anholt <eric@anholt.net>
The intel_context and tiling parameters were not used by any if the
i9[14]5_miptree_layout or the functions they call, and the tiling parameter was
not used by brw_miptree_layout. Remove the unnecessary parameters.
Also clean-up some of the naming, etc. in
intel_buffer_object_purgeable. 'intel' is usually used as the name of
an intel_context pointer, and intel_obj is usually used as the name of
an intel_*_obj pointer. These changes were suggested by Eric Anholt.
Reviewed-by: Eric Anholt <eric@anholt.net>
v2: Remove the assertion in intel_batchbuffer_space:
assert((intel->batch.state_batch_offset - intel->batch.reserved_space)
>= intel->batch.used*4);
After reviewing all the places where this is called, I'm (fairly)
comfortable that this assertion was redundant. Having the assertion
adds ~20KiB to a driver build:
text data bss dec hex filename
903173 26392 1552 931117 e352d i965_dri.so
924093 26392 1552 952037 e86e5 i965_dri.so
Based on feedback from Eric Anholt.
Reviewed-by: Eric Anholt <eric@anholt.net>
This differs from the FS in that we track constants in each
destination channel, and we we have to look at all the swizzled source
channels. Also, the instruction stream walk is done in an O(n) manner
instead of O(n^2).
Across shader-db, this reduces 8.0% of the instructions from 60.0% of
the vertex shaders, leaving us now behind the old backend by 11.1%
overall.
Tracking virtual GRFs has tension between using a packed array per
virtual GRF (which is good for register allocation), and sparse arrays
where there's an element per actual register (so the first and second
column of a mat2 can be distinguished inside of an optimization pass).
The FS mostly avoided the need for this second sparse array by doing
virtual GRF splitting, but that meant that instances where virtual GRF
splitting didn't work, instructions using those registers got much
less optimized.
Now instead of env INTEL_NEW_VS=1 to get it, you need INTEL_OLD_VS=1
to not get it. While it's not quite to the same codegen efficiency as
the old backend, it is not regressing piglit on G965 and G45, and
actually fixing bugs on gen6, and the remaining codegen quality
regressions all appear tractable.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We don't expect uniform accesses to generally go away from being dead
code at this point, and we will want to have uniforms packed before
spilling them out to pull constants when we are forced to do that.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
The offset to the arrays after the first was mis-scaled, so we'd go
access off the end of the surface and read 0s. Fixes
glsl-vs-uniform-array-3.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
While we had nice debug output for most of the instruction stream, it
was terminated by a series of anonymous MOVs and a send.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
It maps to MESA_FORMAT_RGBA8888_REV. Surfaces of the format can only be
sampled from but not render to.
Only i915 is tested.
Reviewed-by: Eric Anholt <eric@anholt.net>
[olv: add a check in intel_image_target_renderbuffer_storage]
Add a new format token, __DRI_IMAGE_FORMAT_ABGR8888, to __DRI_IMAGE. It
maps to MESA_FORMAT_RGBA8888_REV in core mesa or
PIPE_FORMAT_R8G8B8A8_UNORM in gallium. The format is used by
translucent surfaces on Android.
We were splitting on each side of an unlinked program, and the two
sides lost track of which variables they referenced, resulting in
assertion failure during validation. Fixes piglit
link-struct-uniform-usage.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Previously, it would produce:
Failed to compile FS: 0:6(7): error: non-lvalue in assignment
and now it produces:
Failed to compile FS: 0:5(7): error: whole array assignment is not
allowed in GLSL 1.10 or GLSL ES 1.00.
Also, add spec quotation to the two places we have code for array
lvalues in GLSL 1.10.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
We just want to mark the whole thing used, not mark from each element
the whole size in use. Fixes undefined URB entry writes on i965,
which blew up with debugging enabled.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Uses the new _mesa_decompress_image() function. Unlike the meta path
that uses textured quad rendering to do decompression, this works with
signed formats as well.
We'd still accept the GL_PALETTE[48]_* formats in glCompressedTexImage2D,
but they wouldn't be listed if you queried whether they were supported.
Signed-off-by: Adam Jackson <ajax@redhat.com>
From section 7.1 (Vertex Shader Special Variables) of the GLSL 1.30
spec:
"It is an error for a shader to statically write both
gl_ClipVertex and gl_ClipDistance."
Fixes piglit test mixing-clip-distance-and-clip-vertex-disallowed.c.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The check now applies both when explicitly declaring the size of
gl_TexCoord and when implicitly setting the size of gl_TexCoord by
accessing it using integral constant expressions.
This is prep work for adding similar size checks to gl_ClipDistance.
Fixes piglit tests texcoord/implicit-access-max.{frag,vert}.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
From the GLSL 1.30 spec, section 7.1 (Vertex Shader Special Variables):
The gl_ClipDistance array is predeclared as unsized and must be
sized by the shader either redeclaring it with a size or indexing it
only with integral constant expressions.
Fixes piglit tests clip-distance-implicit-length.vert,
clip-distance-implicit-nonconst-access.vert, and
{vs,fs}-clip-distance-explicitly-sized.shader_test.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
-g3 causes binaries to be 3x - 10x bigger, not only on MinGW w/ dwarf
debugging info, but linux as well.
Stick with -g, (which defaults to -g2), like autoconf does.
Return true for NATIVE_PARAM_PREMULTIPLIED_ALPHA when all formats with
alpha support premultiplied alpha.
(Based on Chia-I Wu's patch)
[olv: remove the use of param_premultiplied_alpha from the original
patch]
Handle "format" events and return configs for the supported formats.
(Based on Chia-I Wu's patch)
[olv: update and explain why PIPE_FORMAT_B8G8R8A8_UNORM should not be
enabled without HAS_ARGB32]
Return true for NATIVE_PARAM_PREMULTIPLIED_ALPHA when all formats with
alpha support premultiplied alpha. Currently, it means when argb32 and
argb32_pre are both supported.
When wl_drm is avaiable and enabled, handle "format" events and return
configs for the supported formats. Otherwise, assume all formats of
wl_shm are supported.
EGL does not export this capability of a display server. But wayland
makes use of EGL_VG_ALPHA_FORMAT to achieve it.
So, when the native display returns true for the parameter, st/egl will
set EGL_VG_ALPHA_FORMAT_PRE_BIT for all EGLConfig's with non-zero
EGL_ALPHA_SIZE. EGL_VG_ALPHA_FORMAT attribute of a surface will affect
how the surface is presented.
Because st/vega does not support EGL_VG_ALPHA_FORMAT_PRE_BIT,
EGL_OPENVG_BIT will be cleared.
Replace the parameters of native_surface::present by a struct,
native_present_control. Using a struct allows us to add more control
options without having to update each backend every time.
The opcodes and strings were reversed. Quotient means division, and
modulus means remainder.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Follow a subset of changes in 7b1d94e5d1.
There are known issues, but it works to a certain degree. Non-working
demos also fail gracefully. More importantly, it fixes the build.
The list of numbers in (constant type (<numbers>)) needs to contain
exactly type->components() numbers (16 for a mat4, 3 for a vec3, etc.)
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Each of these vecN constants only provided one component, which is
illegal. The printed IR is meant to contain exactly as many components
as are necessary; the IR reader does not splat single values.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
We were failing to relocate, so on the first draw run our scratch
would tend to get written to 0x0.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We were passing an MRF as the source argument, instead of using the
implied move and putting the MRF number in the proper place in the
instruction encoding.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
On the old backend, we used scalar mode because Mesa IR math is
result.xyzw = math(op0.xxxx), which matched up well. However, in GLSL
IR we do things like result.xy = math(op0.xy), so we want vector mode.
For the common case of result.x = math(op0.x), performance will be the
same (no cost for un-executed channels), though result.xyzw =
math(op0.xxxx) would be worse.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
When we tried to retype a brw_null_reg() in CMP(), the retyping didn't
take effect because HW_REG just ignores the type field.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
If you get your total GRF count wrong, you write over some other
shader's g0, and the GPU fails shortly thereafter.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We need this for the upcoming fix for sw texture_from_pixmap.
Signed-off-by: Stuart Abercrombie <sabercrombie@chromium.org>
Signed-off-by: Stéphane Marchesin <marcheu@chromium.org>
Current just the items that have been removed from Mesa are mentioned
in the release notes.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Mesa hasn't supported color-index rendering for a long time.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
GL_COLOR_INDEX produced the same result (because GL_BITMAP is always
used for stencil glDrawPixels), but it was confusing to read. I spent
about 15 minutes wondering, "WTF?"
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Mesa hasn't supported color-index rendering for a long time.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
_mesa_make_temp_float_image can't work on color-index textures, but
there is no such thing as a color-index texture anymore.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
These sampling functions don't work on color-index textures, but there
is no such thing as a color-index texture anymore.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
These enums were only valid with the paletted texture extensions.
This allows a couple other trivial clean-ups.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
There's nothing left that can call any of these functions. This also
removes the meta-ops code that implemented the first two.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This was also discussed at XDS 2010. However, actually making the
change was delayed because several drivers still exposed these
extensions to significant benefit (e.g., tdfx). Now that those
drivers have been removed, this code can be removed as well.
v2: A lot of bits that were missed in the previous patch have been removed.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Since we now lay out the VUE the same way regardless of whether
two-sided color is enabled, brw_compute_vue_map() no longer needs to
know whether two-sided color is enabled. This allows the two-sided
color flag to be removed from the clip, GS, and VS keys, so that fewer
GPU programs need to be recompiled when turning two-sided color on and
off.
Reviewed-by: Eric Anholt <eric@anholt.net>
When doing two-sided color on GEN6+, we use the SF unit's
INPUTATTR_FACING mode to cause front colors to be used on front-facing
triangles, and back colors to be used on back-facing triangles. This
mode requires that the front and back colors be adjacent in the VUE.
Previously, we would only place front and back colors adjacent in the
VUE when two-sided color was enabled. Now we place them adjacent in
the VUE whether two-sided color is enabled or not. (We still only
swizzle the colors when two-sided color is enabled, so there should be
no user-visible change).
This simplifies the implementation of the VUE map and reduces the
amount of code that is dependent on two-sided color mode.
Reviewed-by: Eric Anholt <eric@anholt.net>
The previous computation had two bugs: (a) it used a formula based on
Gen5 for Gen6 and Gen7 as well. (b) it failed to account for the fact
that PSIZ is stored in the VUE header. Fortunately, both bugs caused
it to compute a URB size that was too large, which was benign. This
patch computes the URB size directly from the VUE map, so it gets the
result correct in all circumstances.
Reviewed-by: Eric Anholt <eric@anholt.net>
The variables offset[], idx_to_attr[], nr_bytes, nr_attrs, and
header_regs were all serving purposes which are now served by the VUE
map.
Reviewed-by: Eric Anholt <eric@anholt.net>
Previously, brw_clip_interp_vertex() iterated only through the
"non-header" elements of the VUE when performing interpolation
(because header elements don't need interpolation). This code now
refers exclusively to the VUE map to figure out which elements need
interpolation, so that brw_clip_interp_vertex() doesn't need to know
the header size.
Reviewed-by: Eric Anholt <eric@anholt.net>
This patch replaces some ad-hoc computations using ATTR_SIZE and the
offset[] array to use the VUE map functions
brw_vert_result_to_offset() and brw_vue_slot_to_offset().
Reviewed-by: Eric Anholt <eric@anholt.net>
Previously we would examine the offset[] array (since an offset of 0
meant "not in use"). This paves the way for removing the offset[]
array.
Reviewed-by: Eric Anholt <eric@anholt.net>
The offsets within the VUE of HPOS and NDC are needed only in a few
auxiliary clipping functions. This patch moves computation of those
offsets into the functions that need them, and does the computation
using the VUE map.
Reviewed-by: Eric Anholt <eric@anholt.net>
This patch changes get_attr_override() (which computes the
relationship between vertex shader outputs and fragment shader inputs)
to use the VUE map.
Reviewed-by: Eric Anholt <eric@anholt.net>
This patch removes the variables nr_attrs and nr_setup_attrs, whose
purpose is now being served by the VUE map. nr_attr_regs and
nr_setup_regs are still needed, however they are now computed using
the VUE map rather than by counting the number of vertex shader
outputs (which caused subtle bugs when gl_PointSize was written).
Reviewed-by: Eric Anholt <eric@anholt.net>
Previously, the SF used nr_setup_attrs to determine whether it was
looking at the last element of the VUE. Changed this code to use the
VUE map.
Reviewed-by: Eric Anholt <eric@anholt.net>
These data structures were serving the same purpose as the VUE map,
but were buggy. Now that the code has been transitioned to use the
VUE map, they are not needed.
Reviewed-by: Eric Anholt <eric@anholt.net>
Previously, SF code used the idx_to_attr[] array to compute the
location of entries in the VUE map. This array didn't properly
account for gl_PointSize. Now we use the VUE map directly.
Reviewed-by: Eric Anholt <eric@anholt.net>
Previously, some of the code in SF erroneously used bitfields based on
the gl_frag_attrib enum when actually referring to vertex results.
This worked, because coincidentally the particular enum values being
used happened to match between gl_frag_attrib and gl_vert_result. But
it was fragile, because a future change to either gl_vert_result or
gl_frag_attrib would have made the enum values stop matching up. This
patch switches the SF code to use the correct enum.
Reviewed-by: Eric Anholt <eric@anholt.net>
The new function, called get_vert_result(), uses the VUE map to find
the register containing a given vertex attribute. Previously, we used
the attr_to_idx[] array, which served the same purpose but didn't
account for gl_PointSize correctly.
This fixes a bug on pre-Gen6 wherein the back side of a triangle would
be rendered incorrectyl if the vertex shader wrote to gl_PointSize.
Reviewed-by: Eric Anholt <eric@anholt.net>
This patch moves the computation of the SF URB entry read offset from
upload_sf_unit() to its own function, so that it can be re-used when
creating the gen4-5 SF program.
Reviewed-by: Eric Anholt <eric@anholt.net>
Previously, the new VS backend computed the size of the URB entry by
counting the number of MRFs used in emitting the URB entry. Now it
just gets it straight from the VUE map.
Reviewed-by: Eric Anholt <eric@anholt.net>
max_usable_mrf has been carefully set such that (max_usable_mrf -
base_mrf) is a multiple of 2, so that an even number of VUE slots are
emitted with each URB write (which Gen6 requires). This patch adds an
assertion to confirm that this is the case, and moves the comment to
this effect to be near the assertion.
Reviewed-by: Eric Anholt <eric@anholt.net>
Previously, the new VS backend used two functions,
emit_vue_header_gen6() and emit_vue_header_gen4() to emit the fixed
parts of the VUE, and then a pair of carefully-constructed loops to
emit the rest of the VUE, leaving out the parts that were already
emitted as part of the header.
This patch changes the new VS backend to use the VUE map to emit the
entire VUE.
Reviewed-by: Eric Anholt <eric@anholt.net>
Previously, emit_vue_header_gen4() used local variables to keep track
of which registers were storing the NDC and HPOS. This patch uses the
output_reg[] array instead, so that the code that manipulates NDC and
HPOS can be more easily refactored.
Reviewed-by: Eric Anholt <eric@anholt.net>
Previously, the old VS backend computed the URB entry size by adding
the number of vertex shader outputs to the size of the URB header.
This often produced a larger result than necessary, because some
vertex shader outputs are stored in the header, so they were being
double counted. This patch changes the old VS backend to compute the
URB entry size directly from the number of slots in the VUE map.
Note: there's a subtle change in that we no longer count header
registers towards the size of the VF input. I believe this is
correct, because the header is only emitted in the output of the VS
stage--it is not present in the input. (As evidence for this, note
that brw_vs_state.c sets urb_entry_read_offset to 0--it does not
include space for the header as part of the VS input).
Reviewed-by: Eric Anholt <eric@anholt.net>
Some parts of the i965 driver keep track of locations within the VUE
(vertex URB entry) using byte offsets. This patch adds inline
functions to compute these byte offsets using the VUE map.
Reviewed-by: Eric Anholt <eric@anholt.net>
Several places in the i965 code make implicit assumptions about the
structure of data in the VUE (vertex URB entry). This patch adds a
function, brw_compute_vue_map(), which computes the structure of the
VUE explicitly. Future patches will modify the rest of the driver to
use the explicitly computed map rather than rely on implicit
assumptions about it.
Reviewed-by: Eric Anholt <eric@anholt.net>
Previously, this conversion was duplicated in several places in the
i965 driver. This patch moves it to a common location in mtypes.h,
near the declaration of gl_vert_result and gl_frag_attrib.
I've also added comments to remind us that we may need to revisit the
conversion code when adding elements to gl_vert_result and
gl_frag_attrib.
Reviewed-by: Eric Anholt <eric@anholt.net>
This just adds the opcodes for evergreen, need to work on r600 and cayman
implementations.
don't advertise nativeintegers yet until we work out all the regressions.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This just adds all the API check for vertex arrays using 2101010 types.
2101010 is also useable with GL_BGRA.
v2: fix whitespace.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
This adds the vertex processing paths for the 2101010 types. It converts
the attributes to floats for all the immediate entry points, some entrypoints
are normalised and the attrib APIs take a normalized parameter.
There are four main paths,
ui10 -> float unnormalized
i10 -> float unnormalized
ui10 -> float normalized
i10 -> float normalized
along with the ui2/i2 equivs.
Signed-off-by: Dave Airlie <airlied@redhat.com>
add new APIs to the internal mesa driver interface + set funcs in vtxfmt.c
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
These are the new API entrypoints for ARB_vertex_type_2_10_10_10_rev
extension, along with the new INT_2_10_10_10_REV enum.
v2: fixup crazy whitespace cut-n-paste mess
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Drivers supporting native integers set UniformBooleanTrue to the integer value
that should be used for true when uploading uniform booleans. This is ~0 for
Gallium and 1 for i965.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This just reorgs one define in csv file, and adds all the new formats
that are needed for this extension.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This fixes all but one of the piglit regressions from enabling native integers
in softpipe. The change to fix the last regression is still being discussed.
The preferred solution to keeping track of the picture structure
has been putting it in the state tracker, so use picture_structure
instead of frame_started to check if a frame needs to begin.
If picture_structure has been changed, end the frame and start again.
Signed-off-by: Maarten Lankhorst <m.b.lankhorst@gmail.com>
Signed-off-by: Christian König <deathsimple@vodafone.de>
This could happen in 3 different cases, and ERRNO can explain what
happened. First case would be EIO (gpu hang), second EINVAL (something is
wrong inside the batch), and we also discovered that sometimes it happens
with ENOSPACE. All of those cases are different it it could be worth to at
least know what happened.
Signed-off-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
According to the comment, we need to load /some/ push constants on
pre-Gen6 hardware or the GPU will hang. The existing code set these
bogus parameters to NULL pointers; unfortunately, the code in
brw_curbe.c that loads them dereferences those pointers. So, change
them to be pointers to an actual floating point value of 0.0.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
As per Brian's suggestion, add caps for drivers that support texture
offsets to advertise a min/max via TGSI, also use it in the state tracker.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
This adds tokens for texture offsets, to store 4 * swizzled vec 3
for use in TXF and other opcodes.
It also contains TGSI exec changes for softpipe to use this code,
along with GLSL->TGSI support for TXF.
v2: add some more comments, add back padding I removed.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
This fixes the swrast failures for piglit's fbo-generatemipmap-formats
test (for uncompressed formats). At some point down the road this code
will go away so I haven't checked all the other store_texel() functions.
Simple demos such as test-opengl-gl_basic work. SurfaceFlinger does not
work yet due to missing GL_OES_draw_texture support (and maybe more).
Reviewed-by: Chad Versace <chad@chad-versace.us>
In preparation for porting i915 to Android, factor its source lists into
a shared makefile. This prevents duplication of source lists, and hence
prevents the Android build from breaking as often.
Reviewed-by: Chad Versace <chad@chad-versace.us>
This is a better, more fine-grained way of lowering if statements. Fixes the
game And Yet It Moves on nv50.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
We don't want to set the pixmap bit in the EGL config if the DRI
config we're adding is a double buffered config. However, don't clear
any other bits the platform might pass in in the surface_type
argument.
Using multiply and reciprocal for integer division involves potentially
lossy floating point conversions. This is okay for older GPUs that
represent integers as floating point, but undesirable for GPUs with
native integer division instructions.
TGSI, for example, has UDIV/IDIV instructions for integer division,
so it makes sense to handle this directly. Likewise for i965.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Bryan Cain <bryancain3@gmail.com>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
v2: Add generator instructions for the scratch opcodes.
Add emit_before() for handling ->ir and ->annotation inheritance.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This DP4 had one of its operands missing, so we were generating
garbage clip distances. Using the per-opcode instruction generators
made it obvious.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Set ctx->WindowRenderBuffer to EGL_BACK_BUFFER. As EGL_WINDOW_BIT of a
config is set only when there is dri_double_buffer, that makes sure
window surfaces are always double-buffered and contexts will render to
the back buffer.
Reviewed-by: Chad Versace <chad@chad-versace.us>
Advertising different format support based on sample count was a
bad idea, it made resolve to window work, but resolve to anything
else would fail.
See 9f4998639c.
By emitting code before generate_code(), we ended up in align1 mode
where writemasks don't exist, so we rescaled gl_Vertex.w and things
went badly. By moving GL_FIXED support to the visitor, we end up with
normal codegen, and as a bonus the GL_FIXED setup ends up getting
printed appropriately in debug output.
Fixes gtf/GL2Tests/fixed_data_type
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
At some point we need to also move uniform accesses out to pull
constants when there are just too many in use, but we lack tests for
that at the moment.
Fixes glsl-vs-large-uniform-array.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This avoids the massive conditional move array access, and brings code
generation quality for the new VS backend into the realm of efficiency
of the old backend (roughly 20% more instructions generated than
before across shader-db, instead of assertion failing for generating
over 10,000 instructions on many shaders!).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We sometimes want to put an instruction somewhere besides the end of
the instruction stream, and we also want per-opcode instruction
generation to enable compile-time checking of operands.
We'll be using that to track things for the new VS backend, and this will
avoid cluttering brw_vs_surface_state.c for it.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We were primarily failing to convert in the NativeIntegers case, which
this fixes. However, we were also just truncating float uniforms when
converting to integer, which does not appear to be the correct
behavior. Note, however, that the NVIDIA drivers also truncate
instead of rounding.
GL_DOUBLE return type is dropped because it was never used and
completely broken. It can be added when there's test code.
Fixes piglit ARB_shader_objects/getuniform
v2: This is a rewrite of my previous glGetUniform patch, which Ken
pointed out missed storage_type-based conversions to integer,
which was totally broken still thanks to a typo in the testcase.
v3: Quote the spec justifying the rounding behavior.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Ian Romanick <ian.d.romanick@intel.com>
At least for Intel, all our uniform components are of uint32_t size, either
float or signed or unsigned int. For uploading uniform data in the driver,
it's much easier to upload a full dword per uniform element instead of trying
to pick out the bool byte and then fill in the top 3 bytes of pad with 0.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Replace each occurence of
#include "../glsl/*.h"
with
#include "glsl/*.h"
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Chad Versace <chad@chad-versace.us>
libmesa_dri_common is a static library that contains the sources in
src/mesa/drivers/dri/common. Each DRI driver should link to it.
Reviewed-by: Chia-I Wu <olv@lunarg.com>
Signed-off-by: Chad Versace <chad@chad-versace.us>
In src/mesa/Android.mk, it is non-trivial to determine which variables are
imported by `include sources.mak`. So document them.
Reviewed-by: Chia-I Wu <olv@lunarg.com>
Signed-off-by: Chad Versace <chad@chad-versace.us>
libmesa_dricore.a is analogous to the libmesa.a built by the Autoconf
build.
Reviewed-by: Chia-I Wu <olv@lunarg.com>
Signed-off-by: Chad Versace <chad@chad-versace.us>
In order that the Autoconf and Android build can share the same source
lists, move the lists from
src/mesa/drivers/dri/Makefile.defines
into
src/mesa/drivers/dri/common/Makefile.sources
I would like for Android to just reuse Makefile.defines, but the file is
unsuitable for reuse.
Reviewed-by: Chia-I Wu <olv@lunarg.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off: Chad Versace <chad@chad-versace.us>
driverfuncs.o is already contained in libmesa.a, so remove it from the
following source lists:
src/mesa/drivers/dri/Makefiles.defines:COMMON_SOURCES.
src/mesa/drivers/dri/swrast/Makefile:SWRAST_COMMON_SOURCES
Reviewed-by: Ian Romanick <idr@freedesktop.org>
Signed-off-by: Chad Versace <chad@chad-versace.us>
Remove defintion of COMMON_SOURCES from {r300,r660}/Makefile. The
defintion is a duplicate of that found in
src/mesa/drivers/dri/Makefile.defines.
Reviewed-by: Ian Romanick <idr@freedesktop.org>
Signed-off-by: Chad Versace <chad@chad-versace.us>
The layersize calculation is slightly different on +evergreen.
This makes mpeg2 video decoding and piglits texture-packed-formats
test work correctly on this hardware.
I noticed that a thread was created for every time async flush was called, so I moved it and used some semaphores to synch.
Signed-off-by: Maarten Lankhorst <m.b.lankhorst@gmail.com>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Signed-off-by: Christian König <deathsimple@vodafone.de>
This prevents null dereferences in validation of interdependent
state after a switch to a pipe context where we mark all state
as dirty but where not all state is valid / set yet.
The window system buffer will be BGRA and applications will try to
directly resolve to it, which would trigger an INVALID_OPERATION in
BlitFramebuffer if the multisample renderbuffer is RGBA.
All commonly used windows toolchains define wgl entrypoints in the windows
headers, and mesa_wgl.h not only is unnecessary but actually often stands
in the waydue to slight inconsistencies.
So remove it.
This is a port of vec4_visitor::try_rewrite_rhs_to_dst to fs_visitor.
Not only is this technique less invasive and more robust, it also
generates better code. Over and above the previous technique, this
reduced instruction count in shader-db by 0.28% on average and 1.4% in
the best case.
In no case did this technique result in more code than the prior method.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Eric Anholt <eric@anholt.net>
This reverts commit 53c89c67f3, along with
the subsequent this->result = reg_undef additions it required.
Both Eric and I agree that the way he did this is really fragile; if you
forget to add this->result = reg_undef before calling accept(), it may
end up using the same register for two separate things, breaking things
in strange and mysterious ways.
The next commit will port over the new VS backend's method for solving
this problem, which is simpler, less intrusive, and still manages to
avoid MOVs in the common case.
Nothing in Mesa supports color-index textures, and most of the other
infrastructure that could allow such support has already been removed.
This puts the final nail in the coffin.
Also clean out some GL_COLOR_INDEX comments in formats.c.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This came from the "kill it with fire" discussion at XDS 2010.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This continues to allocate texImage->Data as before, so
drivers calling these functions need to use that when present.
Reviewed-by: Brian Paul <brianp@vmware.com>
ctx->Driver.MapTextureImage() / UnmapTextureImage() will be called by
the glTex[Sub]Image(), glGetTexImage() functions, etc. when we're
accessing texture data, and also for software rendering when accessing
texture data.
Reviewed-by: Brian Paul <brianp@vmware.com>
All driver implementations of FreeTextureImageBuffer already check
that Data != NULL and free it. However, this means that we will also
free driver storage if the driver storage wasn't in the form of a Data
pointer.
This was produced by the following semantic patch:
@@
expression C;
expression T;
@@
- if (T->Data) {
- C->Driver.FreeTextureImageBuffer(C, T);
+ C->Driver.FreeTextureImageBuffer(C, T);
- }
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This was produced by sed, except for one hunk in driverfuncs.c where
trailing whitespace was dropped.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Add platform_android.c that supports _EGL_PLAFORM_ANDROID. It works
with drm_gralloc, where back buffers of windows are backed by GEM
objects.
In Android a native window has a queue of back buffers allocated by the
server, through drm_gralloc. For each frame, EGL needs to
dequeue the next back buffer
render to the buffer
enqueue the buffer
After enqueuing, the buffer is no longer valid to EGL. A window has no
depth buffer or other aux buffers. They need to be allocated locally by
EGL.
Reviewed-by: Benjamin Franzke <benjaminfranzke@googlemail.com>
Reviewed-by: Chad Versace <chad@chad-versace.us>
[olv: with assorted minor changes, mostly suggested during the review]
Add rgba_masks to dri2_add_config. When it is non-NULL, the DRI config
is accepted only when the offsets and sizes of the its channels match
rgba_mask.
Reviewed-by: Chad Versace <chad@chad-versace.us>
Quickly tested with 945GME. SurfaceFlinger (the display server and
compositor) works. 2D apps with RGB or RGBA visuals work. As for 3D
apps, some work and some do not.
Quickly tested with VMWare Workstation 7.1.4 on Linux with GeForce
GT220. SurfaceFlinger (the display server and compositor) works. 2D
apps with RGB visual works. However, due to missing
PIPE_FORMAT_R8G8B8A8_UNORM support, those with RGBA visual do not.
Factor out C_SOURCES from Makefile to Makefile.sources, and let Makefile
and SConscript share it.
Note that
$(TOP)/src/glsl/ralloc.c and
$(TOP)/src/mesa/program/register_allocate.c
are removed from C_SOURCES in Makefile.sources and added back in
Makefile and SConscript. The idea is that they are not part of r300g.
But having them in libr300.a makes build non-GL targets such as the
compiler tests or g3dvl much easier. Also, for practical reason, TOP
would be an undefined variable in Makefile.sources.
drmVersion and driver specific ioctls are used to get the PCI ID from a
DRM fd. Eexpand the mechanism to nouveau and vmwgfx, except that for
nouveau, only the vendor ID is needed, and for vmwgfx, always assume
SVGA II.
When generating dispatch templates, emit the '(void) blah;' magic to
make GCC happy. This reduces a lot of warning spam if you build with
-Wunused-parameter or -Wextra.
Reviewed-by: Chia-I Wu <olv@lunarg.com>
In preparation for porting i965 to Android, factor its source lists into
a shared makefile. This prevents duplication of source lists, and hence
prevents the Android from breaking as often.
Acked-by: Chia-I Wu <olv@lunarg.com>
Signed-off-by: Chad Versace <chad@chad-versace.us>
If the state tracker tries to map the resource directly but we can't or don't
want to do that, fail to create a transfer.
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Compiling some (large) files with i686-pc-mingw32-gcc 4.2.2 (at least)
and the -gstabs option triggers a compiler error. Use this work-around
to simply compile the effected files without -gstabs.
Make setting the quant matrixes a generic interface.
Also removes setting the quant matrix from the XvMC interface
Signed-off-by: Christian König <deathsimple@vodafone.de>
Reviewed-by: Younes Manton <younes.m@gmail.com>
Make the picture_structure enum spec complient.
Also remove it from the compositor.
Signed-off-by: Christian König <deathsimple@vodafone.de>
Reviewed-by: Younes Manton <younes.m@gmail.com>
Revert back to a macroblock based interface. The structure used
tries to keep as close to the spec as possible.
Signed-off-by: Christian König <deathsimple@vodafone.de>
Reviewed-by: Younes Manton <younes.m@gmail.com>
Implement PIPE_CAP_NUM_BUFFERS_DESIRED giving the decoder control over
the number of buffers a state tracker should allocate.
Signed-off-by: Christian König <deathsimple@vodafone.de>
Reviewed-by: Younes Manton <younes.m@gmail.com>
First of all get ride of the decode_buffer structure, while still giving
the decoder the ability to organize it's buffers depending on the needs
of the state tracker.
Signed-off-by: Christian König <deathsimple@vodafone.de>
Reviewed-by: Younes Manton <younes.m@gmail.com>
Instructions with 3 source operands have no write mask, so we may replace their
destinations with PV/PS in the next group even if their dst.write is 0.
Note: This is a candidate for the 7.11 branch.
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Need to do full check when not all bank swizzles in the group are forced
(e.g. when trying to merge interp_* group with the next instruction)
Note: This is a candidate for the 7.11 branch.
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Nothing in Mesa generates these opcodes, and i965 hardware cannot
support it natively. If support were ever added for this opcode in
Mesa, there had better be a lowering pass for hardware that doesn't
support it natively.
while debugging texelFetchOffset we kept hitting the assert.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Otherwise we continue and hit the "Illegal formal parameter mode"
assertion.
Fixes negative compile test texelFetchOffset.frag in piglit.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This adds texelFetch support to translate from GLSL to TGSI TXF opcode.
I've tested this works with an r600g and softpipe backend.
v2: drop comments, fix title,
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Bryan Cain <bryancain3@gmail.com>
This just calls the texel fetch functions directly bypassing the sampling,
notes:
1: loops inside switch should be more optimal.
2: borders can be sampled though only up to border depth, outside that
its undefined.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This is a straight texel fetch with no filtering or clamping. It uses
integers to specify the i/j/k (from EXT_gpu_shader4).
To enable this I had to add another hook into the tgsi sampler so that
we could easily bypass all the filtering sample does.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
This adds the get_dims callback that is called from the tgsi exec_txq.
It returns values as per EXT_gpu_program4.
v2: fix one indent + use a switch (slighty modified from Brian)
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
this adds another callback in the sampler struct containing get_dims
entry point. This is used to query the driver for the texture resource
dimensions for the resource bound to the current sampler.
v2: remove unusued variable, fix indent
Signed-off-by: Dave Airlie <airlied@redhat.com>
It's the same as GL_AMD_conservative_depth. The specs have slight
differences in wording, but don't differ in content or behavior.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Tested with a Radeon HD 6250. SurfaceFlinger (the display server and
compositor) works. 2D apps with RGB or RGBA visuals work. As for 3D
apps, some work but some don't (with serious rendering defects).
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Put restrict in the function definitions to silence MSVC warnings
about incompatible assignments in "func = lp_tile_foobar;" when func
was declared with restrict keywords but the rhs function wasn't.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This patch documents some Mesa coding style conventions that came up
during the discussion of commit 67b5a32 (Perform implicit type
conversions on function call out parameters).
Before, if we ended up here without a BO for our image, but did choose
a miptree that had active rendering in the command buffer, our
teximage data would jump ahead of the rendering using the old texture
contents.
This showed up as breakage in gen-teximage and friends in the
following commit.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Several drivers have these fields in their subclasses of gl_texture_image.
They'll be useful for core Mesa too...
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The use of mmap() in winsys requires large file support. Not all OSes
have LFS so a wrapper should be used. In particular, os_mmap() should
call __mmap2() on Android.
Replace all calls to dd_function_table::MapBuffer with appropriate
calls to dd_function_table::MapBufferRange, then remove all the cruft.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
The code previously passed GL_DYNAMIC_DRAW for the access parameter.
By inspection, I believe that all drivers would treat this as
GL_READ_WRITE because it's not GL_READ_ONLY and it's not
GL_WRITE_ONLY.
It appears the i965 code wants GL_WRITE_ONLY (it's about to write a
bunch of data in, never read data), while the arrayelt code is
GL_READ_ONLY (just dereffed as arguments to CALL_Whatever*v).
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Keith Whitwell <keithw@vmware.com>
No driver used that parameter, and most drivers ended up with a bunch
of unused-parameter warnings because it was there.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
No driver used that parameter, and most drivers ended up with a bunch
of unused-parameter warnings because it was there.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
No driver used that parameter, and most drivers ended up with a bunch
of unused-parameter warnings because it was there.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
No driver used that parameter, and most drivers ended up with a bunch
of unused-parameter warnings because it was there.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
No driver used that parameter, and most drivers ended up with a bunch
of unused-parameter warnings because it was there.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
No driver used that parameter, and most drivers ended up with a bunch
of unused-parameter warnings because it was there.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Unfortunately, since a previous efficiency improvement, we no longer
have any open-source testcases producing register spilling, so this
code was untested in the fragment shader path. That should change
when we get proper temporary array support in the fragment shader.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=40194
Also, remove the BRW_SAMPLER_MESSAGE_SIMD8_RESINFO #define because
there totally isn't a SIMD8 variant.
Unfortunately, resinfo returns FLOAT32 on Broadwater/Crestline, unlike
G45 which returns a proper UINT32. This turns out to be simple,
however: when we emit MOVs to select the desired half of the SIMD16
result, we can simply override the register type to be float so it's
converted to an integer.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Not all texturing operations return floating point data. For example,
the resinfo message (textureSize or TXS) returns integer data. In the
future, we'll also add integer texture support.
ir_texture's type field contains this information; use its base type to
appropriately type the destination register. We want to keep it as a
four component vector, however, since SIMD8 samplers always have a
response length of 4.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Formats were based on a patch sent to xf86-video-nouveau by Bryan Cain
Signed-off-by: Maarten Lankhorst <m.b.lankhorst@gmail.com>
[Michel Dänzer: Add xorg_xvmc.c to SConscript.]
The driver may install its own vertex shader. _mesa_set_vp_override
must be called so that core mesa can generate correct fragment program..
Reviewed-by: Brian Paul <brianp@vmware.com>
Factor out source lists from Makefile to Makefile.sources, and let
Makefile, SConscript, and Android.mk share it.
Note that files in $(GENERATED_SOURCES) are removed from $(C_SOURCES).
Acked-by: José Fonseca <jfonseca@vmware.com>
Acked-by: Chad Versace <chad@chad-versace.us>
ParseSourceList() can be used to parse a source list file and returns
the source files defined in it. It is supposed to be used like this
# get the list of source files from C_SOURCES in Makefile.sources
sources = env.ParseSourceList('Makefile.sources', 'C_SOURCES')
The syntax of a source list file is compatible with GNU Make. This
effectively allows SConscript and Makefile to share the source lists.
Acked-by: José Fonseca <jfonseca@vmware.com>
Acked-by: Chad Versace <chad@chad-versace.us>
There is no ir_hierarchical_visitor::visit(ir_if *) method, since ir_if
is not a leaf node. Instead, there are visit_enter and visit_leave
methods. Use visit_enter arbitrarily (either would work fine, though
visit_enter will catch errors sooner).
Found thanks to a warning emitted by Clang.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
When intel_context requires separate stencil but the DRI2 separate stencil
handshake fails, then abort and emit an error instructing the user to
upgrade the DDX to 2.16.0.
CC: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad@chad-versace.us>
Implement the any() part of the operation the same way regular ir_unop_any
is implemented.
This is a port of commit e7bf096e8b to glsl_to_tgsi, with added integer
support.
Logical-or is implemented using addition (followed by clamping to [0,1]) on
values of 0.0 and 1.0. Replacing the logical-or operators with addition gives
a + b which has a result on the range [0, 2].
Previously a SNE instruction was used to clamp the resulting logic value to
[0,1]. In a fragment shader, using a saturate on the add has the same effect.
Adding the saturate to the add is free, so (at least) one instruction is
saved. In a vertex shader, using an SLT on the negation of the add result has
the same effect. Many older shader architectures do not support the SNE
instruction. It must be emulated using two SLT instructions and an ADD. On
these architectures, the single SLT saves two instructions.
Note that SNE is still used when integers are used for boolean values, since
there is no such thing as an integer saturate, and older shader architectures
without SNE don't support integers.
This is a port of commit 41f8ffe5e0 to glsl_to_tgsi with integer support
added.
Since this is the software path, set GRALLOC_USAGE_SW_WRITE_OFTEN when
PIPE_BIND_RENDER_TARGET, and set GRALLOC_USAGE_SW_READ_OFTEN when
PIPE_BIND_SAMPLER_VIEW.
libGLES_mesa with swrast should link in these libraries
libmesa_egl
libmesa_egl_gallium
libmesa_st_egl
libmesa_st_mesa
libmesa_glsl
libmesa_glsl_utils
libmesa_pipe_softpipe
libmesa_winsys_sw_android
libmesa_gallium
Reviewed-by: Chad Versace <chad@chad-versace.us>
This builds the static library libmesa_glsl and executable glsl_compiler
from glsl. glsl_compiler is only installed for engineering build.
Reviewed-by: Chad Versace <chad@chad-versace.us>
This is the first step to integrate Mesa into Android(-x86) build
system. You can git clone mesa under the external/ directory of Android
source tree and build Android with
$ make BOARD_GPU_DRIVERS=swrast
It will build libGLES_mesa that will be loaded by Android runtime.
libGLES_mesa is still a stub in this commit.
Both HW and SW rendering are supported for Android. For SW rendering,
we use the generic gralloc lock/unlock for mapping and unmapping color
buffers (in winsys/android).
For HW rendering, we need to know the real type of color buffers. This
backend works with drm_gralloc, where a color buffer is backed by a GEM
object.
On Android, color buffers are passed between server and clients as
opaque buffer_handle_t. This winsys makes use of gralloc, which
provides a generic way to map and unmap buffer_handle_t for CPU access.
Add EGL_ANDROID_image_native_buffer and EGL_ANDROID_swap_rectangle.
There is no spec for them though.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Chad Versace <chad@chad-versace.us>
Android uses Linux kernel and its own C runtime. It resembles
PIPE_OS_LINUX a lot with some minor exceptions.
Reviewed-by: Brian Paul <brianp@vmware.com>
Move vbo_exec_FlushVertices_internal out of FEATURE_beginend.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Chad Versace <chad@chad-versace.us>
Makes the new vertex shader backend work on Ivybridge.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
When ctx->Const.NativeIntegers is set, Core Mesa loads integer/boolean
uniforms directly, rather than loading the floating point equivalent.
So, when that's set, we don't need to perform any conversions.
Unfortunately, we can't properly support native integers with the old
vertex shader backend, so this patch leaves them disabled for now.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Previously, native integer support was based on whether the driver
advertised GLSL 1.30 or not. However, drivers that natively support
integers may wish to do so for older GLSL versions as well. Adding this
new opt-in flag allows them to do so.
Currently disabled by default on all drivers, which was the existing
behavior (no drivers currently implement GLSL 1.30).
Fixes piglit tests on i965 with INTEL_GLSL_VERSION=130 set:
- spec/glsl-1.10/fs-uniform-int-110.shader_test
- spec/glsl-1.30/fs-uniform-int-130.shader_test
(it was doubly converting the data)
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Fixes vs-atan-* and several others. This is not the real solution we
eventually want, which will pack floats, vec2s, and vec3s into vec4
registers, but this code should provide the framework for that.
This is a rather pessimistic calculation, since it doesn't distinguish
individual channels of a vec4, or elements of an array, but should be
a minimum start for register allocation.
The areamap contains precomputed data on different aliasing types.
It is necessary for good performance.
Signed-off-by: Lauri Kasanen <cand@gmx.com>
Signed-off-by: Brian Paul <brianp@vmware.com>
The state tracker expects depth and stencil pixels interleaved.
Evergreen can bind an interleaved depth-stencil resource as a colorbuffer,
but not as a zbuffer.
The hardware can do the interleaving for us when decompressing.
Such that it actually works in apps which use both.
A separate buffer is allocated for stencil. The only exception is
the window-system-provided depth-stencil buffer, where depth and stencil
share the same buffer.
This fixes:
- fbo-depthstencil-GL_DEPTH24_STENCIL8-clear
- fbo-depthstencil-GL_DEPTH24_STENCIL8-drawpixels-FLOAT-and-USHORT
- fbo-depthstencil-GL_DEPTH24_STENCIL8-readpixels-24_8
- fbo-depthstencil-GL_DEPTH24_STENCIL8-readpixels-FLOAT-and-USHORT
This was an unfinished to-do item before.
With this patch and the two preceeding patches, piglit's
fbo-generatemipmap-array test runs and passes instead of generating
a GL error and dying on an assertion.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
We could do 1D/2D arrays with textured quad rendering, but it'll take
some work (as with 3D textures).
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Q should not be significant for OPCODE_TEX, but it winds up getting
passed to the compute_lambda() function. Make sure it's 1.0 to
prevent garbage values, which is effectively what we get when the
swizzle is coord.xyzz (which is what GLSL gives us).
Part of the fix for piglit's fbo-generatemipmap-array test.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Declare _mesa_meta_begin()/end() in meta.h so that drivers can write
custom meta-ops (such as HiZ resolves for i965).
This necessitates moving the the META_* macros into meta.h. To prevent
naming collisions, this commit renames each macro to be MESA_META_*.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Chad Versace <chad@chad-versace.us>
Commit 6eff33dc (glapi: generate ES dispatch headers from core mesa)
replaced the autogenerated files
src/mapi/es1api/main/{dispatch,remap_helper}.h with new autogenerated
files src/mesa/main/api_exec_es{1,2}_{dispatch,remap_helper}.h. This
patch updates the .gitignore files to properly ignore the new
autogenerated files, and stop ignoring the old autogenerated files.
Reviewed-by: Chia-I Wu <olv@lunarg.com>
The flush extensions flush call indicates end of frame and should only
be called once per frame. However, in the dri2SwapBuffer fallback
path, we call flush and then call dri2CopySubBuffer, which also calls
flush. Refactor the code to only call flush once.
Needed for GL3.
v2: evergreen support
I don't set PA_SU_SC_MODE_CNTL.MULTI_PRIM_IB_ENA.
piglit/primitive-restart does pass though. Tested on RV730 and EG-REDWOOD.
The MUL opcode does a 16bit * 32bit multiply, and we need to do the
MACH to get the top 16bit * 32bit added in.
Fixes fs-op-mult-int-*, fs-op-mult-ivec*
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Shader Model 3.0[1] requires that shaders be able to execute at least
65536 instructions. Bump Mesa maxExec to that limit. This allows
several vertex shaders in the OpenGL ES 2.0 conformance test suite to
run to completion.
1: http://en.wikipedia.org/wiki/High_Level_Shader_Language
Reviewed-by: Eric Anholt <eric@anholt.net>
This cleans up some code generated by the IR-to-Mesa pass for i915.
In particular, some shaders involving arrays of constant matrices
result in really bad code.
v2: Silence several warnings from merging the gl_constant_value work.
Fix DP[23] folding. Add support for a bunch more opcodes that appear
in piglit runs on i915.
Reviewed-by: Eric Anholt <eric@anholt.net>
!a && b occurs frequently when nexted if-statements have been
flattened. It should also be possible use a MAD for (a && b) || c,
though that would require a MAD_SAT.
Reviewed-by: Eric Anholt <eric@anholt.net>
The operation ir_binop_all_equal is !(a.x != b.x || a.y != b.y || a.z
!= b.z || a.w != b.w). Logical-or is implemented using addition
(followed by clampling to [0,1]) on values of 0.0 and 1.0. Replacing
the logical-or operators with addition gives !bool((int(a.x != b.x) +
int(a.y == b.y) + int(a.z == b.z) + int(a.w == b.w)). This can be
implemented using a dot-product with a vector of all 1.0. After the
dot-product, the value will be an integer on the range [0,4].
Previously a SEQ instruction was used to clamp the resulting logic
value to [0,1] and invert the result. Using an SGE instruction on the
negation of the dot-product result has the same effect. Many older
shader architectures do not support the SEQ instruction. It must be
emulated using two SGE instructions and a MUL. On these
architectures, the single SGE saves two instructions.
Reviewed-by: Eric Anholt <eric@anholt.net>
The operation ir_binop_any_nequal is (a.x != b.x) || (a.y != b.y) ||
(a.z != b.z) || (a.w != b.w), and that is the same as any(bvec4(a.x !=
b.x, a.y != b.y, a.z != b.z, a.w != b.w)). Implement the any() part
the same way the regular ir_unop_any is implemented.
Reviewed-by: Eric Anholt <eric@anholt.net>
This is just like the ir_binop_logic_or case. The operation
ir_unop_any is (a.x || a.y || a.z || a.w). Logical-or is implemented
using addition (followed by clampling to [0,1]) on values of 0.0 and
1.0. Replacing the logical-or operators with addition gives (a.x +
a.y + a.z + a.w). This can be implemented using a dot-product with a
vector of all 1.0.
Previously a SNE instruction was used to clamp the resulting logic
value to [0,1]. In a fragment shader, using a saturate on the
dot-product has the same effect. Adding the saturate to the
dot-product is free, so (at least) one instruction is saved.
In a vertex shader, using an SLT on the negation of the dot-product
result has the same effect. Many older shader architectures do not
support the SNE instruction. It must be emulated using two SLT
instructions and an ADD. On these architectures, the single SLT saves
two instructions.
Reviewed-by: Eric Anholt <eric@anholt.net>
Logical-or is implemented using addition (followed by clampling to
[0,1]) on values of 0.0 and 1.0. Replacing the logical-or operators
with addition gives a + b which has a result on the range [0, 2].
Previously a SNE instruction was used to clamp the resulting logic
value to [0,1]. In a fragment shader, using a saturate on the add has
the same effect. Adding the saturate to the add is free, so (at
least) one instruction is saved.
In a vertex shader, using an SLT on the negation of the add result has
the same effect. Many older shader architectures do not support the
SNE instruction. It must be emulated using two SLT instructions and
an ADD. On these architectures, the single SLT saves two
instructions.
Reviewed-by: Eric Anholt <eric@anholt.net>
Remove the inclusion of fpu_control.h from compiler.h. Since Bionic lacks
fpu_control.h, this fixes the Android build.
Also remove the sole use of the fpu_control bits, which was in debug.c.
Those were brianp's debug bits, and he approved of their removal.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad@chad-versace.us>
We can't just look at the instruction that happens to appear at the
start of the loop, because it might be some other exec size and cause
us to only loop on the first N channels. We always want 8 in our
current code (since 16 doesn't work so we don't do 16-wide fragment in
that case).
Fixes loop-03.vert, which was triggering the assertions.
Link failure is something that shouldn't happen, but we sometimes want
it during development. The precompile also allows analysis of shader
codegen with shader-db.
This fixes most of the regressions in the vs array test set from the
varying array indexing work, since the giant array that was originally
allocated in virtual GRF space never gets used and is only ever
read/stored from scratch space.
We keep building these strange interfaces for DP read/write where
there's a helper function with some partially-specific,
partially-general controls, which is used in exactly one place in code
generation. Making these public will let us set up those instructions
in the one place they're to be generated.
For structs/arrays/matrices, they were ending up as uint because we
forgot to set them. All varyings in GLSL 1.20 are of base type float,
so just force the matter here (which gets inherited at
emit_urb_writes() time).
Fixes vs-varying-array-mat2-col-rd.
The low-level IR is a mashup of brw_fs.cpp and ir_to_mesa.cpp. It's
currently controlled by the INTEL_NEW_VS=1 environment variable, and
only tested for the trivial "gl_Position = gl_Vertex;" shader so far.
This will be used by the new vertex shader backend. The scalarizing
passes are skipped for non-fragment, since vertex and geometry threads
are based on vec4s.
This patch fixes a bug when lowering an integer division:
x/y
to a multiplication by a reciprocal:
int(float(x)*reciprocal(float(y)))
If x was a plain int and y was an ivecN, the lowering pass
incorrectly assigned the type of the product to be float, when in fact
it should be vecN. This caused mesa to abort with an IR validation
error.
Fixes piglit tests {fs,vs}-op-div-int-ivec{2,3,4}.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
I also needed to make some changes in u_vbuf_mgr in order to override
the caps from the driver and enable the fallback even though the driver
claims the format is supported.
It's going to flush client's commands in eglWaitClient(). Before this,
egl applications using pixmap or pbuffer flicker because of no flush.
Reviewed-by: Alan Hourihane
The vs-varying-array-mat2-col-row-wr test writes a mat2[3] constant to
a mat2[3] varying out array, and also statically accesses element 1 of
it on the VS and FS sides. At link time it would get trimmed down to
just 2 elements, and then codegen of the VS would end up generating
assignments to the unallocated last entry of the array. On the new
i965 VS backend, that happened to land on the vertex position.
Some issues remain in this test on softpipe, i965/old-vs and
i965/new-vs on visual inspection, but i965 is passing because only one
green pixel is probed, not the whole split green/red quad.
This patch extends ir_validate.cpp to check the following
characteristics of each ir_call:
- The number of actual parameters must match the number of formal
parameters in the signature.
- The type of each actual parameter must match the type of the
corresponding formal parameter in the signature.
- Each "out" or "inout" actual parameter must be an lvalue.
Reviewed-by: Chad Versace <chad@chad-versace.us>
These functions don't modify the target instruction, so it makes sense
to make them const. This allows these functions to be called from ir
validation code (which uses const to ensure that it doesn't
accidentally modify the IR being validated).
Reviewed-by: Chad Versace <chad@chad-versace.us>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
When an out parameter undergoes an implicit type conversion, we need
to store it in a temporary, and then after the call completes, convert
the resulting value. In other words, we convert code like the
following:
void f(out int x);
float value;
f(value);
Into IR that's equivalent to this:
void f(out int x);
float value;
int out_parameter_conversion;
f(out_parameter_conversion);
value = float(out_parameter_conversion);
This transformation needs to happen during ast-to-IR convertion (as
opposed to, say, a lowering pass), because it is invalid IR for formal
and actual parameters to have types that don't match.
Fixes piglit tests
spec/glsl-1.20/compiler/qualifiers/out-conversion-int-to-float.vert and
spec/glsl-1.20/execution/qualifiers/vs-out-conversion-*.shader_test,
and bug 39651.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=39651
Reviewed-by: Chad Versace <chad@chad-versace.us>
libGLw is an old OpenGL widget library with optional Motif support.
It almost never changes and very few people actually still care about
it, so we've decided to ship it separately.
The new home for libGLw is: git://git.freedesktop.org/mesa/glw/
Reviewed-by: Brian Paul <brianp@vmware.com>
Previously if-statements were lowered from inner-most to outer-most
(i.e., bottom-up). All assignments within an if-statement would have
the condition of the if-statement appended to its existing condition.
As a result the assignments from a deeply nested if-statement would
have a very long and complex condition.
Several shaders in the OpenGL ES2 conformance test suite contain
non-constant array indexing that has been lowered by the shader
writer. These tests usually look something like:
if (i == 0) {
value = array[0];
} else if (i == 1) {
value = array[1];
} else ...
The IR for the last assignment ends up as:
(assign (expression bool && (expression bool ! (var_ref if_to_cond_assign_condition) ) (expression bool && (expression bool ! (var_ref if_to_cond_assign_condition@20) ) (expression bool && (expression bool ! (var_ref if_to_cond_assign_condition@22) ) (expression bool && (expression bool ! (var_ref if_to_cond_assign_condition@24) ) (var_ref if_to_cond_assign_condition@26) ) ) ) ) (x) (var_ref value) (array_ref (var_ref array) (constant int (5)))
The Mesa IR that is generated from this is just as awesome as you
might expect.
Three changes are made to the way if-statements are lowered.
1. Two condition variables, if_to_cond_assign_then and
if_to_cond_assign_else, are created for each if-then-else structure.
The former contains the "positive" condition, and the later contains
the "negative" condtion. This change was implemented in the previous
patch.
2. Each condition variable is added to a hash-table when it is created.
3. When lowering an if-statement, assignments to existing condtion
variables get the current condition anded. This ensures that nested
condition variables are only set to true when the condition variable
for all outer if-statements is also true.
Changes #1 and #3 combine to ensure the correctness of the resulting
code.
4. When a condition assignment is encountered with a condition that is
a dereference of a previously added condition variable, the condition
is not modified.
Change #4 prevents the continuous accumulation of conditions on
assignments.
If the original if-statements were:
if (x) {
if (a && b && c && d && e) {
...
} else {
...
}
} else {
if (g && h && i && j && k) {
...
} else {
...
}
}
The lowered code will be
if_to_cond_assign_then@1 = x;
if_to_cond_assign_then@2 = a && b && c && d && e
&& if_to_cond_assign_then@1;
...
if_to_cond_assign_else@2 = !if_to_cond_assign_then
&& if_to_cond_assign_then@1;
...
if_to_cond_assign_else@1 = !if_to_cond_assign_then@1;
if_to_cond_assign_then@3 = g && h && i && j;
&& if_to_cond_assign_else@1;
...
if_to_cond_assign_else@3 = !if_to_cond_assign_then
&& if_to_cond_assign_else@1;
...
Depending on how instructions are emitted, there may be an extra
instruction due to the duplication of the '&&
if_to_cond_assign_{then,else}@1' on the nested else conditions. In
addition, this may cause some unnecessary register pressure since in
the simple case (where the nested conditions are not complex) the
nested then-condition variables are live longer than strictly
necessary.
Before this change, one of the shaders in the OpenGL ES2 conformance
test suite's acos_float_frag_xvary generated 348 Mesa IR instructions.
After this change it only generates 124. Many, but not all, of these
instructions would have also been eliminated by CSE.
Reviewed-by: Eric Anholt <eric@anholt.net>
Now the condition (for the then-clause) and the inverse condition (for
the else-clause) get written to separate temporary variables. In the
presence of complex conditions, this shouldn't result in more code
being generated. If the original if-statement was
if (a && b && c && d && e) {
...
} else {
...
}
The lowered code will be
if_to_cond_assign_then = a && b && c && d && e;
...
if_to_cond_assign_else = !if_to_cond_assign_then;
...
Reviewed-by: Eric Anholt <eric@anholt.net>
EGL doesnt define howto manage different native platforms.
So mesa has a builtime configurable default platform,
whith non-standard envvar (EGL_PLATFORM) overwrites.
This caused unneeded bugreports, when EGL_PLATFORM was forgotten.
Detection is grouped into basic types of NativeDisplays (which itself
needs to be detected). The final decision is based on characteristcs
of these basic types:
File Desciptor based platforms (fbdev):
- fstat(2) to check for being a fd that belongs to a character device
- check kernel subsystem (todo)
Pointer to structuctures (x11, wayland, drm/gbm):
- mincore(2) to check whether its valid pointer to some memory.
- magic elements (e.g. pointers to exported symbols):
o wayland display stores interface type pointer (first elm.)
o gbm stores pointer to its constructor (first elm.)
o x11 as a fallback (FIXME?)
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
GLESv1 and GLESv2 have their own dispatch.h and remap_helper.h. These
headers are only used by api_exec_es1.c and api_exec_es2.c in core mesa.
Move the rules to generate them from glapi to core mesa.
Reviewed-by: Brian Paul <brianp@vmware.com>
[olv: updated after reviewing to fix SCons build]
glapi_gen.mk is supposed to be included by glapi users to simplify
header generation. This commit also makes es1api, es2api, and
shared-glapi use it.
Reviewed-by: Brian Paul <brianp@vmware.com>
[olv: updated after reviewing to prefix all variables in glapi_gen.mk by
glapi_gen]
glapi/gen-es/ defines two sets of GLAPI XMLs for OpenGL ES 1.1
(es1_API.xml) and 2.0 (es2_API.xml) respectively. They are used to
generate dispatch.h and remap_helper.h for GLES. Together with
gl_and_es_API.xml, we have to maintain three sets of GLAPI XMLs.
This commit makes dispatch.h and remap_helper.h for GLES be generated
from gl_and_es_API.xml.
Reviewed-by: Brian Paul <brianp@vmware.com>
add gl_api::filter_functions and gl_function::filter_entry_points to
filter out unwanted functions and entry points.
Reviewed-by: Brian Paul <brianp@vmware.com>
Move the list of entry points belong to GLES from mapi_abi.py to a new
file.
Until we figure out how to describe the APIs an entry point belongs to
in the XML file, and how to handle the case where an entry point others
alias is missing in some APIs, this is an easier solution than
maintaining another two sets of XMLs in glapi/gen-es/.
Reviewed-by: Brian Paul <brianp@vmware.com>
Remove the 'f' suffix from a float literal.
- .float 0.0f+1.0
+ .float 1.0
This fixes the following compile error with clang:
error: unexpected token in directive
.float 0.0f+1.0
^
Note: This is a candidate for the stable branches.
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Chad Versace <chad@chad-versace.us>
Optional parallel rendering of spans using OpenMP.
Initial implementation for aa triangles. A new option for scons is
also provided to activate the openmp support (off by default).
Signed-off-by: Brian Paul <brianp@vmware.com>
After copy buffer on preGEN6, it is necessary to wait for the blit to
complete before returning data to the user.
This should fix the piglit test: copy_buffer_coherency (pre-GEN6).
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
"reg" was set in only one case, virtual GRFs pre register allocation,
and would be unset and have hw_reg set after allocation. Since we
never bothered with looking at virtual GRF number after allocation
anyway, just use the same storage and avoid confusion.
Besides separating out a logical step of the giant register allocator
function, this now communicates a bunch of the allocator information
through entries in brw_context, which will make this code partially
reusable for caching the expensive allocator setup.
It's fewer pointers to track, and when we start caching the register
set, should be algorithmically better in the cache hit case (lookup in
a byte-per-register array, instead of a linear walk through
desctiption of register classes to find how to translate that class).
This was a debugging aid at one point -- virtual grf 0 should never be
allocated, and it would be used if undefined register access occurred
in codegen. However, it made the confusing register allocation code
even more confusing by indexing things off of 1 all over.
At least one of the invariants verified by IR validation concerns the
relative ordering of toplevel constructs in the IR: references to
global variables must come after the declarations of those global
variables.
Since linking affects the ordering of toplevel constructs in the IR,
it's possible that a bug in the linker will cause invalid IR to be
generated, even if all the pre-linked shaders are valid. (In fact,
such a bug was fixed by the previous commit.)
Bugs like this are easily masked by further optimization passes,
particularly inlining. So to make them easier to track down, this
patch addes an IR validation step right after linking, and before
final optimization occurs. The validation only occurs on debug
builds.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
When link_functions.cpp adds a new function to the final linked
program, it needs to add it after any global variable declarations
that the function refers to, otherwise the IR will be invalid (because
variable declarations must occur before variable accesses). The
easiest way to do that is to have the linker emit functions to the
tail of the final linked program.
The linker used to emit functions to the head of the final linked
program, in an effort to keep callees sorted before their callers.
However, this was not reliable: it didn't work for functions declared
or defined in the same compilation unit as main, for diamond-shaped
patterns in the call graph, or for some obscure cases involving
overloaded functions. And no code currently relies on this sort
order.
No Piglit regressions with i965 Ironlake.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
process_array_type() contains an assertion to verify that no IR
instructions are generated while processing the expression that
specifies the size of the array. This assertion needs to happen
_after_ checking whether the expression is constant. Otherwise we may
crash on an illegal shader rather than reporting an error.
Fixes piglit tests array-size-non-builtin-function.vert and
array-size-with-side-effect.vert.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Rearranged the logic for converting the ast for a function call to
hir, so that we constant fold before emitting any IR. Previously we
would emit some IR, and then only later detect whether we could
constant fold. The unnecessary IR would usually get cleaned up by a
later optimization step, however in the case of a builtin function
being used to compute an array size, it was causing an assertion.
Fixes Piglit test array-size-constant-relational.vert.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=38625
The ast-to-hir conversion needs to emit function signatures in two
circumstances: when a function declaration (or definition) is
encountered, and when a built-in function is encountered.
To avoid emitting a function signature in an illegal place (such as
inside a function), emit_function() checked whether we were inside a
function definition, and if so, emitted the signature before the
function definition.
However, this didn't cover the case of emitting function signatures
for built-in functions when those built-in functions are called from
inside the constant integer expression that specifies the length of a
global array. This failed because when processing an array length, we
are emitting IR into a dummy exec_list (see process_array_type() in
ast_to_hir.cpp). process_array_type() later checks (via an assertion)
that no instructions were emitted to the dummy exec_list, based on the
reasonable assumption that we shouldn't need to emit instructions to
calculate the value of a constant.
This patch changes emit_function() so that it emits function
signatures at toplevel in all cases.
This partially fixes bug 38625
(https://bugs.freedesktop.org/show_bug.cgi?id=38625). The remainder
of the fix is in the patch that follows.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
opt_dead_functions contained a shortcut to skip processing the first
function's body, based on the assumption that IR functions are
topologically sorted, with callees always coming before their callers
(therefore the first function cannot contain any calls).
This assumption turns out not to be true in general. For example, the
following code snippet gets translated to IR that violates this
assumption:
void f();
void g();
void f() { g(); }
void g() { ... }
In practice, the shortcut didn't cause bugs because of a coincidence
of the circumstances in which opt_dead_functions is called:
(a) we do inlining right before dead function elimination, and
inlining (when successful) eliminates all calls.
(b) for user-defined functions, inlining is always successful, because
previous optimization passes (during compilation) have reduced
them to a form that is eligible for inlining.
(c) the function that appears first in the IR can't possibly call a
built-in function, because built-in functions are always emitted
before the function that calls them.
It seems unnecessarily fragile to have opt_dead_functions depend on
these coincidences. And the next patch in this series will break (c).
So I'm reverting the shortcut. The consequence will be a slight
increase in link time for complex shaders.
This reverts commit c75427f4c8.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This reverts an unnecessary part of commit 4683529048 and fixes misrendering
and an assertion failure in Cogs.
Fixes freedesktop.org bug 39888.
Reviewed-by: Brian Paul <brianp@vmware.com>
If there are any cases left where the st thinks that RGBA -> BGRA
will swap components, it will get what it deserves.
Now the GPU's 2D engine goes unused. What a shame.
validate_program relies on validate_shader_program to fill in errMsg;
empirically, there exist cases where that doesn't happen.
While tracking those down may be worthwhile, initializing the string so
we don't try to ralloc_strdup random garbage also seems wise.
Fixes issues caught by valgrind while running some test case.
NOTE: This is a candidate for stable release branches.
Reviewed-by: Chad Versace <chad@chad-versace.us>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
DRI2 will throw BadRequest for this when the client is not local, but
DRI2 is an implementation detail and not something callers should have
to know about. Silently swallow errors in this case, and just propagate
the failure through DRI2Connect's return code.
Note: This is a candidate for the stable release branches.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=28125
Signed-off-by: Christopher James Halse Rogers <christopher.halse.rogers@canonical.com>
This saves both register space and upload bandwidth for unused values.
Note that previously we were relying on the visitor not initially
generating references to different sets of uniforms between the 8-wide
and 16-wide code generation, and now we're relying on them dead-code
eliminating the same stuff, too.
We should remove the relocations which caused a validation failure
from the list, so that the kernel receives only the validated ones.
NOTE: This is a candidate for the 7.11 branch.
That code drops performance in Unigine Heaven and Tropics
by a factor of 10. That's too crazy even for a debug build.
NOTE: This is a candidate for the 7.11 branch.
Reviewed-by: Brian Paul <brianp@vmware.com>
Unlike C++, empty declarations such as
float;
should be valid. The spec is not explicit about this actually.
Some apps that generate their shader sources may rely on this. This was
noted when porting one of them to Linux from Windows.
Reviewed-by: Chad Versace <chad@chad-versace.us>
Note: this is a candidate for the 7.11 branch.
Resolve via glBlitFramebuffer allows resolving a sub-region of a
renderbuffer to a different location in any mipmap level of some
other texture, and, with a new extension, even scaling. Therefore,
location and size parameters are needed.
The mask parameter was added because resolving only depth or only
stencil of a combined buffer is possible as well.
Full information about the blit operation allows the drivers to
take the most efficient path they possibly can.
This avoids the following runtime error with EGL on platforms that
require linking with libm for nontrivial math functions:
failed to load module: /xorg/lib64/gbm/gbm_gallium_drm.so: undefined
symbol: powf
(Based on Kristóf RALOVICHs patch and Ian's suggestions in
http://lists.freedesktop.org/archives/mesa-dev/2011-August/010036.html)
Use backend_map kernel query if supported, otherwise analyze ZPASS_DONE
results to get the mask.
Fixes lockups with predicated rendering due to incorrect query buffer
initialization on some cards.
Note: this is a candidate for the 7.11 branch.
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
These looked more like copy-and-paste to me than the others (which
looked more like possibly someone forgot to write some code in a
refactor), so I didn't verify where they came from.
This makes piglit a lot more happy. The errors are logged when
INTEL_DEBUG=fallbacks because the application is about to hit a big
software fallback. We frequently ask people to run applications that
are hitting software fallbacks with INTEL_DEBUG=fallbacks so the we
can help them debug the reason for the software fallback.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This can only happen in GLSL shaders because assembly shaders that use
too many temps are rejected by core Mesa. It is easiest to make this
happen with shaders that contain flow-control that could not be lowered.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Rely on the driver to do the right thing. This probably means falling
back to software. Page 88 of the OpenGL 2.1 spec specifically says:
"A shader should not fail to compile, and a program object should
not fail to link due to lack of instruction space or lack of
temporary variables. Implementations should ensure that all valid
shaders and program objects may be successfully compiled, linked
and executed."
There is no provision for saying "No" to a valid shader that is
difficult for the hardware to handle, so stop doing that.
On i915 this causes a large number of piglit tests to change from FAIL
to WARN. The warning is because the driver still emits messages to
stderr like "i915_program_error: Unsupported opcode: BGNLOOP".
It also fixes ES2 conformance CorrectFull_frag and CorrectParse1_frag
on i915 (and probably other hardware that can't handle loops).
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This prevents assertion failures in ralloc_strcat. The ralloc_free in
_mesa_free_shader_program_data can be omitted because freeing the
gl_shader_program in _mesa_delete_shader_program will take care of
this automatically.
A bunch of this code could use a refactor to use ralloc a bit more
effectively. A bunch of the things that are allocated with malloc and
owned by the gl_shader_program should be allocated with ralloc (using
the gl_shader_program as the context).
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
linker_warning is a new function. It's identical to linker_error
except that it doesn't set LinkStatus=false and it prepends "warning: "
on messages instead of "error: ".
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Remove the other places that set LinkStatus to false since they all
immediately follow a call to linker_error. The function linker_error
was previously known as linker_error_printf. The name was changed
because it may seem surprising that a printf function will set an
error flag.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
For power-of-two sizes, h0 == mt->height0 since it's already a multiple
of two. However, for NPOT, they're different; h1 should be computed
based on the original size.
Fixes piglit test "cubemap npot" and oglconform test "textureNPOT".
NOTE: This is a candidate for stable release branches.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Before, if any uniform or constant array was accessed with indirect
addressing, st_translate_program() would emit uniform constants in the place
of immediates. This behavior was unavoidable with ir_to_mesa/mesa_to_tgsi, but
glsl_to_tgsi can work around it since the GLSL IR backend and the TGSI
emission are both inside the state tracker.
Fixes a regression unintentionally introduced by "glsl_to_tgsi: fix shaders with
indirect addressing of temps" that caused missing leaves in 3dmark01 test 4 (Nature)
and missing/displaced textures on human models in Counter-Strike: Source.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Signed-off-by: Bryan Cain <bryancain3@gmail.com>
Thanks to Kenneth Graunke for pointing out that glsl_type::get_instance(base, 4, 1)
is the same as glsl_type::get_vec4_type(base).
The function was only used in st_glsl_to_tgsi, and this commit replaces that usage
with get_instance.
Disabled by default on all drivers. To enable it, change ctx->GLSLVersion to 130
in st_extensions.c. Currently, softpipe is the only driver with integer support.
The functionality is not used by anything yet, and the glUniform functions will
need to be reworked before this can reach its full usefulness. It is
nonetheless a step towards integer support in the state tracker and classic drivers.
It is still a work in progress at this point, but it produces working and
reasonably well-optimized code.
Originally based on ir_to_mesa and st_mesa_to_tgsi, but does not directly use
Mesa IR instructions in TGSI generation, instead generating TGSI from the
intermediate class glsl_to_tgsi_instruction. It also has new optimization
passes to replace _mesa_optimize_program.
The previous formula for atan(x,y) returned a value of +/- pi whenever
|x|<0.0001, and used a formula based on atan(y/x) otherwise. This
broke in cases where both x and y were small (e.g. atan(1e-5, 1e-5)).
This patch modifies the formula so that it returns a value of +/- pi
whenever |x|<1e-8*|y|, and uses the formula based on atan(y/x)
otherwise.
The previous formula for asin(x) was algebraically equivalent to:
sign(x)*(pi/2 - sqrt(1-|x|)*(A + B|x| + C|x|^2))
where A, B, and C were arbitrary constants determined by a curve fit.
This formula had a worst case absolute error of 0.00448, an unbounded
worst case relative error, and a discontinuity near x=0.
Changed the formula to:
sign(x)*(pi/2 - sqrt(1-|x|)*(pi/2 + (pi/4-1)|x| + A|x|^2 + B|x|^3))
where A and B are arbitrary constants determined by a curve fit. This
has a worst case absolute error of 0.00039, a worst case relative
error of 0.000405, and no discontinuities.
I don't expect a significant performance degradation, since the extra
multiply-accumulate should be fast compared to the sqrt() computation.
Fixes piglit tests {vs,fs}-asin-float and {vs,fs}-atan-*
The function used a variable named 'score', which was an outright lie.
A signature matches or it doesn't; there is no fuzzy scoring.
Change the return type of parameter_lists_match() to an enum, and
let ir_function::matching_sigature() switch on that enum.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad@chad-versace.us>
Array constructors obey narrower conversion rules than other constructors
[1] --- they use the implicit conversion rules [2] instead of the scalar
constructor conversions [3]. But process_array_constructor() was
incorrectly applying the broader rules.
[1] GLSL 1.50 spec, Section 5.4.4 Array Constructors, page 52 (58 of pdf)
[2] GLSL 1.50 spec, Section 4.1.10 Implicit Conversions, page 25 (31 of pdf)
[3] GLSL 1.50 spec, Section 5.4.1 Conversion, page 48 (54 of pdf)
To fix this, first check (with glsl_type::can_be_implicitly_converted_to)
if an implicit conversion is legal before performing the conversion.
Fixes:
piglit:spec/glsl-1.20/compiler/structure-and-array-operations/array-ctor-implicit-conversion-bool-float.vert
piglit:spec/glsl-1.20/compiler/structure-and-array-operations/array-ctor-implicit-conversion-bvec*-vec*.vert
Note: This is a candidate for the 7.10 and 7.11 branches.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad@chad-versace.us>
The function is no longer used and has been replaced by
glsl_type::can_implicitly_convert_to().
Note: This is a candidate for the 7.10 and 7.11 branches.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad@chad-versace.us>
Context
-------
In ast_function_expression::hir(), parameter_lists_match() checks if the
function call's actual parameter list matches the signature's parameter
list, where the match may require implicit conversion of some arguments.
To check if an implicit conversion exists between individual arguments,
type_compare() is used.
Problems
--------
type_compare() allowed the following illegal implicit conversions:
bool -> float
bvecN -> vecN
int -> uint
ivecN -> uvecN
uint -> int
uvecN -> ivecN
Change
------
type_compare() is buggy, so replace it with glsl_type::can_be_implicitly_converted_to().
This comprises a rewrite of parameter_lists_match().
Fixes piglit:spec/glsl-1.20/compiler/built-in-functions/outerProduct-bvec*.vert
Note: This is a candidate for the 7.10 and 7.11 branches.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad@chad-versace.us>
This method checks if a source type is identical to or can be implicitly
converted to a target type according to the GLSL 1.20 spec, Section 4.1.10
Implicit Conversions.
The following commits use the method for a bugfix:
glsl: Fix implicit conversions in non-constructor function calls
glsl: Fix implicit conversions in array constructors
Note: This is a candidate for the 7.10 and 7.11 branches.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad@chad-versace.us>
This is part of fixing a ~1% performance regression in OpenArena when
changing the fixed function fragment shader to using the new backend.
Right now this just avoids the LINTERP of the projector, not the math
using it.
Certain attributes (position, psize, etc.) don't
count as params; they are handled separately by the hw.
However, the VS is required to export at least one param
and r600_shader_from_tgsi() takes care of adding a dummy
export if there is none. Make sure the VS param export
count in the SPI properly accounts for this.
Note: This is a candidate for the 7.11 branch.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This was done in the old codegen path, but not the new one. Caught by
piglit fbo tests after the conversion to GLSL ff_fragment_shader.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The FF VS generation happens just after the FF FS generation in
state.c, so the ctx->VP._Current value is for the previous state
update's vertex shader, not the one that will be chosen as a result of
this state update. The vertexShader and vertexProgram variables
should be accurately telling us whether there's going to be a
ctx->VP._Current (except on _MaintainTnlProgram drivers, where it's
always true).
The glsl-vs-statechange-1 test was created to test for this, but it
turns out that the bug is hidden by the fact that we call
_mesa_update_state() twice per draw call -- once from
_mesa_valid_to_render() and once from vbo_draw_arrays(), and the
second one was fixing up the first one.
Reviewed-by: Brian Paul <brianp@vmware.com>
We have to make it through this loop processing the color multiple
times, so we can't go overwriting it on our first color buffer.
Reviewed-by: Brian Paul <brianp@vmware.com>
When we do a glReadPixels into the temporary buffer, we don't want to
use GL_LUMINANCE, GL_LUMINANCE_ALPHA or GL_INTENSITY since they will
compute L=R+G+B which is not what we want.
This bug has existed all along but was only exposed by the elimination
of the driver hook for glCopyTexImage() in
5874890c26.
Fixes https://bugs.freedesktop.org/show_bug.cgi?id=39604
Tested-by: Ian Romanick <ian.d.romanick@intel.com>
The previous commit removed the last use of this field.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
The purpose of the (irb->draw_offset & 4095) != 0 check was to ensure
that we don't have XYy offsets into a tile, since Gen4 hardware doesn't
support that. However, it's insufficient: there are cases where
draw_offset & 4095 is 0 but we still have a Y-offset. This leads to an
assertion failure in brw_update_renderbuffer_surface with tile_y != 0.
Instead, simply call intel_renderbuffer_tile_offsets to compute the
actual X/Y offsets and check if either are non-zero. This makes both
the workaround and the assertion check the same things.
Fixes piglit test fbo-generatemipmap-formats, and should also fix
bugs #34009 and #39487.
NOTE: This is a candidate for stable release branches.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=34009
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=39487
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Chad Versace <chad@chad-versace.us>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
We were neglecting to load dvdx and dvdy. v is not optional.
Fixes glslparsertests tex-grad-0[12345].frag on Broadwater/Crestline.
(We still need an execution test using sampler1D.)
NOTE: This is a candidate for the 7.11 branch.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
The constant used in the radians() function didn't have enough
precision, causing a relative error of 1.676e-5, which is far worse
than the precision of 32-bit floats. This patch reduces the relative
error to 1.14e-9, which is the best we can do in 32 bits.
Fixes piglit tests {fs,vs}-radians-{float,vec2,vec3,vec4}.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
What a beast.
r300g doesn't depend on files from r300c anymore, so r300c is now left
to its own fate. BTW 'make test' can be invoked from the gallium/r300
directory to run some compiler unit tests.
The implementation deviated slightly from the GL_EXT_texture_sRGB spec
and from other implementations. A giant comment block was added to
justify the somewhat odd behavior of this function.
In addition, the interface had unnecessary cruft. The 'all' parameter
was false at all callers, so it has been removed.
Reviewed-by: Brian Paul <brianp@vmware.com>
If an application requests a generic compressed format for a texture
and the driver does not pick a specific compressed format, return the
generic base format (e.g., GL_RGBA) for the GL_TEXTURE_INTERNAL_FORMAT
query.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=3165
Reviewed-by: Brian Paul <brianp@vmware.com>
lower_variable_index_to_cond_assign runs until it can't make any more
progress. It then returns the result of the last pass which will
always be false. This caused the lowering loop in
_mesa_ir_link_shader to end before doing one last round of
lower_if_to_cond_assign. This caused several if-statements (resulting
from lower_variable_index_to_cond_assign) to be left in the IR.
In addition to this change, lower_variable_index_to_cond_assign should
take a flag indicating whether or not it should even generate
if-statements. This is easily controlled by
switch_generator::linear_sequence_max_length. This would generate
much better code on architectures without any flow contol.
Fixes i915 piglit regressions glsl-texcoord-array and
glsl-fs-vec4-indexing-temp-src.
Reviewed-by: Eric Anholt <eric@anholt.net>
The index buffer state emit only occurred if there was an IB in place
and we were in either a new batch or a new IB state. But because we
only flagged new IB state if IB state changed from the last IB state
we calculated, we could simply never emit IB state after batchbuffer
wraps if the first draw didn't use the IB and we didn't actually
change the IB.
Fixes piglit glx-multi-context-ib-1.
Fixes user-clip on 965 with 3D clears enabled. I created a separate
flag because I wanted to avoid the overhead of the matrix operations
in this path.
Reviewed-by: Brian Paul <brianp@vmware.com>
It turns out that internally the texture cache gets flushed in a
couple of cases, particularly around 2D operations mixed with 3D. In
almost all cases one of those happens between rendering to an
FBO-attached texture and rendering from that texture. However, as of
the next patch, glean tfbo (and the new fbo-flushing-2 test) would
manage to get stale texture values because one of those flushes didn't
occur. The intention of this code was always to get the render cache
cleared and ready to be used from the sampler cache (and it does on <=
gen4), so this just catches gen5 up.
This patch was also tested to fix fbo-flushing on gen7.
When emitting a MAC instruction in a vertex shader, brw_vs_emit()
calls accumulator_contains() to determine whether the accumulator
already contains the appropriate addend; if it does, then we can avoid
emitting an unnecessary MOV instruction.
However, accumulator_contains() wasn't checking the val.negate or
val.abs flags. As a result, if the desired value was the negation, or
the absolute value, of what was already in the accumulator, we would
generate an incorrect shader.
Fixes piglit test vs-refract-vec4-vec4-float.
Tested on Gen5 and Gen6.
Reviewed-by: Eric Anholt <eric@anholt.net>
On Ivybridge, the shadow comparitor goes in the first slot, rather than
at the end. It's not necessary to send u, v, and r.
Fixes tests texturing/texdepth and glean/fbo.
NOTE: This is a candidate for the 7.11 branch.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Commit 53c89c67f3 ("i965: Avoid generating
MOVs for assignments of expressions.") added the line "this->result =
reg_undef" all over the code. Unfortunately, since Eric developed his
patch before I landed Ivybridge support, he missed adding it to
fs_visitor::emit_texture_gen7() after rebasing.
Furthermore, since I developed TXD support before Eric's patch, I
neglected to add it to the gradient handling when I rebased.
Neglecting to set this causes the visitor to use this->result as storage
rather than generating a new temporary. These missing statements
resulted in the same register being used to store several different
values.
Fixes the following piglit tests on Ivybridge:
- glsl-fs-shadow2dproj.shader_test
- glsl-fs-shadow2dproj-bias.shader_test
NOTE: This is a candidate for the 7.11 branch.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
The blend_quad function clobbers the actual render target color/alpha
values while applying the destination blend factor, which results in
restoring the wrong value during the masking stage for write-disabled
channels.
Reviewed-by: Brian Paul <brianp@vmware.com>
Just like the non-constant array index lowering pass, compare all N
indices at once. For accesses to a vec4, this saves 3 comparison
instructions on a vector architecture.
Reviewed-by: Eric Anholt <eric@anholt.net>
Previously the code would just look at deref->array->type to see if it
was a constant. This isn't good enough because deref->array might be
another ir_dereference_array... of a constant. As a result,
deref->array->type wouldn't be a constant, but
deref->variable_referenced() would return NULL. The unchecked NULL
pointer would shortly lead to a segfault.
Instead just look at the return of deref->variable_referenced(). If
it's NULL, assume that either a constant or some other form of
anonymous temporary storage is being dereferenced.
This is a bit hinkey because most drivers treat constant arrays as
uniforms, but the lowering pass treats them as temporaries. This
keeps the behavior of the old code, so this change isn't making things
worse.
Fixes i965 piglit:
vs-temp-array-mat[234]-index-col-rd
vs-temp-array-mat[234]-index-col-row-rd
vs-uniform-array-mat[234]-index-col-rd
vs-uniform-array-mat[234]-index-col-row-rd
Reviewed-by: Eric Anholt <eric@anholt.net>
Leaving the unused registers with other values caused assertion
failures and other problems in places that blindly iterate over all
sources.
brw_vs_emit.c:1381: get_src_reg: Assertion `c->regs[file][index].nr !=
0' failed.
Fixes i965 piglit:
vs-uniform-array-mat[234]-col-row-rd
vs-uniform-array-mat[234]-index-col-row-rd
vs-uniform-array-mat[234]-index-row-rd
vs-uniform-mat[234]-col-row-rd
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This fixes many cases of accessing arrays of matrices using
non-constant indices at each level.
Fixes i965 piglit:
vs-temp-array-mat[234]-index-col-rd
vs-temp-array-mat[234]-index-col-row-rd
vs-temp-array-mat[234]-index-col-wr
vs-uniform-array-mat[234]-index-col-rd
Fixes swrast piglit:
fs-temp-array-mat[234]-index-col-rd
fs-temp-array-mat[234]-index-col-row-rd
fs-temp-array-mat[234]-index-col-wr
fs-uniform-array-mat[234]-index-col-rd
fs-uniform-array-mat[234]-index-col-row-rd
fs-varying-array-mat[234]-index-col-rd
fs-varying-array-mat[234]-index-col-row-rd
vs-temp-array-mat[234]-index-col-rd
vs-temp-array-mat[234]-index-col-row-rd
vs-temp-array-mat[234]-index-col-wr
vs-uniform-array-mat[234]-index-col-rd
vs-uniform-array-mat[234]-index-col-row-rd
vs-varying-array-mat[234]-index-col-rd
vs-varying-array-mat[234]-index-col-row-rd
vs-varying-array-mat[234]-index-col-wr
Reviewed-by: Eric Anholt <eric@anholt.net>
If the non-constant index was in the LHS of an assignment, any
existing condititon on that assignment would be lost.
Reviewed-by: Eric Anholt <eric@anholt.net>
If the non-constant index was in the LHS of an assignment, any
existing condititon on that assignment would be lost.
Fixes i965 piglit:
fs-temp-array-mat[234]-col-row-wr
fs-temp-array-mat[234]-index-col-row-wr
fs-temp-array-mat[234]-index-col-wr
fs-temp-array-mat[234]-index-row-wr
vs-varying-array-mat[234]-index-col-wr
Reviewed-by: Eric Anholt <eric@anholt.net>
The previous implementation could easily get tricked if the LHS of an
assignment included a non-constant index that was "inside" another
dereference. For example:
mat4 m[2];
m[0][i] = vec4(0.0);
Due to the way it tracked whether the array was being assigned, it
would think that the non-constant index was in an r-value. The new
code fixes that by tracking l-values and r-values differently. The
index is also replaced by cloning the IR and replacing the index
variable instead of the odd way it was done before.
v2: Apply some simplifications suggested by Eric Anholt. Making
assignment_generator::rvalue be ir_dereference instead of ir_rvalue
simplified the code a bit.
Fixes i965 piglit fs-temp-array-mat[234]-index-wr and
vs-varying-array-mat[234]-index-wr.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=34691
Reviewed-by: Eric Anholt <eric@anholt.net>
There's no reason for it to be there, and another class that may not
have access to the visitor will need it soon.
Reviewed-by: Eric Anholt <eric@anholt.net>
Not sure how I computed these, but they were wrong (which explains why
bumping the polynomial order before never improved precision).
This allows to pass the EXP test cases of PSPrecision/VSPrecision DCTs.
Add an iteration step, which makes rqsqrt precision go from 12bits to
24, and fixes RSQ/NRM test case of PSPrecision/VSPrevision DCTs.
There are no uses of this function outside shader translation.
These tests invoke do_lower_jumps() in isolation (using the glsl_test
executable) and verify that it transforms the IR in the expected way.
The unit tests may be run from the top level directory using "make
check".
For reference, I've also checked in the Python script
create_test_cases.py, which was used to generate these tests. It is
not necessary to run this script in order to run the tests.
Acked-by: Chad Versace <chad@chad-versace.us>
This patch adds a new build artifact, glsl_test, which can be used for
testing optimization passes in isolation.
I'm hoping that we will be able to add other useful standalone tests
to this executable in the future. Accordingly, it is built in a
modular fashion: the main() function uses its first argument to
determine which test function to invoke, removes that argument from
argv[], and then calls that function to interpret the rest of the
command line arguments and perform the test. Currently the only test
function is "optpass", which tests optimization passes.
This patch moves the following functions from main.cpp (the main cpp
file for the standalone executable that is used to create the built-in
functions) to standalone_scaffolding.cpp, so that they can be re-used
in other standalone executables:
- initialize_context()*
- _mesa_new_shader()
- _mesa_reference_shader()
*initialize_context contained some code that was specific to main.cpp,
so it was split into two functions: initialize_context() (which
remains in main.cpp), and initialize_context_from_defaults() (which is
in standalone_scaffolding.cpp).
Several Mesa headers redundantly define the INLINE macro. Adding this
guard prevents the compiler from complaining about macro redefinition.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad@chad-versace.us>
This is an alternative to the draw module's polygon stipple stage.
The softpipe implementation here is just a test. The advantange of
using the new polygon stipple utility module (with other drivers)
is we can avoid software vertex processing in the draw module and
get much better performance.
Polygon stipple doesn't require special vertex processing like
the other draw module stage.
u_vbuf_upload_buffers modifies the buffer offsets. If they are not
restored, and any of the vertex formats is not supported natively, the
next u_vbuf_mgr_draw_begin call will translate the vertex buffers with
incorrect buffer offsets.
ES 2.0.25 page 127 says:
If the value of FRAMEBUFFER_ATTACHMENT_OBJECT_TYPE is NONE, then
querying any other pname will generate INVALID_ENUM.
See also:
b9e9df78a0
NOTE: This is a candidate for the 7.10 and 7.11 branches.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The GLSL 1.20 and later specs say:
"Recursion is not allowed, not even statically. Static recursion is
present if the static function call graph of the program contains
cycles."
Recursion is detected and rejected both a compile-time and at
link-time. The complie-time check happens to detect some cases that
may be removed by various optimization passes. The spec doesn't seem
to allow this, but other vendors (e.g., NVIDIA) appear to only check
at link-time after all optimizations.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=33885
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The behavior of flushes in the hardware is a maze of twisty passages,
and strangely the VS constants appear to be loaded during a pipeline
flush instead of at the time of the packet emit according to the
simulator. On moving the STATE_BASE_ADDRESS packet to where it really
needed to live (in order for data loads by other packets to be
correct), we sometimes no longer got a flush between those packets
where we apparently needed it. This replicates the flushes implied by
a STATE_BASE_ADDRESS update, fixing the GPU hangs in OGLC and the
"engine" demo.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=36821
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=39257
Tested-by: Keith Packard <keithp@keithp.com> (bzflag and etracer fixed)
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
There's scary stuff going on in PIPE_CONTROL internals, and if the
BSpec says to do this to make PIPE_CONTROL work, I'll go ahead and do
it because we'll probably never be able to debug it after the fact.
v2: Use stall at scoreboard instead of depth stall, as noted by Ken.
For this and occlusion queries, we're trying to avoid setting
I915_GEM_DOMAIN_RENDER for the write domain, because the data written
is definitely not going through the render cache, but we do need to
tell the kernel that the object has been written. However, with using
I915_GEM_DOMAIN_GTT, the kernel on retiring the batchbuffer sees that
the w/a BO has a write domain of GTT, and puts it on the flushing
list. If something tries to wait for that BO to finish rendering
(such as the AUB dumper reading the contents of BOs), we get into
wait_request (since obj->active) but with a 0 seqno (since the object
is on the flushing list, not actually on a ringbuffer), and BUG_ONs.
To avoid the kernel bug (which I'm hoping to delete soon anyway), just
use I915_GEM_DOMAIN_INSTRUCTION like occlusion queries do. This
doesn't result in more flushing, because we invalidate INSTRUCTION on
every batchbuffer now that we're state streaming, anyway.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Kenneth Graunke <kenneth@whitecape.org>
This cuts out a large portion of the overhead of glClear() from
resetting the texenv state and recomputing the fixed function
programs. It also means less use of fixed function internally in our
GLES2 drivers, which is rather bogus.
Reviewed-by: Brian Paul <brianp@vmware.com>
When parsing S-Expressions, we need to store nul-terminated strings for
Symbol nodes. Prior to this patch, we called ralloc_strndup each time
we constructed a new s_symbol. It turns out that this is obscenely
expensive.
Instead, copy the whole buffer before parsing and overwrite it to
contain \0 bytes at the appropriate locations. Since atoms are
separated by whitespace, (), or ;, we can safely overwrite the character
after a Symbol. While much of the buffer may be unused, copying the
whole buffer is simple and guaranteed to provide enough space.
Prior to this, running piglit-run.py -t glsl tests/quick.tests with GLSL
1.30 enabled took just over 10 minutes on my machine. Now it takes 5.
NOTE: This is a candidate for stable release branches (because it will
make running comparison tests so much less irritating.)
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Commit 1a339b6c71 made
st_ChooseTextureFormat map GL_RGBA with type GL_UNSIGNED_BYTE
to PIPE_FORMAT_A8B8G8R8_UNORM.
The image format for ARGB pixmaps is PIPE_FORMAT_B8G8R8A8_UNORM
however. This mismatch caused the texture to be recreated in
st_finalize_texture.
NOTE: This is a candidate for the 7.11 branch.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=39209
Signed-off-by: Fredrik Höglund <fredrik@kde.org>
Reviewed-by: Stéphane Marchesin <marcheu@chromium.org>
Signed-off-by: Brian Paul <brianp@vmware.com>
This fixes a regression introduced by commit
a26121f375 (fd.o bug #39219).
Since the __glXInitialize() call should be unnecessary anyway, this is
probably a nicer fix for the original problem too.
NOTE: This is a candidate for the 7.10 and 7.11 branches.
Signed-off-by: Henri Verbeet <hverbeet@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Tested-by: padfoot@exemail.com.au
Until now, the stencil buffer was allocated as a Y tiled buffer, because
in several locations the PRM states that it is. However, it is actually
W tiled. From the PRM, 2011 Sandy Bridge, Volume 1, Part 2, Section
4.5.2.1 W-Major Format:
W-Major Tile Format is used for separate stencil.
The GTT is incapable of W fencing, so we allocate the stencil buffer with
I915_TILING_NONE and decode the tile's layout in software.
This fix touches the following portions of code:
- In intel_allocate_renderbuffer_storage(), allocate the stencil
buffer with I915_TILING_NONE.
- In intel_verify_dri2_has_hiz(), verify that the stencil buffer is
not tiled.
- In the stencil buffer's span functions, the tile's layout must be
decoded in software.
This commit mutually depends on the xf86-video-intel commit
dri: Do not tile stencil buffer
Author: Chad Versace <chad@chad-versace.us>
Date: Mon Jul 18 00:38:00 2011 -0700
On Gen6 with separate stencil enabled, fixes the following Piglit tests:
bugs/fdo23670-drawpix_stencil
general/stencil-drawpixels
spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX16-copypixels
spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX16-drawpixels
spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX16-readpixels
spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX1-copypixels
spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX1-drawpixels
spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX1-readpixels
spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX4-copypixels
spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX4-drawpixels
spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX4-readpixels
spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX8-copypixels
spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX8-drawpixels
spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX8-readpixels
spec/EXT_packed_depth_stencil/fbo-stencil-GL_DEPTH24_STENCIL8-copypixels
spec/EXT_packed_depth_stencil/fbo-stencil-GL_DEPTH24_STENCIL8-readpixels
spec/EXT_packed_depth_stencil/readpixels-24_8
Note: This is a candidate for the 7.11 branch.
Signed-off-by: Chad Versace <chad@chad-versace.us>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
LLVM 3.0svn introduced a new type system. It defines a new way to create
named structs and removes the (now not needed) LLVMInvalidateStructLayout
function. See revision 134829 of LLVM.
Signed-off-by: Tobias Droste <tdroste@gmx.de>
Signed-off-by: Brian Paul <brianp@vmware.com>
In a rare case of building gallium only, we need to
check if the required packages are available
libdrm_[intel|nouveau] - gallium[i915 i965|nouveau]
v2: r300g and r600g do not need libdrm_radeon
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Signed-off-by: Marek Olšák <maraeo@gmail.com>
Including the full "3DSTATE_VF_STATISTICS" should make it easier to
cross-reference the code and documentation.
Also, move the 965/GM45 suffix to the beginning for consistency with
newer #defines.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
The documentation uses 3DSTATE_DRAWING_RECTANGLE, and we already had it
defined in brw_defines.h; we were simply using an old #define from
intel_reg.h.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
This is useful for shadow map generation. Tested with glsl-bug-22603,
which rendered the depth textures with fallbacks before.
Acked-by: Chad Versace <chad@chad-versace.us>
We were updating our new viewport using the old buffers' _WindowMap.m.
We can do less math and avoid using that deprecated matrix by just
folding the viewport calculation right in to the driver.
Fixes piglit fbo-depthtex.
i915_update_draw_buffers() already handles the fallback bit for
missing stencil region, so here we just need to handle whether the GL
thinks we have stencil data or not (and disable the test if so).
We were disabling it once at the moment we changed draw buffers, but
later enabling of depth test could turn it back on. Fixes
fbo-nodepth-test.
Note that ctx->DrawBuffer has to be checked because during context
create we get called while it's still unset. However, we know we'll
get an intel_draw_buffer() after that, so it's safe to make a silly
choice at this point.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=30080
The illusion of shared code here wasn't fooling anybody. It was
tempting to keep i830 and i915 still shared, but I think I actually
want to make them diverge shortly.
Reviewed-by: Chad Versace <chad@chad-versace.us>
This brings us into compliance with page 17 (page 22 of the PDF) of
the GLSL 1.20 spec:
"[Sampler types] can only be declared as function parameters or
uniform variables (see Section 4.3.5 "Uniform"). ... [Samplers]
cannot be used as out or inout function parameters."
The spec isn't explicit about whether this rule applies to
structs/arrays containing shaders, but the intent seems to be to
ensure that it can always be determined at compile time which sampler
is being used in each texture lookup. So to avoid creating a
loophole, the rule needs to apply to structs/arrays containing shaders
as well.
Fixes piglit tests spec/glsl-1.10/compiler/samplers/*.frag, and fixes
bug 38987.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=38987
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The new location, as a member function of glsl_type, is more
consistent with queries like is_sampler(), is_boolean(), is_float(),
etc. Placing the function inside glsl_type also makes it available to
any code that uses glsl_types.
These happen to work because their values are the same as the equivalent
PIPE_TRANSFER_* flags, but it's still misleading.
Signed-off-by: Henri Verbeet <hverbeet@gmail.com>
The GLSL spec says:
"If a built-in function is redeclared in a shader (i.e., a
prototype is visible) before a call to it, then the linker will
only attempt to resolve that call within the set of shaders that
are linked with it."
This patch enforces this behavior. When a function call is processed
a flag is set in the ir_call to indicate whether the previously seen
prototype is the built-in or not. At link time a call will only bind
to an instance of a function that matches the "want built-in" setting
in the ir_call.
This has the odd side effect that first call to abs() in the shader
below will call the built-in and the second will not:
float foo(float x) { return abs(x); }
float abs(float x) { return -x; }
float bar(float x) { return abs(x); }
This seems insane, but it matches what the spec says.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=31744
The following resolves the build issues and missing symbols
Add "xvmc-nouveau/target.c" - missing symbol "driver_description"
Add "drivers/nvc0/libnvc0.a" - missing symbol "nvc0_screen_create"
Remove "drivers/softpipe/libsoftpipe.a" - unnessecary dependency
resolves build (when building without swrast)
Add "drivers/trace/libtrace.a" in Makefile
Note: With/without those patches xvmc-nouveau still segfaults
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Use all zpass data for predication instead of the last block only.
Use query buffer as a ring instead of reusing the same area
for each new BeginQuery. All query buffer offsets are in bytes
to simplify offsets math.
commit 1856230d9fa61710cce3e152b8d88b1269611a73
Author: José Fonseca <jose.r.fonseca@gmail.com>
Date: Tue Jul 12 23:41:27 2011 +0100
make: Use better var names on packaging.
commit d1ae72d0bd14e820ecfe9f8f27b316f9566ceb0c
Author: José Fonseca <jose.r.fonseca@gmail.com>
Date: Tue Jul 12 23:38:21 2011 +0100
make: Apply several of Dan Nicholson's suggestions.
commit f27cf8743ac9cbf4c0ad66aff0cd3f97efde97e4
Author: José Fonseca <jose.r.fonseca@gmail.com>
Date: Sat Jul 9 14:18:20 2011 +0100
make: Put back the tar.bz2 creation rule.
Removed by accident.
commit 34983337f9d7db984e9f0117808274106d262110
Author: José Fonseca <jose.r.fonseca@gmail.com>
Date: Sat Jul 9 11:59:29 2011 +0100
make: Determine tarballs contents via git ls-files.
The wildcards were a mess:
- lots of files for non Linux platforms missing
- several files listed and archived twice
Using git-ls-files ensures things are not loss when making the tarballs.
commit 34a28ccbf459ed5710aafba5e7149e8291cb808c
Author: José Fonseca <jose.r.fonseca@gmail.com>
Date: Sat Jul 9 11:07:14 2011 +0100
glut: Remove GLUT source.
Most distros ship freeglut, and most people don't care one vs the other,
and it hasn't been really maintained.
So it is better to have Mesa GLUT be revisioned and built separately
from Mesa.
commit 5c26a2c3c0c7e95ef853e19d12d75c4f80137e7d
Author: José Fonseca <jose.r.fonseca@gmail.com>
Date: Sat Jul 9 10:31:02 2011 +0100
Ignore the tarballs.
commit 26edecac589819f0d0efe2165ab748dbc4e53394
Author: José Fonseca <jose.r.fonseca@gmail.com>
Date: Sat Jul 9 10:30:24 2011 +0100
make: Create the Mesa-xxx-devel symlink automatically.
Also actually remote the intermediate uncompressed tarballs.
this moves getting the context into the debug in this function,
just spotted it trawling callgrind traces for other things.
Signed-off-by: Dave Airlie <airlied@redhat.com>
The forward references to video enum types in p_context.h causes
a massive number of compiler warnings (ISO C forbids forward references
to ‘enum’ types).
By putting the new video enums in a separate header that can be included
by p_context.h and p_screen.h we can avoid this.
Acked-by Christian König <deathsimple@vodafone.de>
inline the hotpath of the reference remaining the same. This shouldn't
penalise the slow path at all but improve the hot path so we don't have
to jump to the function.
It also moves some assert checks under an #ifndef NDEBUG.
Minor clean-ups added by Brian.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Brian Paul <brianp@vmware.com>
Because we don't support them.
For instance, R32G32B32 is not R32G32B32X32 as was assumed.
Add support for R8G8B8X8_UNORM instead of R8G8B8_UNORM surfaces.
I think the past are those times when the gallium interface was changed all
the time. Now it is not, so there is no reason to always compile the libs
if they are not needed.
There seems to be a bug in r600g when uploading more than one layer of a
3D resource at once with a hardware blit.
So just do them one at a time to workaround this.
Bitmap caching shouldn't affect the results of the queries and
conditional render.
NOTE: This is a candidate for the 7.11 branch.
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
Signed-off-by: Brian Paul <brianp@vmware.com>
We were failing at rounding, misplacing the non-baselevels. Fixes:
3DFX_texture_compression_FXT1/fbo-generate-mipmaps
ARB_texture_compression/fbo-generate-mipmaps
EXT_texture_compression_s3tc/fbo-generate-mipmaps
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The first rendering after context create didn't know of the color
buffer yet, triggering a sw fallback. The intel_prepare_render() from
intelSpanRenderStart then found the buffer and turned off fallbacks,
but intelSpanRenderFinish was never called and things were left
mapped. By checking buffers before making the call on whether to do
the fallback pipeline or not, we avoid the fallback change inside of
the rendering pipeline.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=31561
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
In some cases _mesa_create_context() can return NULL an in the mesa
state tracker, we do not concider the case, which may cause issues
within st_create_context_priv()
This patch adds a simple check (similar to the one in the dri drivers)
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Signed-off-by: Brian Paul <brianp@vmware.com>
As Chia-I Wu said 'There are two libGL providers, Xlib and DRI based
they cannot coexist'
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Signed-off-by: Marek Olšák <maraeo@gmail.com>
This version is mostly Dan's post to the mesa-dev mailing list on
6/22/2011.
NOTE: This is a candidate for the 7.10 and 7.11 branches.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Dan Nicholson <dbn.lists@gmail.com>
According to the GLSL 1.20 specification, "it is a semantic error if
there are multiple ways to apply [implicit] conversions [...] such that
the call can be made to match multiple signatures."
Fixes a regression caused by 60eb63a855,
which implemented the wrong policy of finding a "closest" match.
However, this is not a revert, since the original code failed to
continue looking for an exact match once it found two inexact matches.
It's OK to have multiple inexact matches if there's also an exact match.
NOTE: This is a candidate for the 7.10 and 7.11 branches.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=38971
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
This is exactly analogous to Eric's Gen6 change in commit
6861a70177. His explanation:
"This is just like PointSprite overrides, but it's always on for that
attribute."
Fixes glsl-fs-pointcoord and gtf/point_sprites.
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
NOTE: This is a candidate for the 7.11 branch.
This is exactly analogous to Eric's Gen6 change in commit
f304bb8a5d. His explanation:
"We were assuming that the input attribute n to the FS was
FRAG_ATTRIB_TEXn, which happened to be true often enough for our
testcases."
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
NOTE: This is a candidate for the 7.11 branch.
This is exactly analogous to Eric's Gen6 change in commit
e7280b16d6.
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
NOTE: This is a candidate for the 7.11 branch.
This is just barely more pretty-printing than we previously had, but
at least it doesn't leave out unit states in the log.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This is quite a bit of spam, but I think it's useful to have in a full
INTEL_DEBUG=batch dump. And a lot of this spam on glxgears is just
because we're awful at handling our constants :/
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The previous brw_state_dump output was rather useless -- last used
program per batch, and just the hex. Now we dump all programs (since
we don't know which were used), and disassemble them. But that's a
ton of spam, and usually when looking into program contents we use
INTEL_DEBUG={vs,wm,misc,other} and when looking into state updates we
use INTEL_DEBUG=batch, so this dump usually just massively clutters up
the output.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Now that we're using state base addresses for most things, we're less
interested in the absolute address of the state, and more in its
offset from the state base address (start of batchbuffer). Also,
reorder the printout so it looks more like the batchbuffer dump.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
I want to make brw_state_dump.c handle more than just the last
statechange, so I want to keep track of what's in the batch state. By
using AUB file numbering for most of these packets, this may be
reusable for aub dumping.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This will let me hang cached compiler structs off of the context
without having to worry about cleaning them up at destroy time.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
There's no pretty way to avoid the overwriting of the src operands, so
just use a temporary destination and rely on the MOV optimization.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
We were stomping over the source for the body of the LIT instruction
when doing the MOV of 1.0 to the uninteresting channels.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
From ARB_framebuffer_object:
If a buffer is specified in <mask> and does not exist in both the
read and draw framebuffers, the corresponding bit is silently
ignored.
Using GL_NONE as DataType of Z32_FLOAT_X24S8, not sure what I should put there.
The spec says the type is n/a.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The existing code was missing GL_DEPTH_COMPONENT32, resulting in it
wrongly returning the color buffer instead of the depth buffer.
Fixes an issue in PlaneShift 0.5.7 when casting spells. The game calls
CopyTexSubImage2D on buffers with a GL_DEPTH_COMPONENT32 internal
format, which (prior to this patch) resulted in an attempt to copy
ARGB8888 to X8_Z24.
Instead of adding the missing enumeration directly, convert the code to
use _mesa_is_depth_format() and _mesa_is_depthstencil_format() as these
should catch any newly added depth formats in the future.
NOTE: This is a candidate for the 7.10 and 7.11 branches.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This was tricky. We were doing a use-before-initialize of
grf_reg_count, but the value usually got overwritten anyway -- when we
didn't have to do a relocation (typical), or on gen5 when we didn't
have relocations at all.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=38771
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Commit b46dc45cee claimed that
NEW_POLYGONSTIPPLE is gratuitous, but somehow just changed comments
and whitespace instead of actually removing the flag.
While we're at it, 3DSTATE_PS doesn't appear to need NEW_LINE or
NEW_POLYGON either (those are in 3DSTATE_WM). Also, 3DSTATE_WM
doesn't appear to need BRW_NEW_NR_WM_SURFACES or BRW_NEW_CURBE_OFFSETS
either (those are in 3DSTATE_PS).
NOTE: This is a candidate for the 7.11 branch.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
When switching channels with xine it sometimes happens that xine
destroys the drawable before we get a chance to call
DRI2DestroyDrawable, resulting in an x error.
SUB & LRP instructions should toggle NEG bit instead of setting it,
otherwise e.g. "SUB a,b,-1" is translated as "ADD a,b,-1"
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
For 0^0 case result of "LOG_CLAMPED ...,0" is -MAX_FLOAT, and then result of
"MUL_LIT ...,0,-MAX_FLOAT,..." is -MAX_FLOAT instead of 0 because of special
src1 checks for -MAX_FLOAT. So swap src0/1:
"MUL_LIT ...,-MAX_FLOAT,0,..." to get expected 0, then result of
"EXP_IEEE ...,0" is 1 as expected for LIT.
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
Create a new GLX drawable struct to track client related info, and add a
wrap counter to it drawable and track it as we receive events. This
allows us to support the full 64 bits of the event structure we pass to
the client even though the server only gives us a 32 bit count.
Reviewed-by: Michel Dänzer <michel@daenzer.net>
Reviewed-by: Jeremy Huddleston <jeremyhu@apple.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Normally lower_jumps.cpp doesn't need to lower a break instruction
that occurs at the end of a loop, because all back-ends can produce
proper GPU instructions for a break instruction in this "canonical"
location. However, if other break instructions within the loop are
already being lowered, then a break instruction at the end of the loop
needs to be lowered too, since after the optimization is complete a
new conditional break will be inserted at the end of the loop.
Without this patch, lower_jumps.cpp may require multiple passes in
order to lower all jumps. This results in sub-optimal output because
lower_jumps.cpp produces a brand new set of temporary variables each
time it is run, and the redundant temporary variables are not
guaranteed to be eliminated by later optimization passes.
Fixes unit test test_lower_breaks_6.
Previously, lower_jumps.cpp would break out of its loop after lowering
a jump instruction in just the then- or else-branch of a conditional,
and it would fail to lower a jump instruction occurring in the other
branch.
Without this patch, lower_jumps.cpp may require multiple passes in
order to lower all jumps. This results in sub-optimal output because
lower_jumps.cpp produces a brand new set of temporary variables each
time it is run, and the redundant temporary variables are not
guaranteed to be eliminated by later optimization passes.
Fixes unit test test_lower_returns_4.
The visitor class in lower_jumps.cpp never removes or replaces the
instruction being visited, but it frequently alters or removes the
instructions that follow it. Therefore, to make sure the altered IR
is visited, it needs to iterate through exec_lists using foreach_list
rather than visit_exec_list().
Without this patch, lower_jumps.cpp may require multiple passes in
order to lower all jumps. This results in sub-optimal output because
lower_jumps.cpp produces a brand new set of temporary variables each
time it is run, and the redundant temporary variables are not
guaranteed to be eliminated by later optimization passes.
Also, certain invariants assumed by lower_jumps.cpp may fail to hold,
causing assertion failures.
Fixes unit tests test_lower_pulled_out_jump,
test_lower_unified_returns, test_lower_guarded_conditional_break,
test_lower_return_non_void_at_end_of_loop, and test_lower_returns_3.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Previously, lower_jumps.cpp would only lower return and continue
statements that appeared inside conditionals. This patch makes it
lower unconditional returns and continue statements that occur inside
a loop.
Such unconditional flow control statements would be unlikely to be
explicitly coded by a reasonable user, however they might arise as a
result of other optimizations.
Without this patch, lower_jumps.cpp might not lower certain return and
continue statements, causing some backends to fail.
Fixes unit tests test_lower_return_void_at_end_of_loop and
test_remove_continue_at_end_of_loop.
Previously, lower_jumps.cpp only lowered return statements that
appeared inside of an if statement.
Without this patch, lower_jumps.cpp might not lower certain return
statements, causing some back-ends to fail (as in bug #36669).
Fixes unit test test_lower_returns_1.
Previously, do_lower_jumps.cpp determined whether to lower return
statements in ir_lower_jumps_visitor::should_lower_jumps(). Moved
this logic to ir_lower_jumps_visitor::visit(ir_function_signature *),
so that it can be used in determining whether to lower a return
statement at the end of a function.
Previously ir_reader was only able to handle return of non-void.
This patch is necessary in order to allow optimization passes to be
tested in isolation.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
LLVM 3.0svn changes pretty rapidly. The change in
Target->createMCInstPrinter() signature which inspired commits
40ae214067 and
92e29dc5b0 has been reverted.
Signed-off-by: Gustaw Smolarczyk <wielkiegie@gmail.com>
Signed-off-by: Brian Paul <brianp@vmware.com>
Otherwise PIPE_FORMAT_X8B8G8R8_UNORM and friends would fail.
NOTE: This is a candidate for the 7.10 and 7.11 branches.
Reviewed-by: Brian Paul <brianp@vmware.com>
When the state tracker adds a front buffer, nothing triggers a validate
drawable call, since the state tracker manager is never notified.
Force a validate drawable call by invalidating the framebuffer's stamp, so
that the window system's renderbuffer (if any) is picked up.
This fixes bug 38988
https://bugs.freedesktop.org/show_bug.cgi?id=38988
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Fixes segfault when running cubemap demo on i945. This happened
when intel_region_reference() was called in i915_set_draw_region()
with depth_region=NULL.
Reviewed-by: Eric Anholt <eric@anholt.net>
Even if we don't have a current context, if we're freeing the rb we
should free its region (and BO). The renderbuffer unreference checks
appear to be just cargo-cult from the region unreference code.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=30217
Reviewed-by: Chad Versace <chad@chad-versace.us>
As a result of this cleanup, a bug in
intel_process_dri2_buffer_no_separate_stencil() became quite apparent.
We were associating the NULL pointer after an unreference with the
STENCIL attachment -- clarify the logic and attach the right region.
Reviewed-by: Chad Versace <chad@chad-versace.us>
This should help us avoid leaking regions in region reference code by
making the API more predictable.
Reviewed-by: Chad Versace <chad@chad-versace.us>
This prevents developer surprise at seeing a GL_DEPTH_COMPONENT
texture have stencil bits, and avoids the metaops path accidentally
copying stencil bits around in glCopyTexImage(GL_DEPTH_COMPONENT) (and
being broken because swrast's glReadPixels(GL_UNSIGNED_INT_24_8) is
broken).
Acked-by: Chad Versace <chad@chad-versace.us>
We simply emit these using OUT_BATCH and bitshifting, as it results in
better compiled code than packed structures. Since our documentation
is public, it's not terribly useful to keep these around for reference.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Also rename it from CMD_STATE_INSN_POINTER to CMD_STATE_SIP to match the
documentation.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This is a little different from most because it's a single DWord;
there's no length field.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
I'm not sure about this one. The current code actually follows the spec, but
considering the spec is supposed to be written against GL 3.2 I'd say the spec
is broken. I filled out a spec feedback form over a month ago, but either the
form is broken, or nobody cares.
Signed-off-by: Henri Verbeet <hverbeet@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This is probably nicer if the array size ever changes.
NOTE: This is a candidate for the 7.11 branch.
Signed-off-by: Henri Verbeet <hverbeet@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The total number of units used by a shader is limited to MAX_TEXTURE_UNITS,
but the actual indices are only limited by MAX_COMBINED_TEXTURE_IMAGE_UNITS,
since they're shared between vertex and fragment shaders.
NOTE: This is a candidate for the 7.11 branch.
Signed-off-by: Henri Verbeet <hverbeet@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
It's not supposed to do conversion, but st sometimes asks us to.
Sometimes conversion is even wrong (e.g. between UNORM and SRGB).
This should now include all formats the 2D engine supports.
Fixes an assertion failure in the piglib out-01.frag
ARB_explicit_attrib_location test. The locations set via the layout
qualifier in fragment shader were not being applied to the shader
outputs. As a result all of these variables still had a location of
-1 set.
This may need some more work for pre-3.0 contexts. The problem is
dealing with generic outputs that lack a layout qualifier. There is
no way for the application to specify a location
(glBindFragDataLocation is not supported) or query the location
assigned by the linker (glGetFragDataLocation is not supported).
NOTE: This is a candidate for the 7.10 and 7.11 branches.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=38624
Reviewed-by: Eric Anholt <eric@anholt.net>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: Vinson Lee <vlee@vmware.com>
And don't delete them. Let ralloc clean them up. Deleting the
temporary IR leaves dangling references in the prog_instruction. That
results in a bad dereference when printing the IR with MESA_GLSL=dump.
NOTE: This is a candidate for the 7.10 and 7.11 branches.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=38584
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Using GLuint pointers worked when the pixel size was four bytes
or the row stride was a multiple of four but was otherwise broken.
Fixes failures found with the piglit fbo-stencil test.
This helps to fix https://bugs.freedesktop.org/show_bug.cgi?id=38729
NOTE: This is a candidate for the 7.11 branch.
The existing error result doesn't appear in the GL 2.1 or 3.2
compatibility specs, and triggers an unexpected GL error in Intel's
oglconform when it tries to reset the feedback state after usage so
that the "diff the state at error time vs. context init time" code
doesn't generate spurious diffs. The unexpected GL error then
translates into testcase failure. Brian wants the safety check on
buffer = NULL, though, so that people can't as easily set up a broken
buffer.
Fixes a bug caught by oglconform, and now piglit
ARB_vertex_program/getenv4d-with-error. The wrapping of an existing
GL function made it so that we couldn't distinguish an error in
looking up our arguments from an existing error. Instead, make a
helper function to choose the param, and use it from multiple callers.
v2: Move the success case line into the conditional, use COPY_4V more.
Commit 6750226e6d bumped the base MRF to
m2 instead of m0, but failed to adjust inst->mlen, which was being set
to the highest MRF. Subtracting the base MRF solves the issue.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
This fixes a regression introduced with commit
"st-api: Rework how drawables are invalidated v3"
where the glx state tracker manager would invalidate a drawable each time it
checks the drawable dimensions, even during a validate call, which
resulted in an endless loop, since the state tracker would immediately
detect the new invalidation and rerun the validate...
This change marks the drawable invalid only if the drawable dimensions actually
changed during the validate, which will result in at most a single
unnecessary validate by the context running a validate during which the
dimensions changed.
To avoid unnecessary validates altogether, we need to implement yet another
st-api change: Returning the current time stamp from the validate function,
as suggested by Chia-I Wu. The glx state tracker manager could then return
the stamp resulting from the last drawable dimension check.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
It makes things too random, as settings for temporary trials get stored
permannently, and it make difficult to build several platforms from the
same tree.
So disable it, again.
If a user-buffer was referenced twice by a draw command, the affected ranges
were uploaded separately, with only the last one being referenced by the
hardware. Make sure we upload only a single range.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
We currently always treat contents of user-buffers as volatile so
we don't need to take any particular action when the state tracker
announces that the contents has changed.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Viewperf uses some unusual vertex arrays where the stride is less
than the element size. In this case, the stride was 4 while the
element size was 12. The difference of 8 bytes causes us to miss
uploading the tail bit of the array data.
Typically the stride is >= the element size so there was no problem
with other apps.
Stream user buffer contents rather than trying to maintain persistent
host / hardware copies.
Resulting negative array offsets are not allowed by the hardware,
(well, at least not according to header files), so adjust index bias
to make all array offsets positive.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Make sure that the upload manager doesn't upload data that's not
dirty. This speeds up the viewperf test proe-04/1 a factor 5 or so on svga.
Also introduce an u_upload_unmap() function that can be used
instead of u_upload_flush() so that we can pack
even more data in upload buffers. With this we can basically reuse the
upload buffer across flushes.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
libdrm is used in multiple places. Always check for it and set
have_libdrm. Each user can then check the variable.
This is useful when only EGL and DRI drivers are needed.
The idea is that DRI driver, libGL and libOSMesa are libraries that can
be independently enabled, yet --with-driver does not allow us to easily
do that, if not impossible. This also matches what
--enable-{egl,xorg,d3d1x} do for the respective libraries.
There are two libGL providers: Xlib-based and DRI-based. They cannot
coexist. To be able to choose between them, --enable-xlib-glx is also
added.
With this commit, --with-driver=dri can be replaced by
$ ./configure --enable-dri --enable-glx --disable-osmesa
--with-driver=xlib can be replaced by
$ ./configure --disable-dri --enable-glx --enable-osmesa \
--enable-xlib-glx
and --with-driver=osmesa can be replaced by
$ ./configure --disable-dri --disable-glx --enable-osmesa
Some combinations that cannot be supported with --with-driver will
produce errors at the moment. But in the future, we would like to
support, for example,
$ ./configure --enable-dri --disable-glx --enable-egl
(build libEGL and DRI drivers, but not libGL)
Note that this commit still keeps --with-driver for transitional
purpose.
- Copy i915c's support for phases, that should allow us to run a coupe more shaders.
- Fix the error messages.
- Still try to proceed when we get a shader that's too long.
MOD_TO_FRACT was designed to lower the GLSL 1.20 mod() function, which
operates on floating point values. However, we also use ir_binop_mod
for GLSL 1.30's % operator, which operates on integers.
For now, make MOD_TO_FRACT only apply to floating-point mod operations.
In the future, we may want to add a lowering pass for integer-based mod.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Previously, it would simply say "type error" in three different cases:
- The LHS is not an integer
- The RHS is not an integer
- The LHS and RHS have different base types (int vs. uint)
Now the error messages state the specific problem.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Previously, ir_function::matching_signature had a fatal bug: if a
function had more than one non-exact match, it would simply return NULL.
This occured, for example, when looking for max(uvec3, uvec3):
- max(vec3, vec3) -> score 1 (found first)
- max(ivec3, ivec3) -> score 1 (found second...used to return NULL here)
- max(uvec3, uvec3) -> score 0 (exact match...the right answer)
This did not occur for max(ivec3, ivec3) since the second match found
was an exact match.
The new behavior is to return a match with the lowest score. If there
is an exact match, that will be returned. Otherwise, a match with the
least number of implicit conversions is chosen.
Fixes piglit tests max-uvec3.vert and glsl-inexact-overloads.shader_test.
NOTE: This is a candidate for the 7.10 and 7.11 branches.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
No MOV is necessary since signed/unsigned integers share the same
bit-representation; it's simply a question of interpretation. In
particular, the fs_reg::imm union shouldn't need updating.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Mesa IR actually stores all numbers as floating point, so this is
totally a farce, but we may as well keep it going.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reverts commit f41e1db327
"fix conversions from uint to bool and from float/bool to uint"
f2i, b2i, and b2i should not accept uint types. Use i2u and u2i.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
These are necessary to handle int/uint constructor conversions. For
example, the following code currently results in a type mismatch:
int x = 7;
uint y = uint(x);
In particular, uint(x) still has type int.
This commit simply adds the new operations; it does not generate them,
nor does it add backend support for them.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
We almost never want to specify a condition, and when we do we're
already thinking about it (because we're writing a lowering pass
generating the condition), so a default argument should make the code
more pleasant to read.
NOTE: This is a candidate for the 7.11 branch (we want to be able to
cherry-pick future code).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Our copy propagation tends to be bad at handling the later array
accesses of the matrix argument we moved to a temporary. Generally we
don't need to move it to a temporary, though, so this avoids needing
more copy propagation complexity.
Reduces instruction count of some Unigine Tropics and Sanctuary
fragment shaders that do operations on uniform matrix arrays by 5.9%
on gen6.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We were constrained to using temporaries because we were assuming
variables all over. This simplifies things a bit.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This awkward typing was to avoid shadowing the function argument (the
matrix) with the temporary deref (the column) before the
get_column()/get_element()s were moved into the expression/assignment
constructors. They're about to become not-variables, so the current
names had to go. This change is almost mechanical (other than
column_expr), so it should make the next diff clearer.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
I think this makes the code more obvious by moving the declarations to
their single usage (now that we aren't using them to get at the ->type
field for expression constructors).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Move defintion of M_PI (for the benefit of <math.h> which do not define it), to
before the first use of it
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Reviewed-by: Brian Paul <brianp@vmware.com>
Commit 1a339b6c(st/mesa: prefer native texture formats when possible)
introduced two new arguments to the st_choose_format() functions.
This patch fixes the order and passes the correct internal_target
rather than GL_NONE
NOTE: This is a candidate for the 7.11 branch
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Signed-off-by: Brian Paul <brianp@vmware.com>
The api and the state tracker manager code as well as the state tracker code
assumed that only a single context could be bound to a drawable. That is not
a valid assumption, since multiple contexts can bind to the same drawable.
Fix this by making it the state tracker's responsibility to update all
contexts binding to a drawable
Note that the state trackers themselves don't use atomic stamps on
frame-buffers. Multiple context rendering to the same drawable should
be protected by the application.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Instead of using a chain of manually maintained if/else blocks to
handle "#extension" directives, we now consult a table that specifies,
for each extension, the circumstances under which it is available, and
what flags in _mesa_glsl_parse_state need to be set in order to
activate it.
This makes it easier to add new GLSL extensions in the future, and
fixes the following bugs:
- Previously, _mesa_glsl_process_extension would sometimes set the
"_enable" and "_warn" flags for an extension before checking whether
the extension was supported by the driver; as a result, specifying
"enable" behavior for an unsupported extension would sometimes cause
front-end support for that extension to be switched on in spite of
the fact that back-end support was not available, leading to strange
failures, such as those in
https://bugs.freedesktop.org/show_bug.cgi?id=38015.
- "#extension all: warn" and "#extension all: disable" had no effect.
Notes:
- All extensions are currently marked as unavailable in geometry
shaders. This should not have any adverse effects since geometry
shaders aren't supported yet. When we return to working on geometry
shader support, we'll need to update the table for those extensions
that are available in geometry shaders.
- Previous to this commit, if a shader mentioned
ARB_shader_texture_lod, extension ARB_texture_rectangle would be
automatically turned on in order to ensure that the types
sampler2DRect and sampler2DRectShadow would be defined. This was
unnecessary, because (a) ARB_shader_texture_lod works perfectly well
without those types provided that the builtin functions that
reference them are not called, and (b) ARB_texture_rectangle is
enabled by default in non-ES contexts anyway. I eliminated this
unnecessary behavior in order to make the behavior of all extensions
consistent.
NOTE: This is a candidate for the 7.10 and 7.11 branches.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
These were previously 1-bit-wide bitfields. Changing them to bools
has a negligible performance impact, and allows them to be accessed by
offset as well as by direct structure access.
NOTE: This is a candidate for the 7.10 and 7.11 branches.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
From the OpenGL docs for GL_ARB_explicit_attrib_location:
This extension provides a method to pre-assign attribute locations to
named vertex shader inputs and color numbers to named fragment shader
outputs.
This was accidentally implemented for fragment shader inputs. This
patch fixes it to apply to fragment shader outputs.
Fixes piglit tests
spec/ARB_explicit_attrib_location/1.{10,20}/compiler/layout-{01,03,06,07,08,09,10}.frag
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
NOTE: This is a candidate for the 7.10 and 7.11 branches.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=38624
The scissor state was incorrectly in a .prepare function instead of
.emit, so the packet would end up in the batch before the
STATE_BASE_ADDRESS. It appears that this doesn't actually hurt, as
the scissor address gets dereferenced according to the current SBA at
draw time.
All it's going to do is generate lots and lots and lots of
'warning: visibility attribute not supported in this configuration; ignored'
warnings
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Reviewed-by: Dan Nicholson <dbn.lists@gmail.com>
Considering fbdev as an in-kernel window system,
- opening a device opens a connection
- there is only one window: the framebuffer
- fb_var_screeninfo decides window position, size, and even color format
- there is no pixmap
Now EGL is built on top of this window system. So we should have
- the fd as the handle of the native display
- reject all but one native window: NULL
- no pixmap support
modeset support is still around, but it should be removed soon.
The system routine requires m0 be reserved for saving off architectural
state. Moved the allocation to start at 2 instead of 0.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Eric Anholt <eric@anholt.net>
Previously, if max_depth were 1, the following code would see the
first if-statement (correctly) not get flattened, but the second
if-statement would (incorrectly) get flattened:
void main()
{
if (a)
gl_Position = vec4(0);
if (b)
gl_Position = vec4(1);
}
This is because the visit_leave(ir_if*) method would not decrement the
depth before returning on the first if-statement.
NOTE: This is a candidate for the 7.10 and 7.11 branches.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Typically this was done by having a surface creation function fail if
the format was not supported.
However, in some situations when changing hardware surface formats,
it's desirable to do this check before attempting costly readback operations.
Also updated the surface_redefine interface.
Bump minor.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Blending and maybe even alpha-test don't work with those formats.
Only supporting RGBA, BGRA, RGBX, BGRX.
NOTE: This is a candidate for the 7.10 and 7.11 branches.
Remove set_event_handler() and pass the event handler with
native_get_XXX_platform(). Add init_screen() so that the pipe screen is
created later. This way we don't need to pass user_data to
create_display().
If we happened to allocate a texture result (or other vector) to the
highest hardware register slot, and we were in 16-wide, we would
under-count the registers used and potentially wrap around to g0 if
that allocation crossed a 16-register block boundary. Bad rendering
and hangs ensued.
Tested-by: Ian Romanick <idr@freedesktop.org>
When the fill mode is PIPE_POLYGON_MODE_LINE we were basically
converting the polygon into triangles, then drawing the outline of all
the triangles. But we really only want to draw the lines around the
perimeter of the polygon, not the interior lines.
NOTE: This is a candidate for the 7.10 branch.
In gen6 and above, clip distances 0-3 are written to message register
3's xyzw components, and 4-7 to message register 4's xyzw components.
Therefore when when writing the clip distances we need to examine the
lower 2 bits of the clip distance index to see which component to
write to.
emit_vertex_write() was examining the lower 3 bits, causing clip
distances 4-7 not to be written correctly.
Fixes piglit test vs-clip-vertex-01.shader_test
Evergreen+ don't support multi-writes so we need to emulate
it in the shader. Fixes the following piglit tests:
fbo-drawbuffers-fragcolor
ati_draw_buffers-arbfp-no-option
Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
In intel_draw_buffer, there exists a workaround to prevent
_mesa_update_framebuffer from creating a swrast depth wrapper when
using separate stencil. This commit fixes the workaround, which was
incomplete for s8z24 texture renderbuffers.
Fixes fbo-blit-d24s8 on gen5 with separate stencil manually enabled.
Signed-off-by: Chad Versace <chad@chad-versace.us>
Since all infrastructure is now in place to support packed
depth/stencil renderbuffers when using separate stencil, there is no
need for special cases when separate stencil is enabled.
Signed-off-by: Chad Versace <chad@chad-versace.us>
Also, in order to coerce intel_update_tex_wrapper_regions() to
allocate the hiz region, alter intel_update_tex_wrapper_regions() to
examine the renderbuffer format instead of the texture image format.
Signed-off-by: Chad Versace <chad@chad-versace.us>
... and into new function intel_update_tex_wrapper_regions.
This prevents code duplication in the next commit.
Also add a note explaining that the hiz region is broken for mipmapped
depth textures.
Signed-off-by: Chad Versace <chad@chad-versace.us>
... when using separate stencil.
Define function intel_tex_image_x8z24_create_renderbuffers and call it
in intelTexImage after the miptree has been created and filled with data.
Signed-off-by: Chad Versace <chad@chad-versace.us>
... because they will be needed by intel_tex_image_s8z24_create_renderbuffers.
Redeclared functions are:
intel_alloc_renderbuffer_storage
intel_renderbuffer_set_draw_offsets
Signed-off-by: Chad Versace <chad@chad-versace.us>
Redeclare as non-static because
intel_tex_image_s8z24_create_renderbuffers will use it.
Remove the 'wrapper' parameter, because there is no wrapper for
intel_texture_image.depth_rb and stencil_rb.
Signed-off-by: Chad Versace <chad@chad-versace.us>
Add the fields depth_rb and stencil_rb, and put hooks in place to
release the renderbuffers in intelFreeTextureImageData and
intelTexImage.
Signed-off-by: Chad Versace <chad@chad-versace.us>
Otherwise we can end up creating RGBA render targets (which are BGRA on the
hardware), and then we bind them as RGBA textures (which are RGBA on the
hardware). This generates software fallbacks every time we bind the frame as
a texture.
Commit 1a339b6c71 caused us to take
a different path through the glCopyTexSubImage() code. The
pipe_get_transfer() call neglected to pass the texture's level, face
and slice info. So we were always transferring from the 0th mipmap
level even when the source renderbuffer was a non-zero mipmap level
in a texture.
Fixes https://bugs.freedesktop.org/show_bug.cgi?id=38649
NOTE: This is a candidate for the 7.10 branch.
Once again, assuming the compiler is clever works out so poorly. The
generated code initialized the structure on the stack, then did a
lookup into it. This was a performance regression from
70c6cd39bd.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
It's common in applications just before the advent of
EXT_separate_shader_objects to have multiple linked shaders with the
same VS or FS. While we aren't detecting those at the Mesa level, we
can detect when our compiled output happens to match an existing
compiled program.
This patch was created after noting the incredible amount of compiled
program data generated by Heroes of Newerth. It reduces the program
data in use at the start menu (replayed by apitrace) from 828kb to
632kb, and reduces CACHE_NEW_WM_PROG state flagging by 3/4. It
doesn't impact our rate of hardware state changes yet, because things
depending on CACHE_NEW_WM_PROG also depend on BRW_NEW_FRAGMENT_PROGRAM
which is still being flagged.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This patch add the support for 24bpp in the dri/swrast implementation.
Signed-off-by: Marc Pignat <marc@pignat.org>
Signed-off-by: Brian Paul <brianp@vmware.com>
Use a typed struct to describe the native buffer and let the backends
map the native buffer to winsys_handle for
resource_from_handle/resource_to_handle.
When shared glapi is not enabled, there are two glapi providers and we
cannot decide which one to link to at build time. It results in
unresolved symbols in st/mesa. This commit makes st/mesa a loadable
module when shared glapi is not enabled, and hopes that the apps will
link to one of the glapi providers (GL or GLES).
Build pipe drivers here instead of using those built by the
soon-to-be-removed targets/egl.
[with an update by Benjamin Franzke to use --{start|end}-group]
Should unify this too, but will delay that until the planned
libdrm_nouveau/winsys changes which are likely to cause major
changes to this bo validation code too.
In fixed function, stride == 0 (e.g. glColor4f() outside of the draw
call) would get turned into uniform inputs, which is why it was
ignored originally in this test. For shaders, drivers end up seeing a
need to upload stride == 0 data, and get confused by needing to upload
when vbo_all_varyings_in_vbos() returned true. In the 965 driver
case, it wouldn't bother to compute the min/max index, and uploaded
nothing if the min/max wasn't known.
We've talked about removing the ff stride=0-into-uniforms code, so
this check shouldn't be missed once that's gone.
Fixes ARB_vertex_buffer_object/mixed-immediate-and-vbo
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=37934
Reviewed-by: Brian Paul <brianp@vmware.com>
We would still want to consider that data as being in a VBO even if we
managed to produce this case, which as far as I know we can't.
Reviewed-by: Brian Paul <brianp@vmware.com>
All the packets chosen before came from grepping the pdf for
nonpipelined, and these two came from grepping for non.pipelined. We
could stand a review by looking at all packets emitted and identifying
what kind they are.
Previously, the builtins in OES_texture_3D.{frag,vert} were only
compiling properly as a consequence of bug 38015, which allows
unsupported extensions to be enabled. This fix eliminates the builtin
compiler's reliance on bug 38015, so that bug 38015 can be fixed.
Fixes broken glTexImage2D with format=GL_RGBA since
1a339b6c71
The origin for this behaviour is that r600_is_format_supported
checks only against r600_state_inline.h tables not evergreens.
If possible, we want to match the hardware format to what the app uses. By
doing so, we avoid the need for pixel conversions and therefore greatly speed
up texture uploads.
evergreen+ stores depth and stencil separately so when we
allocate a depth/stencil fbo, make sure we allocate enough
memory for both depth and stencil buffers.
Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
Now all infrastructure is in place to support s8_z24 non-texture
renderbuffers for gen7.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad@chad-versace.us>
Hiz buffer allocation can only occur if the 'else' branch has been taken,
so move the hiz buffer allocation into the 'else' branch.
Having the hiz buffer allocation dangling outside of the if-tree was just
damn confusing.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad@chad-versace.us>
Add the following fields:
intel_renderbuffer.wrapped_depth;
intel_renderbuffer.wrapped_stencil
If the intel_context is using separate stencil and the renderbuffer has
a packed depth/stencil format, then wrapped_depth and wrapped_stencil are
the real renderbuffers.
Alter the following functions to accomodate the wrapped buffers:
intel_delete_renderbuffer
intel_draw_buffer
intel_get_renderbuffer
intel_renderbuffer_map
intel_renderbuffer_unmap
Subsequent commits allocate renderbuffer storage for wrapped_depth and
wrapped_stencil.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad@chad-versace.us>
Commit b5c847c7ca erroneously disabled
support for S8_Z24 texture format when the context required separate
stencil (intel_context.must_use_separate_stencil).
But the GL spec requires implementations to support GL_DEPTH24_STENCIL8.
So we better find a way to fake it...
From page 180 (196 of pdf) of the OpenGL 3.0 spec:
In addition, implementations are required to support the following
sized internal [texture] formats.
[...]
- Combined depth+stencil formats: DEPTH32F_STENCIL8 and and
DEPTH24_STENCIL8.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad@chad-versace.us>
this drop a bunch of unnecessary checks (i.e. should be trapped
at gallium level), and also removes the switch statement in favour
of some calculated values for the vgt values.
Signed-off-by: Dave Airlie <airlied@redhat.com>
the attached patch should be an improvement over Vadim Girlin's patch
fixing LIT instruction for r600g (commit
2fe39b46e7).
Instructions used in tgsi_lit have been reordered to always write to a
dst channel after the same channel in src has been read (so if src ==
dst, input values are not overwritten before being used).
Signed-off-by: Dave Airlie <airlied@redhat.com>
We want to bind to our context before calling __glXSetCurrentContext or
messing with the gc rect in order to properly handle error conditions.
Signed-off-by: Jeremy Huddleston <jeremyhu@apple.com>
In applegl, GLX advertises the same extensions provided by OpenGL.framework
even if such extensions are not provided by glapi. This allows a client
to get access to such API.
Signed-off-by: Jeremy Huddleston <jeremyhu@apple.com>
FramebufferTextureLayer is an alias of FramebufferTextureLayerEXT, so
FramebufferTextureLayerARB needs to be listed as an alias of
FramebufferTextureLayerEXT rather than FramebufferTextureLayer.
Signed-off-by: Jeremy Huddleston <jeremyhu@apple.com>
Previously it was up to the driver or later code generator to reject
these shaders. It turns out that nobody did this.
This will need changes to support geometry shaders.
NOTE: This is a candidate for the stable branches.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=37743
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Since switching to hidden visibility on gcc, GLw apps were failing to
link. Use the GLAPI definition to use default visibility where necessary.
$ nm lib/libGLw.so | grep DrawingArea
0000000000004020 T GLwCreateMDrawingArea
0000000000003430 T GLwDrawingAreaMakeCurrent
0000000000003410 T GLwDrawingAreaSwapBuffers
0000000000204c60 D glwDrawingAreaClassRec
0000000000204d48 D glwDrawingAreaWidgetClass
00000000002053c0 D glwMDrawingAreaClassRec
00000000002054e0 D glwMDrawingAreaWidgetClass
Signed-off-by: Dan Nicholson <dbn.lists@gmail.com>
Tested-by: justin <jlec@gentoo.org>
This was spectacularly unsafe. On my system, address 0 happens to be
the hardware status page for the render ring, and the first quadword
of that happens to contain nothing we ever look at, but I sure didn't
look forward to having to debug some day when, for example, the kernel
happened to bind the ringbuffer before binding the hwsp.
That flag was leftover from gen4, where brw_curbe.c is choosing ranges
of the CURBE space for constants to live in, and the unit state tells
where to load them from. That's not the case on gen6 -- we don't set
this flag (since constants aren't in the URB), nor do we have any
state like that to upload.
If --disable-gallium is passed, llvm-config isn't checked for, so mark
it explicitly as absent, through LLVM_CONFIG=no.
Passing --disable-gallium would result in:
| ../configure: line 9739: --version: command not found
| ../configure: line 9740: --cppflags: command not found
| ../configure: line 9741: --libs: command not found
| ../configure: line 9743: --ldflags: command not found
With this commit, one gets that instead:
| configure: error: LLVM is required to build Gallium R300 on x86 and x86_64
Signed-off-by: Cyril Brulebois <kibi@debian.org>
This removes all the --enable-gallium-$driver options and --disable-gallium.
Gallium can be disabled by --with-gallium-drivers= (without parameters).
Default is:
--with-gallium-drivers=r300,swrast
Reviewed-by: Alex Deucher <alexdeucher@gmail.com>
There is an obvious redundancy:
--with-driver=dri VS --with-state-trackers=dri
--with-driver=xlib VS --with-state-trackers=glx
--enable-openvg VS --with-state-trackers=vega
--enable-egl VS --with-state-trackers=egl
This patch adds two new options for the remaining state trackers:
--enable-xorg
--enable-d3d1x
Reviewed-by: Alex Deucher <alexdeucher@gmail.com>
Our hardware doesn't have a sample_d_c message, so we have to do a
regular sample_d and emit instructions to manually perform the
comparison.
This requires a state dependent recompile whenever the sampler's compare
mode or function change. This adds the per-sampler comparison functions
to brw_wm_prog_key, but only sets them when the sampler's compare mode
is GL_COMPARE_R_TO_TEXTURE (i.e. only for shadow sampling).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This makes it available earlier, which will soon be necessary.
(Separating code motion from actual changes.)
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This is somewhat ugly, but I couldn't think of a nicer way to handle the
interleaved coordinate/derivative parameter loading.
Ironlake and Sandybridge will still hit an assertion in visit().
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Prior to this patch, it would attempt to optimize and allocate registers
for the program even if it failed to compile. This seems wasteful.
More importantly, the "message length > 11" failure seems to choke the
instruction scheduler, making it somehow use an undefined value and
segmentation fault.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
There will be a little bit of thrashing of the program cache BO as the
cache warms up, but once the application is in steady state, this
reduces relocations on gen5 and later.
On my T420 laptop, cairogl firefox-talos-gfx performance improves 2.6%
+/- 1.3% (n=6). No statistically significant performance difference
on nexuiz (n=5).
The _ColorDrawBuffers[] wouldn't get updated despite us having updated
what it depends on (Attachments[]->Renderbuffer). Other callers of
_mesa_remove_attachment are already flagging _NEW_BUFFERS for other
reasons. The specific bug report that led to this fix (and
the fbo-finish-deleted testcase) was fixed by
23b6f9606d, though.
Reviewed-by: Brian Paul <brianp@vmware.com>
This loop is trying to see if all the buffers to be uploaded happen to
be the same increment from the start of the 3DSTATE_VERTEX_BUFFERS
currently loaded in the hardware. However, we might be at a smaller
offset than the previous set of VERTEX_BUFFERS, so we can't reuse
because that packet made the first entry be its starting offset (you
can't access outside the given bounds).
Fixes piglit ARB_vertex_buffer_object/elements-negative-offset.
Current LIT implementation uses dst components for storing temp
results, possibly overwriting still needed values (depends on the
swizzles).
This patch uses temp reg for one of such cases (found in etqw) and
fixes "LIT R.z, R.xyzz".
Tested on evergreen. Fixes some etqw-demo rendering glitches when
"Lighting" is set to "High" in the settings.
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
The current dri context unbind logic will leak drawables until the process
dies (they will then get released by the GEM code). There are two ways to fix
this: either always call driReleaseDrawables every time we unbind a context
(but that costs us round trips to the X server at getbuffers() time) or
implement proper drawable refcounting. This patch implements the latter.
Signed-off-by: Antoine Labour <piman@chromium.org>
Signed-off-by: Stéphane Marchesin <marcheu@chromium.org>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Throttle pretty hard in order to prioritize user-space interactivity over
3D application speed. May revisit this later.
Signed-off-by: Thomas <thellstrom@vmware.com>
When emitting either a hiz or stencil buffer, the 'separate stencil
enable' and 'hiz enable' bits are set in 3DSTATE_DEPTH_BUFFER. Therefore
we must emit both 3DSTATE_HIER_DEPTH_BUFFER and 3DSTATE_STENCIL_BUFFER.
Even if there is no stencil buffer, 3DSTATE_STENCIL_BUFFER must be
emitted; failure to do so causes a hang on gen5 and a stall on gen6.
This also fixes a silly, obvious segfault that occured when a hiz buffer
xor separate stencil buffer existed.
Fixes the piglit tests below on Gen5 when hiz and separate stencil are
manually enabled:
fbo-alphatest-nocolor
fbo-depth-sample-compare
fbo
hiz-depth-read-fbo-d24-s0
hiz-depth-stencil-test-fbo-d24-s0
hiz-depth-test-fbo-d24-s0
hiz-stencil-read-fbo-d0-s8
hiz-stencil-test-fbo-d0-s8
fbo-missing-attachment-clear
fbo-clear-formats
fbo-depth-*
Changes piglit test result from crash to fail:
hiz-depth-stencil-test-fbo-d0-s8
Signed-off-by: Chad Versace <chad@chad-versace.us>
[airlied: final chunk of Mike's patch from bug 37476
this uses a loop to emit the GRADIENTS and does a check to
see if we need to fetch to a temporary register. It also
increases the context src gpr to 4 which is needed here.]
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
NOTE: This is a candidate for stable release branches (and don't forget
to re-run "make builtins" after cherry-picking.)
Mike had actually done a lot of the TXD support in a patch in bug
37476 which I see now, I'll add the bits of his work that I didn't think
to add to my work.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This at least passes the piglit arb_shader_texture_lod-texgrad test,
the AMD shader analyzer seems to multiply the V component by an unspecified
constant value no idea why.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This sets the base level as the zero level, which fixes
piglit/texturing/tex-miplevel-selection*.
The r600 hardware ignores the BASE_LEVEL field in some cases, so we can't
use it.
Evergreen might need this too.
Commit 56ef62d988
"glsl: Generate readable unique names at print time."
changed ir_print_visitor to not generate @0x1234567 suffixes except
where necessary. So there's no need to manually remove them.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
This change to _glapi_create_table_from_handle causes it to fill the dispatch
table with NoOps for unimplemented functionality. This matches what is done
in indirect_init.c and also allows us to enable logging (when built with
-DDEBUG and the MESA_DEBUG or LIBGL_DEBUG environment variables are set) to
catch cases where clients are trying to use these unimplemented extentions.
Additionally, this fixes some gcc -pedantic warnings.
Signed-off-by: Jeremy Huddleston <jeremyhu@apple.com>
Check that the difference in array pointers/offsets from the 0th
array are less than the stride, for both VBOs and user-space arrays.
Previously, we were only doing this for the later.
This tightens up the interleaved array test and fixes a problem with
the llvmpipe driver where we were creating way too many vertex fetch
variants only because the pipe_vertex_element::src_offset values were
changing frequently. This change results in a 5x speed-up for one of
the viewperf tests.
Also, clean up the function to make it easier to understand.
The code was playing fast and loose with rowstrides, which meant that
if a driver chose anything different for its alignment requirements,
the generated mipmaps came out garbage. Unlike the uncompressed case,
we can't generate mipmaps directly into image->Data, so by using
TexImage2D we cut out most of the weird logic that existed to generate
in-place into ->Data. The up/downside is that the driver recovery
code for the fact that _mesa_generate_mipmaps whacked ->Data has to be
turned off for compressed now.
Fixes 6 piglit tests about compressed mipmap gen.
The path taken is wildly different based on this (do we generate from
a temporary image, or from level-1's data), and we appear to have
stride bugs in the compressed case that are tough to disentangle.
This just duplicates the code for the moment, the followon commit will
do the actual changes. Only real code change here is handling
maxLevel in one common place.
This is effectively just "round up when dividing by 4" compared to the
previous code. Fixes the broken stripe at the top of
fbo-generatemipmap-formats GL_EXT_texture_compression_rgtc.
We don't care just about the internalFormat/cpp/compressed, but about
the specific format chosen. We have no support for format
translations as part of texture validation, and furthermore it has
restrictions in the GL specification. However, we should be making
consistent decisions for this check anyway.
Generally image uploads to a the region occur at TexImage time, but
that's not the case for fallback _mesa_generate_mipmap(), and in this
path we were forgetting to align the width when dividing height. We
were just leaving out parts of the compressed block at 2x2 and 1x1
levels.
Fixes gen-compressed-teximage.
Copy-and-paste from the bgra cases. The C paths attempt to avoid
copying the 'x' channel, but it's harmless, you might as well. Good for
about 5% in glxgears (740 to 780 fps).
Signed-off-by: Adam Jackson <ajax@redhat.com>
glReadPixels() was performing RGB -> L conversion differently from the
glTexImage() style conversion appropriate for glCopyTexImage().
Fixes gles2conform copy_texture.
We were mapping the renderbuffer once, then walking over all the
buffers to map just the texture ones using the other texture mapping
function that handled the x/y offset to the image in the region. But
then we would go and overwrite *those* mappings with the original
mappings for depth/stencil, which was wrong.
Instead, just walk over the attachments once and map the attachments.
Wasn't that easy?
This is already pointing at 0 or Height - 1 and with an appropriate
pitch, so no need to recompute those values per customization of the
spans code. Cuts 3 out of 21kb of the compiled size.
Reviewed-by: Chad Versace <chad@chad-versace.us>
The "newImage" isn't particularly new -- it might be the same texture
that was attached to the same attachment point before. This function
also gets called when just rebinding back to an FBO with a texture
attachment.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad@chad-versace.us>
It was originally located in the region because the tracking of
depth/color buffers was on the regions, and getting back to the irb
would have been tricky. Now, we're keying off of the renderbuffer in
more places, which means we can move these fields where they belong.
This could fix potential rendering failure with a single texture
having multiple images attached to different renderbuffers across
shareCtx (as far as I can tell, this was the only failure we could
cause, since anything else should trigger intel_render_texture in
between, for example a BindFramebuffer).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad@chad-versace.us>
gc->vtable->destroy is always set and is used unconditionally
in other places, so don't bother checking for it first.
Signed-off-by: Jeremy Huddleston <jeremyhu@apple.com>
This stuff is really for software rendering, it's not core Mesa.
A small step toward pushing the FetchTexel() stuff down into swrast.
Reviewed-by: Eric Anholt <eric@anholt.net>
x86_64_entry_start needs to be declared static in the C code,
in order to have the correct address in entry_get_public
(seems not to be needed on x86).
The compiler needs to lookup a local not a global object.
Otherwise addresses needed for _glapi_proc_address will be computed
from some random offset (0x6400229a61058b48 in my case).
This sadly requires work in the VS to rescale them, because the
hardware doesn't support this format natively.
Fixes arb_es2_compatibility-fixed-type and gtf/fixed_data_type.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The function was named "find_unconditional_discard", but didn't
actually check that the discard statement found was unconditional.
Fixes piglit glsl-fs-discard-04.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
A trivial fix for error: format not a string literal and no format
arguments with compiling with -Werror=format-security flags.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
If either depth or stencil buffer has packed depth/stencil format, then do
not use separate stencil.
Before this commit, emit_depthbuffer() incorrectly assumed that the
texture's stencil renderbuffer wrapper was a *separate* stencil buffer,
because the depth and stencil renderbuffer wrappers are distinct for
depth/stencil textures (that is, depth_irb != stencil_irb).
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=38134
Signed-off-by: Chad Versace <chad@chad-versace.us>
CONFIG regs (byte offsets 0x8000-0xac00) are single state and the pipeline
must be flushed and hw idle when they are changed. Border color regs
are in the CONFIG range and this is why a flush is required when changing
them. CONTEXT regs (byte offset 0x28000+) are multi-state and those do
not require flushes when changing them.
Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
This is just like PointSprite overrides, but it's always on for that
attribute.
Fixes glsl-fs-pointcoord, gtf/point_sprites.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
We were assuming that the input attribute n to the FS was
FRAG_ATTRIB_TEXn, which happened to be true often enough for our
testcases.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
If the wrap R (3rd) mode is set to CLAMP or CLAMP_TO_BORDER and the texture
isn't 3D, r300 always samples the border color regardless of texture
coordinates.
I HATE THIS HARDWARE.
NOTE: This is a candidate for the 7.10 branch.
Ideally we'd have a compiler and register spilling and all that
but this is good enough for now to avoid the gpu hang in piglit,
glsl-vs-vec4-indexing-temp-dst-in-nested-loop-combined
on r600/r700 cards.
based on r600c patch
Andre Maasikas <amaasikas@gmail.com>
r600c: bump sq gpr resources if a shader needs more than default
Signed-off-by: Dave Airlie <airlied@redhat.com>
According to vol2a.07, it only applies from Cantiga to Sandybridge.
I found this in my ringbuffers while investigating various GPU hangs.
While it may not have been the cause, it seemed wise to remove it.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Thanks to Chad's hard work implementing separate stencil and HiZ
support, this is entirely straightforward.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
We need to call add_validated_bo to do proper aperture space accounting.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Since Gen7 doesn't support packed depth/stencil, the stencil buffer
can't possibly be relevant for determining the depth format.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This makes mesa more consistent with glibtool and XCode where the
generated file matches the dylib id rather using an extra symlink
Signed-off-by: Jeremy Huddleston <jeremyhu@apple.com>
When it is sensible to do so,
1) intelCreateBuffer() now attaches separate depth and stencil
buffers
to the framebuffer it creates.
2) intel_update_renderbuffers() requests for the framebuffer
a separate stencil buffer (DRI2BufferStencil).
The criteria for "sensible" is:
- The GLX config has nonzero depth and stencil bits.
- The hardware supports separate stencil.
- The X driver supports separate stencil, or its support has not yet
been determined.
If the hardware supports hiz too, then intel_update_renderbuffers()
also requests DRI2BufferHiz.
If after requesting DRI2BufferStencil we determine that X driver did not
actually support separate stencil, we clean up the mistake and never ask
for DRI2BufferStencil again.
CC: Ian Romanick <idr@freedesktop.org>
CC: Kristian Høgsberg <krh@bitplanet.net>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad@chad-versace.us>
Assert that the GLX config has an expected depth/stencil bit combination:
one of d24/s8, d16/s0, d0/s0. These are the only depth/stencil
configurations that we advertise.
Remove the check for software stencil, because given the assertions'
constraints the check always fails.
CC: Ian Romanick <idr@freedesktop.org>
CC: Kristian Høgsberg <krh@bitplanet.net>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad@chad-versace.us>
Extract the code that queries DRI2 to obtain the DRIdrawable's buffers
into intel_query_dri2_buffers_no_separate_stencil().
Extract the code that assigns the DRI buffer's DRM region to the
corresponding renderbuffer into
intel_process_dri2_buffer_no_separate_stencil().
Rationale
---------
The next commit enables intel_update_renderbuffers() to query for separate
stencil and hiz buffers. Without separating the separate-stencil and
no-separate-stencil paths, intel_update_renderbuffers() degenerates into
an impenetrable labyrinth of if-trees.
CC: Ian Romanick <idr@freedesktop.org>
CC: Kristian Høgsberg <krh@bitplanet.net>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad@chad-versace.us>
Add the fields below to intel_screen. The expression in parens is the
value to which intelInitScreen2() currently sets the field.
GLboolean hw_has_separate_stencil (true iff gen >= 7)
GLboolean hw_must_use_separate_stencil (true iff gen >= 7)
GLboolean hw_has_hiz (always false)
enum intel_dri2_has_hiz dri2_has_hiz (INTEL_DRI2_HAS_HIZ_UNKNOWN)
The analogous fields in intel_context now inherit their values from
intel_screen.
When hiz and separate stencil become completely implemented for a given
chipset, then the respective fields need to be enabled.
CC: Ian Romanick <idr@freedesktop.org>
CC: Kristian Høgsberg <krh@bitplanet.net>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad@chad-versace.us>
... which indicates if the X driver supports DRI2BufferHiz and
DRI2BufferStencil.
I'm placing this in its own commit due to the large comment block.
CC: Ian Romanick <idr@freedesktop.org>
CC: Kristian Høgsberg <krh@bitplanet.net>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad@chad-versace.us>
Since the stencil buffer is interleaved, the generic Mesa renderbuffer
accessors do not suffice. Custom span functions are necessary.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad@chad-versace.us>
When emitting 3DSTATE_DEPTH_BUFFER, also emit 3DSTATE_HIER_DEPTH_BUFFER if
there is a hiz buffer. Ditto for 3DSTATE_STENCIL_BUFFER and a separate
stencil buffer.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad@chad-versace.us>
glapidispatch.h was located in glapi and shared with mesa core. Because
the way it was shared, mesa core must include it indirectly via
main/dispatch.h.
Now that it is no longer needed by glapi and is located in core mesa,
merging it with main/dispatch.h to avoid wrong uses.
Generate different glapidispatch.h's for GL and GLES. For GLES, we want
a local remap table.
This reverts commit 5af46e8360. The
commit will break GL remap table setup when main/glapidispatch.h is
regenerated.
See piglit dlist-fdo31590.c test and
http://bugs.freedesktop.org/show_bug.cgi?id=31590
In this case we had node->prim_count=1 but node->count==0 because the
display list started with glBegin() but had no vertices. The call to
glEvalCoord1f() triggered the DO_FALLBACK() path. When replaying the
display list, the old condition basically no-op'd the call to
vbo_save_playback_vertex_list call(). That led to the invalid operation
error being raised in glEnd().
NOTE: This is a candidate for the 7.10 branch.
Previously, we were errantly drawing some interior edges of clipped
polygons and quads. Also, we were introducing extra edges where
polygons intersected the view frustum clip planes.
The main problem was that we were ignoring the edgeflags encoded in
the primitive header's 'flags' field which are set during polygon/quad
->tri decomposition. We need to observe those during clipping. Since
we can't modify the existing vert's edgeflag fields, we need to store
them in a parallel array.
Edge flags also need to be handled differently for view frustum planes
vs. user-defined clip planes. In the former case we don't want to draw
new clip edges but in the later case we do. This matches NVIDIA's
behaviour and it just looks right.
Finally, note that the LLVM draw code does not properly set vertex
edge flags. It's OK on the regular software path though.
When GLX_INDIRECT_RENDERING is defined, some symbols are used in
libglapi.a but are not defined. Define them through the help of
glapitemp.h.
Signed-off-by: Jeremy Huddleston <jeremyhu@apple.com>
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
This updates the apple dispatch table to match the current glapi.
Aliases are still not handled very well.
Signed-off-by: Jeremy Huddleston <jeremyhu@apple.com>
With this change, Apple's libGL is now using glapi rather than implementing
its own dispatch. In this implementation, two dispatch tables are created:
__ogl_framework_api always points into OpenGL.framework.
__applegl_api is the vtable that is used. It points into OpenGL.framework
or to local implementations that override / interpose this in OpenGL.framework
The initialization for __ogl_framework_api was copied from XQuartz with some
modifications and probably still needs further edits to better deal with
aliases.
This is a good step towards supporting both indirect and direct rendering
on darwin.
Signed-off-by: Jeremy Huddleston <jeremyhu@apple.com>
In starting the migration to using mapi, rename __gl_api to
__ogl_framework_api since it is a vtable for OpenGL.framework
Signed-off-by: Jeremy Huddleston <jeremyhu@apple.com>
So only with kernel version 2.7 can this work, thanks to Alex
for pointing that out. Also add a workaround for a hw bug.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Evergreen can do this as well as cayman, so we should enable it.
This fixes a gpu lockup with
glsl-vs-vec4-indexing-temp-dst-in-nested-loop-combined.shader_test
I need to add a better workaround for r600/r700.
Signed-off-by: Dave Airlie <airlied@redhat.com>
We weren't emitting the SQ setup regs at all which really is
fail.
When a state is always enabled we need to add it to the dirty list
as well.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Since resources don't generally vary in size, this splits
the emit path, it also takes into a/c that texture and vertex resources
have different number of relocs, and avoids emitting the extra
reloc for vertex resources.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Exit this loop early to avoid pointless iterations later.
Move the resource bos to the first two regs, it actually
doesn't matter which regs we use for this in resource land.
Signed-off-by: Dave Airlie <airlied@redhat.com>
The EXT_framebuffer_object spec (and later specs) say:
"If a buffer is specified in <mask> and does not exist in both
the read and draw framebuffers, the corresponding bit is silently
ignored."
Check for color, depth, and stencil that the source and destination
FBOs have the specified buffers. If the buffer is missing, remove the
bit from the blit request mask and continue.
Fixes the crash in piglit test 'fbo-missing-attachment-blit from', and
fixes 'fbo-missing-attachment-blit es2 from'.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=37739
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
NOTE: This is a candidate for the stable branches.
In an ES2 context (or if GL_ARB_ES2_compatibility) is supported, the
framebuffer can be complete with some attachments be missing. In this
case the _ColorDrawBuffers pointer will be NULL.
Fixes the crash in piglit test fbo-missing-attachment-clear.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=37739
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
NOTE: This is a candidate for the stable branches.
query->num_results already has the size in dwords of the query
buffer. There no need to multiply again. We were reading past
the end of the buffer, resulting in reading garbage.
Fixes:
https://bugs.freedesktop.org/show_bug.cgi?id=37028
agd5f: clarify the comment.
Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
According to the hw documentation, the driver needs to:
- allocate 128 bits for each possible DB
- clear the 128 bits for each possible DB
- write 1 to bits 127 and 63 for upper DBs that don't
exist on a particular asic
Previously we were only doing these steps if the
asic had less than the max possible DBs.
Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
With complex shaders there are often "holes" in the fs inputs, and we only
have 8 tex coorsd to map those to. To fix this, we remap fs inputs to [0..8].
This lets us to run many more GLSL programs.
At the end of flushing we were scanning over 450 blocks
with generally about 50 enabled. This reduces the scanning
to just the list of enabled blocks.
Signed-off-by: Dave Airlie <airlied@redhat.com>
There isn't much point taking the overhead of range/block lookups on resources
we aren't going to be getting resource registers at wierd offsets.
Signed-off-by: Dave Airlie <airlied@redhat.com>
resource setting could be a fair bit more lightweight,
this patch just separates the resource structs from the standard
reg tracking structs in the driver, later patches will improve
the winsys.
Signed-off-by: Dave Airlie <airlied@redhat.com>
we don't need to loop over all the registers unless we have
some bos in the block, also avoid setting the ctx flags,
and move the optional stuff down below this chunk.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reference implementation which produces high quality renderings.
Based on Higher Quality Elliptical Weighted Avarage Filter (EWA).
Signed-off-by: Brian Paul <brianp@vmware.com>
We implement line stipples, just not *quite* correctly. We have a
piglit testcase to use when we want to fix it, if we do. Until then,
don't lie to our test suites.
We do have hardware antialised lines. If we care, we should actually
fix them to be conformant (or as close as possible) instead of using
this knob to fool testcases using swrast.
For some interesting reading on the state of GL_*_SMOOTH across
several drivers, see:
http://homepage.mac.com/arekkusu/bugs/invariance/HWAA.html
From my reading of the GL 2.1 spec, no antialiasing is strictly
conformant for polygon smoothing. Yes, it's absurd, but then,
hardware doesn't support this so maybe it's not so absurd.
ir_print_visitor::visit(ir_constant *) was failing to index properly
into ir->type->fields.structure, so the first field name was being
reprinted for every field in the structure.
Signed-off-by: Brian Paul <brianp@vmware.com>
ast_expression::print() had an incorrect index into the subexpressions
array, so (a ? b : c) was being incorrectly rendered as (a ? b : b).
Signed-off-by: Brian Paul <brianp@vmware.com>
This makes this function not be an always miss for the branch predictor.
Noticed using cachegrind, makes a minor difference to gears numbers on r600g.
Signed-off-by: Dave Airlie <airlied@redhat.com>
These are handled separately in the winsys, so don't need the calculations
done at this point. this manifested as a crash in point-sprite,
Thanks to XoD on #radeon for pointing it out.
Signed-off-by: Dave Airlie <airlied@redhat.com>
The flush function, when asked for, should not return a NULL fence.
NULL can only be returned if fences are not implemented, and st/mesa
doesn't call any of the fence functions if it receives a NULL fence
(because some drivers don't even set the fence hooks).
ARB_sync is exposed if fence_finish is set.
Mesa now limits, by default, the max number of texture levels to 15 so we
can now support the architectural maximum for gen4-6 of 14.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
This modifies the VGT state and move the SPI setup to its own discrete state.
It then just sets the SPI state up and the VGT state up once and modifies
them thereafter.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This splits the initialisation and the setting of values in the resource
buffers. We only should end up initialising once and updateing with new values
when needed.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This moves the overhead of working out the range/block to state build time,
it also allows the compiler to use constants for a lot of things instead
of working them out each time.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This moves the functions down the file, and also adds a ctx parameter.
This is precursor patch just moving stuff around and getting it ready.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This drop the r600_draw_vbo CPU usage on a run of nexuiz from 1.40% to 0.72%
in sysprof for me on my Fusion APU.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This range was 76 dwords long, the 75th dword changes, the first 60 or so
don't. split the block so it emits less often.
Signed-off-by: Dave Airlie <airlied@redhat.com>
glx code hasn't lived under xserver/GL for a long time now.
Signed-off-by: Nathan Kidd <nkidd@opentext.com>
Signed-off-by: Brian Paul <brianp@vmware.com>
OpenGL 4.0 Compatibility, page 449:
If the value of FRAMEBUFFER_ATTACHMENT_OBJECT_TYPE is NONE, no
framebuffer is bound to target. In this case querying pname FRAMEBUFFER_-
ATTACHMENT_OBJECT_NAME will return zero, and all other queries will generate
an INVALID_OPERATION error.
Reviewed-by: Chad Versace <chad@chad-versace.us>
This avoids the extra CMP and the predication on SEL, so in addition
to one less instruction, it makes scheduling less constrained.
Improves glbenchmark Egypt performance 0.6% +/- 0.2% (n=3). Reduces
FS instruction count across affected shaders in shader-db by 1.3%
without regressing any.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reduces compiled size of brw_wm_surface_state.o another 1.9%.
Overall, this brw_wm_surface_state reduction series cuts
firefox-talos-gfx runtime by 0.68% +/- 0.42% (n=6).
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
It turns out that gcc is just awful at generating code for
brw_structs.h style state setup, and using bitshifting on u32s
generates better code while being similarly readable (and more
verifiable compared to the specs, using the INTEL_MASK macro).
It's only used in the old fragment program path, to avoid projection
when w is always 1. We do want to do this in the new path pre-gen6
too, but we'll probably do it through the ir.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Oddly, this increases compiled code size. (marking the 'if' as likely
also increases code size, but not as much).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Interestingly, the compiler wasn't doing this for us at -O2, so we
were doing the computation for every non-_ReallyEnabled unit.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
- all asics need to emit CONTEXT_CONTROL
- all r6xx asics need to emit 3D_START_CMDBUF
The ddx and r600c already do this. r600g should as well.
Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
We are getting inconsistent methods for endian detection (same answer when
it works, just doesn't work on some platforms) depending on whether __GLIBC__
is defined, which of course depends on include ordering before p_config.h
Just make p_config.h include limits.h to solve this.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
On my original R600 card this at least lets gnome shell run for a while longer
and the piglit r300-readcache test case works a lot more reliably.
Still a few more stability issues running a piglit test run though.
Signed-off-by: Dave Airlie <airlied@redhat.com>
The spec doesn't state it should be an error, but. We have this piglit test
useprogram-inside-begin that passes with this commit. No idea what's correct.
NOTE: This is a candidate for the 7.10 branch.
Reviewed-by: Brian Paul <brianp@vmware.com>
The conditional rendering should be able to kill CopyPixels.
I assume the render condition has no effect on resource_copy_region.
This fixes piglit:
- NV_conditional_render/copypixels
NOTE: This is a candidate for the 7.10 branch.
Reviewed-by: Brian Paul <brianp@vmware.com>
Always default to DEFAULT_*_FORMATS for mandatory GL formats.
(st_choose_format must not fail for those)
Use DEFAULT_RGBA when alpha is required instead of RGB.
Use DEFAULT_RGB otherwise.
These are more or less the remaining differences between the old code and
the new one.
Reviewed-by: Brian Paul <brianp@vmware.com>
The problem is: The second time the function is called with a new
internal format, strb->format is usually not PIPE_FORMAT_NONE.
RenderbufferStorage(... GL_RGBA8 ...);
RenderbufferStorage(... GL_RGBA16 ...); // had no effect on the format
Broken with: fd6f2d6e57
Test: piglit/fbo-storage-completeness
NOTE: This is a candidate for the 7.10 branch.
(if fd6f2d6e57 is cherry-picked as well)
Reviewed-by: Brian Paul <brianp@vmware.com>
Lowered indirect addressing can create lots of immediates.
Fixes piglit/glsl-fs-uniform-array-7 on r300g.
NOTE: This is a candidate for the 7.10 branch.
Reviewed-by: Brian Paul <brianp@vmware.com>
Based uppon a patch from Pali Rohár <pali.rohar@gmail.com>.
This seems to get at least YUV->RGB conversion working.
So a simple "mplayer -vo vdpau" now seems to work fine.
From now on, depth test is always enabled in hardware.
If depth test is disabled in Gallium, the hardware Z function is set to ALWAYS.
If there is no zbuffer set, the colorbuffer0 memory is set as a zbuffer
to silence the CS checker.
This fixes piglit:
- occlusion-query-discard
- NV_conditional_render/bitmap
- NV_conditional_render/drawpixels
- NV_conditional_render/vertex_array
We want to check for Success, otherwise it will fail even with the right visual.
NOTE: This is a candidate for the 7.10 branch.
Signed-off-by: Antoine Labour <piman@chromium.org>
Signed-off-by: Stéphane Marchesin <marcheu@chromium.org>
Signed-off-by: Brian Paul <brianp@vmware.com>
When using _mesa_layout_parameters, all params copied in the 'layout'
output in the PASS 1 don't modify StateFlags (because they are simply
memcpy'ed).
This patch fixes the problem, assuring output gl_prog_param_list
StateFlags field is the same as the input one.
NOTE: This is a candidate for the 7.10 branch.
Signed-off-by: Brian Paul <brianp@vmware.com>
At glLinkShaders time, a fail() call in FS compile in 8-wide (the one
that's required to succeed, though we may relax that at some point for
pre-Ironlake performance) will now report out as a link error.
We now have:
brw_fs.cpp handles calling out to everything and optimization.
brw_fs_visitor.cpp handles translating to our LIR.
brw_fs_emit.cpp handles emitting from our LIR to native code.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
There's an assumption here that fixed GRFs will never intersect with
the allocated GRFs. That's true today, though it might change some
day if we decide to register-allocate the regs containing push
constants once they're dead.
This fixes a regression in 0f7325b890 in
Lightsmark from the texture instructions now containing g0 references
instead of having that be implied. Performance is improved 15.2% +/-
3.6% (n=3).
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=34968
emit_add_a16 was using the incorrect source.
This caused adds in the form of:
add u16 $a0 s32 $a1 u32 0x00000200
to have a source AREG of $a0 instead of $a1.
Fixes World of Warcraft in OpenGL and D3D without GLSL.
They were occupying whole 32-bit words, despite being only 10 or so
bits. Reduces code size slightly (80/3300 bytes).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
From the GL 2.1 spec:
"Required perspective-correct interpolation for all fragment
attributes except depth in sections 3.4.1 and 3.5.1, effectively
making GL PERSPECTIVE CORRECT HINT a no-op."
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
First, FBO read/draw == NULL validation happens in mesa core not
intelReadBuffers -> intel_draw_buffers. Second, that condition is no
longer tested for in our driver since ARB_ES2_compatibility was added.
Reviewed-by: Brian Paul <brianp@vmware.com>
Otherwise, the driver is likely to draw the flushed vertices to the
new drawbuffer instead of the old one, missing the point of the flush.
Reviewed-by: Brian Paul <brianp@vmware.com>
From the ARB_ES2_compatibility spec:
"(8) How should we handle draw buffer completeness?
RESOLVED: Remove draw/readbuffer completeness checks, and treat
drawbuffers referring to missing attachments as if they were NONE."
Fixes arb_es2_compatibility-drawbuffers when the short-circuit for
ARB_ES2_compatibility in the previous commit is dropped.
Reviewed-by: Brian Paul <brianp@vmware.com>
glDrawBuffers pointing at an unattached buffer is supposed to be
incomplete without ARB_ES2_compatibility. The testcase to catch the
bug of not implementing that bit of the spec was tricked by this
missing piece of state update.
Reviewed-by: Brian Paul <brianp@vmware.com>
If we use FBOs to access mipmap levels with glRead/Draw/CopyPixels()
we need to be sure to access the correct mipmap level/face/slice.
Before, we were just passing zero in quite a few places.
This fixes the new piglit fbo-mipmap-copypix test.
NOTE: This is a candidate for the 7.10 branch.
V_SQ_CF_WORD1_SQ_CF_INST_HALT is 0x1f on both
evergreen and cayman.
Reported-by: Gustaw Smolarczyk <wielkiegie@gmail.com>
Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
The logic of intel_draw_buffers() expected that stencil buffers were
always combined depth/stencil.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad@chad-versace.us>
When a texture is attached to multiple FBO's, a separate renderbuffer
wrapper is created for each attachment. This necessitates storing the hiz
region for these renderbuffers in the texture itself instead of the
renderbuffer wrapper.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad@chad-versace.us>
Before this commit, the renderbuffer's region was updated in
intel_renderbuffer_texture(). This commit moves the update into
intel_update_wrapper(), which is a more logical location for updates.
This is in preparation for the next commit, which allocates and
updates the texture's hiz region in intel_update_wrapper(). Having the two
region updates located in the same function makes good form.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad@chad-versace.us>
A hiz surface must be supplied to the hardware when rendering to a depth
buffer with hiz. There are three potential places to store that surface:
1. Allocate a larger intel_region for the depthbuffer, and let the
region's tail be the hiz surface.
2. Allocate a separate intel_region for hiz, and store it as
brw_context state.
3. Allocate a separate intel_region for hiz, and store it in
intel_renderbuffer.
We choose method 3.
Method 1 has not been chosen due to future complications it might cause
when requesting a DRI drawable's depth buffer attachment from X.
Method 2 has not been chosen because storing the hiz region apart from
the depth region makes lazy hiz/depth resolves difficult to implement.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad@chad-versace.us>
Given a format, is_hiz_depth_format() indicates if HiZ can be enabled on
a depthbuffer of that format.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad@chad-versace.us>
... in intel_alloc_renderbuffer_storage(). The stencil buffer has quirky
pitch requirements, so its region allocation is a special case.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad@chad-versace.us>
When hardware supports separate stencil, enable support for separate
depth/stencil texture formats in the table
intel_context.ctx.TextureFormatsSupported. If the hardware must use
separate stencil, then disable support for combined depth/stencil formats.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad@chad-versace.us>
Prefer MESA_FORMAT_X8_Z24 over MESA_FORMAT_S8_Z24 for textures with
internal format GL_DEPTH_COMPONENT*.
i965 needs MESA_FORMAT_X8_Z24 for HiZ and separate stencil.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad@chad-versace.us>
Add the following flags:
intel_context.has_separate_stencil
intel_context.must_use_separate_stencil
intel_context.has_hiz
The flags are currently set to false, and will be enabled for a given
chipset once the feature is completely implemented.
Since it may be some time before these features are completed, their
values can be overridden with environment variables INTEL_HIZ and
INTEL_SEPARATE_STENCIL. Valid values for these environment variables are
"0" and "1".
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad@chad-versace.us>
... because that's not a safe thing to do. The request buffer is shared
storage among all threads, and after UnlockDisplay the 'req' pointer may
point into someone else's request.
NOTE: This is a candidate for the 7.10 branch.
Signed-off-by: Adam Jackson <ajax@redhat.com>
Cayman is the RadeonHD 69xx series of GPUs. This adds support for
3D acceleration to the r600g driver.
Major changes:
Some context registers moved around - mainly MSAA and clipping/guardband related.
GPR allocation is all dynamic
no vertex cache - all unified in texture cache.
5-wide to 4-wide shader engines (no scalar or trans slot)
- some changes to how instructions are placed into slots
- removal of END_OF_PROGRAM bit in favour of END flow control clause
- no vertex fetch clause - TC accepts vertex or texture
Signed-off-by: Dave Airlie <airlied@redhat.com>
These don't need one, and I was seeing 0xff being returned and set in
the GPU registers with some tests.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Instead of using a giant switch statement with lots of code, use a
table to convert GL format enums to pipe formats.
Tested by running the old code next to the new and asserting that
the return value was the same for piglit tests.
We're doing a linear search, but if that ever appears to be too slow
the table could easily be sorted or hashed.
Certain applications (e.g., Bernina My Label, and the Windows
implementation of Processing language) destroy the device context used when
creating the frame-buffer, causing presents to fail because we were still
referring to the old device context internally.
This change ensures we always use the same HDC passed to the ICD
entry-points when available, or our own HDC when not available (necessary
only when flushing on single buffered visuals).
Since the SET_xxx and GET_xxx macros used to initialize the remap_table
have been replaced by inline functions, the missing late macro expansion
leads to driDispatchRemapTable not being redefined to remap_table, which
in turn causes the remap_table not to be setup properly.
This commit fixes the issue by moving the table redefinition after the
definition of driDispatchRemapTable but in front of the inline function
definitions.
Despite that negative values aren't sensible here, making this unsigned
is dangerous. Consider get_pointer_generic, which computes a value of
the form:
void *base + (int x * int stride + int y) * unsigned bpp
The usual arithmetic conversions will coerce the (x*stride + y)
subexpression to unsigned. Since stride can be negative, this is
disastrous.
Fixes at least the following piglit tests on Ironlake:
fbo/fbo-blit-d24s8
spec/ARB_depth_texture/fbo-clear-formats
spec/EXT_packed_depth_stencil/fbo-clear-formats
NOTE: This is a candidate for the 7.10 branch.
Reviewed-by: Chad Versace <chad.versace@intel.com>
Signed-off-by: Adam Jackson <ajax@redhat.com>
According to OpenGL 3.1 chapter 2.1.5 the representation without zero
should only be used for vertex attribute values, but not for textures
or frame-buffers.
According to OpenGL 3.1 chapter 2.1.5 the representation without zero
should only be used for vertex attribute values, but not for textures
or frame-buffers.
The coordinate offsets set in the m1 header are for textureOffset;
they have nothing to do with textureGrad (TXD).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
The same as 3e43adef95 but for Gen7.
This doesn't quite fix GL_ARB_depth_texture/fbo-clear-formats; there's
still a 1 pixel wide black line on the right edge of the smaller squares.
The results were entirely wrong before, and are at least close now.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Since wayland 4bde293ff8109d55eeaee8732f5a6ee0c8cd4bd9 we cant
lookup visuals, as we dont receive the visual token events.
The format for pixmap-images thus has to default to argb for now.
GLES uses GL_APIENTRYP instead of GLAPIENTRYP, which breaks with the
latest API table generation code. This fixes the issue by emitting a
definition for GL_APIENTRYP when generating the GLES files.
This prevents the error
prog: for the -disable-mmx option: may only occur zero or one times!
when creating a new context after XCloseDisplay with DRI drivers linked
with a shared LLVM 2.8 library.
In particular, this fixes the case where a vertex shader only uses
generic vertex attributes (non-0th). Before, we were no-op'ing the
glDrawArrays/Elements().
This fixes the new piglit pos-array test.
NOTE: This is a candidate for the 7.10 branch.
Previously, always did unorm8->float/nonlinear-to-linear conversion (using
lookup table), then convert back to nonlinear (using the expensive math
func pow among others), and finally convert back to int (assuming caller
wants unorm8), because the float texture fetch function is used for getting
the actual texel values. This should probably all be changed at some point,
but for now simply enable the memcpy path also for srgb formats (but if for
instance swizzling is required, still the whole conversion will be done).
Clip distance is calculated each time vertex position is written
which is suboptiomal is some cases but very safe.
User clip planes are an obsolete feature anyway.
Every time number of clip planes increases, the vertex program
is recompiled.
That ensures no overhead in normal case (no user clip planes)
and reasonable overhead otherwise.
Fixes 3D windows in compiz, and reflection effect in neverball.
Also fixes compiz expo plugin when windows were dragged and each
window shown 3 times.
This was going to get in the way of separate depth/stencil (which
wants to know about both, and whether they are the same rb), and also
wasn't a sufficient flag for the fix in the following commit.
In the 16-wide rework, I missed that we were setting some things to be
SIMD16 mode (corresponding to their setup in emit_texture_gen4()).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
These fields are documented to be in the payload, and though the FB
write docs say they *aren't* in the payload, for all other fields the
payload and header is structured so that no overwriting is required
except for non-default options.
It turns out there's nothing in the hardware preventing this. It
appears that it ought to work on pre-gen6 as well, but just produces
GPU hangs.
Improves glbenchmark Egypt framerate 4.4% +/- 0.3% (n=3), and Pro by
2.6% +/- 0.6% (n=3).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
As of gen6, alt-mode (which we use) MOVs of floats are not raw --
they'll modify infs/nans. This broke discard and alpha test in
16-wide, where apparently the upper 8 bits of the pixel enables being
set were causing the whole value to get trashed upon being moved.
Treating the values as UD instead of float makes sure they get
preserved. While I'm here, replace the two 8-wide moves of the halves
of the header with a single compressed move.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=36648
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This is part of fixing fbo-alphatest-nocolor -- a regression in
35e8fe5c99 after the initial regression,
that had us using a garbage BLEND_STATE[0] (in particular, the alpha
test enable) if no color buffer was bound.
I thought I was thwarted initially when I couldn't do conditional mod
on a MOV, and couldn't use two immediate constants in one instruction.
But g0 != g0 is also a way to produce a failing comparison.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Anisotropic filtering extension for swrast intended to be used by osmesa
to create high quality renderings.
Based on Higher Quality Elliptical Weighted Avarage Filter (EWA).
A 2nd implementation using footprint assembly is also provided.
Signed-off-by: Brian Paul <brianp@vmware.com>
Correctly links against selinux library when MESA is built with --enable-selinux option.
Fixes bug #36333 in Freedesktop bugzilla
Signed-off-by: Dave Airlie <airlied@redhat.com>
This function was taking a lot more CPU than required due to it memsetting
a bunch of memory that didn't require it from what I can see.
We should only memset here when we are about to fill out the sampler,
otherwise we end up doing a bunch of memsets for everytime this function
is called, basically setting 0 memory to 0.
Signed-off-by: Dave Airlie <airlied@redhat.com>
The change for GPU hanging in 13bab58f04
fell back even when rb == NULL, which is wrong for GLES2 and caused
segfaulting in GLES2 conformance. For the GPU hang case (where the
broken 2D driver failed to allocate a BO for the window system
renderbuffer), it also would assertion fail/segfault immediately after
the fallback setup when the renderbuffer map failed.
Fixes GLES2 conformance packed_depth_stencil.
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Texture LOD Bias is now S4.8 instead of S4.6;
Min LOD, and Max LOD are now U4.8 instead of U4.6.
Fixes piglit test tex-miplevel-selection.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
The data port messages for this are rather different. For now, fail to
compile rather than hanging the GPU.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
On gen4/5, the RNDZ and RNDE instructions return floor(x), but set special
"round increment bits" in the flag register; a predicated ADD (+1) fixes
the result.
The documentation still lists '.r' as existing, and says that the
predicated add is necessary, but it apparently lies. According to the
simulator, BRW_CONDITIONAL_R (7) is not a valid conditional modifier
and the RNDZ and RNDE instructions simply produce the correct value.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
The MATH instruction cannot handle source modifiers, even on Gen7.
So, apply this workaround for Sandybridge on Ivybridge as well.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This fixes piglit test glsl-fs-loop-continue.shader_test on Ivybridge.
According to the documentation, the CONT instruction's UIP field should
point to the WHILE instruction on both Sandybridge and Ivybridge.
The previous code made UIP point to the implicit DO instruction, which
seems incorrect. I'm not sure how it could have worked on Sandybridge.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Ivybridge's IF instruction doesn't support conditional modifiers.
It also introduces UIP, which must point to the ENDIF instruction.
ELSE and ENDIF remain the same except that JIP moves from dst to src1.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Ivybridge puts the shadow comparator first, then lod/bias, and finally
the coordinate---unlike previous generations which always reserved four
slots for the coordinate at the beginning.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Most of this code copied from brw_wm_sampler_state.c.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
I'm still not happy with the amount of code duplication here, but it
will have to do for now.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Otherwise, Ivybridge seems to ignore the newly supplied data, giving us
rubbish for vertices.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This shouldn't be done using MRFs, but until I have a proper solution
for dealing with MRFs, this allows my hack to keep working.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
The message header is still incorrect, but this is a start.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Ivybridge's SEND instruction uses GRFs instead of MRFs. Unfortunately,
a lot of our code explicitly uses MRFs, and rewriting it would take a
fair bit of effort. In the meantime, use a hack:
- Change brw_set_dest, brw_set_src0, and brw_set_src1 to implicitly
convert any MRFs into the top 16 GRFs.
- Enable gen6_resolve_implied_move on Ivybridge: Moving g0 to m0
actually moves it to g111 thanks to the previous hack.
It remains to officially reserve these registers so the allocator
doesn't try to reuse them.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This also disables the HiZ and separate stencil buffers. We still need
to implement stencil.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Since we currently only support sampling in the fragment shader, we only
bother to emit the PS variant. In the future we'll need to emit others.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This may not be necessary, but it seems like a good idea.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Ivybridge can update each stage's binding table pointer independently,
so we want separate dirty bits. Previous generations can simply
subscribe to all three dirty bits and emit as usual.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Copied from gen6_vs_state.c; reuses create_vs_constant_bo from there.
The 3DSTATE_VS command is identical but 3DSTATE_CONSTANT_VS is not.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
SF and CLIP viewport state has been combined into SF_CLIP_VIEWPORT;
SF_CLIP and CC state pointers can now be uploaded independently.
Some portions of the hardware documentation refer to separate upload
commands for SF and CLIP; these are outdated and incorrect.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Copied from gen6_clip_state.c.
This enables early culling and sets the necessary fields. Otherwise, it
is entirely the same, so I doubt this patch is strictly necessary for a
functional driver.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
The state itself still seems to be the same; the only change is that
each part (CC, BLEND, DEPTH_STENCIL) can now be uploaded independently.
Thus, we still rely on the code in gen6_cc.c to set up the state.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Copied from gen6_wm_state.c.
The main change from Sandybridge seems to be that 3DSTATE_WM was split
into two separate state packet commands: 3DSTATE_WM and 3DSTATE_PS.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Copied from gen6_sf_state.c.
The main change from Sandybridge seems to be that 3DSTATE_SF was split
into two separate state packet commands: 3DSTATE_SF and 3DSTATE_SBE
("setup backend"). The bit-offsets are even the same - only the DWords
numbers have shuffled around a bit.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Currently this always reserves 16kB for push constants, regardless of
how much space is needed, and partitions it evenly betwen the VS and FS.
This is probably not ideal, but is straightforward.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Currently, gen7_atoms is a verbatim copy of gen6_atoms; future commits
will update it to contain gen7-specific state.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Currently, IS_GEN7, IS_IVYBRIDGE, IS_IVB_GT1, and IS_IVB_GT2 all return
false. This allows me to write the code for them before actually adding
the PCI IDs and thus enabling the hardware.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
The documentation uses the term "vertex URB entries", the code talks
about "entry size", and so on. Also, handles are just "pointers" to
entries (actually small integers).
Also rename max_gs_handles to max_gs_entries.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
The primary motivation for this is to better support Ivybridge control
flow. Ivybridge IF instructions need to point to the first instruction
of the ELSE block -and- the ENDIF instruction; the existing code only
supported back-patching one instruction ago.
A second goal is to simplify and centralize the back-patching, hopefully
clarifying the code somewhat.
Previously, brw_ELSE back-patched the IF instruction, and brw_ENDIF
back-patched the previous instruction (IF or ELSE). With this patch,
brw_ENDIF is responsible for patching both the IF and (optional) ELSE.
To support this, the control flow stack (if_stack) maintains pointers to
both the IF and ELSE instructions. Unfortunately, in single program
flow (SPF) mode, both were emitted as ADD instructions, and thus
indistinguishable.
To remedy this, this patch simply emits IF and ELSE, rather than ADDs;
brw_ENDIF will convert them to ADDs (the SPF version of back-patching).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This hides the IF stack and back-patching of IF/ELSE instructions from
each of the code generators, greatly simplifying the interface.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This would be so much easier if we were using C++; we could simply use
constructors and destructors. Instead, we have to update all the
callers.
While we're at it, ralloc various brw_wm_compile fields rather than
explicitly calloc/free'ing them.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
On Sandybridge, we don't need to break down primitives. There's no need
to bother setting up brw_compile and such if it's not going to be used;
bail as early as possible.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
ctx->Light.ProvokingVertex depends on _NEW_LIGHT.
Found by inspection.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This makes it symmetric with brw_set_dest, which is convenient, and will
also allow for assertions to be made based off of intel->gen.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This fixes piglits fragment-and-vertex-texturing test on llvmpipe for me.
I've no idea if someone had another plan for this that is smarter than what
I've done here, but what I've basically done is
split fragment and vertex sampler and sampler_view setup function, factor
out the common chunks of both.
side-cleanups:
drop st->state.sampler_list - unused
don't update border color if we have no border color.
should fix https://bugs.freedesktop.org/show_bug.cgi?id=35849
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
I'm hard pressed to think of any reason a gallium thread would want to
receive a signal, especially considering its probably loaded as a library
and you don't want the threads interfering with the main threads signal
handling.
This solves a problem loading llvmpipe into the X server for AIGLX,
where the X server relies on the SIGIO signal going to the main thread,
but once llvmpipe loads the SIGIO can end up in any of its threads.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Nothing special, just changing conditions for when HiZ can be enabled and
when HiZ memory becomes invalid.
I was thinking about it again and realized it had not been quite right.
This is actually just the message descriptor for Gen6+ dataport access;
it has nothing to do with the render cache. Access to the sampler cache
and constant cache also would use this struct; rename for clarity.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
These are documented on page 245 of IHD_OS_Vol4_Part2.pdf (the public
Sandybridge documentation/SEND instruction description).
Somebody had the bright idea to reuse gen4/5 defines labelled READ/WRITE
which just happened to be the same values as Render Cache/Sampler Cache.
It turns out that this field has nothing to do with READ/WRITE on
Sandybridge, but rather represents which data port to direct it to.
This was especially confusing in brw_set_dp_read_message, which
used "BRW_MESSAGE_TARGET_DATAPORT_WRITE." In a read function.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
For example, an indirect load like "ld b128 $r0q c0[$r0]" seems to
overwrite the address register before finishing the load, but only
if there are a lot of threads running.
Visible as displaced geoemtry in Unigine Heaven.
According to my documentation this is actually "Media Block Write" on
Gen4-5; there has never been a "DWord Block Write."
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
only allocate the blocks ptr in the range if we ever have one,
otherwise don't bother wasting the memory.
valgrind glxinfo
before:
==967== in use at exit: 419,754 bytes in 706 blocks
==967== total heap usage: 3,552 allocs, 2,846 frees, 3,550,131 bytes allocated
after:
==5227== in use at exit: 419,754 bytes in 706 blocks
==5227== total heap usage: 3,452 allocs, 2,746 frees, 3,140,531 bytes allocate
Signed-off-by: Dave Airlie <airlied@redhat.com>
This drops 6k of the text segment, a minor drop in the ocean, however
it also makes the code a lot cleaner and removes a lot of duplicated
information, hopefully making it more maintainable.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This table covered a large range unnecessarily, reduce the address
range covered, use the fact that the bottom two bits aren't significant,
and remove unused fields from the range struct. It also drops the hash_size/shift in context in favour of a define, which should make doing the math
a bit less CPU intensive.
valgrind glxinfo
Before:
==320== in use at exit: 419,754 bytes in 706 blocks
==320== total heap usage: 3,691 allocs, 2,985 frees, 7,272,467 bytes allocated
After:
==967== in use at exit: 419,754 bytes in 706 blocks
==967== total heap usage: 3,552 allocs, 2,846 frees, 3,550,131 bytes allocated
Signed-off-by: Dave Airlie <airlied@redhat.com>
Currently r600g always maps every bo, this is quite pointless as it wastes
VM and on 32-bit with wine running VM space is quite useful.
So with this patch we don't create the mappings until first use, without
tiling enabled this probably won't make a major difference on its own,
but with tiled staged uploads it should avoid keeping maps for most of the
textures unnecessarily.
v2: add bo data ptr check
Signed-off-by: Dave Airlie <airlied@redhat.com>
typedef void (GLAPIENTRYP _GLUfuncptr)(); causes the following warning:
function declaration isn't a prototype.
Signed-off-by: José Fonseca <jfonseca@vmware.com>
GetVertexAttrib*{,ARB} is no longer aliased to the NV calls.
This fixes tracing yofrankie with apitrace, given it requires accurate
results from GetVertexAttribiv*.
NOTE: This is a candidate for the stable branches.
Mesa already supports this because of NV_fragment_program.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Marek Olšák <maraeo@gmail.com>
If state tracker asked us to map resource directly and we can't
do it (because of tiling), return NULL instead of doing full transfer
- state tracker should handle it and fallback to some other method
or repeat transfer without PIPE_TRANSFER_MAP_DIRECTLY.
It greatly improves performance of xorg state tracker on nv50+,
because its fallback (DFS/UTS) is much faster than full transfer.
Eliminates unaligned accesses on strict architectures. Spotted by Jay
Estabrook.
Signed-off-by: Matt Turner <mattst88@gmail.com>
NOTE: This is a candidate for the 7.10 branch.
GLSL stopped using:
BRA, EXP, LOG, LRP, NRM3, NRM4, XPD.
GLSL started using:
KIL, SCS, SSG, SWZ.
(omg why SWZ? isn't proc_src_register flexible enough?)
GLSL doesn't use these opcodes some Radeons do support:
ARR, DP2A, DST, LRP, XPD.
These opcodes are now unused:
AND, NOT, NRM3, NRM4, OR, XOR.
(plus maybe the NV extensions which are unused by Gallium)
In addition to that, we don't use two-dimensional indirect addressing,
which the Mesa IR can do.
PIPE_ARCH_UNKNOWN_ENDIAN is used no where else. All #else branches of
ifdef PIPE_ARCH_LITTLE assume big-endian. Not #error'ing out here
only serves to allow bad things to happen.
Signed-off-by: Matt Turner <mattst88@gmail.com>
1/ln(2) is equivalent to log2(e), so define it as such.
log2(e) = ln(e)/ln(2) = 1/ln(2)
Worst of all, the definitions for M_LOG2E and ONE_DIV_LN2
(right beside each other!) weren't the same.
Reviewed-by: Alex Deucher <alexdeucher@gmail.com>
Signed-off-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Brian Paul <brianp@vmware.com>
Setting SOURCE_FORMAT to EXPORT_NORM is an optimization.
Leaving SOURCE_FORMAT at 0 will work in all cases, but is less
efficient. The conditions for the setting the EXPORT_NORM
optimization are as follows:
R600/RV6xx:
BLEND_CLAMP is enabled
BLEND_FLOAT32 is disabled
11-bit or smaller UNORM/SNORM/SRGB
R7xx/evergreen:
11-bit or smaller UNORM/SNORM/SRGB
16-bit or smaller FLOAT
Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
Hopefully we can find out the proper fix for this, but for now
this makes the fbo mipmap tests pass on my rv670 (x2 card).
Signed-off-by: Dave Airlie <airlied@redhat.com>
r6xx asics have some problems with the surface
sync logic for the CB and DB. It's recommended
to use the event write interface for flushing
the DB/CB caches rather than the sync packets.
A single event write flush flushes all dst
caches, so we only need one for all CBs and DB.
Should fix:
https://bugs.freedesktop.org/show_bug.cgi?id=35312
Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This seems more in line with what the documentation suggests we should be
doing. It doesn't fix the rv635 regression, though I thought it might,
so it means I've no idea whats actually going wrong there.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Alex Deucher <alexdeucher@gmail.com>
We only handle a 32 bit swap count, so use the new structure definitions.
Reviewed-by: Adam Jackson <ajax@redhat.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
After sending the GLXChangeDrawableAttributes request, we also set a
local set of attributes on the DRI drawable. But in the indirect case
this array won't be present, so skip the setting in that case to avoid a
crash.
Reviewed-by: Adam Jackson <ajax@redhat.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
We use a hidden window for pbuffer contexts, but Windows limits window
sizes to the desktop size by default. This means that creating a big
pbuffer on a small resolution single monitor would truncate the pbuffer
size to the desktop.
This change overrides the windows maximum size, allow to create windows
arbitrarily large.
Pointed out by clang:
src/gallium/auxiliary/draw/draw_context.h:251:41: warning: implicit conversion
from enumeration type 'enum pipe_cap' to different enumeration type
'enum pipe_shader_cap' [-Wconversion]
return tgsi_exec_get_shader_param(param);
~~~~~~~~~~~~~~~~~~~~~~~~~~ ^~~~~
Otherwise there would be no way to know whether the state has been changed.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This reverts commit 1dc204d145.
MC_COORD_TRUNCATE is for MPEG and produces quite an interesting behavior
on regular textures. Anyway that commit broke filtering in demos/cubemap.
The new allocator uses ra and does swizzle packing.
Also, a data structure (struct rc_variable) and associated functions have
been added for generating UD and DU chains.
This function can be used to avoid creating single register classes for
input/payload registers. This makes optimistic coloring less likely
to fail.
Reviewed-by: Eric Anholt <eric@anholt.net>
The instruction scheduler will sometimes leave orphaned sources when
converting instructions from RGB to Alpha. If one of these orphaned
sources has an index greater than the maximum temporary register index,
then the compiler will incorrectly report "Too many hardware temporaries
used". The dead sources pass cleans up these orphaned sources.
GL_FIXED should not be accepted in the other gl*Pointer calls in OpenGL.
There is a new piglit for this: arb_es2_compatibility-fixed-type.
Reviewed-by: Brian Paul <brianp@vmware.com>
We were accidentally leaving blending enabled for LogicOp GL_COPY,
which ARB_color_buffer_float/GL_RGBA32F-render (and friends) caught.
Additionally, the GL spec says that no LogicOp should be done to
floating-point targets, and the GPU gets really angry even if you say
to LogicOp GL_COPY to float.
As we expanded the usage of the state cache, it grew extra
functionality. However, with the recent state streaming rework, we're
back to the state cache being used only for shader kernels, which is
the piece of GPU state that's actually expensive to compute again from
scratch, since it involves compiling.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Now that all the dynamic state is streamed through the top of the
batchbuffer, we can cut out many of our relocations to that state by
using the base address.
Improves 3DMMES taiji performance 3.3% +/- 0.4% (n=15).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Overall, across this series since the last set of numbers, gen6 3DMMES
taiji performance has dropped 0.8% +/- 0.3% (n=15), probably due to
the increased reissuing of state from some of the state objects that
otherwise never changed, and increased occurrence of the per-batch
overhead as we've increased how much we put in the batch BO without
increasing the batch BO's size.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The samplers are about to become streamed for gen6 performance, which
would cause this unit to blow out the state cache.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This is in a way a revert of f5bb775fd1.
The tiny win that had will be overwhelmed by the win of using the gen6
dynamic state base address.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Improves 3DMMES taiji demo performance by 5.1% +/- 1.9% (n=15), by
reducing CPU time spent thrashing around those tiny little constant BOs.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The payload regs can go all the way up to register 60+, so just give
them 8 bits to be addressed by instead of 3-4 (which made source_w_reg
of 8 end up 0). There's no reason to aggressively pack these fields,
as they are just used as compiler information, where being easier to
access is probably more important than shaving a byte or two off of
the structure.
Fixes piglit fragcoord_w.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=36649
I was promoting to float for ARB_color_buffer_float unclamped, which
failed when ARB_texture_float wasn't present. Since the metaops don't
need results outside of [0,1] when not drawing to a floating point
destination, they can just use a fixed point texture when floating
point destinations are impossible.
Fixes regression in fdo23670-depth_test when --enable-texture-float is
not present.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=36473
Since commit de579a1 "Include GIT SHA1 in GL version string"
$ git status
On branch master
Your branch is ahead of 'origin/master' by 2 commits.
Untracked files:
(use "git add <file>..." to include in what will be committed)
src/mesa/main/git_sha1.h
nothing added to commit but untracked files present (use "git add" to track)
Add git_sha1.h to .gitignore so git knows not to warn it is present but untracked
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Also use MAX3 and incorporate Ian's suggestion in texformat.c.
I don't think wrapping u_format_rgb9e5.h in another header and thus making it
more complicated is worth it.
swrast support done.
There is no renderbuffer support in swrast, because it's not required
by the extension.
Reviewed-by: Brian Paul <brianp@vmware.com>
I was wondering why I had been getting GL_RGBA for GL_RGB9_E5.
Instead of setting GL_RGBA and CHAN_TYPE for most types,
use the helper functions to obtain the info.
Reviewed-by: Brian Paul <brianp@vmware.com>
If we run out of bin memory and do an early return from
lp_setup_begin_query() we'd omit setting the setup->active_query
pointer. Then, when lp_setup_end_query() was later called, the
assertion for setup->active_query == pq would fail. Moving the
assigment in lp_setup_begin_query() avoids that.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Including windows.h was ineffective on MSVC because we define the NOGDI macro,
which skips the wingdi.h include.
Unsetting NOGDI is also a bad idea because it causes all sort of symbol
clashes with SGI code.
The real problem is that WINGDAPI was not being defined, also due to NOGDI,
so simply define it to blank if not done already. This seems to make
everybody happy.
The default value is 64 but drivers usually advertise more, like 4096.
Allows ARB vp/fp programs to use more parameters.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This reverts commit 50ade6ea69.
Fixes jerky rendering again on apps that don't block on the GPU per
frame and are GPU bound (e.g. 3DMMES on Ironlake). The whole point of
this complicated throttle scheme is to wait on frame n-1 to have
started rendering before starting frame n's rendering. Otherwise, the
GPU-bound app will race ahead and call the GL to draw many
nearly-identical frames, then >0ms later get stuck waiting for them
(all dispatched at about the same time) to retire, then render a new
batch of nearly-identical frames.
If GL_RGB16F or GL_RGB32F is specified let's try the 3-component float
texture formats before trying the 4-component ones. Before this,
GL_RGB16/32F were treated the same as GL_RGBA16/32F.
Reviewed-by: Marek Olšák <maraeo@gmail.com>
The actual code that needs this include is just using
"if defined (PIPE_OS_UNIX)", and the two conditions should match.
This should also make the file compile under Hurd.
This is more painful than instruction scheduling, as we have to
compare two MRF writes to see if they coincide, and have to handle
partial GRF writes before that (for example, the result of a math
instruction written to color).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
All that needed fixing was skipping the newly-possible
uncompressed/sechalf partial GRF constant writes.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Most of the work of the scheduler is agnostic to wide dispatch. It
operates on our virtual GRF file, which means instructions are
generally referring to 8 or 16 wide naturally. For the MRF file
management we're trying to track the actual hardware MRF file, so we
need to watch if an instruction writes multiple MRFs.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This is glued in in a bit of an ugly way -- we rely on the uniforms
having been set up by 8-wide dispatch, and we just reuse them without
the ability to add new uniforms for any reason, since the 8-wide
compile is already completed. Today, this all works out because our
optimization passes are effectively the same for both and even if they
weren't, we don't reduce the set of uniforms pushed after
optimization.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Without this, consumers often have to keep linked lists of the
entries, at additional malloc cost.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
These reduce an emitted (not decoded) instruction per shader on
g4x/gen5, but may allow for additional register coalescing as well.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
At this point it doesn't do uniforms, which have to be laid out the
same between 8 and 16. Other than that, it supports everything but
flow control, which was the thing that forced us to choose 8-wide for
general GLSL support.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Note that the virtual grfs are in increments of the dispatch_width,
not hardware registers -- this makes the 16-wide emit and 8-wide emit
mostly the same.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
I hit this when testing RV350, which lacks RGB10_A2 render target
support. It had been missed when implementing the format and probably
unused by anything else too.
Not applicable to 7.10.
Reviewed-by: Eric Anholt <eric@anholt.net>
"st/mesa: check image size before copy_image_data_to_texture()" caused
a regression in piglit fbo-generatemipmap-formats test on all gallium drivers.
Level 0 for NPOT textures will not match minified values, so don't do this
check for level 0.
Signed-off-by: Dave Airlie <airlied@redhat.com>
In the initial code if we had nothing in the vector slots r would
never get reset to 0, so we'd fail to compile shaders, after the previous
commit this would happen for the LIT tests. When I fixed that we did a lot
of unnecessary loops through all the vector states when we had no vector
slots filled. So this patch optimises thing for the scalar only state.
This fixes the 3 LIT piglit tests on r600g.
Signed-off-by: Dave Airlie <airlied@redhat.com>
In the R600 ISA document:
Section 4.7.5 Cycle restrictions for the ALU.trans states that
PV/PS have cycle restrictions wrt constants.
This is part of a fix for the LIT tests
Signed-off-by: Dave Airlie <airlied@redhat.com>
Fixes a bug in Trine where fragment.color would write
FRAG_RESULT_COLOR (which is interpreted by drivers as being the "write
this to all color buffers" option) instead of FRAG_RESULT_DATA0 (just
the first target).
Fixes piglit ATI_draw_buffers/arbfp-no-index.
This extension support consists of replacing
"gl_texture_obj->Sampler." with "_mesa_get_samplerobj(ctx, unit)->".
One instance of referencing the texture's base sampler remains in the
initial miptree allocation, where I'm not sure we have a clear
association with any texture unit.
Tested with piglit ARB_sampler_objects/sampler-objects.
Reviewed-by: Brian Paul <brianp@vmware.com>
Since we lack hardware support for it, this is a simple matter of
checking _mesa_check_conditional_render at the entrypoints, and
suppressing it for the metaops where it doesn't apply.
Reviewed-by: Brian Paul <brianp@vmware.com>
The NV_conditional_render spec calls out specific operations that
conditional rendering applies to, which doesn't include these.
Fixes NV_conditional_render/generatemipmap on swrast.
Reviewed-by: Brian Paul <brianp@vmware.com>
Tested with rgtc-teximage-0[12].
EXT_texture_compression_rgtc/fbo-generatemipmap-formats fails in NPOT
just like S3TC does.
Reviewed-by: Brian Paul <brianp@vmware.com>
This assertion doesn't make any sense to me -- the convertFormat is
already something valid (tested above), and the BaseFormat dictated by
convertFormat doesn't matter to the function about to be called (it's
the datatype/comps that were pulled out of convertFormat).
Fixes assertion failure in
GL_EXT_texture_compression_rgtc/fbo-generatemipmap-formats
(still has a rendering failure in NPOT like S3TC does).
Reviewed-by: Brian Paul <brianp@vmware.com>
We were falling through to the default R8 and RG88 formats instead of
compressing when possible. Noticed by swrast fbo-blending-formats
actually doing rendering.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
They were totally broken for several releases.
scons now builds everything the project files built and more, and can be
kept up-to-date with little effort.
Need to reset the point/line/tri functions to point to the "first"
versions whenever we flush vertices. Fixes unfilled polygon rendering
errors seen in demos/samples/logo.c. See comments for more info.
NOTE: This is a candidate for the 7.10 branch.
Broken with e5c6a92a12. (ARB_color_buffer_float)
Clamping should occur if type != float, otherwise the MSBs of the resulting
pixels are killed off. For example, reading back LUMINANCE = R+G+B can be
greater than 0xff, but the result is naturally masked by 0xff
for UNSIGNED_BYTE, leading to bogus results.
The following bug report seems to want clamping to occur if type == half_float
too. Not sure what's correct.
Bug: [bisected pineview] oglc case pxconv-read failed
https://bugs.freedesktop.org/show_bug.cgi?id=35852
Tested by: Fang Xun <xunx.fang@intel.com>
Reviewed-and-tested-by: Ian Romanick <ian.d.romanick@intel.com>
None of this ever gets used. Fog is always calculated by a fragment
program. Even though the fixed-function fog unit is never used, state
updates are still sent to the hardware. Removing those spurious state
updates can't hurt performance.
Reviewed-by: Eric Anholt <eric@anholt.net>
Acked-by: Corbin Simpson <MostAwesomeDude@gmail.com>
Acked-by: Alex Deucher <alexdeucher@gmail.com>
Fragment programs are generated by core Mesa for fixed-function.
Because of this, there's no reason to handle cases where there is no
fragment program for fog.
Reviewed-by: Eric Anholt <eric@anholt.net>
Acked-by: Corbin Simpson <MostAwesomeDude@gmail.com>
Acked-by: Alex Deucher <alexdeucher@gmail.com>
All drivers expect this to always be GL_NONE. Don't let there be any
opportunity for a bad value to leak out and infect some unsuspecting
driver. If any driver for hardware that had fixed-function
per-fragment fog (i915 and perhaps some r300-ish) was ever going to
add support, it would have done it by now.
Reviewed-by: Eric Anholt <eric@anholt.net>
Acked-by: Corbin Simpson <MostAwesomeDude@gmail.com>
Acked-by: Alex Deucher <alexdeucher@gmail.com>
This patch fixes two bugs related to fog in the fixed-function
fragment shader generation code.
Fog was only lowered to instructions if MRTs were used. The fragment
shader assembler always lowers "fog option" code to instructions, and
many drivers (e.g., r300) expect this.
When fog lowering did happen, it was after the instruction count was
checked against implementation limits. Since fog lowering may add up
to 5 instructions, a program that was below the limits before lowering
may exceed the limits after lowering.
NOTE: This is a candidate for the stable branches.
Reviewed-by: Eric Anholt <eric@anholt.net>
Acked-by: Corbin Simpson <MostAwesomeDude@gmail.com>
Acked-by: Alex Deucher <alexdeucher@gmail.com>
We should only copy images into the dest texture if the size is correct.
This fixes a failed assertion when finalizing a texture with mis-defined
mipmap levels such as:
level 0: 32x32
level 1: 8x8
Also, fix incorrect mipmap level used in assertion at the top of
copy_image_data_to_texture().
NOTE: This is a candidate for the 7.10 branch.
Lots of code (deleted by this patch) tried to make type == result->type,
but not all cases did. Don't pretend; just use result->type.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Things definitely remaining todo: switch statements, clip distances.
On 965, we also need real integers in the VS, and implementations of
some things like isinf/isnan.
Reviewed-by: Brian Paul <brianp@vmware.com>
For 1 and 2-channel formats the hardware only supports rendering to R
and RG. To do I and L render targets we just call them R and
everything works out. For A, we would need to rewrite the CC to do
the alpha channel's blending on color instead, and send the fragment
alpha down the red channel. For LA, there doesn't seem to be any
hope, because we can't do independent color/alpha blending while
treating the LA surface as RG.
Reviewed-by: Brian Paul <brianp@vmware.com>
The blitter only does up 32bpp at a time, so we handle it by mangling
coordinates and calling the surface 32bpp.
Fixes ARB_texture_rg/fbo-generatemipmap-formats-float with ARB_texture_float.
Reviewed-by: Brian Paul <brianp@vmware.com>
Of these, intel will be using I and L initially, and A once we rewrite
fragment shaders and the CC for rendering to it as R.
Reviewed-by: Brian Paul <brianp@vmware.com>
Keep track of when the caches are dirty, and only flush them when
the framebuffer state is set and when the context is flushed.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This fixes piglit's draw-instanced-divisor test for softpipe on both
the generic and SSE paths. This is temporary until we have the
correct per-array max_index information.
this needs revisiting, we really don't want to be flushing all 32 of these,
but currently we don't flush any of them, and it seems to have caused a regression
as reported on irc with doom3 on evergreen.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Writes within ELSE blocks were being ignored which prevented us from
discovering all possible writers for some register values.
Fixes piglit glsl-fs-raytrace-bug27060
This gets me from 2200 to 1978 dwords for a gears frame.
This is due to us having some 32-dwords blocks in the SPI, that we only
modify the first dwords off.
v2: fix dirty reg count from Bas Nieuwenhuizen
Signed-off-by: Dave Airlie <airlied@redhat.com>
This is a first step to decreasing the CPU usage, by decreasing how much
stuff we pass to the GPU and hence to the kernel CS checker.
This adds a check to see if the values we need to write are actually dirty,
and avoids writing if they are. However certain register need to always
be written so we add a new flag to say which ones should be always written
if used. (Note this could probably be done cleaner with a larger refactoring,
since I think the CONST_BUFFER_SIZE_PS/VS and CONST_CACHE_PS/VS might
be better off as a special state).
It also moves the need_bo to be a flags on the register now.
With this, a frame of gears goes from emitting 3k dwords to emitting 2k dwords,
and I'm sure it could get a lot smaller.
v2: fix some evergreen dirty bits.
Original patch from: Bas Nieuwenhuizen, I NIHed nearly the same thing
before seeing his patch on the list, oops.
Reviewed-by: Bas Nieuwenhuizen
Signed-off-by: Dave Airlie <airlied@redhat.com>
Most of the newer portions of the code use OUT_BATCH style. I prefer
this style because it offers a clear distinction between a) hardware
messages/structures with a mandatory format, and b) data structures for
our own internal use that we can format however we want.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Since we never enable the GS on Sandybridge, there's no need to allocate
it any URB space.
Furthermore, the previous calculation was incorrect: it neglected to
multiply by nr_vs_entries, instead comparing whether twice the size of
a single VS URB entry was bigger than the entire URB space. It also
neglected to take into account that vs_size is in units of 128 byte
blocks, while urb_size is in bytes.
Despite the above problems, the calculations resulted in an acceptable
programming of the URB in most cases, at least on GT2.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
The expression
x = y, 5, 3;
will generate
0:7(9): warning: left-hand operand of comma expression has no effect
The warning is only emitted for the left-hand operands, becuase the
right-most operand is the result of the expression. This could be
used in an assignment, etc.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This reverts what remains of commit
28bab24e16. It was garbage, trying to
use a MESA_FORMAT enum as a preprocessor token, and I don't know how I
thought it was even tested.
Reviewed-by: Brian Paul <brianp@vmware.com>
The GL_RED and GL_RG were tricking this code into executing, but it's
totally unprepared for a 16-bit channel and just rescaled the values
down to 0. We don't have anything with <8bit channels alongside >8bit
channels, so disabling it should be safe.
Reviewed-by: Brian Paul <brianp@vmware.com>
This will replace the current (broken by trying to use an enum in the
preprocessor) spantmp2.h support I wrote for the intel driver.
Reviewed-by: Brian Paul <brianp@vmware.com>
Since we're using GTT mappings now (no manual detiling), there's
really nothing special to accessing these buffers, other than needing
the new RowStride field of gl_renderbuffer to accomodate padding.
Reduces the driver size by 2.7kb, and improves glean depthStencil
performance 3-10x (!)
Reviewed-by: Brian Paul <brianp@vmware.com>
This will allow some drivers to reuse the core renderbuffer.c get/put
row functions in place of using the spantmp.h macros. Note that
unlike textures, we use a signed integer here to allow for handling
FBO orientation.
Reviewed-by: Brian Paul <brianp@vmware.com>
Everything appears to already be in place for this. Fixes aborts in:
ARB_texture_rg/fbo-alphatest-formats-float
ARB_texture_rg/fbo-blending-formats-float.
Reviewed-by: Brian Paul <brianp@vmware.com>
The _mesa_base_fbo_format variant doesn't handle some texture
internalformats, such as "3".
Fixes:
fbo-blending-formats.
fbo-alphatest-formats
EXT_texture_sRGB/fbo-alphatest-formats
Reviewed-by: Brian Paul <brianp@vmware.com>
The very presence of this extension breaks things.
This should bring us closer to being able to run Unigine Heaven.
The extension will be re-enabled once gl_InstanceID is implemented.
This was copy-and-paste from originally trying to get DP read/write
working reliably, and notably for other common messages (URB, sampler)
we weren't doing this.
Most of this is code movement to get the scratch space allocated in a
shared location. Other than that, the only real changes are that the
old oword block messages now operate on oword-aligned areas (with new
messages for unaligned access, which we don't do), and that the
caching control is in the SFID part of the descriptor instead of
message control.
Fixes glsl-fs-convolution-1.
It was accepting only GL_DUDV_ATI and not the specific sized format
GL_DU8DV8_ATI. Fixes assertion failure at startup in Shadowgrounds.
Reviewed-by: Brian Paul <brianp@vmware.com>
The 095-recursive-define test case was triggering infinite recursion
with the following test case:
#define A(a, b) B(a, b)
#define C A(0, C)
C
Here's what was happening:
1. "C" was pushed onto the active list to expand the C node
2. While expanding the "0" argument, the active list would be
emptied by the code at the end of _glcpp_parser_expand_token_list
3. When expanding the "C" argument, the active list was now empty,
so lather, rinse, repeat.
We fix this by adjusting the final popping at the end of
_glcpp_parser_expand_token_list to never pop more nodes then this
particular invocation had pushed itself. This is as simple as saving
the original state of the active list, and then interrupting the
popping when we reach this same state.
With this fix, all of the glcpp-test tests now pass.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=32835
Signed-off-by: Carl Worth <cworth@cworth.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-and-tested-by: Kenneth Graunke <kenneth@whitecape.org>
Without it gcc complains:
nv50_screen.c: In function ‘nv50_screen_is_format_supported’:
nv50_screen.c:48: warning: implicit declaration of function ‘util_format_is_supported’
and handles it wrongly - util_format_is_supported returns boolean, which is typedef'ed
to uchar, but function without prototype is assumed to return int.
For me nv50_screen_is_format_supported was returning true for float formats without
--enable-texture-float...
Sounds very unlikely, but I don't have a better explanation at the
moment.
The GPU throws page faults at the first page after the code buffer
quite frequently on startup, and traces don't show us overflowing.
This pass coverts CMP T0, T1 T2 T0 -> MOV T0, T2 when the CMP
instruction is the first instruction to write to register T0.
This pass is useful for hardware that requires a lot of lowering passes
that generate many CMP instructions.
So --enable-texture-float it is.
Hardware drivers (including the Gallium ones) should
use #ifdef TEXTURE_FLOAT_ENABLED to hide any code that may
expose floating-point renderbuffers via any interface,
public or private.
v2: Print a warning when using --enable-texture-float.
Squashed commit of the following:
Author: Marek Olšák <maraeo@gmail.com>
mesa: handle floating-point formats in _mesa_base_fbo_format
mesa: add ARB/ATI_texture_float, remove MESAX_texture_float
commit 123bb110852739dffadcc81ad80b005b1c4f586d
Author: Luca Barbieri <luca@luca-barbieri.com>
Date: Wed Aug 25 01:35:42 2010 +0200
mesa: compute floatMode for FBOs and return it on RGBA_FLOAT_MODE
It's clear enough that the current segmentation fault isn't what we
want. And it's also very easy to know what we do want here, (just
check with any functional C preprocessor such as "gcc -E").
Add the desired output as an expected file so that the test suite
gives useful output, (showing the omitted output and the segfault),
rather than just reporting "No such file" for the expected file.
These were all written as generic list functions, (accepting and returning
a list to act upon). But they were only ever used with parser->active as
the list. By simply accepting the parser itself, these functions can update
parser->active and now return nothing at all. This makes the code a bit
more compact.
And hopefully the code is no less readable since the functions are also
now renamed to have "_parser_active" in the name for better correlation
with nearby tests of the parser->active field.
The common case for this test suite is to quickly test that everything
returns the correct results. In this case, the second run of the test
suite under valgrind was just annoying, (and the user would often
interrupt it).
Now, do what is wanted in the common case by default (just run the
test suite), and require a run with "glcpp-test --valgrind" in order
to test with valgrind.
The expected file here captures the current behavior of glcpp (which
is to generate an obscure "syntax error, unexpected $end" diagnostic
for this case).
It would certainly be better for glcpp to generate a nicer diagnostic,
(such as "missing closing parenthesis in function-like macro
definition" or so), but the current behavior is at least correct, and
expected. So we can make the test suite more useful by marking the
current behavior as expected.
The expected file here captures the current behavior of glcpp (which
is to generate a division-by-zero error) for this case.
It's easy to argue that it should be short-circuiting the evaluation
and not generating the diagnostic (which happens to be what gcc does).
But it doesn't seem like we should force this behavior on our
pre-processor, (and, as always, the GLSL specification of the
pre-processor is too vague on this point).
This test is behaving just fine already---it's generating an informative
diagnostic, ("error: division by 0 in preprocessor directive"), so adding
this in the expected file makes things pass.
We could actually try to do an early return both for gallium textures and
malloc memory textures, but I'm not sure exactly which situations
stImage->pt is NULL, and whether texImage->Data == NULL would be acceptible
or not.
Reviewed-by: Brian Paul <brianp@vmware.com>
This is the same as ARB_draw_buffers (which derived from it), except
for s/ARB/ATI/. The glapi bits were already in place, and what was
missing was just the ARB_fp part. The new Humble Bundle game "trine"
tries to use this extension without checking that it's exposed, which
this works around.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=36182
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This is like what we do for add/mul, but we have to invert the
predicate to choose the other source instead.
This removes 5 extra moves of constants in nexuiz shaders. No
statistically significant performance difference on my Sandybridge
laptop (n=5).
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
We were letting any old operand through, which generally resulted in
assertion failures later.
Fixes array-logical-xor.vert.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
We just do the AST-to-HIR processing, and only push the instructions
if needed in the constant false case.
Fixes glslparsertest/glsl2/logic-02.frag
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
We just do the AST-to-HIR processing, and only push the instructions
if needed in the constant true case.
Fixes glslparsertest/glsl2/logic-01.frag
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
By always using a boolean, we should generally avoid further
complaints. The failure case I see is logic_not, where the user might
understandably make the mistake of using `!' on a boolean vector (like
a piglit case did recently!), and then get a further complaint that
the new boolean type doesn't match the bvec it gets assigned to.
Fixes invalid-logic-not-06.vert (assertion failure when the bad type
ends up in an expression and ir_constant_expression gets angry).
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=33314
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
A few GLES2 tests tripped over this when using array dereferences to
hit channels on the LHS (see piglit test
glsl-copy-propagation-vector-indexing). We wouldn't find the
ir_dereference_variable, and assume that that meant that it wasn't an
assignment to a scalar/vector, and thus not notice that the variable
had been changed.
Release the old depth region and reference the new one *only* if it has
changed.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad.versace@intel.com>
Prior to Gen6, we use the GS for breaking down quads, quad-strips,
and line loops. On Gen6, earlier stages already take care of this,
so we never need the GS.
Since this code is likely completely untested, remove it for now.
We can write new code when enabling real geometry shaders.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
This reverts commit b4cbd2b312.
It looked like a safe sanity check. It missed the issue of the start of
the buffer not being at 0, but even that was not enough to explain why
setting the max vertex index caused glyphs to be dropped from the game
'Achron'.
Instead, the issue appears to be related to the use of the vertex bias
and so we would need to re-emit the max-index every time we adjusted the
bias, so re-emitting the relocations and defeating the original
optimisation.
Reported-and-tested-by: Thomas Jones <thomas.jones@utoronto.ca>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=35163
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
When the window is minimized GetClientRect will return zeros.
Instead of creating a 1x1 framebuffer, simply preserve the current window
size, until the window is restored or maximized again.
The earlier change to ensure rendertargets and textures are always
rebound at every command buffer start had the downside of making
successive flushes no longer no-ops, as a command buffer with merely
the rebinding commands were being unnecessarily sent to the vGPU.
This change only re-emits the bindings when necessary, by keeping track
of the need to rebind outside of the dirty state update mechanism.
According to https://bugs.freedesktop.org/show_bug.cgi?id=34280
commit 5d1387b2da causes the font corruption
problems people have been seeing under various apps and gnome-shell on r200
cards.
This commit changed (loosened) the check for using the memcpy path in the
former al88 / al1616 texstore functions, which are now also used to
store rg texures. This patch restores the old strict check in case of
al textures. I've no idea why this fixes things, since I don't know the
code in question at all. But after seeing the bisect in bfdo34280 point
to this commit, I gave this fix a try and it fixes the font issues seen on
r200 cards.
[airlied:
r200 has no native working A8, so it does an internal storage format of AL88
however srcFormat == internalFormat == ALPHA when we get to this point,
so it copies, but it wants to store into an AL88 not ALPHA so fails,
I'll also push a piglit test for this on r200].
Many thanks to Nicolas Kaiser who did all the hard work of tracking this down!
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Previously the macro would (ALIGN(value - alignment - 1, alignment)).
At the very least, this was missing parenthesis around "alignment -
1". As a result, if value was already aligned, it would be reduced by
alignment. Condisder:
x = ROUND_DOWN_TO(256, 128);
This becomes:
x = ALIGN(256 - 128 - 1, 128);
Or:
x = ALIGN(127, 128);
Which becomes:
x = 128;
This macro is currently only used in brw_state_batch
(brw_state_batch.c). It looks like the original version of this macro
would just use too much space in the batch buffer. It's possible, but
not at all clear to me from the code, that the original behavior is
actually desired.
In any case, this patch does not cause any piglit regressions on my
Ironlake system.
I also think that ALIGN_FLOOR would be a better name for this macro,
but ROUND_DOWN_TO matches rounddown in the Linux kernel.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Keith Whitwell <keithw@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Also make the GL_ARB_draw_instanced block follow the same pattern as
the other blocks.
Tested-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This is a 49.6% +/- 2.0% (n=9, IPS outlier removed) performance
improvement for the hacked-up-for-cache-misses scissor-many, and no
statistically significant performance difference for the
hacked-up-for-cache-hits version (n=9, IPS outlier removed). No
statistically significant performance difference from ETQW (n=5) from
these last two commits.
This is a 28.1% +/- 1.4% (n=10) performance improvement for the
hacked-up-for-cache-misses scissor-many (n=10), and no statistically
significant wall-time performance difference for the
hacked-up-for-cache-hits version (n=9, first outlier in each removed
since IPS was warming up. User time increased by about 4.7%, but
kernel time decreased equivalently).
Build the new sources, plug the new functions into the dispatch table,
implement display list support. And enable extension in the gallium
state tracker.
gl_texture_object contains an instance of this type for the regular
texture object sampling state. glGenSamplers() generates new instances
of gl_sampler_object which can override that state with glBindSampler().
Fixes issues in e.g. nexuiz (desertfactory) or supertuxkart that
look like lighting bugs.
They're not visible with the software rasterizers because their
notion of linear interpolation seems to be different from that
of nv50/nvc0.
This reverts commit 437c748bf5.
The commit is wrong for several reasons. One of them is when we grab
a new buffer, we should update all the states it is bound in,
including all parallel contexts. I don't think this is even doable.
The correct solution would be upload data via a temporary buffer and
do resource_copy_region to the original one.
https://bugs.freedesktop.org/show_bug.cgi?id=36088
Add Cygwin platform-specific settings and drivers to build for dri driver:
- by default, disable direct rendering.
- if direct rendering is enabled, the swrast dridriver is the only one it's
sensible to try to build (this doesn't work at the moment as additional patches
are required to build a libGL which can load just swrast without the DRM headers,
even though there's no actual functional dependency)
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Reviewed-by: Julien Cristau <jcristau@debian.org>
Fix build when configured --with-driver=dri --disable-driglx-direct on targets
without drm e.g. GNU/Hurd and Cygwin
Based on the Debian patch file '05_hurd-ftbfs.diff' by Samuel Thibault.
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Reviewed-by: Julien Cristau <jcristau@debian.org>
Reviewed-By: Jakob Bornecrantz <wallbraker@gmail.com>
The theory here was to detect a temporary variable used within a loop,
and avoid considering it live across the entire loop. However, it was
overeager and failed when the first definition of the variable
appeared within the loop but was only conditionally defined.
Fixes glsl-fs-loop-redundant-condition.
Otherwise min_lod can potentially be larger than the clamped max_lod. The
code that follows will swap min_lod and max_lod in that case, resulting in a
max_lod larger than MAX_LEVEL.
Signed-off-by: Brian Paul <brianp@vmware.com>
Base level and min LOD aren't equivalent. In particular, min LOD has no
effect on image array selection for magnification and non-mipmapped
minification.
Signed-off-by: Brian Paul <brianp@vmware.com>
Although for GL a zero stride means tightly packed elements, Mesa
internally uses zero strides for constant arrays.
Therefore user buffers need to be defined from
buffer_offset + src_offset + min_index*stride
to
buffer_offset + src_offset + max_index*stride + elem_size
Simplifying the later with (max_index + 1)*stride will give zero
sized buffers.
This change also aggregates the st_context's info about user buffers
into a single array.
We adjust 'end' to fit into _MaxElement, but that may result into a 'start'
value bigger than 'end' being passed downstream, causing havoc.
This could be seen with arb_robustness_draw-vbo-bounds, due to an
application bug.
Because:
- bindings are not fully automatic, and they are broken most of the time
- unit tests/samples can be written in C on top of graw
- tracing/retracing is more useful at API levels with stable ABIs such as
GL, producing traces that cover more layers of the driver stack and and
can be used for regression testing
There's really no need for a prefix on member functions, and overloading
takes care of the _op1/_op2 distinction quite nicely. Eric already made
a similar change in the i965 FS backend.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Rather than ir_to_mesa_dst_reg_from_src and ir_to_mesa_src_reg_from_dst.
The new constructors are marked 'explicit' so that the compiler can
catch cases where source and destination registers were accidentally
interchanged.
This also necessitated using constructors to initialize the undef and
address registers, as well as adding a default constructor.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Both classes are completely private to ir_to_mesa.cpp, so there won't be
any name conflicts with other parts of Mesa. The prefix simply makes it
harder to read.
Also, use a class rather than typedef structs.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This is in preparation from removing the "ir_to_mesa_" prefix on the
src_reg and dst_reg types, which would cause a naming conflict.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The code would previously handle the projection, then swizzle the
shadow comparitor into place. However, when the projection is done
"by hand," as in the TXB case, the unprojected shadow comparitor would
over-write the projected shadow comparitor.
Shadow comparison with projection and LOD is an extremely rare case in
real application code, so it shouldn't matter that we don't handle
that case with the greatest efficiency.
NOTE: This is a candidate for the stable branches.
Reviewed-by: Brian Paul <brianp@vmware.com>
References: https://bugs.freedesktop.org/show_bug.cgi?id=32395
This matches the behaviour below when numSamples is compared.
At least with the gallium state tracker this can actually occur if st_render_texture fails.
Signed-off-by: Brian Paul <brianp@vmware.com>
This is always the way with real hardware and desktop OpenGL. Some
hardware can't do some formats natively. The alpha-only, luminance,
and intensity formats are usually the most problematic. Some sized
formats can also be problematic. This patch provides fall-back
formats for those that are not natively supported.
At some point it would be interesting to try providing
device-independent conversions using EXT_texture_swizzle. The drivers
that support EXT_texture_swizzle could, for example, see
GL_LUMINANCE16_SNORM as MESA_FORMAT_SIGNED_R16 with a { r, r, r, 1 }
swizzle. Care would need to be taken to prevent issues with using
those textures for FBO rendering.
This is the rest of the fix for glean's pixelFormats test on i965.
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This class of hardware can natively sample all of the snorm surface
formats that DX10 requires, but it can't do some of the legacy GL
formats. In particular, all of the alpha, luminance, and intensity
formats are unsupported.
This partially fixes the breakage in glean's pixelFormats test since
GL_EXT_texture_snorm support was added to Mesa.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We use this format to represent the accum buffer. No snorm texture
sampling or rendering takes place.
Fixes failed assertion with swrast and any app using the accum buffer
(and glxinfo).
Piglit tests:
- glsl-fs-shadow2d-01
- glsl-fs-shadow2d-02
- glsl-fs-shadow2d-03
- fs-shadow2d-red-01
- fs-shadow2d-red-02
- fs-shadow2d-red-03
NOTE: This is a candidate for the stable branches.
Various documentation mentions that "W" is handed to the WM stage,
but further digging seems to indicate that they really mean 1/W.
The code here is still unclear, but changing this fixes piglit
test "fragcoord_w" on Sandybridge as well as a Khronos ES2 conformance
test. I also tested 3DMarkMobile ES2.0's taiji and hoverjet demos, as
well as Nexuiz, just to be safe.
NOTE: This is a candidate for the 7.10 branch.
Fixes regressions caused by commit 9a21bc6401, namely GPU hangs when
running gnome-shell or compiz (Mesa bugs #35820 and #35853).
I incorrectly refactored the case that dealt with ARF_NULL; even in that
case, the source register needs to be changed to the MRF.
NOTE: This is a candidate for the 7.10 branch (if 9a21bc6401 is
cherry-picked, take this one too).
Branch emulation and loop unrolling are done in the GLSL frontend.
Transforming loops is no longer needed for fragment shaders, but it is still
necessary for vertex shaders.
Registers that are used inside of loops need to be considered live
starting with the first instruction of the outermost loop.
https://bugs.freedesktop.org/show_bug.cgi?id=34370
NOTE: This is a candidate for the 7.9 and 7.10 branches.
Reviewed-by: Eric Anholt <eric@anholt.net>
Oops, the mask was being used in the loop to determine whether to use
include the stencil || depth values. This began to fail when mask was
cleared at the beginning of the loop. So reorder the tests and do the
work up-front along with determining the depth_stencil value to use.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=35822
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
(some little changes by Marek Olšák)
Squashed commit of the following:
commit 737c0c6b7d591ac0fc969a7590e1691eeef0ce5e
Author: Luca Barbieri <luca@luca-barbieri.com>
Date: Fri Aug 27 02:13:57 2010 +0200
draw: disable SSE and PPC paths (use LLVM instead)
These paths don't support vertex clamping, and are anyway
obsoleted by LLVM.
If you want to re-enable them, add vertex clamping and test that it
works with the ARB_color_buffer_float piglit tests.
commit fed3486a7ca0683b403913604a26ee49a3ef48c7
Author: Luca Barbieri <luca@luca-barbieri.com>
Date: Thu Aug 26 18:27:38 2010 +0200
draw_llvm: respect vertex color clamp
commit ef0efe9f3d1d0f9b40ebab78940491d2154277a9
Author: Luca Barbieri <luca@luca-barbieri.com>
Date: Thu Aug 26 18:26:43 2010 +0200
draw: respect vertex clamping in interpreter path
Macro can lead to hard to debug list bugs. For instance consider
the following :
LIST_ADD(item, list->prev)
3 instruction of the macro became :
(list->prev)->next->prev = item
which is equivalent to :
list->prev = item
Thus list prev field changes and next instruction in the macro
(list->prev)->next = item
became :
item->next = item
And you endup with list corruption, other case lead to similar
list corruption. Inline function are not affected by this short
coming
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
When the condition
min_index == 0 && sizeof(ib[0]) == sizeof(draw_elts[0])
was true, we were wrongly ignoring istart and processing indices 0.
Reorder some statements to make the code easier to understand.
Now that we purposefully generate delta that point outside of the target
buffer, the assertion has outlived its usefulness.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Once more! This time without the unwarranted conversion from
drm_intel_bo_alloc_tiled.
Signed-off-by: [a very embarrassed] Chris Wilson <chris@chris-wilson.co.uk>
v2: Allocate the fences from a single shared buffer object.
v3: Allocate the r600_fence structs in blocks of 16.
Spin a few times before calling sched_yield in r600_fence_finish().
This should be the last bit of infrastructure changes before
generating GLSL IR for assembly shaders.
This commit leaves some odd code formatting in ir_to_mesa and brw_fs.
This was done to minimize whitespace changes / reindentation in some
loops. The following commit will restore formatting sanity.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Chad Versace <chad.versace@intel.com>
This array is going to be used in the main compiler soon. Leaving
them uniforms.c caused problems for building the stand-alone compiler.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Chad Versace <chad.versace@intel.com>
The hardware should be set according to this table:
FORMAT -> R300 COLORFORMAT
-------------------------
X16 -> UV88
X16Y16 -> ARGB8888
X32 -> ARGB8888
X32Y32 -> ARGB16161616
US_OUT_FMT must contain the real format.
I wasn't able to make B3G3R2 and L4A4 work, but those aren't important.
The vertex color clamp control is a property of an API,
a lot like gl_rasterization_rules.
The state should be set according to the API being implemented, for example:
OpenGL Compatibility: enabled by default
OpenGL Core: disabled by default
D3D11: always disabled
This patch also changes the way ARB_color_buffer_float is advertised.
If no SNORM or FLOAT render target is supported, fragment color clamping
is not required.
BTW this changes the gallium interface.
Some rather cosmetic changes by Marek.
Squashed commit of the following:
commit 513b37d484f0318311e84bb86ed4c93cdff71f13
Author: Luca Barbieri <luca@luca-barbieri.com>
Date: Thu Aug 26 18:17:54 2010 +0200
mesa/st: respect fragment clamping in st_DrawPixels
commit 546a31e42cad459d7a7a10ebf77fc5ffcf89e9b8
Author: Luca Barbieri <luca@luca-barbieri.com>
Date: Thu Aug 26 18:17:28 2010 +0200
mesa/st: support fragment and vertex color clamping
commit c406514a1fbee6891da4cf9ac3eebe4e4407ec13
Author: Luca Barbieri <luca@luca-barbieri.com>
Date: Tue Aug 24 21:56:37 2010 +0200
mesa/st: expose ARB_color_buffer_float if unclamping is supported
commit d0c5ea11b6f75f3da2f4ca989115f150ebc7cf8d
Author: Luca Barbieri <luca@luca-barbieri.com>
Date: Thu Aug 26 17:53:41 2010 +0200
mesa/st: use unclamped colors
This assumes that Gallium is to be interpreted as given drivers the
responsibility to clamp these colors if necessary.
commit aef5c3c6be6edd076e955e37c80905bc447f8a82
Author: Luca Barbieri <luca@luca-barbieri.com>
Date: Thu Aug 26 18:12:34 2010 +0200
mesa, mesa/st: handle read color clamping properly
We set IMAGE_CLAMP_BIT in the caller based on _ClampReadColor, where
the operation mandates it. (see the removed XXX comment. -Marek)
TODO: did I get the set of operations mandating it right?
commit 76bdfcfe3ff4145a1818e6cb6e227b730a5f12d8
Author: Luca Barbieri <luca@luca-barbieri.com>
Date: Thu Aug 26 18:18:25 2010 +0200
gallium: add color clamping to the interface
Squashed commit of the following:
Author: Marek Olšák <maraeo@gmail.com>
mesa: fix getteximage so that it doesn't clamp values
mesa: update the compute_version function
mesa: add display list support for ARB_color_buffer_float
mesa: fix glGet query with GL_ALPHA_TEST_REF and ARB_color_buffer_float
commit b2f6ddf907935b2594d2831ddab38cf57a1729ce
Author: Luca Barbieri <luca@luca-barbieri.com>
Date: Tue Aug 31 16:50:57 2010 +0200
mesa: document known possible deviations from ARB_color_buffer_float
commit 5458935be800c1b19d1c9d1569dc4fa30a97e8b8
Author: Luca Barbieri <luca@luca-barbieri.com>
Date: Tue Aug 24 21:54:56 2010 +0200
mesa: expose GL_ARB_color_buffer_float
commit aef5c3c6be6edd076e955e37c80905bc447f8a82
Author: Luca Barbieri <luca@luca-barbieri.com>
Date: Thu Aug 26 18:12:34 2010 +0200
mesa, mesa/st: handle read color clamping properly
(I'll squash the st/mesa part to a separate commit. -Marek)
We set IMAGE_CLAMP_BIT in the caller based on _ClampReadColor, where
the operation mandates it.
TODO: did I get the set of operations mandating it right?
commit 3a9cb5e59b676b6148c50907ce6eef5441677e36
Author: Luca Barbieri <luca@luca-barbieri.com>
Date: Thu Aug 26 18:09:41 2010 +0200
mesa: respect color clamping in texenv programs (v2)
Changes in v2:
- Fix attributes other than vertex color sometimes getting clamped
commit de26f9e47e886e176aab6e5a2c3d4481efb64362
Author: Luca Barbieri <luca@luca-barbieri.com>
Date: Thu Aug 26 18:05:53 2010 +0200
mesa: restore color clamps on glPopAttrib
commit a55ac3c300c189616627c05d924c40a8b55bfafa
Author: Luca Barbieri <luca@luca-barbieri.com>
Date: Thu Aug 26 18:04:26 2010 +0200
mesa: clamp color queries if and only if fragment clamping is enabled
commit 9940a3e31c2fb76cc3d28b15ea78dde369825107
Author: Luca Barbieri <luca@luca-barbieri.com>
Date: Wed Aug 25 00:00:16 2010 +0200
mesa: introduce derived _ClampXxxColor state resolving FIXED_ONLY
To do this, we make ClampColor call FLUSH_VERTICES with the appropriate
_NEW flag.
We introduce _NEW_FRAG_CLAMP since fragment clamping has wide-ranging
effects, despite being in the Color attrib group.
This may be easily changed by s/_NEW_FRAG_CLAMP/_NEW_COLOR/g
commit 6244c446e3beed5473b4e811d10787e4019f59d6
Author: Luca Barbieri <luca@luca-barbieri.com>
Date: Thu Aug 26 17:58:24 2010 +0200
mesa: add unclamped color parameters
Fixes piglit test glsl-vs-arrays-3 on Sandybridge, as well as garbage
rendering in 3DMarkMobileES 2.0's Taiji demo and GLBenchmark 2.0's
Egypt and PRO demos.
NOTE: This a candidate for stable release branches. It depends on
commit 9a21bc6401.
This was open-coded in three different places, and more are necessary.
Extract this into a function so it can be reused.
Unfortunately, not all variations were the same: in particular, one set
compression control and checked that the source register was not
ARF_NULL. This seemed like a good idea, so all cases now do so.
Eventually the miptree refcounting interface should be cleaned up.
The assymmetry dramatically increases the probability of bugs like
this. It should be made to like like libdrm refcounting or the
refcounting style used in other parts of Mesa.
Fixes https://bugs.freedesktop.org/show_bug.cgi?id=33046
Reviewed-by: Eric Anholt <eric@anholt.net>
Specifically, this ensures things like the front buffer actually exist. This
fixes piglt fbo/fbo-sys-blit and fd.o bug 35483.
Signed-off-by: Henri Verbeet <hverbeet@gmail.com>
GLSL 1.30 states clearly that only float and int are allowed, while the
GLSL ES specification's issues section states that sampler types may
take precision qualifiers.
Fixes compilation failures in 3DMarkMobileES 2.0 and GLBenchmark 2.0.
NOTE: This is a candidate for stable release branches.
Civilization 4's shaders make heavy use of gl_Color and don't use
perspective interpolation. This resulted in rivers, units, trees, and
so on being rendered almost entirely white. This is a regression
compared to the old fragment shader backend.
Found by inspection (comparing the old and new FS backend code).
References: https://bugs.freedesktop.org/show_bug.cgi?id=32949
NOTE: This is a candidate for the 7.10 branch.
Since GLSL IR allows multiple ir_variables to share the same name, we
need to generate unique names when printing the IR. Previously, we
always used %s@%p, appending the ir_variable's memory address.
While this worked, it had two drawbacks:
- When there aren't duplicates, the extra "@0x669a3e88" is useless
and makes the code harder to read.
- Real duplicates were hard to tell apart:
channel_expressions@0x6699e3c8 vs. channel_expressions@0x6699ddd8
We now append @2, @3, @4, and so on, but only where necessary to
distinguish duplicates. Since we only do this at print time, any
performance impact is irrelevant.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <idr@freedesktop.org>
While making some other changes in this area I was finding it annoying
each of these functions took mostly the same set of parameters in
differing orders.
'i' is already used for the outer loop. This caused some problems
while doing other work in this area. No bug exists here... until you
want to use the outer loop counter in the inner loop.
We were using typecasts because the functions pointers are polymorphic
in the second argument type, which.
Surprisingly the wrong calling convention didn't cause crashes on Windows,
but it was causing certain registers to be trashed in MSVC optimized
builds, when processing callists in the ClearView RC Flight Simulator.
I think the code ends up a lot more legible this way, though we've
still got the overloads in the fs_inst as well (even though there's
only one caller left currently).
If set to year X, only report extensions up to that year. This is a
work-around for games that try to copy the extensions string to a fixed
size buffer and overflow. If a game was released in year X, setting
MESA_EXTENSION_MAX_YEAR to that year will likely fix the problem.
We now use a 4-bit writemask for all instruction types, which makes it
easier to write generic helper functions to manipulte writemasks.
NOTE: This is a candidate for the 7.10 branch.
The code no longer supports otherwise -- it relies on buffers being
uploaded via u_upload_mgr -- so make this clear.
Also, there's no need to flush after draws from user buffers, given all
user content should have been copied by then.
Unsigned long is 32bit on several platforms (e.g., Windows), yielding
1UL << 32 to be zero.
Note that BITFIELD64_BIT result is often assigned to variables of type
GLbitfield, instead of GLbitfield64. That's probably wrong and should be
addressed in a later change.
It should have been a tip when the spec says "However, implicitly
sized arrays cannot be assigned to. Note, this is a rare case that
*initializers and assignments appear to have different semantics*."
(empahsis mine)
Fixes bugzilla #34367.
NOTE: This is a candidate for stable release branches.
Early Z support is set in the DST_VARS command. Hence split up static
state emission to avoid reissuing to much on fragment shader changes,
especially the costly dst buffer relocations.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The optimization loop won't reinsert noise instructions or quadop
vectors, so we were traversing the tree for nothing. Lowering vector
indexing was in the loop after do_common_optimization() to avoid the
work if it ended up that the index was actually constant, but that has
been called already in the core.
Most of the time we don't have a non-uniform struct variable in the
shader, so this cuts the time spent in do_structure_splitting during
glean texCombine by about 2/3.
This fixes an issue where the .obj files wound up in the src/
directory rather than the build/ directory. That prevented
combined 32-bit and 64-bit builds from working.
Signed-off-by: Brian Paul <brianp@vmware.com>
Additionally, to discarding the whole buffer, use
PIPE_TRANSFER_DISCARD_RANGE in pipe_buffer_write when the
write covers only part of the buffer.
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
In memory mapping buffer objects make use of
PIPE_TRANSFER_DISCARD_WHOLE_RESOURCE and PIPE_TRANSFER_DISCARD_RANGE
when appropriate.
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
It doesn't support them. Also, we shouldn't be
emitting CB_BLENDx_CONTROL on R600 as the regs don't
exist there, but I'm not sure of the best way to deal
with this in the current r600 winsys.
Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
This packet is required when updating the DB, CB,
or STRMOUT base addresses on rv6xx for the surface
sync logic to work correctly.
Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
This sort of worked because blend state setup cleared MULTIWRITE_ENABLE again,
but that's not something we want to depend on.
Signed-off-by: Henri Verbeet <hverbeet@gmail.com>
Disable Z_EXPORT / STENCIL_EXPORT / KILL_ENABLE again if a shader doesn't
use those. This is similar to 0a6f09a76a.
Signed-off-by: Henri Verbeet <hverbeet@gmail.com>
The idea behind this is that anything touching registers should be in
r600_state.c or evergreen_state.c. This is also consistent with
evergreen_pipe_shader_vs().
Signed-off-by: Henri Verbeet <hverbeet@gmail.com>
We need more and more of these, and it is difficult and prone to version
incompatability issues trying to single out every one of them.
This mimicks what was done in SCons.
The assignment on line 368, `tex_swizzles[i] = SWIZZLE_NOOP`, is rendered
dead by the reassignment on line 392.
Signed-off-by: Chad Versace <chad.versace@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This is necessary for GLSL 1.30+ shadow sampling functions, which return
a single float rather than splatting the value to a vec4 based on
GL_DEPTH_TEXTURE_MODE.
This was probably missed when implementing luminance and luminance alpha
render targets.
_mesa_get_format_bits checks for both GL_*_BITS and GL_TEXTURE_*_SIZE.
This fixes:
main/framebuffer.c:892: _mesa_source_buffer_exists: Assertion `....' failed.
The r300 compiler can eliminate unused uniforms and remap uniform locations
if their number surpasses hardware limits, so the limit is actually
NumParameters + NumUnusedParameters. This is important for some apps
under Wine to run.
Wine sometimes declares a uniform array of 256 vec4's and some Wine-specific
constants on top of that, so in total there is more uniforms than r300 can
handle. This was the main motivation for implementing the elimination
of unused constants.
We should allow drivers to implement fail & recovery paths where it makes
sense, so giving up too early especially when comes to uniforms is not
so good idea, though I agree there should be some hard limit for all drivers.
This patch fixes:
- glsl-fs-uniform-array-5
- glsl-vs-large-uniform-array
on drivers which can eliminate unused uniforms.
Avoid setting the same gpu register several times in a r600_pipe_state.
Compute the final value of the register and set that one time. This avoids
some overhead in r600_context_pipe_state_set().
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Signed-off-by: Henri Verbeet <hverbeet@gmail.com>
Well, not sure what exactly it is, but it certainly doesn't contain
the control flow stack, but vertex data.
Not sure about size, I've only seen the first few KiB written, but
the binary driver seems to allocate more.
Depth values are also written before the shader is executed, so if
early tests are enabled, fragments that failed the alpha test were
modifying the depth buffer, but they shouldn't.
The kernel drm takes care of all coherency as long as we don't forget
to submit all outstanding commands in the batchbuffer ...
Also move batchbuffer initialization up because otherwise transfers
for some helper textures fail with a segmentation fault.
And kill the dead code, flushes should now be correct everywhere.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The docs say it can be set for direct texture lookups, but even that
causes problems.
This fixes the wireframe bug:
https://bugs.freedesktop.org/show_bug.cgi?id=32688
NOTE: This is a candidate for the 7.9 and 7.10 branches.
(1, -_, ...) was converted to (-1, ...) because of the negation
in the second component.
Masking out the unused bits fixes this.
Piglit:
- glsl-fs-texture2d-branching
NOTE: This is a candidate for the 7.9 and 7.10 branches.
This gets one more piece of the pipeline onto the new codegen backend.
Once ARB_fragment_program can generate GLSL programs, we can nuke the
old backend.
This is like how we track FragmentProgram._Current for the computed
ARB fragment program for fixed function texenv, but this gives direct
access to the gl_shader_program for drivers to codegen from, skipping
ARB_fp.
This is a step towards providing a direct route for drivers accepting
GLSL IR for codegen. Perhaps more importantly, it runs the fixed
function fragment program through the GLSL IR optimization. Having
seen how easy it is to make ugly fixed function texenv code that can
do unnecessary work, this may improve real applicatinos.
It would be nice if we handled optimized uniform math like this in
some generic way, since people often end up doing uniform expressions
in shaders, but for now keep this hard-coded like it was in the
texenvprogram code.
For fixed function fragment processing in GLSL IR, we want to be able
to reference this state value. gl_* not explicitly permitted is
reserved, so using this variable name internally shouldn't be any
issue.
This is used in the upcoming fixed function shader_program generation,
and shader_program and ARB programs are together in this code until
both fragment and vertex ff get converted.
The drivers have been changed so that they behave as if all of the flags
were set. This is already implicit in most hardware drivers and required
for multiple contexts.
Some state trackers were also abusing the PIPE_FLUSH_RENDER_CACHE flag
to decide whether flush_frontbuffer should be called.
New flag ST_FLUSH_FRONT has been added to st_api.h as a replacement.
Since we're compiling/linking GLSL shaders we should check against
the shader uniform limits, not the legacy vertex/fragment program
parameter limits which are usually lower.
The gl_program_constants struct is for limits that are applicable to
any/all shader stages. Move the geometry shader-only fields into the
gl_constants struct.
Remove redundant MaxGeometryUniformComponents field too.
Need to flush rendering (or at least indicate that the rug might be getting
pulled out from underneath us) when a shader, buffer object or query object
is about to be deleted.
Also, this helps to tell the VBO module to unmap its current vertex buffer.
Benefits:
- spares us a relocation.
- needed for zone rendering (if that ever happens).
- just awesome.
v2: Rename the debug option. Completely disabling the blitter is
required for Y tiling to work, so this option will cover other
code paths in the future.
v3: Implement suggestions by Jakob Bornecrantz.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Flushing the batch/hw backend doesn't invalidate the derived state.
So kill the unnecessary function calls and add an assert in
emit_hardware_state for paranoia.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
With the new clear code this is possible (if the app call glClear
before drawing the first primitive).
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The polygon stipple fallback does not have to be implemented in the
draw module (it doesn't need window coords, etc). Drivers can use
this utility and avoid sw vertex fallbacks if pstipple is enabled.
Note: this is WIP and not used by any driver yet.
The gc var would be NULL if called from line 238. Instead, get
the opcode from __glXSetupForCommand(dpy) as done in other places.
And remove the unused gc parameter.
Fixes a bug reported by "John Doe" on 3/9/2011.
NOTE: This is a candidate for the 7.10 branch.
This fd gets passed in from outside, closing it causes the X.org server
to crap out when the driver doesn't identify the chipset.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Improves performance of a hacked-up scissor-many (to reuse a small set
of scissors instead of blowing out the cache, and then to run 100x
more iterations so it actually took some time) by 3.6% +/- 1.2% (n=10)
Add linear probing on collisions.
Expand entry array by a fixed scale (currently 2) to help avoid
collisions.
Use a LRU approach to ensure that the number of entries stored in the
cache doesn't exceed the requested size.
Do that later on when we set up the hwtnl state instead.
This addresses a problem when we drop the uploaded copy due to a vb
size change, it will remain referenced in svga->curr.vb[], and the
new contents of the vb will never be uploaded.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
The encoding/decoding algorithms are shared with RGTC.
Thanks to some magic with the base format, the RGTC texstore functions work
for LATC too.
swrast passes the related piglit tests besides two things:
- The alpha channel is wrong (it's always 1), however the incorrect alpha
channel makes some other tests fail too, so I guess it's unrelated to LATC.
- Signed LATC fetches aren't correct yet (signed values are clamped to [0,1]),
however RGTC has the same problem.
Further testing (with other of my patches) shows that hardware drivers
and softpipe work.
BTW, ETQW uses this extension.
st->user_vb[attr] was always pointing to the same user vb, regardless
of the value of attr. Together with reverting the temporary workaround
for bug 34378, and a fix in the svga driver, this fixes googleearth on svga.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
The signature list in a function must contain only ir_function_signature nodes.
The target of an ir_call must be an ir_function_signature.
These were added while trying to debug Mesa bugzilla #34203.
Instead of using the current gl_fragment_program. These aren't necessarily
the same, for example when translate_program() is called by
i915ValidateFragmentProgram().
this doesn't bind to drivers yet, just enough to in theory make indirect
work against other servers.
I'm really not sure what the rules for adding extensions to the known_gl_extensions list as it looks to be missing a few. are these GL extensions that have GLX
protocol??
Signed-off-by: Dave Airlie <airlied@redhat.com>
Comments about the deleted stuff:
- openaren hang: likely caused by the vertex corruptions, fixed by Jakob.
- tiling: Y-tiling works with my hw-clear branch. X-tiling works as
merged to master a while ago (execbuf2 version).
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
ARB_instanced_arrays is a subset of D3D9.
ARB_draw_instanced is a subset of D3D10.
The point of this change is to allow D3D9-level drivers to enable
ARB_instanced_arrays without ARB_draw_instanced.
If an array redeclaration includes an initializer, the initializer
would previously be dropped on the floor. Instead, directly apply the
initializer to the correct ir_variable instance and append the
generated instructions.
Fixes bugzilla #34374 and piglit tests glsl-{vs,fs}-array-redeclaration.
NOTE: This is a candidate for stable release branches. 0292ffb8 and
8e6cb9fe are also necessary.
Removed sampler view support for USCALED/SSCALED, the texture unit
refuses to convert to non-normalized float. The enums are treated
like UNORM.
Removed duplicate format related headers.
The hw doesn't like it - demos/shadowtex is broken. The emitted shader
isn't totally empty though, the depth write fixup gets emitted instead.
Maybe that one is somewhat fishy, too?
Idea for this patch from Jakob Bornecrantz.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This is an awful hack and will hurt performance on Ironlake, but we're
at a loss as to what's going wrong otherwise. This is the only common
variable we've found that avoids the problem on 4 applications
(CelShading, gnome-shell, Pill Popper, and my GLSL demo), while other
variables we've tried appear to only be confounding. Neither the
specifications nor the hardware team have been able to provide any
enlightenment, despite much searching.
https://bugs.freedesktop.org/show_bug.cgi?id=29172
Tested by: Chris Lord <chris@linux.intel.com> (Pill Popper)
Tested by: Ryan Lortie <desrt@desrt.ca> (gnome-shell)
Because the format can be changed to UNORM in a surface.
This fixes:
state_tracker/st_atom_framebuffer.c:163:update_framebuffer_state:
Assertion `framebuffer->cbufs[i]->texture->bind & (1 << 1)' failed.
Computation of the delta of this array from the last had a silly little
bug and ignored any initial delta==0 causing grief in Nexuiz and
friends.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
There is a silicon bug which causes unpredictable behaviour if the
URB_FENCE command should cross a cache-line boundary. Pad before the
command to avoid such occurrences. As this command only applies to
gen4/5, do the fixup unconditionally as the specs do not actually state
for which chip it was fixed (and the cost is negligible)...
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Previously, the rule deleted by this commit was matched every single
time (being the longest match). If not skipping, it used REJECT to
continue on to the actual correct rule.
The flex manual advises against using REJECT where possible, as it is
one of the most expensive lexer features. So using it on every match
seems undesirable. Perhaps more importantly, it made it necessary for
the #if directive rules to contain a look-ahead pattern to make them
as long as the (now deleted) "skip the whole line" rule.
This patch introduces an exclusive start state, SKIP, to avoid REJECTs.
Each time the lexer is called, the code at the top of the rules section
will run, implicitly switching the state to the correct one.
Fixes piglit tests 16384-consecutive-chars.frag and
16385-consecutive-chars.frag.
The expected result has been out of sync with what glcpp produces for
some time; glcpp's actual result seems to be correct and is very close to
GCC's cpp. Updating this will make it easier to catch regressions in
upcoming commits.
PIPE_BIND_CONSTANT_BUFFER alone was OK for nv50/nvc0, but nv30 will need
to be able to set others on certain chipsets.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
This commit is basically a copy-over of the fix
Chia-I Wu's commited to wayland:
http://cgit.freedesktop.org/wayland/wayland-demos/commit/?id=1b6c0ed95
"Workaround an xcb-dri2 bug.
xcb_dri2_connect_device_name generated by xcb-proto 1.6 is broken.
It only works when the length of the driver name is a multiple of 4."
This reverts commit 1f9a0a4e6e.
This caused trouble with Lightsmark w/ i965 driver and fbo/fbo-blit-d24s8
(see bug 34894). It's probably something simple but no time to debug now.
SNB has 64k urb space, we only use piece of them.
The more urb space we alloc,
the more concurrent vs threads we can run.
push the urb space usage to the limit.
Signed-off-by: Zou Nan hai <nanhai.zou@intel.com>
this fixes fbo-generatemipmap-formats rgtc and s3tc in NPOT mode
with softpipe.
r600g fails to even get level 0 correct so have to look into that
a bit further.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This was always converting to 8-bit per channel unsigned formats,
which isn't suitable for RGTC signed formats, this special cases
those two formats and converts to floats for those.
Signed-off-by: Dave Airlie <airlied@redhat.com>
SNORM needs a bit of work in the state tracker in order for mipmap
generation to work I believe.
I'm also not sure that having unorm fetches for an snorm format is
sane.
With signed types we weren't hitting this test however the comment
stating this doesn't happen often doesn't apply when using signed
types since an all 0 block is quite common which isn't abs min or max.
this fixes the limits correctly again also.
Signed-off-by: Dave Airlie <airlied@redhat.com>
if the values are all in the last dword, the high bits can be 0,
This fixes a valgrind warning I saw when playing with mipmaps.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Drivers can call this function as needed. It tells the VBO module to
always unmap the current glBegin/glEnd VBO when we flush. Otherwise
it's possible to be in a flushed state but still have the VBO mapped.
Among other benefits, parallel makes now work. Since many people have
parallel builds by default (via MAKEFLAGS environment variable), this
sames some irritation at release time...when there's usually not any
other irritation already.
In my last commit I introduced a build dependency upon a new libdrm.
Add the associated autoconf checks. As the headers are part of the core
libdrm, we need to bump that version and so may as well bump the chipset
specific versions simultaneously.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
With relaxed relocation checking in the kernel, we can specify a
negative delta (i.e. pointing outside of the target bo) in order to fake
a range in a large buffer. We only then need to upload the elements used
and adjust the buffer offset such that they correspond with the indices
used in the DrawArrays.
(Depends on libdrm 0209428b3918c4336018da9293cdcbf7f8fedfb6)
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
... and take advantage of start_vertex_bias to trim to [min_index,
max_index] where possible (i.e. when we need to upload all arrays).
Fixes half_float_vertex(misc.fillmode.wireframe)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=34595
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
When doing copy swapbuffers using drm, throttle on outstanding copy operations.
Introduces a new environment variable, EGL_THROTTLE_FENCES that the
user can use to indicate the desired number of outstanding swapbuffers, or
disable throttling using EGL_THROTTLE_FENCES=0.
This can and perhaps should be extended to the pageflip case as well, since
with some hardware pageflips can be pipelined. In case the pageflip syncs, the
throttle operation will be a no-op anyway.
Update copyright notices.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Use the pageflip ioctl when available.
Otherwise, or when the backbuffer contents need to be preserved,
fall back to a copy operation.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
The copy swap can be used when we need to preserve the contents of
the back buffer or when there is no way to do native page-flipping.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
This makes it usable also for native helpers.
Also add inline functions to access the context and to
uninit the native display structure.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
This is reliant on a drm patch that I posted on the list + a version bump.
These will appear in drm-next today.
Signed-off-by: Dave Airlie <airlied@redhat.com>
If the drm minor version is > 9 (i.e. whats in drm-next),
we enable s3tc + texture tiling by default now.
this changes R600_FORCE_TILING to R600_TILING which can
be set to false to disable tiling on working drm.
Signed-off-by: Dave Airlie <airlied@redhat.com>
nv50_resource is being called nv04_resource now temporarily, to avoid
a naming conflict with nouveau_resource from libdrm.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Modified from original to remove chipset-specific code, and to be decoupled
from the mm present in said drivers.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
By default the hardware rounds texcoords. However,
for point sampled textures, the expected behavior is
to truncate. When we have point sampled textures,
set the truncate bit in the sampler.
Should fix:
https://bugs.freedesktop.org/show_bug.cgi?id=25871
Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
No idea why I didn't do it like this the first time, but share
the code like other portions of mesa do using _tmp.h suffix
and some #defines for the types.
Signed-off-by: Dave Airlie <airlied@redhat.com>
The other draw stages like aaline and pstipple were already doing this.
If the driver used the aapoint stage but not the others it would crash
because of a null pipe->draw pointer.
It was only implemented in the swrast driver and probably not used by
any applications. A modern app would use a dependent/chained texture
lookup in the fragment shader.
when doing glCopyTex[Sub]Image() and checking the source buffer's
completeness.
We only need to determine FBO completeness when the status is indeterminate.
I removed the HiZ memory management, because the HiZ RAM is too small
and I also did it in hope that HiZ will be enabled more often.
This also sets aligned strides to HIZ_PITCH and ZMASK_PITCH.
This adds support for the RGTC unsigned and signed
texture storage and fetch methods.
the code is a port of the DXT5 alpha compression code.
Signed-off-by: Dave Airlie <airlied@redhat.com>
With an extremely dumb strategy. But it's the same i915c employs.
Also improve the hw_atom code slightly by statically specifying the
required batch space. For extremely variably stuff (shaders, constants)
it would probably be better to add a new parameter to the hw_atom->validate
function.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Also contains the first few bits for hw state atoms.
v2: Implement suggestion by Jakob Bornecrantz.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
v2: Add the batch bo to the libdrm validation lost, for otherwise
libdrm won't take previously used buffers into account.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Move it to i915_state_static.c This way i915_emit_state.c only emits
state and doesn't (re)calculate it.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This came out of discussion at the office today, and we agreed that
solving this for indirect wasn't really interesting, though the
server-side change would be of a similar level of difficulty.
If another thread bound a context to the drawable then unbound it, the
driContextPriv would end up NULL.
With the previous two fixes, this fixes glx-multithread-makecurrent-2,
despite the issue not being about the multithreaded makecurrent.
The driver only has one reasonable place to look for its context to
flush anything, which is the current context. Don't bother it with
having to check.
This extension allows a client to bind one context in multiple threads
simultaneously. It is then up to the client to manage synchronization of
access to the GL, just as normal multithreaded GL from multiple contexts
requires synchronization management to shared objects.
The old value, BRW_SAMPLER_MESSAGE_SIMD8_SAMPLE makes it sound like we're
doing a non-bias texture lookup. It has the same value as the new constant
BRW_SAMPLER_MESSAGE_SIMD8_SAMPLE_BIAS_COMPARE, so there should be no
functional changes.
There is an issue with gcc 4.6.0 that leads to segfault/assert with mesa
due to ureg_src size, reshuffling the structure member to better better
alignment work around the issue.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47893
7.9 + 7.10 candidate
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
so far only hw mipmap generation is testing on softpipe,
passes test added to piglit.
this requires another patch to mesa to let array textures mipmaps
even start to happen.
platform.system in SCons on Cygwin includes the OS version number.
Windows XP - CYGWIN_NT-5.1
Windows Vista - CYGWIN_NT-6.0
Windows 7 - CYGWIN_NT-6.1
Reduce all Cygwin platform variants to just 'cygwin' so anything
downstream can simply use 'cygwin' instead of the different full
platform names.
255.875 matches the hardware documentation. Presumably this was a typo.
NOTE: This is a candidate for the 7.10 branch, along with
commit 2bfc23fb86.
Reviewed-by: Eric Anholt <eric@anholt.net>
In the case where glBlitFramebuffer is being used to copy to a texture
without scaling it is faster if we can use the hardware to do a blit
rather than having to do a texture render. In most of the drivers
glCopyTexSubImage2D will use a blit so this patch makes it check for
when glBlitFramebuffer is doing a simple copy and then divert to
glCopyTexSubImage2D.
This was originally proposed as an extension to the common meta-ops.
However, it was rejected as using the BLT is only advantageous for Intel
hardware.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=33934
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Might be necessary if a block sneaks in somewhere, like a common
block for moves of phi sources after a loop break.
This is harmless and normally will be removed before emission.
In linear scan we can't allocate multiple values with different
live ranges at the same time to assign them consecutive regs.
Maybe we should just switch to graph coloring for all values ...
The svga_update_state() mechanism is inadequate as it will always end up
flushing the primitives before processing the SVGA_NEW_COMMAND_BUFFER
dirty state flag.
Fixes piglit/fbo-depth-sample-compare:
==14722== Invalid free() / delete / delete[]
==14722== at 0x4C240FD: free (vg_replace_malloc.c:366)
==14722== by 0x84FBBFD: intel_upload_unmap (intel_buffer_objects.c:695)
==14722== by 0x85205BC: brw_prepare_vertices (brw_draw_upload.c:457)
==14722== by 0x852F975: brw_validate_state (brw_state_upload.c:394)
==14722== by 0x851FA24: brw_draw_prims (brw_draw.c:365)
==14722== by 0x85F2221: vbo_exec_vtx_flush (vbo_exec_draw.c:389)
==14722== by 0x85EF443: vbo_exec_FlushVertices_internal (vbo_exec_api.c:543)
==14722== by 0x85EF49B: vbo_exec_FlushVertices (vbo_exec_api.c:973)
==14722== by 0x86D6A16: _mesa_set_enable (enable.c:351)
==14722== by 0x42CAD1: render_to_fbo (in /home/ickle/git/piglit/bin/fbo-depth-sample-compare)
==14722== by 0x42CEE3: piglit_display (in /home/ickle/git/piglit/bin/fbo-depth-sample-compare)
==14722== by 0x42F508: display (in /home/ickle/git/piglit/bin/fbo-depth-sample-compare)
==14722== Address 0xc606310 is 0 bytes after a block of size 18,720 alloc'd
==14722== at 0x4C244E8: malloc (vg_replace_malloc.c:236)
==14722== by 0x85202AB: copy_array_to_vbo_array (brw_draw_upload.c:256)
==14722== by 0x85205BC: brw_prepare_vertices (brw_draw_upload.c:457)
==14722== by 0x852F975: brw_validate_state (brw_state_upload.c:394)
==14722== by 0x851FA24: brw_draw_prims (brw_draw.c:365)
==14722== by 0x85F2221: vbo_exec_vtx_flush (vbo_exec_draw.c:389)
==14722== by 0x85EF443: vbo_exec_FlushVertices_internal (vbo_exec_api.c:543)
==14722== by 0x85EF49B: vbo_exec_FlushVertices (vbo_exec_api.c:973)
==14722== by 0x86D6A16: _mesa_set_enable (enable.c:351)
==14722== by 0x42CAD1: render_to_fbo (in /home/ickle/git/piglit/bin/fbo-depth-sample-compare)
==14722== by 0x42CEE3: piglit_display (in /home/ickle/git/piglit/bin/fbo-depth-sample-compare)
==14722== by 0x42F508: display (in /home/ickle/git/piglit/bin/fbo-depth-sample-compare)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=34604
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
This adds EXT_texture_array support to r600g, it passes the piglit
array-texture test but I suspect may not be complete.
It currently requires a kernel patch to fix the CS checker to allow
these, so you need to use R600_ARRAY_TEXTURE=true for now
to enable them.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This is because the HW doesn't always store a 1D array like a
2D texture, it more likely stores it like 2D texture (i.e.
alignments etc).
This means we upload each slice separately and let the driver
work out where to put it.
this might break nvc0 as I can't test it, I have only nv50 here.
Signed-off-by: Dave Airlie <airlied@redhat.com>
... and prefers a small batch whereas gen4+ prefer a large batch to
carry more state.
Tuning using openarena/padman indicate that a batch size of just 4096 is
best for those cases.
Bugzilla: https://bugs.freedesktop.org/process_bug.cgi
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
The spec says that the offsets in the vertex-fetch instructions need to be byte-aligned and makes no specification with regard to the required alignment of the offset and stride in the vertex resource constant register.
However, testing indicates that all three values need to be DWORD aligned.
Ptr can be very well NULL, so when there are two arrays, with one having
offset 0 (and thus NULL Ptr), and the other having a non-zero offset,
the non-zero value is taken as minimum (because of !low_addr ? start ...).
On 32-bit systems, this somehow works. On 64-bit systems, it leads to crashes.
Signed-off-by: Marek Olšák <maraeo@gmail.com>
255.875 matches the hardware documentation. Presumably this was a typo.
Found by inspection. Not known to fix any issues.
Reviewed-by: Eric Anholt <eric@anholt.net>
pixel_w is the final result; wpos_w is used on gen4 to compute it.
NOTE: This is a candidate for the 7.10 branch.
Reviewed-by: Eric Anholt <eric@anholt.net>
We can't safely use fixed size arrays since Gen6+ supports unlimited
nesting of control flow.
NOTE: This is a candidate for the 7.10 branch.
Reviewed-by: Eric Anholt <eric@anholt.net>
The code that generates MATH instructions attempts to work around
the hardware ignoring source modifiers (abs and negate) by emitting
moves into temporaries. Unfortunately, this pass coalesced those
registers, restoring the original problem. Avoid doing that.
Fixes several OpenGL ES2 conformance failures on Sandybridge.
NOTE: This is a candidate for the 7.10 branch.
Reviewed-by: Eric Anholt <eric@anholt.net>
Single-operand math already had these workarounds, but POW (the only two
operand function) did not. It needs them too - otherwise we can hit
assertion failures in brw_eu_emit.c when code is actually generated.
NOTE: This is a candidate for the 7.10 branch.
Reviewed-by: Eric Anholt <eric@anholt.net>
gl_PointSize (VERT_RESULT_PSIZ) doesn't take up a message register,
as it's part of the header. Without this fix, writing to gl_PointSize
would cause the SF to read and use the wrong attributes, leading to all
kinds of random looking failure.
Reviewed-by: Eric Anholt <eric@anholt.net>
If two buffers had the same stride where one buffer is a user one and
the other is a vbo, it was considered to be one interleaved buffer,
resulting in incorrect rendering and crashes.
This patch makes sure that the interleaved buffer is either user or vbo,
not both.
Needs to track this ourself since because we get into a race condition with
the dri_util.c code on make current when rendering to the front buffer.
This is what happens:
Old context is rendering to the front buffer.
App calls MakeCurrent with a new context. dri_util.c sets
drawable->driContextPriv to the new context and then calls the driver make
current. st/dri make current flushes the old context, which calls back into
st/dri via the flush frontbuffer hook. st/dri calls dri loader flush
frontbuffer, which calls invalidate buffer on the drawable into st/dri.
This is where things gets wrong. st/dri grabs the context from the dri
drawable (which now points to the new context) and calls invalidate
framebuffer to the new context which has not yet set the new drawable as its
framebuffers since we have not called make current yet, it asserts.
When clearing a GL_LUMINANCE_ALPHA buffer, for example, we need to convert
the clear color (R,G,B,A) to (R,R,R,A). We were doing this for texture border
colors but not renderbuffers. Move the translation function to st_format.c
and share it.
This fixes the piglit fbo-clear-formats test.
NOTE: This is a candidate for the 7.9 and 7.10 branches.
If finalizing a non-POW mipmapped texture with an odd-sized base texture
image we were allocating the wrong size of gallium texture (off by one).
Need to be more careful about computing the base texture image size.
This fixes https://bugs.freedesktop.org/show_bug.cgi?id=34463
Reducing the number of relocations has lots of nice knock-on effects,
not least including reducing batch buffer size, auxilliary array sizes
(vmalloced and copied into the kernel), processing of uncached
relocations etc.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Rather than waiting on the first batch after the last swapbuffers to be
retired, call into the kernel to wait upon the retirement of any request
less than 20ms old. This has the twofold advantage of (a) not blocking
any other clients from utilizing the device whilst we wait and (b) we
attain higher throughput without overloading the system.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
If the next vertex arrays are a (discontiguous) continuation of the
current arrays, such that the new vertices are simply offset from the
start of the current vertex buffer definitions we can reuse those
defintions and avoid the overhead of relocations and invalidations.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Using a temporary buffer for large discontiguous uploads into the common
buffer and a single buffered upload is faster than performing the
discontiguous copies through a mapping into the GTT.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Upload the non-vbo arrays into a single interleaved buffer object, and
so need to just emit a single vertex buffer relocation.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
If the user passed in several arrays interleaved in the same vbo, only
emit a single vertex buffer and relocation.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Track reuse of the vertex buffer objects and so minimise the number of
vertex buffers used by the hardware (and their relocations).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
As we now pack the indices into a common upload buffer, we can reuse a
single CMD_INDEX_BUFFER packet and translate each invocation with a
start vertex offset.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Move the tracking of the last emitted instructions into the core
batchbuffer routines and take advantage of the shadow batch copy to
avoid extra memory allocations and copies.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
It's faster. Not only is the memcpy more efficiently performed in the
kernel (making up for the system call overhead), but by not using mmap
we remove the greater overhead of tracking the vma of every batch.
And it means we can read back from the batch buffer without incurring
the cost of a uncached read through the GTT.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
As we use state relocations and we know that all the state belongs to
the same bo, we can drop the multiple references to the same bo.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
As we write directly into the batch in system memory, we do not need to
write first to the stack (as was to avoid read back through the GTT)
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
As we write directly into the batch in system memory, we do not need to
write first to the stack (as was to avoid read back through the GTT)
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
In preparation for a greater change, use the color_calc_state_bo already
provisioned for this purpose.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Rather than performing lots of little writes to update the common bo
upon each update, write those into a static buffer and flush that when
full (or at the end of the batch). Doing so gives a dramatic performance
improvement over and above using mmaped access.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Rather than performing a blit to completely overwrite a busy bo, simply
discard it and create a new one with the fresh data.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Dynamic arrays have the tendency to be small and so allocating a bo for
each one is overkill and we can exploit many efficiency gains by packing
them together.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Dynamic draw buffers are used by clients for temporary arrays and for
uploading normal vertex arrays. By keeping the data in memory, we can
avoid reusing active buffer objects and reallocate them as they are
changed. This is important for Sandybridge which can not issue blits
within a batch and so ends up flushing the batch upon every update, that
is each batch only contains a single draw operation (if using dynamic
arrays or regular arrays from system memory).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Fix a typo which meant that --enable-shared-glapi didn't actually cause a shared glapi to be built
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
This behavior was last when moving the transfers to the contexts.
This fixes several piglit failures, which were reading the color renderbuffer
before the draw operations were emitted.
This is a temporary workaround. It fixes sauerbrauten with shaders enabled.
I guess we might be changing vertex attribs somewhere and not updating
the appropriate dirty flags, therefore we can't rely on them for now.
Or maybe we need to make this state dependent on some other flags too.
More info:
https://bugs.freedesktop.org/show_bug.cgi?id=34378
When we drop the in_swtnl_draw flag, we must force a rerun of
update_need_swtnl to reset the need_swtnl flag to its correct value outside
of a swtnl vbo draw.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
If we hit the pipe_get/put_tile() path for setting up the glCopyPixels
texture we were passing the wrong x/y position to pipe_get_tile().
The x/y position was already accounted for in the pipe_get_transfer()
call so we were effectively reading from 2*readX, 2*readY.
NOTE: This is a candidate for the 7.9 and 7.10 branches.
Addresses excessive TEMP allocation in vertex shaders where all CONSTs are
stored into TEMPS at the start, but copy propagation was failing due to
the presence of IFs.
We could do something about loops, but ifs are easy enough.
This enables the egl_dri2 driver to load swrast driver
for software rendering. It could be used when hardware
dri2 drivers are not available, such as in VM.
Signed-off-by: Haitao Feng <haitao.feng@intel.com>
Some cards don't appear to work correctly with the UNORM type,
so switch to the integer type, however since gallium has no
integer types yet from what I can see we need to do a hack to
workaround it for the blitter.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This is a workaround for a bug in libtxc_dxtn.
Fixes:
- piglit/GL_EXT_texture_compression_s3tc/fbo-generatemipmap-formats
Signed-off-by: Dave Airlie <airlied@redhat.com>
Arrays are zero based. If the highest element accessed is 6, the
array needs to have 7 elements.
Fixes piglit test glsl-fs-implicit-array-size-03 and bugzilla #34198.
NOTE: This is a candidate for the 7.9 and 7.10 branches.
Fixes regression: https://bugs.freedesktop.org/show_bug.cgi?id=34160
Commit e7c1f058d1 disabled constant-folding
when division-by-zero occured. This was a mistake, because the spec does
allow division by zero. (From section 5.9 of the GLSL 1.20 spec: Dividing
by zero does not cause an exception but does result in an unspecified
value.)
For floating-point division, the original pre-e7c1f05 behavior is
reinstated.
For integer division, constant-fold 1/0 to 0.
This reverts commit b3cf92aa91.
The reverted commit prevented constant-folding of reciprocal expressions
when the reciprocated expression was 0. However, since the spec allows
division by zero, constant-folding *is* permissible in this case.
From Section 5.9 of the GLSL 1.20 spec:
Dividing by zero does not cause an exception but does result in an
unspecified value.
Before populating the vertex buffer attribute pointer (VB->AttribPtr[]),
convert vertex data in GL_FIXED format to GL_FLOAT.
Fixes bug: http://bugs.freedesktop.org/show_bug.cgi?id=34047
NOTE: This is a candidate for the 7.9 and 7.10 branches.
This is a multi-threading optimization which hides the kernel overhead
behind a thread. It improves performance in CPU-limited apps by 2-15%.
Of course you must have at least 2 cores for it to make any difference.
It can be disabled with:
export RADEON_THREAD=0
On r600, s3tc formats require a 1D tiled texture format,
so we have to do uploads using a blit, via the 64-bit and 128-bit formats
Based on the r600c code we use a 64 and 128-bit type to do the
blits.
Still requires R600_ENABLE_S3TC until the kernel fixes are in,
this has only been tested on evergreen where the kernel doesn't
yet get in the way.
the miptree setup and pitch storing didn't work so well for block
based things like compressed textures. The CB takes blocks, where
the texture sampler takes pixels, and transfers need bytes,
So now we store blocks/bytes and translate to pixels in the sampler.
This is necessary for s3tc to work properly.
If the underlying transfer had a stride wider for hw alignment reasons,
the mipmap generation would generate badly strided images.
this fixes a few problems I found while testing r600g with s3tc
Signed-off-by: Dave Airlie <airlied@redhat.com>
Because an app may do something like this:
while (!(ptr = bo_map(..., DONT_BLOCK))) {
/* Do some other work. */
}
And it would be looping endlessly if we didn't flush.
If EXT_draw_buffers2 is supported but ARB_draw_buffers_blend isn't
_mesa_BlendFuncSeparateEXT only sets up the blend equation and function for the
first render target. This patch makes sure that update_blend doesn't try to use
the data from other rendertargets in such cases.
Signed-off-by: Brian Paul <brianp@vmware.com>
The vertex arrays state should be set only when (_NEW_ARRAY | _NEW_PROGRAM)
is dirty. This assumes user buffer content is mutable, which will be
sorted out in the next commit. The following usage case should be much faster
now:
for (i = 0; i < 1000; i++) {
glDrawElements(...);
}
Or even:
for (i = 0; i < 1000; i++) {
glSomeStateChangeOtherThanArraysOrProgram(...);
glDrawElements(...);
}
The performance increase from this may be significant in some apps and
negligible in others. It is especially noticable in the Torcs game (r300g):
Before: 15.4 fps
After: 20 fps
Also less looping over attribs in st_draw_vbo yields slight speed-up
in apps with lots of glDraw* calls.
but really wtf? all these PCI IDs need to be ripped out of here, its totally
unscalable and the drivers already have this info so could export it some better way.
tested by Darxus on #wayland.
This an adds --enable-shared-dricore option to configure. When enabled,
DRI modules will link against a shared copy of the common mesa routines
rather than statically linking these.
This saves about 30MB on disc with a full complement of classic DRI
drivers.
v2: Only enable with a gcc-compatible compiler that handles rpath
Handle DRI_CFLAGS without filter-out magic
Build shared libraries with the full mklib voodoo
Fix typos
v3: Resolve conflicts with talloc removal patches
Signed-off-by: Christopher James Halse Rogers <christopher.halse.rogers@canonical.com>
Track variables, functions, and types during parsing. Use this
information in the lexer to return the currect "type" for identifiers.
Change the handling of structure constructors. They will now show up
in the AST as constructors (instead of plain function calls).
Fixes piglit tests constructor-18.vert, constructor-19.vert, and
constructor-20.vert. Also fixes bugzilla #29926.
NOTE: This is a candidate for the 7.9 and 7.10 branches.
This requires lexical disambiguation between variable and type
identifiers (as most C compilers do).
Signed-off-by: Keith Packard <keithp@keithp.com>
NOTE: This is a candidate for the 7.9 and 7.10 branches.
add support for the 32-bit types, also fixup the
export setting to handle types with channels > 11 bits properly
Signed-off-by: Dave Airlie <airlied@redhat.com>
Based on Dave's branch.
The majority of this commit is a cleanup, mainly renaming things.
There wasn't much code to import, just ioctl calls.
Also done:
- implemented unsynchronized bo_map (important optimization!)
- radeon_bo_is_referenced_by_cs is no longer a refcount hack
- dropped the libdrm_radeon dependency
I'm surprised that this has resulted in less code in the end.
Previously the SNE and SEQ instructions would calculate the partial
result to the destination register. This would cause problems if the
destination register was also one of the source registers.
Fixes piglit tests glsl-fs-any, glsl-fs-struct-equal,
glsl-fs-struct-notequal, glsl-fs-vec4-operator-equal,
glsl-fs-vec4-operator-notequal.
NOTE: This is a candidate for the 7.9 and 7.10 branches.
If the formats don't match we need to update the surface with the new
format.
if we can render to SRGB formats, enable the extension
Signed-off-by: Dave Airlie <airlied@redhat.com>
Fixes a regression from commit 5cbff0932e.
The problem is *some* glDrawPixels fragment programs need to be deleted,
but not all. Use an explicit flag to indicate whether or not the program
needs to be deleted.
This should fix http://bugs.freedesktop.org/show_bug.cgi?id=34049
From section 5.9 of the GLSL 1.20 spec:
The operator modulus (%) is reserved for future use.
From section 5.8 of the GLSL 1.20 spec:
The assignments modulus into (%=), left shift by (<<=), right shift by
(>>=), inclusive or into ( |=), and exclusive or into ( ^=). These
operators are reserved for future use.
The GLSL ES 1.00 spec and GLSL 1.10 spec have similiar language.
Fixes bug:
https://bugs.freedesktop.org//show_bug.cgi?id=33916
Fixes Piglit tests:
spec/glsl-1.00/compiler/arithmetic-operators/modulus-00.frag
spec/glsl-1.00/compiler/assignment-operators/modulus-assign-00.frag
spec/glsl-1.10/compiler/arithmetic-operators/modulus-00.frag
spec/glsl-1.10/compiler/assignment-operators/modulus-assign-00.frag
spec/glsl-1.20/compiler/arithmetic-operators/modulus-00.frag
spec/glsl-1.20/compiler/assignment-operators/modulus-assign-00.frag
Fixes a minor memory leak with the "engine" mesa demo.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Signed-off-by: Brian Paul <brianp@vmware.com>
Make sure we unreference the vertex buffer pointers in a local array.
This fixes huge vertex buffer / memory leaks in mesa demos "fire" and "engine".
NOTE: This is a candidate for the 7.9 and 7.10 branches.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Relative addressing of constant buffers can't work properly through the
kcache, since you can only address within the currently locked kcache window.
Instead, this patch binds the constant buffer as a shader resource, and then
explicitly fetches the constant using a vertex fetch with fetch type
VTX_FETCH_NO_INDEX_OFFSET from the shader. There's probably still some room
for improvement, doing the fetch right before the instruction that needs the
value may not be quite optimal for example.
The r600_bc_alu_src structure is used in two different ways, as a vector and
for the individual channels of that same vector. This is somewhat fragile,
and probably confusing.
If the client hasn't done the initial wl_display_iterate() at the time
we initialize the display, we have to do that in platform_wayland.c.
Make sure we detect that correctly instead of dup()ing fd=0, and use
the sync callback to make sure we don't wait forever for authorization that
won't happen.
This code has originally matured in r300g and was ported to r600g several
times. It was obvious it's a code duplication.
See also comments in the header file.
The scheduler and the register allocator are not good enough yet to deal
with the effects of the register rename pass. This was causing a 50%
performance drop in Lightsmark. The pass can be re-enabled once the
scheduler and the register allocator are more mature. r300 and r400
still need this pass, because it prevents a lot of shaders from using
too many texture indirections.
NOTE: This is a candidate for the 7.10 branch.
This adds i965 support for GL_EXT_framebuffer_sRGB, it introduces a new
constant to say that the driver can support sRGB enabled FBOs since enabling
the extension doesn't mean the driver can actually support sRGB.
Also adds the suggested state flush in the core code suggested by Brian.
fix the ARB_fbo color encoding.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Querying index zero is not an error in OpenGL ES 2.0.
Querying an index larger than the value returned by
GL_MAX_VERTEX_ATTRIBS is an error in all APIs.
Fixes bugzilla #32375.
This patch cleans up many of the extra copies in GLSL IR introduced by
i965's scalarizing passes. It doesn't result in a statistically
significant performance difference on nexuiz high settings (n=3) or my
demo (n=10), due to brw_fs.cpp's register coalescing covering most of
those extra moves anyway. However, it does make the debug of wine's
GLSL shaders much more tractable, and reduces instruction count of
glsl-fs-convolution-2 from 376 to 288.
The type of u_get_transfer_vtbl of the usage argument in u_transfer.h is
unsigned and not enum pipe_transfer_usage. This patch changes the type
of usage to unsigned to match the prototype in the header file.
Standard library functions in C++ are in the std namespace. When using
C++-style header files for the standard library, some compilers, such as
Sun Studio, provide symbols only for the std namespace and not for the
global namespace.
This patch adds using statements for standard library functions. Another
option could have been to prepend standard library function calls with
'std::'.
This patch fixes several compilation errors with Sun Studio.
This just adds a flag to create the texture without doing any
flushing to it. Flushing occurs in the draw function. This avoids
unnecessary flushes when we end up rebinding a CB/DB/texture due
to the blitter just restoring state.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Before, the set_sampler_views() and restore_sampler_views() functions
used MAX2(old,new) to tell the driver how many samplers or sampler
views to set. This could result in cases such as:
pipe->set_fragment_sampler_views(pipe, 4, views={foo, bar, NULL, NULL})
Many/most gallium drivers would take this as-is and set
ctx->num_sampler_views=4 and ctx->sampler_views={foo, bar, NULL, NULL, ...}.
Later, loops over ctx->num_sampler_views would have to check for null
pointers. Worse, the number of sampler views and number of sampler CSOs
could get out of sync:
ctx->num_samplers = 2
ctx->samplers = {foo, bar, ...}
ctx->num_sampler_views = 4
ctx->sampler_views={Foo, Bar, NULL, NULL, ...}
So loops over the num_samplers could run into null sampler_views pointers
or vice versa.
This fixes a failed assertion in the SVGA driver when running the Mesa
engine demo in AA line mode (and possibly other cases).
It looks like all gallium drivers are careful to unreference views
and null-out sampler CSO pointers for the units beyond what's set
with the pipe::bind_x_sampler_states() and pipe::set_x_sampler_views()
functions.
I'll update the gallium docs to explain this as well.
This new interface could set up context for OpenGL,
OpenGL ES1 and OpenGL ES2. It will be used by egl_dri2
driver.
Signed-off-by: Haitao Feng <haitao.feng@intel.com>
Leak was introduced when fixing strict aliasing violation in this code:
the reference counting was preserved, but the destructor call on zero
reference count was not.
Exactly one half would be the ideal, but this is a soft limit, and one
more byte over brings us to synchronous behavior.
Flushing when the referred GMR exceeds one third of the aperture gives us
statistically better performance.
Certain mingw32 cross compilers (e.g. RedHat's) defaults to use DLL gcc
runtime.
Given the main deliverable from this project are self-contained drivers,
which are loaded by any application, this dependency can cause havoc.
If we get a sw accessible buffer like the S8 texture we end up
doing depth tracking on it when there is no need since we won't
ever bind it to the hardware. This leads to a sw fallback in the
transfer destruction which leads to and endless recusion loop
of fail in transfer destroy.
Signed-off-by: Dave Airlie <airlied@redhat.com>
this adds a flag to keep track of whether the depth texture structure
is the flushed texture or not, so we can avoid doing flushes when
we do a hw rendering from one to the other.
it also renames flushed to dirty_db which tracks if the DB copy
has been dirtied by being bound to the hw.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This consolidates the code duplicated between the fragment sampler
and vertex sampler functions. Plus, it'll make adding support for
geometry shader samplers trivial.
Before we were looping to nr_samplers, which is the number of fragment
samplers, not vertex samplers.
NOTE: This is a candidate for the 7.9 and 7.10 branches.
For example, this now raises an error:
#define XXX 1 / 0
Fixes bug: https://bugs.freedesktop.org//show_bug.cgi?id=33507
Fixes Piglit test: spec/glsl-1.10/preprocessor/modulus-by-zero.vert
NOTE: This is a candidate for the 7.9 and 7.10 branches.
Avoid division-by-zero when constant-folding the following expression
types:
ir_unop_rsq
ir_binop_div
ir_binop_mod
Fixes bugs:
https://bugs.freedesktop.org//show_bug.cgi?id=33306https://bugs.freedesktop.org//show_bug.cgi?id=33508
Fixes Piglit tests:
glslparsertest/glsl2/div-by-zero-01.frag
glslparsertest/glsl2/div-by-zero-02.frag
glslparsertest/glsl2/div-by-zero-03.frag
glslparsertest/glsl2/modulus-zero-01.frag
glslparsertest/glsl2/modulus-zero-02.frag
NOTE: This is a candidate for the 7.9 and 7.10 branches.
Do not constant-fold a reciprocal if any component of the reciprocated
expression is 0. For example, do not constant-fold `1 / vec4(0, 1, 2, 3)`.
Incorrect, previous behavior
----------------------------
Reciprocals were constant-folded even when some component of the
reciprocated expression was 0. The incorrectly applied arithmetic was:
1 / 0 := 0
For example,
1 / vec4(0, 1, 2, 3) = vec4(0, 1, 1/2, 1/3)
NOTE: This is a candidate for the 7.9 and 7.10 branches.
This was my mistake when converting from talloc to ralloc. I was
confused because the other calls in the function are to asprintf_append
and the original code used str as the context rather than NULL.
Fixes bug #33823.
Previously a register would be marked as available if any component
was written. This caused shaders such as this:
0: TEX TEMP[0].xyz, INPUT[14].xyyy, texture[0], 2D;
1: MUL TEMP[1], UNIFORM[0], TEMP[0].xxxx;
2: MAD TEMP[2], UNIFORM[1], TEMP[0].yyyy, TEMP[1];
3: MAD TEMP[1], UNIFORM[2], TEMP[0].zzzz, TEMP[2];
4: ADD TEMP[0].xyz, TEMP[1].xyzx, UNIFORM[3].xyzx;
5: TEX TEMP[1].w, INPUT[14].xyyy, texture[0], 2D;
6: MOV TEMP[0].w, TEMP[1].wwww;
7: MOV OUTPUT[2], TEMP[0];
8: END
to produce incorrect code such as this:
BEGIN
DCL S[0]
DCL T_TEX0
R[0] = MOV T_TEX0.xyyy
U[0] = TEXLD S[0],R[0]
R[0].xyz = MOV U[0]
R[1] = MUL CONST[0], R[0].xxxx
R[2] = MAD CONST[1], R[0].yyyy, R[1]
R[1] = MAD CONST[2], R[0].zzzz, R[2]
R[0].xyz = ADD R[1].xyzx, CONST[3].xyzx
R[0] = MOV T_TEX0.xyyy
U[0] = TEXLD S[0],R[0]
R[1].w = MOV U[0]
R[0].w = MOV R[1].wwww
oC = MOV R[0]
END
Note that T_TEX0 is copied to R[0], but the xyz components of R[0] are
still expected to hold a calculated value.
Fixes piglit tests draw-elements-vs-inputs, fp-kill, and
glsl-fs-color-matrix. It also fixes Meego bugzilla #13005.
NOTE: This is a candidate for the 7.9 and 7.10 branches.
Also return it as the correct type. Previously the whole array would
be returned and each element would be expanded to a vec4.
Fixes piglit test getuniform-01 and bugzilla #29823.
Passing ralloc_vasprintf_append a 0-byte allocation doesn't work. If
passed a non-NULL argument, ralloc calls strlen to find the end of the
string. Since there's no terminating '\0', it runs off the end.
Fixes a crash introduced in 14880a510a.
If we see a MACRO bit on r600g its 2D tiled,
if don't see a MACRO bit and we do see a MICRO bit then its 1D tiled.
Signed-off-by: Dave Airlie <airlied@redhat.com>
If the tile type for the buffer is 1 then its been bound to the
DB at some point, we need to decompress it, otherwise its only
been bound as texture/cb so don't do anything.
This fixes 5 piglit tests here on r600g.
this just adds the ioctl interface and sets the tile type
and array mode in the correct place.
This seems to bring eg 1D tiling to the same level, and issues
as on r600. No idea how to address 2D yet.
Previously we'd happily compile GLSL 1.30 shaders on any driver. We'd
also happily compile GLSL 1.10 and 1.20 shaders in an ES2 context.
This has been a long standing FINISHME in the compiler.
NOTE: This is a candidate for the 7.9 and 7.10 branches
Rather than passing "True", pass a bitfield describing the particular
variant's features - either projection or offset.
This should make the code a bit more readable ("Proj" instead of "True")
and make it easier to support offsets in the future.
This annotation is for an "in" function parameter for which it is only legal
to pass constant expressions. The only known example of this, currently,
is the textureOffset functions.
This should never be used for globals.
Having these as actual integer values makes it difficult to implement
the texture*Offset built-in functions, since the offset is actually a
function parameter (which doesn't have a constant value).
The original rationale was that some hardware needs these offset baked
into the instruction opcode. However, at least i965 should be able to
support non-constant offsets. Others should be able to rely on inlining
and constant propagation.
Since the introduction of ir_var_system_value, system variables would be
printed as "temporary" and temporaries would result in out-of-bounds
array access, showing up as garbage in printed IR.
SVGA3D only supports SGN for vertex shaders, and this requires two additional
temporary registers for intermediate results.
For fragment shaders, lower to two CMPs and one ADD.
The extra length is the size of the request *minus* the size of the
VendorPrivate header, not the addition.
NOTE: This is a candidate for the 7.9 and 7.10 branches
Signed-off-by: Julien Cristau <jcristau@debian.org>
Signed-off-by: Brian Paul <brianp@vmware.com>
xGLXChangeDrawableAttributesSGIXReq follows the GLXVendorPrivate header
with a drawable, number of attributes, and list of (type, value)
attribute pairs. Don't forget to put the number of attributes in there.
I don't think this can ever have worked.
NOTE: This is a candidate for the 7.9 and 7.10 branches
Signed-off-by: Julien Cristau <jcristau@debian.org>
Signed-off-by: Brian Paul <brianp@vmware.com>
Like on some r5xx, there are multiple DB backends on the r600,
we need to add up the query results from each of these to get the
final correct value.
So far I'm not 100% sure how to calculate the num_db, value
setting it to 4 should be harmless enough until we do.
This fixes occulsion_query piglit test on my rv740.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Although CUBE is a reduction inst, it writes to more than just PV.X
so we need to keep the dst channel.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This only works on r600/r700 so far, evergreen doesn't appear
to have the multiwrite enable bit in the color control, so we
may have to actually do a shader rewrite on EG hardware.
remove some duplicate code reg defines also.
Signed-off-by: Dave Airlie <airlied@redhat.com>
I know Jerome will probably rewrite the way depth textures work sometime
soon. For the time being this should at least make common depth texture usage
for shadowing work properly though.
When the paint is opaque (currently, solid color with alpha 1.0f), no
blending is needed for VG_BLEND_SRC_OVER. This eliminates the serious
performance hit introduced by 859106f196
for a common scenario.
Swizzles are now defined everywhere as a field with 12 bits that contains
4 channels worth of meaningful information. Any channel that is unused is
set to RC_SWIZZLE_UNUSED. This change is necessary because rgb instructions
and alpha instructions were initializing channels that would never be used
(channel 3 for rgb and channels 1-3 for alpha) with 0 (aka RC_SWIZZLE_X).
This made it impossible to use generic helper functions for swizzles,
because sometimes a channel value of 0 meant unused and other times it
meant RC_SWIZZLE_X.
All hacks that tried to guess how many channels were relevant have
also been removed.
1) Only translate the [min_index, max_index] range.
2) Upload translated vertices via the uploader.
3) Rename valid_vertex_buffer[] to real_vertex_buffer[]
Only upload the [min_index, max_index] range instead of [0, userbuf_size].
This an important optimization.
Framerate in Lightsmark:
Before: 22 fps
After: 75 fps
The same optimization is already in r300g.
I can't see a performance difference with this code, which means all
the driver-specific code removed in this commit was unnecessary.
Now we use u_upload_mgr in a slightly different way than we did before it got
dropped. I am not restoring the original code "as is" due to latest
u_upload_mgr changes that r300g performance benefits from.
This also fixes:
- piglit/fp-kil
When an app loads libEGL.so dynamically with RTLD_LOCAL, loading DRI
drivers would fail because of missing glapi symbols. This commit makes
egl_dri2 load libglapi.so with RTLD_GLOBAL to export glapi symbols for
future symbol resolutions.
The same trick can be found in GLX. However, egl_dri2 can only do so
when --enable-shared-glapi is given. Because, otherwise, both libGL.so
and libglapi.so define glapi symbols and egl_dri2 cannot tell which
library to load.
When the user sets EGL_DRIVER to egl_dri2 (or egl_glx), make sure the
built-in driver is used. The user might leave the outdated egl_dri2.so
(or egl_glx.so) on the filesystem and we do not want to load it.
makedepend would crash when a source includes a header indirectly, such
as
#define HEADER "some-header.h"
#include HEADER
Do not define HEADER (makedepend would detects this as an incomplete
include) and add the dependency manually in the Makefile.
This should hopefully fix bug #33374.
For 1D/2D texture arrays use the pipe_resource::array_size field.
In OpenGL 1D arrays texture use the height dimension as the array
size and 2D array textures use the depth dimension as the array size.
Gallium uses a special array_size field instead. When setting up
gallium textures or comparing Mesa textures to gallium textures we
need to be extra careful that we're comparing the right fields.
The new st_gl_texture_dims_to_pipe_dims() function maps OpenGL
texture dimensions to gallium texture dimensions and simplifies
this quite a bit.
This reverts commit d3df641f0a.
The original commit had sat unpushed on my machine for months. By the
time I found it again, I had forgotten that we had decided not to use
this change after all, (the relevant test was removed long ago).
The GLSL specification is vague here, (just says "as is standard for
C++"), though the C specifications seem quite clear that this should
be an error.
However, an existing piglit test (CorrectPreprocess11.frag) expects
this to be a warning, not an error, so we change this, and document in
README the deviation from the specification.
So that 'foo' can be found in: OPTION=prefixfoosuffix,foo
Also allow that debug options can be separated by a non-alphanumeric characters
instead of just commas.
This drops the memblock manager for ZMASK. Instead, only one zbuffer can be
compressed at a time. Note that this does not necessarily have to be slower.
When there is a large number of zbuffers, compression might be used more often
than it was before. It's also easier to debug.
How it works:
1) 'clear' turns the compression on.
2) If some other zbuffer is set or the currently-bound zbuffer is used
for texturing, the driver decompresses it and then turns the compression off.
Notes:
- The ZMASK clear has been refactored, so that only one packet3 is used to clear
ZMASK.
- The 8x8 compression mode is disabled. I couldn't make it work without issues.
- Also removed driver-specific stuff from u_blitter.
Driver status:
- RV530 and R580 appear to just work (finally).
- RV570 should work, but there may be an issue that we don't correctly
calculate the number of dwords to clear, resulting in a partially
uninitialized zbuffer.
- RS690 misrenders as if no ZMASK clear happened. No idea what's going on.
- RV350 may even hardlock. This issue was already present and this patch doesn't
fix it.
I think we are still missing some hardware info we need to make the zbuffer
compression work fully.
Note that there is also an issue with HiZ, resulting in a sort of blocky
zigzagged corruption around some objects.
From the AMD_conservative_depth spec:
If gl_FragDepth is redeclared in any fragment shader in a program, it
must be redeclared in all fragment shaders in that program that have
static assignments to gl_FragDepth. All redeclarations of gl_FragDepth in
all fragment shaders in a single program must have the same set of
qualifiers.
When AMD_conservative_depth is enabled:
* Let 'layout' be a token.
* Extend the production rule of layout_qualifier_id to process the tokens:
depth_any
depth_greater
depth_less
depth_unchanged
Let's assume there are two options with names such that one is a substring
of another. Previously, if we only specified the longer one as a debug option,
the shorter one would be considered specified as well (because of strstr).
This commit fixes it by checking that each option is surrounded by commas.
(a regexp would be nicer, but this is not a performance critical code)
Update the max_array_access of a global as functions that use that
global are pulled into the linked shader.
Fixes piglit test glsl-fs-implicit-array-size-01 and bugzilla #33219.
NOTE: This is a candidate for the 7.9 and 7.10 branches.
Previously only global arrays with implicit sizes would be patched.
This causes all arrays that are actually accessed to be sized.
Fixes piglit test glsl-fs-implicit-array-size-02.
NOTE: This is a candidate for the 7.9 and 7.10 branches.
emit_adjusted_wpos() needs separate x,y translation values. If we
invert Y, we don't want to effect X.
Part of the fix for http://bugs.freedesktop.org/show_bug.cgi?id=26795
NOTE: This is a candidate for the 7.9 and 7.10 branches.
The set of internalFormat parameters accepted by glRenderBufferStorage
depends on the EXT vs. ARB version of framebuffer_object. The later
added support for GL_ALPHA, GL_LUMINANCE, etc. formats. Note that
these formats might be legal but might not be supported. That should
be checked with glCheckFramebufferStatus().
This reverts commit 65c41d55a0.
There really are quite a few differences in the set of internal
formats allowed by glTexImage and glRenderbufferStorage.
Per the spec, all OpenVG handles are 32-bit. We can't just cast them
to/from integers on 64-bit systems.
Start fixing that mess by introducing a set of handle/pointer conversion
functions in handle.h. The next step is to implement a handle/pointer
hash table...
We were sending too long requests for GLXChangeDrawableAttributes,
GLXGetDrawableAttributes, GLXDestroyPixmap and GLXDestroyWindow.
NOTE: This is a candidate for the 7.9 and 7.10 branches
Signed-off-by: Julien Cristau <jcristau@debian.org>
Signed-off-by: Brian Paul <brianp@vmware.com>
FLT_TO_INT is a vector instruction, despite what the (current) documentation
says. FLT_TO_INT_FLOOR and FLT_TO_INT_RPI aren't explicitly mentioned in the
documentation, but those are vector instructions too.
largely a merge of the previously discussed origin/gallium-resource-sampling
but updated.
the idea is to allow arbitrary binding of resources, the way opencl, new gl
versions and dx10+ require, i.e.
DCL RES[0], 2D, FLOAT
LOAD DST[0], SRC[0], RES[0]
SAMPLE DST[0], SRC[0], RES[0], SAMP[0]
Contrary what the name may suggest, LLVM's opaque types are used for
recursive types -- types whose definition refers itself -- so opaque
types correspond to pre-declaring a structure in C. E.g.:
struct node;
struct link {
....
struct node *next;
};
struct node {
struct link link;
}
Void pointers are also disallowed by LLVM. So the suggested way of creating
what's commonly referred as "opaque pointers" is using byte pointer (i.e.,
uint8_t * ).
Fixes the build when selecting driver=osmesa and building static libraries.
Otherwise, mklib tries to add the ‘-ltalloc’ object to the archive, which
obviously fails.
Clients which statically link to osmesa will need to link to libtalloc also,
as specified in the Libs.private of osmesa.pc.
Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=33360
NOTE: This is a candidate for the 7.10 branch.
Signed-off-by: Christopher James Halse Rogers <christopher.halse.rogers@canonical.com>
Signed-off-by: Brian Paul <brianp@vmware.com>
I couldn't make it work.
GB_TILE_CONFIG.Z_EXTENDED, which enables per-pixel Z clamping, and
VAP_CLIP_CNTL.CLIP_DISABLE, which disables clipping, do help, but they
also add regressions like random graphics corruptions in some games.
But we have to cheat and peek at the GENERIC semantic indices the
state tracker uses for TEXn.
Only outputs from 0x300 to 0x37c can be replaced, and so we have to
know on shader compilation which ones to put there in order to keep
doing separate shader objects properly.
At some point I'll probably create a patch that makes gallium not
force us to discard the information about what is a TexCoord.
The _NEW_ACCUM flag was only set when changing the accumulation
buffer clear color and never used anywhere. Reclaim that dirty bit.
Clean up the definitions of the other dirty bit flags.
Only a few texture object parameters can effect texture completeness:
min level, max level, minification filter. Don't mark the texture
incomplete for other texture object state changes.
We are not required to do the linear->sRGB conversion if ARB_framebuffer_sRGB
is unsupported. However I think the conversion should work in hw except
for blending, which matches the D3D9 behavior.
The rvalue of the returned value can be NULL if the shader says
'return foo();' and foo() is a function that returns void.
Existing GLSL specs do *NOT* say that this is an error. The type of
the return value is void. If the return type of the function is also
void, then this should compile without error. I expect that future
versions of the GLSL spec will fix this (wink, wink, nudge, nudge).
Fixes piglit test glsl-1.10/compiler/expressions/return-01.vert and
bugzilla #33308.
NOTE: This is a candidate for the 7.9 and 7.10 branches.
Before, we were converting between linear/sRGB in glReadPixels,
glDrawPixels, glAccum, etc if the renderbuffer was an sRGB texture.
Those all need to operate on pixel values as-is without conversion.
Also, when setting up render-to-texture, if the texture is sRGB the
pipe_surface view must be linear RGB. This will change when we
support GL_ARB_framebuffer_sRGB.
This fixes http://bugs.freedesktop.org/show_bug.cgi?id=33353
When we read/write image tiles we need to use the format specified
in the pipe_surface, not the pipe_transfer format (which comes from
the underlying texture/resource format).
This comes up when rendering to sRGB surfaces (via OpenGL render to
texture). Ignoring the new GL_ARB/EXT_framebuffer_sRGB extension
for now, when we render to a sRGB surface we need to treat it like
a regular, linear colorspace RGB surface. Before, when we read/wrote
tiles to sRGB surfaces we were inadvertantly doing the color space
conversion.
The new function, pipe_get_tile_rgba_format(), no longer takes a
swizzle (we weren't actually using it anywhere). Rename it to
indicate that the format is passed explicitly.
GLES can be enabled by running scons with
$ scons gles=yes
When gles=yes is given, the build is changed in three ways. First,
libmesa.a will be built with FEATURE_ES1 and FEATURE_ES2. This makes
DRI drivers and libEGL support and advertise GLES support. Second, GLES
libraries will be created. They are libGLESv1_CM, libGLESv2, and
libglapi. Last, libGL or opengl32 will link to libglapi. This change
is required as _glapi_* will be declared as __declspec(dllimport) in
libmesa.a on windows. libmesa.a expects those symbols to be defined in
another DLL. Due to this change to GL, GLES support is marked
experimental.
Note that GLES requires libxml2-python to generate some of its sources.
The original allocations use regs->regs as the context, so talloc will
happily ignore the context given here. Change it to match to clarify
that it isn't changing.
Improves the cases when:
* an explicit assignment references the read-only variable
* an 'out' or 'inout' function parameter references the read-only variable
This adds the get/enable enums and internal gl_config storage
for this extension.
In theory this is all that is needed to enable this extension
from what I can see, since its not mandatory to implement the
features if you don't advertise the visuals or the fb configs.
Signed-off-by: Dave Airlie <airlied@redhat.com>
The loop to choose a pixel format for the window was incrementing
'i' after we succeeded in creating the window so if we chose format[0]
for graw_create_window_and_screen() we were putting format[1] in
the pipe_resource template for creating the render target.
This only worked because of the order of the elements in the formats[]
array.
The graw_xlib.c code now properly compares the requested gallium pixel
format against the visual's color layout.
Update all the graw demos to fix the off-by-one-i error.
ideally this should be set once in the beginning of CS but there's
no way to change values there while in the middle of rendering.
For now reemitting SQ setup seems to work probably due to
r700WaitForIdleClean after each render
currently does not to try to decrease values once increased
fixes hangs in glsl-vs-vec4-indexing-temp-src-in-nested-loop-combined
glsl-vs-vec4-indexing-temp-dst-in-nested-loop-combined for my rv740
maybe more for other chips
When --enable-shared-glapi is specified, libGL will share libglapi with
OpenGL ES instead of defining its own copy of glapi. This makes sure an
app will get only one copy of glapi in its address space.
The new option is disabled by default. When enabled, libGL and libglapi
must be built from the same source tree and distributed together. This
requirement comes from the fact that the dispatch offsets used by these
libraries are re-assigned whenever GLAPI XMLs are changed.
For GLX, indirect rendering for has_different_protocol() functions is
tricky. A has_different_protocol() function is assigned only one
dispatch offset, yet each entry point needs a different protocol opcode.
It cannot be supported by the shared glapi. The fix to this is to make
glXGetProcAddress handle such functions specially before calling
_glapi_get_proc_address.
Note that these files are automatically generated/re-generated
src/glx/indirect.c
src/glx/indirect.h
src/mapi/glapi/glapi_mapi_tmp.h
Move _glapi_* symbols from libGLESv1_CM.so and libGLESv2.so to
libglapi.so. This makes sure an app will get only one copy of glapi in
its address space.
Note that with this change, libGLES* and libglapi must be built from the
same source tree and distributed together. This requirement comes from
the fact that the dispatch offsets used by these libraries are
re-assigned whenever GLAPI XMLs are changed.
In bridge mode, mapi no longer implements glapi.h. It becomes a user of
glapi.h. Imagine an app that uses both libGL.so and libGLESv2.so.
There will be two copies of glapi in the app's memory. It is possible
that _glapi_get_dispatch does not return what _glapi_set_dispatch set,
if they access different copies of the global variables. The solution
to this situation to build either one of the libraries as a bridge to
the other. Or build both libraries as bridges to another shared
glapi library.
When MAPI_MODE_GLAPI is defined, u_current_table is renamed to
_glapi_Dispatch or _glapi_tls_Dispatch. The ASM dispatchers should not
use hardcoded name.
The new implementation is based on mapi. No new script is needed. As
noted in sources.mk, the way to use it is to compile MAPI_GLAPI_SOURCES
with MAPI_MODE_GLAPI defined.
Fix GLAPIPrinter, ES1APIPrinter, and ES2APIPrinter to output files that
are ready for compilation. Since gl_and_es_API.xml is based on
gl_API.xml, the hidden and handcode attributes of entries have to be
overridden for ES1APIPrinter and ES2APIPrinter.
gl_and_es_API.xml defines OpenGL ES 1.1 and 2.0 API as well as OpenGL
API. It consists of gl_API.xml and the newly added es_EXT.xml,
ARB_get_program_binary.xml, OES_single_precision.xml, and
OES_fixed_point.xml.
This fixes a bunch of unnecessary barriers due to the scheduler not
knowing what that arbitrary register description refers to when trying
to reason about its dependencies.
The result is rescheduling in the convolution kernel shader in
Lightsmark, which results in avoiding register spilling and increasing
the performance of the first scene from 6-7 fps midway through the
panning to 11fps. The register spilling was a regression from Mesa
7.9 to Mesa 7.10.
Improves performance of my GLSL demo by 5.1% (+/- 1.4%, n=7). It also
reschedules the giant multiply tree at the end of
glsl-fs-convolution-1 so that we end up not spilling registers,
producing the expected level of performance.
Can happen during swrast fallbacks if a buffer is somehow bound as
a render target and a texture.
Fixes gnome-shell on nv20, and gets it mostly working on nv10.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
sw clears were being used and not getting the correct offsets in the span
code.
also not emitting correct offsets for CB draws to texture levels.
(I've no idea why I'm playing with r100).
This is a candidate for 7.9 and 7.10
According to R700 ISA we have only two channels for cfile constants.
This patch makes piglit tests "glsl1-constant array with constant
indexing" happy on RV710.
Fixes the following Piglit tests:
glslparsertest/shaders/array2.frag
glslparsertest/shaders/dataType6.frag
NOTE: This is a candidate for the 7.9 and 7.10 branches.
This fixes a potential failure when a begin/end_query is the first
thing to happen after flushing the scene.
NOTE: This is a candidate for the 7.10 and 7.9 branches.
This revealed a bug in ra_get_spill_benefit where we only considered
the benefit of the first adjacency we were to remove, explaining some
of the ugly spilling I've seen in shaders. Because of the reduced
spilling, it reduces the runtime of glsl-fs-convolution-1 36.9% +/-
0.9% (n=5).
This was recommended in the original paper, but I figued "make it run"
before "make it fast". Now we make it fast. Reduces the runtime of
glsl-fs-convolution-1 by 12.7% +/- 0.6% (n=5).
Our use of the register allocator in i965 is somewhat unusual.
Whereas most architectures would have a smaller set of registers with
fewer register classes and reuse that across compilation, we have 1,
2, and 4-register classes (usually) and a variable number up to 128
registers per compile depending on how many setup parameters and push
constants are present. As a result, when compiling large numbers of
programs (as with glean texCombine going through ff_fragment_shader),
we spent much of our CPU time in computing the q[] array. By keeping
a separate list of what the conflicts are for a particular reg, we
reduce glean texCombine time 17.0% +/- 2.3% (n=5).
We don't expect this optimization to be useful for 915, which will
have a constant register set, but it would be useful if we were switch
to this register allocator for Mesa IR.
Fixes texrect-many regression with ff_fragment_shader -- as we added
refs to the subsequent texcoord scaling paramters, the array got
realloced to a new address while our params[] still pointed at the old
location.
The driver was saying that independend blend functions was not supported,
but it really was. The driver was using the per-target independend blend
factors but the state tracker was only setting the 0th one (per the
Gallium spec).
Fixes a piglit fbo-drawbuffers2-blend regression.
See https://bugs.freedesktop.org/show_bug.cgi?id=33215
The removed semantic check also exists in ast_type_specifier::hir(), which
is a more natural location for it.
The check verified that precision statements are applied only to types
float and int.
* Check that precision qualifiers only appear in language versions 1.00,
1.30, and later.
* Check that precision qualifiers do not apply to bools and structs.
Fixes the following Piglit tests:
* spec/glsl-1.30/precision-qualifiers/precision-bool-01.frag
* spec/glsl-1.30/precision-qualifiers/precision-struct-01.frag
* spec/glsl-1.30/precision-qualifiers/precision-struct-02.frag
Change default value to ast_precision_none, which denotes the absence of
a precision of a qualifier.
Previously, the default value was ast_precision_high. This made it
impossible to detect if a precision qualifier was present or not.
The check is performed only in GLSL versions >= 1.30.
From section 4.3.4 of the GLSL 1.30 spec:
"It is an error to use centroid in in a vertex shader."
Fixes Piglit test
spec/glsl-1.30/compiler/storage-qualifiers/vs-centroid-in-01.vert
The check is performed only in GLSL versions >= 1.30.
Fixes the following Piglit tests:
* spec/glsl-1.30/compiler/interpolation-qualifiers/fs-smooth-02.frag
* spec/glsl-1.30/compiler/interpolation-qualifiers/vs-smooth-01.vert
... and 'centroid varying'. The check is performed only in GLSL
versions >= 1.30.
From page 29 (page 35 of the PDF) of the GLSL 1.30 spec:
"interpolation qualifiers may only precede the qualifiers in, centroid
in, out, or centroid out in a declaration. They do not apply to the
deprecated storage qualifiers varying or centroid varying."
Fixes Piglit test
spec/glsl-1.30/compiler/interpolation-qualifiers/smooth-varying-01.frag.
If an interpolation qualifier is present, then the method returns that
qualifier's string representation. For example, if the noperspective bit
is set, then it returns "noperspective".
This implements the extension by choosing a different set of texture
fetch functions when the texture parameter changes.
Signed-off-by: Dave Airlie <airlied@redhat.com>
If a literal slot isn't used it should be set
to 0 instead of an uninitialized value. Also the
channels for pre R700 trig functions were incorrect.
And most important literals were not counted against ndw,
resulting in an invalid force_add_cf detection.
We don't have all of the features of this extension hooked up yet, but
the consensus yesterday was that since those features are things that
we should also be supporting in our ES2 implementation, claiming ES2
here too doesn't make anything worse and will make incremental
improvement through piglit easier.
This code should never have been triggered, but I often did anyway
when I disabled optimization passes during debugging, then spent my
time debugging that this code doesn't work.
The comment of "this is just like teximages except for..." is a pretty
good clue that we're handling this wrong. By just using the teximage
code, we catch a bunch of cases we'd missed, like GL_RED and GL_RG.
This catches more opportunities than the prog_optimize.c code on
openarena's fixed function shaders turned to GLSL, mostly due to
looking at multiple source instructions for copy propagation
opportunities. It should also be much more CPU efficient than
prog_optimize.c's code.
The usage of macro V_SQ_ALU_WORD1_OP2_SQ_OP2_INST_FLT_TO_INT_FLOOR was
introduced by commit 323ef3a1f0 but the
macro is undefined. Disable this case to fix the build for now.
Add a bit in struct gl_extensions for OES_standard_derivatives, and enable
the bit by default. Advertise the extension only if the bit is enabled.
Previously, OES_standard_derivatives was advertised in GLES2 contexts
if ARB_framebuffer_object was enabled.
The specs that add 'layout' require the use of 'in' or 'out'.
However, a number of implementations, including Mesa, shipped several
of these extensions allowing the use of 'varying' and 'attribute'.
For these extensions only a warning is emitted.
This differs from the behavior of Mesa 7.10. Mesa 7.10 would only
accept 'attribute' with 'layout(location)'. This behavior was clearly
wrong. Rather than carrying the broken behavior forward, we're just
doing the correct thing.
This is related to (piglit) bugzilla #31804.
NOTE: This is a candidate for the 7.9 and 7.10 branches.
All of the extensions that add the 'layout' keyword also enable (and
required) the use of 'in' and 'out' with shader globals.
This is related to (piglit) bugzilla #31804.
NOTE: This is a candidate for the 7.9 and 7.10 branches.
When use_spoken is true, istart (the first vertex of this segment) is
replaced by i0 (the spoken vertex of the fan). There are still icount
vertices.
Thanks to Brian Paul for spotting this.
Without this, X doesn't start with UMS on r300g.
NOTE: This is a candidate for the 7.9 and 7.10 branches.
Signed-off-by: Paulo Zanoni <pzanoni@mandriva.com>
Signed-off-by: Brian Paul <brianp@vmware.com>
The idea is to be able to match a driver using the following order
try egl_gallium with hw renderer
try egl_dri2
try egl_gallium with sw renderer
try egl_glx
given the module list
egl_gallium
egl_dri2
egl_glx
For that, UseFallback initialization option is added. The module list
is matched twice: with the option unset and with the option set. In the
first pass, egl_gallium skips its sw renderer and egl_glx rejects to
initialize since UseFallback is not set. In the second pass,
egl_gallium skips its hw renderer and egl_dri2 rejects to initialize
since UseFallback is set. The process stops at the first driver that
initializes the display.
Reorder/rename and document the fields that should be set by the driver during
initialization. Drop the major/minor arguments from drv->API.Initialize.
This makes it unnecessary to pass _mesa_glsl_parse_state around
everywhere, making at least the prototypes a lot easier to read.
It's also more C++-ish than a pile of static C functions.
All of these functions used to take s_list pointers so they wouldn't all
need SX_AS_LIST conversions and error checking. However, the new
pattern matcher conveniently does this for us in one centralized place.
So there's no need to insist on s_list. Switching to s_expression saves
a bit of code and is somewhat cleaner.
Previously, the IR reader was riddled with code that:
1. Checked for the right number of list elements (via a linked list walk)
2. Retrieved references to each component (via ->next->next pointers)
3. Downcasted as necessary to make sure that each sub-component was the
right type (i.e. symbol, int, list).
4. Checking that the tag (i.e. "declare") was correct.
This was all very ad-hoc and a bit ugly. Error checking had to be done
at both steps 1, 3, and 4. Most code didn't even check the tag, relying
on the caller to do so. Not all callers did.
The new pattern matching module performs the whole process in a single
straightforward function call, resulting in shorter, more readable code.
Unfortunately, MSVC does not support C99-style anonymous arrays, so the
pattern must be declared outside of the match call.
stObj->pt is null when a TFP texture is passed to st_finalize_texture,
and with the changes introduced in the above commit this resulted in a
new texture being created and the existing image being copied into it.
NOTE: This is a candidate for the 7.10 branch.
Reviewed-by: Alex Deucher <alexdeucher@gmail.com>
Fixes a PT_NOT_PRESENT error cause by:
- allocating in VRAM
- emitting GART relocs to 0x17bc/0x17c0, moving the buffer
- telling the bufmgr that the buffer should be in VRAM when we use it,
but not correcting the value sent to 0x17bc/0x17c0.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Fixes a failed assertion when a renderbuffer ID that was gen'd but not
previously bound was passed to glFramebufferRenderbuffer(). Generate
the same error that NVIDIA does.
NOTE: This is a candidate for the 7.9 and 7.10 branches.
This fixes a problem when glDrawBuffers(GL_NONE). The fragment program
was writing to color output[0] but OutputsWritten was 0. That led to a
failed assertion in the Mesa->TGSI translation code.
NOTE: This is a candidate for the 7.9 and 7.10 branches.
The extension string in GLES1 contexts always advertised
GL_OES_point_sprite. Now advertisement depends on ARB_point_sprite being
enabled.
Reviewed-by: Ian Romanick <idr@freedesktop.org>
Change all OES extension strings that depend on ARB_framebuffer_object to
instead depend on EXT_framebuffer_object.
Reviewed-by: Ian Romanick <idr@freedesktop.org>
Add GL_OES_stencil8 to ES2.
Remove the following:
GL_OES_compressed_paletted_texture : ES1
GL_OES_depth32 : ES1, ES2
GL_OES_stencil1 : ES1, ES2
GL_OES_stencil4 : ES1, ES2
Mesa advertised these extensions, but did not actually support them.
Reviewed-by: Ian Romanick <idr@freedesktop.org>
Place GL, GLES1, and GLES2 extensions in a unified extension table. This
allows one to enable, disable, and query the status of GLES1 and GLES2
extensions by name.
When tested on Intel Ironlake, this patch did not alter the extension
string [as given by glGetString(GL_EXTENSIONS)] for any API.
Reviewed-by: Ian Romanick <idr@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
In particular, variables cannot be redeclared invariant after being
used.
Fixes piglit test invariant-05.vert and bugzilla #29164.
NOTE: This is a candidate for the 7.9 and 7.10 branches.
With the change to not reset baselevel, this GL_LINEAR filtering was
resulting in generating mipmaps off of the base level instead of the
next higher detail level. Fixes fbo-generatemipmap-filtering.
Reported by: Neil Roberts <neil@linux.intel.com>
The ad-hoc placement of recalculation somewhere between when they got
invalidated and when they were next needed was confusing. This should
clarify what's going on here.
Update SConscripts to re-enable or add support for EGL on windows and
x11 platforms respectively. targets/egl-gdi is replaced by
targets/egl-static, where "-static" means pipe drivers and state
trackers are linked to statically by egl_gallium, and egl_gallium is a
built-in driver of libEGL. There is no more egl_gallium.dll on Windows.
This greatly improves codegen for programs with flow control by
allowing coalescing for all instructions at the top level, not just
ones that follow the last flow control in the program.
tgsi_helper_copy is used on several occasions to copy a temporary result
into the real destination register to emulate writemasks for OP3 and
reduction operations. According to R600 ISA that's unnecessary.
This patch fixes this use for MAD, CMP and DP4.
Fixes piglit tests glsl-1.20/compiler/qualifiers/in-01.vert and
glsl-1.20/compiler/qualifiers/out-01.vert and bugzilla #32910.
NOTE: This is a candidate for the 7.9 and 7.10 branches. This patch
also depends on the previous two commits.
When GCC encounters a division by zero in a preprocessor directive, it
generates an error. Since the GLSL spec says that the GLSL
preprocessor behaves like the C preprocessor, we should generate that
same error.
It's worth noting that I cannot find any text in the C99 spec that
says this should be an error. The only text that I can find is line 5
on page 82 (section 6.5.5 Multiplicative Opertors), which says,
"The result of the / operator is the quotient from the division of
the first operand by the second; the result of the % operator is
the remainder. In both operations, if the value of the second
operand is zero, the behavior is undefined."
Fixes 093-divide-by-zero.c test and bugzilla #32831.
NOTE: This is a candidate for the 7.9 and 7.10 branches.
In _token_list_equal_ignoring_space(token_list_t*, token_list_t*), add
a guard that prevents dereferncing a null token list.
This fixes test src/glsl/glcpp/tests/092-redefine-macro-error-2.c and
Bugzilla #32695.
When attaching a small mipmap level to an FBO, the original gen4
didn't have the bits to support rendering to it. Instead of falling
back, just blit it to a new little miptree just for it, and let it get
revalidated into the stack later just like any other new teximage.
Bug #30365.
If we hit this path, we're level 1+ and the base level got allocated
as a single level instead of a full tree (so we don't match
intelObj->mt). This tries to recover from that so that we end up with
2 allocations and 1 validation blit (old -> new) instead of
allocations equal to number of levels and levels - 1 blits.
We don't need to worry about levels other than MaxLevel because we're
minifying -- the lower levels (higher detail) won't contribute to the
result. By changing BaseLevel, we forced hardware that doesn't
support BaseLevel != 0 to relayout the texture object.
This reverts commit 7ce6517f3a.
This reverts commit d60145d06d.
I was wrong about which generations supported baselevel adjustment --
it's just gen4, nothing earlier. This meant that i915 would have
never used the mag filter when baselevel != 0. Not a severe bug, but
not an intentional regression. I think we can fix the performance
issue another way.
Most _3DSTATE defines contain the command type, sub-type, opcode, and
sub-opcode (i.e. 0x7905). These, however, contain only the sub-opcode
(i.e. 0x05). Since they are inconsistent with the rest of the code and
nothing uses them, simply delete them.
The _3DOP and _3DCONTROL defines seemed similar, and were also unused.
From reading EXT_texture_sRGB and EXT_framebuffer_sRGB and interactions
with FBO I've found that swrast is converting the sRGB values to linear for
blending when an sRGB texture is bound as an FBO. According to the spec
and further explained in the framebuffer_sRGB spec this behaviour is not
required unless the GL_FRAMEBUFFER_SRGB is enabled and the Visual/config
exposes GL_FRAMEBUFFER_SRGB_CAPABLE_EXT.
This patch fixes swrast to use a separate Fetch call for FBOs bound to
SRGB and avoid the conversions.
v2: export _mesa_get_texture_dimensions as per Brian's comments.
Signed-off-by: Dave Airlie <airlied@redhat.com>
With core mesa doing runtime API checks, GLES overlay is no longer
needed. Make --enable-gles-overlay equivalent to --enable-gles[12].
There may still be places where compile-time checks are done. They
could be fixed case by case.
This is a step forward for compatibility with really old GLX. But the
real reason for making this change now is so that we can make egl_glx a
built-in driver without having to link to libGL.
In preparation for making egl_dri2 built-in. It also handles
symbol lookup error: /usr/local/lib/egl/egl_dri2.so: undefined symbol:
_glapi_get_proc_address
more gracefully.
If you want to enable noop set GALLIUM_NOOP=1 as an env variable.
You need first to enable noop wrapping for your driver see change
to src/gallium/targets/dri-r600/ in this commit as an example.
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Add release function for texture_from_pixmap extension.
Some platform need to release texture image for texture_from_pixmap
extension, add this interface for those platforms.
https://bugs.freedesktop.org/show_bug.cgi?id=32912
The fix is to call update_derived_state before user buffer uploads.
I've also moved some code around.
Unfortunately, there are still some ZMASK-related bugs which cause
misrendering, i.e. flushing doesn't always work and glean/fbo fails.
The motivation behind this rework is to get some speed by reducing
CPU overhead. The performance increase depends on many factors,
but it's measurable (I think it's about 10% increase in Torcs).
This commit replaces libdrm's radeon_cs_gem with our own implemention.
It's optimized specifically for r300g, but r600g could use it as well.
Reloc writes and space checking are faster and simpler than their
counterparts in libdrm (the time complexity of all the functions
is O(1) in nearly all scenarios, thanks to hashing).
(libdrm's radeon_bo_gem is still being used in the driver.)
It works like this:
cs_add_reloc(cs, buf, read_domain, write_domain) adds a new relocation and
also adds the size of 'buf' to the used_gart and used_vram winsys variables
based on the domains, which are simply or'd for the accounting purposes.
The adding is skipped if the reloc is already present in the list, but it
accounts any newly-referenced domains.
cs_validate is then called, which just checks:
used_vram/gart < vram/gart_size * 0.8
The 0.8 number allows for some memory fragmentation. If the validation
fails, the pipe driver flushes CS and tries do the validation again,
i.e. it validates only that one operation. If it fails again, it drops
the operation on the floor and prints some nasty message to stderr.
cs_write_reloc(cs, buf) just writes a reloc that has been added using
cs_add_reloc. The read_domain and write_domain parameters have been removed,
because we already specify them in cs_add_reloc.
The space checking has been tested by putting small values in vram/gart_size
variables.
There really shouldn't be any difference between the two for us.
Fixes a bug where Z16 renderbuffers would be untiled on gen6, likely
leading to hangs.
By relying on just intel_span_supports_format, some formats that
aren't supported pre-gen4 were not reporting FBO incomplete. And we
also complained in stderr when it happened on i915 because draw_region
gets called before framebuffer completeness validation.
ARB_fbo no longer disallows mismatched width/height on attachments
(shouldn't be any problem), mixed format color attachments (we only
support 1), and L/A/LA/I color attachments (we already reject them on
965 too). It requires Gen'ed names (driver doesn't care), and adds
FramebufferTextureLayer (we don't do texture arrays). So it looks
like we're already in the position we need to be for this extension.
Bug #27468, #32381.
In general, we have to negate in immediate values we pass in because
the src1 negate field in the register description is in the bits3 slot
that the 32-bit value is loaded into, so it's ignored by the hardware.
However, the src0 negate field is in bits1, so after we'd negated the
immediate value loaded in, it would also get negated through the
register description. This broke this VP instruction in the position
calculation in civ4:
MAD TEMP[1], TEMP[1], CONST[256].zzzz, CONST[256].-y-y-y-y;
Bug #30156
This only uploads the [min_index, max_index] range instead of [0, userbuf size],
which greatly speeds up user buffer uploads.
This is also a prerequisite for atomizing vertex arrays in st/mesa.
Fix the error in uniform row calculating, it may alloc one line
more which may cause out of range on memory usage, sometimes program
aborted when free the memory.
NOTE: This is a candidate for 7.9 and 7.10 branches.
Signed-off-by: Brian Paul <brianp@vmware.com>
This provides an upload facility for the constant buffers since Marek's
constants in user buffers changes.
gears at least work on my evergreen now.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This should make it easier to cross-reference the code and hardware
documentation, as well as clear up any confusion on whether constants
like CMD_3D_WM_STATE mean WM_STATE (pre-gen6) or 3DSTATE_WM (gen6+).
This does not rename any pre-gen6 defines.
The draw module set new state that didn't require swtnl which caused need_swtnl to
be unset. This caused the call from to svga_update_state(svga, SVGA_STATE_SWTNL_DRAW)
from the vbuf backend to overwrite the vdecls we setup there to be overwritten with
the real buffers vdecls.
Previously the 'STDGL invariant(all)' pragma added in GLSL 1.20 was
simply ignored by the compiler. This adds support for setting all
variable invariant.
In GLSL 1.10 and GLSL ES 1.00 the pragma is ignored, per the specs,
but a warning is generated.
Fixes piglit test glsl-invariant-pragma and bugzilla #31925.
NOTE: This is a candidate for the 7.9 and 7.10 branches.
GLSL 1.10 and 1.20 allow any sort of sampler array indexing.
Restrictions were added in GLSL 1.30. Commit f0f2ec4d added support
for the 1.30 restrictions, but it broke some valid 1.10/1.20 shaders.
This changes the error to a warning in GLSL 1.10, GLSL 1.20, and GLSL
ES 1.00.
There are some spurious whitespace changes in this commit. I changed
the layout (and wording) of the error message so that all three cases
would be similar. The 1.10/1.20 and 1.30 text is the same. The only
difference is that one is an error, and the other is a warning. The
GLSL ES 1.00 wording is similar but not quite the same.
Fixes piglit test
spec/glsl-1.10/compiler/constant-expressions/sampler-array-index-02.frag
and bugzilla #32374.
It's no longer needed because the upload buffer remains mapped while the CS
is being filled (openarena, ut2004 and others that this code was for do not
use VBOs by default).
The overhead of resource_create, transfer_inline_write, and resource_destroy
to upload constant data is very visible with some apps in sysprof, and
as such should be eliminated.
My approach uses a user buffer to pass a pointer to a driver. This gives
the driver the freedom it needs to take the fast path, which may differ
for each driver.
This commit addresses the same issue as Jakob's one that suballocates out
of a big constant buffer, but it also eliminates the copy to the buffer.
- Added a parameter to specify a minimum offset that should be returned.
r300g needs this to better implement user buffer uploads. This weird
requirement comes from the fact that the Radeon DRM doesn't support negative
offsets.
- Added a parameter to notify a driver that the upload flush occured.
A driver may skip buffer validation if there was no flush, resulting
in a better performance.
- Added a new upload function that returns a pointer to the upload buffer
directly, so that the buffer can be filled e.g. by the translate module.
The map/unmap overhead can be significant even though there is no waiting on busy
buffers. There is simply a huge number of uploads.
This is a performance optimization for Torcs, a car racing game.
Directly include mtypes.h if a file uses a gl_context struct. This
allows future removal of headers that are not strictly necessary but
indirectly include mtypes.h for a file.
BaseLevel/MaxLevel are mostly used for two things: clamping texture
access for FBO rendering, and limiting the used mipmap levels when
incrementally loading textures. By restricting our mipmap trees to
just the current BaseLevel/MaxLevel, we caused reallocation thrashing
in the common case, for a theoretical win if someone really did want
just levels 2..4 or whatever of their texture object.
Bug #30366
This has always been ugly about our texture code -- object base/max
level vs intel object first/last level vs image level vs miptree
first/last level. We now get rid of intelObj->first_level which is
just tObj->BaseLevel, and make intelObj->_MaxLevel clearly based off
of tObj->_MaxLevel instead of duplicating its code (incorrectly, as
image->MaxLog2 only considers width/height and not depth!)
It was quite a mess by trying to do NULL renderbuffers and real
renderbuffers in the same function. This clarifies the common case of
real renderbuffers.
This makes
fbo-generatemipmap-formats GL_EXT_texture_sRGB-s3tc
match
fbo-generatemipmap-formats GL_EXT_texture_compression_s3tc
and swrast in bad DXT1_RGBA alpha=0 handling, but it means we won't
unpack and repack someone's textures into uncompressed SARGB8 format.
It's just LUMINANCE, not LUMINANCE_ALPHA. Fixes
fbo-generatemipmap-formats GL_EXT_texture_sRGB-s3tc assertion failure
when it tries to pack the L8 channels into LUMINANCE_ALPHA and wonders
why it's trying to do that.
From the r600 ISA:
Each ALU clause can lock up to four sets of constants
into the constant cache. Each set (one cache line) is
16 128-bit constants. These are split into two groups.
Each group can be from a different constant buffer
(out of 16 buffers). Each group of two constants consists
of either [Line] and [Line+1] or [line + loop_ctr]
and [line + loop_ctr +1].
For supporting more than 64 constants, we need to
break the code into multiple ALU clauses based
on what sets of constants are needed in that clause.
Note: This is a candidate for the 7.10 branch.
Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
rc_inst_can_use_presub() wasn't checking for too many RGB sources in
Alpha instructions or too many Alpha sources in RGB instructions.
Note: This is a candidate for the 7.10 branch.
Perform this check in ast_declarator_list::hir().
From section 4.3.6 of the GLSL 1.30 spec:
"If a vertex output is a signed or unsigned integer or integer
vector, then it must be qualified with the interpolation
qualifier
flat."
Allow redeclaration of the following built-in variables with an
interpolation qualifier in language versions >= 1.30:
* gl_FrontColor
* gl_BackColor
* gl_FrontSecondaryColor
* gl_BackSecondaryColor
* gl_Color
* gl_SecondaryColor
See section 4.3.7 of the GLSL 1.30 spec.
We were looking at the current draw buffer instead to see whether the
depth/stencil combination matched. So you'd get told your framebuffer
was complete, until you bound it and went to draw and we decided that
it was incomplete.
The _ColorDrawBuffers is a piece of computed state that gets for the
current draw/read buffers at _mesa_update_state time. However, this
function actually gets used for non-current draw/read buffers when
checking if an FBO is complete from the driver's perspective. So,
instead of trying to just look at the attachment points that are
currently referenced by glDrawBuffers, look at all attachment points
to see if they're driver-supported formats. This appears to actually
be more in line with the intent of the spec, too.
Fixes a segfault in my upcoming fbo-clear-formats piglit test, and
hopefully bug #30278
Until we know how hw converts quads to polygon in beginning of
3D pipeline, for now unconditionally use last vertex convention.
Fix glean/clipFlat case.
When figuring out whether a renderbuffer should be used to set the
visual bits of an FBO, we were missing important baseformats like
GL_RED, GL_RG, and GL_LUMINANCE.
st/egl should be enabled with --enable-openvg even the driver is xlib or
osmesa. Also, GLX_DIRECT_RENDERING should not be defined because libdrm
is not checked.
I think was used long ago, when we actually read the builtins into the
shader's instruction stream directly, rather than creating a separate
shader and linking the two. It doesn't seem to serve any purpose now.
The initial values in the grctx are 0x0000 anyway, and re-binding them
all to 0x0000 destroys some init done by the nouveau drm.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
This reverts commit de6fd527a5.
Revert this workaround as it seems the real trouble is caused by
lineloop, which doesn't require GS convert on sandybridge actually.
Gen4 and Gen5 hardware can have a maximum supported nesting depth of 16.
Previously, shaders with control flow nested 17 levels deep would
cause a driver assertion or segmentation fault.
Gen6 (Sandybridge) hardware no longer has this restriction.
Fixes fd.o bug #31967.
This adds a new optional max_depth parameter (defaulting to 0) to
lower_if_to_cond_assign, and makes the pass only flatten if-statements
nested deeper than that.
By default, all if-statements will be flattened, just like before.
This patch also renames do_if_to_cond_assign to lower_if_to_cond_assign,
to match the new naming conventions.
Determine header present for fb write by msg length is not right
for SIMD16 dispatch, and if there're more output attributes, header
present is not easy to tell from msg length. This explicitly adds
new param for fb write to say header present or not.
Fixes many cases' hang and failure in GL conformance test.
Set window_bit only when the visual id is greater than zero. Correct
visual types. Skip slow configs as they are not relevant. Finally, do
not return duplicated configs.
All single-buffered configs were ignored before to make sure
EGL_RENDER_BUFFER is settable for window surfaces. It is better to
allow single-buffered configs and set EGL_WINDOW_BIT only for
double-buffered ones. This way there can be single-buffered pixmaps.
The SNB alt-mode math does the denorm and inf reduction even for a
"raw MOV" like we do for g0 message header setup, where we are moving
values that aren't actually floats. Just use UD type, where raw MOVs
really are raw MOVs.
Fixes glxgears since c52adfc2e1, but no
piglit tests had regressed(!)
This gets my vote for most pointless extension of all time, I'm guessing
some driver could possibly optimise for this instead of counting it might
just get a true/false, but I'm not really sure.
need this to eventually advertise 3.3 despite its total uselessness.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Rename MAPI_GLAPI_SOURCES to MAPI_UTIL_SOURCES. Rename macro
MAPI_GLAPI_CURRENT to MAPI_MODE_UTIL. Update the comments to make it
clear that mapi may be used in two ways and how.
These mistakenly computed 't' instead of t * t * (3.0 - 2.0 * t).
Also, properly vectorize the smoothstep(float, float, vec) variants.
NOTE: This is a candidate for the 7.9 and 7.10 branches.
No need to register function prototypes in the module now that we call
the C function pointer directly -- less LLVM objects lying around.
Limited testing with lp_test_format.
Even though a bound texture stays bound when calling set_fragment_sampler_views,
it must be assigned a new cache region depending on the occupancy of other
texture units.
This fixes:
https://bugs.freedesktop.org/show_bug.cgi?id=28800
Thanks to Álmos <aaalmosss@gmail.com> for finding the bug in the code.
NOTE: This is a candidate for both the 7.9 and 7.10 branches.
We need to keep using the pipe_get_tile_swizzle() even though there's
no swizzling because we need to explicitly pass in the surface format.
Fixes http://bugs.freedesktop.org/show_bug.cgi?id=32459
The only mismatch between the two is that we have to clear the
destination's alpha to 1.0. Fixes WOW performance on my Ironlake,
from a few frames a second to almost playable.
Before, we were going off of a couple of known (hopeful) matches
between internalFormats and the cpp of the read buffer. Instead, we
can now just look at the gl_format of the two to see if they match.
We should avoid bad blits that might have been possible before, but
also allow different internalFormats to work without having to
enumerate each one.
The blit that follows appears in the command stream so it's serialized
with previous rendering. Any queued vertices in the tnl layer were
already flushed up in mesa/main/.
Create a constant int pointer to the C function, then cast it to the
function's type. This avoids using trampoline code which seem to be
inadvertantly freed by LLVM in some situations (which leads to segfaults).
The root issue and work-around were found by José.
NOTE: This is a candidate for the 7.10 branch
Michel Hermier reported libdrm segfault (and kernel crash) on nv40 using
gallium :
http://www.mail-archive.com/nouveau@lists.freedesktop.org/msg06563.html
It turns out these were caused by some missing WAIT_RING (or wrong
computation of the WAIT_RING sizes). Unlike all other libdrm_nouveau users,
nvfx gallium tried to use a mininum calls of WAIT_RING, one WAIT_RING could
apply to many methods for different code paths and spread across several
functions. This made it too tricky to find out what the missing or wrong
WAIT_RING was.
By restoring BEGIN_RING, we force one WAIT_RING per method, and it's much
easier to check if the free size required in the pushbuffer is correct. As
curro said, "let's keep it simple for the maintainers until the big
bottlenecks are gone"
Benchmarked on nv35 with openarena, nexuiz and ut2004 and no performance
regression.
The core of this patch was made with Coccinelle, with minor manual fixes
made on top.
Tested-by: Michel Hermier <hermier@frugalware.org>
Signed-off-by: Francisco Jerez <currojerez@riseup.net>
This is the hack for input interactivity of frontbuffer rendering
(like we do for backbuffer at intelDRI2Flush()) by waiting for the n-2
frame to complete before starting a new one. However, for an
application doing multiple contexts or regular rebinding of a single
context, this would end up lockstepping the CPU to the GPU because
every unbind was considered the end of a frame.
Improves WOW performance on my Ironlake by 48.8% (+/- 2.3%, n=5)
Given a dispatch slot, entry_get_public returns the address of the
corresponding public entry point. There may be more than one of them.
But since they are all equivalent, it is fine to return any one of them.
With entry_get_public, the address of any public entry point can be
calculated at runtime when an assembly dispatcher is used. There is no
need to have a mapping table in such case. This omits the unnecessary
relocations from the binary.
Hidden entries are just like normal entries except that they are not
exported. Since it is not always possible to hide them, and two hidden
aliases can share the same entry, the name of hidden aliases are mangled
to '_dispatch_stub_<slot>'.
Split out function name generation from _c_decl to _c_function, and use
it everywhere. Add an optional 'export' argument to _cdecl. It is
prepended to the returned string.
Really no idea why I didn't see this before, but these values were opposite
the register spec.
this seems to fix rv530 HiZ on my laptop, will reenable in next commit.
Signed-off-by: Dave Airlie <airlied@redhat.com>
For GL fragColor semantics we need to tell the pipe drivers that the fragment
shader color result is to be replicated to all bound color buffers, this
adds the basic TGSI + documentation.
v2: fix missing comma pointed out by Tilman on mesa-dev.
Signed-off-by: Dave Airlie <airlied@redhat.com>
If 'start' is odd, render the first triangle with indices embedded
in the command stream, which adds 3 to 'start' and makes it even.
Then continue with the fast path.
This is a bandaid on the problem that if some formats were not renderable
(like luminance_alpha), st/mesa fell back to some RGBA format, so basically
some non-renderable formats were actually not used at all. This is only
a problem with hardware drivers, softpipe can render to anything.
Instead, require only RGB8/RGBA8 to be renderable.
Radeon GPUs can do this. R600 can even do render-to-texture.
Packing and extracting aren't implemented, but we shouldn't hit them (I think).
Tested with swrast, softpipe, and r300g.
Just like everywhere else, I never trust my constant uploads to
correctly put constants in the right places, even though that's so
rarely where the issue is.
It's mostly like gen4 message descriptor setup, except that the sizes
of type/control changed to be like gen5. Fixes 21 piglit cases on
gm45, including the regressions in bug #32311 from increased VS
constant buffer usage.
Fixes this GCC warning.
lp_bld_const.h: In function 'lp_build_const_int_pointer':
lp_bld_const.h:137: warning: cast from pointer to integer of different size
Can't get away from referencing upload buffer as after flush a vertex buffer
using the upload buffer might still be active. Likely need to simplify the
pipe_refence a bit so we don't waste so much cpu time in it.
candidates for 7.10 branch
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Instead of when we read texture tiles. Now swizzling happens after
the shadow depth compare step. This fixes the piglit glsl-fs-shadow2d*
tests (except for proj+bias because of a GLSL bug).
We need to swizzle after the shadow comparison so that the GL_DEPTH_MODE
functionality is handled properly.
This fixes all the piglit glsl-fs-shadow2d*.shader_test cases, except
for glsl-fs-shadow2dproj-bias.shader_test which fails because of a
bug in the GLSL compiler (fd.o 32395).
Note the support for non float vertex draw likely regressed need to
find what we want to do there.
candidates for 7.10 branches
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
This is still awful, but my ability to care about reworking the old
backend so we can just get a temporary value into a POW is awfully low
since the new backend does this all sensibly.
Fixes:
fp1-LIT test 1
fp1-LIT test 3 (case x < 0)
fp1-POW test (exponentiation)
fp-lit-mask
commit 4f106f44a32eaddb6cf3fea6ba5ee9787bff609a
Author: Brian Paul <brianp@vmware.com>
Date: Mon Dec 13 14:06:08 2010 -0700
st/mesa: reorganize vertex program translation code
Now it looks like the fragment and geometry program code.
Also remove the serial number fields from programs. It was used to
determine when new translations were needed. Now the variant key is
used for that. And the st_program_string_notify() callback removes all
variants when the program's code is changed.
commit e12d6791c5e4bff60bb2e6c04414b1b4d1325f3e
Author: Brian Paul <brianp@vmware.com>
Date: Mon Dec 13 13:38:12 2010 -0700
st/mesa: implement geometry shader varients
Only needed in order to support per-context gallium shaders.
commit c5751c673644808ab069259a852f24c4c0e92b9d
Author: Brian Paul <brianp@vmware.com>
Date: Sun Dec 12 15:28:57 2010 -0700
st/mesa: restore glDraw/CopyPixels using new fragment program variants
Clean up the logic for fragment programs for glDraw/CopyPixels. We now
generate fragment program variants for glDraw/CopyPixels as needed which
do texture sampling, pixel scale/bias, pixelmap lookups, etc.
commit 7b0bb99bab6547f503a0176b5c0aef1482b02c97
Author: Brian Paul <brianp@vmware.com>
Date: Fri Dec 10 17:03:23 2010 -0700
st/mesa: checkpoint: implement fragment program variants
The fragment programs variants are per-context, as the vertex programs.
NOTE: glDrawPixels is totally broken at this point.
commit 2cc926183f957f8abac18d71276dd5bbd1f27be2
Author: Brian Paul <brianp@vmware.com>
Date: Fri Dec 10 14:59:32 2010 -0700
st/mesa: make vertex shader variants per-context
Gallium shaders are per-context but OpenGL shaders aren't. So we need
to make a different variant for each context.
During context tear-down we need to walk over all shaders/programs and
free all variants for the context being destroyed.
The new GLSL compiler doesn't support geom shaders yet so disable the
GL_ARB_geometry_shader4 extension. Undo this when geom shaders work again.
NOTE: This is a candidate for the 7.10 branch.
This is the same as what the array dereference handler does.
Fixes piglit test glsl-link-struct-array (bugzilla #31648).
NOTE: This is a candidate for the 7.9 and 7.10 branches.
The hardware supports zero stride just fine. This is a port
of 2af8a19831 from r300g.
NOTE: This is a candidate for both the 7.9 and 7.10 branches.
Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
The RS690 memory controller prefers things to be on a different
boundary than the discrete GPUs, we had an attempt to fix this,
but it still failed, this consolidates the stride calculation
into one place and removes the really special case check.
This fixes gnome-shell and 16 piglit tests on my rs690 system.
NOTE: This is a candidate for both the 7.9 and 7.10 branches.
Signed-off-by: Dave Airlie <airlied@redhat.com>
tgsi_helper_copy is used on several occasions to copy a temporary result
into the real destination register to emulate writemasks for OP3 and
reduction operations. According to R600 ISA that's unnecessary.
This patch fixes this use for MAD, CMP and DP4.
When the driver is the last reference to libEGL.so, unloading it will
cause libEGL.so to be unmapped and give problems. Disable the unloading
for now. Still have to figure out the right timing to unload drivers.
The hardware apparently does support a zero stride, so let's use it.
This fixes missing objects in ETQW, but might also fix a ton of other
similar-looking bugs.
NOTE: This is a candidate for both the 7.9 and 7.10 branches.
If a source operand has a non-native swizzle (e.g. the KIL instruction
cannot have a swizzle other than .xyzw), the lowering pass uses one or more
MOV instructions to move the operand to an intermediate temporary with
native swizzles.
This commit fixes that the presubtract information was lost during
the lowering.
NOTE: This is a candidate for both the 7.9 and 7.10 branches.
This fixes broken rendering of trees in ETQW. The trees still disappear
for an unknown reason when they are close.
Broken since:
2ff9d4474b
r300/compiler: make lowering passes possibly use up to two less temps
NOTE: This is a candidate for the 7.10 branch.
do_assignment may apply implicit conversions to coerce the base type
of initializer to the base type of the variable being declared. Fixes
piglit test glsl-implicit-conversion-02 (bugzilla #32287). This
probably also fixes bugzilla #32273.
NOTE: This is a candidate for the 7.9 branch and the 7.10 branch.
There are exceptions to the table for depth texturing or rendering to
not-quite-supported formats thanks to the non-orthogonal component
selection for surface formats, but it's still a lot simpler.
Fixes piglit valgrind glsl-array-bounds-04 failure (FDO bug 29946).
NOTE:
This is a candidate for the 7.10 branch.
This is a candidate for the 7.9 branch.
The current state is allowed to be undefined past DrawElements et al.
Consequently omit that copying at least in the display list code.
This pays us some percents performance.
Signed-off-by: Brian Paul <brianp@vmware.com>
assert(current_save_state < MAX_META_OPS_DEPTH) did not compile.
Rename current_save_state to SaveStackDepth to be more consistent with
the style of the other fields.
VS places color attributes together so that SF unit can fetch the right
attribute according to object orientation. This fixes light issue in
mesa demo geartrain, projtex.
_mesa_meta_CopyPixels results in nested meta operations on Sandybridge.
Previoulsy the second meta operation overrides all states saved by the
first meta function.
When the application is not linked to any libGL*.so, loading st_GL.so
would give
/usr/local/lib/egl/st_GL.so: undefined symbol: _glapi_tls_Context
In that case, load libGL.so and try again. This works because
util_dl_open loads with RTLD_GLOBAL.
Fix "clear" OpenGL ES 1.1 demo.
Fixes piglit glx-shader-sharing crash.
When shaders are shared by multiple contexts, the shader's draw context
pointer may point to a previously destroyed context. Dereferencing the
context pointer will lead to a crash.
In this case, simply removing the flushing code avoids the crash (the
exec and sse shader paths don't flush here either).
There's a deeper issue here, however, that needs examination. Shaders
should not keep pointers to contexts since contexts might get destroyed
at any time.
NOTE: This is a candidate for the 7.10 branch (after this has been
tested for a while).
Currently we only unroll loops with conditional breaks at the end, which is
the form that lower_jumps generates.
However, if breaks are not lowered, they tend to appear at the beginning, so
add support for a conditional break anywhere.
Signed-off-by: Luca Barbieri <luca@luca-barbieri.com>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Found this bug by code inspection. Based off the comments just before
this code, the intent is to find whether the break exists in the "then"
branch or the "else" branch. However, the code actually looked at the
last instruction in the "then" branch twice.
If you used a constant array index to access the matrix, we'd flag a
bunch of wrong inputs/outputs as being used because the index was
multiplied by matrix columns and the actual used index was left out.
Fixes glsl-mat-attribute.
Fixes this GCC warning.
brw_fs.cpp: In function 'brw_reg brw_reg_from_fs_reg(fs_reg*)':
brw_fs.cpp:3255: warning: 'brw_reg' may be used uninitialized in this function
Allow important performance increase by doing hw specific implementation
of the upload manager helper. Drop the range flushing that is not hit with
this code (and wasn't with previous neither). Performance improvement are
mostly visible on slow CPU.
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
r600g is up to a point where all small CPU cycle matter and pb* turn
high on profile. It's mostly because pb try to be generic and thus
trigger unecessary check for r600g driver. To avoid having too much
abstraction & too much depth in the call embedded everythings into
r600_bo. Make code simpler & faster. The performance win highly depend
on the CPU & application considered being more important on slower CPU
and marginal/unoticeable on faster one.
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
This eases the gen6 implementation, which can only handle up to 32
registers of constants, while likely not penalizing real apps using
reladdr since all of those I've seen also end up hitting the pull
constant buffer. On gen6, the constant map means that simple NV VPs
fit under the 32-reg limit and now succeed. Fixes around 10 testcases.
Raise error if a sampler array is indexed with a non-constant expression.
From section 4.1.7 of the GLSL 1.30 spec:
"Samplers aggregated into arrays within a shader (using square
brackets [ ]) can only be indexed with integral constant
expressions [...]."
System values are shader inputs which don't necessarily change from
vertex to vertex or fragment to fragment. gl_InstanceID and
gl_FrontFacing are examples.
Attribute 0 has no special meaning in GLES2. Add VertexAttrib4f_nopos
for that purpose and make _es_VertexAttrib* call the new function.
Rename _vbo_* to _es_* to avoid confusion. These functions are only
used by GLES, and now some of them (_es_VertexAttrib*) even behave
differently than vbo_VertexAttrib*.
CMP may now use two less temps, other non-native instructions may end up
using one less temp, except for SIN/COS/SCS, which I am leaving unchanged
for now.
This may reduce register pressure inside loops, because the register
allocator doesn't do a very good job there.
That's what I get for not running piglit before pushing.
Don't try to patch types of unsized arrays when linking fails.
Don't try to patch types of unsized arrays that are shared between
shader stages.
Fixes piglit test case glsl-vec-array (bugzilla #31908).
NOTE: This bug does not affect 7.9, but I think this patch is a
candiate for the 7.9 branch anyway.
Types of declared variables and their initializer must match excatly
except for unsized arrays. Previously the type inherritance for
unsized arrays happened implicitly in the emitted assignment.
However, this assignment is never emitted for uniforms. Now that type
is explicitly copied unconditionally.
Fixes piglit test array-compare-04.vert (bugzilla #32035) and
glsl-array-uniform-length (bugzilla #31985).
NOTE: This is a candidate for the 7.9 branch.
With the change of extended math from having the arguments moved into
mrfs and handed off through message passing to being directly hooked
up to the EU, it looks like the piece for doing source modifiers
(negate and abs) was left out.
Fixes:
fog-modes
glean/fp1-ARB_fog_exp test
glean/fp1-ARB_fog_exp2 test
glean/fp1-Computed fog exp test
glean/fp1-Computed fog exp2 test
ext_fog_coord-modes
Previously the code only handled scalars and vectors. This new code
is modeled somewhat after similar code in ir_to_mesa.
Reviewed-by: Eric Anholt <eric@anholt.net>
R6XX GPU doesn't like to have two partial flush writting
back to memory in row without a prior flush of the pipeline.
Add PS_PARTIAL_FLUSH to flush all work between the CP and
the ES, GS, VS, PS shaders.
Thanks a lot to Alban Browaeys (prahal on irc) for investigating
this issue.
Signed-off-by: Alban Browaeys <prahal@yahoo.com>
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
gen6 builtin RSQ apparently clamps negative values to 0 instead of
returning the RSQ of the absolute value like ARB_fragment_program
desires and pre-gen6 apparently does.
Fixes:
glean/fp1-RSQ test 2 (reciprocal square root of negative value)
glean/vp1-RSQ test 2 (reciprocal square root of negative value)
Fixes this GCC warning.
r200_maos_arrays.c: In function 'r200EmitArrays':
r200_maos_arrays.c:113: warning: 'emitsize' may be used uninitialized in this function
It's not always possible to preprocess the content of 3D_LOAD_VBPNTR
in a command buffer, because the offset to all vertex buffers (which
the packet depends on) is derived from the "start" parameter of draw_arrays
and the "indexBias" parameter of draw_elements, but we can at least lazily
make a command buffer for the case when offset == 0, which should occur
most of the time.
The original lazy built-in importing patch did not add the newly created
function to the symbol table, nor actually emit it into the IR stream.
Adding it to the symbol table is non-trivial since importing occurs when
generating some ir_call in a nested scope. A new add_global_function
method, backed by new symbol_table code in the previous patch, handles
this.
Fixes bug #32030.
Avoid rebuilding constant shader state at each draw call,
factor out spi update that might change at each draw call.
Best would be to update spi only when revealent states
change (likely only flat shading & sprite point).
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Vertex elements change are less frequent than draw call, those to
avoid rebuilding fetch shader to often build the fetch shader along
vertex elements. This also allow to move vertex buffer setup out
of draw path and make update to it less frequent.
Shader update can still be improved to only update SPI regs (based
on some rasterizer state like flat shading or point sprite ...).
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
More than 1023 temporaries were being used for a Cinebench shader before
doing temporary optimization, causing the index value to wrap around to
-1024.
In st_finalize_texture() we were looking at the st_texture_object::
lastLevel field instead of the pipe_resource::last_level field to
determine which resource to store the mipmap in.
Then, in st_generate_mipmap() we need to call st_finalize_texture() to
make sure the destination resource is properly allocated.
These changes fix the broken piglit fbo-generatemipmap-formats test.
Use --cppflags instead of --cflags so that we get the -I and -D flags
we want, but not compiler options like -O3.
A similar change should probably be made for autoconf.
When X or Y or Z is close to W the outcome of the floating point clip
test comparision may be different between the C and x86 asm paths.
That's OK; don't report an error.
See fd.o bug 32093
It was only used for gen6 fragment programs (not GLSL shaders) at this
point, and it was clearly unsuited to the task -- missing opcodes,
corrupted texturing, and assertion failures hit various applications
of all sorts. It was easier to patch up the non-glsl for remaining
gen6 changes than to make brw_wm_glsl.c complete.
Bug #30530
Since the 8-wide first-quarter and 16-wide first-half have the same
bit encoding, we now need to track "do you want instruction
compression" in the compile state.
The FS backend is fine with register level granularity. But for the
brw_wm_emit.c backend, it expects pairs of regs to be used for the
constants, because the whole world is pairs of regs. If an odd number
got used, we went looking for interpolation in the wrong place.
In the SF and brw_fs.cpp fixes to set up interpolation sanely on gen6,
the setup for 16-wide interpolation was left behind. This brings
relative sanity to that path too.
Payload reg setup on gen6 depends more on the dispatch width as well
as the uses_depth, computes_depth, and other flags. That's something
we want to decide at compile time, not at cache lookup. As a bonus,
the fragment shader program cache lookup should be cheaper now that
there's less to compute for the hash key.
The preprocessor magic in mapi was nothing but obfuscation. Rewrite
mapi_abi.py to generate real C code.
This commit removes the hack added in
43121f2086.
Need to check the required primitive type for GS on Sandybridge,
and when GS is disabled, the new state has to be issued too, instead
of only updating URB state with no GS entry, that caused hang on Sandybridge.
This fixes hang issue during conformance suite testing.
This fixes endless vertex shader recompilations in find_translated_vp
if the shader contains an edge flag output.
NOTE: This is a candidate for the 7.9 branch.
Signed-off-by Brian Paul <brianp@vmware.com>
Finished up by Marek Olšák.
We can set the constant space to use a different area per-call to the shader,
we can avoid flushing the PVS as often as we do by spreading out the constants
across the whole constant space.
Signed-off-by: Marek Olšák <maraeo@gmail.com>
It appears to be a constant buffer index (in case there are more constant
buffers explicitly used by a shader), i.e. something that Gallium currently
does not use. We treated it incorrectly as the offset to a constant buffer.
Sometimes I'm on the train and want to just read what's generated
under INTEL_DEBUG=vs,wm for some code on another generation. Or, for
the next gen enablement we'll want to dump aub files before we have
the actual hardware. This will let us do that.
rgb_src_factor and rgb_dst_factor should be PIPE_BLENDFACTOR_ONE for
VG_BLEND_SRC_IN and VG_BLEND_DST_IN respectively. VG_BLEND_SRC_OVER can
be supported only when the fb has no alpha channel. VG_BLEND_DST_OVER
and VG_BLEND_ADDITIVE have to be supported with a shader.
Note that Porter-Duff blending rules assume premultiplied alpha.
Convert color values to and back from premultiplied form for blending.
Finally the rendering result of the blend demo looks much closer to that
of the reference implementation.
Drawing an image in VG_DRAW_IMAGE_STENCIL mode produces per-channel
alpha for use in blending. Add a new shader stage to produce and save
it in TEMP[1].
For other modes that do not need per-channel alpha, the stage does
MOV TEMP[1], TEMP[0].wwww
Add a helper function, blend_generic, that supports all blend modes and
per-channel alpha. Make other blend generators a wrapper to it.
Both the old and new code expects premultiplied colors, yet the input is
non-premultiplied. Per-channel alpha is also not used for stencil
image. They still need to be fixed.
In find_value() check if we've hit the 0th/invalid entry before checking
if the pname matches.
Fixes http://bugs.freedesktop.org/show_bug.cgi?id=31987
NOTE: This is a candidate for the 7.9 branch.
If querying the default/window-system FBO's attachment type, return
GL_FRAMEBUFFER_DEFAULT (per the GL_ARB_framebuffer_object spec).
See http://bugs.freedesktop.org/show_bug.cgi?id=31947
NOTE: This is a candidate for the 7.9 branch.
Return 0 instead of generating an error.
See http://bugs.freedesktop.org/show_bug.cgi?id=30993
Note that piglit fbo-getframebufferattachmentparameter-01 still does
not pass. But Mesa behaves the same as the NVIDIA driver in this case.
Perhaps the test is incorrect.
NOTE: This is a candidate for the 7.9 branch.
It was done in path-to-polygon conversion. That meant that the
results were invalidated when the transformation was modified, and CPU
had to recreate the vertex buffer with new vertices. It could be a
performance hit for apps that animate.
gl_FragCoord.y needs to be flipped upside down if a FBO is bound.
This fixes:
- piglit/fbo-fragcoord
- https://bugs.freedesktop.org/show_bug.cgi?id=29420
Here I add a new program state STATE_FB_WPOS_Y_TRANSFORM, which is set based
on whether a FBO is bound. The state contains a pair of transformations.
It can be either (XY=identity, ZW=transformY) if a FBO is bound,
or (XY=transformY, ZW=identity) otherwise, where identity = (1, 0),
transformY = (-1, height-1).
A classic driver (or st/mesa) may, based on some other state, choose whether
to use XY or ZW, thus negate the conditional "if (is a FBO bound) ...".
The reason for this is that a Gallium driver is allowed to only support WPOS
relative to either the lower left or the upper left corner, so we must flip
the Y axis accordingly again. (the "invert" parameter in emit_wpos_inversion)
NOTE: This is a candidate for the 7.9 branch.
Signed-off-by: Marek Olšák <maraeo@gmail.com>
Signed-off-by: Brian Paul <brianp@vmware.com>
Fixes nasty bug where some parts of the code didn't define WIN32_THREADS
and were using the integer mutex implementation, causing even confusion
to the debuggers.
And there is little interest of other thread implemenation on Win32
besides Win32 threads.
This allows 16K x 16K 2D textures, for example, but we don't want to
allow that for 3D textures. The new gl_constants::MaxTextureMBytes
field is used to prevent allocating too large of texture image.
This allows a 16K x 32 x 32 3D texture, for example, but prevents 16K^3.
Drivers can override this limit. The default is currently 1GB.
Apps should use the proxy texture mechanism to determine the actual
max texture size.
resources have a array_size parameter now.
get_tex_surface and tex_surface_destroy have been renamed to create_surface
and surface_destroy and moved to context, similar to sampler views (and
create_surface now uses a template just like create_sampler_view). Surfaces
now really should only be used for rendering. In particular they shouldn't be
used as some kind of 2d abstraction for sharing a texture. offset/layout fields
don't make sense any longer and have been removed, width/height should go too.
surfaces and sampler views now specify a layer range (for texture resources),
layer is either array slice, depth slice or cube face.
pipe_subresource is gone array slices (or cube faces) are now treated the same
as depth slices in transfers etc. (that is, they use the z coord of the
respective functions).
Squashed commit of the following:
commit a45bd509014743d21a532194d7b658a1aeb00cb7
Merge: 1aeca28 32e1e59
Author: Roland Scheidegger <sroland@vmware.com>
Date: Thu Dec 2 04:32:06 2010 +0100
Merge remote branch 'origin/master' into gallium-array-textures
Conflicts:
src/gallium/drivers/i915/i915_resource_texture.c
src/gallium/drivers/i915/i915_state_emit.c
src/gallium/drivers/i915/i915_surface.c
commit 1aeca287a827f29206078fa1204715a477072c08
Merge: 912f042 6f7c8c3
Author: Roland Scheidegger <sroland@vmware.com>
Date: Thu Dec 2 00:37:11 2010 +0100
Merge remote branch 'origin/master' into gallium-array-textures
Conflicts:
src/gallium/state_trackers/vega/api_filters.c
src/gallium/state_trackers/vega/api_images.c
src/gallium/state_trackers/vega/mask.c
src/gallium/state_trackers/vega/paint.c
src/gallium/state_trackers/vega/renderer.c
src/gallium/state_trackers/vega/st_inlines.h
src/gallium/state_trackers/vega/vg_context.c
src/gallium/state_trackers/vega/vg_manager.c
commit 912f042e1d439de17b36be9a740358c876fcd144
Author: Roland Scheidegger <sroland@vmware.com>
Date: Wed Dec 1 03:01:55 2010 +0100
gallium: even more compile fixes after merge
commit 6fc95a58866d2a291def333608ba9c10c3f07e82
Author: Roland Scheidegger <sroland@vmware.com>
Date: Wed Dec 1 00:22:26 2010 +0100
gallium: some fixes after merge
commit a8d5ffaeb5397ffaa12fb422e4e7efdf0494c3e2
Merge: f7a202f 2da02e7
Author: Roland Scheidegger <sroland@vmware.com>
Date: Tue Nov 30 23:41:26 2010 +0100
Merge remote branch 'origin/master' into gallium-array-textures
Conflicts:
src/gallium/drivers/i915/i915_state_emit.c
src/gallium/state_trackers/vega/api_images.c
src/gallium/state_trackers/vega/vg_context.c
commit f7a202fde2aea2ec78ef58830f945a5e214e56ab
Author: Roland Scheidegger <sroland@vmware.com>
Date: Wed Nov 24 19:19:32 2010 +0100
gallium: even more fixes/cleanups after merge
commit 6895a7f969ed7f9fa8ceb788810df8dbcf04c4c9
Author: Roland Scheidegger <sroland@vmware.com>
Date: Wed Nov 24 03:07:36 2010 +0100
gallium: more compile fixes after merge
commit af0501a5103b9756bc4d79167bd81051ad6e8670
Author: Roland Scheidegger <sroland@vmware.com>
Date: Tue Nov 23 19:24:45 2010 +0100
gallium: lots of compile fixes after merge
commit 0332003c2feb60f2a20e9a40368180c4ecd33e6b
Merge: 26c6346 b6b91fa
Author: Roland Scheidegger <sroland@vmware.com>
Date: Tue Nov 23 17:02:26 2010 +0100
Merge remote branch 'origin/master' into gallium-array-textures
Conflicts:
src/gallium/auxiliary/gallivm/lp_bld_sample.c
src/gallium/auxiliary/util/u_blit.c
src/gallium/auxiliary/util/u_blitter.c
src/gallium/auxiliary/util/u_inlines.h
src/gallium/auxiliary/util/u_surface.c
src/gallium/auxiliary/util/u_surfaces.c
src/gallium/docs/source/context.rst
src/gallium/drivers/llvmpipe/lp_rast.c
src/gallium/drivers/nv50/nv50_state_validate.c
src/gallium/drivers/nvfx/nv04_surface_2d.c
src/gallium/drivers/nvfx/nv04_surface_2d.h
src/gallium/drivers/nvfx/nvfx_buffer.c
src/gallium/drivers/nvfx/nvfx_miptree.c
src/gallium/drivers/nvfx/nvfx_resource.c
src/gallium/drivers/nvfx/nvfx_resource.h
src/gallium/drivers/nvfx/nvfx_state_fb.c
src/gallium/drivers/nvfx/nvfx_surface.c
src/gallium/drivers/nvfx/nvfx_transfer.c
src/gallium/drivers/r300/r300_state_derived.c
src/gallium/drivers/r300/r300_texture.c
src/gallium/drivers/r600/r600_blit.c
src/gallium/drivers/r600/r600_buffer.c
src/gallium/drivers/r600/r600_context.h
src/gallium/drivers/r600/r600_screen.c
src/gallium/drivers/r600/r600_screen.h
src/gallium/drivers/r600/r600_state.c
src/gallium/drivers/r600/r600_texture.c
src/gallium/include/pipe/p_defines.h
src/gallium/state_trackers/egl/common/egl_g3d_api.c
src/gallium/state_trackers/glx/xlib/xm_st.c
src/gallium/targets/libgl-gdi/gdi_softpipe_winsys.c
src/gallium/targets/libgl-gdi/libgl_gdi.c
src/gallium/tests/graw/tri.c
src/mesa/state_tracker/st_cb_blit.c
src/mesa/state_tracker/st_cb_readpixels.c
commit 26c6346b385929fba94775f33838d0cceaaf1127
Author: Roland Scheidegger <sroland@vmware.com>
Date: Mon Aug 2 19:37:21 2010 +0200
fix more merge breakage
commit b30d87c6025eefe7f6979ffa8e369bbe755d5c1d
Merge: 9461bf3 1f1928d
Author: Roland Scheidegger <sroland@vmware.com>
Date: Mon Aug 2 19:15:38 2010 +0200
Merge remote branch 'origin/master' into gallium-array-textures
Conflicts:
src/gallium/drivers/llvmpipe/lp_rast.c
src/gallium/drivers/llvmpipe/lp_rast_priv.h
src/gallium/drivers/r300/r300_blit.c
src/gallium/drivers/r300/r300_screen_buffer.c
src/gallium/drivers/r300/r300_state_derived.c
src/gallium/drivers/r300/r300_texture.c
src/gallium/drivers/r300/r300_texture.h
src/gallium/drivers/r300/r300_transfer.c
src/gallium/drivers/r600/r600_screen.c
src/gallium/drivers/r600/r600_state.c
src/gallium/drivers/r600/r600_texture.c
src/gallium/drivers/r600/r600_texture.h
src/gallium/state_trackers/dri/common/dri1_helper.c
src/gallium/state_trackers/dri/sw/drisw.c
src/gallium/state_trackers/xorg/xorg_exa.c
commit 9461bf3cfb647d2301364ae29fc3084fff52862a
Merge: 17492d7 0eaccb3
Author: Roland Scheidegger <sroland@vmware.com>
Date: Thu Jul 15 20:13:45 2010 +0200
Merge commit 'origin/master' into gallium-array-textures
Conflicts:
src/gallium/auxiliary/util/u_blitter.c
src/gallium/drivers/llvmpipe/lp_rast.c
src/gallium/drivers/llvmpipe/lp_surface.c
src/gallium/drivers/r300/r300_render.c
src/gallium/drivers/r300/r300_state.c
src/gallium/drivers/r300/r300_texture.c
src/gallium/drivers/r300/r300_transfer.c
src/gallium/tests/trivial/quad-tex.c
commit 17492d705e7b7f607b71db045c3bf344cb6842b3
Author: Roland Scheidegger <sroland@vmware.com>
Date: Fri Jun 18 10:58:08 2010 +0100
gallium: rename element_offset/width fields in views to first/last_element
This is much more consistent with the other fields used there
(first/last level, first/last layer).
Actually thinking about removing the ugly union/structs again and
rename first/last_layer to something even more generic which could also
be used for buffers (like first/last_member) without inducing headaches.
commit 1b717a289299f942de834dcccafbab91361e20ab
Author: Roland Scheidegger <sroland@vmware.com>
Date: Thu Jun 17 14:46:09 2010 +0100
gallium: remove PIPE_SURFACE_LAYOUT_LINEAR definition
This was only used by the layout field of pipe_surface, but this
driver internal stuff is gone so there's no need for this driver independent
layout definition neither.
commit 10cb644b31b3ef47e6c7b55e514ad24bb891fac4
Merge: 5691db9 c85971d
Author: Roland Scheidegger <sroland@vmware.com>
Date: Thu Jun 17 12:20:41 2010 +0100
Merge commit 'origin/master' into gallium-array-textures
Conflicts:
src/gallium/docs/source/glossary.rst
src/gallium/tests/graw/fs-test.c
src/gallium/tests/graw/gs-test.c
commit 5691db960ca3d525ce7d6c32d9c7a28f5e907f3b
Author: Roland Scheidegger <sroland@vmware.com>
Date: Thu Jun 17 11:29:03 2010 +0100
st/wgl: fix interface changes bugs
commit 2303ec32143d363b46e59e4b7c91b0ebd34a16b2
Author: Roland Scheidegger <sroland@vmware.com>
Date: Wed Jun 16 19:42:32 2010 +0100
gallium: adapt code to interface changes...
commit dcae4f586f0d0885b72674a355e5d56d47afe77d
Author: Roland Scheidegger <sroland@vmware.com>
Date: Wed Jun 16 19:42:05 2010 +0100
gallium: separate depth0 and array_size in the resource itself.
These fields are still mutually exclusive (since no 3d array textures exist)
but it ultimately seemed to error-prone to adapt all code accept the new
meaning of depth0 (drivers stick that into hardware regs, calculate mipmap
sizes etc.). And it isn't really cleaner anyway.
So, array textures will have depth0 of 1, but instead use array_size,
3D textures will continue to use depth0 (and have array_size of 1). Cube
maps also will use array_size to indicate their 6 faces, but since all drivers
should just be fine by inferring this themselves from the fact it's a cube map
as they always used to nothing should break.
commit 621737a638d187d208712250fc19a91978fdea6b
Author: Roland Scheidegger <sroland@vmware.com>
Date: Wed Jun 16 17:47:38 2010 +0100
gallium: adapt code to interface changes
There are still usages of pipe_surface where pipe_resource should be used,
which should eventually be fixed.
commit 2d17f5efe166b2c3d51957c76294165ab30b8ae2
Author: Roland Scheidegger <sroland@vmware.com>
Date: Wed Jun 16 17:46:14 2010 +0100
gallium: more interface changes
In particular to enable usage of buffers in views, and ability to use a
different pipe_format in pipe_surface.
Get rid of layout and offset parameter in pipe_surface - the former was
not used in any (public) code anyway, and the latter should either be computed
on-demand or driver can use subclass of pipe_surface.
Also make create_surface() use a template to be more consistent with
other functions.
commit 71f885ee16aa5cf2742c44bfaf0dc5b8734b9901
Merge: 3232d11 8ad410d
Author: Roland Scheidegger <sroland@vmware.com>
Date: Mon Jun 14 14:19:51 2010 +0100
Merge commit 'origin/master' into gallium-array-textures
Conflicts:
src/gallium/auxiliary/util/u_box.h
src/gallium/drivers/nv50/nv50_surface.c
src/gallium/drivers/nvfx/nvfx_surface.c
src/gallium/drivers/r300/r300_blit.c
src/gallium/drivers/r300/r300_texture.c
src/gallium/drivers/r300/r300_transfer.c
src/gallium/drivers/r600/r600_blit.c
src/gallium/drivers/r600/r600_screen.h
src/gallium/include/pipe/p_state.h
commit 3232d11fe3ebf7686286013c357b404714853984
Author: Roland Scheidegger <sroland@vmware.com>
Date: Mon Jun 14 11:40:04 2010 +0100
mesa/st: adapt to interface changes
still need to fix pipe_surface sharing
(as that is now per-context).
Also broken is depth0 handling - half the code assumes
this is also used for array textures (and hence by extension
of that cube maps would have depth 6), half the code does not...
commit f433b7f7f552720e5eade0b4078db94590ee85e1
Author: Roland Scheidegger <sroland@vmware.com>
Date: Mon Jun 14 11:35:52 2010 +0100
gallium: fix a couple of bugs in interface chnage fixes
commit 818366b28ea18f514dc791646248ce6f08d9bbcf
Author: Roland Scheidegger <sroland@vmware.com>
Date: Sat Jun 12 02:42:11 2010 +0200
targets: adapt to interface changes
Yes even that needs adjustments...
commit 66c511ab1682c9918e0200902039247793acb41e
Author: Roland Scheidegger <sroland@vmware.com>
Date: Sat Jun 12 02:41:13 2010 +0200
tests: adapt to interface changes
Everything needs to be fixed :-(.
commit 6b494635d9dbdaa7605bc87b1ebf682b138c5808
Author: Roland Scheidegger <sroland@vmware.com>
Date: Sat Jun 12 02:39:50 2010 +0200
st: adapt non-rendering state trackers to interface changes
might not be quite right in all places, but they really don't want
to use pipe_surface.
commit 00c4289a35d86e4fe85919ec32aa9f5ffe69d16d
Author: Roland Scheidegger <sroland@vmware.com>
Date: Sat Jun 12 02:38:48 2010 +0200
winsys: adapt to interface changes
commit 39d858554dc9ed5dbc795626fec3ef9deae552a0
Author: Roland Scheidegger <sroland@vmware.com>
Date: Sat Jun 12 02:26:54 2010 +0200
st/python: adapt to interface changes
don't think that will work, sorry.
commit 6e9336bc49b32139cec4e683857d0958000e15e3
Author: Roland Scheidegger <sroland@vmware.com>
Date: Sat Jun 12 02:26:07 2010 +0200
st/vega: adapt to interface changes
commit e07f2ae9aaf8842757d5d50865f76f8276245e11
Author: Roland Scheidegger <sroland@vmware.com>
Date: Sat Jun 12 02:25:56 2010 +0200
st/xorg: adapt to interface changes
commit 05531c10a74a4358103e30d3b38a5eceb25c947f
Author: Roland Scheidegger <sroland@vmware.com>
Date: Sat Jun 12 02:24:53 2010 +0200
nv50: adapt to interface changes
commit 97704f388d7042121c6d496ba8c003afa3ea2bf3
Author: Roland Scheidegger <sroland@vmware.com>
Date: Sat Jun 12 02:24:45 2010 +0200
nvfx: adapt to interface changes
commit a8a9c93d703af6e8f5c12e1cea9ec665add1abe0
Author: Roland Scheidegger <sroland@vmware.com>
Date: Sat Jun 12 02:24:01 2010 +0200
i965g: adapt to interface changes
commit 0dde209589872d20cc34ed0b237e3ed7ae0e2de3
Author: Roland Scheidegger <sroland@vmware.com>
Date: Sat Jun 12 02:22:38 2010 +0200
i915g: adapt to interface changes
commit 5cac9beede69d12f5807ee1a247a4c864652799e
Author: Roland Scheidegger <sroland@vmware.com>
Date: Sat Jun 12 02:20:58 2010 +0200
svga: adapt to interface changes
resource_copy_region still looking fishy.
Was not very suited to unified zslice/face approach...
commit 08b5a6af4b963a3e4c75fc336bf6c0772dce5150
Author: Roland Scheidegger <sroland@vmware.com>
Date: Sat Jun 12 02:20:01 2010 +0200
rbug: adapt to interface changes
Not sure if that won't need changes elsewhere?
commit c9fd24b1f586bcef2e0a6e76b68e40fca3408964
Author: Roland Scheidegger <sroland@vmware.com>
Date: Sat Jun 12 02:19:31 2010 +0200
trace: adapt to interface changes
commit ed84e010afc5635a1a47390b32247a266f65b8d1
Author: Roland Scheidegger <sroland@vmware.com>
Date: Sat Jun 12 02:19:21 2010 +0200
failover: adapt to interface changes
commit a1d4b4a293da933276908e3393435ec4b43cf201
Author: Roland Scheidegger <sroland@vmware.com>
Date: Sat Jun 12 02:19:12 2010 +0200
identity: adapt to interface changes
commit a8dd73e2c56c7d95ffcf174408f38f4f35fd2f4c
Author: Roland Scheidegger <sroland@vmware.com>
Date: Sat Jun 12 02:18:55 2010 +0200
softpipe: adapt to interface changes
commit a886085893e461e8473978e8206ec2312b7077ff
Author: Roland Scheidegger <sroland@vmware.com>
Date: Sat Jun 12 02:18:44 2010 +0200
llvmpipe: adapt to interface changes
commit 70523f6d567d8b7cfda682157556370fd3c43460
Author: Roland Scheidegger <sroland@vmware.com>
Date: Sat Jun 12 02:18:14 2010 +0200
r600g: adapt to interface changes
commit 3f4bc72bd80994865eb9f6b8dfd11e2b97060d19
Author: Roland Scheidegger <sroland@vmware.com>
Date: Sat Jun 12 02:18:05 2010 +0200
r300g: adapt to interface changes
commit 5d353b55ee14db0ac0515b5a3cf9389430832c19
Author: Roland Scheidegger <sroland@vmware.com>
Date: Sat Jun 12 02:17:37 2010 +0200
cell: adapt to interface changes
not even compile tested
commit cf5d03601322c2dcb12d7a9c2f1745e2b2a35eb4
Author: Roland Scheidegger <sroland@vmware.com>
Date: Sat Jun 12 02:14:59 2010 +0200
util: adapt to interface changes
amazing how much code changes just due to some subtle interface changes?
commit dc98d713c6937c0e177fc2caf23020402cc7ea7b
Author: Roland Scheidegger <sroland@vmware.com>
Date: Sat Jun 12 02:12:40 2010 +0200
gallium: more interface fail, docs
this also changes flush_frontbuffer to use a pipe_resource instead of
a pipe_surface - pipe_surface is not meant to be (or at least no longer)
an abstraction for standalone 2d images which get passed around.
(This has also implications for the non-rendering state-trackers.)
commit 08436d27ddd59857c22827c609b692aa0c407b7b
Author: Roland Scheidegger <sroland@vmware.com>
Date: Thu Jun 10 17:42:52 2010 +0200
gallium: fix array texture interface changes bugs, docs
commit 4a4d927609b62b4d7fb9dffa35158afe282f277b
Author: Roland Scheidegger <sroland@vmware.com>
Date: Thu Jun 3 22:02:44 2010 +0200
gallium: interface changes for array textures and related cleanups
This patch introduces array textures to gallium (note they are not immediately
usable without the associated changes to the shader side).
Also, this abandons pipe_subresource in favor of using level and layer
parameters since the distinction between several faces (which was part of
pipe_subresource for cube textures) and several z slices (which were not part
of pipe_subresource but instead part of pipe_box where appropriate for 3d
textures) is gone at the resource level.
Textures, be it array, cube, or 3d, now use a "unified" set of parameters,
there is no distinction between array members, cube faces, or 3d zslices.
This is unlike d3d10, whose subresource index includes layer information for
array textures, but which considers all z slices of a 3d texture to be part
of the same subresource.
In contrast to d3d10, OpenGL though reuses old 2d and 3d function entry points
for 1d and 2d array textures, respectively, which also implies that for instance
it is possible to specify all layers of a 2d array texture at once (note that
this is not possible for cube maps, which use the 2d entry points, although
it is possible for cube map arrays, which aren't supported yet in gallium).
This should possibly make drivers a bit simpler, and also get rid of mutually
exclusive parameters in some functions (as z and face were exclusive), one
potential downside would be that 3d array textures could not easily be supported
without reverting this, but those are nowhere to be seen.
Also along with adjusting to new parameters, rename get_tex_surface /
tex_surface_destroy to create_surface / surface_destroy and move them from
screen to context, which reflects much better what those do (they are analogous
to create_sampler_view / sampler_view_destroy).
PIPE_CAP_ARRAY_TEXTURES is used to indicate if a driver supports all of this
functionality (that is, both sampling from array texture as well as use a range
of layers as a render target, with selecting the layer from the geometry shader).
Malloc likes to reuse old address as soon as possible this would cause the
new vbo buffer to get the same address as the old. So make sure we set it to
NULL when we allocate a new one. This fixes ipers which will fill up a couple
of VBO buffers per frame.
Signed-off-by: Jakob Bornecrantz <wallbraker@gmail.com>
Byte offsets simply don't work with tiled render targets when using
tiling bits. Luckily we can cox the hw into doing the right thing
with the DRAWING_RECT command by disabling the drawing rect offset
for the depth buffer.
Minor fixes by Jakob Bornecrantz.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Jakob Bornecrantz <wallbraker@gmail.com>
Signed-off-by: Jakob Bornecrantz <wallbraker@gmail.com>
Tiling is rather fragile in general and results in pure blackness when
unlucky. Hence add a new option to disable tiling.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Jakob Bornecrantz <wallbraker@gmail.com>
Signed-off-by: Jakob Bornecrantz <wallbraker@gmail.com>
libdrm-intel can refuse to tile buffers for various reasons. For
potentially tiled buffers the stride is therefore only known after
the iws->buffer_create_tiled call. Unconditionally rounding up to whatever
tiling requires wastes space, so rework the code to not use tex->stride
in the layout code.
Luckily only the mimap/face offset calculation uses it which can easily
be solved by storing an (x, y) coordinate pair. Furthermore this will
be usefull later for properly supporting rendering into the different
levels of tiled mipmap textures.
v2: switch to nblocks(x|y): More in line with gallium and better
suited for rendering into mipmap textures.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Jakob Bornecrantz <wallbraker@gmail.com>
Signed-off-by: Jakob Bornecrantz <wallbraker@gmail.com>
This is needed to properly implement tiling flags. And the gem
implemention fo buffer_from_handle already calls get_tiling, so
it's for free.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Jakob Bornecrantz <wallbraker@gmail.com>
Signed-off-by: Jakob Bornecrantz <wallbraker@gmail.com>
This way relaxed fencing is handled by libdrm. And buffers _can't_
ever change their tiling.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Jakob Bornecrantz <wallbraker@gmail.com>
Signed-off-by: Jakob Bornecrantz <wallbraker@gmail.com>
Different kernels have different restrictions for tiled buffers.
Hence use the libdrm abstraction to calculate the necessary
stride and height alignment requirements.
Not yet used.
v2: Incorporate review comments from Jakob Bornecrantz
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Jakob Bornecrantz <wallbraker@gmail.com>
Signed-off-by: Jakob Bornecrantz <wallbraker@gmail.com>
It's unnecessary. The kernel gem ignores it totally and we can't
run on the old userspace fake bo manager due to lack of dri2.
Also drop the redundant name string from the sw winsys as suggested
by Jakob Bornecrantz
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Jakob Bornecrantz <wallbraker@gmail.com>
Signed-off-by: Jakob Bornecrantz <wallbraker@gmail.com>
By not doing so, the uniform contents of
glsl-uniform-non-uniform-array-compare.shader_test was getting thrown
out since nobody was recorded as dereferencing the array.
The new lower_discard and opt_discard_simplification passes should
handle all the necessary transformations, so lower_jumps doesn't need to
support it.
Also, lower_jumps incorrectly handled conditional discards - it would
unconditionally truncate all code after the discard. Rather than fixing
the bug, simply remove the code.
NOTE: This is a candidate for the 7.9 branch.
This should allow lower_if_to_cond_assign to work in the presence of
discards, fixing bug #31690 and likely #31983.
NOTE: This is a candidate for the 7.9 branch.
There were a few places where we were using the wrong opcodes
on evergreen. arl still needs to be fixed on evergreen; see
r600g for reference.
NOTE: This is a candidate for the 7.9 branch.
Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
As the blend texture, a drawing surface mask is used when masking is
enabled. It should be created as needed.
s/alpha_mask/surface_mask/ to follow OpenVG 1.1 naming.
Depending on whether vgDrawPath(mode), vgDrawImage, or vgDrawGlyph[s] is
called, different paint-to-user and user-to-surface matrices should be
used to derive the sample points for the paint.
This fixes "paint" demo.
Divide bits of VegaShaderType into 6 groups: paint, image, mask, fill,
premultiply, and bw. Each group represents a stage. At most one shader
from each group will be selected when constructing the final fragment
shader.
Optional features such as auth-hinting are not implemented. There is no
anti-aliasing, and no effort is done to keep the glyph origin integral.
So the text quality is poor.
With this commit, the pipe states are entirely managed by the renderer.
The rest of the code interfaces with the renderer instead of
manipulating the states directly.
vg_manager_validate_framebuffer should mark the fb dirty and have
vg_validate_state call cso_set_framebuffer. Rename VIEWPORT_DIRTY to
FRAMEBUFFER_DIRTY.
The states are designated for polygon filling. Polygon filling is a
two-pass process utilizing the stencil buffer. polygon_fill and
polygon_array_fill functions are updated to make use of the state.
This state provides the ability to clear rectangles of the framebuffer
to the specified color, honoring scissoring. vegaClear is updated to
make use of the state.
This state provides glDrawTex-like function. It can be used for
vgSetPixels.
Rather than modifying every user of the renderer, this commit instead
modifies renderer_copy_surface to use DRAWTEX or COPY state internally
depending on whether the destination is the framebuffer.
Renderer states are high-level states to perform specific tasks. The
renderer is initially in INIT state. In that state, the renderer is
used for OpenVG pipeline.
This commit adds a new COPY state to the renderer. The state is used
for copying between two pipe resources using textured drawing. It can
be used for vgCopyImage, for example.
Rather than modifying every user of the renderer, this commit instead
modifies renderer_copy_texture to use the COPY state internally.
This branch defines a gallivm_state structure which contains the
LLVMBuilderRef, LLVMContextRef, etc. All data structures built with
this object can be periodically freed during a "garbage collection"
operation.
The gallivm_state object has to be passed to most of the builder
functions where LLVMBuilderRef used to be used.
Conflicts:
src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c
src/gallium/drivers/llvmpipe/lp_state_setup.c
I made the texwrap test be more thorough and realized that this driver code
had not been quite right. This commit fixes the border color for depth
textures, compressed textures, and 16-bits-per-channel textures
with up to 2 channels (R16, RG16).
NOTE: This is a candidate for the 7.9 branch.
Previously, IR for a linked shader was allocated directly out of the
gl_shader object - meaning all of it lived as long as the shader.
Now, IR is allocated out of a temporary context, and any -live- IR is
reparented/stolen to (effectively) the gl_shader. Any remaining IR can
be freed.
NOTE: This is a candidate for the 7.9 branch.
This makes a very simple 1.30 shader go from 196k of memory to 9k.
NOTE: This -may- be a candidate for the 7.9 branch, as the benefit is
substantial. However, it's not a simple change, so it may be wiser to
wait for 7.10.
We were trying to emit a single ir_expression to compare the whole
thing. The backends (ir_to_mesa.cpp and brw_fs.cpp so far) expected
ir_binop_any_nequal or ir_binop_all_equal to apply to at most a vector
(with matrices broken down by the lowering pass). Break them down to
a bunch of ORed or ANDed any_nequals/all_equals.
Fixes:
glsl-array-compare
glsl-array-compare-02
glsl-fs-struct-equal
glsl-fs-struct-notequal
Bug #31909
This doesn't cover all expressions or all operand types, but it will
complain if you overreach and it allows for much greater slack on the
programmer's part.
This code was for the old GLcore build of the software rasteriser. The
X server switched to a DRI driver for software indirect GLX long ago.
Signed-off-by: Adam Jackson <ajax@redhat.com>
There are also some u_simple_shaders changes.
On r300, the TGSI_SEMANTIC_COLOR varying is a fixed-point number clamped
to the range [0,1] and limited to 12 bits of precision. Therefore we can't
use it for passing through a clear color in order to clear high precision
texture formats.
This also makes u_blitter use only one vertex shader instead of two.
Since the viewport is not updated on RandR12 mode switches anymore,
clipping to viewport may incorrectly clip away the video.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
This may help paint the colorkey before overlay updates in some
situations where the app paints the color key (mainly xine).
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
The compiler seriously needs a cleanup as far as the arrangement of functions
is concerned. It's hard to know whether some function was implemented or not
because there are so many places to search in and it can be anywhere and
named anyhow.
If nvfx_framebuffer prepare and validate were called successively with
fb->zsbuf not NULL and then NULL, nvfx->hw_zeta would contain garbage and
this would cause failures in nvfx_framebuffer_relocate/OUT_RELOC(hw_zeta).
This was triggered by piglit/texwrap 2D GL_DEPTH_COMPONENT24 and caused
first a 'write to user buffer!!' error in libdrm and then worse things.
Signed-off-by: Xavier Chantry <chantry.xavier@gmail.com>
Signed-off-by: Francisco Jerez <currojerez@riseup.net>
Currently, the standalone compiler tries to do function inlining before
linking shaders (including linking against the built-in functions).
This resulted in the built-in function _prototypes_ being inlined rather
than the actual function definition.
This is only known to fix a bug in the standalone compiler; most
programs should be unaffected. Still, it seems like a good idea.
NOTE: This is a candidate for the 7.9 branch.
Fixes this GCC warning with linux-x86 build.
radeon_pair_regalloc.c: In function ‘compute_live_intervals’:
radeon_pair_regalloc.c:222: warning: ISO C90 forbids mixed declarations and code
Fixes this GCC warning with linux-x86 build.
radeon_pair_regalloc.c: In function ‘compute_live_intervals’:
radeon_pair_regalloc.c:221: warning: ISO C90 forbids mixed declarations and code
The alpha mask is addressed with unnormalized coordinates in the
fragment shader. It should have the same orientation as the surface
does.
This fixes "mask" OpenVG demo.
Previously, the get table listed all three as having custom locations,
yet find_custom_value did not have cases to handle them.
MAX_VARYING_VECTORS does not need a custom location since MaxVaryings is
already stored as float[4] (or vec4). MaxUniformComponents is stored as
the number of floats, however, so a custom implementation that divides
by 4 is necessary.
Fixes bugs.freedesktop.org #31495.
This fixes incorrect behaviour when the stencil clear value exceeds
the size of the stencil buffer, for example, when set with:
glClearStencil (~1); /* Set a bit pattern of 111...11111110 */
glClear (GL_STENCIL_BUFFER_BIT);
The clear value needs to be masked by the value 2^m - 1, where m is the
number of bits in the stencil buffer. Previously, we passed the value
masked with 0x7fffffff to _mesa_StencilFuncSeparate which then clamps,
NOT masks the value to the range 0 to 2^m - 1.
The result would be clearing the stencil buffer to 0xff, rather than 0xfe.
Signed-off-by: Peter Clifton <pcjc2@cam.ac.uk>
Signed-off-by: Brian Paul <brianp@vmware.com>
Make sure regions are properly updated and that the colorkey painting is
flushed before we update the HW overlay.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
This is needed to properly sync with host side rendering. For example,
make sure we flush colorkey painting before updating the overlay.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
When a context was made current to another surface, the old code did
this
dri2_dpy->core->bindContext(cctx, ddraw, rdraw);
dri2_dpy->core->unbindContext(old_cctx);
and there will be no current context due to the second line.
unbindContext should be called only when bindContext is not. This fixes
a regression since d19afc57. Thanks to Neil Roberts for noticing the
issue and creating a test case.
(Marek: added the initializion of "vec" in the default statement)
NOTE: This is a candidate for the 7.9 branch.
Signed-off-by: Marek Olšák <maraeo@gmail.com>
This fixes piglit/glsl-vs-main-return and glsl-fs-main-return for the drivers
which don't support RET (i915g, r300g, r600g, svga).
ir_to_mesa does not currently generate subroutines, but it's a matter of time
till it's added. It would then break all the drivers which don't implement
them, so this CAP makes sense.
Signed-off-by: Marek Olšák <maraeo@gmail.com>
When the result of the alpha instruction is being replicated to the RGB
destination register, we do not need to use alpha's destination register.
This fixes an invalid "Too many hardware temporaries used" error in
the case where a transcendent operation writes to a temporary register
greater than max_temp_regs.
NOTE: This is a candidate for the 7.9 branch.
This fixes an invalid "Too many hardware temporaries used" error in the
case where a source reads from a temporary register with an index greater
than max_temp_regs and then the source is marked as unused before the
register allocation pass.
NOTE: This is a candidate for the 7.9 branch.
Reads of registers that where not written to within the same block were
not being tracked. So in a situations like this:
0: IF
1: ADD t0, t1, t2
2: MOV t2, t1
Instruction 2 didn't know that instruction 1 read from t2, so
in some cases instruction 2 was being scheduled before instruction 1.
NOTE: This is a candidate for the 7.9 branch.
Gallium drivers pass all piglit tests for the two (there are 12 tests
for separate_shader_objects and 5 tests for explicit_attrib_location),
and I was told the extensions don't need any driver-specific code.
I made them dependent on PIPE_CAP_GLSL.
Signed-off-by: Brian Paul <brianp@vmware.com>
The drm winsys only ever handles one gem memory manager. Rip out
the unnecessary complication.
Reviewed-by: Jakob Bornecrantz <wallbraker@gmail.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Jakob Bornecrantz <wallbraker@gmail.com>
Not using the gtt is considered harmful for performance. And for
partial uploads there's always drm_intel_bo_subdata.
Reviewed-by: Jakob Bornecrantz <wallbraker@gmail.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Jakob Bornecrantz <wallbraker@gmail.com>
It looks like this was meant to facilitate unfenced access to textures/
color/renderbuffers. It's totally incomplete and fundamentally broken
on a few levels:
- broken: The kernel needs to about every tiled bo to fix up bit17
swizzling on swap-in.
- unflexible: fenced/unfenced relocs from execbuffer2 do the same, much
simpler.
- unneeded: with relaxed fencing tiled gem bos are as memory-efficient
as this trick.
Hence kill it.
Reviewed-by: Jakob Bornecrantz <wallbraker@gmail.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Jakob Bornecrantz <wallbraker@gmail.com>
Fix a crash when the subrectangle is not inside the fb. Fix wrong
pipe transfer when sx > 0 or sy + height != fb->height.
This fixes "readpixels" demo.
These two samplers use non-normalized texture coordinates. wrap_r
cannot be PIPE_TEX_WRAP_REPEAT (the default).
This fixes
sp_tex_sample.c:1790:get_linear_unorm_wrap: Assertion `0' failed
assertion failure.
Fix OpenVG "filter" demo
Program received signal SIGSEGV, Segmentation fault.
0xb7153dc9 in str_match_no_case (pcur=0xbfffe564, str=0x0) at
tgsi/tgsi_text.c:86
86 while (*str != '\0' && *str == uprcase( *cur )) {
This is quite common for multitexture sampling, and not only cuts down
on the second and later set of MOVs, but typically also allows
compute-to-MRF on the first set.
No statistically siginficant performance difference in nexuiz (n=3),
but it reduces instruction count in one of its shaders and seems like
a good idea.
We were skipping it if the instruction producing the value we were
going to compute-to-mrf used its result reg as a source reg. This
meant that the typical "write interpolated color to fragment color" or
"texture from interpolated texcoord" shader didn't compute-to-MRF.
Just don't check for the interference cases until after we've checked
if this is the instruction we wanted to compute-to-MRF.
Improves nexuiz high-settings performance on my laptop 0.48% +- 0.08%
(n=3).
On pre-gen6, this turns 4 instructions into 1. We could still do
better by folding the saturate into the instruction generating the
value if nobody else uses it, but that should be a separate pass.
Hardware pretty commonly has saturate modifiers on instructions, and
this can be used in codegen to produce those, without everyone else
needing to understand clamping other than min and max.
This should save on the overhead of tree-walking and provide a
convenient place to add more instruction lowering in the future.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
The vector operator collects 2, 3, or 4 scalar components into a
vector. Doing this has several advantages. First, it will make
ud-chain tracking for components of vectors much easier. Second, a
later optimization pass could collect scalars into vectors to allow
generation of SWZ instructions (or similar as operands to other
instructions on R200 and i915). It also enables an easy way to
generate IR for SWZ instructions in the ARB_vertex_program assembler.
The operate just like ir_unop_sin and ir_unop_cos except that they
expect their inputs to be limited to the range [-pi, pi]. Several
GPUs require this limited range for their sine and cosine
instructions, so having these as operations (along with a to-be-written
lowering pass) helps this architectures.
These new operations also matche the semantics of the
GL_ARB_fragment_program SCS instruction. Having these as operations
helps in generating GLSL IR directly from assembly fragment programs.
Use fetch shader instead of having fetch instruction in the vertex
shader. Allow to restrict shader update to a smaller part when
vertex buffer input layout changes.
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Occlusion query on evergreen need the event index field to be
set otherwise we endup locking up the GPU.
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Condiation moves with a condition of (a < 0), (a > 0), (a <= 0), or (a
>= 0) can be generated with "a" directly as an operand of the CMP
instruction. This doesn't help much now, but it will help with
assembly shaders that use the CMP instruction.
This should prevent the field going unset in the future. See bug
http://bugs.freedesktop.org/show_bug.cgi?id=31544 for background.
Also remove unneeded calls to clear_teximage_fields().
Finally, call _mesa_set_fetch_functions() from the
_mesa_init_teximage_fields() function so callers have one less
thing to worry about.
Fix this GCC warning.
ir.cpp: In static member function
'static unsigned int ir_expression::get_num_operands(ir_expression_operation)':
ir.cpp:199: warning: control reaches end of non-void function
If an instruction writes reg but nothing later uses it, then we don't
need to bother doing it. Before, we were just killing code that was
never read after it was ever written.
This removes many interpolation instructions for attributes with only
a few comopnents used. Improves nexuiz high-settings performance .46%
+/- .12% (n=3) on my Ironlake.
It's annoying to use test suites under a Mesa debug build because
pretty output is cluttered with stderr's continuous reports that
you're still using the debug driver.
The new usage message lists possible command line options. (Newcomers to Mesa
currently have to trawl through the source to find the command line options,
and we should save them from that trouble.)
Example Output
--------------
usage: ./glsl_compiler [options] <file.vert | file.geom | file.frag>
Possible options are:
--glsl-es
--dump-ast
--dump-hir
--dump-lir
--link
This adds sentinel values to the ir_expression_operation enum type:
ir_last_unop, ir_last_binop, and ir_last_opcode. They are set to the
previous one so they don't trigger "unhandled case in switch statement"
warnings, but should never be handled directly.
This allows us to remove the huge array of 1s and 2s in
ir_expression::get_num_operands().
We are not aware of any GPU that actually implements the cross product
as a single instruction. Hence, there's no need for it to be an opcode.
Future commits will remove it entirely.
This is really supposed to be defined only if the driver supports highp
in the fragment shader - but all of our current ES2 implementations do.
So, just define it. In the future, we'll need to add a flag to
gl_context and only define the macro if the flag is set.
"Fixes" freedesktop.org bug #31673.
ir_binop_less, ir_binop_greater, ir_binop_lequal, and ir_binop_gequal
are defined to work on vectors as well as scalars, as long as the two
operands have the same type.
This is evident from both ir_validate.cpp and our use of these opcodes
in the GLSL lessThan, greaterThan, lessThanEqual, greaterThanEqual
built-in functions.
Found by code inspection. Not known to fix any bugs. Presumably, our
tests for the built-in comparison functions must pass because C.E.
handling is done on the ir_call of "greaterThan" rather than the inlined
opcode. The C.E. handling of the built-in function calls is correct.
NOTE: This is a candidate for the 7.9 branch.
Default group bytes to 512 on evergreen. Don't query
tiling config yet for evergreen, the current info returned is not
adequate for evergreen (no way to get bank info).
RET was interpreted as END, which was wrong. Instead, if a shader contains RET
in the main function, it will fail to compile with an error message
from now on.
The hack is from early days.
When drawing GL_DEPTH_COMPONENT the usual fragment pipeline steps apply
so don't override the depth state.
When drawing GL_STENCIL_INDEX (or GL_DEPTH_STENCIL) the fragment pipeline
does not apply (only the stencil and Z writemasks apply) so disable writes
to the color buffers.
Fixes some regressions from commit ef8bb7ada9
This consolidates the TOKEN_OR_IDENTIFIER and RESERVED_WORD macros into
a single KEYWORD macro.
The old TOKEN_OR_IDENTIFIER macros handled the case of a word going from
an identifier to a keyword; the RESERVED_WORD macro handled a word going
from a reserved word to a language keyword. However, neither could
properly handle samplerBuffer (for example), which is an identifier in
1.10 and 1.20, a reserved word in 1.30, and a keyword in 1.40 and on.
Furthermore, the existing macros didn't properly handle reserved words
in GLSL ES 1.00. The best they could do was return a token (rather than
an identifier), resulting in an obtuse parser error, rather than a
user-friendly "you used a reserved word" error message.
Functions are not first class objects in GLSL, so there is never a value
of function type. No code actually used this except for one function
which asserted it shouldn't occur. One comment mentioned it, but was
incorrect. So we may as well remove it entirely.
This driver is a fake swdri driver that perform no operations
beside allocation gallium structure and buffer for upper layer
usage.
It's purpose is to help profiling core mesa/gallium without
having pipe driver overhead hidding hot spot of core code.
scons file are likely inadequate i am unfamiliar with this
build system.
To use it simply rename is to swrast_dri.so and properly set
LIBGL_DRIVERS_PATH env variable.
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
src/mesa/drivers/dri/*/*/*.[chS] is a superset of
src/mesa/drivers/dri/*/server/*.[ch] and
src/mesa/drivers/dri/common/xmlpool/*.[ch].
include/GL/internal/glcore.h is already in MAIN_FILES, no need for it in
DRI_FILES too. src/glx/Makefile was listed twice.
Signed-off-by: Julien Cristau <jcristau@debian.org>
Signed-off-by: Brian Paul <brianp@vmware.com>
This showed up as cairo-gl gradients being inverted on everyone but
Intel, where I'd apparently tweaked the transformation to work around
the bug. Fixes piglit fbo-fragcoord.
Required because ATI and NVIDIA DX9 GPUs do not support indirect addressing
of temps, inputs, outputs, and consts (FS-only) or the hw support is so
limited that we cannot use it.
This should make r300g and possibly nvfx more feature complete.
Signed-off-by: Marek Olšák <maraeo@gmail.com>
Since this was talloced off of NULL instead of the compile state, it
was a real leak over the course of the program. Noticed with
valgrind --leak-check=full --show-reachable=yes. We should really
change these passes to generally get the compile context as an argument
so simple mistakes like this stop mattering.
This fixes a regression (failed assertion) from commit
c552f273f5 which was hit if glDeleteBuffers()
was called on a buffer that was never bound.
NOTE: this is a candidate for the 7.9 branch.
Currently r600_resource_copy_region() will turn these copies into
transfers + memcpys, so to avoid recursion we must not turn those
transfers back into blits.
Previously queries of MAX_SAMPLES were only allowed with
ARB_framebuffer_object, but EXT_framebuffer_multisample also enables
this query. This seems to only effect the i915. All other drivers
support both extensions or neither extension.
This patch is based on a patch that Kenneth sent along with the report.
NOTE: this is a candidate for the 7.9 branch.
Reported-by: Kenneth Waters <kwaters@chromium.org>
For driver performance analysis it usefull to be able to
disable as much as possible the GPU interaction so that
one can profile the userspace only.
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
IF statements were getting flattened while they were broken. With
Zhenyu's last fix for ENDIF's type, everything appears to have lined
up to actually work.
This regresses two tests:
glsl1-! (not) operator (1, fail)
glsl1-! (not) operator (1, pass)
but fixes tests that couldn't work before because the IFs couldn't be
flattened:
glsl-fs-discard-01
occlusion-query-discard
(and, naturally, this should be a performance improvement for apps
that actually use IF statements to avoid executing a bunch of code).
Sometimes we swizzled in a different channel it looked like, and
sometimes we swizzled in zero. Or something.
Having looked at the output of another code generator for this chip,
this is approximately what they do, too: use align1 math on
temporaries, and then move the results into place.
Fixes:
glean/vp1-EX2 test
glean/vp1-EXP test
glean/vp1-LG2 test
glean/vp1-RCP test (reciprocal)
glean/vp1-RSQ test 1 (reciprocal square root)
shaders/glsl-cos
shaders/glsl-sin
shaders/glsl-vs-masked-cos
shaders/vpfp-generic/vp-exp-alias
Instead of messing with the callers simply copy our inputs into a
alloca array at the beginning of the function and then use it.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
DD_POINT_SIZE was broken for quite some time, and the only driver (r200) relying
on this no longer needs it.
Both DD_POINT_SIZE and DD_LINE_WIDTH have no users left outside of debugging
output, hence instead of fixing DD_POINT_SIZE setting just drop both of them -
there was a plan to remove tricaps flags entirely at some point.
DD_POINT_SIZE got never set for some time now (as it was set only in ifdefed
out code), which caused the r200 driver to use the point primitive mistakenly
in some cases which can only do size 1 instead of point sprite. Since the
logic to use point instead of point sprite prim is flaky at best anyway (can't
work correctly for per-vertex point size), just drop this and always emit point
sprites (except for AA points) - reasons why the driver tried to use points for
size 1.0 are unknown though it is possible they are faster or more conformant.
Note that we can't emit point sprites without point sprite cntl as that might
result in undefined point sizes, hence need drm version check (which was
unnecessary before as it should always have selected points). An
alternative would be to rely on the RE point size clamp controls which could
clamp the size to 1.0 min/max even if the SE point size is undefined, but currently
always use 0 for min clamp. (As a side note, this also means the driver does
not honor the gl spec which mandates points, but not point sprites, with zero size
to be rendered as size 1.)
This should fix recent reports of https://bugs.freedesktop.org/show_bug.cgi?id=702.
This is a candidate for the mesa 7.9 branch.
Fixes assertion failure with texture swizzling in the GLSL path when
it's triggered (such as gen6 FF or ARB_fp shadow comparisons).
Fixes:
texdepth
texSwizzle
fp1-DST test
fp1-LIT test 3
Also fixup code comment to reflect that the GPU requires DWORD
alignment, but in this case does not actually pass the value "in
DWORDs" as I previously stated.
This reverts commit 76360d6abc. On
second thought, it turned out that sync objects also used the
wait_rendering API like this, and would need the same treatment, and
so wait_rendering itself is fixed in libdrm now.
We were asking for a wait to GTT read (all GPU rendering to it
complete), instead of asking for all GPU reading from it to be
complete. Prevents swapbuffers-based apps from running away with
rendering, and produces a better input experience.
For some reason I though we needed the _DISCARD flag to avoid
readbacks, which isn't true at all. Now write operations should
pipeline properly, gives a good speedup to demos/tunnel.
When linked with certain builds of libstdc++, it appears like powf is resolved
by a symbol in that library. Other builds of libstdc++ doesn't contain that
symbol resulting in a linker / loader error. Optionally
resolve that symbol and replace it with calls to logf and expf.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Fixes this GCC warning.
graw.h:93: warning: 'struct pipe_surface' declared inside parameter list
graw.h:93: warning: its scope is only this definition or declaration,
which is probably not what you want
A call to radeon_prepare_render() at the beginning of draw
operations was placed too deep in the call chain,
inside r300RunRenderPrimitive(), instead of
r300DrawPrims() where it belongs. This leads to
emission of stale target color renderbuffer into the cs if
bufferswaps via page-flipping are used, and thereby causes
massive rendering corruption due to unsynchronized
rendering into the active frontbuffer.
This patch fixes such problems for use with the
upcoming radeon page-flipping patches.
Signed-off-by: Mario Kleiner <mario.kleiner@tuebingen.mpg.de>
The width of the 2D blits used to copy the data is defined as a 16-bit
signed integer, but the pitch must be DWORD aligned. Limit to an integral
number of DWORDs, (1 << 15 - 4) rather than (1 << 15 -1).
Fixes corruption to data uploaded with glBufferSubData.
Signed-off-by: Peter Clifton <pcjc2@cam.ac.uk>
Allows applications to dump surfaces to file without
referencing gallium/auxiliary entry points statically.
Existing test apps have been modified such that
they save the contents of the fronbuffer only
when the `-o' option's specified.
These ES1 features were only tested for in the vertex array code.
Checking the ctx->API field at runtime is cleaner than the #ifdef
stuff and supports choosing the API at runtime.
Flushing the vertices after having already updated the state doesn't
do any good. Fixes useshaderprogram-flushverts-1. As a side effect,
by moving it to the right place we end up skipping no-op state changes
for traditional glUseProgram.
Rebuilding the vertex format from scratch every time we see a new
vertex attribute is rather costly, new attributes can be appended at
the end avoiding a copy to current and then back again, and the full
attr pointer recalculation.
In the not so likely case of an already existing attribute having its
size increased the old behavior is preserved, this could be optimized
more, not sure if it's worth it.
It's a modest improvement in FlightGear (that game punishes the VBO
module pretty hard in general, framerate goes from some 46 FPS to 50
FPS with the nouveau classic driver).
Signed-off-by: Brian Paul <brianp@vmware.com>
Silences this GCC warning.
brw_wm_fp.c: In function 'brw_wm_pass_fp':
brw_wm_fp.c:966: warning: 'last_inst' may be used uninitialized in this function
brw_wm_fp.c:966: note: 'last_inst' was declared here
Silences this GCC warning.
brw_wm_fp.c: In function 'precalc_tex':
brw_wm_fp.c:666: warning: 'tmpcoord.Index' may be used uninitialized in this function
Fixes this GCC warning with linux-x86 build.
radeon_dataflow.c: In function 'get_readers_normal_read_callback':
radeon_dataflow.c:472: warning: ISO C90 forbids mixed declarations and code
Fixes this GCC warning with linux-x86 build.
radeon_pair_schedule.c: In function 'merge_presub_sources':
radeon_pair_schedule.c:312: warning: ISO C90 forbids mixed declarations and code
Commit 8dfafbf086 forgot to update r300g.
There is a buf == NULL check, but buf is used before for var init.
Tested-by: Guillermo S. Romero <gsromero@infernal-iceberg.com>
Fixes this GCC warning.
nouveau_vbo_t.c: In function 'nv10_vbo_render_prims':
nouveau_render_t.c:161: warning: 'max_out' may be used uninitialized in this function
nouveau_render_t.c:161: note: 'max_out' was declared here
We really only want to print spaces -between- elements, not after each
element. This cleans up error messages from IR reader, making them
(mildly) easier to read.
In particular, calling the abs function is silly, since there's already
an expression opcode for that. Also, assigning to temporaries then
assigning those to the final location is rather redundant.
Trivial change that avoids a segmentation fault when the blitter state
happens to be bound when the context is destroyed.
The free calls should probably removed altogether in the future -- the
responsibility to destroy the state atoms lies with whoever created it,
and the safest thing for the pipe driver is to not touch any bound state
in its destructor.
In testing on Ironlake, the histogram of clocks/pixel results for the
system memcpy and magic unaligned memcpy show no noticeable difference
(and no statistically significant difference with the 5510 samples
taken, though the stddev is large due to what looks like the cache
effects from the different texture sizes used).
This provides the optimizer with hints about code hotness, which we're
quite certain about for debug printouts (or, rather, while we
developers often hit the checks for debug printouts, we don't care
about performance while doing so).
Discard fractional bits from linewidth. This matches the nvidia
closed drivers, my reading of the OpenGL SI and current llvmpipe
behaviour.
It looks a lot nicer & avoids ugliness where lines alternate between n
and n+1 pixels in width along their length.
Also fix up r600g to match.
These were previously being left in the default (D3D) mode. This mean
that triangles were drawn slightly incorrectly, but also because this
state is relied on by the u_blitter code, all blits were half a pixel
off.
Generalize the existing tiled_buffer path in texture transfers for use
in some non-tiled up and downloads.
Use a staging buffer, which the winsys will restrict to GTT memory.
GTT buffers have the major advantage when they are mapped, they are
cachable, which is a very nice property for downloads, usually the CPU
will want to do look at the data it downloaded.
This opens the question of what interface the winsys layer should
really have for talking about these concepts.
For now I'm using the existing gallium resource usage concept, but
there is no reason not use terms closer to what the hardware
understands - eg. the domains themselves.
The callback presents the given attachment to the native engine. It
allows the swap behavior and interval to be controlled. It will replace
native_surface::flush_frontbuffer and native_surface::swap_buffers
shortly.
Silences warning such as:
main/texobj.c:442:40: warning: ISO C99 requires rest arguments to be used
main/texobj.c:498:58: warning: ISO C99 requires rest arguments to be used
This ensures that we increase bo->map_count when radeon_bo_map_internal()
returns successfully, which in turn makes sure we don't decrement
bo->map_count below zero later.
Signed-off-by: Tilman Sauerbeck <tilman@code-monkey.de>
The call to draw_bind_fragment_shader() was using the old fragment
shader. This bug would have really only effected the draw module's
use of the fragment shader in the wide point stage.
Some C++ header files were included in an extern "C" block. When building with
Clang, this caused the build to fail due to namespace errors. (GCC did not
report any errors.)
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Important as more constant buffers per shader start to get used.
Fix up r600 (tested) and nv50 (untested) to cope with this. Drivers
previously didn't see unbinds of constant buffers often or ever, so
this isn't always dealt with cleanly.
For r600 just return and keep the reference. Will try to do better in
a followup change.
Don't trim triangle bounding box to scissor/draw-region until after
the logic for emitting tri_16. Don't generate tri_16 commands for
triangles with untrimmed bounding boxes outside the current tile.
This is important as the tri-16 itself can extend past tile bounds and
we don't want to add code to it to check against tile bounds (slow) or
restrict it to locations within a tile (pessimistic).
This effectively redoes 1741ddb747 in a
way that allows contexts of different APIs to coexist.
First, the changes to the remap table are reverted. The remap table
(driDispatchRemapTable) is always initialized in the same way regardless
of the context API.
es_generator.py is updated to use a local remap table, whose sole
purpose is to help initialize its dispatch table. The local remap table
and the global one are always different, as they use different
glapidispatch.h. But the dispatch tables initialized by both remap
tables are always compatible with glapi (libGL.so).
Finally, the semantics of one_time_init are changed to per-api one-time
initialization.
Core mesa should query glapi for the positions of the functions in
_glapi_table when multiple APIs are supported. It does not know which
glapitable.h glapi used.
Use scons target and dependency system instead of ad-hoc options.
Now is simply a matter of naming what to build. For example:
scons libgl-xlib
scons libgl-gdi
scons graw-progs
scons llvmpipe
and so on. And there is also the possibility of scepcified subdirs, e.g.
scons src/gallium/drivers
If nothing is specified then everything will be build.
There might be some rough corners over the next days. Please bare with me.
Without this, update_textures may not pick up the new pipe_resource.
It is actually update_textures that should check
stObj->sampler_view->texture != stObj->pt, but let's follow st_TexImage
and others for now.
Make autoconf decide the client APIs enabled first. Then when OpenGL
and OpenGL ES are disabled, there is no need to build src/mesa/; when
OpenGL is disabled, no $mesa_driver should be built. Finally, add
--enable-openvg to enable OpenVG.
With these changes, an OpenVG only build can be configured with
$ ./configure --disable-opengl --enable-openvg
src/mesa, src/glsl, and src/glx will be skipped, which saves a great
deal of compilation time.
And an OpenGL ES only build can be configured with
$ ./configure --disable-opengl --enable-gles-overlay
Fixes this GCC warning.
state_tracker/st_program.c: In function 'st_print_shaders':
state_tracker/st_program.c:735: warning: 'sh' may be used uninitialized in this function
This fixes some insanity that would otherwise be required for GLSL
1.30 bit ops or gen6 integer uniform operations in general, at the
cost of upload-time pain. Given that we only have that pain because
mesa's mangling our integer uniforms to be floats, this something that
should be fixed outside of the shader codegen.
First, it changes autoconf to use a "python2" binary when available,
rather than plain "python" (which is ambiguous). Secondly, it changes
the Makefiles to use $(PYTHON) $(PYTHON_FLAGS) rather than calling
python directly.
Signed-off-by: Xavier Chantry <chantry.xavier@gmail.com>
Signed-off-by: Matthew William Cox <matt@mattcox.ca>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Fixes this GCC warning.
r300_state_derived.c: In function 'r300_update_derived_state':
r300_state_derived.c:593: warning: 'r' may be used uninitialized in this function
r300_state_derived.c:593: note: 'r' was declared here
radeon_bo_destroy() will want to read the list field. Without this patch,
we'd end up evaluating the list pointers before they have been properly
set up when we destroyed the newly created bo if it cannot be mapped.
Signed-off-by: Tilman Sauerbeck <tilman@code-monkey.de>
With 07b85457d9, glapitable.h is included
by core mesa only to know the size of _glapi_table. It is not necessary
as the same info is given by _gloffset_COUNT.
This change makes _glapi_table opaque to core mesa. All operations on
it are supposed to go through one of the SET/GET/CALL macros.
Move defines in glapioffsets.h to glapidispatch.h. Rename
_gloffset_FIRST_DYNAMIC to _gloffset_COUNT, which is equal to the number
of entries in _glapi_table.
Consistently use SET_by_offset, GET_by_offset, CALL_by_offset, and
_gloffset_* to recursively define all SET/GET/CALL macros.
glapioffsets.h exists for the same reason as glapidispatch.h does. It
is of no use to glapi. This commit also drops the use of glapioffsets.h
in glx as glx is considered an extension to glapi when it comes to
defining public GL entries.
glapidispatch.h exists so that core mesa (libmesa.a) can be built for
DRI drivers or for non-DRI drivers as a compile time decision (whether
IN_DRI_DRIVER is defined). It is of no use to glapi. This commit also
drops the use of glapidispatch.h in glx and libgl-xlib as they are
considered extensions to glapi when it comes to defining public GL
entries.
This lets us simplify and consolidate some state checking code.
This implements the GL_INVALID_OPERATION check for all drawing commands
required by GL_EXT_texture_integer.
It's a little more painful than before because we don't have the handy
mask register any more, and have to make do with cooking up a value
out of the flag register.
This is apparently required, as the thread will be initiated while it
still has dependencies, and this is what waits for those to be
resolved before writing color.
Function ast_declarator_list::hir(), when processing keywords added by
extension ARB_fragment_coord_conventions, made the mistake of checking only if
the extension was __supported by the driver__. The correct behavior is to check
if the extensi0n is __enabled in the parse state__.
NOTE: this is a candidate for the 7.9 branch.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Ensure -L$(TOP)/$(LIB_DIR) (the staging dir for build products), appears
in the link line before any -L in $LDFLAGS, so that we link driver we are
building with libEGL we have just built, and not an installed version
[olv: make a similar change to targets/egl]
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
This call sequence
eglMakeCurrent(dpy, surf, surf, ctx1);
eglMakeCurrent(dpy, surf, surf, ctx2);
should be valid if ctx1 and ctx2 have the same client API and are not
current in another thread.
Internally a mode belongs to a screen. But functions like
eglGetModeAttribMESA treat a mode as a display resource: a mode can be
looked up without a screen. Considering how KMS works, it is better to
stick to the current implementation.
To properly support looking up a mode without a screen, this commit
assigns each mode (of all screens) a unique ID.
The opaque nature of EGLImage implies that extensions almost always
define their own attributes. Move attributes in _EGLImage to
_EGLImageAttribs and add a helper function to parse attribute lists.
Fixes glsl-fs-convolution-2, which was blowing up due to the array
access insanity getting at the uniform values within the loop. Each
temporary was considered live across the whole loop.
One, it was allocating increments of 1kb, but per thread scratch space
is a power of two. Two, the new FS wasn't getting total_scratch set
at all, so everyone thought they had 1kb and writes beyond 1kb would
go stomping on a neighbor thread.
With this plus the previous register spilling for the new FS,
glsl-fs-convolution-1 passes.
g0 is used by others, and is expected to be left exactly as it was
dispatched to us. So manually move g0 into our message reg when
spilling/unspilling and update the offset in the MRF. Fixes failures
in texture sampling after having spilled a register.
It's amazing this code worked. Basically, we would get lucky in
register allocation and the tests using frontfacing would happen to
allocate gl_FrontFacing storage and the instructions generating
gl_FrontFacing but pointing at another register to the same hardware
register. Noticed during register spilling debug, when suddenly they
didn't get allocatd the same storage.
Since this is just generated by python, it's questionable whether this
should continue to live in the repository - Mesa already has other
things generated from python as part of the build process.
This works around MSVC's 65535 byte limit, unfortunately at the expense
of any semblance of readability and much larger file size. Hopefully I
can implement a better solution later, but for now this fixes the build.
Fixes this GCC warning.
gallivm/lp_bld_tgsi_aos.c: In function 'lp_build_tgsi_aos':
gallivm/lp_bld_tgsi_aos.c:516: warning: 'dst0' may be used uninitialized in this function
gallivm/lp_bld_tgsi_aos.c:516: note: 'dst0' was declared here
Fixes these GCC warnings.
gallivm/lp_bld_sample_aos.c: In function 'lp_build_sample_image_nearest':
gallivm/lp_bld_sample_aos.c:271: warning: 't_ipart' may be used uninitialized in this function
gallivm/lp_bld_sample_aos.c:271: warning: 'r_ipart' may be used uninitialized in this function
Fixes these GCC warnings.
gallivm/lp_bld_sample_aos.c: In function 'lp_build_sample_image_linear':
gallivm/lp_bld_sample_aos.c:439: warning: 'r_ipart' may be used uninitialized in this function
gallivm/lp_bld_sample_aos.c:438: warning: 't_ipart' may be used uninitialized in this function
gallivm/lp_bld_sample_aos.c:438: warning: 't_fpart' may be used uninitialized in this function
gallivm/lp_bld_sample_aos.c:439: warning: 'r_fpart' may be used uninitialized in this function
gallivm/lp_bld_sample_aos.c:438: warning: 't_fpart_hi' may be used uninitialized in this function
gallivm/lp_bld_sample_aos.c:438: warning: 't_fpart_lo' may be used uninitialized in this function
gallivm/lp_bld_sample_aos.c:439: warning: 'r_fpart_hi' may be used uninitialized in this function
gallivm/lp_bld_sample_aos.c:439: warning: 'r_fpart_lo' may be used uninitialized in this function
At the moment you need kernel patches to have texture tiling work
with the kernel CS checker, so once they are upstream and the drm version
is bumped we can make this enable flip the other way most likely.
Since the hw transitions from 2D->1D sampling below the 2D macrotile
size we need to keep track of the array mode per level so we can
render to it using the CB.
Silences this GCC warning.
ast_to_hir.cpp: In function 'void apply_type_qualifier_to_variable(const
ast_type_qualifier*, ir_variable*, _mesa_glsl_parse_state*, YYLTYPE*)'
ast_to_hir.cpp:1768: warning: enumeration value 'ir_shader' not handled
in switch
We were working around an LLVM 2.5 bug but we're using LLVM 2.6 or later now.
This basically reverts commit baddcbc522.
This fixes the piglit bug/tri-tex-crash.c failure.
Silences these GCC warnings.
r600_shader.c: In function 'tgsi_exp':
r600_shader.c:2339: warning: 'r600_src[0].rel' is used uninitialized in this function
r600_shader.c:2339: warning: 'r600_src[0].abs' is used uninitialized in this function
r600_shader.c:2339: warning: 'r600_src[0].neg' is used uninitialized in this function
r600_shader.c:2339: warning: 'r600_src[0].chan' is used uninitialized in this function
r600_shader.c:2339: warning: 'r600_src[0].sel' is used uninitialized in this function
Previously some shader input or outputs that hadn't received location
assignments could slip through. This could happen when a shader
contained user-defined varyings and was used with either
fixed-function or assembly shaders.
See the piglit tests glsl-[fv]s-user-varying-ff and
sso-user-varying-0[12].
NOTE: this is a candidate for the 7.9 branch.
This function type checks the operands of and returns the result type of
bit-logic operations. It replaces the type checking performed in the
following cases of ast_expression::hir() :
- ast_bit_and
- ast_bit_or
- ast_bit_xor
This function type checks the operands of and returns the result type of
bit-shift operations. It replaces the type checking performed in the following
cases of ast_expression::hir() :
- ast_lshift
- ast_rshift
This fixes hangs in some Z-writes-in-shaders tests, though other
pieces don't come out correctly.
Bug #30392: hang in fbo-fblit-d24s8. (still fails with bad color drawn
to some targets)
rc_get_readers_normal() supplies a list of readers for a given
instruction. This function is now being used by the copy propagate
optimization and will eventually be used by most other optimization
passes as well.
Existing code relies on IR being generated (possibly with error type)
rather than returning NULL. So, don't break - go ahead and generate the
operation. As long as an error is flagged, things will work out.
Fixes fd.o bug #30914.
Thanks to Alex Deucher for pointing out the FLT to int conversion is necessary
and writing an initial patch, this brings about 20 piglits, and I think this
is the last piece to make evergreen and r600 equal in terms of features.
We need to keep track of three different fragment shaders: Z-only, stencil-
only, and Z+stencil. Before, we were only keeping track of the first one
we encountered.
We often use reg_null as the destination when setting up the flag
regs. However, on gen6 there aren't general implicit conversions to
destination types from src types, so the comparison to produce the
flag regs would be done on the integer result interpreted as a float.
Hilarity ensued.
Fixes 20 piglit cases.
In ir_validate::visit_leave(), the cases for
- ir_binop_bit_and
- ir_binop_bit_xor
- ir_binop_bit_or
were incorrect. It was incorrectly asserted that both operands must be the
same type, when in fact one may be scalar and the other a vector. It was also
incorrectly asserted that the resultant type was the type of the left operand,
which in fact does not hold when the left operand is a scalar and the right
operand is a vector.
Implement by adding the following cases to ast_expression::hir():
- ast_lshift
- ast_rshift
Also, implement ir validation for the new operators by adding the following
cases to ir_validate::visit_leave():
- ir_binop_lshift
- ir_binop_rshift
On evergreen, interpolation has moved into the fragment shader,
with the interpolation parmaters being passed via GPRs and LDS entries.
This works out the number of interps required and reserves GPR/LDS
storage for them, it also correctly routes face/position values which
aren't interpolated from the vertex shader.
Also if we noticed nothing is to be interpolated we always setup perspective
interpolation for one value otherwise the GPU appears to lockup.
This fixes about 15 piglit tests on evergreen.
Previously _LinkedShaders was a compact array of the linked shaders
for each shader stage. Now it is arranged such that each slot,
indexed by the MESA_SHADER_* defines, refers to a specific shader
stage. As a result, some slots will be NULL. This makes things a
little more complex in the linker, but it simplifies things in other
places.
As a side effect _NumLinkedShaders is removed.
NOTE: This may be a candidate for the 7.9 branch. If there are other
patches that get backported to 7.9 that use _LinkedShader, this patch
should be cherry picked also.
This implements round() via the ir_unop_round_even opcode, rather than
adding a new opcode. We may wish to add one in the future, since it
might enable a small performance increase on some hardware, but for now,
this should suffice.
Tested with demos/pixeltest - line rasterization doesn't seem to be
set up for GL conventions yet, but at least width is respected now.
Signed-off-by: Dave Airlie <airlied@redhat.com>
It is now to the point where we have no regressing piglit tests. It
also fixes Yo Frankie! and Humus DynamicBranching, probably due to the
piglit bias tests that work that didn't on the Mesa IR backend.
As a downside, performance takes about a 5-10% performance hit at the
moment (e.g. nexuiz 19.8fps -> 18.8fps), which I plan to resolve by
reintroducing 16-wide fragment shaders where possible. It is a win,
though, for fragment shaders using flow control.
Simply using RNDU, RNDZ, or RNDE does not produce the desired result.
Rather, the RND* instructions place a value in the destination register
that may be 1 less than the correct answer. They can also set per-channel
"increment bits" in a flag register, which, if set, mean dest needs to
be incremented by 1. A second instruction - a predicated add -
completes the job.
Notably, RNDD always produces the correct answer in a single
instruction.
Fixes piglit test glsl-fs-trunc.
This cuts usually 2 out of 3 instructions for flag reg generation (if
statements, conditional assignment) by producing the conditional mod
in the expression representing the boolean value.
Fixes glsl-fs-vec4-indexing-temp-dst-in-nested-loop-combined (register
allocation no longer fails for the conditional generation
proliferation)
This will be a place to peephole comparisions directly to the flag
regs, and for now avoids using MOV with conditional mod on gen6, which
is now illegal.
GLES1 and GLES2 install their own exec pointers and don't need the
Save table. Also, the SET_* macros use different indices for the different
APIs so the offsets used in vtxfmt.c are actually wrong for the ES APIs.
Just always check for FLUSH_UPDATE_CURRENT and call Driver.BeginVertices
when necessary. By using the unlikely() macros, this ends up as
a 10% performance improvement (for isosurf, anyway) over the old,
complicated function pointer swapping.
Don't use r0 for FF_SYNC dest reg on Sandybridge, which would
smash FFID field in GS payload, that cause later URB write fail.
Also not use r0 in any URB write requiring allocate.
Actually validate that the implementation supports the particular
shader target as well. Previously if a driver only supported vertex
shaders, for example, glCreateShaderObjectARB would gladly create a
fragment shader.
NOTE: this is a candidate for the 7.9 branch.
This really amounts to just using the return value from
link_function_calls. All the work was being done, but the result was
being ignored.
Fixes piglit test link-unresolved-funciton.
NOTE: this is a candidate for the 7.9 branch.
Together with the previous commit, this generalize the benefits of
d2cf757f44 to all depth formats, in
particular:
- simpler float -> 24unorm conversion
- avoid unsigned comparisons (not directly supported on SSE) by aligning
to the least significant bit
- avoid unecessary/repeated mask ANDing
Verified with trivial/tri-z that the exact same assembly is produced for
X8Z24.
Z32_FLOAT uses <4 x float> as intermediate/destination type,
instead of <4 x i32>.
The necessary bitcasts got removed with commit
5b7eb868fd
Also use depth/stencil type and build contexts consistently, and
make the depth pointer argument a ordinary <i8 *>, to catch this
sort of issues in the future (and also to pave way for Z16 and
Z32_FLOAT_S8_X24 support).
__GLcontextModes is always only used as an implementation internal struct
at this point and we shouldn't install glcore.h anymore. Anything that
needs __GLcontextModes should just include the struct in its headers files
directly.
This is relying on lp_build_pack2 using the sse2 pack intrinsics which
handle clamping.
(Alternatively could have make it use lp_build_packs2 but it might
not even produce more efficient code than not using the fastpath
in the first place.)
Fixes this GCC warning.
r300_state.c: In function 'r300InvalidateState':
r300_state.c:2247: warning: 'hw_format' may be used uninitialized in this function
r300_state.c:2247: note: 'hw_format' was declared here
This adds proper support for the GL_ARB_shader_stencil_export extension
to the GLSL compiler. Thanks to Ian for pointing out where I need to add things.
If the pipe driver has shader stencil export we can accelerate DrawPixels
using it. It tries to pick an S8 texture and works its way to X24S8 and S8X24
if that isn't supported.
We need a texture to put the drawpixels stuff into, an S8 texture is less
memory/bandwidth than the 32-bit X24S8, but we might not be able to render
directly to an S8, so this lets us specify we won't be rendering to this
texture.
this improves mesa texstore for 8/24 so it can create S24X8/X24S8 variants
by keeping the depth bits static.
it also adds a texstore for S8 so we can write out an S8 texture to use
in the sampler for accel draw pixels to save memory bw.
The logic seems sound here, I've worked it out a few times on paper, though
it would be good to have some review.
Signed-off-by: Dave Airlie <airlied@redhat.com>
There was a check to only do the rebase if we didn't have everything
in VBOs, but nexuiz apparently hands us a mix of VBOs and arrays,
resulting in blocking on the GPU to do a rebase.
Improves nexuiz 800x600, high-settings performance on my Ironlake 41%
(+/- 1.3%), from 14.0fps to 19.7fps.
The format selection of the CopyTexSubImage is pretty bogus still, but
this at least avoids software fallbacks in nexuiz, bringing
performance from 7.5fps to 12.8fps on my machine.
This assertion was added in commit f1c1ee11, but it did not notice
that the array is accessed with 'size-1' instead of 'size'. As a
result, the assertion was off by one. This caused failures in at
least glsl-orangebook-ch06-bump.
If an GLSL shader is used that does not provide all stages and
assembly shaders are provided for the missing stages, validate the
assembly shaders.
Fixes bugzilla #30787 and piglit tests glsl-invalid-asm0[12].
NOTE: this is a candidate for the 7.9 branch.
There was actually a large quantity of scalar code in these functions
previously. This tries to move more into intrinsics.
Introduce an sse2 mm_mullo_epi32 replacement to avoid sse4 dependency
in the new rasterization code.
It's now much more correct for gen6 than the old backend, with just 2
regressions I've found (one of which is common with pre-gen6 and will
be fixed by an array splitting IR pass).
This does leave the old Mesa IR backend getting used still when we
don't have GLSL IR, but the plan is to get GLSL IR input to the driver
for the ARB programs and fixed function by the next release.
Pre-gen6, you could mix int and float just fine. Now, you get goofy
results.
Fixes:
glsl-arb-fragment-coord-conventions
glsl-fs-fragcoord
glsl-fs-if-greater
glsl-fs-if-greater-equal
glsl-fs-if-less
glsl-fs-if-less-equal
This is a hw requirement in math args. This also is inefficient, as
we're calculating the same result 8 times, but then we've been doing
that on pre-gen6 as well. If we're doing math on uniforms, though,
we'd probably be better served by having some sort of mechanism for
precalculating those results into another uniform value to use.
Fixes 7 piglit math tests.
Add ability to set the GLSL version used by the GLcontext by setting the
environment variable INTEL_GLSL_VERSION. For example,
env INTEL_GLSL_VERSION=130 prog args
If the environment variable is missing, the GLSL versions defaults to 120.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
By calling radeon_draw_buffers (which sets the necessary flags
in radeon->NewGLState) and revalidating if NewGLState is non-zero
in r200TclPrimitive. This fixes an assert in libdrm (the color-/
depthbuffer was changed but not yet validated) and and stops the
kernel cs checker from complaining about them (when they're too
small).
Thanks to Mario Kleiner for the hint to call radeon_draw_buffer
(instead of my half-broken hack).
v2: Also fix the swtcl r200 path.
Cc: Mario Kleiner <mario.kleiner@tuebingen.mpg.de>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This didn't produce a statistically significant performance difference
in my demo (n=4) or nexuiz (n=3), but it still seems like a good idea
and is recommended by the HW team.
Having the single opcode write then read the reg meant that single
instruction opcodes had to consider their source regs to interfere
with their dest regs.
SSE support for 32bit and 16bit unsigned arithmetic is not complete, and
can easily result in inefficient code.
In most cases signed/unsigned doesn't make a difference, such as for
integer texture coordinates.
So remove uint_coord_type and uint_coord_bld to avoid inefficient
operations to sneak in the future.
We need to move the texture sampler resources out of the range of the vertex attribs.
We could probably improve this using an allocator but this is the simple answer for now.
makes mesa-demos/src/glsl/vert-tex work.
We can't patch true-block at end-if time, as there is no guarantee that
the block at the beginning of the true stanza is the same at the end of
the true stanza -- other control flow elements may have been emitted half
way the true stanza.
Although this bug surfaced recently with the commit to skip mip filtering
when lod is an integer the bug was always there, although probably it
was avoided until now: e.g., cubemap selection nests if-then-else on the
else stanza, which does not suffer from the same problem.
We've been using these in the linear path for a while now. Based on
Chris's SSSE3 code, but using only sse2 opcodes. Speed seems to be
identical, but code is simpler & removes dependency on SSE3.
Should be easier to extend to other rgba8 formats.
Specifically, can do early-depth-test even when alpahtest or
kill-pixel are active, providing we defer the actual z write until the
final mask is avaialable.
Improves demos/fire.c especially in the case where you get close to
the trees.
The current interpolation schemes causes precision loss.
Changing the operation order helps, but does not completely avoid the
problem.
The only short term solution is to clamp z to 1.0.
This is unfortunate, but probably unavoidable until interpolation is
improved.
Operate simultanouesly on <width, height, depth> vector as much as possible,
instead of doing the operations on vectors with broadcasted scalars.
Also do the 24.8 fixed point scalar with integer shift of the texture size,
for unnormalized coordinates.
AoS path only for now -- the same thing can be done for SoA.
Fixes these GCC warnings.
brw_wm_fp.c: In function 'search_or_add_const4f':
brw_wm_fp.c:92: warning: 'reg.Index2' is used uninitialized in this function
brw_wm_fp.c:84: note: 'reg.Index2' was declared here
brw_wm_fp.c:92: warning: 'reg.RelAddr2' is used uninitialized in this function
brw_wm_fp.c:84: note: 'reg.RelAddr2' was declared here
Clamp against 0 instead of -0.5, which simplifies things.
The former version would have resulted in both int coords being zero
(in case of coord being smaller than 0) and some "unused" weight value,
whereas now the int coords will be 0 and 1, but weight will be 0, hence the
lerp should produce the same value.
Still not happy about differences between normalized and non-normalized...
Haven't looked at what code this exactly generates but URem can't be fast.
Instead of using two URem only use one and replace the second one with
select/add (this is what the corresponding aos code already does).
Rearrange order of operations a bit to make some clamps easier.
All calculations should be equivalent.
Note there seems to be some inconsistency in the clamp to edge case
wrt normalized/non-normalized coords, could potentially simplify this too.
Sometimes coords are clamped to positive numbers before doing conversion
to int, or clamped to 0 afterwards, in this case can use itrunc
instead of ifloor which is easier. This is only the case for nearest
calculations unfortunately, except linear MIRROR_CLAMP_TO_EDGE which
for the same reason can use a unsigned float build context so the
ifloor_fract helper can reduce this to itrunc in the ifloor helper itself.
sse2 supports round to nearest directly (or rather, assuming default nearest
rounding mode in MXCSR). Use intrinsic to use this rather than round (sse41)
or bit manipulation whenever possible.
The constness of the function parameter gets inlined with the rest of
the function. However, there is also an assignment to the parameter.
If this occurs inside a loop the loop analysis code will get confused
by the assignment to a read-only variable.
Fixes bugzilla #30552.
NOTE: this is a candidate for the 7.9 branch.
Improves performance of my GLSL demo 14.3% (+/- 4%, n=4) by
eliminating the moves used in ir_assignment and ir_swizzle handling.
Still 16.5% to go to catch up to the Mesa IR backend, presumably
because instructions are almost perfectly mis-scheduled now.
We were trying to remap a fully-filled array down to only handing the
WM the components it uses. This is called attribute swizzling, and if
you don't enable it you just get 1:1 mappings of inputs to outputs.
This almost fixes glsl-routing, except for the highest gl_TexCoord[]
indices.
We would compute a new buffer, but never point the hardware at the new
buffer. This partially fixes glsl-routing, as now it get the updated
uniform for which attribute to draw.
Avoid multiplying fixed-point values. Calculate triangle area in
floating point use that for culling.
Lift area calculations up a level as we are already doing this in the
triangle_both() case.
Would like to share the calculated area with attribute interpolation,
but the way the code is structured makes this difficult.
interp data is stored in gpr0 so first interp overwrote it
and subsequent ones got wrong values
reserve register 0 so it's not used for attribs.
alternative is to interpolate attrib0 last (reverse, as r600c does)
We sensibly only provide it if the FS asks for it. We could actually
skip WPOS unless the FS needed WPOS.zw, but that's something for
later.
Fixes: glsl-texture2d and probably many others.
- Handle PIPE_FORMAT_Z32_FLOAT packing correctly.
- In the integer version z shouldn't be passed as as double.
- Make it clear that the integer versions should only be used for masks.
- Make integer type sizes explicit (uint32_t for now, although
uint64_t will be necessary later to encode f32_s8_x24).
The jump delta is now in the part of the instruction where the
destination fields used to be, and the src args are ignored (or not,
for the new non-predicated IF that we don't use yet).
Once a fragment is generated with LP_INTERP_PERSPECTIVE set for an input,
it will do a divide by w for that input. Therefore it's not OK to treat LP_INTERP_PERSPECTIVE as
LP_INTERP_LINEAR or vice-versa, even if the attribute is known to not
vary.
A better strategy would be to take the primitive in consideration when
generating the fragment shader key, and therefore avoid the per-fragment
perspective divide.
Got a speed up by tracking the dirty blocks in a seperate list instead of looping through all blocks. This version should work with block that get their dirty state disabled again and I added a dirty check during the flush as some blocks were already dirty.
Flush read cache before writting register. Track flushing inside
of a same cs and avoid reflushing same bo if not necessary. Allmost
properly force flush if bo rendered too and then use as a texture
in same cs (missing pipeline flush dunno if it's needed or not).
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
When we go to do a lot of bos in one draw like constant bufs we need
to avoid bouncing off the busy ioctl, this mitigates by backing off
on busy bos for a short amount of times.
These texture formats (like R16G16B16A16_UNORM) were untested until now
because st/mesa doesn't use them. I am testing this with a hacked st/mesa
here.
I am trying to be exhaustive, but still I might have missed tons of other
changes to Gallium.
(cherry picked from commit 968a9ec76e)
Conflicts:
docs/relnotes-7.9.html
The glsl core should be handling most dead code issues for us, but we
generate some things in codegen that may not get used, like the 1/w
value or pixel deltas. It seems a lot easier this way than trying to
work out up front whether we're going to use those values or not.
this code was memcmp'ing two structs, but refcounting one of them afterwards,
so any subsequent memcmp was never going to work.
again this stops unnecessary uploads of vertex program,
The brw_wm_surface_state.c handling of GL_DEPTH_TEXTURE_MODE doesn't
apply to shadow compares, which always return an intensity value. The
texture swizzles can do the job for us.
Fixes:
glsl1-shadow2D(): 1
glsl1-shadow2D(): 3
The hw swizzles have been obtained by a brute force approach,
and only C0 and C2 are stored in UV88, the other channels are
ignored.
R16G16 is going to be a lot trickier.
Luckily, one of them would result in failing out register allocation
when the other bugs were encountered. Applies to
glsl-fs-vec4-indexing-temp-dst-in-nested-loop-combined, which still
fails register allocation, but now legitimately.
Fixes several problems:
The half-float, float, and integer internal formats depend on
ARB_texture_rg and other extensions.
RG_INTEGER is not a valid internal format.
Generic compressed formats depend on ARB_texture_rg, not
EXT_texture_compression_rgtc.
Use GL_RED instead of GL_R.
We should fix the SF to actually give us just the data we need, but
this fixes regressions in the new FS until then.
Fixes:
glsl-kwin-blur
glsl-routing
Trying to track the insanity of the different argument layouts for
normal/shadow crossed with normal/lod/bias one generation at a time is
enough.
Fixes: glsl1-texture2D() with bias.
(first test passing in this code that doesn't pass without it!)
We should end up with the same code, but anyone else with this issue
could share the handling (which I got wrong for shadow comparisons in
the driver before).
I've screwed this up enough times that I don't think it's worth it.
This time, it was that I was doing it once per top-level body
instruction instead of just once at the end of the loop body.
The interference is totally bogus (maximal), so this is equivalent to
our trivial register assignment before. As in, passes the same set of
piglit tests.
Now, virtual GRFs are consecutive integers, rather than offsetting the
next one by the size. We need the size information to still be around
for real register allocation, anyway.
Fixes this GCC warning on linux-x86 build.
r3xx_vertprog.c: In function ‘ei_if’:
r3xx_vertprog.c:396: warning: ISO C90 forbids mixed declarations and code
Fixes these GCC warnings on linux-x86 build.
r500_fragprog_emit.c: In function ‘emit_paired’:
r500_fragprog_emit.c:237: warning: ISO C90 forbids mixed declarations and code
r500_fragprog_emit.c: In function ‘emit_tex’:
r500_fragprog_emit.c:367: warning: ISO C90 forbids mixed declarations and code
r500_fragprog_emit.c: In function ‘emit_flowcontrol’:
r500_fragprog_emit.c:415: warning: ISO C90 forbids mixed declarations and code
r500_fragprog_emit.c: In function ‘r500BuildFragmentProgramHwCode’:
r500_fragprog_emit.c:633: warning: ISO C90 forbids mixed declarations and code
Fixes these GCC warnings on linux-x86 build.
r500_fragprog.c: In function ‘r500_transform_IF’:
r500_fragprog.c:45: warning: ISO C90 forbids mixed declarations and code
r500_fragprog.c: In function ‘r500FragmentProgramDump’:
r500_fragprog.c:256: warning: ISO C90 forbids mixed declarations and code
Fixes these GCC warnings on linux-x86 build.
r300_fragprog_emit.c: In function ‘emit_alu’:
r300_fragprog_emit.c:143: warning: ISO C90 forbids mixed declarations and code
r300_fragprog_emit.c:156: warning: ISO C90 forbids mixed declarations and code
r300_fragprog_emit.c: In function ‘finish_node’:
r300_fragprog_emit.c:271: warning: ISO C90 forbids mixed declarations and code
r300_fragprog_emit.c: In function ‘emit_tex’:
r300_fragprog_emit.c:344: warning: ISO C90 forbids mixed declarations and code
Fixes these GCC warnings on linux-x86 build.
r300_fragprog_swizzle.c: In function ‘r300_swizzle_is_native’:
r300_fragprog_swizzle.c:120: warning: ISO C90 forbids mixed declarations and code
r300_fragprog_swizzle.c: In function ‘r300_swizzle_split’:
r300_fragprog_swizzle.c:159: warning: ISO C90 forbids mixed declarations and code
Fixes this GCC warning on linux-x86 build.
radeon_rename_regs.c: In function ‘rc_rename_regs’:
radeon_rename_regs.c:112: warning: ISO C90 forbids mixed declarations and code
Fixes this GCC warning on linux-x86 build.
radeon_remove_constants.c: In function ‘rc_remove_unused_constants’:
radeon_remove_constants.c💯 warning: ISO C90 forbids mixed declarations and code
Fixes these GCC warning on linux-x86 build.
radeon_optimize.c: In function ‘constant_folding’:
radeon_optimize.c:419: warning: ISO C90 forbids mixed declarations and code
radeon_optimize.c:425: warning: ISO C90 forbids mixed declarations and code
radeon_optimize.c:432: warning: ISO C90 forbids mixed declarations and code
Fixes these GCC warnings on linux-x86 build.
radeon_dataflow_deadcode.c: In function ‘push_branch’:
radeon_dataflow_deadcode.c:112: warning: ISO C90 forbids mixed declarations and code
radeon_dataflow_deadcode.c: In function ‘update_instruction’:
radeon_dataflow_deadcode.c:183: warning: ISO C90 forbids mixed declarations and code
radeon_dataflow_deadcode.c: In function ‘rc_dataflow_deadcode’:
radeon_dataflow_deadcode.c:352: warning: ISO C90 forbids mixed declarations and code
radeon_dataflow_deadcode.c:379: warning: ISO C90 forbids mixed declarations and code
Fixes this GCC warning on linux-x86 build.
radeon_pair_regalloc.c: In function ‘rc_pair_regalloc_inputs_only’:
radeon_pair_regalloc.c:330: warning: ISO C90 forbids mixed declarations and code
Fixes these GCC warnings on linux-x86 build.
radeon_pair_schedule.c: In function ‘emit_all_tex’:
radeon_pair_schedule.c:244: warning: ISO C90 forbids mixed declarations and code
radeon_pair_schedule.c: In function ‘destructive_merge_instructions’:
radeon_pair_schedule.c:291: warning: ISO C90 forbids mixed declarations and code
radeon_pair_schedule.c:438: warning: ISO C90 forbids mixed declarations and code
radeon_pair_schedule.c: In function ‘scan_read’:
radeon_pair_schedule.c:619: warning: ISO C90 forbids mixed declarations and code
radeon_pair_schedule.c: In function ‘scan_write’:
radeon_pair_schedule.c:645: warning: ISO C90 forbids mixed declarations and code
radeon_pair_schedule.c: In function ‘schedule_block’:
radeon_pair_schedule.c:673: warning: ISO C90 forbids mixed declarations and code
radeon_pair_schedule.c: In function ‘rc_pair_schedule’:
radeon_pair_schedule.c:730: warning: ISO C90 forbids mixed declarations and code
Fixes these GCC warnings on linux-x86 build.
radeon_pair_translate.c: In function ‘set_pair_instruction’:
radeon_pair_translate.c:153: warning: ISO C90 forbids mixed declarations and code
radeon_pair_translate.c:170: warning: ISO C90 forbids mixed declarations and code
radeon_pair_translate.c: In function ‘rc_pair_translate’:
radeon_pair_translate.c:336: warning: ISO C90 forbids mixed declarations and code
radeon_pair_translate.c:341: warning: ISO C90 forbids mixed declarations and code
Fixes these GCC warnings on linux-x86 build.
radeon_program_alu.c: In function ‘r300_transform_trig_simple’:
radeon_program_alu.c:882: warning: ISO C90 forbids mixed declarations and code
radeon_program_alu.c:932: warning: ISO C90 forbids mixed declarations and code
radeon_program_alu.c: In function ‘radeonTransformTrigScale’:
radeon_program_alu.c:996: warning: ISO C90 forbids mixed declarations and code
radeon_program_alu.c: In function ‘r300_transform_trig_scale_vertex’:
radeon_program_alu.c:1033: warning: ISO C90 forbids mixed declarations and code
Fixes this GCC warning on linux-x86 build.
radeon_emulate_loops.c: In function ‘rc_emulate_loops’:
radeon_emulate_loops.c:517: warning: ISO C90 forbids mixed declarations and code
Fixes these GCC warnings with linux-x86 build.
radeon_emulate_branches.c: In function ‘handle_if’:
radeon_emulate_branches.c:65: warning: ISO C90 forbids mixed declarations and code
radeon_emulate_branches.c:71: warning: ISO C90 forbids mixed declarations and code
radeon_emulate_branches.c: In function ‘handle_else’:
radeon_emulate_branches.c:94: warning: ISO C90 forbids mixed declarations and code
radeon_emulate_branches.c: In function ‘handle_endif’:
radeon_emulate_branches.c:201: warning: ISO C90 forbids mixed declarations and code
radeon_emulate_branches.c: In function ‘fix_output_writes’:
radeon_emulate_branches.c:267: warning: ISO C90 forbids mixed declarations and code
radeon_emulate_branches.c:284: warning: ISO C90 forbids mixed declarations and code
radeon_emulate_branches.c: In function ‘rc_emulate_branches’:
radeon_emulate_branches.c:307: warning: ISO C90 forbids mixed declarations and code
Fixes this GCC warning.
math/m_debug_xform.c: In function '_math_test_all_transform_functions':
math/m_debug_xform.c:320: warning: format not a string literal and no format arguments
Fixes this GCC warning.
math/m_debug_norm.c: In function '_math_test_all_normal_transform_functions':
math/m_debug_norm.c:365: warning: format not a string literal and no format arguments
Fixes this GCC warning.
math/m_debug_clip.c: In function '_math_test_all_cliptest_functions':
math/m_debug_clip.c:363: warning: format not a string literal and no format arguments
Instead of creating group of register use a hash table
to lookup into which block each register belongs. This
simplify code a bit.
Signed-off-by: Jerome Glisse <jglisse@redhat.com
Fixes glean pbo crash.
It would be possible to avoid crashing without decoupling, but given
that state trackers give no guarantee that number of views is consistent,
that would likely cause too many state updates (or miss some).
Corrections in store_clip to store clip coordinates in AoS form.
Viewport & cliptest flag options based on variant key.
Put back draw_pt_post_vs and now 2 paths based on whether clipping
occurs or not.
Cliptesting now done at the end of vs in draw_llvm instead of
draw_pt_post_vs.
Added viewport mapping transformation and further cliptesting to
vertex shader in draw_llvm.c
Alternative path where vertex header setup, clip coordinates store,
cliptesting and viewport mapping are done earlier in the vertex
shader.
Still need to hook this up properly according to the return value of
"draw_llvm_shader" function.
I'm still not pleased with how builtin uniforms are handled, but as
long as we're relying on the prog_statevar stuff this seems about as
good as it'll get.
Otherwise, we'll often end up with gl_TexCoord being 0 length, for
example. With ir_to_mesa, things ended up working out anyway, as long
as multiple implicitly-sized arrays weren't involved.
Fixes these tests using gl_FragData or just gl_FragDepth:
glsl1-Preprocessor test (extension test 1)
glsl1-Preprocessor test (extension test 2)
glsl-bug-22603
We don't set the type on the array virtual reg as a whole, so here's
the right place.
Fixes:
glsl1-GLSL 1.20 arrays
glsl1-temp array with constant indexing, fragment shader
glsl1-temp array with swizzled variable indexing
We need to re-run channel expressions afterwards as it generates new
vector expressions, and we need to successfully support conditional
assignment (brw_CMP takes 2 operands, not 1).
While much of this we will want to support natively, this should make
the task of reaching the Mesa IR backend's quality easier.
Fixes:
glsl-fs-main-return.
New design seems to be on parity according to piglit,
make it default to get more exposure and see if there
is any show stopper in the coming days.
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Sandybridge's PS constant buffer payload size is decided from
push const buffer command, incorrect size would cause wrong data
in payload for position and vertex attributes. This fixes coefficients
for tex2d/tex3d.
The driver actually creates a 3D texture aligned to POT and does all
the magic with texture coordinates in the fragment shader. It first
emulates REPEAT and MIRRORED wrap modes in the fragment shader to get
the coordinates into the range [0, 1]. (already done for 2D NPOT)
Then it scales them to get the coordinates of the NPOT subtexture.
NPOT textures are now less of a lie and we can at least display
something meaningful even for the 3D ones.
Supported wrap modes:
- REPEAT
- MIRRORED_REPEAT
- CLAMP_TO_EDGE (NEAREST filtering only)
- MIRROR_CLAMP_TO_EDGE (NEAREST filtering only)
- The behavior of other CLAMP modes is undefined on borders, but they usually
give results very close to CLAMP_TO_EDGE with mirroring working perfectly.
This fixes:
- piglit/fbo-3d
- piglit/tex3d-npot
Taking the W component from coords directly ignores swizzling. Instead,
take the component which is mapped to W in the TEX instruction parameter.
The same for Z.
NOTE: This is a candidate for the 7.9 branch.
Some random stuff I had here.
1) Fixed some misleading comments.
2) Removed fake_npot, since it's redundant.
3) lower_texture_rect -> scale_texcoords
4) Reordered and reindented some TEX transform code.
It's trying to get an int smeared across all channels, not trying to
get a 1:1 mapping of a subset of a vector's channels. This usually
ended up not mattering with ir_to_mesa, since it just smears floats
into every chan of a vec4.
Fixes:
glsl1-temp array with swizzled variable indexing
The pipe_sampler_view's swizzle terms also apply to the texture border
color. Simply move the apply_sampler_swizzle() call after we fetch
the border color.
Fixes many piglit texwrap failures.
Changes in v2:
- Invalidate last_tile_addr on any change, fixing regressions
- Correct coding style
Currently softpipe ends up allocating more than 200 MB of memory
for each context due to the tile caches.
Even worse, this memory is all explicitly cleared, which means that the
kernel must actually back it with physical RAM right away.
This change allocates tile memory on demand.
Signed-off-by: Brian Paul <brianp@vmware.com>
Build packet header once and allow to add fake register support so
we can handle things like indexed set of register (evergreen sampler
border registers for instance.
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Commit 80ee3a440c added a PROGS_DEPS
definition, but no uses, even though it seems clearly intended
to be a set of additional dependencies for $(PROGS).
Correct this.
Currently makedepend is used by the Mesa Makefile-based build system,
but not required.
Unfortunately, not having it makes dependency resolution non-existent,
which is a source of subtle bugs, and is a rarely tested
configuration, since all Mesa developers likely have it installed.
Furthermore some idioms require dependency resolution to work at all,
such as making headers depend on generated files.
Fixes these GCC warnings.
r600_shader.c: In function 'tgsi_tex':
r600_shader.c:1611: warning: 'src2_chan' may be used uninitialized in this function
r600_shader.c:1611: warning: 'src_chan' may be used uninitialized in this function
When occlusion query are running we want to have accurate
fragment count thus disable any early culling optimization
GPU has.
Based on work from Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Fixes this GCC warning.
radeon_state.c: In function 'radeon_state_fini':
radeon_state.c:140: warning: 'return' with a value, in function returning void
Move GB_ENABLE to derived rs state, and find sprite coord for the correct
generic and enable the tex coord for that generic.
Signed-off-by: Dave Airlie <airlied@redhat.com>
1. We can't turn an instruction into a presubtract operation if it
writes to one of the registers it reads from.
2. If we turn an instruction into a presubtract operation, we can't
remove that intruction unless all readers can use the presubtract
operation.
This fixes fdo bug 30337.
This is a candidate for the 7.9 branch.
The variables are used only in currently disabled code.
Fixes this GCC warning.
r600_context.c: In function 'r600_flush':
r600_context.c:76: warning: unused variable 'dname'
r600_context.c:75: warning: unused variable 'dc'
this is more a proof to show vector shifts on x86 with per-element shift count
are evil. Since we can avoid the shift with a single compare/select, use that
instead. Replaces more than 20 instructions (and slow ones at that) with about 3,
and cuts compiled shader size with mesa's yuvsqure demo by over 10%
(no performance measurements done - but selection is blazing fast).
Might want to revisit that for future cpus - unfortunately AVX won't have vector
shifts neither, but AMD's XOP will, but even in that case using selection here
is probably not slower.
While it's true that llvm can and will indeed replace this with bit
arithmetic (since block height/width is POT), it does so (llvm 2.7) by element
and hence extracts/shifts/reinserts each element individually.
This costs about 16 instructions (and extract is not really fast) vs. 1...
Fixes this GCC warning.
r600_hw_states.c: In function 'r600_translate_fill':
r600_state_inlines.h:136: warning: control reaches end of non-void function
The variables are only used in currently disabled code.
Fixes this GCC warning.
r600_state2.c: In function 'r600_flush2':
r600_state2.c:613: warning: unused variable 'dname'
r600_state2.c:612: warning: unused variable 'dc'
Fix this GCC warning.
intel_pixel_bitmap.c: In function 'intelBitmap':
intel_pixel_bitmap.c:343: warning: implicit declaration of function '_mesa_meta_Bitmap'
Silence this GCC warning.
r300_state_derived.c: In function 'r300_update_derived_state':
r300_state_derived.c:578: warning: 'r' may be used uninitialized in this function
r300_state_derived.c:578: note: 'r' was declared here
Up to 2010-09-19:
r600g: fix tiling support for ddx supplied buffers
9b146eae25
user buffer seems to be broken... new to fix that.
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Before, changing any of these sampler values triggered generation
of new JIT code. Added a new flag for the special case of
min_lod == max_lod which is hit during auto mipmap generation.
TX_BORDER_COLOR should be formatted according to the texture format.
Also the interaction with ARB_texture_swizzle should be fixed too.
NOTE: This is a candidate for the 7.9 branch.
This commit fixes an infinite loop in foreach_s if remove_from_list is used
more than once on the same item with other list operations in between.
NOTE: This is a candidate for the 7.9 branch because the commit
"r300g: fixup long-lived BO maps being incorrectly unmapped when flushing"
depends on it.
This is already covered by radeon_mipmap_tree.c, and my convolution
cleanups broke in the presence of this code. Thanks to Marek Olšák
for tracking down the relevant miptree code for me.
Make libr300compiler.a a PHONY target so that this library will always be
built. This fixes the problem of libr300compiler.a not being updated
when r300g is being built and r300c is not.
This is a candidate for the Mesa 7.9 branch.
Many of the EXT_ extensions in the subset have significant code
overhead with no users. It is not a required part of GL -- though
text describing the extension is part of the core spec since 1.2, it
is always conditional on the ARB_imaging extension.
Some pathological triangles cause a theoritically impossible number of
clipped vertices.
The clipper will still assert, but at least release builds will not
crash, while this problem is further investigated.
use the blitter + custom stage to avoid doing a whole lot of state
setup by hand. This makes life a lot easier for doing this on evergreen
it also keeps all the state setup in one place.
We setup a custom context state at the start with a flag to denote
its for the flush, when it gets generated we generate the correct state
for the flush and no longer have to do it all by hand.
this should also make adding texture *to* depth easier.
It turns out that most people new to this IR are surprised when an
assignment to (say) 3 components on the LHS takes 4 components on the
RHS. It also makes for quite strange IR output:
(assign (constant bool (1)) (x) (var_ref color) (swiz x (var_ref v) ))
(assign (constant bool (1)) (y) (var_ref color) (swiz yy (var_ref v) ))
(assign (constant bool (1)) (z) (var_ref color) (swiz zzz (var_ref v) ))
But even worse, even we get it wrong, as shown by this line of our
current step(float, vec4):
(assign (constant bool (1)) (w)
(var_ref t)
(expression float b2f (expression bool >=
(swiz w (var_ref x))(var_ref edge))))
where we try to assign a float to the writemasked-out x channel and
don't supply anything for the actual w channel we're writing. Drivers
right now just get lucky since ir_to_mesa spams the float value across
all the source channels of a vec4.
Instead, the RHS will now have a number of components equal to the
number of components actually being written. Hopefully this confuses
everyone less, and it also makes codegen for a scalar target simpler.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
We can't expect to have a context when this is called, and we don't need one
so just require a __DRIscreen instead.
Reported by Yu Dai <yu.dai@intel.com>
Shader rebuild should be more clever, we should store along each
shader all the value that change shader program rather than using
flags in context (ie change sequence like : change vs buffer, draw,
change vs buffer, switch shader will trigger useless shader rebuild).
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Normally the Mesa state tracker uses TXP instructions for texturing.
But if a fragment shader uses texture2D() that's a TEX instruction.
In that case we were incorrectly computing the texcoord coefficients
in the point sprite setup code. Some new comments in the code explain
things.
Fix commit e7087175f8. Move the reference to
GL_VERSION_2_1_functions to intel_extensions.c where it's available,
don't try to enable a non-existing extension and advertise 1.20 for all
intel chipsets, not just GEN4 and up.
progs requires winsys, which hasn't yet been built by the time we
go into state_trackers.
It may be a good idea to also move it into tests.
After a normal build, run make in src/gallium/state_trackers/d3d1x/progs
to build them.
The Gallium EGL state tracker reuses dri2.c but not the GLX code.
Currently there is a bit of code in dri2.c that is incorrectly tied
to GLX: instead, make it call an helper that both GLX and Gallium EGL
implement, like dri2InvalidateBuffers.
This avoids a link error complaining that dri2GetGlxDrawableFromXDrawableId
is undefined.
Note that we might want to move the whole event translation elsewhere,
and probably stop using non-XCB DRI2 altogether, but this seems to be
the minimal fix.
This may just be hiding some other bug though, since the types are supposed
to be the same (and it compiles for me).
Anyway, this interface will likely need to changed, since it seems Wine needs
a more powerful one capable of expressing window subregions and called at
every Present.
add cb/db flush states to the blit code.
add support for the rv6xx that need special treatment.
according to R6xx_7xx_3D.pdf
set r700 CB_SHADER_CONTROL reg in blit code
docs say dual export should be disabled for DB->CB
Instead of using the invalid GL_ARB_shading_language_120 extension to
determine the GLSL version, use a new ctx->Const.GLSLVersion field.
Updated the intel and r600 drivers, but untested.
See fd.o bug 29910
NOTE: This is a candidate for the 7.9 branch (but let's wait and see if
there's any regressions).
The old code didn't really make sense. We only need to compare the
X channel of the texture (depth) against the texcoord.
For (bi)linear sampling we should move the calls to this function
and compute the final result as (s1+s2+s3+s4) * 0.25. Someday.
This fixes the glean glsl1 shadow2D() tests. See fd.o bug 29307.
There was some libstdc++-specific code that would only build with GCC 4.5
Now it should be much more compatible, at the price of reimplementing
the generic hash function.
Looks like the problem was we weren't passing the depth to the render
target as expected, so the chip would wedge. Fixes GPU hang in
occlusion-query-discard.
Bug #30097
Recent Wine versions provide a d3d11shader.h, which is however empty
and was getting used instead of our non-empty one.
Correct the include path order to fix this.
This is a new implementation of the Direct3D 11 COM API for Gallium.
Direct3D 10 and 10.1 implementations are also provided, which are
automatically generated with s/D3D11/D3D10/g plus a bunch of #ifs.
While this is an initial version, most of the code is there (limited
to what Gallium can express), and tri, gears and texturing demos
are working.
The primary goal is to realize Gallium's promise of multiple API
support, and provide an API that can be easily implemented with just
a very thin wrapper over Gallium, instead of the enormous amount of
complex code needed for OpenGL.
The secondary goal is to run Windows Direct3D 10/11 games on Linux
using Wine.
Wine dlls are currently not provided, but adding them should be
quite easy.
Fglrx and nvidia drivers can also be supported by writing a Gallium
driver that talks to them using OpenGL, which is a relatively easy
task.
Thanks to the great design of Direct3D 10/11 and closeness to Gallium,
this approach should not result in detectable overhead, and is the
most maintainable way to do it, providing a path to switch to the
open Gallium drivers once they are on par with the proprietary ones.
Currently Wine has a very limited Direct3D 10 implementation, and
completely lacks a Direct3D 11 implementation.
Note that Direct3D 10/11 are completely different from Direct3D 9
and earlier, and thus warrant a fully separate implementation.
The third goal is to provide a superior alternative to OpenGL for
graphics programming on non-Windows systems, particularly Linux
and other free and open systems.
Thanks to a very clean and well-though design done from scratch,
the Direct3D 10/11 APIs are vastly better than OpenGL and can be
supported with orders of magnitude less code and development time,
as you can see by comparing the lines of code of this commit and
those in the existing Mesa OpenGL implementation.
This would have been true for the Longs Peak proposal as well, but
unfortunately it was abandoned by Khronos, leaving the OpenGL
ecosystem without a graphics API with a modern design.
A binding of Direct3D 10/11 to EGL would solve this issue in the
most economical way possible, and this would be great to provide
in Mesa, since DXGI, the API used to bind Direct3D 10/11 to Windows,
is a bit suboptimal, especially on non-Windows platforms.
Finally, a mature Direct3D 10/11 implementation is intrinsically going
to be faster and more reliable than an OpenGL implementation, thanks
to the dramatically smaller API and the segregation of all nontrivial
work to object creation that the application must perform ahead of
time.
Currently, this commit contains:
- Independently created headers for Direct3D 10, 10.1, 11 and DXGI 1.1,
partially based on the existing Wine headers for D3D10 and DXGI 1.0
- A parser for Direct3D 10/11 DXBC and TokenizedProgramFormat (TPF)
- A shader translator from TokenizedProgramFormat to TGSI
- Implementation of the Direct3D 11 core interfaces
- Automatically generated implementation of Direct3D 10 and 10.1
- Implementation of DXGI using the "native" framework of the EGL st
- Demos, usable either on Windows or on this implementation
- d3d11tri, a clone of tri
- d3d11tex, a (multi)texturing demo
- d3d11gears, an improved version of glxgears
- d3d11spikysphere, a D3D11 tessellation demo (currently Windows-only)
- A downloader for the Microsoft HLSL compiler, needed to recompile
the shaders (compiled shader bytecode is also included)
To compile this, configure at least with these options:
--with-state-trackers=egl,d3d1x --with-egl-platforms=x11
plus some gallium drivers (such as softpipe with --enable-gallium-swrast)
The Wine headers (usually from a wine-dev or wine-devel package) must
be installed.
Only x86-32 has been tested.
You may need to run "make" in the subdirectories of src/gallium/winsys/sw
and you may need to manually run "sudo make install" in
src/gallium/targets/egl
To test it, run the demos in the "progs" directory.
Windows binaries are included to find out how demos should work, and to
test Wine integration when it will be done.
Enjoy, and let me know if you manage to compile and run this, or
which issues you are facing if not.
Using softpipe is recommended for now, and your mileage with hardware
drivers may vary.
However, getting this to work on hardware drivers is also obviously very
important.
Note that currently llvmpipe is buggy and causes all 3 gears to be
drawn with the same color.
Use export GALLIUM_DRIVER=softpipe to avoid this.
Thanks to all the Gallium contributors and especially the VMware
team, whose work made it possible to implement Direct3D 10/11 much
more easily than it would have been otherwise.
Use rc_pair_ prefix for all pair instruction structs
Create a named struct for pair instruction args
Replace structs radeon_pair_instruction_{rgb,alpha} with struct
radeon_pair_sub_instruction. These two structs were nearly identical
and were creating a lot of cut and paste code. These changes are the
first step towards removing some of that code.
This allow to share code path btw old & new, also
remove check on reference this might make things
a little slower but new design doesn't use reference
stuff.
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
If we are not going to write to the X or Y components of the destination
vector we also don't need to prepare to compute SIN or COS.
Signed-off-by: Tilman Sauerbeck <tilman@code-monkey.de>
When ir_binop_all_equal and ir_binop_any_nequal were introduced, the
meaning of these two opcodes changed to return vectors rather than a
single scalar, but the constant expression handling code was incorrectly
written and only worked for scalars. As a result, only the first
component of the returned vector would be properly initialized.
This reverts commit c32bac57ed
and silences the warning differently.
The _mesa_reference_texobj() function is called quite a bit and
we don't want to call valid_texture_object() all the time in non-
debug builds.
Count int and float constants independently. Since there are only
few i# constants available and hundreds of c# constants, it would
be too easy to end up with an i# declaration out of its range.
Pixel shaders do not have address registers a#, only one
loop register aL. Our only hope is to assume the address
register is in fact a loop counter and replace it with aL.
Do not translate ARL instruction for pixel shaders -- MOVA
instruction is only valid for vertex saders.
Make it more explicit relative addressing of inputs is only valid
for pixel shaders and constants for vertex shaders.
we were leaking buffers since the flush code was added, it wasn't dropping references.
move setting up flush to the set_framebuffer_state.
clean up the flush state object.
make more space in the BOs array for flushing.
This reverts commit a1d9a58b82.
Flushing the upload buffers on draw is wrong, uploads aren't supposed to
cause flushes in the first place. The real issue was
radeon_bo_pb_map_internal() not respecting PB_USAGE_UNSYNCHRONIZED.
Depth-only and stencil-only clears should mask out depth/stencil from the
output, mask out stencil/input from input, and OR or ADD them together.
However, due to a typo they were being ANDed, resulting in zeroing the buffer.
If a upload buffer is used by a previous draw that's still in the CS,
accessing it would need a context flush. However, doing a context flush when
mapping the upload buffer would then flush/destroy the same buffer we're trying
to map there. Flushing the upload buffers before a draw avoids both the CS
flush and the upload buffer going away while it's being used. Note that
u_upload_data() could e.g. use a pool of buffers instead of allocating new
ones all the time if that turns out to be a significant issue.
Silences this GCC warning.
nvfx_vertprog.c: In function 'nvfx_vertprog_translate':
nvfx_vertprog.c:998: warning: assignment discards qualifiers from pointer target type
Those have the callee field set to the null pointer, so
calling the public constructor will segfault.
Signed-off-by: Tilman Sauerbeck <tilman@code-monkey.de>
Fixes this GCC warning.
lower_variable_index_to_cond_assign.cpp:
In member function
'bool variable_index_to_cond_assign_visitor::needs_lowering(ir_dereference_array*) const':
lower_variable_index_to_cond_assign.cpp:261:
warning: control reaches end of non-void function
This diagram shows the rendering pipeline with an emphasis on
the inputs/outputs for each stage. Some stages emit new vertex
attributes and others consume some attributes.
Implement the pipe_rasterizer_state::sprite_coord_enable field
in the draw module (and softpipe) according to what's specified
in the documentation.
The draw module can now add any number of extra vertex attributes
to a post-transformed vertex and generate texcoords for those
attributes per sprite_coord_enable. Auto-generated texcoords
for sprites only worked for one texcoord unit before.
The frag shader gl_PointCoord input is now implemented like any
other generic/texcoord attribute.
The draw module now needs to be informed about fragment shaders
since we need to look at the fragment shader's inputs to know
which ones need auto-generated texcoords.
Only softpipe has been updated so far.
Fixes this GCC warning.
r600_state2.c: In function 'r600_context_flush':
r600_state2.c:946: error: implicit declaration of function 'drmCommandWriteRead'
Winsys context build a list of register block a register block is
a set of consecutive register that will be emited together in the
same pm4 packet (the various r600_block* are there to provide basic
grouping that try to take advantage of states that are linked together)
Some consecutive register are emited each in a different block,
for instance the various cb[0-7]_base. At winsys context creation,
the list of block is created & an index into the list of block. So
to find into which block a register is in you simply use the register
offset and lookup the block index. Block are grouped together into
group which are the various pkt3 group of config, context, resource,
Pipe state build a list of register each state want to modify,
beside register value it also give a register mask so only subpart
of a register can be updated by a given pipe state (the oring is
in the winsys) There is no prebuild register list or define for
each pipe state. Once pipe state are built they are bound to
the winsys context.
Each of this functions will go through the list of register and
will find into which block each reg falls and will update the
value of the block with proper masking (vs/ps resource/constant
are specialized variant with somewhat limited capabilities).
Each block modified by r600_context_pipe_state_set* is marked as
dirty and we update a count of dwords needed to emit all dirty
state so far.
r600_context_pipe_state_set* should be call only when pipe context
change some of the state (thus when pipe bind state or set state)
Then to draw primitive you make a call to r600_context_draw
void r600_context_draw(struct r600_context *ctx, struct r600_draw *draw)
It will check if there is enough dwords in current cs buffer and
if not will flush. Once there is enough room it will copy packet
from dirty block and then add the draw packet3 to initiate the draw.
The flush will send the current cs, reset the count of dwords to
0 and remark all states that are enabled as dirty and recompute
the number of dwords needed to send the current context.
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Currenly GLSL happily generates indirect addressing of any kind of
arrays.
Unfortunately DirectX 9 GPUs are not guaranteed to support any of them in
general.
This pass fixes that by lowering such constructs to a binary search on the
values, followed at the end by vectorized generation of equality masks, and
4 conditional assignments for each mask generation.
Note that this requires the ir_binop_equal change so that we can emit SEQ
to generate the boolean masks.
Unfortunately, ir_structure_splitting is too dumb to turn the resulting
constant array references to individual variables, so this will need to
be added too before this pass can actually be effective for temps.
Several patches in the glsl2-lower-variable-indexing were squashed
into this commit. These patches fix bugs in Luca's original
implementation, and the individual patches can be seen in that branch.
This was done to aid bisecting in the future.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
this adds the bo caching layer and uses it for vertex/index/constant bos.
ctx needs to take references on hw bos so the flushing works okay, also
needs to flush the maps.
This function is executed outside _mesa_meta_begin/end(), that means
that e.g. _mesa_meta_Bitmap() clobbers the texturing state because it
changes the currently active texture object.
There's no need to bind the new texture when it's created, it's done
again later anyway (from setup_drawpix/copypix_texture()).
Signed-off-by: Brian Paul <brianp@vmware.com>
Allows disabling various operations (mainly texture-related, but
will grow) to try & identify bottlenecks.
Unlike LP_DEBUG, this is active even in release builds - which is
necessary for performance investigation.
The current context should be notified when the the front/back buffers
of the current drawable are swapped. The notification was skipped when
xmesa_strict_invalidate is false (the default).
This fixes fdo bug #29774.
Enable some extensions now that the needed tokens are defined in
GLES/glext.h and GLES2/glext.h. Update the prototype of MultiDrawArrays
now that the prototype of _mesa_MultiDrawArraysEXT has been updated.
The shadow versions of the texture targets use an extra component
(Z) to express distance from light source to the fragment.
Fixes the shadowtex demo with llvmpipe.
Execute before folding loads, because we don't check if it's legal
in lower_mods.
Ensure that a value's insn pointer is updated when transferring it
to a different instruction.
Fix the following GCC warning.
loop_controls.cpp: In function 'int calculate_iterations(ir_rvalue*, ir_rvalue*, ir_rvalue*, ir_expression_operation)':
loop_controls.cpp:88: warning: format not a string literal and no format arguments
This fixes a DRM deadlock in the cubestorm xscreensaver, because somehow
there must not be 2 different BOs relocated in one CS if both BOs back
the same handle. I was told it is impossible to happen, but apparently
it is not, or there is something else wrong.
Since atm our OPs aren't typed but instead values are, we need to
take care if they're used as different types (e.g. a load makes a
value u32 by default).
Maybe this should be changed (also to match TGSI), but it should
work as well if done properly.
At some point we'll want to support real subroutines instead of
just inlining them into the main shader.
Since recursive calls are forbidden, we can just save all used
registers to a fixed local memory region and restore them on a
return, no need for a stack pointer.
We only uploaded up to the highest offset a program would use,
and if the constant buffer isn't changed when a new program is
used, the new program is missing the rest of them.
Might want to introduce a "fill state" for user mem constbufs.
Known issues in the ARB_color_buffer_float implementation:
- Rendering to multiple render targets, some fixed-point, some floating-point, with FIXED_ONLY fragment clamping and polygon smooth enabled may write incorrect values to the fixed point buffers (depends on spec interpretation)
- For fragment programs with ARB_fog_* options, colors are clamped before fog application regardless of the fragment clamping setting (this depends on spec interpretation)
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.