We delay the null check only to jump through hoops to work around that.
Check early to make our lives easier.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Rob Clark <robclark@freedesktop.org>
As of earlier all the targets use the non inline version. Don't forget
to remove the function prototypes/declarations.
v2: rebase on top of virgl support.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Rob Clark <robclark@freedesktop.org>
Covert DRI to use only the pipe-loader interface.
With drisw_create_screen and kms_swrast_create_screen replaced by their
pipe-loader equivalent, we can now drop them.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Rob Clark <robclark@freedesktop.org>
We cannot use this C99 feature here quite yet, as the code needs to be
build with MSVC prior to 2013.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Rob Clark <robclark@freedesktop.org>
Unlike the inline ones, here we'd want to have an extern definition of
the functions. This is required as with follow-up commits, we'll
gradually start using the static pipe-loader, with the latter needing
the symbols.
These are direct copy from the inline version.
v2:
- rebase on top of virgl support
- add "driver missing" printfs (Nicolai)
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Rob Clark <robclark@freedesktop.org>
Rather than having all targets include the file, with only some defining
the relevant guard macro, just move things where they are used.
v2: rebase on top of virgl support.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Rob Clark <robclark@freedesktop.org>
As of last few commits we have a static and dynamic pipe-loader. Either
of which will be used with (almost) all targets..
We can look into allowing the user to select which way the targets are
built, be that 'static for all' or 'per target' in follow up commits.
After which we can look into building only the static or dynamic
version, although building both shouldn't cause any issues.
Hack/workaround alert:
Control the standalone pipe-drivers via HAVE_CLOVER. Will need to be
fixed as the targets are converted/configure knobs are in.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Rob Clark <robclark@freedesktop.org>
Analogous to previous commit with a small catch.
As the sw inline helpers are mere wrappers, and the screen <> winsys
split is more prominent (with the latter not being part of the final
pipe-driver), things will just work.
v2: rebase on top of earlier 'consolitate teardown' changes
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Rob Clark <robclark@freedesktop.org>
Add a list of driver descriptors and select one from the list, during
probe time.
As we'll need to have all the driver pipe_foo_screen_create() functions
provided externally (i.e. from another static lib) we need a separate
(non-inline) drm_helper, which contains the function declarations.
v2: rebase on top of virgl support.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Rob Clark <robclark@freedesktop.org>
It is to be used in contrast of the dynamic one. The state-tracker does
not need to know if the pipe-driver is built into the final blob or
a separate object. This will allow us to move the logic to the final
step (in target) where the appropriate pipe-loader will be chosen.
Cc: Tom Stellard <thomas.stellard@amd.com>
Cc: Francisco Jerez <currojerez@riseup.net>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Rob Clark <robclark@freedesktop.org>
With the next commits we'll introduce a 'static' version, which will
essentially load the statically linked-in pipe-drivers, rather than the
standalone pipe-$foo.so ones.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Rob Clark <robclark@freedesktop.org>
Rather than giving false hopes that things might work, just check at
probe time. This allows us to remove the duplication and consolidate
the code wrt the upcomming static pipe-loader.
Cc: Tom Stellard <thomas.stellard@amd.com>
Cc: Francisco Jerez <currojerez@riseup.net>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Rob Clark <robclark@freedesktop.org>
i.e. plug some (hard to hit) memory leaks.
v2: fix rebase fallout - really teardown the winsys (Brian)
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Rob Clark <robclark@freedesktop.org>
Move the winsys into the pipe-target, similar to the hardware
pipe-driver.
v2:
- move int declaration outside of loop (Brian)
- fold the teardown into a goto + separate function.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Rob Clark <robclark@freedesktop.org>
Currently the location is determined at configure/build time and
consistently copied across gallium. Just remove the extra argument, and
use PIPE_SEARCH_DIR where appropriate.
This will allow us to remove the duplication in the *configuration and
*screen_create APIs by moving util_dl_get_proc_address() and friends to
probe time.
v2: rebase on top of vl_winsys_drm.c addition
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Rob Clark <robclark@freedesktop.org>
Since the up-streaming of nine, the static target was used by default.
The dynamic pipe-drivers being available only via manual tweak of
configure.ac.
As we'll be removing the library_path argument from the pipe-loader with
follow-up commits, we can remove D3D9_DRIVERS_PATH/D3D9_DRIVERS_DIR.
Everyone doing local hacking on nine, or wishing to have a env override
can bring them back within the pipe-loader.
Cc: Axel Davy <axel.davy@ens.fr>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Rob Clark <robclark@freedesktop.org>
... in favour of HAVE_LIBDRM. After all we solely want to build the code
when the latter is available.
In the not too distant future we will remove the libudev/sysfs
dependency and simplify configure.ac even further.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Rob Clark <robclark@freedesktop.org>
They serve little to no purpose, as we don't need any additional
dependencies (headers and/or symbols). On the other hand dropping them
will allow us to use GALLIUM_PIPE_LOADER_DEFINES in only one single
place - the pipe-loader.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Rob Clark <robclark@freedesktop.org>
Currently every target makes sure that the screen is non-null prior to
using the debug (trace including) wrappers. If that no longer holds true
we want to know and fix this ASAP rather than silently bailing out.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Rob Clark <robclark@freedesktop.org>
nir_locals, nir_ssa_values, and nir_system_values are all dst_reg (not
that that makes a whole lot of sense to me), and only nir_inputs is a
src_reg.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
In most cases (when the negate is copy propagated and the MOV removed),
this is two instructions on Gen >= 8 and only two instructions on
earlier platforms -- and it doesn't use the flag register.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
a4xx hardware has real support for RGTC so there's no need to fake it
like we do on a3xx. Undo the hacks, and keep track of an "internal
format" of a resource, which on a3xx will be different, triggering the
transfer-time conversions to take place.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Looks like a4xx hw does this in a more standard way and we don't need to
hack around it like we do on a3xx. Fixes GL_ALPHA formats in
fbo-blending-formats, fbo-colormask-formats, and fbo-alphatest-formats.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
Instead of playing the guessing game as to which texture format reads
from which border color encoding type, just write both of them always.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
There are not native RGBX render formats, so we must manually force
dst_alpha to be one, same as for a3xx.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
The point of prepare_buffer is to ensure that the query buffer contains valid
initial data for conditional rendering: as long as the buffer is initialized
correctly, the GPU is able to tell whether query results have been written
already (and wait or fall back to unconditional rendering if desired).
This means prepare_buffer needs to be called again when a buffer is reused.
Conversely, for queries that cannot be used for conditional rendering
(notably pipeline statistics), we can re-use buffers immediately, and they
do not need to be initialized.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Andy Furniss <adf.lists@gmail.com>
To graph the number of bytes uploaded to GPU per frame (vertex buffer data,
constant buffer data, texture data, etc).
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
ARB_explicit_uniform_location allows the index for subroutine functions
to be explicitly set in the shader.
This patch reduces the restriction on the index qualifier in
validate_layout_qualifiers() to allow it to be applied to subroutines
and adds the new subroutine qualifier validation to ast_function::hir().
ast_fully_specified_type::has_qualifiers() is updated to allow the
index qualifier on subroutine functions when explicit uniform locations
is available.
A new check is added to ast_type_qualifier::merge_qualifier() to stop
multiple function qualifiers from being defied, before this patch this
would cause a segfault.
Finally a new variable is added to ir_function_signature to store the
index. This value is validated and the non explicit values assigned in
link_assign_subroutine_types().
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
This patch replaces the old interger constant qualifiers with either
the new ast_layout_expression type if the qualifier requires merging
or ast_expression if the qualifier can't have mulitple declarations
or if all but the newest qualifier is simply ignored.
We also update the process_qualifier_constant() helper to be
similar to the one in the ast_layout_expression class, but in
this case it will be used to process the ast_expression qualifiers.
Global shader layout qualifier validation is moved out of the parser
in this change as we now need to evaluate any constant expression
before doing the validation.
V2: Fix minimum value check for vertices (Emil)
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
In this patch we introduce a new ast type for holding the new
compile-time constant expressions. The main reason for this is that
we can no longer do merging of layout qualifiers before they have been
converted into GLSL IR so we need to store them to be proccessed later.
The new type has two helper functions:
- process_qualifier_constant()
Used to merge and then evaluate qualifier expressions
- merge_qualifier()
Simply appends a qualifier to a list to be merged later by
process_qualifier_constant()
In order to avoid cascading error messages the process_qualifier_constant()
helpers return a bool
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
This will allow us to add error checking to this function
in a later patch, if we don't move it the error messages
will go missing.
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
This change moves the binding layout handing code into an apply
function to be consistent with other helper functions in the ast
code, and to encapsulate the code so that when we introduce
compile time constants the code will be much cleaner.
One small downside is for unnamed interface blocks we will now
be revalidating the binding for each member its applied to.
However this seems a small sacrifice in order to have code which
is readable.
We also remove the incorrect comment in the named interface code
about propagating bindings to members which seems to have been
copied from the unnamed interface code.
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
This validation is moved later so we can validate the
max value when compile time constant support is added in a
later patch.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
We are moving this out of the parser in preparation for compile
time constant support.
The reason a validation function is used rather than an apply
function like what is used with bindings is because glsl allows
streams to be defined on members of blocks even though they must
match the stream thats associated with the current block, this
means we need access to the value after validation to do this
comparision.
V2: Fix typo in comment (Emil)
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Use new helper that will in a later patch allow for
compile time constants.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
The minimum value for index is validated in apply_explicit_location()
and we want to remove validation from the parser so we can add
compile time constant support.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
We are moving this out of the parser in preparation for compile
time constant support.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
For now this just validates that a qualifier is inside its
minimum boundary, in a later patch we will expand it to
evaluate compile time constants.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
SKL supports the ability to do fast clears and resolves of 32b RGBA as both
integer and floats. This patch only enables float color clears because we
haven't yet enabled integer color clears, (HW support for that was added in
BDW).
v2: Remove LUMINANCE16F and INTENSITY16F special cases since they are now
handled by Neil's patch to disable MSAA fast clears.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Neil Roberts <neil@linux.intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
This reverts commit 8a0c85b258.
It's not a strict revert because I don't want to bring back the gen < 9 check at
this point in time.
Reviewed-by: Neil Roberts <neil@linux.intel.com>
The impetus for this patch comes from a seemingly benign statement within the
spec (quoted within the patch).
It is very important for clearing multiple color buffer attachments and can be
observed in the following piglit tests:
spec/arb_framebuffer_object/fbo-drawbuffers-none glclear
spec/ext_framebuffer_multisample/blit-multiple-render-targets 0
v2: Doing the framebuffer binding only once (Chad)
Directly use the renderbuffers from the mt (Chad)
v3: Patch from Neil whose feedback I originally missed.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Chad Versace <chad.versace@intel.com>
Reviewed-by: Neil Roberts <neil@linux.intel.com>
Some of the information originally in this commit message is now in the patch
before this.
SKL adds compressible render targets and as a result mutates some of the
programming for fast clears and resolves. There is a new internal surface type
called the CCS. The old AUX_MCS bit becomes AUX_CCS_D. "Auxiliary Surfaces For
Sampled Tiled Resource".
The formats which are supported are defined in the table titled "Render Target
Surface Types [SKL+]". There is no PRM yet to reference. The previously
implemented helper function already does the right thing provided the table is
correct.
v2: Use better English in commit message (Matt)
s/compressable/compressible/ (Matt)
Don't compare bools to true (Matt)
Use the helper function and don't increase the context size - this is mostly
implemented in the patch just before this (Chad, Neil)
Remove an "invalid" assert (Chad)
Fix assertion to check num_samples > 1, instead of num_samples (Chad)
v3:
Use Matt's code as Requested-by: Chad. I didn't even look at it since Chad said
he was fine with that, and presumably Matt is fine with it.
v4: Use better quote from spec (Topi)
Cc: Chad Versace <chad.versace@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Background: Prior to Skylake and since Ivybridge Intel hardware has had the
ability to use a MCS (Multisample Control Surface) as auxiliary data in
"compression" operations on the surface. This reduces memory bandwidth. This
hardware was either used for MSAA compression, or fast clear operations. On
Gen8, a similar mechanism exists to allow the hiz buffer to be sampled from, and
therefore this feature is sometimes referred to more generally as "AUX buffers".
Skylake adds the ability to have the display engine directly source compressed
surfaces on top of the ability to sample from them. Inference dictates that
enabling this display features adds a restriction to the formats which could
actually be compressed. This is backed up by a blurb in the AUX_CCS_D section
from the RENDER_SURFACE_STATE: "In addition, if the surface is bound to the
sampling engine, Surface Format must be supported for Render Target Compression
for surfaces bound to the sampling engine." The current set of surfaces seems
to be a subset as compared to previous gens (see the next patch). Also, if I had
to guess I would guess that future gens add support for more surface formats. To
make handling this a bit easier to read, and more future proof, the support for
this is moved into the surface formats table.
Along with the modifications to the table, a helper function is also provided to
determine if a surface is CCS_E compatible. Because fast clears are currently
disabled on SKL, we can plumb the helper all the way through here, and not
actually have anything break.
v2:
- rename ccs to ccs_e; Requested-by: Chad
- rename lossless_compression to lossless_compression Requested-by: Chad
- change meaning of brw_losslessly_compressible_format Requested-by: Chad
- related changes to the code to reflect this.
- remove excess ccs (Chad)
v3:
- Commit message changes (Topi)
- Const some things which could be const (Topi)
Requested-by: Chad Versace <chad.versace@intel.com>
Requested-by: Neil Roberts <neil@linux.intel.com>
Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
Patch was originally called:
i965/skl: Enable fast color clears on SKL
Skylake introduces some differences in the way that fast clears are programmed
and in the restrictions for using fast clears. Since some of these are
non-obvious, and fast clears are currently disabled globally, we can enable the
simple stuff here and leave the weirder stuff and separately reviewable work.
Based on a patch originally from Kristian.
Note that within this patch the change in scaling factors could be achieved with
this hunk instead. I've opted to keep things more like how the docs describe it
however.
--- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
+++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
@@ -150,9 +150,13 @@ intel_get_non_msrt_mcs_alignment(struct brw_context *brw,
/* In release builds, fall through */
case I915_TILING_Y:
*width_px = 32 / mt->cpp;
- *height = 4;
+ if (brw->gen >= 9)
+ *height = 2;
+ else
+ *height = 4;
v2: Add braces for the multiline (Matt + Chad)
Comment updates (requested by Chad)
Modified commit message
Commit message from Chad explaining the MCS height change (Chad)
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Neil Roberts <neil@linux.intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
Commit b9b40ef9b7 moved the file, but forgot to update the reference in
the makefile. Thus the out of tree build was busted :\
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
With llvm 3.7 semi-dropping the autoconf build, we rely on their cmake
build. With the latter of which annoyingly using another (busted?)
SONAME.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
The queries_suspended_for_flush flag is redundant because suspended queries
are not removed from their respective linked list.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Some drivers (in particular radeon[si], but also freedreno judging from
a quick grep) may want to expose performance counters that cannot be
individually enabled or disabled.
Allow such drivers to mark driver-specific queries as requiring a new
type of batch query object that is used to start and stop a list of queries
simultaneously.
v3: adjust recently added nv50 queries
v2: documentation for create_batch_query
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Tested-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
It is easy enough to pre-determine the required size, and arrays are
generally better behaved especially when they get large.
v2: make sure init_perf_monitor returns true when no counters are active
(spotted by Samuel Pitoiset)
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Tested-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Previously, when a performance monitor was initialized, an inner loop through
all driver queries with string comparisons for each enabled performance
monitor counter was used. This hurts when a driver exposes lots of queries.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Tested-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
This was only used to implement an unnecessarily restrictive interpretation
of the spec of AMD_performance_monitor. The spec says
A performance monitor consists of a number of hardware and software
counters that can be sampled by the GPU and reported back to the
application.
I guess one could take this as a requirement that counters _must_ be sampled
by the GPU, but then why are they called _software_ counters? Besides,
there's not much reason _not_ to expose all counters that are available,
and this simplifies the code.
v3: add a missing change in the nouveau driver (thanks Samuel Pitoiset)
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Tested-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
texel fetches don't use any samplers. Previously we just set the same
number for both texture and sampler unit (as per "ordinary" gl style
sampling where the numbers are always the same) however this would trigger
some assertions checking that the sampler index isn't over PIPE_MAX_SAMPLERS
limit elsewhere with d3d10, so just set to 0.
(Fixing the assertion instead isn't really an option, the sampler isn't
really used but might still pass an out-of-bound pointer around and even
copy some things from it.)
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
The change is necessary to avoid building errors in glsl and i965
modules due to missing glsl_types.h header
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
In preparation for supporting GL_KHR_debug in OpenGL ES
v2: add a missing hunk in _mesa_IsEnabled (Emil)
Signed-off-by: Boyan Ding <boyan.j.ding@gmail.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
As defined in the spec
when implemented in an OpenGL ES context, all entry points defined
by this extension must have a "KHR" suffix.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Current behavior on the interface matching:
layout (location = 0) out0; // Assigned to VARYING_SLOT_VAR0 by user
out1; // Assigned to VARYING_SLOT_VAR0 by the linker
New behavior on the interface matching:
layout (location = 0) out0; // Assigned to VARYING_SLOT_VAR0 by user
out1; // Assigned to VARYING_SLOT_VAR1 by the linker
v4:
* Fix variable name in assert
Signed-off-by: Gregory Hainaut <gregory.hainaut@gmail.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Analogous to previous commit. While we're here prefix all functions
identically -> vl_dri2_foo
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
As of last commit everyone is using the vl_screen dispatch, thus we can
hide this function from the headers and make it static.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
As mentioned previously, it will allow us to use different vl backend in
a generic way from either video state-tracker.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
In a preparation of having proper multi-platform/backend handling in VL.
With follow up commits we'll introduce a dispatch within vl_screen
similar to the one in pipe_screen. This way any VL state-tracker can
operate seamlessly, considering the backend/platform is properly setup.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
The current code is busted in a number of ways.
- initially checks for omx_display (rather than omx_screen), which may
or may not be around.
- blindly feeds the empty env variable string to loader_open_device()
- reads the env variable every time get_screen is called
- the latter manifests into memory leaks, and other issues as one sets
the variable between two get_screen calls.
Additionally it cleans up a couple of extra bits
- drops unneeded set/check of omx_display.
- make the teardown (put_screen) order was not symmetrical to the setup
(get_screen)
v2: Drop the "is empty string" check (Leo)
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Rather than duplicating things, just use the generic AM_CPPFLAGS. This
has the fortunate side-effect of adding VISIBILITY_CFLAGS for the dri3
helper. The latter of which was erroneously exposing some internal
symbols.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reported-by: Kai Wasserbäch <kai@dev.carbon-project.org>
Tested-by: Kai Wasserbäch <kai@dev.carbon-project.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
On the vec4 backend, textureSamplesIdentical() will always return
false. There are currently no test cases for the vec4 backend, so we
don't have much confidence in any implementation. We also don't think
anyone is likely to miss it.
v2: Handle immediate value for MCS smarter. Rebase on changes to
nir_texop_sampels_identical (missing second parameter). Suggested by
Jason.
v3: Add Neil's code to handle 16x MSAA in the FS. Also rebase on top of
f9a9ba5e. Stub out the vec4 implementation.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Neil Roberts <neil@linux.intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> [v2]
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz> [v2]
This is the NIR analog to GLSL IR ir_samples_identical.
v2: Don't add the second nir_tex_src_ms_index parameter. Suggested by
Ken and Jason.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
v2: Add Neil to the list of contributors. I meant to do that before,
but Matt reminded me.
v3: Fix typos noticed by Nicolai.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Note: not quite perfect, we should use type_size vfunc (in
compiler_options or nir_shader?) to determine how much we
increment num_inputs/outputs/uniforms. But we don't have
that yet, so let's at least fix things for the existing
users of these passes.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Acked-by: Jason Ekstrand <jason.ekstrand@intel.com>
Otherwise, passing -1 gets you:
error: invalid conversion from 'int' to 'nir_variable_mode' [-fpermissive]
Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
This fixes teximage-colors, fbo-generatemipmap-formats, and probably
others (in relation to the RGB5 formats, others still fail).
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
The lower layers assume that we support this, and it's been core since
GL 1.4. This fixes a slew of piglit tests, especially around
tex-miplevel-selection.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
The hardware can actually generates vertexid when vertices come from
a client-side buffer like when glDrawElements is used.
This doesn't fix (or break) any piglit tests but it improves the
previous attempt of Ilia (c830d19 "nv50: avoid using inline vertex
data submit when gl_VertexID is used")
The only disadvantage is that only works on G84+, but we don't really
care of that weird and old NV50 chipset.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
W/UW immediates are 16-bits, but those 16-bits must be replicated
in the high 16-bits of the 32-bit field.
Remove the useless W/UW immediate saturating code, since we'll now be
using the appropriate immediate (and W/UW immediates in the IR can now
no longer be larger than 16-bits).
Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cuts 10k of .text, of which only 776 bytes are the fs_reg constructor
implementations themselves.
text data bss dec hex filename
5204535 214112 27784 5446431 531b1f i965_dri.so before
5193977 214112 27784 5435873 52f1e1 i965_dri.so after
Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This partially reverts commit bbf8239f92.
I didn't like that commit to begin with -- computing things at compile
time is fine -- but for purposes of verifying that the resulting values
are correct, looking up 0x00 and 0x30 in a table is a lot better than
evaluating a recursive function.
Anyway, by making brw_imm_vf4() take the actual 8-bit restricted floats
directly (instead of only integral values that would be converted to
restricted float), we can use this function as a replacement for the
vector float src_reg/fs_reg constructors.
brw_float_to_vf() is not currently an inline function, so it will not be
evaluated at compile time. I'll address that in a follow-up patch.
Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Enable developers to know if the table's alphabetical sorting
is maintained or lost.
v2: Move "*" next to pointer name (Matt)
Include extensions_table.h instead of extensions.h (Ian)
Remove extra " *" in comment (Ian)
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Make it easier to determine where to add new extensions.
Performed with the vim sort command.
v2: Insert newline after last #define (Matt)
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
gl_MaxDualSourceDrawBuffersEXT - Maximum dual-source draw buffers supported
For ESSL 1.0, it provides two builtins since you can't have user-defined
color output variables:
gl_SecondaryFragColorEXT
gl_SecondaryFragDataEXT[MaxDSDrawBuffers]
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
This adds a state for the maximum dual source draw variables available
and the variable for determining if the extension has been enabled
in the program shaders.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
I think the intention was to mark the "this" parameter as const, but
const goes on the other end to do that.
In file included from glsl_symbol_table.cpp:26:0:
ast.h:339:35: warning: type qualifiers ignored on function return type [-Wignored-qualifiers]
const bool is_single_dimension()
^
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
This allows arbitrary non-constant indices on GS input arrays,
both for the vertex index, and any array offsets beyond that.
All indirects are handled via the pull model. We could potentially
handle indirect addressing of pushed data as well, but it would add
additional code complexity, and we usually have to pull inputs anyway
due to the sheer volume of input data. Plus, marking pushed inputs
as live due to indirect addressing could exacerbate register pressure
problems pretty badly. We'd need to be careful.
v2: Use updated MOV_INDIRECT opcode.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
- env GALLIUM_HUD_VISIBLE: control default visibility
- env GALLIUM_HUD_SIGNAL_TOGGLE: toggle visibility via signal
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
This commit adds code for testing nir_shader_clone by running it after each
and every optimization pass and throwing away the old shader. Testing
nir_shader_clone is hidden behind a new INTEL_CLONE_NIR environment
variable.
Reviewed-by: Rob Clark <robclark@freedesktop.org>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Failing to call nir_metadata_preserve() can have nasty consequences:
some pass breaks dominance information, but leaves it marked as valid,
causing some subsequent pass to go haywire and probably crash.
This pass adds a simple validation mechanism to ensure passes handle
this properly. We add a new bogus metadata flag that isn't used for
anything in particular, set it before each pass, and ensure it *isn't*
still set after the pass. nir_metadata_preserve will reset the flag,
so correct passes will work, and bad passes will assert fail.
(I would have made these functions static inline, but nir.h is included
in C++, so we can't bit-or enums without lots of casting...)
Thanks to Dylan Baker for the idea.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
OPT() is the normal macro for passes that return booleans, while OPT_V()
is a variant that works for passes that don't properly report progress.
(Such passes should be fixed to return a boolean, eventually.)
These macros take care of calling nir_validate_shader() and setting
progress appropriately. In the future, it would be easy to add shader
dumping similar to INTEL_DEBUG=optimizer by extending the macro.
v2 (Jason Ekstrand):
- Fix an unused variable warning
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
The a4xx bits corresponding to 'freedreno/a3xx: add fake RGTC support
(required for GL3)'
TODO some more r/e.. maybe we get lucky and hw supports some of this
directly? For now this will help us enable gl3.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
The main issue is that the current logic looked into cso->u.tex, which
is the wrong side of the union to look into for texture buffers. While I
was at it, it was easy enough to add the logic to handle offsets
(first_element).
- reduce texture buffer size limit (determined experimentally)
- don't look at first/last levels, instead look at first/last element
- include the first element offset
- set offset alignment to 16 (determined experimentally)
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Rob Clark <robclark@freedesktop.org>
A smarter implementation would make it possible to attach this to emit
state for the BY_REGION versions to avoid breaking the tiling. But this
is a start.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Also throw in LATC while we're at it (same exact format). This could be
made more efficient by keeping a shadow compressed texture to use for
returning at map time. However... it's not worth it for now...
presumably compressed textures are not updated often.
Lastly fix up Z32S8 transfers to non-0 layers.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Rob Clark <robclark@freedesktop.org>
The previously RE'd formats were from an ES driver implementing
OES_vertex_type_10_10_10_2 and thus backwards. A future change could add
the 2_10_10_10 support.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Rob Clark <robclark@freedesktop.org>
We'd end up in a state where shader uses no inputs, yet num_elements is
greater than zero. Triggered by a TF vertex shader which did:
gl_Position = vec4(0.0, 0.0, 0.0, 0.0);
resulting in a binning pass variant with no inputs.
Includes equiv fix in a4xx, even though we don't have binning-pass
enabled yet on a4xx.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
point_size_per_vertex is always TRUE for GLES, causing us to configure
the hw as if gl_PointSize was written, even if it was not. Which makes
for grumpy hw.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
In nv50, and in the python script that Rob circulated, we do:
bld.mkCmp(OP_SET, CC_GE, TYPE_U32, (s = bld.getSSA()), TYPE_U32, m, b);
Do the same in the nir div lowering pass. This fixes the large-udiv-udiv
piglit tests on freedreno.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Rob Clark <robclark@freedesktop.org>
This patch disables the use of VSX instructions, as they cause some
piglit tests to fail
For more details, see: https://llvm.org/bugs/show_bug.cgi?id=25503#c7
With this patch, ppc64le reaches parity with x86-64 as far as piglit test
suite is concerned.
v2:
- Added check that we have at least LLVM 3.4
- Added the LLVM bug URL as a comment in the code
v3:
- Only disable VSX if Altivec is supported, because if Altivec support
is missing, then VSX support doesn't exist anyway.
- Change original patch description.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
3DSTATE_TE has partitioning, output topology, and domain fields,
each of which has several enumerated values. We'll also need to
switch on the domain, so enums (rather than #defines) seem like a
natural fit.
I chose to put these in brw_compiler.h because they'll be stored
in struct brw_tes_prog_data, which will live there.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
We always want to prefer the VGPU10 formats over the VGPU9 ones when
we have VGPU10 support.
Original patch by Jose and updated by Brian.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
This is important for the case of sampling from a depth texture. In
that case, we need to sample the texture as if it were a single-channel
color texture. For other/color formats, we can use the format as-is.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
The idea here is that driver queries implemented outside of common code
will use the same query buffer handling with different logic for starting
and stopping the corresponding counters.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
[Fixed a rebase conflict and re-tested before pushing.]
Move r600_query and r600_query_hw into the header because we will want to
reuse the buffer handling and suspend/resume logic outside of the common
radeon code.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
[Fixed a rebase conflict and re-tested before pushing.]
Software queries are all queries that do not require suspend/resume
and explicit handling of result buffers.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
[Fixed a rebase conflict and re-tested before pushing.]
The goal here is to be able to move the implementation details of hardware-
specific queries (in particular, performance counters) out of the common code.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
[Fixed a rebase conflict and re-tested before pushing.]
More query-related structures will have to be moved into their own
header file to support hardware-specific performance counters.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
There are currently a bunch of formats that behave strangely when
sampling the cleared color from the MCS buffer on SKL. They seem to
mostly be formats that don't have an alpha component, although it's
not all of them, and we haven't yet found anything in the specs which
would explain this. For now to be on the safe side this patch just
prevents fast clears for MSRTs on SKL altogether so that when fast
clears are eventually enabled it will only be for single-sampled
surfaces. The assumption is that clears are probably more likely to be
used in single-sampled applications anyway so we can at least get them
working and we can enable MSRTs later once we understand the problem
better.
This patch should have no functional effect other than perhaps
receiving fewer perf_debug messages on SKL+.
v2: Improve the commit message to avoid saying the patch disables fast
clears because it will be merged before fast clears are enabled
for any surfaces so it doesn't actually disable anything.
Reviewed-by: Ben Widawsky <benjamin.widawsky@intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
DEQP likes to do math on uniforms, and the "fmaxabs dst, uni, uni" to get
the absolute value would get lowered. The lowering doesn't bother to try
to restrict the lifetime of the lowered uniforms, so we'd end up register
allocation failng due to this on 5 of the tests (More tests still fail in
RA, which look like we'll need to reduce lowered uniform lifetimes to
fix).
No changes on shader-db, though fewer extra MOVs are generated on even
glxgears (MOVs pair well enough that it ends up being the same instruction
count).
It looks like nir_lower_idiv is going to use it soon, so add support.
With Ilia's change, this fixes one case in fs-op-div-large-uint-uint (with
GL 3.0 forced on).
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
This helps address a coverity warning and prevents future questions about this
code.
Reported-by: Coverity (via Ilia)
Cc: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Checking that the flag has been set is all the validation thats
needed here.
Also not calling the binding validation function will make things
much simpler when adding compile time constant support as we
won't need to resolve the binding value.
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Previously if the member was an array of matrices then a
warning message would be incorrectly given.
Also the struct case could never be met so it has been removed.
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Previously we only gave the location for some members and never
gave the variable location. In those cases we were just giving
the location of the struct/block.
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
For struct and block members previously we were doing it for
every variable declaration.
So for example
struct S {
atomic_uint x, y, z;
};
Would previously generate three error messages when one is sufficient.
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
We now also only apply these rules to variables rather than also
trying to apply them to function params.
V2: move code for handling stream layout qualifier
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Even though both tessellation shader stages must be used together, I
still think it makes sense to add separate debug flags for each stage.
It makes it possible to read the TCS/HS, rule out problems, then read
the TES/DS separately, without sifting through as much printed text.
I decided to add both the GL names (tcs/tes) and hardware names (hs/ds)
so they can be used interchangeably.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
This is needed for the FILE * type in brw_print_vue_map().
Apparently, all files that include brw_compiler.h already pick this up
via some include chain, so this isn't actually a build fix. However,
I have patches which introduce new consumers of brw_compiler.h that
fail to build because of the missing #include.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
v2: From Martin Peres
- Tell we are compiling the dri3 backend in configure.ac
- Update the Makefile.am
- get rid of the LIBDRM_HAS_RENDERNODE_SUPPORT macro
- fix some warnings related to EGLuint64KHR to int64_t conversions
- use dri2_get_dri_config to get the __DRIconfig instead of open-coding it
- replace the occasional tabs with spaces
v3: From Martin Peres
- fix and indent problem (Matt Turner)
- drop the authenticate function, use NULL in the vtable instead (Emil)
- drop some useless includes (Emil Velikov)
- mandate libdrm (Emil Velikov)
- link to xcb-dri3 (Kristian Høgsberg)
- convert to the new loader interface for drwable (Kristian)
- remove some dead code after the dropping of some vfuncs (Kristian)
- add a comment on the topic of rendering to the frontbuffer
v4: From Martin Peres
- do not expose the preserved swap behavior (Acked by Eric Anholt)
Signed-off-by: Boyan Ding <boyan.j.ding@gmail.com>
Signed-off-by: Martin Peres <martin.peres@linux.intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
dri3 for EGL will use different struct other than dri2_egl_surface for
an EGL surface, the common code only uses __DRIdrawable from that
struct, so instead of converting _EGLSurface to dri2_egl_surface, let
the platform code return the __DRIdrawable by its own (although the
current platforms use the same function).
v2: From Martin Peres
- convert to the new drawable interface (Kristian)
Signed-off-by: Boyan Ding <boyan.j.ding@gmail.com>
Signed-off-by: Martin Peres <martin.peres@linux.intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
v2: From Martin Peres
- convert to the new drawable interface
- delete dead code after the dropping of some vfuncs
- delete the width and height attributes since they are found in the helper
Signed-off-by: Boyan Ding <boyan.j.ding@gmail.com>
Signed-off-by: Martin Peres <martin.peres@linux.intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
v2: From Martin Peres
- Try to fit in the 80-col limit as much as possible
v3: From Martin Peres
- introduce loader_dri3_helper.la to avoid dragging the xcb dep everywhere (Kristian & Emil)
- get rid of the width, height, dri_screen and is_different_gpu vfuncs (Kristian)
- replace the create/destroy functions with init/fini for dri3 drawables
- prefix static functions with dri3_ and exported ones with loader_dri3 (Emil)
- keep the function definition consistent (Emil)
Signed-off-by: Boyan Ding <boyan.j.ding@gmail.com>
Signed-off-by: Martin Peres <martin.peres@linux.intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
brw_compile_gs() should return a pointer to unsigned, but it is returning the
bool 'false' at some point, hence annoying us with a compiler warning:
In function 'const unsigned int* brw::brw_compile_gs(const brw_compiler*,
void*, void*, const brw_gs_prog_key*, brw_gs_prog_data*, const nir_shader*,
gl_shader_program*, int, unsigned int*, char**)':
brw_vec4_gs_visitor.cpp:776:14: warning: converting 'false' to pointer type
'const unsigned int*' [-Wconversion-null]
return false;
^
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Precision qualifier should be ignored on desktop OpenGL.
v2: include spec quote (Samuel)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Up until now, we've been letting core Mesa initialize it to 36 for us
(which is presumably BRW_MAX_UBO (12) * (VS+GS+FS stages -> 3)).
With compute and tessellation, we need to increase this.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
This was getting pretty out of hand, and with compute partially in place
and tessellation on the way, it was only going to get worse.
This patch makes a "stage exists?" predicate and a "number of stages"
count and uses them to clean up a lot of calculations. We can just
loop over shader stages and set things for the ones that exist. For
combined counts, we can just multiply by the number of stages.
It also tries to organize a little bit.
We should probably use _mesa_has_geometry_shaders/tessellation/compute
here, but we can't because ctx->Version isn't initialized yet. Perhaps
that could be fixed in the future.
No change in "glxinfo -l" on Broadwell.
v2: Drop stray compute shader hunk. Mark stage_exists as const.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
I was going to add scalar_tcs and scalar_tes flags, and then thought
better of it and decided to convert this to an array. Simpler.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Since 779cabfc7d the same txformat table entries
are used for "normal" texturing as well as for blits. However, I forgot to put
in an entry for the bgrx8 (le) and xrgb8 (be) formats - the normal texturing
path can't hit them because the radeon tex format chooser will never chose
them, but we get that format from the dri buffers (at least I assume we got
it from there).
This is untested but essentially addressing the same bug as for radeon.
(I don't think that the second entry per le/be table is actually necessary,
but shouldn't hurt...)
Tested-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Since d21320f625 the same txformat table entries
are used for "normal" texturing as well as for blits. However, I forgot to put
in an entry for the bgrx8 (le) and xrgb8 (be) formats - the normal texturing
path can't hit them because the radeon tex format chooser will never chose
them, but we get that format from the dri buffers (at least I assume we got
it from there). This caused lots of piglit regressions (and probably lots of
trouble outside piglit too).
This fixes bug https://bugs.freedesktop.org/show_bug.cgi?id=92900.
Tested-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Previously GL_FRAMEBUFFER was used. However, if GL_EXT_framebuffer_blit
is supported (note: it is supported by every Mesa driver), this is
*sometimes* an alias for GL_DRAW_FRAMEBUFFER (getters) and *sometimes*
an alias for *both* GL_DRAW_FRAMEBUFFER and GL_READ_FRAMEBUFFER
(setters). As a result, the code saved one binding but modified both.
If the bindings were different, the GL_READ_FRAMEBUFFER would be
incorrect on exit.
Fixes the piglit fbo-generatemipmap-versus-READ_FRAMEBUFFER test.
Ideally this function would use DSA functions and not modify the binding
at all. However, that would be a much more intrusive change because
_mesa_meta_bind_fbo_image would also need to be modified.
_mesa_meta_bind_fbo_image has a lot of callers. Much of this code is
about to get a major rework due to bug #92363, so I don't think it
matters too much. In fact, I discovered this bug while working on the
other bug. Le bon temps!
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
Replace the current loop by a direct call to _mesa_fls() function.
It also fixes an implicit bug in the current code where num_textures
seems to be one value less than it should be when sh->Program->SamplersUsed > 0.
For instance, num_textures is 0 instead of 1 when
sh->Program->SamplersUsed is 1.
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
If a source operand in a MOV has source modifiers, then we cannot
copy-propagate it from the parent instruction and remove the MOV.
v2: remove the check for source modifiers from is_move() (Jason)
v3: Put the check for source modifiers back into is_move() since
this function is called from copy_prop_alu_src(). Add source
modifiers checks to is_vec() instead.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Only the regular "clear" call is supposed to respect the render
condition. The rest should ignore it.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
The geometry and tessellation control shader stages both read from
multiple URB entries (one per vertex). The thread payload contains
several URB handles which reference these separate memory segments.
In GLSL, these inputs are represented as per-vertex arrays; the
outermost array index selects which vertex's inputs to read. This
array index does not necessarily need to be constant.
To handle that, we need to use indirect addressing on GRFs to select
which of the thread payload registers has the appropriate URB handle.
(This is before we can even think about applying the pull model!)
This patch introduces a new opcode which performs a MOV from a
source using VxH indirect addressing (which allows each of the 8
SIMD channels to select distinct data.)
Based on a patch by Jason Ekstrand.
v2: Rename from INDIRECT_THREAD_PAYLOAD_MOV to MOV_INDIRECT; make it
a bit more generic. Use regs_read() instead of hacking up the
register allocator. (Suggested by Jason Ekstrand.)
v3: Fix regs_read() to be more accurate for small unaligned regions.
Also rebase on Matt's work.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> [v3]
Reviewed-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com> [v1]
These compute-related MP performance counters have been reverse
engineered using CUPTI which is part of NVIDIA CUDA.
As for nvc0, we use a compute kernel to read out those performance
counters, and the command stream to configure them. Note that Tesla
only exposes 4 MP performance counters, while Fermi has 8.
Only G84+ is supported because G80 is an old and weird card.
Tested on G84, G96, G200, MCP79 and GT218 with glxgears, glxspheres64,
xonotic-glx, heaven and valley.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Tested-by: Pierre Moreau <pierre.morrow@free.fr>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
This adds the ability to launch simple compute kernels like the one I
will use to read out MP performance counters in the upcoming patch.
This compute support is based on the work of Francisco Jerez (aka curro)
that he did as part of his EVoC project in 2011/2012 to get OpenCL
working on Tesla. His original work can be found here:
https://github.com/curro/mesa/commits/nv50-compute
I did some improvements on the original code, like fixing using both 3D
and COMPUTE simultaneously, improving global buffers binding, and making
the code closer to what nvc0 already does. This compute support has been
tested by Pierre Moreau and myself with some compute kernels. This is a
step towards OpenCL.
Speaking about this, it seems like compute programs overlap fragment
programs when they are used both. To fix this, we need to re-validate
fragment programs when binding compute programs and vice versa.
Note that, textures, samplers and surfaces still need to be implemented.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Tested-by: Pierre Moreau <pierre.morrow@free.fr>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
As for nvc0, we need to free memory allocated by interpolation
parameters. This fixes a memory leak spotted by valgrind.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
No need to allocate more GPR than used in the compute kernel which
reads MP performance counters on Fermi.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
nir/nir_control_flow.c: In function ‘split_block_cursor.isra.11’:
nir/nir_control_flow.c:460:15: warning: ‘after’ may be used uninitialized in this function [-Wmaybe-uninitialized]
*_after = after;
^
nir/nir_control_flow.c:458:16: warning: ‘before’ may be used uninitialized in this function [-Wmaybe-uninitialized]
*_before = before;
^
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
We need to use per-slot offsets when there's non-uniform indexing,
as each SIMD channel could have a different index. We want to use
them for any non-constant index (even if uniform), as it lives in
the message header instead of the descriptor, allowing us to set
offsets in GRFs rather than immediates.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
GLSL 4.00 and GL_ARB_gpu_shader5 introduced a new int -> uint implicit
conversion rule and updated the rules for modulus to use them. (In
earlier languages, none of the implicit conversion rules did anything
relevant, so there was no point in applying them.)
This allows expressions such as:
int foo;
uint bar;
uint mod = foo % bar;
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
I've been carrying around a patch to do this for the last few months,
and it's been exceedingly useful for debugging GS and tessellation
problems. I've caught lots of bugs by inspecting the interface
expectations of two adjacent stages.
It's not that much spam, so I figure we may as well just print it.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Matt Turner <mattst88@gmail.com>
This makes expressions like component(fs_reg(ATTR, n), 7) get a proper
<0,1,0> region instead of the invalid <0,8,0>.
Nobody uses this today, but I plan to.
v2: Rebase on Matt's changes; simplify.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com> [v1]
With the many variants of IO intrinsics, particular sources are often in
different locations. It's convenient to say "give me the indirect
offset" or "give me the vertex index" and have it just work, without
having to think about exactly which kind of intrinsic you have.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
We'd like to shadow these when possible, but the current code doesn't
work properly for TCS outputs. For now, disable it.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Normally, we rely on nir_lower_outputs_to_temporaries to create shadow
variables for outputs, buffering the results and writing them all out
at the end of the program. However, this is infeasible for tessellation
control shader outputs.
Tessellation control shaders can generate multiple output vertices, and
write per-vertex outputs. These are arrays indexed by the vertex
number; each thread only writes one element, but can read any other
element - including those being concurrently written by other threads.
The barrier() intrinsic synchronizes between threads.
Even if we tried to shadow every output element (which is of dubious
value), we'd have to read updated values in at barrier() time, which
means we need to allow output reads.
Most stages should continue using nir_lower_outputs_to_temporaries(),
but in theory drivers could choose not to if they really wanted.
v2: Rebase to accomodate Jason's review feedback.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Similar to nir_load_per_vertex_input, but for outputs. This is not
useful in geometry shaders, but will be useful in tessellation shaders.
v2: Change stage_uses_per_vertex_outputs() to is_per_vertex_output(),
taking a nir_variable (requested by Jason Ekstrand).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Tessellation control shader inputs are an array indexed by the vertex
number, like geometry shader inputs. There aren't per-patch TCS inputs.
Tessellation evaluation shaders have both per-vertex and per-patch
inputs. Per-vertex inputs get the new intrinsics; per-patch inputs
continue to use the ordinary load_input intrinsics, as they already
work like we want them to.
v2: Change stage_uses_per_vertex_inputs into is_per_vertex_input(),
which takes a variable (requested by Jason Ekstrand).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
intel_asm_annotation.c: In function ‘annotation_insert_error’:
intel_asm_annotation.c:214:18:
warning: ‘ann’ may be used uninitialized in this function
[-Wmaybe-uninitialized]
ann->error = ralloc_strdup(annotation->mem_ctx, error);
^
I initially tried changing the type of ann_count to unsigned (is
currently int), since that in addition to the check that it's non-zero
at the beginning of the function seems sufficient to prove that it must
be greater than zero. Unfortunately that wasn't sufficient.
The first four values (2-bits) are hardware values, and VGRF, ATTR, and
UNIFORM remain values used in the IR.
Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
HW_REGs are (were!) kind of awful. If the file was HW_REG, you had to
look at different fields for type, abs, negate, writemask, swizzle, and
a second file. They also caused annoying problems like immediate sources
being considered scheduling barriers (commit 6148e94e2) and other such
nonsense.
Instead use ARF/FIXED_GRF/MRF for fixed registers in those files.
After a sufficient amount of time has passed since "GRF" was used, we
can rename FIXED_GRF -> GRF, but doing so now would make rebasing awful.
Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The fs_reg() constructors for immediates set stride to 0, except for
vector-immediates, which set stride to 1. This patch makes the fs_reg
constructor that takes a brw_reg do likewise, so that stride is set
correctly for cases such as fs_reg(brw_imm_v(...)).
The generator asserts that this is true (and presumably it's useful in
some optimization passes?) and the VF fs_reg constructors did this (by
virtue of the fact that it doesn't override what init() does).
In the next commit, calling this constructor with brw_imm_* will generate
an IMM file register rather than a HW_REG, making this change necessary
to avoid breakage with existing uses of brw_imm_v().
Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We use brw_imm_v() to produce type-V immediates, which generates a
brw_reg with fs_reg's .file set to HW_REG. The next commit will rid us
of HW_REGs, so we need to handle BRW_REGISTER_TYPE_V in the IMM case.
Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The 2-bit hardware register file field is ARF, GRF, MRF, IMM.
Rename GRF to VGRF (virtual GRF) so that we can reuse the GRF name to
mean an assigned general purpose register.
Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
I'm going to begin using brw_reg's file field in backend_reg and its
derivatives, and in order to keep the hardware value for ARF as 0, we
have to do something different.
Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The test (file == BAD_FILE) works on registers for which the constructor
has not run because BAD_FILE is zero. The next commit will move
BAD_FILE in the enum so that it's no longer zero.
In the case of this->outputs, the constructor was being run implicitly,
and we were unnecessarily memsetting is to zero.
Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
In addition to combining another field, we get replace silliness like
"reg.reg" with something that actually makes sense, "reg.nr"; and no one
will ever wonder again why dst.reg isn't a dst_reg.
Moving the now 16-bit nr field to a 16-bit boundary decreases code size
by about 3k.
Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Since backend_reg now inherits brw_reg, we can use it in place of the
fixed_hw_reg field.
Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Put fields that are meaningless with an immediate in the same storage
with the immediate. This leaves fields type, file, nr, subnr in the
first dword where there's now extra room for expansion.
Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Generated by
sed -i -e 's/\.bits\././g' *.c *.h *.cpp
sed -i -e 's/dw1\.//g' *.c *.h *.cpp
and then reverting changes to comments in gen7_blorp.cpp and
brw_fs_generator.cpp.
There wasn't any utility offered by forcing the programmer to list these
to access their fields. Removing them will reduce churn in future
commits.
This is C11 (and gcc has apparently supported it for sometime
"compatibility with other compilers")
See https://gcc.gnu.org/onlinedocs/gcc/Unnamed-Fields.html
Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Switching from an implicitly-sized type field to field with an explicit
bit width is safe because we have fewer than 2^4 types, and gcc will
warn if you attempt to set a value that will not fit.
Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Instead use the ones provided by brw_reg. Also allows us to handle
HW_REGs in the negate() functions.
Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Since the types of the expression were
bool ? src_reg : (bool ? brw_reg : brw_reg)
the result of the second (nested) ternary would be implicitly
converted to a src_reg by the src_reg(struct brw_reg) constructor. I.e.,
bool ? src_reg : src_reg(bool ? brw_reg : brw_reg)
In the next patch, I make backend_reg (the parent of src_reg) inherit
from brw_reg, which changes this expression to return brw_reg, which
throws away any fields that exist in the classes derived from brw_reg.
I.e.,
src_reg(bool ? brw_reg(src_reg) : bool ? brw_reg : brw_reg)
Generally this code was gross, and wasn't actually shorter or easier to
read than an if ladder.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
This reduces the shader key for ES.
Use a fixed attrib location based on (semantic name, index).
The ESGS item size is determined by the physical index of the highest ES
output, so it's almost always larger than before, but I think that
shouldn't matter as long as the ESGS ring buffer is large enough.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
I discovered that increasing the ESGS ring size fixes GS hangs on Tonga,
so let's do it properly.
There is now a separate init_config_gs_rings state that is not immutable,
because GS rings are resized when needed.
This also saves some memory. Most apps won't need more than 1MB
per ring per shader engine.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Not needed anymore. A similar flag will be introduced in the next commit,
which will be private in radeonsi.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
need_cs_space isn't invoked so often and is called before all commands too.
This is a lot cleaner. The code in radeon_add_to_buffer_list always seemed
dodgy to me.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
KCACHE, TC L1 and TC L2 are renamed to:
- SMEM L1
- VMEM L1
- GLOBAL L2
You can easily tell what they are used for now.
Shaders must deal with coherency issues between both L1s manually,
e.g. by setting GLC=1 or by using s_dcache_*.
BOTH_ICACHE_KCACHE was an unused definition.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This can't crash currently, but it would crash if clear_buffer
from u_blitter were used with a clean context.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Discovered by luck. This code path hasn't been exercised since transform
feedback was implemented.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Use a LIBDIR variable, set per-platform.
Update the Mesa configuration flags.
Run update-initramfs or dracut, update /etc/modules
Signed-off-by: Brian Paul <brianp@vmware.com>
eglSwapBuffersWithDamage accepts damage-region rectangles to hint the
compositor that it only needs to redraw certain areas, which was passed
through the wl_surface_damage request, as designed.
Wayland also offers a buffer transformation interface, e.g. to allow
users to render pre-rotated buffers. Unfortunately, there is no way to
query buffer transforms, and the damage region was provided in surface,
rather than buffer, co-ordinate space.
Users could in theory account for this themselves, but EGL also requires
co-ordinates to be passed in GL/mathematical co-ordinate space, with an
inversion to Wayland's natural/scanout co-ordinate space, so
transformations other than a 180-degree rotation will fail as EGL
attempts to subtract the region from (its view of the) surface height.
Pending creation and acceptance of a wl_surface.buffer_damage request,
which will accept co-ordinates in buffer co-ordinate space, pessimise to
always sending full-surface damage.
bce64c6c provides the explanation for why we send maximum-range damage,
rather than the full size of the surface: in the presence of buffer
transformations, full-surface damage may not actually cover the entire
surface.
Signed-off-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
The change proposed in the review leads to piglit regressions because
is_move() is used in other places and relies on the checks for source
modifiers to be there.
Revert this until we agree on a better solution.
Commit 8b28b35 added 'shared' as a keyword for compute shaders
but it broke the existing 'shared' layout qualifier support for
uniform and shader storage blocks.
This patch fixes 578 dEQP-GLES31.functional.ssbo.* tests.
v2:
- Move SHARED to interface_block_layout_qualifier (Timothy)
- Don't remove "shared" case insensitive check (Timothy)
- Remove the clearing of shared_storage flag (Timothy)
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
If a source operand in a MOV has source modifiers, then we cannot
copy-propagate it from the parent instruction and remove the MOV.
v2: remove the check for source source modifiers from is_move() (Jason)
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
This was a remnant of an early attempt to handle output reads in
vars_to_ssa. That attempt was abandon a long time ago but these few lines
were aparently left in the pass and managed to evade review.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Previously, we walked through a given deref_node's copies and, after
lowering the copy away, removed it from both the source and destination
copy sets. This commit changes this to only remove it from the other
node's copy set (not the one we're lowering). At the end of the loop, we
just throw away the copy set for the node we're lowering since that node no
longer has any copies. This has two advantages:
1) It's more efficient because we're doing potentially half as many set
search operations.
2) It now properly handles copies from a node to itself. Perviously, it
would delete the copy from the set when processing the destinatioon and
then assert-fail when we couldn't find it for the source.
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92588
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
The shader-subroutine code creates uniforms of type SUBROUTINE for
subroutines that are then read as integers in the backends. If we ever
want to do any optimizations on these, we'll need to come up with a better
plan where they are actual scalars or something, but this works for now.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92859
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Mesa unconditionally sets this driver flag to true in
_mesa_init_extensions(). There is therefore no need for
the driver to communicate support for this extension.
Replace the driver capability flag with ::dummy_true.
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
This commit accidentally used a '==' when '=' was intended.
commit 96b22fb080
Author: Kristian Høgsberg Kristensen <krh@bitplanet.net>
Date: Wed Nov 4 14:58:54 2015 -0800
glsl: Use array deref for access to vector components
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Make API context and version checks done by the helper functions pass
unconditionally while meta is in progress. This transparently makes
extension checks solely dependent on struct gl_extensions while in meta.
v2: Use an 8-bit data type instead of a GLuint
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
Rename the following types and variables:
* struct extension -> struct mesa_extension,
like the mesa_format type.
* extension_table -> _mesa_extension_table,
like the _mesa_extension_override_{enables,disables} structs.
Suggested-by: Marek Olšák <marek.olsak@amd.com>
Suggested-by: Chad Versace <chad.versace@intel.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
Generate functions which determine if an extension is supported in the
current context. Initially, enums were going to be explicitly used with
_mesa_extension_supported(). The idea to embed the function and enums
into generated helper functions was suggested by Kristian Høgsberg.
For performance, the function body no longer uses
_mesa_extension_supported() and, as suggested by Chad Versace, the
functions are also declared static inline.
v2: Place function qualifiers on separate line (Chad)
v3: Move function curly brace to new line (Chad)
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
The api_set field has no users outside of _mesa_extension_supported().
Remove it and allow the version field to take its place.
The brunt of the transformation was performed with the following vim commands:
s/\(GL [^,]\+\),\s*\d*,\s*\d*\(,\s*\d*\)\(,\s*\d*\)/\1, GLL, GLC\2\3/g
s/\(GLL [^,]\+\)\,\s*\d*/\1, GLL/g
s/\(GLC [^,]\+\)\(,\s*\d*\),\s*\d*\(,\s*\d*\)\(,\s*\d*\)/\1\2, GLC\3\4/g
s/\( ES1[^,]*\)\(,\s*\(\w\|\d\)\+\)\(,\s*\(\w\|\d\)\+\),\s*\d*/\1\2\4, ES1/g
s/\( ES2[^,]*\)\(,\s*\(\w\|\d\)\+\)\(,\s*\(\w\|\d\)\+\)\(,\s*\(\w\|\d\)\+\),\s*\d*/\1\2\4\6, ES2/g
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
Replace open-coded checks for extension support with
_mesa_extension_supported().
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
Create a function which determines if an extension is supported in the
current context.
v2: Use common variable names (Emil)
Insert new line between variables and return statement (Chad)
Rename api_set variable to api_bit (Chad)
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
Enable limiting advertised extension support by context version with
finer granularity. This new field is currently unused and is set to
0 everywhere. When it is used, a value of 0 will indicate that the
extension is supported for any version of a context.
v2: Use uint*t type for version and note the expected values (Emil)
Use an 8-bit data type
Reformat macro for better readability (Chad)
v3: Note preparatory nature of commit (Chad)
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
With this infrastructure set in place, we can now reuse the entries to
generate useful code.
v2: Add the new file into Makefile.sources (Emil)
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
Now that we're using macros, remove the redundant text from each entry.
Remove comments between the entries to make editing easier and separate
the sections with blank lines. Structure the EXT macros in a way that
helps reviewers verify that no meaning has been altered.
v2: Indent the entries (Chad)
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
Simplify future updates to the extension struct array by removing
the sentinel.
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
Initially just checks that sources are non-NULL, which would have
alerted us to the problem fixed by commit 6c846dc5.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Will allow annotations to contain error messages (indicating an
instruction violates a rule for instance) that are printed after the
disassembly of the block.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Often annotations are identical between sets of consecutive
instructions. We can perhaps avoid some memory allocations by reusing
the previous annotation.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
And why did IFF have a destination?
I suspect that once upon a time the disassembler used this information
to know which fields to find the jump targets in. The jump targets have
moved, so the disassembler has to know how to handle these
per-generation anyway.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Add some instructions: illegal, movi, sends, sendsc.
Remove some instructions with reused opcodes: msave, mrestore, push,
pop, goto. I did have some gross code for disassembling opcodes
per-generation, but there's very little meaningful overlap so it's
probably not needed.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This would have caught the locking bug that was fixed in the earlier
"st/wgl: fix locking issue in stw_st_framebuffer_present_locked()"
patch.
v2: minor coding style changes by Brian.
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
v2: update comments on the stw_framebuffer::mutex field regarding locking
order.
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
To get declaration for debug_printf() directly instead of getting it
indirectly through os_thread.h
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
This is Windows-only code so we can use the native Win32 functions for
critical sections. This will also allow us to (cleanly) add some mutex
check/debug code in subsequent patches.
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
We support "cpu" but not "cpu#" because there's no good way of querying
per-cpu usage. Also, the cpu usage is for the process, not the whole
system.
Original code cobbled together by Brian and then fixed/polished by Jose.
Signed-off-by: Brian Paul <brianp@vmware.com>
Patch sets matrix_stride as 0 for non matrix uniforms that are in a
atomic counter buffer. Matrix stride calculation for actual matrix
uniforms is done during link_assign_uniform_locations.
From ARB_program_interface_query specification:
GL_MATRIX_STRIDE:
"For active variables not declared as a matrix or array of matrices,
zero is written to <params>. For active variables not backed by a
buffer object, -1 is written to <params>, regardless of the variable
type."
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com>
This information will be used by cross stage validation of varyings
for pipeline objects.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
We will need this later on when we implement proper support for
precision qualifiers in the drivers and also to do link time checks for
uniforms as indicated by the spec.
This patch also adds compile-time checks for variables without precision
information (currently, Mesa only checks that a default precision is set
for floats in fragment shaders).
As indicated by Ian, the addition of the precision information to
ir_variable has been done using a bitfield and pahole to identify an
available hole so that memory requirements for ir_variable stay the
same.
v2 (Ian):
- Avoid if-ladders by defining arrays of supported sampler names and
indexing
into them with type->sampler_array + 2 * type->sampler_shadow
- Make the code that selects the precision qualifier to use an utility
function
- Fix a typo
v3 (Tapani):
- rebased
- squashed in "Precision qualifiers are not allowed on structs"
- fixed select_gles_precision for sampler arrays
- fixed precision_qualifier_allowed for arrays of structs
v4 (Tapani):
- add atomic_uint handling
- do not allow precision qualifier on images
(issues reported by Marta)
v5 (Tapani):
- support precision qualifier on image types
v6 (Tapani):
- set precision qualifier on interface block members
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Notice that the spec requires that a default precision has been set for every
type used by a shader that can use a precision qualifier and does not have a
predefined precision, however, at the moment, Mesa only checks this for floats
in the fragment shader. This is probably because the GLSL ES 1.0 specs mentions
this case specifically, but GLSL ES 3.0 clarifies that the same applies to
other types:
"The fragment language has no default precision qualifier for floating point
types. Hence for float, floating point vector and matrix variable
declarations, either the declaration must include a precision qualifier or
the default float precision must have been previously declared. Similarly,
there is no default precision qualifier for the following sampler types in
either the vertex or fragment language:
sampler3D;
samplerCubeShadow;
sampler2DShadow;
sampler2DArray;
sampler2DArrayShadow;
isampler2D;
isampler3D;
isamplerCube;
isampler2DArray;
usampler2D;
usampler3D;
usamplerCube;
usampler2DArray;"
we will fix this in a later patch.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
The GLSL ES spec specifies default precision qualifiers for certain types,
so populate the symbol table with these.
Notice that the desktop GLSL spec also indicates defaults for some types
but this is not really useful since precision qualifiers are completely
ignored in desktop GLSL.
v2: simplify and add samplerExternalOES, specified by
OES_EGL_image_external (Tapani)
v3: add atomic_uint (reported missing by Marta)
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
These have scoping rules that match the ones defined for other things such
as variables, so we want them in the symbol table.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
FS_OPCODE_GET_BUFFER_SIZE is calculated with a resinfo's sampler message.
This patch adjusts the number of registers written by the opcode
following what the PRM spec says about the number of registers written
by the SIMD8 and SIMD16's writeback messages for sampler messages.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
The comment in the code details the restriction. Thanks to Ken for having a very
helpful conversation with me, and spotting the blurb in the link I sent him :P.
There are still stability problems for me on GT4, but this definitely helps with
some of the failures.
v2: Comment fixes
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Many intrinsics only apply to a particular stage (such as discard).
In other cases, we may want to interpret them differently based on
the stage (such as load_primitive_id or load_input).
The current method isn't that pretty - we handle all intrinsics in
one giant function. Sometimes we assert on stage, sometimes we forget.
Different behaviors are handled via if-ladders based on stage.
This commit introduces new nir_emit_<stage>_intrinsic() functions,
and makes nir_emit_instr() call those. In turn, those fall back to
the generic nir_emit_intrinsic() function for cases they don't want
to handle specially.
This makes it clear which intrinsics only exist in one stage, and makes
it easy to handle inputs/outputs differently for various stages.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
For compressed textures, the image size is not necessarily a multiple of
the block size (e.g. the last mip levels). Section 18.3.2 (Copying
Between Images) of the OpenGL 4.5 Core Profile spec says:
An INVALID_VALUE error is generated if the dimensions of either
subregion exceeds the boundaries of the corresponding image
object, or if the image format is compressed and the dimensions of
the subregion fail to meet the alignment constraints of the
format.
and Section 8.7 (Compressed Texture Images) says:
An INVALID_OPERATION error is generated if any of the following
conditions occurs:
* width is not a multiple of four, and width + xoffset is not
equal to the value of TEXTURE_WIDTH.
* height is not a multiple of four, and height + yoffset is not
equal to the value of TEXTURE_HEIGHT.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92860
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: mesa-stable@lists.freedesktop.org
Just a minor code change to make it obvious that NULL is returned when
we don't find the given HWND.
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
When stw_st_framebuffer_present_locked() is called, the
stw_framebuffer's mutex will already be locked. Normally, the
stw_framebuffer_present_locked() function calls
stw_framebuffer_release() to unlock the mutex when it's done. But if
for some reason the 'resource' pointer in
stw_st_framebuffer_present_locked() is null, we'd return without
unlocking the stw_framebuffer. This fixes that to avoid potential
deadlocks.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
A while back, we moved to directly emitting the Gen7+ state when
constructing the binding tables. These flags are only used on
Gen4-6, which emit all the binding table pointers at once.
We gain nothing by having separate flags, so combine them.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Inspired by a patch by Fabian Bieler.
Fabian defined a _3DPRIM_PATCHLIST_0 macro (which isn't actually a valid
topology type); I instead chose to make a macro that takes an argument.
He also took the number of patch vertices from _mesa_prim (which was set
to ctx->TessCtrlProgram.patch_vertices) - I chose to use it directly to
avoid the need for the VBO patch.
v2: Change macro to 0x20 + (n - 1) instead of 0x1F + n to better match
the documentation (suggested by Ian).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
When both fadd and fmul instructions have at least one operand that is a
constant and it is only used once, the total number of instructions can
be reduced from 3 (1 ffma + 2 load_const) to 2 (1 fmul + 1 fadd); because
the constants will be progagated as immediate operands of fmul and fadd.
This patch detects these situations and prevents fusing fmul+fadd into ffma.
Shader-db results on i965 Haswell:
total instructions in shared programs: 6235835 -> 6225895 (-0.16%)
instructions in affected programs: 1124094 -> 1114154 (-0.88%)
total loops in shared programs: 1979 -> 1979 (0.00%)
helped: 7612
HURT: 843
GAINED: 4
LOST: 0
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Because the next patch will add an optimization that is specific to i965,
we want to move this loweing pass to that driver altogether.
This is safe because i965 is the only consumer.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
We've assumed that we could lower per-component vector access from
vec[i] = scalar
to
vec = ir_triop_vector_insert(vec, scalar, i)
but with SSBOs (and compute shader SLM and tesselation outputs) this is
no longer valid. If a vector is "externally visible", multiple threads
can write independent components simultaneously. With lowering to
ir_triop_vector_insert, each thread read the entire vector, changes one
component, then writes out the entire vector. This is racy.
Instead of generating a ir_binop_vector_extract when we see v[i], we
generate ir_dereference_array. We then add a lowering pass to lower the
ir_dereference_array to ir_binop_vector_extract for rvalues and for to
vector_insert for lvalues in a separate lowering pass.
The resulting IR is the same as before, but we now have a window between
ast->ir conversion and the lowering pass where v[i] appears in the IR as
an array deref. This lets us run lowering passes that lower the vector
access to I/O (eg for SSBO load/store) before we lower the per-component
access to full vector writes.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
All GLSL IR consumers run this lowering pass so we can move it to the
linker. This moves the pass up quite a bit, but that's the point: it
needs to run before we throw away information about per-component vector
access.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
We always pass in shader->ir and we already pass in the shader, so just
drop the exec_list. Most passes either take just a exec_list or a
shader, so this seems more consistent.
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
Its only user now returns a nir_ssa_def *, and we'll need this since the
builder returns a nir_ssa_def *.
Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
A long time ago, before NIR was even merged to master, glsl_to_nir used
registers and these sources were actually register sources. But nowadays
everything in glsl_to_nir is an SSA value, so stop pretending that by
evaluating an rvalue we can get an arbitrary nir_src. Most importantly,
we need this since the builder takes nir_ssa_def * sources directly.
Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Ideally we should have a _mesa_cleanup_buffer_object function in
src/mesa/bufferobj.c so that the destruction logic resided in a single
place.
Reviewed-by: Brian Paul <brianp@vmware.com>
These tessellation shader related fields need plumbing through NIR.
v2: Use uint32_t instead of uint64_t to match the source type of
GLbitfield (caught by Iago Toral).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Since X has undefined contents in new pixmaps, it will allocate new
textures for an FBO and draw to them without an explicit clear. For
VC4, it's much faster to emit a clear than the load of the actual
undefined memory contents, so just do that instead.
I'm not sure what the caller does is appropriate (just have a NULL sampler
at this slot), but it fixes the immediate crash.
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
I was afraid our callers weren't prepared for this, but it looks like
at least for resource creation, mesa/st throws an error appropriately.
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Shared variables are stored in a common pool accessible by all threads
in a compute shader local work group.
These variables are similar to OpenCL's local/__local variables.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
v2:
* Move shared parsing under storage qualifiers (tarceri)
* Fail to compile if shared is used in non-compute shader (tarceri)
* Use separate shared_storage bit for shared variables (tarceri)
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Qualifiers on member variables are redundent all we need to do
if check if it matches the stream associated with the block and
throw an error if its not.
Reviewed-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Cc: Emil Velikov <emil.l.velikov@gmail.com>
The stw_st_framebuffer_present_locked() function was getting called
twice per SwapBuffers. First, when st_context_iface::flush() was
called from DrvSwapBuffers() because the ST_FLUSH_FRONT flag was
given. Second, by stw_st_swap_framebuffer_locked() which does the
actual SwapBuffers.
Two code changes:
1. Pass ST_FLUSH_END_OF_FRAME, instead of ST_FLUSH_FRONT.
2. Move the implementation of stw_flush_current_locked() into
DrvSwapBuffers() since it's not called anywhere else.
Not much change in perf for benchmarks like Lightsmark, but some simple
Mesa demos are measurably faster.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
And put 8-bit/channel formats before 5/6/5 formats.
The ChoosePixelFormat() function seems to be finicky about format
selection. Putting the MSAA formats after the non-MSAA formats
means most apps get a low-numbered format. Now we generally get
the same pixel format regardless of whether using vgpu9 or 10.
VMware bug 1455030
Reviewed-by: José Fonseca <jfonseca@vmware.com>
This allows to use apitrace's retracediff script on Windows to retrace and
compare two builds of a Mesa based opengl32.dll/ICD side-by-side.
See also e4a4f15f5b
This will allow dec/enc/transcode without X
v2: use env override even with X,
use loader_open_device instead of open
v3: clean up
Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
This will allow the state trackers to use render nodes
with screen creation
v2: dup fd for pipe loader
Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
There is no dev in drv, and dev should be from vl_screen here
Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Altough the compute support is still not complete because textures and
surfaces need to be implemented, it allows to launch very simple compute
kernel like one which reads reading MP performance counters.
This turns on PIPE_CAP_COMPUTE and PIPE_SHADER_COMPUTE.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
There might only be a single arg (e.g. cvt), so use mode rather than
looking at the source directly. Also we don't want to rely on the type
of the value, which can be unreliable, but instead use the
instruction's. This works out well since mkSplit doesn't adjust the
type.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Not reachable from TGSI since it only has UMUL, no IMUL. However it's
surprising that setting argument types to s32 will cause sign to get
lost.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Force the fence to get kicked off, which won't actually wait for its
completion, but any additional work will be put onto a fresh list.
This fixes crashes in teximage-colors --benchmark with too many active
maps.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
As pointed out by Emil, this sometimes hangs, appears to be due to threading
need to rethink how this stuff works for llvmpipe.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Coverity reported that ret could only be 0 or 1, since it
was setting ret = fn() > 0, instead of doing (ret = fn()) > 0.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Previously, we were assuming that everything read/wrote exactly 1 logical
GRF (1 in SIMD8 and 2 in SIMD16). This isn't actually true. In
particular, the PLN instruction reads 2 logical registers in one of the
components. This commit changes post-RA scheduling to use regs_read and
regs_written instead so that we add enough dependencies.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92770
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Now that we support 64 bit immediates in insnCanLoad, we need to swap
64 bit immediate sources too for optimal effect.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Teach insnCanLoad about double immediates, together with the
"Add support for merge-s to the ConstantFolding pass"
This turns the following (nvc0) code:
1: mov u32 $r2 0x00000000 (8)
2: mov u32 $r3 0x3fe00000 (8)
3: add f64 $r0d $r0d $r2d (8)
Into:
1: add f64 $r0d $r0d 0.500000 (8)
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
This allows later passes like LoadPropagation to properly deal with 64
bit immediates.
If the new 64 bit load this introduces does not get optimized away then
split64BitOpPostRA() will split this into 2 instructions again.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Add support for encoding double immediates (up to 20 bits of precision)
into the generated gm107 machine-code.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Add support for encoding double immediates (up to 20 bits of precision)
into the generated nvc0 machine-code.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
For these nir intrinsics, we emit the same code as
nir_intrinsic_memory_barrier:
* nir_intrinsic_memory_barrier_atomic_counter
* nir_intrinsic_memory_barrier_buffer
* nir_intrinsic_memory_barrier_image
We treat these nir intrinsics as no-ops:
* nir_intrinsic_group_memory_barrier
* nir_intrinsic_memory_barrier_shared
v3:
* Add comment for no-op cases (curro)
v4:
* Moving comment to a separate patch authored by curro
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
When these functions are called in glsl-ir, we create a corresponding
nir intrinsic function call.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
When these functions are called in GLSL code, we create an intrinsic
function call:
* groupMemoryBarrier => __intrinsic_group_memory_barrier
* memoryBarrierAtomicCounter => __intrinsic_memory_barrier_atomic_counter
* memoryBarrierBuffer => __intrinsic_memory_barrier_buffer
* memoryBarrierImage => __intrinsic_memory_barrier_image
* memoryBarrierShared => __intrinsic_memory_barrier_shared
v2:
* Consolidate with memoryBarrier function/intrinsic creation (curro)
v3:
* Instead of add_memory_barrier_function, add an intrinsic_name
parameter to _memory_barrier (curro)
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Before it was only possible to convert a NV12 surface to
RGBA or BGRA. This patch uses the same post processing
function, "handleVAProcPipelineParameterBufferType", but
add definitions for RGBX and BGRX.
This patch also makes vlVaQuerySurfaceAttributes more generic
to avoid copy and pasting the same lines.
Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Christian K<C3><B6>nig <christian.koenig@amd.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Useful is one wants to create RGBX or BGRX surfaces.
The infrastructure is such that it required just a
few definitions to support these formats.
Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Christian K<C3><B6>nig <christian.koenig@amd.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
The various cf nodes all get allocated w/ shader as their ralloc_parent,
so lets make this more explicit. Plus couple other corrections/
clarifications.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Commit 342e68dc60 (nvc0: remove BGRA4 format support) removed the
support to fix a WoW trace. However after further experimentation, I was
able to get the blit to work by using a different "fake" format in the
2d engine.
The reason why this worked on nv50 is that nv50 falls back to the 3d
blit path in case either the src or the dst aren't "faithfully"
supported, while nvc0 only does it for the dst format. RG8 is better
supported by the nvc0 2d engine than R16.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
The scalar VS backend has never handled float[] and vec2[] outputs
correctly (my original code was broken). Outputs need to be padded
out to vec4 slots.
In fs_visitor::nir_setup_outputs(), we tried to process each vec4 slot
by looping from 0 to ALIGN(type_size_scalar(type), 4) / 4. However,
this is wrong: type_size_scalar() for a float[2] would return 2, or
for vec2[2] it would return 4. This looked like a single slot, even
though in reality each array element would be stored in separate vec4
slots.
Because of this bug, outputs[] and output_components[] would not get
initialized for the second element's VARYING_SLOT, which meant
emit_urb_writes() would skip writing them. Nothing used those values,
and dead code elimination threw a party.
To fix this, we introduce a new type_size_vec4_times_4() function which
pads array elements correctly, but still counts in scalar components,
generating correct indices in store_output intrinsics.
Normally, varying packing avoids this problem by turning varyings into
vec4s. So this doesn't actually fix any Piglit or dEQP tests today.
However, if varying packing is disabled, things would be broken.
Tessellation shaders can't use varying packing, so this fixes various
tcs-input Piglit tests on a branch of mine.
v2: Shorten the implementation of type_size_4x to a single line (caught
by Connor Abbott), and rename it to type_size_vec4_times_4()
(renaming suggested by Jason Ekstrand). Use type_size_vec4
rather than using type_size_vec4_times_4 and then dividing by 4.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
This will allow gallium drivers to send messages to KHR_debug endpoints
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Do it in the visitor, like we do for other opcodes.
v2: use const, get rid of useless surf_index temporary (Curro)
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Do it in the visitor, like we do for other opcodes.
v2: use const, get rid of useless surf_index temporary (Curro)
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Right now the generator marks direct surfaces as used but leaves marking of
indirect surfaces to the caller. Just make the callers handle marking in both
cases for consistency.
v2: Use const, do not add unnecessary temporary (Curro)
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Right now the generator marks direct surfaces as used but leaves marking of
indirect surfaces to the caller. Just make the callers handle marking in both
cases for consistency.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Right now the generator marks direct surfaces as used but leaves marking of
indirect surfaces to the caller. Just make the callers handle marking in both
cases for consistency.
v2: Use const and remove useless surf_index temporary (Curro)
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Previously there was a problem in i965 where if 16x MSAA is used then
some of the sample positions are exactly on the 0 x or y axis. When
the MSAA copy blit shader interpolates the texture coordinates at
these sample positions it was possible that it would jump to a
neighboring texel due to rounding errors. It is likely that these
positions would be used on 16x MSAA because that is where they are
defined to be in D3D.
To fix that this patch makes it use interpolateAtOffset in the blit
shader whenever 16x MSAA is used and the GL_ARB_gpu_shader5 extension
is available. This forces it to interpolate the texture coordinates at
the pixel center to avoid these problematic positions.
This fixes ext_framebuffer_multisample-unaligned-blit and
ext_framebuffer_multisample-clip-and-scissor-blit with 16x MSAA on
SKL+.
v2: Use interpolateAtOffset instead of interpolateAtSample
v3: Always try to enable GL_ARB_gpu_shader5 in the shader
[Ian Romanick]
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Previously this extension was only enabled when blitting between two
multisampled buffers. However I don't think it does any harm to just
enable it all the time. The ‘enable’ option is used instead of
‘require’ so that the shader will still compile if the extension isn't
available in the cases where it isn't used. This will make the next
patch simpler because it wants to add another optional extension.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
The destination rectangle is now drawn at 4x4 the size and the shader
code to calculate the sample number is adjusted accordingly.
Acked-by: Ben Widawsky <ben@bwidawsk.net>
In order to accomodate 16x MSAA, the starting sample pair index is now
3 bits rather than 2 on SKL+.
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
The gen7_surface_msaa_bits function already returns the right values
for 16 samples but it just needs its assert to be relaxed.
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
When 16x MSAA is used for sampling with texelFetch the compiler needs
to use a different instruction which passes more arguments for the MCS
data. Previously on skl+ it was unconditionally using this new
instruction. However since 16x MSAA is probably going to be pretty
rare, it is probably worthwhile to avoid using this instruction for
the other sample counts. In order to do that this patch adds a new
member to brw_sampler_prog_key_data to track when a sampler refers to
a buffer with 16 samples.
Note that this isn't done for the vec4 backend because it wouldn't
change how many registers it uses.
Acked-by: Ben Widawsky <ben@bwidawsk.net>
In order to support 16x MSAA, skl+ has a wider version of ld2dms that
takes two parameters for the MCS data. The MCS data in the response
still fits in a single register so we just need to ensure we copy both
values rather than just the lower one.
Acked-by: Ben Widawsky <ben@bwidawsk.net>
In order to support 16x MSAA, skl+ has a wider version of ld2dms that
takes two parameters for the MCS data. The MCS data retrieved from the
ld_mcs instruction already returns 4 or 8 registers and is documented
to return zeroes for the mcsh value when the sample count is less than
16.
v2: Use get_lowered_simd_width to fall back to SIMD8 instructions when
the message length would be too long in SIMD16.
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
This is the standard pattern used by the other 3D graphics API.
BDW has slots for these values, but they aren't actually used until
SKL. Even though the documentation for BDW says they must be zero, it
doesn't seem to cause any harm to program them anyway.
The comment above for the 8x sample positions says that the hardware
implements centroid interpolation by picking the centre-most sample
that is inside the primitive. That implies that it might be worthwhile
to pick a pattern that includes 0.5,0.5. However by experimentation
this doesn't seem to actually be the case. With the sample positions
in this patch, if I modify the piglit test below so that it instead
reports the centroid position, it reports 0.492188,0.421875 which
doesn't match any of the positions. If I modify the sample positions
so that they include one at exactly 0.5,0.5 it doesn't help and it
reports another position which is even further from the center for
some reason.
arb_gpu_shader5-interpolateAtSample-different
Kenneth Graunke experimented with some other patterns that have a
higher standard deviation but I think after some discussion it was
decided that it would be better to pick the same pattern as the other
graphics API in case there are games that rely on this pattern.
(Based on a patch by Kenneth Graunke)
Cc: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ben Widawsky <ben at bwidawsk.net>
Equivalent to commit 8ac3b525c but with sel operations. In this case
we select the PredCtrl based on the writemask.
This patch helps on cases like this:
1: cmp.l.f0.0 vgrf40.0.x:F, vgrf0.zzzz:F, vgrf7.xxxx:F
2: cmp.nz.f0.0 null:D, vgrf40.xxxx:D, 0D
3: (+f0.0) sel vgrf41.0.x:UD, vgrf6.xxxx:UD, vgrf5.xxxx:UD
In this case, cmod propagation can't optimize instruction #2, because
instructions #1 and #2 have different writemasks, and we can't update
directly instruction #2 writemask because our code thinks that sel at
instruction #3 reads all four channels of the flag, when it actually
only reads .x.
So, with this patch, the previous case becames this:
1: cmp.l.f0.0 vgrf40.0.x:F, vgrf0.zzzz:F, vgrf7.xxxx:F
2: cmp.nz.f0.0 null:D, vgrf40.xxxx:D, 0D
3: (+f0.0.x) sel vgrf41.0.x:UD, vgrf6.xxxx:UD, vgrf5.xxxx:UD
Now only the x channel of the flag is used, allowing dead code
eliminate to update the writemask at the second instruction:
1: cmp.l.f0.0 vgrf40.0.x:F, vgrf0.zzzz:F, vgrf7.xxxx:F
2: cmp.nz.f0.0 null.x:D, vgrf40.xxxx:D, 0D
3: (+f0.0.x) sel vgrf41.0.x:UD, vgrf6.xxxx:UD, vgrf5.xxxx:UD
So now cmod propagation can simplify out #2:
1: cmp.l.f0.0 vgrf40.0.x:F, attr18.wwww:F, vgrf7.xxxx:F
2: (+f0.0.x) sel vgrf41.0.x:UD, vgrf6.xxxx:UD, vgrf5.xxxx:UD
Shader-db numbers:
total instructions in shared programs: 6235835 -> 6228008 (-0.13%)
instructions in affected programs: 219850 -> 212023 (-3.56%)
total loops in shared programs: 1979 -> 1979 (0.00%)
helped: 1192
HURT: 0
We also have the "reserved for kick" space available. Some of my earlier
changes can probably be removed, but this is a quick fix for some of the
rarer fallout.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: <mesa-stable@lists.freedesktop.org>
This greatly increases the pressure you can put on the driver before
create fails. Ultimately we need to let the kernel take control of
our cached BOs and just take them from us (and other clients)
directly, but this is a very easy patch for the moment.
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
I thought that aliased functions didn't need to be added, but that might
only be if the function aliases something in the same {desktop,ES}
space. Resolves the dispatch sanity test failure.
Fixes: 13b19aa81 (mesa: expose support for GL_EXT_buffer_storage)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92824
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Very long line loops which spanned 3 or more vertex buffers were not
handled correctly and could result in stray lines.
The piglit lineloop test draws 10000 vertices by default, and is not
long enough to trigger this. Even 'lineloop -count 100000' doesn't
trigger the bug.
For future reference, the issue can be reproduced by changing Mesa's
VBO_VERT_BUFFER_SIZE to 4096 and changing the piglit lineloop test to
use glVertex2f(), draw 3 loops instead of 1, and specifying -count
1023.
Acked-by: Sinclair Yeh <syeh@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
When we emulate XOR logicop mode with blend-subtract, we need to ensure
that the fragment shader always emits white. We had this implemented
for VGPU9, but not VGPU10.
VMware bug 1545492.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
This actually stored the values as 8bit linear values in the cache,
then did another srgb->linear conversion...
We don't want to do the former (decoding 8bit srgb values to 8bit linear
completely defeats the purpose of srgb in the first place), so just decode
to 8bit srgb.
Fixes piglit.spec.ext_texture_srgb.texwrap formats-s3tc tests.
There is nothing wrong with the code today, but as one modifies the code it
turns out to be not too difficult to mess up the code, and this easy assertion
should catch such driver implementation failures quickly.
Cc: Kristian Høgsberg <krh@bitplanet.net>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Chad Versace <chad.versace@intel.com>
Reviewed-by: Neil Roberts <neil@linux.intel.com>
This extension requires ES 3.1 since it relies on glMemoryBarrier.
For testing purposes I temporarily moved glMemoryBarrier to be an ES 3.0
function.
This has been tested with the piglit in the ML and the Dolphin emulator.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
V3: clamp array index to the correct size (the size of the current array
rather than the inner array) Francisco Jerez.
V2: avoid useless zero-initialization and addition for the first AoA level,
avoid redundant temporary, make use of type_size_scalar(), rename aoa_size
to element_size, assign the indirect indexing temporary directly to
image.reladdr, and replace while loop with a for loop. All suggested
by Francisco Jerez.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
compressed textures are very slow because decoding is rather complex
(and because there's no jit code code to decode them too for non-technical
reasons).
Thus, add some texture cache which holds a couple of decoded blocks.
Right now this handles only s3tc format albeit it could be extended to work
with other formats rather trivially as long as the result of decode fits into
32bit per texel (ideally, rgtc actually would decode to more than 8 bits
per channel, but even then making it work for it shouldn't be too difficult).
This can improve performance noticeably but don't expect wonders (uncompressed
is unsurprisingly still faster). It's also possible it might be slower in
some cases (using nearest filtering for example or if there's otherwise not
many cache hits, the cache is only direct mapped which isn't great).
Also, actual decode of a block relies on util code, thus even though always
full blocks are decoded it is done texel by texel - this could obviously
benefit greatly from simd-optimized code decoding full blocks at once...
Note the cache is per (raster) thread, and currently only used for fragment
shaders.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
There are currently two methods in llvmpipe code to calculate coeffs to
be used as inputs for the fragment shader. The two methods use slightly
different ways to do the floating point calculations and thus produce
slightly different results.
The decision which method to use is determined by the size of the vector
that is used by the platform.
For vectors with size of more than 128bit, a single-step method is used,
in which coeffs_init_simple() + attribs_update_simple() are called.
For vectors with size of 128bit or less, a two-step method is used, in
which coeffs_init() + attribs_update() are called.
This causes some piglit tests (clip-distance-bulk-copy,
interface-vs-unnamed-to-fs-unnamed) to fail when using platforms with
128bit vectors (such as ppc64le or x86-64 without AVX).
This patch makes platforms with 128bit vectors use the single-step
method (aka "simple" method) instead of the two-step method.
This would make the resulting coeffs identical between more platforms,
make sure the piglit tests passes, and make debugging and maintainability
a bit easier as the generated LLVM IR will be the same for more platforms.
The performance impact is negligible for x86-64 without AVX, and
basically non-existent for ppc64le, as it can be seen from the following
benchmarking results:
- glxspheres, on ppc64le:
- original code: 4.892745317 frames/sec 5.460303857 Mpixels/sec
- with the patch: 4.932083873 frames/sec 5.504205571 Mpixels/sec
- Additional 0.8% performance boost
- glxspheres, on x86-64 without AVX:
- original code: 20.16418809 frames/sec 22.50323395 Mpixels/sec
- with the patch: 20.31328989 frames/sec 22.66963152 Mpixels/sec
- Additional 0.74% performance boost
- glmark2, on ppc64le:
- original code: score of 58
- with my change: score of 57
- glmark2, on x86-64 without AVX:
- original code: score of 175
- with the patch: score of 167
- Impact of of -4.5% on performance
- OpenArena, on ppc64le:
- original code: 3398 frames 1719.0 seconds 2.0 fps
255.0/505.9/2773.0/0.0 ms
- with the patch: 3398 frames 1690.4 seconds 2.0 fps
241.0/497.5/2563.0/0.2 ms
- 29 seconds faster with the patch, which is about 2%
- OpenArena, on x86-64 without AVX:
- original code: 3398 frames 239.6 seconds 14.2 fps
38.0/70.5/719.0/14.6 ms
- with the patch: 3398 frames 244.4 seconds 13.9 fps
38.0/71.9/697.0/14.3 ms
- 0.3 fps slower with the patch (about 2%)
Additional details can be found at:
http://lists.freedesktop.org/archives/mesa-dev/2015-October/098635.html
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
pipe->flush never returned SDMA fences. This fixes it.
This is only an issue on amdgpu where fences can signal out of order.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
This is hidden behind INTEL_SCALAR_GS=1 for now, as we don't yet support
instanced geometry shaders, and Orbital Explorer's shader spills like
crazy. But the infrastructure is in place, and it's largely working.
v2: Lots of rebasing.
v3: (feedback from Kristian Høgsberg)
- Handle stride and subreg_offset correctly for ATTRs; use a helper.
- Fix missing emit_shader_time_end() call.
- Delete dead code after early EOT in static vertex case to avoid
tripping asserts in emit_shader_time_end().
- Use proper D/UD type in intexp2().
- Fix "EndPrimitve" and "to that" typos.
- Assert that invocations == 1 so we know this is missing.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
We really ought to compute the VUE map at link time and stash it, rather
than recomputing it here, but with the mess of program structures I
wasn't sure where to put it. We can improve that later.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Jason reworked this so it isn't simply ST_GS anymore...it's either -1
(not enabled) or an actual offset.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
On future generation platforms the color clear value is stored elsewhere in the
surface state. By extracting this logic, we can cleanly implement the difference
in an upcoming patch.
Should have no functional impact.
v2: Move hunk from the next patch into this patch (Matt)
Whitespace fix (Ben)
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Neil Roberts <neil@linux.intel.com>
The allocate_surface_state already zeroes out the surface state, and doing it
later in the function is destructive for what we want to accomplish when we
split out support for gen9 fast clears (next patch).
NOTE: Only dword 12 actually needed to be fixed, but it seemed more consistent
to remove the other instances as well. I can make an argument both ways (open
coding it, vs. not). I can rework the next patch if requires.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Chad Versace <chad.versace@intel.com>
Reviewed-by: Neil Roberts <neil@linux.intel.com>
To get the size (in bytes) of a compute parameter, clover first calls
get_compute_param() with a NULL data pointer. The RET() macro is based
on nv50.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
A few new PCI ids are added here, and one is removed (0x190B) because it no
longer seems to exist anywhere.
v2-4:
Only use ascii characters (Ilia)
0x1921 is no longer marked as f
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com>
Like other gen8+ hardware, the hardware automatically scales up thread counts.
We must be careful about the URB sizes since GT4 adds another slice.
One of the existing PCI IDs is actually mislabeled as GT3. Arguably this is a
real bug since the URB size will be wrong. Because this patch is simply meant to
add the missing IDs, that will be fixed in a later patch.
v2: No longer relevant.
v3: Update the wm thread count to support GT4. The WM thread count is used to
determine the maximum scratch space required. Currently the code always
allocates the maximum amount even though lower GT SKUs require less. The formula
is threads_per_psd * subslices_per_slice * slices
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com>
Note: The OpenGL 4.3 - 4.5 specification language for DispatchCompute
appears to have an error regarding the max allowed values. When adding
the specification citation, we note why the code does not match the
specification language.
v2:
* Updates based on review from Iago
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Cc: Iago Toral Quiroga <itoral@igalia.com>
Cc: Marta Lofstedt <marta.lofstedt@intel.com>
Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com>
There is some discrepancy between the return values for some error
cases for the DispatchComputeIndirect call in the ARB_compute_shader
specification. Regarding the indirect parameter, in one place the
extension spec lists that the error returned for invalid values should
be INVALID_OPERATION, while later it specifies INVALID_VALUE.
The OpenGL 4.3 and OpenGLES 3.1 specifications appear to be consistent
in requiring the INVALID_VALUE error return in this case.
Here we update the code to match the main specifications, and update
the citations use the main specification rather than the extension
specification.
v2:
* Updates based on review from Iago
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Cc: Iago Toral Quiroga <itoral@igalia.com>
Cc: Marta Lofstedt <marta.lofstedt@intel.com>
Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com>
We've made a mistake in calling the Channel Enable bits "writemask",
because they do more than control which channels of the destination are
written -- they actually control which channels are enabled (surprise!
surprise!)
So, if we emit
cmp.z.f0(8) null.xy<1>D g10<4,4,1>.xyzzD g2<0,4,1>.xyzzD
mov(8) g12<1>.xUD 0x00000000UD
(+f0.all4h) mov(8) g12<1>.xUD 0xffffffffUD
where the CMP instruction has only .xy channel enables, it won't write
the .zw channels of the flag register, which are of course read by the
+f0.all4 predicate.
We need to always emit CMP instructions whose flag result might be read
by such a predicate with all channels enabled.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Since introduction of SSBO, UniformStorage contains not just uniforms
but also buffer variables, this needs to be taken in to account when
calculating active uniforms with GL_ACTIVE_UNIFORMS and
GL_ACTIVE_UNIFORM_MAX_LENGTH.
No Piglit regressions.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
gl_active_atomic_buffer contains index to UniformStorage, we need to
calculate resource index for that gl_uniform_storage.
Fixes following CTS tests:
ES31-CTS.program_interface_query.atomic-counters
ES31-CTS.program_interface_query.atomic-counters-one-buffer
No Piglit regressions.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com>
These helpers are ran for same case the same loop. Here joined
their operation so the loop is ran just once. Also fixed
out-of-memory condition here.
v2: Make the loop simpler to read as per Tapani's suggestion
Signed-off-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Tested-by: Tapani Pälli <tapani.palli@intel.com>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com>
[itoral@igalia.com: Reviewed-by for all except the ctx->_Shader change]
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
So I've known this was broken before, cogl has a workaround
for it from what I know, but with the gallium based swrast
drivers BlitFramebuffer from back to front or vice-versa
was pretty broken.
The legacy swrast driver tracks when a front buffer is used
and does the get/put images when it is mapped/unmapped,
so this patch attempts to add the same functionality to the
gallium drivers.
It creates a new context interface to denote when a front
buffer is being created, and passes a private pointer to it,
this pointer is then used to decide on map/unmap if the
contents should be updated from the real frontbuffer using
get/put image.
This is primarily to make gtk's gl code work, the only
thing I've tested so far is the glarea test from
https://github.com/ebassi/glarea-example.git
v2: bump extension version,
check extension version before calling get image. (Ian)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91930
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Wrap some of the 'omg it's getting out of hand' long lines, and
re-indent where things feel off.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Include what you want, rather than relying on a header foo.h N levels
down the include chain, to provide something that you need.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
The only two remaining cases of (struct virgl_resource *) require a
closer look. Either the error checking is missing or the arguments
provided feel wrong.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
The screen already has a pointer to the (base) winsys object.
With the latter of which implemented/sub-classed as either drm or sw
based one, depending on the target.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Strictly speaking virgl_hw.h should reside in the driver folder, as
it describes the hardware. Moving it allows us to nuke the following
strange dependency
winsys/vtest > driver > winsys/drm
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Use the relevant GALLIUM_foo_CFLAGS which has all the requirements
(not to mention VISIBITY_CFLAGS) and keep ../ out of the include
directives.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
The drm/ prefix is required, if using the kernel provided headers. As
most distros don't ship them it and we already depend on libdrm (which
adds the relevant -I flag) just drop the drm/ from the include.
Once a libdrm release with the virtgpu_drm.h header is released, we can
drop our local copy of the file.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
v2:
- Add a few const qualifiers for good measure.
- Drop unneeded retype()s (Matt)
- Convert timestamp to SIMD8/16, as fs_visitor::get_timestamp() returns
SIMD4 (Connor)
v3:
- Remove unneeded temporary + MOV (Connor)
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
We're about to reuse get_timestamp() for the nir_intrinsic_shader_clock.
In the latter the generalisation does not apply, so move the smear()
where needed. This also makes the function analogous to the vec4 one.
v2: Tweak the comment - The caller -> We (Matt, Connor).
v3: More comment tweaks (Connor)
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
v2: Add flags and inline comment/description.
v3: None of the input/outputs are variables
v4: Drop clockARB reference, relate code motion barrier comment wrt
intrinsic flag.
v5: Drop the "thus we can eliminate..." comment (Connor)
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
While we are at it, store the rotate offset for occlusion queries to
nv50_hw_query like on nvc0.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Pierre Moreau <pierre.morrow@free.fr>
Like for nvc0, this will allow to split different types of queries and
to prepare the way for both global performance counters and MP counters.
While we are at it, make use of nv50_query struct instead of pipe_query.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
I.e. implements:
VaAcquireBufferHandle
VaReleaseBufferHandle
for memory of type VA_SURFACE_ATTRIB_MEM_TYPE_DRM_PRIME
And apply relatives change to:
vlVaMapBuffer
vlVaUnMapBuffer
vlVaDestroyBuffer
Implementation inspired from cgit.freedesktop.org/vaapi/intel-driver
Tested with gstreamer-vaapi with nouveau driver.
Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
And apply relatives change to:
vlVaBufferSetNumElements
vlVaCreateBuffer
vlVaMapBuffer
vlVaUnmapBuffer
vlVaDestroyBuffer
vlVaPutImage
It is unfortunate that there is no proper va buffer type and struct
for this. Only possible to use VAImageBufferType which is normally
used for normal user data array.
On of the consequences is that it is only possible VaDeriveImage
is only useful on surfaces backed with contiguous planes.
Implementation inspired from cgit.freedesktop.org/vaapi/intel-driver
Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Add support for VA_PROFILE_NONE and VAEntrypointVideoProc
in the 4 following functions:
vlVaQueryConfigProfiles
vlVaQueryConfigEntrypoints
vlVaCreateConfig
vlVaQueryConfigAttributes
Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Add support for VPP in the following functions:
vlVaCreateContext
vlVaDestroyContext
vlVaBeginPicture
vlVaRenderPicture
vlVaEndPicture
Add support for VAProcFilterNone in:
vlVaQueryVideoProcFilters
vlVaQueryVideoProcFilterCaps
vlVaQueryVideoProcPipelineCaps
Add handleVAProcPipelineParameterBufferType helper.
One application is:
VASurfaceNV12 -> gstvaapipostproc -> VASurfaceRGBA
Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Inspired from http://cgit.freedesktop.org/vaapi/intel-driver/
especially src/i965_drv_video.c::i965_CreateSurfaces2.
This patch is mainly to support gstreamer-vaapi and tools that uses
this newer libva API. The first advantage of using VaCreateSurfaces2
over existing VaCreateSurfaces, is that it is possible to select which
the pixel format for the surface. Indeed with the simple VaCreateSurfaces
function it is only possible to create a NV12 surface. It can be useful
to create a RGBA surface to use with video post processing.
The avaible pixel formats can be query with VaQuerySurfaceAttributes.
Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
If formats are not the same vlVaPutImage re-creates the video
buffer with the right format. But if the creation of this new
video buffer fails then the surface looses its current buffer.
Let's just destroy the previous buffer on success.
Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Added PIPE_VIDEO_CHROMA_FORMAT_NONE in p_format.h
and return it by default in ChromaToPipe.
Renamed YCbCrToPipe to VaFourccToPipeFormat because it now
contains RGB.
Implemented PipeFormatToVaFourcc which will be used later in
VlVaDeriveImage.
Note that gstreamer-vaapi check all the VAImageFormat fields.
Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Commit 4565b6f did not update the basename match's check for
the case that string would exactly match the name of the
variable if the suffix "[0]" were appended to it.
Fixes two dEQP-GLES31 tests:
dEQP-GLES31.functional.program_interface_query.shader_storage_block.resource_list.block_array
dEQP-GLES31.functional.program_interface_query.shader_storage_block.resource_list.block_array_single_element
v2:
- Change the position of rname_has_array_index_zero to avoid an out-of-bounds
read. Reported by Tapani Pälli.
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Previously, we were using some heuristics to try and detect when a write
was about to begin a live range, or when a read was about to end a live
range. We never used the liveness analysis information used by the
register allocator, though, which meant that the scheduler's and the
allocator's ideas of when a live range began and ended were different.
Not only did this make our estimate of the register pressure benefit of
scheduling an instruction wrong in some cases, but it was preventing us
from knowing the actual register pressure when scheduling each
instruction, which we want to have in order to switch to register
pressure scheduling only when the register pressure is too high.
This commit rewrites the register pressure tracking code to use the same
model as our register allocator currently uses. We use the results of
liveness analysis, as well as the compute_payload_ranges() function that
we split out in the last commit. This means that we compute live ranges
twice on each round through the register allocator, although we could
speed it up by only recomputing the ranges and not the live in/live out
sets after scheduling, since we only shuffle around instructions within
a single basic block when we schedule.
Shader-db results on bdw:
total instructions in shared programs: 7130187 -> 7129880 (-0.00%)
instructions in affected programs: 1744 -> 1437 (-17.60%)
helped: 1
HURT: 1
total cycles in shared programs: 172535126 -> 172473226 (-0.04%)
cycles in affected programs: 11338636 -> 11276736 (-0.55%)
helped: 876
HURT: 873
LOST: 8
GAINED: 0
v2: use regs_read() in more places.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
We'll need this for the scheduler too, since it wants to know when the
live ranges of payload registers end in order to model them in our
register pressure calculations.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
The heuristic we're using is rather lame, since it assumes everything is
non-uniform and loops execute 10 times, but it should be enough for
measuring improvements in the scheduler that don't result in a change in
the number of instructions.
v2:
- Switch loops and cycle counts to be compatible with older shader-db.
- Make loop heuristic 10x to match with spilling code.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Before, we would only do scheduling after register allocation if we
spilled, despite the fact that the pre-RA scheduler was only supposed to
be for register pressure and set the latencies of every instruction to
1. This meant that unless we spilled, which we rarely do, then we never
considered instruction latencies at all, and we usually never bothered
to try and hide texture fetch latency. Although a later commit removes
the setting the latency to 1 part, we still want to always run the
post-RA scheduler since it's able to take the false dependencies that
the register allocator creates into account, and it can be more
aggressive than the pre-RA scheduler since it doesn't have to worry
about register pressure at all.
Test master post-ra-sched diff %diff
bench_OglPSBump2 396.730 402.386 5.656 +1.400%
bench_OglPSBump8 244.370 247.591 3.221 +1.300%
bench_OglPSPhong 241.117 242.002 0.885 +0.300%
bench_OglPSPom 59.555 59.725 0.170 +0.200%
bench_OglShMapPcf 86.149 102.346 16.197 +18.800%
bench_OglVSTangent 388.849 395.489 6.640 +1.700%
bench_trex 65.471 65.862 0.390 +0.500%
bench_trexoff 69.562 70.150 0.588 +0.800%
bench_heaven 25.179 25.254 0.074 +0.200%
Reviewed-by: Jason Ekstrand <jasoan.ekstrand@intel.com>
Although write-after-write dependencies have the same latency as
read-after-write dependencies due to how the register scoreboard works,
write-after-read dependencies aren't checked by the EU at all, so
they're purely a constraint on how the scheduler can order the
instructions.
v2: fix accumulator dependencies too.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
The issue time for an instruction is how many cycles it takes to
actually put it into the pipeline. If there's a pipeline stall that
causes the instruction to be delayed, we should first take that into
account to figure out when the instruction would start executing and
*then* add the issue time. The old code had it backwards, and so we
would underestimate the total time whenever we thought there would be a
pipeline stall by up to the issue time of the instruction.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Uses the same technique as for nvc0 of fixups before upload, and
evicting in case of state change. Removes one source of variants kept by
st/mesa.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
These are often useful in debugging, and the writemask (actually
"Channel Enables") determines more than just what goes into the
destination.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
No functional change, since they were both 3, but BRW_IMMEDIATE_VALUE is
the hardware value and IMM was the IR value -- and you can see that
BRW_IMMEDIATE_VALUE was correctly used in the context of this patch.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
No functional change, since they were both 3, but BRW_IMMEDIATE_VALUE is
the hardware value and IMM was the IR value -- and you can see that
BRW_IMMEDIATE_VALUE was correctly used in the context of this patch.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
I've checked with piglit and one tests fails, but it fails
on evergreen as well, so will get fixed later.
Otherwise SB seems to be working fine for geom shaders on my
rv635.
Signed-off-by: Dave Airlie <airlied@redhat.com>
We really weren't taking advantage of vec4_generator being a class.
By adding a "p" parameter to the helper methods, and "prog_data" to
ones which need binding table information, we can convert everything
to static functions.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
The public API for the generator is brw_vec4_generate_code(); nobody
actually needs to use the class. This means we can extend it without
triggering the recompiles associated with altering brw_vec4.h.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
vec4_generator is a class for convenience, but only exports a single
method as its public API. It makes much more sense to just export a
single function.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This patch makes the visitor convert registers to the HW_REG file at the
very end, after register allocation, post-RA scheduling, and dependency
control flagging. After that, everything is in fixed brw_regs.
This simplifies the code generator, as it can just use the hardware
registers rather than having to interpret our abstract files. In
particular, interpreting the UNIFORM file meant reading prog_data
to figure out where push constants are supposed to start.
Having the part of the code that performs register allocation also
translate everything to hardware registers seems sensible.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Some constants (like 1.0 and 0.5) could be inlined as immediate inputs
without using their literal value. The r600_bytecode_special_constants()
function emulates the negative of these constants by using NEG modifier.
However some shaders define -1.0 constant and want to use it as 1.0.
They do so by using ABS modifier. But r600_bytecode_special_constants()
set NEG in addition to ABS. Since NEG modifier have priority over ABS one,
we get -|1.0| as result, instead of |1.0|.
The patch simply prevents the additional switching of NEG when ABS is set.
[According to Ivan Kalvachev, this bug was fond via
https://github.com/iXit/Mesa-3D/issues/126 and
https://github.com/iXit/Mesa-3D/issues/127]
Signed-off-by: Ivan Kalvachev <ikalvachev@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
CC: <mesa-stable@lists.freedesktop.org>
Without the clamping by NumLevels, the state tracker would reallocate the
texture storage (incorrect) and even fail to copy the base level image
after reallocation, leading to the graphical glitch of
https://bugs.freedesktop.org/show_bug.cgi?id=91993 .
A piglit test has been submitted for review as well (subtest of
arb_texture_storage-texture-storage).
v2: also bypass all calls to st_finalize_texture (suggested by Marek Olšák)
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
In OpenGL ES, the COMPRESSED_TEXTURE_FORMATS query returns the set of
supported specific compressed formats. Since ASTC formats fit within
that category, include them in the set and update the
NUM_COMPRESSED_TEXTURE_FORMATS query as well.
This enables GLES2-based ASTC dEQP tests to run. See the Bugzilla for
more info.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92193
Reported-by: Tapani Pälli <tapani.palli@intel.com>
Suggested-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
In agreement with the extension spec and commit
dd0eb00487, filter FXT1 formats to the
desktop GL profiles. Now we no longer advertise such formats as supported
in an ES context and then throw an INVALID_ENUM error when the client
tries to use such formats with CompressedTexImage2D.
Fixes the following 26 dEQP tests:
* dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_invalid_border
* dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_invalid_border_cube_neg_x
* dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_invalid_border_cube_neg_y
* dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_invalid_border_cube_neg_z
* dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_invalid_border_cube_pos_x
* dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_invalid_border_cube_pos_y
* dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_invalid_border_cube_pos_z
* dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_invalid_size
* dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_level_max_cube_pos
* dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_level_max_tex2d
* dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_neg_level_cube
* dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_neg_level_tex2d
* dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_neg_width_height_cube_neg_x
* dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_neg_width_height_cube_neg_y
* dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_neg_width_height_cube_neg_z
* dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_neg_width_height_cube_pos_x
* dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_neg_width_height_cube_pos_y
* dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_neg_width_height_cube_pos_z
* dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_neg_width_height_tex2d
* dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_width_height_max_cube_neg_x
* dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_width_height_max_cube_neg_y
* dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_width_height_max_cube_neg_z
* dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_width_height_max_cube_pos_x
* dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_width_height_max_cube_pos_y
* dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_width_height_max_cube_pos_z
* dEQP-GLES2.functional.negative_api.texture.compressedteximage2d_width_height_max_tex2d
v2. Use _mesa_is_desktop_gl() (Ilia, Ian)
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Not sure if this is actually reachable in practice (to have a complex
copy with MS textures).
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This makes sure that user is still able to query properties about
variables that have gotten removed by opt_dead_builtin_varyings pass.
Fixes following OpenGL ES 3.1 test:
ES31-CTS.program_interface_query.output-layout
No Piglit regressions.
v2: cleanup, drop extra parenthesis (Topi)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com>
This is required to store information about fragdata arrays, currently
these variables get lost and cannot be retrieved later in sensible way
for program interface queries. List will be utilized by next patch.
Patch also modifies opt_dead_builtin_varyings pass to build list when
lowering fragdata arrays. This is identical approach as taken with
packed varyings pass.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com>
From ARB_program_interface_query:
"For the property of BUFFER_DATA_SIZE, then the implementation-dependent
minimum total buffer object size, in basic machine units, required to hold
all active variables associated with an active uniform block, shader
storage block, or atomic counter buffer is written to <params>. If the
final member of an active shader storage block is array with no declared
size, the minimum buffer size is computed assuming the array was declared
as an array with one element."
Fixes the following dEQP-GLES31 tests:
dEQP-GLES31.functional.program_interface_query.shader_storage_block.buffer_data_size.named_block
dEQP-GLES31.functional.program_interface_query.shader_storage_block.buffer_data_size.unnamed_block
dEQP-GLES31.functional.program_interface_query.shader_storage_block.buffer_data_size.block_array
v2:
- Fix comment's indentation and explain that the parser already
checked that unsized array is in last element of a shader
storage block (Iago).
- Add assert (Iago).
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Normally, we could read gl_Layer from bits 26:16 of R0.0. However, the
specification requires that bogus out-of-range 32-bit values written by
previous stages need to appear in the fragment shader as-written.
Instead, we pass in the full 32-bit value from the VUE header as an
extra flat-shaded varying. We have the SF override the value to 0
when the previous stage didn't actually write a value (it's actually
defined to return 0).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Traditionally, we've hardcoded "URB Entry Read Offset" to 1 (which
represents 2 vec4 varying slots) to skip over the 8 DWord VUE header.
In order to support ARB_fragment_layer_viewport, we'll need to read
from that header. This patch adds the basic plumbing necessary to
calculate a value dynamically and hook it up in the SBE packets.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Integer varyings need to be flat qualified - all others were already.
I think we just missed this. Presumably some hardware passes this via
sideband and ignores attribute interpolation, so no one has noticed.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Commit 268008f98c changed unused VUE map
slots to be initialized with BRW_VARYING_SLOT_PAD, not COUNT. I missed
updating this. It also means that commit message was wrong, as some
code *did* rely slots being initialized to COUNT.
This may fix a bug with SSO programs with > 16 FS input varyings.
I think we probably just emitted extra pointless code, but probably
didn't break anything. We might also just have no tests for that.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Avoid deferring building shaders until draw time, should hopefully
reduce any stuttering, as well as enable shader-db style analysis.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Unfortunately flatshading is an all-or-nothing proposition on nvc0,
while GL 3.0 calls for the ability to selectively specify explicit
interpolation parameters on gl_Color/gl_SecondaryColor which would
override the flatshading setting. This allows us to fix up the
interpolation settings after shader generation based on rasterizer
settings.
While we're at it, we can add support for dynamically forcing all
(non-flat) shader inputs to be interpolated per-sample, which allows
st/mesa to not generate variants for these.
Fixes the remaining failing glsl-1.30/execution/interpolation piglits.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
This was introduced in GLSL IR after NIR development had branched.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
nir_intrinsic_load_patch_vertices_in corresponds to gl_PatchVerticesIn,
a special input in both the TCS and TES stages.
nir_intrinsic_load_tess_coord corresponds to gl_TessCoord, a special
tessellation evaluation shader input.
nir_intrinsic_load_tess_level_outer/inner correspond to the
gl_TessLevelOuter[] and gl_TessLevelInner[] evaluation shader inputs,
which we treat as system values because they're stored specially.
(These intrinsics are only for the TES - the TCS uses output variables.)
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Consider the case of two nearly identical GLSL fragment shaders:
out vec4 color;
void main() { color = vec4(1); }
and
layout(early_fragment_tests) in;
out vec4 color;
void main() { color = vec4(1); }
These shaders compile to the exact same assembly, but have distinct
values for brw_wm_prog_data::early_fragment_tests.
Since these are two independent GLSL shaders, they have different
program keys - notably, brw_wm_prog_key::program_string_id differs.
When uploading the second, brw_upload_cache will find an existing copy
of the assembly in the cache BO, which means matching_data will be
non-NULL. Although we create a second cache item (with the new key
and prog_data), we set item->offset to the existing copy and avoid
re-uploading duplicate assembly.
However, brw_search_cache() would only flag BRW_NEW_*_PROG_DATA if
item->offset differed from the supplied offset. With reuse, both
programs have the same offset, but prog_data changed. We have to
flag it, but failed to.
To fix this, we simply need to check if the aux (prog_data) pointer
changed. If either the assembly or the prog_data differs, flag it.
This fixes a regression since 1bba29ed40,
where Topi fixed brw_upload_cache() to actually reuse identical
assembly. Prior to that, reuse basically never happened due to bugs.
Unfortunately, this code apparently wasn't prepared to handle reuse!
Fixes GPU hangs in Dolphin on Broadwell.
Huge thanks to Pierre Bourdon and Ilia Mirkin for debugging this
and helping track down the real issue.
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92623
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Tested-by: Pierre Bourdon <delroth@gmail.com>
With just the right sequence of per-vertex commands and state changes,
it's possible for this assertion to fail (such as with viewperf11's
lightwave-06-1 test). Instead of asserting, return 0 so that the
caller knows the VBO is full and needs to be flushed.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
which is where a block in src maps to a pixel in dst and vice versa.
e.g. DXT1 <-> R32G32_UINT
DXT5 <-> R32G32B32A32_UINT
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
The variable is already of type src_reg. creating a new instance only to
destroy it seems unnecessary.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
There is only one function that can be called, which is well known at
compilation time.
The abstraction used here seems unnecessary, so let's use a direct call
to brw_stage_prog_data_free() when appropriate, cut down the size of
struct brw_cache.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The maximum number of active variables for shader storage blocks should
take into account the specific rules for shader storage blocks, i.e. for
an active shader storage block member declared as an array, an entry
will be generated only for the first array element, regardless of its type.
Fixes 3 dEQP-GLES31.functional.* tests:
dEQP-GLES31.functional.program_interface_query.shader_storage_block.active_variables.named_block
dEQP-GLES31.functional.program_interface_query.shader_storage_block.active_variables.unnamed_block
dEQP-GLES31.functional.program_interface_query.shader_storage_block.active_variables.block_array
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
From OpenGL ES 3.1 specification, section 10.5:
"DrawArraysIndirect requires that all data sourced for the
command, including the DrawArraysIndirectCommand
structure, be in buffer objects, and may not be called when
the default vertex array object is bound."
Signed-off-by: Marta Lofstedt <marta.lofstedt@linux.intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
For debugging, bug reports, etc.
This is not in the radeonsi directory, but it is about radeonsi.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
From OpenGL 4.4 specification, section 10.4 and
Open GL Es 3.1 section 10.5:
"An INVALID_VALUE error is generated if indirect is not a multiple
of the size, in basic machine units, of uint."
However, the current code follow the ARB_draw_indirect:
https://www.opengl.org/registry/specs/ARB/draw_indirect.txt
"INVALID_OPERATION is generated by DrawArraysIndirect and
DrawElementsIndirect if commands source data beyond the end
of a buffer object or if <indirect> is not word aligned."
V2: After discussions on the list, it was suggested to
only keep the INVALID_VALUE error.
Signed-off-by: Marta Lofstedt <marta.lofstedt@linux.intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
From ARB_program_query_interface spec:
"uint GetProgramResourceIndex(uint program, enum programInterface,
const char *name);
[...]
If <name> exactly matches the name string of one of the active resources
for <programInterface>, the index of the matched resource is returned.
Additionally, if <name> would exactly match the name string of an active
resource if "[0]" were appended to <name>, the index of the matched
resource is returned. [...]"
"A string provided to GetProgramResourceLocation or
GetProgramResourceLocationIndex is considered to match an active variable
if:
[...]
* if the string identifies the base name of an active array, where the
string would exactly match the name of the variable if the suffix
"[0]" were appended to the string;
[...]
"
Fixes the following two dEQP-GLES31 tests:
dEQP-GLES31.functional.program_interface_query.shader_storage_block.resource_list.block_array
dEQP-GLES31.functional.program_interface_query.shader_storage_block.resource_list.block_array_single_element
v2:
- Add AoA support (Timothy)
- Apply it too for GetUniformLocation(), GetUniformName() and others
because ARB_program_interface_query says that they are equivalent
to GetProgramResourceLocation() and GetProgramResourceName() (Tapani)
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
This paves the way for copy propagating our unpacks. We end up with a
small change on shader-db:
total instructions in shared programs: 89390 -> 89251 (-0.16%)
instructions in affected programs: 19041 -> 18902 (-0.73%)
which appears to be because we no longer convert MOVs for an FMAX dst,
r4.unpack, r4.unpack (instead of the previous MOV dst, r4.unpack), and
this ends up with a slightly better schedule.
At one point I thought packs and unpacks were in the same field of the
instruction. They aren't. These instructions therefore never cause a
pack.
total instructions in shared programs: 89472 -> 89390 (-0.09%)
instructions in affected programs: 15261 -> 15179 (-0.54%)
When a TCS is present, the TES input gl_PatchVerticesIn is actually a
constant - it's simply the # of output vertices specified by the TCS
layout qualifiers. So, we can replace the system value with a constant,
which may allow further optimization, and will likely be more efficient.
If the TCS is absent, we can't do this optimization.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Previously we could create a renderbuffer with format
MESA_FORMAT_R8G8B8A8_UNORM, convert that renderbuffer to an EGLImage,
then FAIL to convert the EGLImage back to a renderbuffer because
reasons. Just use the same check in
intel_image_target_renderbuffer_storage that brw_render_target_supported
uses.
There are more checks in brw_render_target_supported, but I don't think
they are necessary here. A different approach would be to refactor
brw_render_target_supported to take rb->Format and rb->NumSamples as
parameters (instead of a gl_renderbuffer) and use the new function here.
Fixes:
ES2-CTS.gtf.GL2ExtensionTests.egl_image.egl_image
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Tested-by: Tapani Pälli <tapani.palli@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92476
Cc: "10.3 10.4 10.5 10.6 11.0" <mesa-stable@lists.freedesktop.org>
This is more optimal as it means we no longer have to upload the same set
of ABO surfaces to all stages in the program.
This also fixes a bug where since commit c0cd5b var->data.binding was
being used as a replacement for atomic buffer index, but they don't have
to be the same value they just happened to end up the same when binding is 0.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Cc: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: Alejandro Piñeiro <apinheiro@igalia.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90175
f16c intrinsic can only be emitted when AVX is used. So when we disable AVX
due to forcing 128bit vectors we must not use this intrinsic (depending on
llvm version, this worked previously because llvm used AVX even when we didn't
tell it to, however I've seen this fail with llvm 3.3 since
718249843b which seems to have the side effect
of disabling avx in llvm albeit it only touches sse flags really, but
with ea421e919a it's now really disabled).
Albeit being able to use AVX with 128bit vectors also would have its uses, the
code as is really was meant to emulate jit code creation for less capable cpus.
v2: add some (ifdefed out) missing de-featuring options for simulating
less capable cpus.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Patch adds additional check to make sure we don't return locations for
structures or arrays of structures.
From page 79 of the OpenGL 4.2 spec:
"A valid name cannot be a structure, an array of structures, or any
portion of a single vector or a matrix."
v2: use without-array() to simplify code (Timothy)
No Piglit or CTS regressions observed.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
One instruction instead of four, and it turns out you do this a lot for
the Over operator.
total uniforms in shared programs: 32168 -> 32087 (-0.25%)
uniforms in affected programs: 318 -> 237 (-25.47%)
total instructions in shared programs: 89830 -> 89472 (-0.40%)
instructions in affected programs: 6434 -> 6076 (-5.56%)
I left the function to obtain the revision because it is, and will continue to
be useful in the future. I'd rather not have to dig it up every time we need it.
Comments left at the implementation to say as much.
This was accidentally left here when I moved the early platform support:
commit 28ed1e08e8
Author: Ben Widawsky <benjamin.widawsky@intel.com>
Date: Fri Aug 7 13:58:37 2015 -0700
i965/skl: Remove early platform support
Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
According to piglit/xonotic/neverball/stc, blend/rasterize/zsa state
will always be bound (never null). And the null checks were in-
consistent anyways, so remove them.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Uses the DCC buffer instead of the CMASK buffer. The ELIMINATE_FAST_CLEAR
still works. Furthermore, with DCC compression we can directly clear
to a limited set of colors such that we do not need a postprocessing step.
v2 Marek: check dcc_buffer && dirty_level_mask in set_sampler_view
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Can't see why anyone would ever want to use this, but it was clearly broken.
This fixes the piglit texwrap offset test using this combination.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
When using nearest filtering and clamp / clamp to edge wrapping results could
be wrong for negative offsets. Fix this by adding the offset before doing
the conversion to int coords (could also use floor instead of trunc int
conversion but probably more complex on "typical" cpu).
This fixes the piglit texwrap offset failures with this filter/wrap combo
(which only leaves the linear/mirror repeat combination broken).
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
For vertex/geometry shader sampling, this is the same as for llvmpipe - just
use the original resource target.
For fragment shader sampling though (which does not use first-layer based mip
offsets) adjust the sampling code to use first_layer in the non-array cases.
While here also fix up some code which looked wrong wrt buffer texel fetch
(no piglit change).
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Just need to use resource target not view target when calculating
first-layer based mip offsets. (This is a gl specific problem since
d3d10 does not distinguish between non-array and array resources neither
at the resource nor view level, only at the shader level.)
Fixes new piglit arb_texture_view sampling-2d-array-as-2d-layer test.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This patch was originally written before stoney support
was merged. Add stoney.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
As the alignment requirements can be 32 KiB or more, also adding
an aligned buffer creation function.
DCC is disabled for textures that can be shared as sharing the
DCC buffers has not been implemented yet.
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Out of 7063 shaders from my shader-db:
- 6564 (93%) shaders don't have any state parameters.
- 347 (5%) shaders have 1 state parameter for WPOS lowering.
- The remaining 2% have more state parameters, usually matrices.
Reviewed-by: Brian Paul <brianp@vmware.com>
The edgeflag comes in as ubyte with glEdgeFlagPointer but as float with
plain immediate glEdgeFlag. Avoid reading bytes that weren't meant for
the edgeflag in the pointer case.
Fixes intermittent failures with gl-2.0-edgeflag piglit (and valgrind
complaints about reading uninitialized memory).
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
We can't do this all the time, because you want blending to be done in
linear space, and sRGB would lose too much precision being done in 4x8.
The win on instructions is pretty huge when you can, though.
total uniforms in shared programs: 32065 -> 32168 (0.32%)
uniforms in affected programs: 327 -> 430 (31.50%)
total instructions in shared programs: 92644 -> 89830 (-3.04%)
instructions in affected programs: 15580 -> 12766 (-18.06%)
Improves openarena performance at 1920x1080 from 10.7fps to 11.2fps.
This corresponds to instructions used on vc4 for its blending inside of
shaders. I've seen these opcodes on other architectures before, but I
think it's the first time these are needed in Mesa.
v2: Rename to 'u' instead of 'i', since they're all 'u'norm (from review
by jekstrand)
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Write groups of enabled components together.
Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
Now that we don't read each component one-by-one, we don't need the
temoprary vgrf for the offset. More importantly, this register was type
UD while the nir source was type D. This broke copy propagation and left
a redundant MOV in the generated code.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
The scalar destination registers break copy propagation. Instead compute
the results to a regular register and then reference a component when we
later use the result as a source.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
The emit_untyped_read and emit_untyped_write helpers already uniformize
the surface index argument. No need to do it before calling them.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
The destination for SHADER_OPCODE_FIND_LIVE_CHANNEL is always a UD
register. When we replace the opcode with a MOV, make sure we use a UD
immediate 0 so copy propagation doesn't bail because of non-matching
types.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
Instead of looping through single-component reads, read all components
in one go.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
We always set the mask to 0xffff, which is what it defaults to when no
header is present. Let's drop the header instead.
v2: Only remove header for untyped reads. Typed reads always need the
header.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
Commit f17b78 added an alternative reads_flag(channel) that returned
if the instruction was reading a specific channel flag. By mistake it
only took into account the predicate, but when the opcode is
VS_OPCODE_UNPACK_FLAGS_SIMD4X2 there isn't any predicate, but the flag
are used.
That mistake caused some regressions on old hw. More information on
this bug:
https://bugs.freedesktop.org/show_bug.cgi?id=92621
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This is a little bit like the mprotect-based fencing I've experimented
with, but it's simple and low overhead. The downside is that only catches
writes, not reads.
It didn't catch any bad writes on a current piglit run, but may be useful
in the future.
Move scratch_size out of ilo_state_shader_kernel_info and
ilo_state_compute_interface_info. A scratch space is shared by all
kernels/interfaces. Update builder to emit relocs for scratch bos.
Location has never been able to be a negative value because it has
always been validated in the parser.
Also the linker doesn't check for negatives like the comment claims.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
virgl/vtest is a swrast driver that allows the
virgl acceleration to be tested without having
a virtual machine.
The backend has a unix socket server that
this connects to.
This is run by setting
LIBGL_ALWAYS_SOFTWARE=y
GALLIUM_DRIVER=virpipe
In this mode all renderering is sent over
a socket to the remote renderer, and the
results are readback and copies to the screen
using drisw. This works well enough to develop
new features and to help debug.
Signed-off-by: Dave Airlie <airlied@redhat.com>
virgl is the 3D acceleration backend for the
virtio-gpu shipping with qemu.
The 3D acceleration is designed around gallium
and TGSI as the virtualisation layer. The backend
renderer translates the virgl interface into
OpenGL currently.
This is the initial import of the driver to mesa.
The kernel driver portions are lined up for drm-next.
Currently this driver supports up to GL3.3 and some
misc extensions if the host driver exposes it. It is
planned to iterate the virgl API to new GL levels
as mesa host drivers gain features.
v2: fix resource tracking across flushes to avoid
->bind hack in mapping.
consolidate mapping and waiting code for transfers.
use u_range for dirt tracking.
handle larger shaders in protocol.
include virtgpu_drm.h in mesa for now.
add translation layer for gallium tgsi to virgl tgsi.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This is used to detect error in virgl if we overflow the shader
dumping buffers.
v2: return a bool.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This adds support to the parser to accept hex values as floats,
and then adds support to the dumper to allow the user to select
to dump float as 32-bit hex numbers.
This is required to get accurate values for virgl use of TGSI.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
On ultra high resolution modes, the preemptive flush flag can be
set midway through command submission, a condition that cannot be
recovered from a flush-retry, causing rendering artifacts.
This patch prevents a preemtive_flush until a draw has been
emitted.
Signed-off-by: Sinclair Yeh <syeh@vmware.com>
Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
The svga device doesn't directly support quads, quad strips or polygons
so we have to convert those types to indexed triangle lists. But we
can sometimes avoid that if we're drawing flat/constant-colored prims
and we don't have to worry about provoking vertex.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Provoking vertex comes into play when doing flat shading. But if we know
that all fragments in a primitive are the same color, the provoking vertex
doesn't matter. Check for that case and use whichever provoking vertex
convention is supported by the device.
This avoids generating an index buffer to do the PV conversion.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Examine the fragment shader to try to detect TGSI shaders which use
"MOV OUT[0], CONST[i]" to write a constant value for the fragment color.
In this case, all fragments will have the same color (unless blending is
enabled).
This is a common case for OpenGL code such as: glColor(), glBegin(),
glVertex(), ..., glEnd() when lighting/fog/etc are disabled. In this
case, the Mesa/gallium state tracker actually generates a simple
"MOV OUT[0], CONST[i]" fragment shader.
This will be used by the next commit to avoid provoking vertex conversion
(creating/rewriting an index buffer) when drawing flat-shaded primitives.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
The st_RasterPos() function goes to great pains to implement the
rasterpos transformation. It basically uses gallium's draw module to
execute the vertex shader to draw a point, then capture that point's
attributes.
But glRasterPos isn't typically used with a vertex shader so we can
usually use the old/fixed-function implementation which is a lot simpler
and faster.
This can add up for legacy apps that make a lot of calls to glRasterPos.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
We'll remove it from the tnl module next. By lifting this code into core
Mesa we can use it from the gallium state tracker.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Instead of calling memcpy() 'n' times, we can do it all at once since
the source and dest regions are all contiguous.
Reviewed-by: Matt Turner <mattst88@gmail.com>
The section for UVD 2 and older was not updated
when HEVC support was added. Reported by Kano
on irc.
v2: integrate the UVD2 and older checks into the
main switch statement.
v3: handle encode checking as well. Encode is
already checked in the top case statement, so
drop encode checks in the lower case statement.
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: mesa-stable@lists.freedesktop.org
v2: externalize pred_ctrl_align16 from brw_disasm.c instead of adding
a copy on brw_vec4.c, as suggested by Matt Turner
Reviewed-by: Matt Turner <mattst88@gmail.com>
The complete way to do this would be parse INTEL_DEBUG and
print the output if DEBUG_VS (or a new one) is present
(see intel_debug.c).
But that seems like an overkill for the unit tests, that
after all, the most common use case is being run when
calling make check.
v2: use the same idea for the fs counterpart too, as suggested by
Matt Turner
Reviewed-by: Matt Turner <mattst88@gmail.com>
This include the same tests coming from test_fs_cmod_propagation, (non
vector glsl types included) plus some new with vec4 types, inspired on
the regressions found while the optimization was a work in progress.
Additionally, the check of number of instructions after the
optimization was changed from EXPECT_EQ to ASSERT_EQ. This was done to
avoid a crash on failing tests that expected no optimization, as after
checking the number of instructions, there were some checks related to
this last instruction opcode/conditional mod.
v2: update tests after Matt Turner's review of the optimization pass
v3: tweaks on the tests (mostly on the comments), after Matt Turner's
review
Reviewed-by: Matt Turner <mattst88@gmail.com>
vec4 port of fs_cmod_propagation.
Shader-db results (no vec4 grepping):
total instructions in shared programs: 6240413 -> 6235841 (-0.07%)
instructions in affected programs: 401933 -> 397361 (-1.14%)
total loops in shared programs: 1979 -> 1979 (0.00%)
helped: 2265
HURT: 0
v2: remove extra space and combine two if blocks, as suggested by
Matt Turner
v3: add condition check to bail out if current inst and inst being
scanned has different writemask, as pointed by Matt Turner
v3: updated shader-db numbers
v4: remove block from foreach_inst_in_block_*_starting_from after
commit 801f151917
Reviewed-by: Matt Turner <mattst88@gmail.com>
vec4_live_variables tracks now each flag channel independently, so
vec4_dead_code_eliminate can update the writemask of null registers,
based on which component are alive at the moment. This would allow
vec4_cmod_propagation to optimize out several movs involving null
registers.
v2: added support to track each flag channel independently at vec4
live_variables, as v1 assumed that it was already doing it, as
pointed by Francisco Jerez
v3: general cleaningn after Matt Turner's review
Reviewed-by: Matt Turner <mattst88@gmail.com>
Gen8+ lifted the register region restriction that an instruction whose
destination spans two registers must have sources that also span two
registers.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
The AND and SHR produce a scalar value that we had been replicating
across $dispatch_width channels. The immediate MOV produces only four
useful channels of data.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Not a functional difference, but register is loaded with a signed
immediate (V) and added to a signed type (D) producing a signed result
(D).
Also change the type of g0 to allow for compaction.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
We implement textureQueryLevels (which takes no arguments, save the
sampler) using the resinfo message (which takes an argument of LOD).
Without initializing it, we'd generate a MOV from the null register to
load the LOD argument.
Essentially the same logic applies to texture. A vertex shader cannot
compute derivatives and so cannot produce an LOD, so TXL with an LOD of
0.0 is used.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
An untyped surface read is volatile because it might be affected by a
write.
In the ES31-CTS.compute_shader.resources-max test, two back to back
read/modify/writes of an SSBO variable looked something like this:
r1 = untyped_surface_read(ssbo_float)
r2 = r1 + 1
untyped_surface_write(ssbo_float, r2)
r3 = untyped_surface_read(ssbo_float)
r4 = r3 + 1
untyped_surface_write(ssbo_float, r4)
And after CSE, we had:
r1 = untyped_surface_read(ssbo_float)
r2 = r1 + 1
untyped_surface_write(ssbo_float, r2)
r4 = r1 + 1
untyped_surface_write(ssbo_float, r4)
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
v2: remove useless source_stencil_to_render_target (Ken)
Squash in the actual packing function, which also got to
v2:
Move the definition of the OPCODE outside of FB_WRITE opcodes (Matt)
Reorder the regioning to be in VWH order (Matt)
Don't retype src in the backend, just assert instead (Matt)
Rename the debug prints to something better (Matt)
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Gen9 adds the ability to write out a stencil value, so we need to expand the
virtual payload by one. Abstracting this now makes that change easier to read.
I was admittedly confused early on about some of the hardcoding. If people
believe the resulting code is inferior, I am not super attached to the patch.
v2:
Remove explicit numbering from the enumeration (Matt).
Use a real naming scheme, and reference it in the opcode definition (Curro)
Add a missed hardcoded logical position in get_lowered_simd_width (Ben)
Add an assertion to make sure the component numbering is correct (Ben)
Cc: Matt Turner <mattst88@gmail.com>
Cc: Francisco Jerez <currojerez@riseup.net>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Before the change "tgsi/scan: use properties for clip/cull distance
writemasks", the tgsi_shader_info::num_written_clipdistance field
was a multiple of four, now it's an accurate count. In the svga
driver, we need a minor change to the loop test.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
It's stored in bits 31:27 of g1 (along with the URB handles).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Unlike the vs/wm structs, brw_gs_compile is actually useful: it contains
the input VUE map and information about the control data headers.
Passing this in allows us to share that code in brw_gs.c, and calculate
them before deciding on vec4 vs. scalar mode, as it's independent of
that choice.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
This patch introduces a brw->scalar_gs flag, similar to brw->scalar_vs,
which controls whether or not to use SIMD8 geometry shaders.
For now, we control it via a new environment variable, INTEL_SCALAR_GS.
This provides a convenient way to try it out.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Geometry shaders have additional header data at the beginning of their
output URB entries. Shaders that use EndPrimitive() or multiple streams
have a control data header; shaders with a dynamic vertex count have an
additional vec4 slot to hold the 32-bit vertex count (and 96 bits of
padding).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
The GS will emit a bunch of vertices, and we don't want to do an EOT
prematurely. We'll emit GS_OPCODE_THREAD_END when we want to terminate
the thread.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
GS doesn't have ClampVertexColor, and we don't want to go through VS
structures.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Tessellation shaders and SIMD8 geometry shaders may need to resort to
the pull model for inputs at times. When set, the state upload code
will tell the hardware to provide URB handles for input data.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
In scalar mode, geometry shader inputs can easily take up hundreds of
registers. This makes pushing VUE entries impractical; we'll need to
resort to the pull model in some cases.
To support this, we introduce a new opcode corresponding to the "URB
Read SIMD8" message.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
In the vec4 backend, we have a vec4_instruction::urb_write_flags field.
There are many kinds of flags for SIMD4x2 messages.
However, there are really only two (per-slot offset, use channel masks)
for SIMD8 messages. Rather than adding a boolean flag for per-slot
offsets (polluting all instructions), I decided to just make three new
opcodes.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
This commit moves the large pile of setup calculations we have to do for
geometry shaders out of brw_gs_emit and into brw_compile_gs. This has a
couple of nice implications. First, it's less work that the caller of
brw_compile_gs has to do. Second, it's consistent with the vertex and
fragment stages. Finally, it allows us to put brw_gs_compile back behind
the API boundary where it belongs.
v2 (Jason Ekstrand):
- Pull the changes to use nir info into a separate patch
- Put brw_gs_compile into brw_shader.h rather than brw_vec4_gs_visitor.h
so that we can use it for scalar GS.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Previously, we were pulling bits from GL data structures in order to set up
the prog_data. However, in this brave new world of NIR, we want to be
pulling it out of the NIR shader whenever possible. This way, we can move
all this setup code into brw_compile_gs without depending on the old GL
stuff.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
All the documentation I can find says that this bit (and functionality) only
exists on SKL+. Since the bit isn't yet used, there is no real impact here.
The original code was added by Ken here (a surprisingly long time ago):
commit f3c6d6f1e1
Author: Kenneth Graunke <kenneth@whitecape.org>
Date: Thu Nov 29 21:00:27 2012 -0800
i965: Update 3DSTATE_PS, 3DSTATE_WM, and add 3DSTATE_PS_EXTRA.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Documentation is sparse, but it appears to have existed on G45 and ILK
as a second bit extension of the mask_control field. Setting the pair of
bits to 0b11 enables "NoCMask".
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
It only exists on Gen6+, and the next patches will add compaction
support for the (unused) field in the same location on earlier
platforms.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
The next commit will add assertions dependent on devinfo->gen.
Use compact()/uncompact() macros where possible, like the 3-src code
does.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Otherwise we'd emit a MOV from the null register (which isn't allowed).
Helps 24 programs in shader-db (the geometry shaders in GSCloth):
instructions in affected programs: 302 -> 262 (-13.25%)
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
GNU make predefines RM to rm -f but this is not required by POSIX
so ensure that RM is set. This fixes "make clean" on OpenBSD.
v2: use AC_CHECK_PROG
Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
CC: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
In bfdae9149e I disabled the opt_sampler_eot optimisation for TG4
message types because I found by experimentation that it doesn't work.
I wrote in the comment that I couldn't find any documentation for this
problem. However I've now found the documentation and it has
additional restrictions on further message types so this patch updates
the comment and adds the others.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Since 49374fab5d these macros no longer actually use the block
argument. I think this is worth doing to make the macros easier to use
because they already have really long names and a confusing set of
arguments.
Reviewed-by: Matt Turner <mattst88@gmail.com>
This fixes assigning explicit locations in the CTS test:
ES31-CTS.explicit_uniform_location.uniform-loc-arrays-of-arrays
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
process_parameters() will now be called earlier because we need
actual_parameters processed earlier so we can use it with
match_subroutine_by_name() to get the subroutine variable, we need
to do this inside the recursive function generate_array_index() because
we can't create the ir_dereference_array() until we have gotten to the
outermost array.
For the remainder of the array dimensions the type doesn't matter so we
can just use the existing _mesa_ast_array_index_to_hir() function to
process the ast.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Check current_var directly instead of using the passed in record_type.
This fixes following failing CTS test:
ES31-CTS.explicit_uniform_location.uniform-loc-types-structs
No Piglit regressions.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
UniformRemapTable is used only for remapping user specified uniform
locations to driver internally used ones, shader storage buffer
variables should not utilize uniform locations.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
A third instance of this was needed but missed in the previous commit.
Return 32 as for the two other cases.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
When the draw module splits long line loops, the sections are emitted
as line strips. But the primitive type wasn't set correctly so each
section was being drawn as a loop, introducing extra line segments.
To fix this, we pass a new DRAW_LINE_LOOP_AS_STRIP flag to the run()
function. The linear/elt_run() functions have to check for this flag
and set their primitive type accordingly.
No piglit regressions. Fixes piglit's lineloop with -count 4097 or
higher.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=81174
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Patch just does some refactoring to make the code look better. No
functional changes in here.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
When a long GL_LINE_LOOP prim was split across primitives we drew
stray lines. See previous commit for details.
This patch converts GL_LINE_LOOP prims into GL_LINE_STRIP prims so
that drivers don't have to worry about the _mesa_prim::begin/end flags.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=81174
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Acked-by: Sinclair Yeh <syeh@vmware.com>
When long GL_LINE_LOOP primitives don't fit in one vertex buffer they
have to be split across buffers. The code to do this was basically correct
but drivers had to pay special attention to the _mesa_prim::begin,end flags
in order to draw the sections of the line loop properly. Apparently, the
only drivers to do this were those using the old 'tnl' module for software
vertex processing.
Now we convert the split pieces of GL_LINE_LOOP prims into GL_LINE_STRIP
primitives so that drivers don't have to worry about the special begin/end
flags. The only time a driver will get a GL_LINE_LOOP prim is when the
whole thing fits in one vertex buffer.
Mostly fixes bug 81174, but not completely. There's another bug somewhere
in the src/gallium/auxiliary/draw/ code. If the piglit lineloop test is
run with -count 4096, rendering is correct, but with -count 4097 there are
stray lines. 4096 is a magic number in the draw code (search for "4096").
Also note that this does not fix long line loops in display lists. The
next patch fixes that.
v2: fix incorrect -1 in vbo_compute_max_verts(), per Charmaine. Remove
incorrect assertion which was added in vbo_copy_vertices().
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=81174
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=49779
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=28130
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
As before, use a new 'last_prim' pointer to simplify things. Plus, add
some const qualifiers.
v2: use 'sz' in another place, per Sinclair. And update subject line.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
Use a new 'last_prim' pointer to simplify things.
v2: remove unneeded assert(exec->vtx.prim_count > 0)
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
Whenever we got a glColor, glNormal, glTexCoord, etc. call outside a
glBegin/End pair, we'd immediately map a vertex buffer to begin
accumulating vertex data. In some cases, such as with display lists,
this led to excessive vertex buffer mapping. For example, if we have
a display list such as:
glNewList(42, GL_COMPILE);
glBegin(prim);
glVertex2f();
...
glVertex2f();
glEnd();
glEndList();
Then did:
glColor3f();
glCallList(42);
We'd map a vertex buffer as soon as we saw glColor3f but we'd never
actually write anything to it. Note that the vertex position data
was put into a vertex buffer during display list compilation.
With this change, we delay mapping the vertex buffer until we actually
have a vertex to write to it (triggered by a glVertex() call). In the
above case, we no longer map a vertex buffer when setting the color and
calling the list.
For drivers such as VMware's, reducing buffer mappings gives improved
performance.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
If we didn't find a gallium surface format that exactly matched the
glDrawPixels format/type combination, we used some other 32-bit packed
RGBA format and swizzled the whole image in the mesa texstore/format code.
That slow path can be avoided in some common cases by using the
pipe_samper_view's swizzle terms to do the swizzling at texture sampling
time instead.
For now, only GL_RGBA/ubyte and GL_BGRA/ubyte combinations are supported.
In the future other formats and types like GL_UNSIGNED_INT_8_8_8_8 could
be added.
v2: fix incorrect swizzle setup (need to invert the tex format's swizzle)
Reviewed by: Jose Fonseca <jfonseca@vmware.com>
So that we can use it directly from the mesa/gallium state tracker.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
Before, if make_texture() or st_create_texture_sampler_view() failed
we silently no-op'd the glDrawPixels. Now, set GL_OUT_OF_MEMORY.
This also allows us to un-nest a bunch of code.
v2: also check if allocation of sv[1] fails, per Jose.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
src/mesa/drivers/dri/i965/brw_program.c:94:39:
warning: passing argument 1 of ‘_mesa_init_gl_program’ from incompatible
pointer type [-Wincompatible-pointer-types]
return _mesa_init_gl_program(&prog->program, target, id);
^
Runtime was unaffected as brw_geometry_program is subclassed from
gl_geometry_program, thus the address passed was the same.
Fixes: bcb56c2c69 (program: convert _mesa_init_gl_program() to take
struct gl_program *)
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This avoids a serious r600g bug leading to a GPU hang.
The chances this bug will get fixed are pretty low now.
I deeply regret listening to others and not pushing this patch, leaving
other users with a GPU-crashing driver. Yes, it should be fixed
in the compiler and it's ugly, but users couldn't care less about that.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=86720
Cc: 11.0 10.6 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
This exposes more information to NIR's optimization, and should be
particularly useful when we do range-based optimization.
total uniforms in shared programs: 32066 -> 32065 (-0.00%)
uniforms in affected programs: 21 -> 20 (-4.76%)
total instructions in shared programs: 93104 -> 92630 (-0.51%)
instructions in affected programs: 31901 -> 31427 (-1.49%)
The TGSI usage mask can't be used, because these are declared as an output
array of 2 elements.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Brian Paul <brianp@vmware.com>
The "current" shader pointer is moved from the CSO to the context, so that
the CSO is mostly immutable.
The only drawback is that the "current" pointer isn't saved when unbinding
a shader and it must be looked up when the shader is bound again.
This is also a prerequisite for multithreaded shader compilation.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
All tests pass. We don't need to do much - just set CUBE if the view
target is CUBE or CUBE_ARRAY, otherwise set the resource target.
The reason this can be so simple is that texture instructions
have a greater effect on the target than the sampler view.
Thanks Glenn for the piglit test.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
This also removes the validation from the parser as it is not required
and once arb_enhanced_layouts comes along we wont be able to do validation
on the stream qualifier in the parser anyway as it adds constant expression
support to the stream qualifier.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Cc: 11.0 <mesa-stable@lists.freedesktop.org>
Fixes regression cased by bb5aeb8549
We don't care about the swizzle when building the name so just skip over it.
Tested-by: Markus Wick <markus@selfnet.de>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
NIR considers bcsel to produce and consume unsigned types, leading to
SEL instructions operating on unsigned types when the data is really
floating-point. Previous to this patch, saturate propagation would
happily transform
(+f0) sel g20:UD, g30:UD, g40:UD
mov.sat g50:F, g20:F
into
(+f0) sel.sat g20:UD, g30:UD, g40:UD
mov g50:F, g20:F
But since the meaning of .sat is dependent on the type of the
destination register, this is not valid.
Instead, allow saturate propagation to change the types of dest/source
on instructions that are simply copying data in order to propagate the
saturate modifier.
Fixes bad code gen in 158 programs.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Make them members of fs_inst/vec4_instruction for use elsewhere.
Also fix the fs version to check that dst.type == src[1].type and for
!saturate.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
At this point, the compiler API has been substantially simplified. In the
spirit of Kristian's making a compiler library, this commit makes a single
header file that contains, more-or-less, the entire compiler API.
There's still a bit of cleanup to do particularly in the area of geometry
shaders. However, this gets us much closer to having a separate compiler.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
This commit moves the common/modern stuff. Some legacy stuff such as
setting use_alt_mode was left because it needs to know whether or not we're
an ARB program.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
This commit removes all dependence on GL state by getting rid of the
brw_context parameter and the GL data structures.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
This commit removes all dependence on GL state by getting rid of the
brw_context parameter and the GL data structures. Unfortunately, we still
have to pass in the gl_shader_program for gen6 because it's needed for
transform feedback.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
This commit removes all dependence on GL state by getting rid of the
brw_context parameter and the GL data structures.
v2 (Jason Ekstrand):
- Patch use_legacy_snorm_formula through as a function argument rather
than trying to go through the shader key.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
This commit removes all dependence on GL state by getting rid of the
brw_context parameter and the GL data structures.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
This way we can have other stage-specific info without consuming too much
extra space. While we're at it, we make sure that the geometry info is
only set if we're actually a goemetry shader.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Undoes early hacks, and ensures nir/glsl_types.cpp is built once, and
only once.
The root problem is that SCons doesn't know about NIR nor any source
file in the NIR_FILES source list.
Tested with libgl-gdi and libgl-xlib scons targets.
Reviewed-by: Brian Paul <brianp@vmware.com>
Spotted by Roland. Luckily, this code should never really be hit
since the const buffer size and offset should already be multiples
of 16. I could probably add more assertions to that effect, but
let's just fix the arithmetic for now.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
The latter holds both UBOs and SSBOs, but here we only want UBOs.
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The latter holds both UBOs and SSBOs, but here we only want UBOs.
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This is the only place in the driver where we use this. Since we now work
with separate index spaces, always use NumUniformBlocks and
NumShaderStorageBlocks instead of NumBufferInterfaceBlocks to be more
consistent with the rest of the code.
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Now that we have separate index spaces for UBOs and SSBOs we do not need
to iterate through BufferInterfaceBlocks any more, we can just take the
UBO count directly from NumUniformBlocks.
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
It looks like binding a constant buffer on compute overwrites the 3D
state. To avoid that, we already re-bind all the 3D constant buffers
after launching a compute grid but this is not enough.
Binding the constant buffer of input parameters for the compute state at
initialization corrupts the 3D constant buffers, and it's just useless
to bind it because this is not needed until we really launch a grid.
This fixes some piglit regressions related to interpolation tests
introduced in "nvc0: enable compute support by default on Fermi".
Fixes: 00d6186 (nvc0: enable compute support by default on Fermi)
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Marek made core Mesa call ProgramStringNotify(), which solves this
properly. The hack is no longer needed.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
VS, GS, and FS continue doing the same thing they did before. We can
simplify the FS code a bit because it is always scalar.
Compute shaders now assert that there are no outputs instead of doing
a loop over 0 outputs.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
The previous version has precision issues. This can be a problem
with tessellation. Sadly, I can't find the article where I read it
anymore. I'm not sure if the unsafe-fp-math flag would be enough to revert
this.
v2: added the comment
Changing the matrix mode alone has no effect on rendering and does
not need to trigger a flush or state validation.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
commit a6a6a71092
Author: Rob Clark <robclark@freedesktop.org>
AuthorDate: Sat Oct 10 14:13:50 2015 -0400
glsl: (mostly) remove libglsl_util
Was a bit too ambitious on removal of libglsl_util.. it is still needed
by some of the tests.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
As explained in the CUDA toolkit documentation, "a metric is a
characteristic of an application that is calculated from one or more
event values."
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Now that NIR does not depend on glsl, we can (mostly[*]) get rid of the
libglsl_util hack.
[*] glsl_compiler is the one remaining user of libglsl_util
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Move glsl_types into NIR, now that the dependency on glsl_symbol_table
has been split out.
Possibly makes sense to rename things at this point, but if we do that
I'd like to keep it split out into a separate patch to make git history
easier to follow (IMHO).
v2: fix android build
v3: I f***ing hate scons.. but at least it builds
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Rob Clark <robclark@freedesktop.org>
First step at untangling NIR's dependency on glsl_types without bringing
in the dependency on glsl_symbol_table. The builtin types are now in
glsl_types (which will end up in NIR), but adding them to the symbol-
table stays in builtin_types.cpp (which will not be part of NIR).
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Rob Clark <robclark@freedesktop.org>
This was only being done in one of the two process methods.
Fixes an issue with samplers using the array size of a previous record.
Tested-by: Marek Olšák <marek.olsak@amd.com>
Cc: Jason Ekstrand <jason@jlekstrand.net>
MP counters on GF100/GF110 (compute capability 2.0) are buggy
because there is a context-switch problem that we need to fix.
Results might be wrong sometimes, be careful!
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
GF100 and GF110 chipsets are compute capability 2.0, while the other
Fermi chipsets are compute capability 2.1. That's why, some MP counters
are different between these chipsets and we need to handle variants.
Signed-off-by: Samuel Pitoiet <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Compute support was not enabled by default because weird effects
on 3D state happened, but I can't reproduce them anymore.
This also enables MP performance counters by default on Fermi.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Because we can't expose the number of hardware counters needed for each
different query, we don't want to allow more than one active query
simultaneously to avoid failure when the maximum number of counters
is reached. Note that these groups of GPU counters are currently only
used by AMD_performance_monitor.
Like for Kepler, this limits the maximum number of active queries
to 1 on Fermi.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
When a card has more than one GPC, the grid used by the compute
kernel which reads MP performance counters seems to be too small.
The consequence is that the kernel is not launched on all TPCs.
Increasing the grid size using the number of GPCs now launches
enough blocks and we can read MP performance counters of all TPCs.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
NOUVEAU_GETPARAM_GRAPH_UNITS param returns the number of GPCs, the total
number of TPCs and the number of ROP units. Note that when the DRM
version is too old the default number of GPCs is fixed to 4.
This will be used to launch the compute kernel which is used to read MP
performance counters over all GPCs.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Memory access have to be aligned to 128-bits. Note that this
doesn't happen when the card only has TPC.
This patch fixes the following dmesg fail:
gr: GPC0/TPC1/MP trap: global 00000004 [MULTIPLE_WARP_ERRORS] warp 000f
[UNALIGNED_MEM_ACCESS]
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
For strange reasons, the signal id depends on the slot selected on Fermi
but not on Kepler. Fortunately, the signal ids are just offseted by the
slot id!
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Queries which use more than one MP counters was misconfigured and
computing the final result was also wrong because sources need to
be configured on different hardware counters instead.
According to the blob, computing the result is now as follows:
FOR i..n
val += ctr[i] * pow(2, i)
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
On Fermi, we have one domain of 8 MP counters while we have
two domains of 4 MP counters on Kepler.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Sequence fields are located at MP[i] + 0x20 in the buffer object.
This is used to check if result is available for MP[i].
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Writing 0x408000 to 0x419e00 (like on Kepler) has no effect on Fermi
because we only have one domain of 8 counters. Instead, we have to
write 0x80000000.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Writing 0x1fcb to 0x419eac is definitely not related to MP counters and
has no effect on Fermi (although this enables MP counters on Kepler).
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
The way we configure MP performance counters is going to pretty
different between Fermi and Kepler. Having two separate functions
is much better.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Add new GALLIUM_HUD queries for:
num-shaders
num-resources
num-state-objects
num-validations
map-buffer-time
num-surface-views
num-resources-mapped
num-flushes
Most of this patch was originally written by Neha. Additional clean-ups
and num-flushes counter added by Brian Paul.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Silences 5 warnings of the type:
state_tracker/st_cb_program.c: In function 'st_new_program':
state_tracker/st_cb_program.c:108:7: warning: passing argument 1 of
'_mesa_init_gl_program' from incompatible pointer type [enabled by default]
return _mesa_init_gl_program(&prog->Base, target, id);
^
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
This reverts commit 0de5e0f3fb.
Michel Dänzer spotted two piglit regressions from the change. I suspect
that removing the FLUSH_VERTICES() actually exposed a bug elsewhere but
I don't have time to hunt down the root issue at this time.
has_shader_storage_buffer_objects() returns true also if the OpenGL
context is 4.30 or ES 3.1.
Previously, we were saying that all atomic*() GLSL builtin functions
for SSBOs were not available when OpenGL ES 3.1 context was in use.
Fixes 48 dEQP-GLES31 tests:
dEQP-GLES31.functional.ssbo.atomic.*
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Otherwise there are problems when user overrides version and application
such as Piglit wants to detect used api with glGetString(GL_VERSION).
This makes it currently impossible to run glslparsertest tests for
OpenGL ES when using version override.
Below is example when using MESA_GLES_VERSION_OVERRIDE=3.1.
Before:
"3.1 Mesa 11.1.0-devel (git-24a1a15)"
After:
"OpenGL ES 3.1 Mesa 11.1.0-devel (git-78042ff)"
v2: only include api prefix for OpenGL ES (Boyan Ding)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Before d31f98a272 and 56e2bdbca3 we had a sigle index space for UBOs
and SSBOs, so NumBufferInterfaceBlocks would contain the combined number of
blocks, not just one kind. This means that for shader programs using both
UBOs and SSBOs, we were setting num_ssbos and num_ubos to a larger number than
we should. Since the above commits we have separate index spaces for each
so we can just get the right numbers.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
(yes, we want PRI?64, but we want the x version rather than the u
version)
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
nir_variable_create already inserts it in the right list for us so
inserting it again causes a linked list corruption.
Reviewed-by: Matt Turner <mattst88@gmail.com>
This has the better name to use. Aparently, sh->Name is usually 0.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Neil Roberts <neil@linux.intel.com>
For glBlendFunc and glBlendFuncSeparate(), the _UsesDualSrc flag
will be the same for all buffers, so no need to compute it N times.
Reviewed-by: Eric Anholt <eric@anholt.net>
A redundant call to glBlendFuncSeparateiARB() is more likely than getting
invalid values, so do the no-op check first.
Reviewed-by: Eric Anholt <eric@anholt.net>
Streamline the checking for no state change in _mesa_BlendFuncSeparate()
(and _mesa_BlendFunc()). If _BlendFuncPerBuffer is false, we only need
to check the 0th buffer state. Move argument validation after the no-op
check.
I'm looking at an app that issues about 1000 redundant glBlendFunc()
calls per frame!
Reviewed-by: Eric Anholt <eric@anholt.net>
We can skip to the end of _mesa_update_state_locked() if only the
_NEW_LINE flag is set since none of the derived state depends on it
(just like _NEW_CURRENT_ATTRIB). Note that we still call the
ctx->Driver.UpdateState() function, of course.
v2: use bitmask-based test, per Eric.
Reviewed-by: Eric Anholt <eric@anholt.net>
Changing the matrix mode alone has no effect on rendering and does
not need to trigger a flush or state validation.
Reviewed-by: Eric Anholt <eric@anholt.net>
Rather than accepting a void pointer, only to down and up cast around
it, convert the function to take the base (struct gl_program) pointer.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
V3: use a check_*_allowed style function for requirements checking
rather than has_* which doesn't encapsulate the error message
V2: add missing 's' to the extension name in error messages
and add decimal place in version string
Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com>
This adds support for setting up the UniformBlock structures for AoA
and also adds support for resizing AoA blocks with a packed layout.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Add support for setting the max access of an unsized member
of an interface array of arrays.
For example ifc[j][k].foo[i] where foo is unsized.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This marks all counters in an AoA as active.
For AoA all but the innermost array are treated as separate
counters/uniforms. The Nvidia binary also goes further and
finds inactive counters in the AoA, in future we should do
this too, however this gets things working for the time being.
This change also removes the use of UniformHash for atomic counters,
this avoids having to generate name strings used as hash keys.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This allows the correct offset to be calculated for use in indirect
indexing of samplers.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Currently only one ir assignment is removed for each var in a single
dead code optimisation pass. This means if a var has more than one
assignment, then it requires all the glsl optimisations to be run again
for each additional assignment to be removed.
Another pass is also required to remove the variable itself.
With this change all assignments and the variable are removed in a single
pass.
Some of the arrays of arrays conformance tests that were looping
through 8 dimensions ended up with a var with hundreds of assignments.
This change helps ES31-CTS.arrays_of_arrays.InteractionFunctionCalls1
go from around 3 min 20 sec -> 2 min
ES31-CTS.arrays_of_arrays.InteractionFunctionCalls2 went from
around 9 min 20 sec to 7 min 30 sec
I had difficulty getting the public shader-db to give a consistent result
with or without this change but the results seemed unchanged at between
15-20 seconds.
Thomas Helland measured change with shader-db on his machine from
approx 117 secs to 112 secs.
V3: Simplify freeing of list as suggested by Ian, and spelling fixes.
V2: Add assert to be sure references are counted before assignments.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Tested-By: Thomas Helland <thomashelland90@gmail.com>
Tested-by: Ian Romanick <ian.d.romanick@intel.com>
V3: move patch after fixes to ast for AoA and add const to helper
as suggested by Ian
V2: move single dimensional array detection into a helper
Signed-off-by: Timothy Arceri <t_arceri@yahoo.com.au>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
V3: Fix setting of data.location for struct AoA UBO members
V2: Handle arrays of arrays in the same way structures are handled
The ARB_arrays_of_arrays spec doesn't give very many details on how
AoA uniforms are intended to be implemented. However in the
ARB_program_interface_query spec there are details that show AoA are
intended to be handled in a similar way to structs.
Issues 7 from the ARB_program_interface_query spec:
We define rules consistent with our enumeration rules for
other complex types. For existing one-dimensional arrays, we enumerate
a single entry if the array is an array of basic types, or separate
entries for each array element if the array is an array of structures.
We follow similar rules here. For a uniform array such as:
uniform vec4 a[5][4][3];
we enumerate twenty different entries ("a[0][0][0]" through
"a[4][3][0]"), each of which is treated as an array with three elements.
This is morally equivalent to what you'd get if you worked around the
limitation in current GLSL via:
struct ArrayBottom { vec4 c[3]; };
struct ArrayMid { ArrayBottom b[3]; };
uniform ArrayMid a[5];
which would enumerate "a[0].b[0].c[0]" through "a[4].b[3].c[0]".
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The ES31-CTS.compute_shader.pipeline-compute-chain test case generates
an unsigned index by using gl_LocalInvocationID.x and
gl_LocalInvocationID.y as array indices.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
The ES31-CTS.compute_shader.pipeline-compute-chain test case generates
an unsigned index by using gl_LocalInvocationID.x and
gl_LocalInvocationID.y as array indices.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
The commit shown below caused compute shaders to hit the unreachable
in the default of the switch block. Since compute shaders don't have
any inputs, we can make brw_nir_lower_inputs a no-op for CS.
commit 2953c3d761
Author: Kenneth Graunke <kenneth@whitecape.org>
Date: Fri Aug 14 15:15:11 2015 -0700
i965/vs: Map scalar VS input locations properly; avoid tons of MOVs.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
pipe_surface_reference have problems with deleted contexts,
so use of pipe_surface_release might be more appropriate.
Fixes Wasteland 2 Director's Cut crash on start.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Brian Paul <brianp@vmware.com>
The split of Uniform blocks and shader storage block only loops
up to MESA_SHADER_FRAGMENT and igonres compute shaders.
This cause segfault when running the OpenGL ES 3.1 CTS tests
with GL_ARB_compute_shader enabled.
V2: Changed to use MESA_SHADER_STAGES instead of
MESA_SHADER_COMPUTE
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Signed-off-by: Marta Lofstedt <marta.lofstedt@linux.intel.com>
Patch moves existing calculation code from shader_query.cpp to happen
during program resource list creation.
No Piglit or CTS regressions were observed during testing.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Patch adds 2 new fields to gl_uniform_storage so that we don't need to
calculate these values during runtime shader queries. This is required by
upcoming changes to free GLSL IR after linking.
Patch moves 3 booleans inside structure so that structure size stays the
same after this change.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Currently, these arrays in gl_shader and gl_shader_program hold both
UBOs and SSBOs, so this looks like a better name. We were already
using NumBufferInterfaceBlocks in gl_shader_program, so this makes
things more consistent as well.
In a later patch we will add {Num}UniformBlocks and
{Num}ShaderStorageBlocks which will contain only references to
UBOs and SSBOs respectively that will provide backends with
a separate index space for both types of objects.
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
We get these when we operate on vector variables with array accessors
(i.e. things like a[0] where 'a' is a vec4). When we call variable_referenced()
on these expressions we want to return a reference to 'a' instead of NULL.
This fixes a problem where we pass a[0] as the first argument to an atomic
SSBO function that expects a buffer variable. In order to check this, we use
variable_referenced(), but that is currently returning NULL in this case, since
the underlying rvalue is a vector_extract expression.
Tested-by: Markus Wick <markus@selfnet.de>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
NIR is typeless so this is the only way to keep track of the
type to select the proper atomic to use.
v2:
- Use imin,imax,umin,umax for the intrinsic names (Connor Abbott)
- Change message for unreachable paths (Michael Schellenberger)
Tested-by: Markus Wick <markus@selfnet.de>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The variable 'i' is a value in [0, MAT_ATTRIB_MAX-1] so subtracting
VERT_ATTRIB_GENERIC0 gave a bogus value and we executed the default
switch clause for all loop iterations.
This doesn't fix any known issues but was clearly incorrect.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This is the result of applying several rules:
From OpenGL 4.3 spec, section 7.6.2.2 "Standard Uniform Block Layout":
"2. If the member is a two- or four-component vector with components
consuming N basic machine units, the base alignment is 2N or 4N,
respectively."
[...]
"4. If the member is an array of scalars or vectors, the base alignment
and array stride are set to match the base alignment of a single array
element, according to rules (1), (2), and (3), and rounded up to the
base alignment of a vec4."
[...]
"7. If the member is a row-major matrix with C columns and R rows, the
matrix is stored identically to an array of R row vectors with C
components each, according to rule (4)."
[...]
"When using the std430 storage layout, shader storage blocks will be
laid out in buffer storage identically to uniform and shader storage
blocks using the std140 layout, except that the base alignment and
stride of arrays of scalars and vectors in rule 4 and of structures in
rule 9 are not rounded up a multiple of the base alignment of a vec4."
In summary: vec2 has a base alignment of 2*N, a row-major mat2xY is
stored like an array of Y row vectors with 2 components each. Because
of std430 storage layout, the base alignment of the array of vectors
is not rounded up to vec4, so it is still 2*N.
Fixes 15 dEQP tests:
dEQP-GLES31.functional.ssbo.layout.single_basic_type.std430.row_major_lowp_mat2
dEQP-GLES31.functional.ssbo.layout.single_basic_type.std430.row_major_mediump_mat2
dEQP-GLES31.functional.ssbo.layout.single_basic_type.std430.row_major_highp_mat2
dEQP-GLES31.functional.ssbo.layout.single_basic_type.std430.row_major_lowp_mat2x3
dEQP-GLES31.functional.ssbo.layout.single_basic_type.std430.row_major_mediump_mat2x3
dEQP-GLES31.functional.ssbo.layout.single_basic_type.std430.row_major_highp_mat2x3
dEQP-GLES31.functional.ssbo.layout.single_basic_type.std430.row_major_lowp_mat2x4
dEQP-GLES31.functional.ssbo.layout.single_basic_type.std430.row_major_mediump_mat2x4
dEQP-GLES31.functional.ssbo.layout.single_basic_type.std430.row_major_highp_mat2x4
dEQP-GLES31.functional.ssbo.layout.single_basic_array.std430.row_major_mat2
dEQP-GLES31.functional.ssbo.layout.single_basic_array.std430.row_major_mat2x3
dEQP-GLES31.functional.ssbo.layout.single_basic_array.std430.row_major_mat2x4
dEQP-GLES31.functional.ssbo.layout.instance_array_basic_type.std430.row_major_mat2
dEQP-GLES31.functional.ssbo.layout.instance_array_basic_type.std430.row_major_mat2x3
dEQP-GLES31.functional.ssbo.layout.instance_array_basic_type.std430.row_major_mat2x4
v2:
- Add spec quote in both commit log and code (Timothy)
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com>
Previously, ATTR was indexed by VERT_ATTRIB_* slots; at the end of
compilation, assign_vs_urb_setup() translated those into GRF units,
and converted ATTR to HW_REGs.
This patch moves the transslation earlier, making ATTR work in terms of
GRF units from the beginning. assign_vs_urb_setup() simply has to add
the number of payload registers and push constants to obtain the final
hardware GRF number. (We can't do this earlier as those values aren't
known.)
ATTR still supports reg_offset; however, it's simply added to reg.
It's not clear whether this is valuable or not.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
The act of ensuring that there is space can cause a flush to happen,
which will emit the current screen fence. If that is the fence we're
trying to wait on, then it will have been emitted as a result of doing
the PUSH_SPACE. Don't attempt to emit it a second time.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Fixes: 8053c9208f (nouveau: avoid emitting new fences unnecessarily)
Cc: mesa-stable@lists.freedesktop.org
Fixes:
ES3-CTS.shaders.negative.constant_sequence
spec/glsl-es-3.00/compiler/global-initializer/from-sequence.vert
spec/glsl-es-3.00/compiler/global-initializer/from-sequence.frag
v2: Fix a couple copy-and-paste mistake in the spec quotations.
Suggested by Matt.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
This will be used in the next patch to enforce some language sematics.
v2: Fix inverted logic in
ast_function_expression::has_sequence_subexpression. The method
originally had a different name and a different meaning. I fixed the
logic in ast_to_hir.cpp, but I only changed the names in
ast_function.cpp.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com> [v1]
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
v2: Combine this check with the existing const and uniform checks. This
change depends on the previous patch (glsl: Only set
ir_variable::constant_value for const-decorated variables).
Fixes:
ES2-CTS.shaders.negative.initialize
ES3-CTS.shaders.negative.initialize
spec/glsl-es-1.00/compiler/global-initializer/from-attribute.vert
spec/glsl-es-1.00/compiler/global-initializer/from-uniform.vert
spec/glsl-es-1.00/compiler/global-initializer/from-uniform.frag
spec/glsl-es-1.00/compiler/global-initializer/from-global.vert
spec/glsl-es-1.00/compiler/global-initializer/from-global.frag
spec/glsl-es-1.00/compiler/global-initializer/from-varying.frag
spec/glsl-es-3.00/compiler/global-initializer/from-uniform.vert
spec/glsl-es-3.00/compiler/global-initializer/from-uniform.frag
spec/glsl-es-3.00/compiler/global-initializer/from-in.vert
spec/glsl-es-3.00/compiler/global-initializer/from-in.frag
spec/glsl-es-3.00/compiler/global-initializer/from-global.vert
spec/glsl-es-3.00/compiler/global-initializer/from-global.frag
Note: spec/glsl-es-3.00/compiler/global-initializer/from-sequence.*
still fail because the result of a sequence operator is still considered
to be a constant expression.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92304
Reviewed-by: Tapani Pälli <tapani.palli@intel.com> [v1]
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> [v1]
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
Right now we're also setting for uniforms, and that doesn't seem to hurt
things. The next patch will make general global variables in GLSL ES,
and those definitely should not have constant_value set!
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
This is the way layout(binding=xxx) works from GLSL. The old method
just happened to work (and significantly predated support for
layout(binding=xxx)), but future changes will break this.
v2: Remove some stale comments. Suggested by Matt and Chris Forbes.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
In d4a24745 (August 2012), Paul made functions calls not be constant
expressions in GLSL ES 1.00. Since this feature was added in desktop
GLSL 1.20, we believed that it was added in GLSL ES 3.00. That turns
out to be completely wrong. Built-in functions have always been allowed
as constant expressions in GLSL ES, and the patch adds the (many) spec
quotations to prove it.
While we never previously encountered this, a later patch enforces a GLSL
ES 1.00 rule that global variable initializers must be constant
expressions. Without this fix, several dEQP tests fail.
Fixes:
tests/spec/glsl-es-1.00/compiler/const-initializer/from-function.frag
tests/spec/glsl-es-1.00/compiler/const-initializer/from-function.vert
tests/spec/glsl-es-1.00/compiler/const-initializer/from-sequence-in-function.frag
tests/spec/glsl-es-1.00/compiler/const-initializer/from-sequence-in-function.vert
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: "10.0 10.1 10.2 10.3 10.4 10.5 10.6 11.0" <mesa-stable@lists.freedesktop.org>
Yes, I know we don't maintain stable branches that far back, but that
*is* how far back this bug goes!
Vertex attributes of different categories (constant/per-instance/
per-vertex) go into different buffers for translation, and this is now
properly reflected in the vertex buffers passed to the driver.
Fixes e.g. piglit's point-vertex-id divisor test.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Also fix style / wrong indentation along the way and make the messages
more uniform.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
GLSL Spec 4.20.8, 4.3 Storage Qualifiers:
"Initializers in global declarations may only be used in declarations of
global variables with no storage qualifier, with a const qualifier or
with a uniform qualifier."
We do this for input variables, but not for output variables. AMD and NVIDIA
proprietary drivers don't allow this either.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
For the VS and FS stages that use ARB_vertex_program or
ARB_fragment_program we don't have a shader program, however,
when debuging is enabled, we call brw_dump_ir like this:
brw_dump_ir("vertex", prog, &vs->base, &vp->program.Base);
where vs will be NULL (since prog is NULL).
As pointed out by Chris, this &vs->base is not really a dereference,
it simply computes a new address that just happens to be 0x0 because
the offset of base in brw_shader is 0. Then brw_dump_ir will see a
NULL pointer and not do anything. This is why this does not crash at
the moment. However, this does not look very safe (it would crash
for any location of base that is not the first in brw_shader), so
patch it to prevent a potential (even if unlikely) problem in the
future.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
The initial glGetUniformdv support didn't cover all the
casting cases that are apparantly legal, and cts seems to
test for them.
I've updated the piglit test to cover these cases now.
v2: fix indentation - it's all broken in this file (Ilia)
fix src/dst index tracking in light of fp64 support (Ilia)
cc: "11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Curro added this in commit 3ee2daf23d (before the vec4/NIR backend was
added) but it was missed in the new NIR backend. Add it there as well.
instructions in affected programs: 1857 -> 1810 (-2.53%)
helped: 15
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
We still have to push everything out, might as well kick earlier and
flip pushbufs when we know we'll need it. This resolves some issues with
the new policy of making sure that we always leave a bit of room at the
end for fences.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Fixes: 47d11990b (nouveau: make sure there's always room to emit a fence)
Cc: mesa-stable@lists.freedesktop.org
Right now we emit on every kick, but this is only necessary if something
will ever be able to observe that the fence completed. If there are no
refs, leave the fence alone and emit it another day.
This also happens to work around an issue for the kick handler -- a kick
can be a result of e.g. nouveau_bo_wait or explicit kick, or it can be
due to lack of space in the pushbuf. We want the emit to happen in the
current batch, so we want there to always be enough space. However an
explicit kick could take the reserved space for the implicitly-triggered
kick's fence emission if it happened right after. With the new mechanism,
hopefully there's no way to cause two fences to be emitted into the same
reserved space.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Fixes: 47d11990b (nouveau: make sure there's always room to emit a fence)
Cc: mesa-stable@lists.freedesktop.org
In theory, GF110+ should also support NVC8_COMPUTE_CLASS but, in practice,
a ILLEGAL_CLASS dmesg fail appears when using it.
This fixes compute support and MP performance counters on GF110.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
For scalar VS, I'll need this in brw_fs.cpp as well. It seems silly to
redeclare it in three places.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Previously, we used nir_lower_io with the scalar type_size function,
which mapped VERT_ATTRIB_* locations to...some numbers. Then, in
fs_visitor::nir_setup_inputs(), we created temporaries indexed by
those numbers, and emitted MOVs from the actual ATTR registers to
those temporaries. Virtually all of these were copy propagated away,
but it's still ugly.
This patch reworks our input lowering to produce NIR lower_input
intrinsics that properly index into the ATTR file, so we can access
it directly.
No changes in shader-db.
v2: Fix unreachable() message (Ken), update commit message (Matt).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
nr_attributes is used to compute first_non_payload_grf, which is the
first register we're allowed to use for ordinary register allocation.
The hardware requires us to read at least one pair of values, but we're
completely free to overwrite that garbage register with whatever we like.
Instead of altering nr_attributes, we should alter urb_read_length, which
only affects the amount we ask the VF to read. This should save us a
register in trivial cases (which admittedly isn't very useful).
While we're at it, improve the explanation in the comments.
v2: Actually do what I said (caught by Ilia).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Both the vec4 and scalar VS backends had virtually identical URB entry
size and read length calculations. We can move those up a level to
backend-agnostic code and reuse it for both.
Unfortunately, the backends need to know nr_attributes to compute
first_non_payload_grf, so I had to store that in prog_data. We could
use urb_read_length, but that's nr_attributes rounded up to a multiple
of two, so doing so would waste a register in some cases.
There's more code to be removed in the vec4 backend, but that will
come in a follow-on patch.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Switch statements introduce a bogus loop with an unconditional break at
the end of the loop, just before the while...so the while is unreachable
and has no immediate dominator.
v2: With less exuberance
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
The (gen < 9) check in brw_clear() was too broad. It disabled all types
of fast color clears:
a. singlesample rep clears
b. singlesample MCS fast clears
c. multisample MCS fast clears
The MCS clears are still buggy, but the rep clear works well. So let's
enable it.
Reviewed-by: Neil Roberts <neil@linux.intel.com>
Fast color clears are disabled for gen9 (see the checks in
brw_meta_fast_clear), so there is no reason to allocate the MCS and
track its clear/resolve state.
Reviewed-by: Neil Roberts <neil@linux.intel.com>
The translate functions is split into two:
- translation to TGSI
- creating the variant (TGSI transformations only)
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Tested-by: Brian Paul <brianp@vmware.com>
The samplers for DrawPixels data and the pixel map are assigned to slots
which don't overlap with the existing sampler slots.
The texture coordinates for the user texture are uploaded as a constant.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Tested-by: Brian Paul <brianp@vmware.com>
- there is no connection to user fragment shaders, so having these as
shader variants makes no sense
- don't use Mesa IR, use TGSI
- don't create gl_fragment_program, just create the shader CSO
v2: generate exactly the same shader as before to fix llvmpipe
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Tested-by: Brian Paul <brianp@vmware.com>
No other shader stage has a "prepare" function.
This will allow removing some variables from st_vertex_program.
Also, prepare_fragment_program was a dead prototype.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Tested-by: Brian Paul <brianp@vmware.com>
Drivers weren't notified about this at all.
This allows disabling on-demand compilation in drivers.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Tested-by: Brian Paul <brianp@vmware.com>
First step towards inverting the dependency between glsl and nir (so nir
can be used without glsl). Also solves this issue with 'make distclean'
Making distclean in mesa
make[2]: Entering directory '/mnt/sdb1/Src64/Mesa-git/mesa/src/mesa'
Makefile:2486: ../glsl/.deps/shader_enums.Plo: No such file or directory
make[2]: *** No rule to make target '../glsl/.deps/shader_enums.Plo'. Stop.
make[2]: Leaving directory '/mnt/sdb1/Src64/Mesa-git/mesa/src/mesa'
Makefile:684: recipe for target 'distclean-recursive' failed
make[1]: *** [distclean-recursive] Error 1
make[1]: Leaving directory '/mnt/sdb1/Src64/Mesa-git/mesa/src'
Makefile:615: recipe for target 'distclean-recursive' failed
make: *** [distclean-recursive] Error 1
Reported-by: Andy Furniss <adf.lists@gmail.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Rob Clark <robclark@freedesktop.org>
The call to _mesa_test_texobj_completeness() is unnecessary if the
texture is already known to be complete. If the texture object is
dirtied in the meantime _BaseComplete and _MipmapComplete will be
reset to false. _mesa_is_image_unit_valid() will start to be called
more frequently in a future commit, so it seems desirable to avoid the
unnecessary work.
Tested-by: Ye Tian <yex.tian@intel.com>
CC: "11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
A future commit will remove all texture object-dependent derived state
from the image unit struct to make validation unnecessary on texture
state changes. Instead of checking gl_image_unit::_Valid drivers will
be required to call this function when needed to find out whether an
image unit is in a valid state and whether access from the shader is
allowed.
Tested-by: Ye Tian <yex.tian@intel.com>
CC: "11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The hardware documentation relating to the UAV HW-assisted coherency
mechanism and UAV access enable bits is scarce and sometimes
contradictory, and there's quite some guesswork behind this commit, so
let me summarize the background first: HSW and later hardware have
infrastructure to support a stricter form of data coherency between
shader invocations from separate primitives. The mechanism is
controlled by the "Accesses UAV" bits on 3DSTATE_VS, _HS, _DS, _GS and
_PS (or _PS_EXTRA on BDW+), and the "UAV Coherency Required" bit on
the 3DPRIMITIVE command.
Regardless of whether "UAV Coherency Required" is set, the hardware
fixed-function units will increment a per-stage semaphore for each
request received if "Accesses UAV" is set for the same or any lower
stage. An implicit DC flush is emitted by the lowermost stage with
"Accesses UAV" set once it's done processing the request, this also
happens regardless of the value of "UAV Coherency Required". The
completion of the DC flush will cause the same stage and all previous
ones to decrement the semaphore, marking the UAV accesses for the
primitive as coherent with L3.
The "UAV Coherency Required" 3DPRIMITIVE bit will cause a pipeline
stall before any threads are dispatched for the first FF stage with
"Accesses UAV" set until the semaphore is cleared for the same stage.
Effectively this guarantees that UAV memory accesses performed by
previous primitives from any stage will be strictly ordered (and
thanks to the implicit DC flush visible in memory) with UAV accesses
from the following primitives.
None of this is required by the usual image, atomic counter and SSBO
GL APIs which have very relaxed cross-primitive coherency and ordering
requirements, so we don't actually ever set the "UAV Coherency
Required" bit -- Ordering with respect to shader invocations from
previous stages on the same primitive where there is a data dependency
is of course already guaranteed as the spec requires, regardless of
this mechanism being enabled. We do set the "Accesses UAV" bits
though since my commit ac7664e493 (which
this patch partially reverts), mainly because of comments like the
following from the BDW PRM:
> 3DSTATE_GS
>[...]
> 12 Accesses UAV
> Format: Enable
> This field must be set when GS has a UAV access.
There are similar comments in the documentation for the other
3DSTATE_*S commands. The "must" part is misleading and unjustified
AFAIK. Most of the "Accesses UAV" bits don't seem to have any side
effects other than the implicit DC flushes and the related
book-keeping in anticipation for a subsequent primitive with "UAV
Coherency Required" set, so in most cases they are unnecessary and may
incur a performance penalty. There is an exception though. On Gen8+
the PS_EXTRA UAV access bit influences the calculation of the PS
UAV-only and ThreadDispatchEnable signals which on previous
generations were set explicitly by the driver, so we cannot always
avoid enabling it on the PS stage.
The primary motivation for this change is that in fact the hardware
coherency mechanism is buggy and will cause a rather non-deterministic
hang on Gen8 when VS is the only stage with "Accesses UAV" set and the
processing of a request terminates immediately after the implicit DC
flush is sent for a previous primitive with no additional vertices
being emitted for the second primitive, what will cause the hardware
to skip sending a second DC flush and cause the VS to stall
indefinitely waiting for a response from the DC (BDWGFX HSD 1912017).
This hardware bug can be reproduced on current master with the
spec@arb_shader_image_load_store@host-mem-barrier@Indirect/RaW piglit
subtest (if you have the patience to run it a few dozen times).
The proposed workaround is to insert CS STALLs speculatively between
3DPRIMITIVE commands when "Accesses UAV" is enabled for the VS stage
only. Because this would affect one of the hottest paths in the
driver and likely decrease performance even further due to the
unnecessary serialization, and because we don't actually need the
implicit DC flushes, it seems better to just disable them.
Cc: 11.0 <mesa-stable@lists.freedesktop.org>
This was originally added to nir_instrs_equal() instead of
nir_instr_can_cse() incorrectly, but this was fixed when moving to the
instruction set API (as it had to be, otherwise hashing wouldn't work).
Now, this is dead code since instr_can_rewrite() will only return true
for texture instructions that use an index, so we can turn the check into
an assert. This also means that now nir_instrs_equal(instr, instr) will
always return true unless it assert-fails.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
This was previously tied to CSE, since it would only work for
instructions where nir_can_cse() (now instr_can_rewrite()) returned
true. Now that CSE uses the instruction set abstraction which only uses
this internally, we can make it local to nir_instr_set.c.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
This replaces an O(n^2) algorithm with an O(n) one, while allowing us to
import most of the infrastructure required for GVN. The idea is to walk
the dominance tree depth-first, similar when converting to SSA, and
remove the instructions from the set when we're done visiting the
sub-tree of the dominance tree so that the only instructions in the set
are the instructions that dominate the current block.
No piglit regressions. No shader-db changes.
Compilation time for full shader-db:
Difference at 95.0% confidence
-35.826 +/- 2.16018
-6.2852% +/- 0.378975%
(Student's t, pooled s = 3.37504)
v2:
- rebase on start_block removal
- remove useless state struct
- change commit message
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
This will replace direct usage of nir_instrs_equal() in the CSE pass,
which reduces an O(n^2) algorithm with an effectively O(n) one. It'll
also be useful for implementing GVN on top of GCM.
v2:
- Add texture support.
- Add more comments.
- Rename instr_can_hash() to instr_can_rewrite() since it's really more
about whether its uses can be rewritten, and it's implicitly used by
nir_instrs_equal() as well.
- Rename nir_instr_set_add() to nir_instr_set_add_or_rewrite() (Jason).
- Make the HASH() macro less magical (Topi).
- Rewrite the commit message.
v3:
- For sorting phi sources, use a VLA, store pointers to the sources, and
compare the predecessor pointer directly (Jason).
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
Right now nir_instrs_equal() is tied pretty tightly to CSE, but we're
going to introduce the idea of an instruction set and tie it to that
instead. In anticipation of that, move this into its own file where
we'll add the rest of the instruction set implementation later.
v2: Rebase on texture support.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
If a non-const sample number is given to interpolateAtSample it will
now generate an indirect send message with the sample ID similar to
how non-const sampler array indexing works. Previously non-const
values were ignored and instead it ended up using a constant 0 value.
The generator will try to determine if the sample ID is dynamically
uniform via nir_src_is_dynamically_uniform. If not it will query the
pixel interpolator in a loop, once for each different live sample
number. The next live sample number is found using emit_uniformize. If
multiple live channels have the same sample number then they will be
handled in a single iteration of the loop. The loop is necessary
because the indirect send message doesn't seem to have a way to
specify a different value for each fragment.
This fixes the following two Piglit tests:
arb_gpu_shader5-interpolateAtSample-nonconst
arb_gpu_shader5-interpolateAtSample-dynamically-nonuniform
v2: Handle dynamically non-uniform sample ids.
v3: Remove the BREAK instruction and predicate the WHILE directly.
Make the tokens arrays const. (Matt Turner)
v4: Iterate over the live channels instead of each possible sample
number.
v5: Don't special case immediate values in
brw_pixel_interpolator_query. Make a better wrapper for the
function to set up the PI send instruction. Ensure that the SHL
instructions are scalar. (Francisco Jerez).
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
It is possible to directly predicate the WHILE instruction. In this
case there will be a second successor block because the execution can
resume from the instruction after the loop. This will be used in a
subsequent patch.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Adds nir_src_is_dynamically_uniform which returns true if the source
is known to be dynamically uniform. This will be used in a later patch
to add a workaround for cases that only work with dynamically uniform
sources. Note that the function is not definitive, it can return false
negatives (but not false positives). Currently it only detects
constants and uniform accesses. It could easily be extended to include
more cases.
Reviewed-by: Matt Turner <mattst88@gmail.com>
This will allow to split SW and HW queries in an upcoming patch.
While we are at it, make use of nvc0_query struct instead of pipe_query.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
From ARB_program_query_interface:
For the property ARRAY_SIZE, a single integer identifying the number of
active array elements of an active variable is written to <params>. The
array size returned is in units of the type associated with the property
TYPE. For active variables not corresponding to an array of basic types,
the value one is written to <params>. If the variable is a shader
storage block member in an array with no declared size, the value zero
is written to <params>.
v2:
- Unsized arrays of arrays have an array size different than zero
v3:
- Arrays and unsized arrays will have an array_stride > 0. Use it
instead of is_unsized_array flag (Timothy).
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
From ARB_shader_storage_buffer_object:
"When using the ARB_program_interface_query extension to enumerate the
set of active buffer variables, only the first element of arrays (sized
or unsized) will be enumerated"
_mesa_program_resource_array_size() is used when getting the name (and
name length) of the active variables. When it is an unsized array,
we want to indicate it has one active element so the returned name
would have "[0]" at the end.
v2:
- Use array_stride > 0 and array_elements == 0 to detect unsized
arrays. Because of that, we don't need is_unsized_array flag
(Timothy)
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
When the active variable is an array which is already a top-level
shader storage block member, don't return its array size and stride
when querying TOP_LEVEL_ARRAY_SIZE and TOP_LEVEL_ARRAY_STRIDE
respectively.
Fixes the following 12 dEQP-GLES31 tests:
dEQP-GLES31.functional.ssbo.layout.single_basic_array.shared.mat3x4
dEQP-GLES31.functional.ssbo.layout.single_basic_array.shared.row_major_mat3x4
dEQP-GLES31.functional.ssbo.layout.single_basic_array.shared.column_major_mat3x4
dEQP-GLES31.functional.ssbo.layout.single_basic_array.packed.mat3x4
dEQP-GLES31.functional.ssbo.layout.single_basic_array.packed.row_major_mat3x4
dEQP-GLES31.functional.ssbo.layout.single_basic_array.packed.column_major_mat3x4
dEQP-GLES31.functional.ssbo.layout.single_basic_array.std140.mat3x4
dEQP-GLES31.functional.ssbo.layout.single_basic_array.std140.row_major_mat3x4
dEQP-GLES31.functional.ssbo.layout.single_basic_array.std140.column_major_mat3x4
dEQP-GLES31.functional.ssbo.layout.single_basic_array.std430.mat3x4
dEQP-GLES31.functional.ssbo.layout.single_basic_array.std430.row_major_mat3x4
dEQP-GLES31.functional.ssbo.layout.single_basic_array.std430.column_major_mat3x4
v2:
- Fix check when the shader storage block is instanced
- Write auxiliary function to do the check.
v3:
- Check if full_instanced_name is NULL just after allocation (Ilia)
- Remove () from one strcmp() in the if statement (Ilia)
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Tested-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Patch adds missing type (used with NV_read_depth) so that it gets
handled correctly. This fixes errors seen with following CTS test:
ES3-CTS.gtf.GL3Tests.packed_pixels.packed_pixels
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Before, we were unconditionally assigning the TargetIndex field in
_mesa_BindTexture(), even if it was already set properly. Now we
initialize TargetIndex wherever we initialize the Target field, in
_mesa_initialize_texture_object(), finish_texture_init(), etc.
v2: also update the meta_copy_image code. In make_view() the
view_tex_obj->Target field was set, but not the TargetIndex field.
Also, remove a second, redundant assignment to view_tex_obj->Target.
Add sanity check assertions too.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Tested-by: Mark Janes <mark.a.janes@intel.com>
Callers of create_texture() will either pass target=0 or a validated
GL texture target enum so no need to do another error check inside
the loop.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Tested-by: Mark Janes <mark.a.janes@intel.com>
We can now link the unit tests against just libi965_compiler.la. This
lets us drop a lot of DRI driver dependencies, but we still pull in all
of libmesa and more.
This also provides a few standalone users of libi965_compiler.la, which
will help us accidentally using i965_dri.so functions from the compiler.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
This introduces a new libtool helper library, libi965_compiler.la. This
library is moderately self-contained, but still needs to link to all of
libmesa.la among other things.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
brw_get_shader_time_index() is all tangled up in brw_context state and
we can't call it from the compiler. Thanks the Jasons recent
refactoring, we can just get the index and pass to the emit functions
instead.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
We call this from the compiler so move it to brw_shader.cpp.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
This function computes the next power of two, but at least 1024. We can
do that by bitwise or'ing in 1023 and calling util_next_power_of_two().
We use brw_get_scratch_size() from the compiler so we need it out of
brw_program.c. We could move it to brw_shader.cpp, but let's make it a
small inline function instead.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
brw_program.c won't be part of the compiler library, but we need
brw_mark_surface_used() in the compiler. Move to brw_shader.cpp.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
The initial motivation for this patch was to avoid calling
brw_cs_prog_local_id_payload_dwords() in gen7_cs_state.c from the
compiler. This commit ends up refactoring things a bit more so as to
split out the logic to build the local id payload to brw_fs.cpp. This
moves the payload building closer to the compiler code that uses the
payload layout and makes it available to other users of the compiler.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
We want to use the rest of brw_shader.cpp with the rest of the compiler
without pulling in the GLSL linking code.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
We need the debug flag parsing and INTEL_DEBUG in the compiler, but we
don't want the dependency on bufmgr (libdrm_intel) in there. Move to
intel_screen.c.
There are now only two lines left in brw_process_intel_debug_variable(),
but we keep it in intel_debug.h to avoid having to expose
'debug_control' as a global variable.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
We want to use intel_debug.c in code that doesn't link to dri common.
v2: Remove unnecessary stddef.h include (Topi), use util/debug.h
in all DRI driver and remove driParseDebugString() (Iago).
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
We move these calls one level up into the codegen functions.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
Previously the name of the nir shader was being freed prematurely during
nir_sweep. Since 756613ed35 the name was later being used to generate
filenames for the optimiser debug output and these would end up with
garbage from the dangling pointer.
Co-authored-by: Neil Roberts <neil@linux.intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Comit d48ac93066 addressed this for VS, but we forgot to do the same for
URB writes generated by the gen6 GS.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Varyings can be considered inputs or outputs of a program only when
SSO is in use. With multi-stage programs, inputs contain only inputs
for first stage and outputs contains outputs of the final shader stage.
I've tested that fix works for Assault Android Cactus (demo version)
and does not cause Piglit or CTS regressions in glGetProgramiv tests.
Following ES 3.1 CTS separate shader tests that do query properties
of varyings in SSO shader programs pass:
ES31-CTS.program_interface_query.separate-programs-vertex
ES31-CTS.program_interface_query.separate-programs-fragment
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92122
The EXT_texture_format_BGRA8888 extension (which mesa supports
unconditionally) adds a new format and internal format called GL_BGRA_EXT.
Previously, this was not really handled at all in
_mesa_ex3_error_check_format_and_type. When the checks were tightened in
commit f15a7f3c, we accidentally tightened things too far and GL_BGRA_EXT
would always cause an error to be thrown.
There were two primary issues here. First, is that
_mesa_es3_effective_internal_format_for_format_and_type didn't handle the
GL_BGRA_EXT format. Second is that it blindly uses _mesa_base_tex_format
which returns GL_RGBA for GL_BGRA_EXT. This commit fixes both of these
issues as well as adds explicit checks that GL_BGRA_EXT is only ever used
with GL_BGRA_EXT and GL_UNSIGNED_BYTE.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92265
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Fixes a regression that broke EGL since
commit 858f2f2ae6
Author: Emil Velikov <emil.l.velikov@gmail.com>
Date: Sun Sep 13 12:25:27 2015 +0100
egl/dri2: ease srgb __DRIconfig conditionals
Version 3: Simplify the code comment, word wrap commit description.
Version 2: Return GL_FALSE if ARB_shadow is unsupported instead of
pretending to store the value as suggested by Brian Paul.
This fixes a GL error warning on r200 in Wine.
The GL_ARB_sampler_objects extension does not specify a dependency on
GL_ARB_shadow or GL_ARB_depth_texture for setting the depth texture
compare mode and function. Silently ignore attempts to change these
settings. They won't matter without a depth texture being assigned
anyway.
Reviewed-by: Brian Paul <brianp@vmware.com>
The svga3d device requires constant buffers to be a multiple of 16 bytes
in size. OpenGL UBOs may not fit that restriction. As a work-around,
round the size up if possible, else round down.
Note that this patch only effects UBO constant buffers (index 1 or higher),
not the 0th/default constant buffer.
Fixes the game Grim Fandango Remastered. VMware bug 1510130.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
One can simplify the if-else chain, by declaring the driconfigs as a
two sized array, whist using srgb as a index to the correct entry.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Move all the enums but CONTEXT_FLAGS. The spec seems quite explicit
about the latter (wrt OpenGL ES)
"In OpenGL ES versions prior to and including ES 3.1 there is no
CONTEXT_FLAGS state and therefore the CONTEXT_FLAG_DEBUG_BIT cannot
be queried."
v2 [Emil Velikov] Rebase.
v3 [Emil Veliokv] Drop the CONTEXT_FLAGS hunk - not applicable for GLES
Signed-off-by: Matthew Waters <ystreet00@gmail.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
v2 [Emil Velikov]
- Rebase.
- Correct version in gles11 dispatch_sanity.
- Move the extension enable to a separate patch.
Signed-off-by: Matthew Waters <ystreet00@gmail.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
I started seeing a lot of situations on nv30 where fence emission
wouldn't fit into the previous buffer (causing assertions). This ensures
that whenever checking for space, we always leave a bit of extra room
for the fence emission commands. Adjusts the nv30 and nvc0 fence
emission logic to bypass the space checking as well.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Similar to 9ffc1049ca (freedreno/ir3: use nir two-sided-color lowering).
No piglit regression.
Signed-off-by: Boyan Ding <boyan.j.ding@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Bring the following commit over to i915:
commit ec542d7457
Author: Eric Anholt <eric@anholt.net>
Date: Mon Mar 3 10:43:10 2014 -0800
i965: Drop broken front_buffer_reading/drawing optimization.
Not sure if it might fix anything, but since the i965 and i915 used to
share a bunch of that code, it would seem reasonable the same problems
could be present in the i915 code still, and the i965 approach is well
tested by now so bringing it over seems fairly safe.
No piglit regressions on 855.
v2: Rebase on _mesa_is_front_buffer_* refactor.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
There are multiple similar implementations of these functions, and a
later patch was going to add another.
v2: Move removing intel_framebuffer to a different patch.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
v2: Since state_tracker does not call _mesa_init_driver_functions, we
need to initialize the dd::NewFramebuffer pointer to
_mesa_new_framebuffer here. Suggested by Brian.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Flip the cull bits when rendering to a user fbo on gen2. This
was already done on gen3 (since before git history starts)
but was missing from the gen2 code.
Fixes rendering of the driver+kart model in supertuxkart kart
selection screen.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The hardware can draw lines 0.5 to 7.5 pixels wide. Adjust the limits
to 1.0-7.0. The old limits seems to be from the era when i915 and i965
were sharing this code.
Not really sure if 1.0-7.0 is correct. Maybe it could be 0.5.7.5 as
those are the hw limits, or maybe some combination of the two?
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The sub-pixel adjustment for points was killed off in
commit 60d762aa62
Author: Xiang, Haihao <haihao.xiang@intel.com>
Date: Wed Jan 2 11:38:51 2008 +0800
i915: Needn't adjust pixel centers. fix#12944
so if we don't need it in intel_tris.c we don't need it in
intel_render.c either, which means we can allow intel_render.c to render
points.
No apparent regressions on PNV in ES1 or ES2 conformance.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
intelFastRenderClippedPoly() renders the polygon using triangles. For
polygons the provoking vertex is always the first one, and currently
this function assumes that the provoking vertex for triangles is the
last one. In case the user changed the provoking vertex convention,
the hardware may be configured to treat the first vertex of triangles
as the provoking vertex. So check the convention and emit the triangles
in the appropriate order to avoid having to change the hardware
provoking vertex convention for rendering polygons.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
When drawing quads using triangles we need to be careful to make
the provoking vertices match when flat shading.
v2: Major rebase on top of Ian's other t_dd_dmatmp.h work.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
When rendering quad strips via tri strips we can't get the provoking
vertex right, so disallow flat shading.
v2: Major rebase on top of Ian's other t_dd_dmatmp.h work.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
We can allow rendering flat shaded polygons using tri fans if we check
the provoking vertex convention.
v2 (idr): Remove _EXT suffixes from GL_FIRST_VERTEX_CONVENTION.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
From http://lists.freedesktop.org/archives/mesa-dev/2015-May/084883.html:
"There are no real error cases here, just dead code.
validate_render() is supposed to make sure we never call these
functions if the code can't actually render the primitives. The
fprintf()+return branches should really just contain assert(0) or
equivalent."
I also rearranged the if-else-block in render_quad_strip_verts to look
more like the other functions. A future patch is going to change a
bunch of that code anyway.
v2: Make "unreachable" message more descriptive. Suggested by Iago.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Suggested-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
With NIR, it actually hurts things.
total instructions in shared programs: 6529329 -> 6528888 (-0.01%)
instructions in affected programs: 14833 -> 14392 (-2.97%)
helped: 299
HURT: 1
In all affected programs I inspected (including the single hurt one) the
pass CSE'd some multiplies and caused some reassociation (e.g., caused
(A * B) * C to be A * (B * C)) when the original intermediate result was
reused elsewhere.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
We're not using any fs_inst fields, and the next commit will make the
peephole used by the vec4 backend.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
We never emit IF instructions with an embedded comparison (lost in the
switch to NIR), so this code is not used. If we want to readd support,
we should have a pass that merges a CMP instruction with an IF or a
WHILE instruction after other optimizations have run.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
According to the Intel Software Development Manual (Volume 1: Basic
Architecture, 12.10.3 Streaming Load Hint Instruction):
Streaming loads may be weakly ordered and may appear to software to
execute out of order with respect to other memory operations.
Software must explicitly use fences (e.g. MFENCE) if it needs to
preserve order among streaming loads or between streaming loads and
other memory operations.
That is, a memory fence is needed to preserve the order between the GPU
writing the buffer and the streaming loads reading it back.
Reported-by: Joseph Nuzman <joseph.nuzman@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
There are three types of fast clears:
a. fast depth clears
b. fast singlesample color clears
c. fast multisample color clears
Function intel_miptree_is_fast_clear_capable() checks if a miptree
supports fast clears of type (b).
Rename the function to disambiguate what it does:
old: intel_miptree_is_fast_clear_capable
new: intel_miptree_supports_non_msrt_fast_clear
The functionally accidentally rejected multisampled color surfaces
because it thought they were singlesample array surfaces. Fix that by
explicitly rejecting surfaces with samples > 1.
This fix would have been needed before we enabled layered fast
singlesample color clears (introduced in gen8), which we want to do
eventually. For now, though, this patch changes no behavior; it just
fixes how the driver chooses its behavior.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
intel_tiling_supports_non_msrt_mcs() and
intel_miptree_is_fast_clear_capable() are not used outside of
intel_mipmap_tree.c.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
We need a virtual destructor when at least one of the class' methods is virtual.
Failure to do so might lead to undefined behavior when destructing derived classes.
Fixes the following warning:
brw_vec4_gs_visitor.cpp: In function 'const unsigned int* brw::brw_gs_emit(brw_context*, gl_shader_program*, brw_gs_compile*, void*, unsigned int*)':
brw_vec4_gs_visitor.cpp:703:11: warning: deleting object of polymorphic class type 'brw::vec4_gs_visitor' which has non-virtual destructor might cause undefined behaviour [-Wdelete-non-virtual-dtor]
delete gs;
Curro: This shouldn't be causing any actual bugs at the moment because
gen6_gs_visitor is the only subclass of vec4_visitor destroyed through
a pointer of a base class (vec4_gs_visitor *) and its destructor is
basically the same as its parent's. Anyway it seems sensible to change
this so it doesn't bite us in the future.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Fixes following Piglit test:
global-scope-binding-qualifier.frag
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
In theory we can't break this assertion since the compiler frontend checks
that we don't exceed any of the individual limits, but it does not hurt to
be extra safe.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
These share the space with UBO surfaces but we need to make sure we
allocate enough space for both sets (12 of each)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The thing you want to do with the output files is diff them, which is
made more difficult by line numbers changing.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
It seems like things are either coming in slighly wrong, or perhaps
uploaded incorrectly, but either way passing them through the translate
module seems to fix everything. Eventually we should figure out what's
going wrong and fix it "for real", but this should do for now.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
This puts us in line with what the DDX/DRI2 st are expecting. It also
happens to work... no idea why, but seems better to have it work than to
ask lots of questions.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
The uniform will only be of a single type so store the data for
opaque types in a single array.
Cc: Francisco Jerez <currojerez@riseup.net>
Cc: Ilia Mirkin <imirkin@alum.mit.edu>
Geometry and tessellation shaders process multiple vertices; their
inputs are arrays indexed by the vertex number. While GLSL makes
this look like a normal array, it can be very different behind the
scenes.
On Intel hardware, all inputs for a particular vertex are stored
together - as if they were grouped into a single struct. This means
that consecutive elements of these top-level arrays are not contiguous.
In fact, they may sometimes be in completely disjoint memory segments.
NIR's existing load_input intrinsics are awkward for this case, as they
distill everything down to a single offset. We'd much rather keep the
vertex ID separate, but build up an offset as normal beyond that.
This patch introduces new nir_intrinsic_load_per_vertex_input
intrinsics to handle this case. They work like ordinary load_input
intrinsics, but have an extra source (src[0]) which represents the
outermost array index.
v2: Rebase on earlier refactors.
v3: Use ssa defs instead of nir_srcs, rebase on earlier refactors.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
get_io_offset() already walks the dereference chain and discovers
whether or not we have an indirect; we can just return that rather than
computing it a second time via deref_has_indirect(). This means moving
the call a bit earlier.
By returning a nir_ssa_def *, we can pass back both an existence flag
(via NULL checking the pointer) and the value in one parameter. It
also simplifies the code somewhat. nir_lower_samplers works in a
similar fashion.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
This is only a half of the work. The next patch will handle
gl_SampleID/SamplePos, which is the other half of ARB_sample_shading.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Required by ARB_sample_shading for drivers that don't want a shader variant
in st/mesa.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Acked-by: Roland Scheidegger <sroland@vmware.com>
We will only do depth-only or stencil-only decompress blits, whichever is
needed by textures, instead of always doing both.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Now that everything comes in through NIR, we can pick this directly out of
the shader source and don't need to reference the gl_fragment_program.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This is a common enough operation that it's nice to not have to think about
the arguments to foreach_list_typed every time.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Drivers and state trackers that use LLVM for generating code, must
register the targets they use with LLVM's global TargetRegistry.
The TargetRegistry is not thread-safe, so all targets must be added
to the registry before it can be queried for target information.
When drivers and state trackers initialize their own targets, they need
a way to force gallivm to initialize its targets at the same time.
Otherwise, there can be a race condition in some multi-threaded
applications (e.g. glx-multihreaded-shader-compile in piglit),
when one thread creates a context for a driver that uses LLVM (e.g.
radeonsi) and another thread creates a gallivm context (glxContextCreate
does this).
The race happens when the driver thread initializes its LLVM targets and
then starts using the registry before the gallivm thread has a chance to
register its targets.
This patch allows users to force gallivm to register its targets by
calling the gallivm_init_llvm_targets() function.
v2:
- Use call_once and remove mutexes and static initializations.
- Replace gallivm_init_llvm_{begin,end}() with
gallivm_init_llvm_targets().
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
CC: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
Unfortunately, we can't get rid of them entirely. The FS backend still
needs gl_program for handling TEXTURE_RECTANGLE. The GS vec4 backend still
needs gl_shader_program for handling transfom feedback. However, the VS
needs neither and we can substantially reduce the amount they are used.
One day we will be free from their tyranny.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
It doesn't exist for anything other than an assert that, as far as I can
tell, isn't possible to trip. Soon, we will remove prog from the visitor
entirely and this will become even more impossible to hit.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The texunit variable we create and assign in nir_emit_texture gets passed
through two more layers of function calls before it gets to its sole use in
rescale_texcoord. The best part is that we already pass the sampler into
rescale_texcoord so we can just look it up there.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
As of now, uniform setup is more-or-less unified between vec4 and fs and no
longer requires the fs_visitor. This makes uniform setup more of a
language/API thing than a backend compiler thing. This commit moves
setting up the stage_prog_data.params arrays to the same place as we set up
the rest of stage_prog_data.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Setting up binding tables really has little to do with the actual process
of turning shaders into instructions; it's more part of setting up
prog_data. This commit moves it out of the visitors and with the rest of
the prog_data setup stuff.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This really has nothing to do with the backend compiler and we'd like to
eventually be able to set this up earlier in the compile process.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The way we deal with GLSL uniforms and builtins is basically the same in
both the vec4 and the fs backend. This commit takes the best parts of both
implementations and pulls the common code into a shared helper function.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
The way we deal with ARB program uniforms is basically the same in both the
vec4 and the fs backend. This commit takes the best parts of both
implementations and pulls the common code into a shared helper function.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Previously, we were counting up uniforms as we set them up. However, this
count should be exactly identical to shader->num_uniforms provided by
nir_assign_var_locations. (If it's not, we're in trouble anyway because
that means that locations don't match up.) This matches what the fs
backend is already doing.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
I tried to do this once before but Curro pointed out that having it in
backend_shader meant it could use the setup_vec4_uniform_values helper
which did different things in vec4 and fs. Now the setup_uniform_values
function differs only by an assert in the two backends so there's no real
good reason to be using it anymore.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
The uniform_vector_size array was only ever used by pack_uniform_registers
which no longer needs it.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Previously, pack_uniform_registers worked based on the size of the uniform
as given to us when we initially set up the uniforms. However, we have to
walk through the uniforms and figure out liveness anyway, so we migh as
well record the number of channels used as we go. This may also allow us
to pack things tighter in a few cases.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
That way, if we do the usual thing of multiplying vector_elements by
matrix_columns we get the actual number of components in the type as per
component_slots().
While we're at it, we also switch to using the actual C++ field
initializers for vector_elements and matrix_columns.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Previously, we had a bunch of code in each stage to figure out how many
slots we needed in stage_prog_data.param. This code was mostly identical
across the stages and had been copied and pasted around. Unfortunately,
this meant that any time you did something special, you had to add code for
it to each of these places. In particular, none of the stages took
subroutines into account; they were working entirely by accident. By
taking this data from the NIR shader, we know the exact number of entries
we need and everything goes a bit smoother.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
The next commit will add code to codegen_vs_prog that requires the NIR
shader to be there in all cases. It doesn't hurt anything to just move it
from brw_vs_emit to its only caller.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
GLSL IR vs. NIR shader-db results for vec4 programs on i965:
total instructions in shared programs: 1499328 -> 1388354 (-7.40%)
instructions in affected programs: 1245199 -> 1134225 (-8.91%)
helped: 7469
HURT: 2440
GLSL IR vs. NIR shader-db results for vec4 programs on G4x:
total instructions in shared programs: 1436799 -> 1325825 (-7.72%)
instructions in affected programs: 1205599 -> 1094625 (-9.20%)
helped: 7469
HURT: 2440
GLSL IR vs. NIR shader-db results for vec4 programs on Iron Lake:
total instructions in shared programs: 1436654 -> 1325682 (-7.72%)
instructions in affected programs: 1205503 -> 1094531 (-9.21%)
helped: 7468
HURT: 2440
GLSL IR vs. NIR shader-db results for vec4 programs on Sandy Bridge:
total instructions in shared programs: 2016249 -> 1787033 (-11.37%)
instructions in affected programs: 1850547 -> 1621331 (-12.39%)
helped: 14856
HURT: 1481
GLSL IR vs. NIR shader-db results for vec4 programs on Ivy Bridge:
total instructions in shared programs: 1848027 -> 1648216 (-10.81%)
instructions in affected programs: 1660279 -> 1460468 (-12.03%)
helped: 14668
HURT: 1369
GLSL IR vs. NIR shader-db results for vec4 programs on Bay Trail:
total instructions in shared programs: 1848027 -> 1648216 (-10.81%)
instructions in affected programs: 1660279 -> 1460468 (-12.03%)
helped: 14668
HURT: 1369
GLSL IR vs. NIR shader-db results for vec4 programs on Haswell:
total instructions in shared programs: 1848027 -> 1648216 (-10.81%)
instructions in affected programs: 1660279 -> 1460468 (-12.03%)
helped: 14668
HURT: 1369
I also ran our full suite of benchmarks on a Haswell and had the following
statistically significant (according to ministat) changes:
Test master-glsl master-nir diff
bench_OglGeomPoint 461.556 463.006 1.450
bench_OglTerrainFlyInst 184.484 187.574 3.090
bench_OglTerrainPanInst 132.412 136.307 3.895
bench_OglTexFilterAniso 19.653 19.645 -0.008
bench_OglTexFilterTri 58.333 58.009 -0.324
bench_OglVSInstancing 65.049 65.327 0.278
bench_trexoff 69.474 69.694 0.220
bench_valley 40.708 41.125 0.417
v2 (Jason Ekstrand):
- Remove more uses of NirOptions as a switch
- New shader-db numbers
- Added benchmark numbers
Reviewed-by: Matt Turner <mattst88@gmail.com>
Commit 0a1adaf11d (nir: Report progress
from nir_lower_system_values().) introduced a bug caught by Valgrind:
==823== Conditional jump or move depends on uninitialised value(s)
==823== at 0xB09020C: convert_block (nir_lower_system_values.c:68)
==823== by 0xB079FB8: foreach_cf_node (nir.c:1310)
==823== by 0xB07A0AF: nir_foreach_block (nir.c:1336)
==823== by 0xB09026B: convert_impl (nir_lower_system_values.c:79)
...
==823== Uninitialised value was created by a stack allocation
==823== at 0xB090249: convert_impl (nir_lower_system_values.c:76)
which is trivially fixed by initializing progress.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Add a macro GL_LIB_NAME to hold the filename that configure comes up with
based on the --with-gl-lib-name and --enable-mangling options.
In driOpenDriver, use the GL_LIB_NAME macro instead of hard-coding
"libGL.so.1".
v2: Add an #ifndef/#define for GL_LIB_NAME so that non-autoconf builds will
work.
v3: Fix the library filename in the Makefile.
Signed-off-by: Kyle Brenneman <kbrenneman@nvidia.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
When USE_MGL_NAMESPACE is defined, _glapi_get_stub will check for the "m"
prefix before trying to skip it, so that "glFoo" and "mglFoo" are
equivalent.
This should let it work with all the places where something calls
_glapi_get_proc_offset with a hard-coded name that starts with the normal
"gl" prefix.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=55552
Signed-off-by: Kyle Brenneman <kbrenneman@nvidia.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
Fixes following Piglit test:
member-invalid-binding-qualifier.frag
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
When writing to a column of a row-major matrix, each component of the
vector is stored to non-consecutive memory addresses, so we generate
one instruction per component.
This patch skips the disabled components in the writemask, saving some
store instructions plus avoid storing wrong data on each disabled
component.
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Fixes a failing subtest in:
ES31-CTS.shader_storage_buffer_object.negative-glsl-compileTime
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reading this output was really confusing. reg represents attribute
slots; reg_offset is the x/y/z/w component (0..3) within a vec4 slot.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
The code for input lowering is going to get significantly more
complicated shortly, so I wanted to pull it out. Vertex shader inputs
are handled nearly identically regardless of vec4/scalar mode, so I
opted to not split that.
I thought about having each function actually do the lowering, but one
pass through nir_lower_io that handles all types (which weren't handled
earlier) is probably more efficient.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
We may want to use different type_size functions for (e.g.) inputs
vs. uniforms. Passing in -1 for mode ignores this, handling all
modes as before.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
If the texture object exists, but the Name field is zero, it means
the object was created but never bound to a target. Trying to bind it
in _mesa_BindTextureUnit() should generate GL_INVALID_OPERATION.
Fixes piglit's arb_direct_state_access-bind-texture-unit test.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
This helper was only called from _mesa_BindTextureUnit(). It's simpler
to just inline it.
The error check / code / message in the helper was incorrect. It was
written for glBindTextures(), not glBindTextureUnit(). The correct
error for a bad texture unit number is GL_INVALID_VALUE. The error
message now reports the unit number rather than a GL_TEXTUREi enum.
Fixes a failure in piglit's arb_direct_state_access-bind-texture-unit test.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Before, we were doing the actual _mesa_reference_texobj() call and
ctx->Driver.BindTexture() and misc housekeeping in three different
places. This consolidates the common code in a new bind_texture()
function.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Since we store both in UniformBlocks, we can't just compute the index by
subtracting the array address start, we need to count the number of
buffers of the approriate type.
v2:
- Just fall back to calc_resource_index (Tapani)
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
The old code had some significant problems with respect to
sampler2DArray textures. The biggest problem was that some of the code
would use vec3 for the texture coordinate type, and other parts of the
code would use vec2. The resulting shader would not even compile.
Since there were not tests for this path, nobody noticed.
The input to the fragment shader is always treated as a vec3. If the
source data is only vec2, the vertex puller will supply 0 for the .z
component. The texture coordinate passed to the fragment shader is
always a vec2 that comes from the .xy part of the vertex shader input.
The layer, taken from the .z of the vertex shader input is passed
separately as a flat integer. If the generated fragment shader does not
use the layer integer, the GLSL linker will eliminate all the dead code
in the vertex shader.
Fixes the new piglit tests "blit-scaled samples=2 with
gl_texture_2d_multisample_array", etc. on i965.
Note for stable maintainer: This patch may depend on 46037237, and that
patch should be safe for stable.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Cc: Topi Pohjolainen <topi.pohjolainen@intel.com>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
Add comments that link the driver's miptree structures to the hardware
structures documented in the PRM. This provides sorely needed
orientation to developers new to the miptree code. And for miptree
veterans, this clarifies some of the more obscure miptree data.
For each driver struct field that closely corresponds to a
hardware struct field, add a PRM reference to that hardware field's
name. For example,
struct intel_mipmap_tree {
...
/**
* @brief One of GL_TEXTURE_2D, GL_TEXTURE_2D_ARRAY, etc.
*
* @see RENDER_SURFACE_STATE.SurfaceType
* @see RENDER_SURFACE_STATE.SurfaceArray
* @see 3DSTATE_DEPTH_BUFFER.SurfaceType
*/
GLenum target;
...
};
Also annotate the INTEL_MSAA_LAYOUT_* enums with the name of the PRM
sections that documents the layout.
v2: Replace "2D subimage" with "slice", and define what a "slice" is.
For Ben.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com> (v1)
Reviewed-by: Ben Widawsky <benjamin.widawsky@intel.com> (v1)
The values of intel_mipmap_tree::align_w and ::align_h correspond to the
hardware enums HALIGN_* and VALIGN_*.
See the confusion?
align_h != HALIGN
align_h == VALIGN
Reduce the confusion by renaming the variables to match the hardware
enum names:
git ls-files |
xargs sed -i -e 's/align_w/halign/g' \
-e 's/align_h/valign/g'
Suggested-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Ben Widawsky <benjamin.widawsky@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Because that's what it is. It's an untiled, *linear* miptree.
v2:
- Add space after /*.
- Use one comment per function argument.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Acked-by: Ben Widawsky <benjamin.widawsky@intel.com>
The comment for intel_miptree_map::mode claimed that it was a bitmask of
GL_MAP_{READ,WRITE,INVALIDATE}_BIT. In reality, the bitmask may include
any of {GL,BRW}_MAP_*_BIT.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Acked-by: Ben Widawsky <benjamin.widawsky@intel.com>
Clarify that this bit extends the set of GL_MAP_*_BIT enums.
Also fix typo of "temporary".
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Acked-by: Ben Widawsky <benjamin.widawsky@intel.com>
Jason open coded this in 60befc63 when cleaning up some ugly code;
using our existing helper tidies it up a bit more.
v2: Drop inline (suggested by Matt).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
i915 fragment programs utilize the texture coordinate registers
for both texture coordinates and varyings. Unfortunately the
code doesn't check if the same index might be in use for both.
It just naively uses the index to pick a texture unit, which
could lead to collisions.
Add an extra mapping step to allocate non conflicting texture
units for both uses.
The issue can be reproduced with a pair of simple shaders like
these:
attribute vec4 in_mod;
varying vec4 mod;
void main() {
mod = in_mod;
gl_TexCoord[0] = gl_MultiTexCoord0;
gl_Position = gl_ModelViewProjectionMatrix * gl_Vertex;
}
varying vec4 mod;
uniform sampler2D tex;
void main() {
gl_FragColor = texture2D(tex, vec2(gl_TexCoord[0])) * mod;
}
Fixes many piglit tests on i915:
glsl-link-varyings-2
glsl-orangebook-ch06-bump
interpolation-none-gl_frontcolor-smooth-fixed
interpolation-none-gl_frontcolor-smooth-none
interpolation-none-gl_frontcolor-smooth-vertex
interpolation-none-gl_frontsecondarycolor-smooth-fixed
interpolation-none-gl_frontsecondarycolor-smooth-vertex
interpolation-none-gl_frontsecondarycolor-smooth-none
interpolation-none-other-flat-fixed
interpolation-none-other-flat-none
interpolation-none-other-flat-vertex
interpolation-none-other-smooth-fixed
interpolation-none-other-smooth-none
interpolation-none-other-smooth-vertex
v2 [idr]: Minor formatting tweaks.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
I830_UPLOAD_RASTER_RULES and I830_UPLOAD_TEX(0) are trying to occupy
the same bit. Move the texture bits upwards a bit to make room for
I830_UPLOAD_RASTER_RULES.
Now the driver will actually upload the raster rules which is rather
important to get the provoking vertex right. Fixes the appearance
of glxgears teeth on gen2.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
Commit 1665d29ee3 introduced an incorrect
format specifier that operates on GLintptr indirect within the function
_mesa_DispatchComputeIndirect().
This patch mitigates the introduced GCC warning:
src/mesa/main/compute.c: In function '_mesa_DispatchComputeIndirect':
src/mesa/main/compute.c:53:7: warning: format '%d' expects argument of type 'int', but argument 3 has type 'GLintptr' [-Wformat=]
_mesa_debug(ctx, "glDispatchComputeIndirect(%d)\n", indirect);
^
v2: Amend for Boyan Ding <boyan.j.ding@gmail.com> feedback.
Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
intel_update_winsys_renderbuffer_miptree() will release the existing
miptree when wrapping a new DRI2 buffer, so we can remove the early
release and so prevent a NULL mt dereference should importing the new
DRI2 name fail for any reason. (Reusing the old DRI2 name will result
in the rendering going astray, to a stale buffer, and not shown on the
screen, but it allows us to issue a warning and not crash much later in
innocent code.)
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=86281
Reviewed-by: Martin Peres <martin.peres@linux.intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
From GLSL 1.50 spec, section 4.1.8 "Structures":
"Structures must have at least one member declaration."
So the base_alignment should be higher than zero.
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
If the string being copied is not NULL-terminated the result of
strlen() is undefined.
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Neil Roberts <neil@linux.intel.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Fix OpenGL ES 3.1 conformance tests: advanced-readWrite-case1-vsfs
and advanced-matrix-vsfs.
v2:
- Fix SHADER_OPCODE_MEMORY_FENCE emission and the allocation of 'tmp'
(Francisco).
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Tested-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
From ARB_program_interface_query:
"For an active shader storage block member declared as an array, an
entry will be generated only for the first array element, regardless
of its type. For arrays of aggregate types, the enumeration rules are
applied recursively for the single enumerated array element."
v2:
- Simplify 'if' conditions and return true if it is not a buffer
variable, because these rules only apply to buffer variables (Timothy).
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
This matches the function signature created in
lower_ubo_reference_visitor::ssbo_store which has a void return.
Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
At least on Intel hardware, gl_PrimitiveIDIn comes in as a special part
of the payload rather than a normal input. This is typically what we
use system values for. Dave and Ilia also agree that a system value
would be nicer.
At some point, we should change it at the GLSL IR level as well. But
that requires changing most of the drivers. For now, let's at least
make NIR do the right thing, which is easy.
v2: Add a comment about not creating a temporary (suggested by Iago).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
For 8-bit RGB(A) texture formats we set the PIPE_BIND_RENDER_TARGET flag
to try to get a hardware format which also supports rendering (for FBO
textures). Do the same thing for floating point formats.
This allows the Redway3D Flat demo to run.
Cc: 10.6 11.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
I've temporarily added code like this many times. Wrap it in a
conditional that can be enabled when needed.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This code also sets cs_prog_data->uses_num_work_groups which is later
used by state setup to indicate that the gl_NumWorkGroups surface
needs to be setup.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
This will only be setup when the prog_data uses_num_work_groups
boolean is set.
At this point nothing will set uses_num_work_groups, but soon code
will set it when emitting code for the intrinsic that loads
gl_NumWorkGroups.
We can't emit this surface information earlier at the start of the
DispatchCompute* call because we may not have generated the program
yet. Until we generate the program, we don't know if the
gl_NumWorkGroups variable is accessed.
We also can't emit the surface as part of the brw_cs_state atom,
because we might not need the surface if gl_NumWorkGroups is not used
by the program.
Lastly, we cannot emit the surface later (after state upload) in the
DispatchCompute* call, because it needs to be run before the
brw_cs_state atom is emitted, since it changes the surface state.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
If glDispatchComputeIndirect is used, then the value for this variable
must be read from the indirect BO.
To allow the same generated code to support indirect and
glDispatchCompute, we will also setup a BO for the number of work
groups using the intel_upload_data mechanism. This will only be
required if the gl_NumWorkGroups variable is accessed.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
We will need this in an atom to setup a surface to read the
gl_NumWorkGroups values from.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Unlike rendering (BINDING_TABLE_POINTERS_*S), compute doesn't have a
binding table pointers command. Instead it is part of the
MEDIA_INTERFACE_DESCRIPTOR structure loaded by the brw_cs_state atom.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
We need to re-emit push constansts when a new batch is started since
the push constants are stored in the batch. We also need to re-emit
the MEDIA_INTERFACE_DESCRIPTOR (in brw_cs_state) since it is stored in
the batch.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Patch also refactors name length queries which were using array size
in computation, this has to be done in same time to avoid regression in
arb_program_interface_query-resource-query Piglit test.
Fixes rest of the failures with
ES31-CTS.program_interface_query.no-locations
v2: make additional check only for GS inputs
v3: create helper function for resource name length
so that it gets calculated only in one place
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Martin Peres <martin.peres@linux.intel.com>
The comment says that it should be impossible for decl_type to be NULL
here, so don't try to handle the case where it is, simply add an assert.
>>> CID 1324977: Null pointer dereferences (FORWARD_NULL)
>>> Comparing "decl_type" to null implies that "decl_type" might be null.
No piglit regressions observed.
Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
Add an assert on the result of as_dereference() not being NULL:
>>> CID 1324978: Null pointer dereferences (NULL_RETURNS)
>>> Dereferencing a null pointer "deref_record->record->as_dereference()".
Since we are introducing a new variable to hold the result of
as_dereference(), take the opportunity to rename deref_record_type to
interface_type and just name the new variable interface_deref, which is
less confusing.
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
We don't use param in this part of the code, so no point in advancing
the pointer forward:
>>> CID 1324983: Code maintainability issues (UNUSED_VALUE)
>>> Assigning value from "param->get_next()" to "param" here, but that stored value is overwritten before it can be used.
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
v2:
- Add strndup.h to Makefile.sources (Emil)
- Use calloc instead of malloc (Emil).
- Check if allocation fails (Emil, Jose)
- Add '#pragma once' and include stdlib.h to strndup.h (Jose)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92124
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Because it counts shader storage blocks too.
v2:
- Use NumBufferInterfaceBlocks instead (Jordan).
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
NumUniformBlocks also counts shader storage blocks.
NumUniformBlocks variable will be renamed in a later patch to avoid
misunderstandings.
v2:
- Modify the condition to use !IsShaderStorage and the list of
uniform blocks (Timothy)
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
This condition restricts the use of fast copy blit to cases
where starting pixel of src and dst is oword (16 byte) aligned.
Many piglit tests (if using fast copy blit in Mesa) failed earlier
because I missed adding this condition.Fast copy blit is currently
enabled for use only with Yf/Ys tiling.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
The bo will often come from a slab in which case it doesn't matter. But
for larger allocations this will be in its own bo, and we have to make
sure to wait until it's no longer used in order for it to be freed.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
Tested-by: Marcin Ślusarz <marcin.slusarz@gmail.com>
If there is an unflushed fence on the bo, then the resource may still be
used in commands built up in the local pushbuf. Flushing can cause all
sorts of unwanted effects, so just free the bo when the relevant fence
is hit.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
Tested-by: Marcin Ślusarz <marcin.slusarz@gmail.com>
This function isn't specific to miptrees. So, drop the "miptree"
from function name.
V3: Add a comment explaining how the 1D Array texture height and
depth is interpreted by Intel hardware.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
I misinterpreted the alignmnet restriction in XY_FAST_COPY_BLT earlier.
Instead of checking pitch for 64KB alignmnet we need to check it for
tile widh alignment.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
Current code checks the alignment restrictions only for Y tiling.
From Broadwell PRM vol 10:
"pitch is of 512Byte granularity for Tile-X: This means the tiled-x
surface pitch can be (512, 1024, 1536, 2048...)/4 (in Dwords)."
This patch adds the restriction for X tiling as well.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Chad Versace <chad.versace@intel.com>
It takes care of using the correct tile width if we later use other
tiling patterns for aux miptree.
V2: Remove the comment about using Yf for aux miptree.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
This will require change in the parameters passed to
intel_miptree_get_tile_masks().
V2: Rearrange the order of parameters. (Ben)
Change the name to intel_get_tile_masks(). (Topi)
V3: Use temporary variables in intel_get_tile_masks()
for clarity. Fix mask_y computation.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
V2:
- Do the tile width/height computations in the new helper
function and use it later in intel_miptree_get_tile_masks().
- Change the name to intel_get_tile_dims().
V3: Return the tile_h in number of rows in place of bytes.
Document the units of tile_w, tile_h parameters.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
When validating format+type+internalFormat for texture pixel operations
on GLES3, the effective internal format should be used if the one
specified is an unsized internal format. Page 127, section "3.8 Texturing"
of the GLES 3.0.4 spec says:
"if internalformat is a base internal format, the effective internal
format is a sized internal format that is derived from the format and
type for internal use by the GL. Table 3.12 specifies the mapping of
format and type to effective internal formats. The effective internal
format is used by the GL for purposes such as texture completeness or
type checks for CopyTex* commands. In these cases, the GL is required
to operate as if the effective internal format was used as the
internalformat when specifying the texture data."
v2: Per the spec, Luminance8Alpha8, Luminance8 and Alpha8 should not be
considered sized internal formats. Return the corresponding unsize format
instead.
v4: * Improved comments in
_mesa_es3_effective_internal_format_for_format_and_type().
* Splitted patch to separate chunk about reordering of
error_check_subtexture_dimensions() error check, which is not directly
related with this patch.
v5: Dropped the splitted patch because it was actually a work around 3
dEQP tests that are buggy:
dEQP-GLES2.functional.negative_api.texture.texsubimage2d_neg_offset
dEQP-GLES2.functional.negative_api.texture.texsubimage2d_offset_allowed
dEQP-GLES2.functional.negative_api.texture.texsubimage2d_neg_wdt_hgt
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Tested-by: Mark Janes <mark.a.janes@intel.com>
This function will be needed as part of validating the combination of format,
type and internal format of texture pixel operations, which happens in
glformats files. Specifically, we want to be able to obtain the base format
of a resolved effective internal format, to compare it with the original
internal format passed.
Also, since this function deals solely with GL formats, it fits better in
glformats where the rest of similar format functionality rests.
The function is moved as-is, without any modification.
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Tested-by: Mark Janes <mark.a.janes@intel.com>
The more specific GLES constrains should be checked after the general
validation performed by _mesa_error_check_format_and_type(). This is also
for consistency with the error checks order of glTexSubImage ops.
v3: The change of order uncovered a bug that regresses a couple of piglit
tests written against OpenGL-ES 1.1 spec, which expects an INVALID_VALUE
instead of the INVALID_ENUM returned by _mesa_error_check_format_and_type()
when an invalid format is passed to glTexImage2D. This version of the patch
accounts for those cases.
Fixes 1 dEQP test:
* dEQP-GLES3.functional.negative_api.texture.teximage2d
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Tested-by: Mark Janes <mark.a.janes@intel.com>
IVB and VLV hang sporadically when an untyped surface read or write
message is used to access a surface of format other than RAW, as may
happen when there is a mismatch between the format qualifier of the
image uniform and the format of the actual image bound to the
pipeline. According to the spec this condition gives undefined
results but may not lead to program termination (which is one of the
possible outcomes of the hang). Fix it by checking at runtime whether
the surface is of the right type.
Fixes the "arb_shader_image_load_store.invalid/format mismatch" piglit
subtest.
Reported-by: Mark Janes <mark.a.janes@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91718
CC: mesa-stable@lists.freedesktop.org
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This will be used to share the same logic between buffer and image
creation.
v2: Make memory flag set constants local to validate_flags. (Serge
Martin)
This reverts commit 586142658e.
The specs are not explicit about any restrictions related to the types allowed
on buffer variables, however, the description of opaque types (like atomic
counters) is in conclict with the purpose of buffer variables:
"The opaque types declare variables that are effectively opaque
handles to other objects. These objects are
accessed through built-in functions, not through direct reading or
writing of the declared variable.
(...)
Opaque variables cannot be treated as l-values;(...)"
Also, Mesa is already disallowing opaque types in interface blocks anyway, so
that commit was not really achieving anything.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
With static vertex counts, the final EOT write doesn't actually write
any data - it's just there to end the thread. Typically, the last
thing before ending the thread will be an EmitVertex() call, resulting
in a URB write. We can just set EOT on that.
Note that this isn't always possible - there might be an intervening
SSBO write/image store, or the URB write may have been in a loop.
shader-db statistics for geometry shaders only:
total instructions in shared programs: 3173 -> 3149 (-0.76%)
instructions in affected programs: 176 -> 152 (-13.64%)
helped: 8
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
GS_OPCODE_SET_WRITE_OFFSET is a MUL with a constant src[1] and special
strides. We can easily make the generator handle constant src[0]
arguments by instead generating a MOV with the product of both operands.
This isn't necessarily a win in and of itself - instead of a MUL, we
generate a MOV, which should be basically the same cost. However, we
can probably avoid the earlier MOV to put src[0] into a register.
shader-db statistics for geometry shaders only:
total instructions in shared programs: 3207 -> 3173 (-1.06%)
instructions in affected programs: 3207 -> 3173 (-1.06%)
helped: 11
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Broadwell's 3DSTATE_GS contains new "Static Output" and "Static Vertex
Count" fields, which control a new optimization. Normally, geometry
shaders can output arbitrary numbers of vertices, which means that
resource allocation has to be done on the fly. However, if the number
of vertices is statically known, the hardware can pre-allocate resources
up front, which is more efficient.
Thanks to the new NIR GS intrinsics, this is easy. We just call the
function introduced in the previous commit to get the vertex count.
If it obtains a count, we stop emitting the extra 32-bit "Vertex Count"
field in the VUE, and instead fill out the 3DSTATE_GS fields.
Improves performance of Gl32GSCloth by 5.16347% +/- 0.12611% (n=91)
on my Lenovo X250 laptop (Broadwell GT2) at 1024x768.
shader-db statistics for geometry shaders only:
total instructions in shared programs: 3227 -> 3207 (-0.62%)
instructions in affected programs: 242 -> 222 (-8.26%)
helped: 10
v2: Don't break non-NIR paths (just skip this optimization).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
The visitor was setting a mlen that was wrong for Broadwell, but the
generator was ignoring it and doing the right thing regardless. We may
as well move the logic fully into the visitor. This will be useful in
the next commit as well.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Some hardware (such as Broadwell) can run geometry shaders more
efficiently when the number of vertices emitted is statically known.
This pass provides a way to obtain the constant vertex count, or
-1 indicating that the vertex count is unknown/non-constant.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
The old code was disasterously complex - spread across multiple atoms
which may not even run, inspecting the dirty bits to try and decide
whether it was necessary to do checks...storing VS information in
brw_context...extra flagging...
This code tripped me and Carl up very badly when working on the
shader cache code. It's very fragile and hard to maintain.
Now that geometry shaders only depend on their inputs and don't have
to worry about the VS VUE map, we can dramatically simplify this:
just compute the VUE map coming out of the geometry shader stage
in brw_upload_programs. If it changes, flag it. Done.
v2: Also check vue_map.separable.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Because we only support geometry shaders in core profile, we can safely
ignore any driver-extending of VS outputs.
Those are:
- Legacy userclipping (doesn't exist in core profile)
- Edgeflag copying (Gen4-5 only, no GS support)
- Point coord replacement (Gen4-5 only, no GS support)
- front/back color hacks (Gen4-5 only, no GS support)
v2: Rebase; leave a comment about why SSO works.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Previously, our VUE map code always assigned slots to varyings
sequentially, in one contiguous block.
This was a bad fit for separate shaders - the GS input layout depended
or the VS output layout, so if we swapped out vertex shaders, we might
have to recompile the GS on the fly - which rather defeats the point of
using separate shader objects. (Tessellation would suffer from this
as well - we could have to recompile the HS, DS, and GS.)
Instead, this patch makes the VUE map for separate shaders use a fixed
layout, based on the input/output variable's location field. (This is
either specified by layout(location = ...) or assigned by the linker.)
Corresponding inputs/outputs will match up by location; if there's a
mismatch, we're allowed to have undefined behavior.
This may be less efficient - depending what locations were chosen, we
may have empty padding slots in the VUE. But applications presumably
use small consecutive integers for locations, so it hopefully won't be
much worse in practice.
3% of Dota 2 Reborn shaders are hurt, but only by 2 instructions.
This seems like a small price to pay for avoiding recompiles.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Our plan of assigning consecutive slots doesn't work properly for
separate shader objects - at least, if we want to avoid recompiling them
whenever the interface changes.
As a first step, make assign_vue_map take an explicit slot parameter,
rather than implicitly incrementing it.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Nothing actually relies on unused slots being initialized to
BRW_VARYING_SLOT_COUNT. Soon, we're going to have VUE maps with holes
in them, at which point pre-filling with BRW_VARYING_SLOT_PAD make a lot
more sense.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
We can't just break for padding slots. Instead, treat them like
unwritten output variables, so we handle flushing and incrementing
urb_offset correctly.
Paul introduced the concept of padding slots back in 2011, but we've
never actually used them for anything. So it's unsurprising that the
scalar VS backend didn't handle them quite right.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Passing NULL to C11 threads functions isn't safe, so there's no need for
our implementation to handle it. Cuts about 1k of .text.
text data bss dec hex filename
5009514 198440 26328 5234282 4fde6a i965_dri.so before
5008346 198440 26328 5233114 4fd9da i965_dri.so after
Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
Reviewed-by: Brian Paul <brianp@vmware.com>
Commit 1ca25ab (glsl: Do not eliminate 'shared' or 'std140' blocks
or block members) considered as active 'shared' and 'std140' uniform
blocks and uniform block arrays, but did not include the block array
elements. Because of that, it was possible to have an active uniform
block array without any elements marked as used, making the assertion
((b->num_array_elements > 0) == b->type->is_array())
in link_uniform_blocks() fail.
Fixes the following 5 dEQP tests:
* dEQP-GLES3.functional.ubo.random.nested_structs_instance_arrays.18
* dEQP-GLES3.functional.ubo.random.nested_structs_instance_arrays.24
* dEQP-GLES3.functional.ubo.random.nested_structs_arrays_instance_arrays.19
* dEQP-GLES3.functional.ubo.random.all_per_block_buffers.49
* dEQP-GLES3.functional.ubo.random.all_shared_buffer.36
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=83508
Tested-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
v2:
- Add tessellation shader constants support
v3:
- Add GLES 3.1 support.
v4:
- Move the getters to the proper place
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Including TOP_LEVEL_ARRAY_SIZE and TOP_LEVEL_ARRAY_STRIDE queries.
v2:
- Use std430_array_stride() to get top level array stride following std430's rules.
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
The error location won't be right, but fixing that would require to check
for this as we process each type of AST node that can involve a variable
read.
v2:
- Limit the check to buffer variables, image variables have different
semantics involved.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
v2:
- Merge the error check for the readonly qualifier with the already
existing check for variables flagged as readonly (Timothy).
- Limit the check to buffer variables, image variables have different
semantics involved (Curro).
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
v2:
- Memory qualifiers on shader storage buffer objects do not come in the form
of layout qualifiers, they are block-level qualifiers.
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
v2:
- Save memory qualifier info in the top level members of a shader
storage block.
- Add a checks to record_compare() which is used when comparing
shader storage buffer declarations in different shaders.
- Always report an error for incompatible readonly/writeonly
definitions, whether they are present at block or field level.
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
According to ARB_uniform_buffer_object spec:
"If the parameter (starting offset or size) was not specified when the
buffer object was bound (e.g. if bound with BindBufferBase), or if no
buffer object is bound to <index>, zero is returned."
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
These handle querying the buffer name attached to a giving binding point
as well as the start offset and size of that buffer.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Defined in ARB_shader_storage_buffer_object extension.
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
v2:
- Add ssbo_in the names of the static functions so it is clear that this
is specific to SSBO atomics.
v3:
- Move the check after the loop (Kristian Høgsberg)
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
The original GLSL IR intrinsics have been lowered to an internal
version that accepts a block index and an offset instead of a
SSBO reference.
v2 (Connor):
- Document the sources used by the atomic intrinsics.
Reviewed-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
The first argument to SSBO atomics is a reference to a SSBO buffer variable
so we want to compute its block index and offset and provide these values
to an internal version of the intrinsic that takes them instead of the
buffer variable reference.
v2:
- Support single components of integer vectors to be passed in as arguments.
- Get interface packing information from interface's type.
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
In a later commit we will need to handle ir_swizzle nodes too, which are
not an ir_dereference. That can happen, for example, when we pass a
component of an integer vector as argument to any of the SSBO atomic
functions.
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Shader Storage Buffer Object will add new atomic functions that are not
associated with counters, so better have atomic counter-specific functions
explicitly include the word "counter" in their names.
Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
This patch moves nir_instr_insert_after_cf_list call into each case
in the intrinsics switch at nir_visitor::visit(ir_call *ir) and
define a nir_dest variable which will be used when handling
ir->return_deref after the switch.
This patch simplifies the code for nir_intrinsic_load_ssbo
implementation changes we are going to do next.
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
v2 (Connor):
- Make the STORE() macro take arguments for the extra sources (and their
size) and any extra indices required.
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Implement helper functions that can be used to construct and send
untyped and typed surface read, write and atomic messages to the
shared dataport unit.
v2: Split from the FS implementation.
v3: Rewrite to avoid evil array_reg, emit_collect and emit_zip.
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
These functions handle the conversion of a vec4 into the form expected
by the dataport unit in message and message return payloads. The
conversion is not always trivial because some messages don't support
SIMD4x2 for some generations, in which case a strided copy may be
necessary.
v2: Split from the FS implementation.
v3: Rewrite to avoid evil array_reg, emit_collect and emit_zip.
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
See "i965/fs: Introduce FS IR builder." for the rationale.
v2: Drop scalarizing VEC4 builder.
v3: Take a backend_shader as constructor argument. Improve handling
of debug annotations and execution control flags. Rename "instr"
variable. Initialize cursor to NULL by default and add method to
explicitly point the builder at the end of the program.
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Notice that we should differentiate between shader storage blocks and
uniform blocks, since they have different limits.
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Otherwise, generate a link time error as per the
ARB_shader_storage_buffer_object spec.
v2:
- Fix error message (Jordan)
v3:
- Move std140_size() changes to its own patch (Kristian)
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
v2:
- Get interface packing information from interface's type, not the
variable type.
- Simplify is_std430 condition in emit_access() for readability (Jordan)
- Add a commment explaing why array of three-component vector case is
different in std430 than the rest of cases.
- Add calls to std430_array_stride().
v3:
- Simplify size_mul change for std430's case (Jordan)
- Fix commit log lines length (Jordan)
- Pass 'packing' instead of 'is_std430' to emit_access() (Kristian)
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
They are used to calculate the offset, array stride of uniform/shader
storage buffer variables. Take into account this info to get the right
value for std430.
v2:
- Fix commit log line length and indention. (Jordan)
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
v2:
- Fix a missing check in has_layout()
v3:
- Mention shader storage block in error message for layout qualifiers
(Kristian).
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
They are used to calculate size, base alignment and array stride values
for a glsl_type following std430 rules.
v2:
- Paste OpenGL 4.3 spec wording as it mentions stride of array. (Jordan)
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
This kind of definitions:
layout(xxx) buffer;
was not supported by commit 84fc5fece0.
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Also if GL_ARB_shading_language_420pack extension is enabled.
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
The returned drm buffer object has a size multiple of 4096 but that should not
be exposed to the API user, which is working with a different size.
As far as I can see this problem is only visible in the calculation of the
length of unsized arrays used in SSBOs, as the implementation of this needs
to query the underlying buffer size via a message.
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Otherwise we can expect odd things to happen if, for example, we ask
for the size of the attached buffer from shader code, since that
might query this value from the surface we uploaded and get random
results.
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
v2:
- Remove inst->regs_written assignment as the instruction only
writes to one register.
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Notice that Skylake needs to include a header in the sampler message
so it will need some tweaks to work there.
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
This is how backends provide the buffer size required to compute
the size of unsized arrays in the previous patch
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
v2:
- Reduce the number of lines over 80 character line width
limit. (Thomas Hellan)
v3:
- Inject the formula to compute the array length in the IR, backends
only need to provide the buffer size (Curro)
- Create an auxiliary function to simplify code (Jordan Justen)
- Rename variables (Jordan Justen)
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
The unsized array length is computed with the following formula:
array.length() =
max((buffer_object_size - offset_of_array) / stride_of_array, 0)
Of these, only the buffer size needs to be provided by the backends, the
frontend already knows the values of the two other variables.
This patch identifies the cases where we need to get the length of an
unsized array, injecting ir_unop_ssbo_unsized_array_length expressions
that will be lowered (in a later patch) to inject the formula mentioned
above.
It also adds the ir_unop_get_buffer_size expression that drivers will
implement to provide the buffer length.
v2:
- Do not define a triop that will force backends to implement the
entire formula, they should only need to provide the buffer size
since the other values are known by the frontend (Curro).
v3:
- Call state->has_shader_storage_buffer_objects() in ast_function.cpp instead
of using state->ARB_shader_storage_buffer_object_enable (Tapani).
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
They only can be defined in the last position of the shader
storage blocks.
When an unsized array is used in different shaders, it might be
converted in different sized arrays, avoid get a linker error
in that case.
v2:
- Rework error condition and error messages (Timothy Arceri)
v3:
- Move OpenGL ES check to its own patch.
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Buffer variables are the same as uniforms, only that read/write, so we want
the same treatment.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Since these are a special kind of UBOs we emit them together reusing the
same infrastructure, however, we use a RAW surface so we can reuse
existing untyped read/write/atomic messages which include a pixel mask
header that we need to set to obtain correct behavior with helper
invocations of the fragment shader.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
v2:
- Set it after the driver's MaxShaderStorageBuffers value assignment.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
We use the same dirty state for SSBOs and UBOs because they share the
same infrastructure.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
This should be a cacheline (64 bytes) so that we can safely have the
CPU and GPU writing the same SSBO on non-cachecoherent systems (our
Atom CPUs). With UBOs, the GPU never writes, so there's no
problem. For an SSBO, the GPU and the CPU can be updating disjoint
regions of the buffer simultaneously and that will break if the
regions overlap the same cacheline.
v2:
- Use cacheline size (64 bytes) instead of 16 bytes (Kristian).
- Update commit log and add a comment in the code explaining
why we use cacheline size (Ben).
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
This makes sure that user is still able to query properties about
variables that have gotten packed by lower_packed_varyings pass.
Fixes following OpenGL ES 3.1 test:
ES31-CTS.program_interface_query.separate-programs-vertex
v2: fix 'name included in packed list' check (Ilia Mirkin)
v3: iterate over instances of name using strtok_r (Ilia Mirkin)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com>
This is required to store information about packed varyings, currently
these variables get lost and cannot be retrieved later in sensible way
for program interface queries. List will be utilized by next patch.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com>
If the immutable compressed texture didn't have the full mip pyramid,
this didn't work, because it tried to generate mip levels for non-existing
levels. _mesa_prepare_mipmap_level() would correctly handle this by returning
FALSE if the mip level didn't exist, however we actually created the
non-existing mip level right before that because we used _mesa_get_tex_image()
before calling _mesa_prepare_mipmap_level(). It would then proceed to crash
(we allocated the mip level, which is a bad idea on an immutable texture,
but didn't initialize the values, leading to assertion failures or segfaults).
Fix this by using _mesa_select_tex_image() instead and call it after
_mesa_prepare_mipmap_level(), as that function will allocate missing mip levels
for non-immutable textures already.
This fixes a (2 year old) crash with astromenace which was hack-fixed in ubuntu
packages instead: http://bugs.debian.org/718680 (I guess most apps do full mip
chains - I believe this app not doing it is actually unintentional, always one
level less than full mip chain...).
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
... with only ARB_shader_atomic_counters.
I expected to see interactions with ARB_tessellation_shader in the
ARB_shader_atomic_counters spec, but they do not exist. It seems that we
should unconditionally expose these variables in the presence of
ARB_shader_atomic_counters:
gl_MaxTessControlAtomicCounters
gl_MaxTessEvaluationAtomicCounters
This partially reverts commit da7adb99e8. The commit also affected
gl_MaxTessControlImageUniforms and gl_MaxTessEvaluationImageUniforms
similarly but the ARB_shader_image_load_store spec does list an
interaction with ARB_tessellation_shader.
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92095
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Without this commit, copy propagation is discarded if it involves
a uniform with an instruction that has 3 sources. But 3 sourced
instructions can access scalar values.
For example, this is what vec4_visitor::fix_3src_operand() is already
doing:
if (src.file == UNIFORM && brw_is_single_value_swizzle(src.swizzle))
return src;
Shader-db results (unfiltered) on NIR:
total instructions in shared programs: 6259650 -> 6241985 (-0.28%)
instructions in affected programs: 812755 -> 795090 (-2.17%)
helped: 7930
HURT: 0
Shader-db results (unfiltered) on IR:
total instructions in shared programs: 6445822 -> 6441788 (-0.06%)
instructions in affected programs: 296630 -> 292596 (-1.36%)
helped: 2533
HURT: 0
v2:
- Updated commit message, using Matt Turner suggestions
- Move the check after we've created the final value, as Jason
Ekstrand suggested
- Clean up the condition
v3:
- Move the check back to the original place, to keep things
tidy, as suggested by Jason Ekstrand
v4:
- Fixed missing is_single_value_swizzle() as pointed by Jason Ekstrand
Reviewed-by: Matt Turner <mattst88@gmail.com>
When we assign hw regs to attributes, we don't incorporate the stride
and subreg_offset from the fs_reg. It's rarely used, but the integer
multiplication lowering uses unusual stride and subreg_offset
combination breaks when one source is an attribute.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91970
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Previously, core Mesa's _mesa_CopyImageSubData() created temporary textures
to wrap renderbuffer sources/destinations. This caused a bit of a mess in
the Mesa/gallium state tracker because we had to basically undo that
wrapping.
Instead, change ctx->Driver.CopyImageSubData() to take both gl_renderbuffer
and gl_texture_image src/dst pointers (one being null, the other non-null)
so the driver can handle renderbuffer vs. texture as needed.
For the i965 driver, we basically moved the code that wrapped textures
around renderbuffers from copyimage.c down into the met and driver code.
The old code in copyimage.c also made some questionable calls to
_mesa_BindTexture(), etc. which weren't undone at the end.
v2 (Jason Ekstrand): Rework the intel bits
v3 (Brian Paul): Update the temporary st_CopyImageSubData() function.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Tested-by: Kai Wasserbäch <kai@dev.carbon-project.org>
Tested-by: Nick Sarnie <commendsarnex@gmail.com>
Check for PIPE_FORMAT_R8_UNORM when setting up the copy shader.
Also re-enable the dest alpha blending with A8 destination that
actually turned out to be correct.
Verified using rendercheck that the composite operators
overreverse, in, out, atop, atopreverse and xor seem to work fine
with a8 destiation.
v2: Fix a copy-paste error.
Reported-by: Jose Fonseca <jfonseca@vmware.com>
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
It doesn't matter whether a write is saturated or not, in another
implementation it might even have been a separate opcode. This code was
most likely copied from the copy-propagation pass (where one does have
to distinguish saturation).
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
I left a bunch of code indented a level in the previous patch to make
the diff easier to read. But now we should fix that.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
By performing the vertex counting in NIR, we're able to elide a ton of
useless safety checks around every EmitVertex() call:
total instructions in shared programs: 3952 -> 3720 (-5.87%)
instructions in affected programs: 3491 -> 3259 (-6.65%)
helped: 11
HURT: 0
Improves performance in Gl32GSCloth by 0.671742% +/- 0.142202% (n=621)
on Haswell GT3e at 1024x768.
This should also make it easier to implement Broadwell's "Static Vertex
Count" feature someday.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
This patch also introduces a lowering pass to convert the simple GS
intrinsics to the new ones. See the comments above that for the
rationale behind the new intrinsics.
This should be useful for i965; it's a generic enough mechanism that I
could see other drivers potentially using it as well, so I don't feel
too bad about putting it in the generic code.
v2:
- Use nir_after_block_before_jump for the cursor (caught by Jason
Ekstrand - I'd mistakenly used nir_after_block when rebasing this
code onto the new NIR control flow API).
- Remove the old emit_vertex intrinsic at the end, rather than in
the middle (requested by Jason).
- Use state->... directly rather than locals (requested by Jason).
- Report progress from nir_lower_gs_intrinsics() (requested by me).
- Remove "Authors:" section from file comment (requested by
Michael Schellenberger Costa).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
The NIR control flow modification API churns the block structure,
splitting blocks, stitching them back together, and so on. Preserving
information about block dominance is hard (and probably not worthwhile).
This patch makes nir_cf_extract() throw away all metadata, like we do
when adding/removing jumps.
We then make the dead control flow pass compute dominance information
right before it uses it. This is necessary because earlier work by the
pass may have invalidated it.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Calling unlink_blocks(block, block->successors[0]) will successfully
unlink the first successor, but then will shift block->successors[1]
down to block->successor[0]. So the successors[1] != NULL check will
always fail.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Consider the case of "while (...) { break }". Or in NIR:
block block_0 (0x7ab640):
...
/* succs: block_1 */
loop {
block block_1:
/* preds: block_0 */
break
/* succs: block_2 */
}
block block_2:
Calling nir_handle_remove_jump(block_1, nir_jump_break) will remove the break.
Unfortunately, it would mangle the predecessors and successors.
Here, block_2->predecessors->entries == 1, so we would create a fake
link, setting block_1->successors[1] = block_2, and adding block_1 to
block_2's predecessor set. This is illegal: a block cannot specify the
same successor twice. In particular, adding the predecessor would have
no effect, as it was already present in the set.
We'd then call unlink_block_successors(), which would delete the fake
link and remove block_1 from block_2's predecessor set. It would then
delete successors[0], and attempt to remove block_1 from block_2's
predecessor set a second time...except that it wouldn't be present,
triggering an assertion failure.
The fix appears to be simple: simply unlink the block's successors and
recreate them to point at the correct blocks first. Then, add the fake
link. In the above example, removing the break would cause block_1 to
have itself as a successor (as it becomes an infinite loop), so adding
the fake link won't cause a duplicate successor.
v2: Add comments (requested by Connor Abbott) and fix commit message.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
There is a bug where we mess up predecessors/successors due to the
ordering of unlinking/recreating edges/adding fake edges. In order to
fix that, I need everything in one routine.
However, calling block_add_normal_succs() isn't safe from
cleanup_cf_node() - it would crash trying to insert phi undefs.
So unfortunately I need to add a parameter.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Consider the following NIR:
block block_0;
/* succs: block_1 block_2 */
if (...) {
block block_1;
...
} else {
block block_2;
}
Calling split_block_beginning() on block_1 would break block_0's
successors: link_block() sets both successors of a block, so calling
link_block(block_0, new_block, NULL) would throw away the second
successor, leaving only /* succ: new_block */. This is invalid: the
block before an if statement must have two successors.
Changing the call to link_block(pred, new_block, pred->successors[0])
would correctly leave both successors in place, but because unlink_block
may shift successor[1] to successor[0], it may not preserve the original
order. NIR maintains a convention that successor[0] must point to the
"then" block, while successor[1] points to the "else" block, so we need
to take care to preserve this ordering.
This patch creates a new function that swaps out one successor for
another, preserving the ordering. It then uses this to fix the issue.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
I need to do this in a second place, and I'd rather make a helper
function than cut and paste the code.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
This is invalid, and causes disasters if we try to unlink successors:
removing the first will work, but removing the second copy will fail
because the block isn't in the successor's predecessor set any longer.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
It's possible that, if a vecN operation is involved in a phi node, that we
could end up moving from a register to itself. If swizzling is involved,
we need to emit the move but. However, if there is no swizzling, then the
mov is a no-op and we might as well not bother emitting it.
Shader-db results on Haswell:
total instructions in shared programs: 6262536 -> 6259558 (-0.05%)
instructions in affected programs: 184780 -> 181802 (-1.61%)
helped: 838
HURT: 0
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
I don't know of any piglit tests that are currently broken. However, there
is nothing stopping a vecN instruction from getting source modifiers and
lower_vec_to_movs is run after we lower to source modifiers.
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h:83:28: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
nr = MIN2(currentsz, count - j);
^
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h:83:55: warning: signed and unsigned type in conditional expression [-Wsign-compare]
nr = MIN2(currentsz, count - j);
^
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h:116:25: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
nr = MIN2(currentsz, count - j);
^
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h:116:52: warning: signed and unsigned type in conditional expression [-Wsign-compare]
nr = MIN2(currentsz, count - j);
^
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h:140:25: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
nr = MIN2(currentsz, count - j);
^
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h:140:52: warning: signed and unsigned type in conditional expression [-Wsign-compare]
nr = MIN2(currentsz, count - j);
^
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h: In function 'intel_render_line_loop_verts':
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h:174:28: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
nr = MIN2(currentsz, count - j);
^
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h:174:55: warning: signed and unsigned type in conditional expression [-Wsign-compare]
nr = MIN2(currentsz, count - j);
^
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h:224:25: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
nr = MIN2(currentsz, count - j);
^
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h:224:52: warning: signed and unsigned type in conditional expression [-Wsign-compare]
nr = MIN2(currentsz, count - j);
^
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h:255:25: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
nr = MIN2(currentsz, count - j);
^
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h:255:52: warning: signed and unsigned type in conditional expression [-Wsign-compare]
nr = MIN2(currentsz, count - j);
^
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h:281:25: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
nr = MIN2(currentsz, count - j + 1);
^
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h:281:56: warning: signed and unsigned type in conditional expression [-Wsign-compare]
nr = MIN2(currentsz, count - j + 1);
^
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h: In function 'intel_render_poly_verts':
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h:313:28: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
nr = MIN2(currentsz, count - j + 1);
^
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h:313:59: warning: signed and unsigned type in conditional expression [-Wsign-compare]
nr = MIN2(currentsz, count - j + 1);
^
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h:365:28: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
nr = MIN2(currentsz, count - nr);
^
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h:365:56: warning: signed and unsigned type in conditional expression [-Wsign-compare]
nr = MIN2(currentsz, count - nr);
^
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h:83:28: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
nr = MIN2(currentsz, count - j);
^
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h:83:55: warning: signed and unsigned type in conditional expression [-Wsign-compare]
nr = MIN2(currentsz, count - j);
^
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h:116:25: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
nr = MIN2(currentsz, count - j);
^
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h:116:52: warning: signed and unsigned type in conditional expression [-Wsign-compare]
nr = MIN2(currentsz, count - j);
^
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h:140:25: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
nr = MIN2(currentsz, count - j);
^
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h:140:52: warning: signed and unsigned type in conditional expression [-Wsign-compare]
nr = MIN2(currentsz, count - j);
^
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h: In function 'radeon_dma_render_line_loop_verts':
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h:174:28: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
nr = MIN2(currentsz, count - j);
^
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h:174:55: warning: signed and unsigned type in conditional expression [-Wsign-compare]
nr = MIN2(currentsz, count - j);
^
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h:224:25: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
nr = MIN2(currentsz, count - j);
^
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h:224:52: warning: signed and unsigned type in conditional expression [-Wsign-compare]
nr = MIN2(currentsz, count - j);
^
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h:255:25: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
nr = MIN2(currentsz, count - j);
^
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h:255:52: warning: signed and unsigned type in conditional expression [-Wsign-compare]
nr = MIN2(currentsz, count - j);
^
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h:281:25: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
nr = MIN2(currentsz, count - j + 1);
^
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h:281:56: warning: signed and unsigned type in conditional expression [-Wsign-compare]
nr = MIN2(currentsz, count - j + 1);
^
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h: In function 'radeon_dma_render_poly_verts':
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h:313:28: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
nr = MIN2(currentsz, count - j + 1);
^
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h:313:59: warning: signed and unsigned type in conditional expression [-Wsign-compare]
nr = MIN2(currentsz, count - j + 1);
^
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h:365:28: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
nr = MIN2(currentsz, count - nr);
^
../../../../../src/mesa/tnl_dd/t_dd_dmatmp.h:365:56: warning: signed and unsigned type in conditional expression [-Wsign-compare]
nr = MIN2(currentsz, count - nr);
^
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Two drivers use this file, and neither supports ELTs.
No piglit regressions on i915 (G33) or radeon (Radeon 7500).
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Two drivers use this file, and both support triangle fans.
No piglit regressions on i915 (G33) or radeon (Radeon 7500).
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
v2: Fix '- nr' typo noticed by Marius.
No piglit regressions on i915 (G33) or radeon (Radeon 7500).
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com> [v1]
Two drivers use this file, and both support triangle strips.
No piglit regressions on i915 (G33) or radeon (Radeon 7500).
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Two drivers use this file, and both support triangles.
No piglit regressions on i915 (G33) or radeon (Radeon 7500).
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Two drivers use this file, and both support line strips.
No piglit regressions on i915 (G33) or radeon (Radeon 7500).
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Two drivers use this file, and both support lines.
No piglit regressions on i915 (G33) or radeon (Radeon 7500).
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Two drivers use this file, and neither supports quads.
No piglit regressions on i915 (G33) or radeon (Radeon 7500).
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Two drivers use this file, and neither supports quad strips.
No piglit regressions on i915 (G33) or radeon (Radeon 7500).
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
The value passed in count previously was "vertex after the last vertex
to be processed." Calling that "count" was misleading and kind of mean.
Looking at the code, many functions immediately do "count-start" to get
back the true count. That's just silly.
If it is better for the loops to be 'for (j = start; j < (start +
count); j++)', GCC will do that transformation.
NOTE: There is some strange formatting left by this patch. That was
done to make it more obvious that the before and after code is
equivalent. These will be fixed in the next patch.
No piglit regressions on i915 (G33) or radeon (Radeon 7500).
v2: Fix a remaining (count-start) in render_quad_strip_verts.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com> [v1]
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
From section 9.2. Binding and Managing Framebuffer Objects:
"Upon successful return from Get*FramebufferAttachmentParameteriv, if
pname is FRAMEBUFFER_ATTACHMENT_OBJECT_TYPE, then params will contain
one of NONE, FRAMEBUFFER_DEFAULT, TEXTURE, or RENDERBUFFER, identifying
the type of object which contains the attached image."
And then it clarifies further:
"If the value of FRAMEBUFFER_ATTACHMENT_OBJECT_TYPE is NONE, then
either no framebuffer is bound to target; or the default framebuffer is
bound, attachment is DEPTH or STENCIL, and the number of depth or stencil
bits, respectively, is zero"
Currently, if the default framebuffer is bound, we always return
GL_FRAMEBUFFER_DEFAULT for FRAMEBUFFER_ATTACHMENT_OBJECT_TYPE, but
according to the spec, when GL_DEPTH or GL_STENCIL attachments are
the ones being queried, we should return GL_NONE if they don't exist.
Fixes the following dEQP test:
dEQP-GLES3.functional.state_query.fbo.framebuffer_attachment_x_size_initial
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: "10.6" <mesa-stable@lists.freedesktop.org>
Patch fixes a crash in conformance test that tries out different
invalid arguments for glShaderSource and glGetShaderSource:
ES2-CTS.gtf.GL.glGetShaderSource.getshadersource_programhandle
This is a regression from commit:
04e201d0c0
Additions in v2 also fix following failing deqp test:
dEQP-GLES[2|3].functional.negative_api.shader.shader_source
v2: cleanup function, do check earlier (Iago Toral)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
We don't use any of the code after the switch anyway. Since we check for
num_components == 1 and early-return, it doesn't get executed so
everything's ok. However, it makes it much clearer what's going on if we
simply do an early return.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v2 (Ken):
- Squash together commits for HS, DS, and TE, as well as fixes.
- Add INTEL_MASK variants so we can use SET_FIELD if we want.
- Rename GEN7_HS_INSTANCE_CONTROL to GEN7_HS_INSTANCE_COUNT to match
the documentation.
- Add some more fields from the PRMs.
- Add Broadwell variants.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
"r600g: apply disable workaround on all scissors" forgot to update
num_dw, fix it.
Fixes: fbb423b433 "r600g: apply disable workaround on all scissors"
Reported-and-tested-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Now it is more similar to brw_fs_copy_propagation, with three
clear stages:
1) Build up the value we are propagating as if it were the source of a
single MOV:
2) Check that we can propagate that value
3) Build the final value
Previously everything was somewhat messed up, making the
implementation on some specific cases, like knowing if you can
propagate from a previous instruction even with type mismatches, even
messier (for example, with the need of maintaining more of one
has_source_modifiers). The refactoring clears stuff, and gives
support to this mentioned use case without doing anything extra
(for example, only one has_source_modifiers is used).
Shader-db results for vec4 programs on Haswell:
total instructions in shared programs: 1683842 -> 1669037 (-0.88%)
instructions in affected programs: 739837 -> 725032 (-2.00%)
helped: 6237
HURT: 0
v2: using 'arg' index to get the from inst was wrong
v3: rebased against last change on the previous patch of the series
v4: don't need to track instructions on struct copy_entry, as we
only set the source on a direct copy
v5: change the approach for a refactoring
v6: tweaked comments
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
The function was a no-op and if the ctx->Driver.BindFramebuffer pointer
is null, Mesa won't try to use it.
Reviewed-by: Matt Turner <mattst88@gmail.com>
According to OpenGL ES 3.1 specification 10.3.1:
"An INVALID_OPERATION error is generated if buffer is not zero
or a name returned from a previous call to GenBuffers,
or if such a name has since been deleted with DeleteBuffers."
This error check was previously limited to OpenGL Core.
Signed-off-by: Marta Lofstedt <marta.lofstedt@intel.com>
The only functional change here is that we now set EmitNoIndirectOutput and
EmitNoIndirectTemp for compute shaders. Compute shaders don't have outputs
per-se and we should have been setting EmitNoIndirectTemp all along.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
All SKL SKUs except the lowest one which has half the L3 size actually have 384K
of URB per slice.
For once, I can explain how this mistake was made and how it was missed in
review... Historically when we enable a platform and put the production sizes,
you can simply look at the "smallest" SKU and see what its URB size is (and we
assumed it was the 1 slice variant). Since on newer platforms the URB sizes are
scaled automatically by HW, this was sufficient. On SKL, this is a bit different
as the lowest SKU actually has half of the L3 fused off. GT2 is the 1 slice (not
GT1) variant and it has 384K.
There are no Jenkins tests fixed (or regressions) and we don't expect any fixes
here because you can always run with less URB size.
Thanks to Sarah for bringing this to my attention.
Cc: Sarah Sharp <sarah.a.sharp@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Designated initializers are not allowed in C++ (not even C++11). Since
nir_lower_samplers is now using nir_builder, and nir_lower_samplers is in
C++, this breaks the build on some compilers. Aparently, GCC 5 allows it
in some limited extent because mesa still builds on my system without this
patch.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92052
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
With the only C++ function having its own wrapper we can 'demote' this
file to a normal C one. This allows us to get rid of extern C { #include
<foo.h> } 'hacks'. Plus some of the headers may use C99 initializers,
which are not supported by the ISO standard.
This may cause build issue on incremental builds. If so run the
following:
sed -i -e 's|samplers\.cpp|samplers.c|' src/glsl/nir/.deps/nir_lower_samplers.Plo
Fixes: ef8eebc6ad5(nir: support indirect indexing samplers in struct arrays)
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reported-by: Gottfried Haider <gottfried.haider@gmail.com>
Tested-by: Gottfried Haider <gottfried.haider@gmail.com>
Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
compr4 is represented by setting the high bit on the MRF number.
We need to mask it out before sanity checking the register number.
Fixes ~8000 assert fails on Ironlake and G45.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92066
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
res_ptr already contains the resource values. fmask_ptr needs to be
looked up relative to the start of the resource params.
Note that this only affects indirect loads of MS sampler arrays.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
There are some bug reports about shaders failing to compile in gen6
because MRF 14 is used when we need to spill. For example:
https://bugs.freedesktop.org/show_bug.cgi?id=86469https://bugs.freedesktop.org/show_bug.cgi?id=90631
Discussion in bugzilla pointed to the fact that gen6 might actually have
24 MRF registers available instead of 16, so we could use other MRF
registers and avoid these conflicts (we still need to investigate why
some shaders need up to MRF 14 anyway, since this is not expected).
Notice that the hardware docs are not clear about this fact:
SNB PRM Vol4 Part2's "Table 5-4. MRF Registers Available in Device
Hardware" says "Number per Thread" - "24 registers"
However, SNB PRM Vol4 Part1, 1.6.1 Message Register File (MRF) says:
"Normal threads should construct their messages in m1..m15. (...)
Regardless of actual hardware implementation, the thread should
not assume th at MRF addresses above m15 wrap to legal MRF registers."
Therefore experimentation was necessary to evaluate if we had these extra
MRF registers available or not. This was tested in gen6 using MRF
registers 21..23 for spilling and doing a full piglit run (all.py) forcing
spilling of everything on the FS backend. It was also tested by doing
spilling of everything on both the FS and the VS backends with a piglit run
of shader.py. In both cases no regressions were observed. In fact, many of
these tests where helped in the cases where we forced spilling, since that
triggered the same underlying problem described in the bug reports. Here are
some results using INTEL_DEBUG=spill_fs,spill_vec4 for a shader.py run on
gen6 hardware:
Using MRFs 13..15 for spilling:
crash: 2, fail: 113, pass: 6621, skip: 5461
Using MRFs 21..23 for spilling:
crash: 2, fail: 12, pass: 6722, skip: 5461
This patch sets the ground for later patches to implement spilling
using MRF registers 21..23 in gen6.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
In a later patch we will make BRW_MAX_MRF return a different value depending
on the hardware generation, but it is inconvenient to add a gen parameter
to the brw_reg functions only for the assertions, so move these to places where
we have the hardware generation available.
Ken suggested to add the asserts to brw_set_src0 and brw_set_dest since that
would make sure that we catch all uses of MRF registers, even those coming
from modules that generate native code directly, like blorp. Unfortunately,
this is very late in the process which can make things harder to debug, so add
asserts to the generator as well.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Until now we only used MRFs 1..15 for regular SEND messages, so the
message length could not possibly exceed the maximum size. Soon we'll
allow to use MRF registers 1..23 in gen6, so we need to be careful
not to build messages that can go beyond the limit. That could occur,
specifically, when building URB write messages, which we may need to
split in chunks due to their size. Previously we would simply go and
create a new message when we reached MRF 13 (since 13..15 were
reserved for spilling), now we also want to check the size of the
message explicitly.
Besides adding that condition to split URB write messages properly,
this patch also adds asserts in the generator. Notice that
brw_inst_set_mlen already asserts for this, but asserting in the
generators is easy and can make debugging easier in some cases.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Not something actually hit in real life (now state is never non-null,
but only case state->syms is null is if nir_print_instr() path). But it
was something I overlooked the first time, so might as well fix it.
*** CID 1324642: Null pointer dereferences (REVERSE_INULL)
/src/glsl/nir/nir_print.c: 299 in print_var_decl()
293
294 fprintf(fp, " (%s, %u)", loc, var->data.driver_location);
295 }
296
297 fprintf(fp, "\n");
298
>>> CID 1324642: Null pointer dereferences (REVERSE_INULL)
>>> Null-checking "state" suggests that it may be null, but it has already been dereferenced on all paths leading to the check.
299 if (state) {
300 _mesa_set_add(state->syms, name);
301 _mesa_hash_table_insert(state->ht, var, name);
302 }
303 }
304
Signed-off-by: Rob Clark <robclark@freedesktop.org>
For consistency, either we have all class members dereferenced, or none.
In this case, very few are so lets get rid of them all.
Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reuse utility functions instead of reimplementing the same logic.
* _mesa_is_compressed_format() performs the required checking to
determine format support in the current context.
* _mesa_gl_compressed_format_base_format() returns the base format.
As a side effect, we now check that we're in a desktop context when
determining support for the FXT1 and RGTC formats. This is in agreement
with our extension table and the glext headers.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Instead of case statements, use _mesa_get_format_layout() to
determine if a GL format is part of a family of compressed formats.
v2. restrict LATC formats to API_OPENGL_COMPAT (Ilia).
rename the variable mFormat to m_format.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
This enables us to predicate statments on a compressed format being
a type of LATC format. Also, remove the comment that lists the enum
(it was getting a tad long).
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
With this, we completely switch over to nir lowering passes instead of
tgsi_lowering. So one step closer to supporting direct glsl or spirv to
nir support for freedreno a3xx/a4xx.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Some hardware needs to clamp texture coordinates to [0.0, 1.0] in the
shader to emulate GL_CLAMP. This is added to lower_tex_proj since, in
the case of projected coords, the clamping needs to happen *after*
projection.
v2: comments/suggestions from Ilia and Eric, use txs to get texture size
and clamp RECT textures to their dimensions rather than [0.0, 1.0] to
avoid having to lower RECT textures to 2D.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v2: comments/suggestions from Ilia and Eric, split out get_texture_size()
helper so we can use it in the next commit for clamping RECT textures.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Some hardware, such as adreno a3xx, supports txp on some but not all
sampler types. In this case we want more fine grained control over
which texture projectors get lowered.
v2: split out nir_lower_tex_options struct to make it easier to
add the additional parameters coming in the following patches
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Since the following patches will add additional tex-lowering related
functionality, which doesn't make sense to split out into a separate
pass (as they would require duplication of the projector lowering
logic), let's give this pass a more generic name.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
SEL and MOV instructions, as long as they don't have source modifiers, are
just copying bits around. So those kind of instruction could be propagated
even if there are type mismatches. This is needed because NIR generates
integer SEL and MOV instructions whenever it doesn't know what else to
generate.
This commit adds support for copy propagation using current instruction
as reference.
Equivalent to commit 472ef9 but for vec4.
v2: include check for saturate, as Jason Ekstrand suggested
v3: check that the dst.type and the src type are the same, in order to
solve (among others) the following deqp regression with v2:
dEQP-GLES3.functional.shaders.operator.unary_operator.minus.lowp_uint_vertex
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
brw_fs_visitor.cpp: In member function 'void fs_visitor::emit_urb_writes()':
brw_fs_visitor.cpp:977:58: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
OpenGL ES 3.0 spec 3.7.2 "Transfer of Pixel Rectangles" specifies
DEPTH_COMPONENT, UNSIGNED_INT as a valid couple, validation for
internal format is checked by is_float_depth().
Fix regression caused by 81d2fd91a9 in:
ES3-CTS.gtf.GL3Tests.packed_pixels.packed_pixels
Test uses GL_DEPTH_COMPONENT, UNSIGNED_INT only when GL_NV_read_depth
extension is present.
v2: change check in _mesa_error_check_format_and_type to be explicit
for ES 2.0+, desktop OpenGL does not allow this behaviour + uses
this function for both glReadPixels and glDrawPixels validation.
(No Piglit regressions seen with v2.)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> [v1]
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92009
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
Fixes:
In file included from nir/nir_lower_samplers.cpp:27:0:
nir/nir_builder.h: In function 'nir_ssa_def* nir_channel(nir_builder*, nir_ssa_def*, int)':
nir/nir_builder.h:222:37: warning: narrowing conversion of 'c' from 'int' to 'unsigned int' inside { } is ill-formed in C++11 [-Wnarrowing]
unsigned swizzle[4] = {c, c, c, c};
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Use nir_lower_clip pass for adding the VS/FS instructions to handle
user-clip-planes and CLIPDIST. Wire up support for load_user_clip_plane
intrinsic to fetch ucp[plane] values as driver-params (passed as const's
to the shader).
Signed-off-by: Rob Clark <robclark@freedesktop.org>
The vertex shader lowering adds calculation for CLIPDIST, if needed
(ie. user-clip-planes), and the frag shader lowering adds conditional
kills based on CLIPDIST value (which should be treated as a normal
interpolated varying by the driver).
Note that this won't quite do the right thing in the face of MSAA plus
user-clip-planes, since all the samples would be killed or not (rather
than potentially only a portion of them). But it's better than no UCP
support at all for drivers that don't have this in hw.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
For lowering user-clip-planes, we need a way to pass the enabled/used
user-clip-planes in to shader.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Used internally in freedreno/ir3 to calc stream-out position. Seems
like a generic enough way to implement stream-out (using str instrs),
plus it avoids compiler warnings by sneaking in a non-enum value in
switch statements.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
When updating texture buffers, we might end up replacing the whole
buffer. Check that the tic address matches the resource address, and if
not, update the tic and reupload it.
This fixes:
arb_direct_state_access-texture-buffer
arb_texture_buffer_object-data-sync
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Instructions with difference in PM field can actually be paired up if
the one without PM doesn't do packing/unpacking and non-NOP
packing/unpacking operations from PM instruction aren't added to the
other without PM.
total instructions in shared programs: 48209 -> 47460 (-1.55%)
instructions in affected programs: 11688 -> 10939 (-6.41%)
Signed-off-by: Boyan Ding <boyan.j.ding@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
The idea here is not that it gives register coalescing a little bit of a
helping hand. It doesn't actually fix the coalescing problems, but it
seems to help a good bit.
Shader-db results for vec4 programs on Haswell:
total instructions in shared programs: 1746280 -> 1683959 (-3.57%)
instructions in affected programs: 1259166 -> 1196845 (-4.95%)
helped: 11363
HURT: 148
v2 (Jason Ekstrand):
- Run nir_move_vec_src_uses_to_dest after going out of SSA
- New shader-db numbers
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
The provided indices have the very nice property that if A dominates B then
A->index <= B->index. We should document that somewhere.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Various pieces of code to create compressed textures will first
generate an uncompressed RGBA texture into a temporary buffer,
and then read from that buffer while creating the final compressed
texture in the requested format.
The code reading from the temporary buffer assumes the buffer is
formatted as an array of bytes in RGBA order. However, the buffer
is filled using a _mesa_texstore call with MESA_FORMAT_R8G8B8A8_UNORM
format -- this is defined as an array of *integers* holding the
RGBA values in packed format (least-significant to most-significant).
This means incorrect bytes are accessed on big-endian systems.
This patch fixes this by using the MESA_FORMAT_A8B8G8R8_UNORM format
instead on big-endian systems when filling the buffer. This fixes
about 100 piglit test case failures on s390x for me.
Signed-off-by: Ulrich Weigand <ulrich.weigand@de.ibm.com>
Tested-by: Oded Gabbay <oded.gabbay@gmail.com>
Cc: "10.6" "11.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@gmail.com>
XA has been using L8_UNORM for a8 and yuv component surfaces.
This commit instead makes XA prefer R8_UNORM since it's assumed to have a
higher availability.
Also neither of these formats are suitable as destination formats using
destination alpha blending, so reject those operations.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
From OpenGL 4.5 Core spec (7.13):
"If pipeline is a name that has been generated (without subsequent
deletion) by GenProgramPipelines, but refers to a program pipeline
object that has not been previously bound, the GL first creates a
new state vector in the same manner as when BindProgramPipeline
creates a new program pipeline object."
I interpret this as "If GetProgramPipelineiv gets called without a
bound (but valid) pipeline object, the state should reflect initial
state of a new pipeline object." This is also expected behaviour by
ES31-CTS.sepshaderobjs.PipelineApi conformance test.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com>
From OpenGL ES 3.1 spec (7.12):
"Most properties set within program objects are specified not to
take effect until the next call to LinkProgram or ProgramBinary.
Some properties further require a successful call to either of
these commands before taking effect. GetProgramiv returns the
properties currently in effect for program, which may differ from
the properties set within program since the most recent call to
LinkProgram or ProgramBinary, which have not yet taken effect. If
there has been no such call putting changes to pname into effect,
initial values are returned."
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com>
As a bonus we get indirect support for arrays of arrays for free.
V5: couple of small clean-ups suggested by Jason.
V4: fix struct member location caclulation, use nir_ssa_def rather than
nir_src for the indirect as suggested by Jason
V3: Use nir_instr_rewrite_src() with empty src rather then clearing
the use_link list directly for the old indirects as suggested by Jason
V2: Fixed validation error in debug build
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
This will allow us to access the uniform later on without resorting to
building a name string and looking it up in UniformHash.
V3: remove line wrap change from this patch
V2: store slot number for all non-UBO uniforms to make code more
consitent, renamed explicit_binding to explicit_location and added
comment about what it does. Store the location at every shader stage.
Updated data.location comments in ir/nir.h.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
This is required so that the next patch can safely assign the slot id
to the var.
The ids are now assigned in the order we want before allocating storage
so there is no need to sort the storage array and move things around.
V2: rename variable to make code easier to follow as suggested by Jason
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
This allows the correct offset to be easily calculated for indirect
indexing when a struct array contains multiple samplers, or any crazy
nesting.
The indices for the folling struct will now look like this:
Sampler index: 0 Name: s[0].tex
Sampler index: 1 Name: s[1].tex
Sampler index: 2 Name: s[0].si.tex
Sampler index: 3 Name: s[1].si.tex
Sampler index: 4 Name: s[0].si.tex2
Sampler index: 5 Name: s[1].si.tex2
Before this change it looked like this:
Sampler index: 0 Name: s[0].tex
Sampler index: 3 Name: s[1].tex
Sampler index: 1 Name: s[0].si.tex
Sampler index: 4 Name: s[1].si.tex
Sampler index: 2 Name: s[0].si.tex2
Sampler index: 5 Name: s[1].si.tex2
struct S_inner {
sampler2D tex;
sampler2D tex2;
};
struct S {
sampler2D tex;
S_inner si;
};
uniform S s[2];
V3: Update comments with suggestions from Jason
V2: rename struct array counter to have better name
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
This reverts commit 48961fa3ba.
glamor/Xwayland use this, the spec saying something when it
was written, and the fact that the comment says Mesa relies on it
hasn't changed.
I also don't have a copy of this patch in my mail archive, which
seems wierd, did it get posted to mesa-dev?
Signed-off-by: Dave Airlie <airlied@redhat.com>
Even though luminance formats don't have alpha, we still want the alpha
output to go to the blender. This fixes the luminance blending tests.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
(originally part of previous patch, split out to separate patch by Rob)
v2: squash in some fixes from Eric
v3: Another fix from Eric for point coords.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
This avoids exceeding the size of the .index bitfield since it got
truncated, and should make our NIR look more like the NIR that the rest of
the NIR developers are working on.
v2: split out vc4 updates, first patch uses varying_slot_to_tgsi_semantic()
helper, and second patch does the actual conversion.
v3: add frag_result_to_tgsi_semantic() helper and don't try to map
frag_results to semantic name/index as if they were varying_slot's
v4: use VERT_ATTRIB_ for VS inputs
v5: Fix vc4 build.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
v2: split out moving of FILE *fp into state structure into it's own
(more complete patch) to reduce the noise in this one
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Rename print_var_state to print_state, and stuff FILE ptr into the state
object. This avoids passing around an extra parameter everywhere.
v2: even more extensive conversion.. use state *everywhere* instead of
FILE ptr, and convert nir_print_instr() to use state as well
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Similar to fee0686c21, but in this case to
ensure that drm_gralloc and libGLES_mesa are sharing a single screen.
Bumps libdrm_freedreno version dependency, as it requires the new
fd_device_fd() API.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
When preparing the barrier payload, the instructions should operate in
simd8 mode since we only use 1 payload register.
fs_inst::regs_read is also updated to indicate that it only reads one
register for SHADER_OPCODE_BARRIER.
These issues were flagged by:
commit cadd7dd384
Author: Jason Ekstrand <jason.ekstrand@intel.com>
Date: Thu Jul 2 15:41:02 2015 -0700
i965/fs: Add a very basic validation pass
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
C++ gets cranky if we take references of temporaries. This isn't a problem
yet in master because nir_builder is never used from C++. However, it will
be in the future so we should fix it now.
Reviewed-by: Rob Clark <robclark@freedesktop.org>
Both use the same layout for the buffer containing border-color values,
so rather than duplicating the logic in a4xx, split it out into a
helper.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Now that we have a replicating fdot instruction, we can actually coalesce
into the destinations of vec4 instructions. We couldn't really do this
before because, if the destination had to end up in .z, we couldn't
reswizzle the instruction. With a replicated destination, the result ends
up in all channels so we can just set the writemask and we're done.
Shader-db results for vec4 programs on Haswell:
total instructions in shared programs: 1747753 -> 1746280 (-0.08%)
instructions in affected programs: 143274 -> 141801 (-1.03%)
helped: 667
HURT: 0
It turns out that dot-products matter...
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Fortunately, nir_constant_expr already auto-splats if "dst" never shows up
in the constant expression field so we don't need to do anything there.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
The old pass blindly inserted a bunch of moves into the shader with no
concern for whether or not it was really needed. This adds code to try and
coalesce into the destination of the instruction providing the value.
Shader-db results for vec4 shaders on Haswell:
total instructions in shared programs: 1754420 -> 1747753 (-0.38%)
instructions in affected programs: 231230 -> 224563 (-2.88%)
helped: 1017
HURT: 2
This approach is heavily based on a different patch by Eduardo Lima Mitev
<elima@igalia.com>. Eduardo's patch did this in a separate pass as opposed
to integrating it into nir_lower_vec_to_movs.
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Previously, we did this thing with keeping track of a separate start_idx
which was different from the iteration variable. I think this was a relic
of the way that GLSL IR implements writemasks. In NIR, if a given bit in
the writemask is unset then that channel is just "unused", not missing. In
particular, a vec4 operation with a writemask of 0xd will use sources 0, 2,
and 3 and leave source 1 alone. We can simplify things a good deal (and
make them correct) by removing this "compacting" step.
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
We made this switch in the FS backend some time ago and it seems to make a
number of things a bit easier. In particular, supporting SSA values takes
very little work in the backend and allows us to take advantage of the
majority of the SSA information even after we've gotten rid of Phi nodes.
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
v2 (Jason Ekstrand):
- Use nir_instr_rewrite_dest
- Pass the impl directly into lower_vec_to_movs_block
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Previously, we were passing the shader around, we were just calling it
"mem_ctx". However, the nir_shader is (and must be for the purposes of
mark-and-sweep) the mem_ctx so we might as well pass it around explicitly.
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Currently the validation pass only validates that regs_read and
regs_written are consistent with the sizes of VGRF's. We can add more as
we find it to be useful.
Reviewed-by: Matt Turner <mattst88@gmail.com>
In certain conditions, we have to do bounds-checking in the shader for
image_load_store. The way this works for image loads is that we do a
predicated load and then emit a series of selects, one per component,
that gives us 0 or the loaded value depending on whether or not you're
in bounds. However, we were hard-coding 4 components which may not be
correct. Instead, we should be using size which is the number of
components read.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
According to the extensions table and our glext headers,
OES_compressed_ETC1_RGB8_texture is only supported in
GLES1 and GLES2. Since we may give users a GLES3 context
when a GLES2 context is requested, we also allow this
extension for GLES3 as well.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Driver vendors do this as well. The extension specification
lists GLES 1.1 or 2.0 as requirements.
Reviewed-by: Chad Versace <chad.versace@intel.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
It's extensively used by XA for a8- and planar yuv component surfaces.
This fixes broken XA yuv blits using vgpu10 contexts.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Currently the check was incorrect as it did not consider the (unlikely)
case of fd == 0. In order to fix this we should first correctly
initialize it to -1, as the swrast implementations leave it set to zero
(props to calloc()).
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Boyan Ding <boyan.j.ding@gmail.com>
Move the fcntl(dupfd_cloexec) to the else branch where it belongs.
Otherwise it's not immediately obvious that the code is hit, only when
an existing device is used.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Boyan Ding <boyan.j.ding@gmail.com>
v2: [Emil Velikov]
Rework the error path to a common goto, close only if we own the fd.
v3; [Emil Velikov]
Always close the fd (we either opened the device or dup'd) (Boyan, Ian)
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Boyan Ding <boyan.j.ding@gmail.com>
At the moment if a gbm buffer is imported and the gbm buffer
has an old-style GBM_BO_FORMAT format, the import will crash,
since it's passed directly to DRI functions that expect
a fourcc format (as provided by the newer GBM_FORMAT
definitions)
This commit addresses the problem in two ways:
1) it prevents invalid formats from leading to a crash by
returning EINVAL if the image couldn't be created
2) it translates GBM_BO_FORMAT formats into the comparable
GBM_FORMAT formats.
Reference: https://bugzilla.gnome.org/show_bug.cgi?id=753531
CC: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
We're trying to avoid a libdrm dependency in the core compiler, so let's
move the perf_debug code one level up from the brw_*_emit() helpers to
the brw_codegen_*_prog() helpers.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
All other precompile functions live in the brw_<stage>.c files, make fs
follow the convention.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
This moves the compute shader code around in order to make the way the
code is split up more consistent. There should be no functional changes.
Typically we have a few files per stage:
brw_vs.c, brw_wm.c brw_gs.c:
code to drive code generation and implement precompiling and
cache search.
genX_<stage>_state.c
gen specific implementation of the state emission for the shader
stage.
The brw_*_emit() functions are all in the same files as the visitor
classes they use (with the exception of VS, which may use either vec4 or
fs).
To make compute follow this convention, we move the brw_cs_emit()
function into brw_fs.cpp. We can then rename brw_cs.cpp to brw_cs.c and
do this in C like the other similar files. Finally, move state setup
and atoms to gen7_cs_state.c.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Signed-off-by: Kristian Høgsberg Kristensen <krh@bitplanet.net>
Curiously this has no actual effect. I think it's because the first 8
textures are bound in multiple slots for some reason. However seems
prudent to use these the same way as regular texturing, esp in the case
where there are more than 8 textures bound.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
If the register types do not match and the instruction
that contains the final destination is saturated, register
coalescing generated non-equivalent code.
This did not happen when using IR because types usually
matched, but it is visible in nir-vec4.
For example,
mov vgrf7:D vgrf2:D
mov.sat m4:F vgrf7:F
is coalesced to:
mov.sat m4:D vgrf2:D
The patch prevents coalescing in such scenario, unless the
instruction we want to coalesce into is a MOV (without type
conversion implied). In that case, the patch sets the register
types to the type of the final destination.
Shader-db results in HSW (only vec4 instructions shown):
total instructions in shared programs: 1754415 -> 1754416 (0.00%)
instructions in affected programs: 74 -> 75 (1.35%)
helped: 0
HURT: 1
GAINED: 0
LOST: 0
Only one extra instruction in one of the shaders, that comes from
eliminating a saturation error by preventing register coalesce.
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
We lower gl_LocalInvocationIndex based on the extension spec formula:
gl_LocalInvocationIndex =
gl_LocalInvocationID.z * gl_WorkGroupSize.x * gl_WorkGroupSize.y +
gl_LocalInvocationID.y * gl_WorkGroupSize.x +
gl_LocalInvocationID.x;
https://www.opengl.org/registry/specs/ARB/compute_shader.txt
We need to set this variable in main(), even if gl_LocalInvocationIndex
is not referenced by the shader. (It may be used by a linked shader.)
Therefore, we can't eliminate it as a dead variable.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Also rename to _mesa_get_main_function_signature.
We will call it near the end of compilation to insert some code into
main for initializing some compute shader global variables.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
We lower gl_GlobalInvocationID based on the extension spec formula:
gl_GlobalInvocationID =
gl_WorkGroupID * gl_WorkGroupSize + gl_LocalInvocationID
https://www.opengl.org/registry/specs/ARB/compute_shader.txt
We need to set this variable in main(), even if gl_GlobalInvocationID
is not referenced by the shader. (It may be used by a linked shader.)
Therefore, we can't eliminate these as dead variables.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
This is to avoid needless float<->int conversions, since all
face-related computations are made on integers. Spotted by Emil
Velikov.
Reviewed-by: Brian Paul <brianp@vmware.com>
New enum to add to switch so compiler doesn't complain.
commit 1807a08e4f
Author: Ilia Mirkin <imirkin@alum.mit.edu>
AuthorDate: Thu Aug 27 23:05:03 2015 -0400
Commit: Ilia Mirkin <imirkin@alum.mit.edu>
CommitDate: Thu Sep 10 17:38:33 2015 -0400
nir: add nir_texop_texture_samples and convert from glsl
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Sometimes a useful thing for compilers (or, for example, tgsi_to_nir) to
know. And pretty trivial for scan to figure this out for us.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Cypress/Cayman/Aruba, earlier r6xx/r7xx chips only support a subset
of the needed fp64 ops, and don't do GL4 anyway.
Signed-off-by: Glenn Kennard <glenn.kennard@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Only for Cypress/Cayman/Aruba, older chips have only partial fp64 support.
Uses float intermediate values so only accurate for int24 range, which
matches what the blob does.
Signed-off-by: Glenn Kennard <glenn.kennard@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
I'm going to want a driver constant buffer for tess to coordinate
LDS storage, so before I go tackling that I decided to merge the
clip/samplepos and texture info buffers into one. So I can steal
the spare one.
This creates a single constant buffer between the two, with
clip/samplepos taking up a reserved 128 bytes at the start.
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
V2: -Change to "not started" for most entries
-Add status for multisample_2d_array
-Change shader_multisample_interpolation to "not_stared"
V3 (idr): Move the GLES 3.2 section after the "Additional functions"
section from GLES 3.1. Note that GL_KHR_texture_compression_astc_hdr is
done for i965 on gen9+ hardware. Note that GL_OES_shader_io_blocks is
based on some features from GLSL 1.50.
Signed-off-by: Thomas Helland <thomashelland90@gmail.com>
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com> [v2]
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This commit makes a lot of variables constant - this is basically done
by moving the computation to variable definition. Some of them are
moved into lower scopes (like in img_filter_2d_ewa).
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Add a small inline function doing the casting - this is to make sure
we don't do a cast from some completely unrelated type. This commit
does not make tgsi_sampler parameters const in vfuncs themselves for
now - probably llvmpipe would need looking at before making such a
change.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Those functions actually could always take them as constants.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Those functions actually could always take them as constants.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
A followup from previous commit - since all functions called by
query_lod take pointers to const sp_sampler_view and const sp_sampler,
which are taken from tgsi_sampler subclass, we can the tgsi_sampler as
const itself now.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
This is to prepare for making tgsi_sampler parameter in query_lod a
const too. These functions do not modify anything in either sampler or
view anymore.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
With that, sp_sampler_view instances are not abused anymore as a local
storage, so we can later make them constant.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
As of a10d4937, we would really like things associated with an instruction
to be allocated out of that instruction and not out of the shader. In
particular, you should be passing the instruction that will ultimately be
holding the source into nir_src_copy rather than an arbitrary memory
context.
We also change the prototypes of nir_dest_copy and nir_alu_src/dest_copy to
explicitly take an instruction so we catch this earlier in the future.
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
We copy the output, make the old output the temporary, and give the
temporary a new name. The copy keeps the pointer to the old name. This
works just fine up until the point where we lower things to SSA and delete
the old variable and, with it, the name. Instead, we should re-parent to
the copy.
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
opt_register_coalesce stopped to check previous instructions to
coalesce with if somebody else was writing on the same
destination. This can be optimized to check if somebody else was
writing to the same channels of the same destination using the
writemask.
Shader DB results (taking into account only vec4):
total instructions in shared programs: 1781593 -> 1734957 (-2.62%)
instructions in affected programs: 1238390 -> 1191754 (-3.77%)
helped: 12782
HURT: 0
GAINED: 0
LOST: 0
v2: removed some parenthesis, fixed indentation, as suggested by
Matt Turner
v3: added brackets, for consistency, as suggested by Eduardo Lima
Reviewed-by: Matt Turner <mattst88@gmail.com>
This makes it possible for NIR shaders to know the number of output
vertices and the number of invocations. Drivers could also access
these directly without going through gl_program.
We should probably add InputType and OutputType here too, but currently
those are stored as GL_* enums, and I wanted to avoid using those in
NIR, as I suspect Vulkan/SPIR-V will use different enums. (We should
probably make our own.)
We could add VerticesIn, but it's easily computable from the input
topology, so I'm not sure whether it's worth it. It's also currently
not stored in gl_shader (only gl_shader_program), which would require
changes to the glsl_to_nir interface or require us to store it there.
This is a bit of duplication of data...ideally, we would factor these
substructs out of gl_program, gl_shader_program, and nir_shader, creating
a gl_geometry_info class...but it would need to go in a new place (in
src/glsl?) that isn't mtypes.h nor nir.h.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
These provide a convenient way to do simple variable loads and stores.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Previously the result of the complicated clamp() expression just dropped
on the floor: clamp does not modify any of its parameters. Looking at
the surrounding code, I believe this is supposed to modify the value of
tex_coord.
This change (along with a change to avoid the use of
brw_blorp_framebuffer) does not affect any existing piglit tests. I'm
not sure what this clamp is trying to accomplish, so I'm not sure how to
write a test to exercise this path.
I also noticed another bug in this code. There is no way the array
texture case could possibly work. This will generate code for the
TEXEL_FETCH macro like:
#define TEXEL_FETCH(coord) texelFetch(texSampler, ivec3(coord), sample_map[int(2 * fract(coord.x))]);
Since the coord parameter of this macro is a vec2 at all invocations, no
expansion of this macro will even compile.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Cc: Topi Pohjolainen <topi.pohjolainen@intel.com>
Cc: Jordan Justen <jordan.l.justen@intel.com>
These only occurred in release builds, but they occurred in every file
that included intel_batchbuffer.h. Lots of spam. :(
intel_batchbuffer.h: In function 'intel_batchbuffer_advance':
intel_batchbuffer.h:153:47: warning: unused parameter 'brw' [-Wunused-parameter]
intel_batchbuffer_advance(struct brw_context *brw)
^
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
The for_bo parameter of intel_miptree_create_layout appears to be unused
since 27eedca when Eric removed some Gen5 code (after the i915 and i965
drivers parted ways).
intel_mipmap_tree.c: In function 'old_intel_miptree_create_layout':
intel_mipmap_tree.c:77:35: warning: unused parameter 'for_bo' [-Wunused-parameter]
bool for_bo)
^
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Hasn't existed in the i915 source since the i915 and i965 drivers parted
ways.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
This hasn't been used outside intel_mipmap_tree.c since d5d4ba9 started
using meta instead of the blitter for PBO TexSubImage. While we're
here, remove the unused brw parameter from the function formerly known
as intel_miptree_unmap_raw.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
These only occurred in release builds, but they occurred in every file
that included intel_mipmap_tree.h. Lots of spam. :(
intel_mipmap_tree.h: In function 'intel_miptree_check_level_layer':
intel_mipmap_tree.h:595:59: warning: unused parameter 'mt' [-Wunused-parameter]
intel_miptree_check_level_layer(struct intel_mipmap_tree *mt,
^
intel_mipmap_tree.h:596:42: warning: unused parameter 'level' [-Wunused-parameter]
uint32_t level,
^
intel_mipmap_tree.h:597:42: warning: unused parameter 'layer' [-Wunused-parameter]
uint32_t layer)
^
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
The target parameter of compute_msaa_layout appears to be unused since
83b83fb when support for CMS textures was added for Gen7.
The brw parameter of intel_get_non_msrt_mcs_alignment appears to be
unused since e92fbdc when the GEN check (along with the "can we fast
clear" decision) was moved to a different function.
intel_mipmap_tree.c: In function 'compute_msaa_layout':
intel_mipmap_tree.c:62:73: warning: unused parameter 'target' [-Wunused-parameter]
compute_msaa_layout(struct brw_context *brw, mesa_format format, GLenum target,
^
intel_mipmap_tree.c: In function 'intel_get_non_msrt_mcs_alignment':
intel_mipmap_tree.c:143:54: warning: unused parameter 'brw' [-Wunused-parameter]
intel_get_non_msrt_mcs_alignment(struct brw_context *brw,
^
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Cc: Ben Widawsky <benjamin.widawsky@intel.com>
If we skip a vbuffer we need to make sure we NULL out
the contents, otherwise when it gets passed to the driver
it will get confused.
This was hit by:
GL41-CTS.gpu_shader_fp64.varyings
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Enable barrier in MEDIA_INTERFACE_DESCRIPTOR if the program uses the
barrier() GLSL function.
On Ivy Bridge and Haswell, this allows the piglit test
tests/spec/arb_compute_shader/execution/simple-barrier-atomics.shader_test
to pass. On gen8, this enables a similar test with a local group size
of 896 to pass.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
first_non_payload_grf may be updated in assign_urb_setup for FS or
assign_vs_urb_setup for VS.
We need to set this in assign_curb_setup for compute shaders since cs
does not have an assign_cs_urb_setup like assign_urb_setup (fs) or
assign_vs_urb_setup (vs).
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
[v2: kayden-supplied code in fs_nir replacing need for logical opcode]
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
grep -lr 'sub license' | while read f; do \
sed --in-place -e 's/sub license/sublicense/' $f ;\
done
grep -lr 'NON-INFRINGEMENT' | while read f; do \
sed --in-place -e 's/NON-INFRINGEMENT/NONINFRINGEMENT/' $f ;\
done
As noted by Matt, both of these changes match the MIT license text found
at http://opensource.org/licenses/MIT.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Matt Turner <mattst88@gmail.com>
Passes the shader piglit tests and introduces no regressions.
This commit finally makes use of the refactoring in previous
commits.
v2:
- adapted the code to changes in previous commits (renames,
need_cube_convert stuff)
- splitted too long lines
Reviewed-by: Brian Paul <brianp@vmware.com>
This introduces new vfunc in tgsi_sampler just for this opcode. I
decided against extending get_samples vfunc to return the mipmap level
and LOD - the function's prototype is already too scary and doing the
sampling for textureQueryLod would be a waste of time.
v2:
- splitted too long lines
Reviewed-by: Brian Paul <brianp@vmware.com>
These functions will be used by textureQueryLod.
v2:
- renamed mip_level_* funcs to mip_rel_level_* to indicate that
these functions return mip level relative to base level and
documented them
- renamed a level member in sp_filter_funcs struct to relative_level
- changed mip_rel_level_none and mip_rel_level_nearest to return mip
level relative to base level, mip_rel_level_linear already did
that
- documented clamp_lod function
Reviewed-by: Brian Paul <brianp@vmware.com>
This is to avoid tying the conversion to the sampling -
textureQueryLod will need to do the conversion too, but it does not do
any sampling.
So instead of a "get_samples" vfunc, there is just a bool saying
whether the conversion is needed or not. This solution keeps a nice
property of not adding any overhead for the common case (2D textures).
v2:
- replaced the "convert_coords" vfunc with a "need_cube_convert"
boolean to avoid overhead of copying arrays in common case
- removed an unused typedef
- splitted too long lines in convert_cube
- const fixes in convert_cube
Reviewed-by: Brian Paul <brianp@vmware.com>
This function will be later used by textureQueryLod. The
img_filter_func are optional, because textureQueryLod will not need
them.
v2:
- adapted to changes in previous commit (renames)
- simplified conditions a bit
- updated docs
- splitted too long lines
Reviewed-by: Brian Paul <brianp@vmware.com>
Putting this function pointer into a struct enables grouping of
several related functions in a single place. For now it is just a
single function, but the struct will be later extended with a
mip_level_func for returning relative mip level.
v2:
- renamed sp_mip struct to sp_filter_funcs
- renamed sp_filter_funcs instances from mip_foo to funcs_foo
- splitted too long lines
- sp_sampler now holds a pointer to sp_filter_funcs instead of an
instance of it
- some const fixes
Reviewed-by: Brian Paul <brianp@vmware.com>
textureQueryLod returns a vec2 with a mipmap information and a
LOD. The latter needs to be not clamped.
v2:
- changed the "not_clamped" part to "unclamped"
- corrected "clamp into" to "clamp to"
- splitted too long lines
Reviewed-by: Brian Paul <brianp@vmware.com>
The level-of-detail bias wasn't simply added in the explicit LOD case.
This case seems to be tested only in piglit's
fs-texturequerylod-nearest-biased test, which is currently skipped, as
softpipe does not support textureQueryLod at the moment.
Reviewed-by: Brian Paul <brianp@vmware.com>
Basically, do the same thing as for buffer_unmap, but use the explicit range
instead. It's for apps which want to map a whole buffer and mark touched
ranges explicitly.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
When checking for LLVM shared libraries, use IMP_LIB_EXT for the extension for
shared libraries appropriate to the target, rather than hardcoding '.so'
Also add some comments to explain why we have this circus of pain.
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
mesa/src/mesa/drivers/dri/i965/brw_eu_compact.c: In function 'set_3src_control_index':
mesa/src/mesa/drivers/dri/i965/brw_eu_compact.c:805:22: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
for (int i = 0; i < ARRAY_SIZE(gen8_3src_control_index_table); i++) {
^
mesa/src/mesa/drivers/dri/i965/brw_eu_compact.c: In function 'set_3src_source_index':
mesa/src/mesa/drivers/dri/i965/brw_eu_compact.c:839:22: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
for (int i = 0; i < ARRAY_SIZE(gen8_3src_source_index_table); i++) {
^
mesa/src/mesa/drivers/dri/i965/brw_state_dump.c: In function 'dump_sampler_state':
mesa/src/mesa/drivers/dri/i965/brw_state_dump.c:382:18: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
for (i = 0; i < size / 16; i++) {
^
mesa/src/mesa/drivers/dri/i965/brw_state_upload.c: In function 'brw_pipeline_state_finished':
mesa/src/mesa/drivers/dri/i965/brw_state_upload.c:801:13: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
if (i != pipeline) {
^
mesa/src/mesa/drivers/dri/i965/intel_mipmap_tree.c: In function 'intel_gen7_hiz_buf_create':
mesa/src/mesa/drivers/dri/i965/intel_mipmap_tree.c:1544:47: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
for (int level = mt->first_level; level <= mt->last_level; ++level) {
^
mesa/src/mesa/drivers/dri/i965/intel_mipmap_tree.c: In function 'intel_gen8_hiz_buf_create':
mesa/src/mesa/drivers/dri/i965/intel_mipmap_tree.c:1638:44: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
for (int level = mt->first_level; level <= mt->last_level; ++level) {
^
mesa/src/mesa/drivers/dri/i965/intel_mipmap_tree.c: In function 'intel_miptree_alloc_hiz':
mesa/src/mesa/drivers/dri/i965/intel_mipmap_tree.c:1771:44: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
for (int level = mt->first_level; level <= mt->last_level; ++level) {
^
mesa/src/mesa/drivers/dri/i965/intel_mipmap_tree.c:1775:33: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
for (int layer = 0; layer < mt->level[level].depth; ++layer) {
^
Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
mesa/src/mesa/program/prog_to_nir.c: In function 'setup_registers_and_variables':
/mesa/src/mesa/program/prog_to_nir.c:1059:22: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
for (int i = 0; i < c->prog->NumTemporaries; i++) {
^
Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
mesa/src/glsl/nir/nir_lower_tex_projector.c: In function 'nir_lower_tex_projector_block':
mesa/src/glsl/nir/nir_lower_tex_projector.c:63:25: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
for (int i = 0; i < tex->num_srcs; i++) {
^
mesa/src/glsl/nir/nir_lower_tex_projector.c: In function 'nir_lower_tex_projector_block':
mesa/src/glsl/nir/nir_lower_tex_projector.c:114:38: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
for (int i = proj_index + 1; i < tex->num_srcs; i++) {
^
mesa/src/glsl/nir/nir_lower_tex_projector.c: In function 'nir_lower_tex_projector_block':
mesa/src/glsl/nir/nir_lower_tex_projector.c:53:39: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
for (proj_index = 0; proj_index < tex->num_srcs; proj_index++) {
^
mesa/src/glsl/nir/nir_lower_tex_projector.c:57:22: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
if (proj_index == tex->num_srcs)
^
mesa/src/glsl/nir/nir_search.c: In function 'match_value':
mesa/src/glsl/nir/nir_search.c:84:22: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
for (int i = 0; i < num_components; ++i)
^
mesa/src/glsl/nir/nir_search.c: In function 'match_value':
mesa/src/glsl/nir/nir_search.c:110:28: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
for (int i = 0; i < num_components; ++i) {
^
mesa/src/glsl/nir/nir_search.c: In function 'match_value':
mesa/src/glsl/nir/nir_search.c:139:19: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
if (i < num_components)
^
mesa/src/glsl/nir/nir_opt_peephole_ffma.c: In function 'get_mul_for_src':
mesa/src/glsl/nir/nir_opt_peephole_ffma.c:130:27: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
for (unsigned i = 0; i < num_components; i++)
^
Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Resolve a series of missing field initializer warnings within get_hash_params.py
Of the form:
In file included from mesa/src/mesa/main/get.c:495:0:
mesa/src/mesa/main/get_hash.h:180:5: warning: missing initializer for field
'extra' of 'const struct value_desc' [-Wmissing-field-initializers]
{ GL_POINT_SIZE_ARRAY_BUFFER_BINDING_OES, LOC_CUSTOM, TYPE_INT, 0 },
^
mesa/src/mesa/main/get.c:165:15: note: 'extra' declared here
const int *extra;
^
This patch addresses some likely code rot around the *extra field, where the
initialization is via C code generated indirectly from a Python script.
It resolves a number of warnings reported by GCC when configured to be pedantic.
$ gcc --version
gcc (Ubuntu 4.9.2-10ubuntu13) 4.9.2
No piglit regressions on Ironlake.
v2:
- Squash series into a single patch.
Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
When parsing an variable declaration qualified with the typename
keyword, clang attempted to declare a variable with the type of non
type member "enum type type" of module::argument (within the header
file clover/core/module.hpp) instead of the typed member of
module::argument "enum type".
Replaced "typename" with "enum" to force clang to declare the variable
marg_type with type "enum type" of module::argument.
CC: "11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Signed-off-by: Albert Freeman <albertwdfreeman@gmail.com>
Our old value of 16384 is the minimum value. DirectX apparently
requires 65536 at a minimum; that's also what nVidia and the Intel
Windows driver advertise. AMD advertises MAX_INT.
Ilia Mirkin noticed that "Shadow Warrior" uses UBOs larger than 16k
on Nouveau, which advertises 65536 bytes for this limit. Traces
captured on Nouveau don't work on i965 because our lower limit causes
the GLSL linker to reject the captured shaders. While this isn't
important in and of itself, it does suggest that raising the limit
would be beneficial.
We can read linear buffers up to 2^27 bytes in size, so raising this
should be safe; we could probably even go larger. For now, matching
nVidia and Intel/Windows seems like a good plan.
We have to reinitialize MaxCombinedUniformComponents as core Mesa will
have set it based on a stale value for MaxUniformBlockSize.
According to Tapani, there's an unreleased game that asserts on this.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
It is advantageous to use r63 instead of r127 since r63 can fit into the
shorter encoding. However if we've RA'd over 63 registers, we must use
r127 as the replacement instead.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Unfortunately nv50_ir phi nodes aren't directly connected to the CFG, so
the mapping between source and the actual BB is by inbound edge order.
So when manipulating edges one has to be extremely careful. We were
insufficiently careful when splitting critical edges which resulted in
the phi nodes being confused as to where their sources were coming from.
This primarily manifests itself with the TXL-lowering logic on nv50,
when it is inside of a conditional. I've been unable to trigger the
issue anywhere else so far. This resolves rendering failures
in a number of games like Two Worlds 2, Trine: Enchanted Edition, Trine 2,
XCOM:Enemy Unknown, Stacking. It also improves the situation in
Hearthstone, Sonic Generations, and The Raven: Legacy of a Master Thief.
However more work needs to be done there (splitting a lot more edges
solves it, so it's some other sort of RA-related issue).
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90887
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
The purpose of the macro was to create the name_as_gs_input from name.
The previous commit removed the name_as_gs_input from add_varying, so
the macro is unnecessary.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
After inserting instructions the cursor.option becomes _after_instr
(even if it started life as an _after_block). So we cannot simply stash
the current cursor on the if/loop_stack. Otherwise we end up inserting
instructions after the endif/endloop in the block preceeding the if/
loop.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
CB updates to bound buffers need to go through the CB_DATA endpoints,
otherwise the shader may not notice that the updates happened.
Furthermore, these updates have to go in to the same address as the
bound buffer, otherwise, again, the shader may not notice updates.
So we keep track of all the places where a constbuf is bound, and
iterate over all of them when updating data. If a binding is found that
encompasses the region to be updated, then we use the settings of that
binding for the upload. Otherwise we upload as a regular data update.
This fixes piglit 'arb_uniform_buffer_object-rendering offset' as well
as blurriness in Witcher2.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91890
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
This pass can be used as a helper for NIR producers so they don't have to
worry about creating the temporaries themselves.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Some modern apps try to use msaa without keeping in mind the
restrictions on videomem of older cards. Resulting in dmesg saying:
[ 1197.850642] nouveau E[soffice.bin[3785]] fail ttm_validate
[ 1197.850648] nouveau E[soffice.bin[3785]] validating bo list
[ 1197.850654] nouveau E[soffice.bin[3785]] validate: -12
Because we are running out of video memory, after which the program
using the msaa visual freezes, and eventually the entire system freezes.
To work around this we do not allow msaa visauls by default and allow
the user to override this via NV30_MAX_MSAA.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
[imirkin: move env var lookup to screen so that it's only done once]
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
We do not have a generic blitter on nv3x cards, so we must use the
sifm object for color resolving.
This commit divides the sources and dest surfaces in to tiles which
match the constraints of the sifm object, so that color resolving
will work properly on nv3x cards.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Since debugging issues w/ fd's close()d at the wrong time can be quite
fun, this should probably be made more explicit in the docs.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
This patch is necessary to avoid building error on android,
due to missing sid_tables.h generated sources
v2:[Emil Velikov] Correctly split the lists.
Fixes: fbbebeae10f(radeonsi: inline si_cmd_context_control)
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Otherwise the android build fails with
error : unable to find string literal operator ‘operator"" PRIx64’
There are several resources referring to the problem, which is related
to c++11, in our case used when building mesa for lollipop.
http://comments.gmane.org/gmane.comp.graphics.opensg.user/5883
I've not investigated all the semantics, some people even suggested a
bug in the gcc compiler,
I just saw the building error was solved with one little space for
lollipop and no side effect when c+11 not used.
v2: [Emil Velikov] add an alternative commit message from Mauro.
Cc: 11.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
There are a few bits this commit aims to resolve:
One can generalise the mkdir rule to a simple MKDIR_P $(@D) which will
expand appropriately for even if we change the subdir name, and/or add
new rules. We can also drop the explicit $(srcdir) prefix for the
dependency rules, they they are not strictly required, nor used
elsewhere in mesa.
Finally replace $< with explicit filename to be consistent through the
file, and honour PYTHON_FLAGS.
v2: Add comprehensive commit summary/message (Ian, Matt)
Cc: 11.0 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Rather than folding one variable within the other only to unwrap them,
just use the ones we need.
v2: bring back LOCAL_PATH prefix for nir_constant_expressions,h
Cc: 11.0 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com> (v1)
The glsl equivalent of "mesa: automake: rework the source generation
rules". Plus let's make things consistent and always explicitly provide
the header name.
v2: Rebase on top of reverted "remove custom AM_V_LEX/YACC" (Matt)
Cc: 11.0 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Same logic as previous commit applies.
Additionally remove the odd (set -e/mv/INDENT) from the rules.
The last one is the only one we remotely care about, if reading the
generated sources.
Upcoming work from DylanB which will replace the existing python
scripts with ones that produce more readable output anyway.
Cc: 11.0 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
A handful of changes/cleanups paving the way to bmake support:
- Remove optional $(srcdir)/ prefix for files in the prereq list.
- Drop the space after the AM_V_GEN variable.
- Using $< in a non-suffix rule is a GNU make idiom.
- Use $(@D) over $(dir $@). The latter is a POSIX standard.
v2: Cosmetic tweaks in the commit summary.
Cc: 11.0 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com> (v1)
This is the only place in mesa that uses this constuct which seems
to be GNUmake-ism. Attempting to build with POSIX make implementations
(bmake) would fail as below.
--- options.h ---
LOCALEDIR := .
sh: line 2: LOCALEDIR: command not found
*** [options.h] Error code 127
So let's keep things consistent and compatible by making the variable
non target specific.
v2:
- Bring back LOCALEDIR.
- Reword the commit message
- Change mesa-stable tag 10.6 > 11.0
Cc: 11.0 <mesa-stable@lists.freedesktop.org>
Cc: Jonathan Gray <jsg@jsg.id.au>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
According to OpenGL ES 3.1 specification table : 20.2 and
OpenGL specification 4.4 table 23.4. The glGetIntegeri_v
functions should report the name of the buffer bound
when called with GL_VERTEX_BINDING_BUFFER.
Signed-off-by: Marta Lofstedt <marta.lofstedt@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
This code is all pretty much identical. We just needed the translation
from one enum value to the other.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
TRIFAN_NOSTIPPLE has always been 0x16 - 0x15 is marked "Reserved" on all
platforms. See the 965 PRM, Volume 2, Table 3-1, "3D Primitive Topology
Type Encoding" for a list.
We don't currently use this, and I don't expect we will, but we may as
well not leave the bogus value around.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Docs suggest this is no longer required starting with Gen8.
Perf (no regressions in n=20)
OglMultithread 0.67%
OglTerrainPanInst 0.12%
trex 0.45%
warsow 0.64%
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Since 7a32652231
r600: Turn 'r600_shader_key' struct into union
we were accessing key fields that might be aliased in the union
with other fields, so we should check what shader type we are
compiling for before using key values from it.
v1.1: make it compile
v2: have caffeine, make it work - we don't set type
until later, so don't reference it until we've set it.
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
I meant to do this here, but it was in the wrong place:
commit c1151b18f2
Author: Ben Widawsky <benjamin.widawsky@intel.com>
Date: Wed Jun 24 20:07:54 2015 -0700
i965/skl: Use more compact hiz dimensions
NOTE: Jordan did go back and look at the original mailing list post. I mailed
the right thing, and pushed the wrong one.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Neil Roberts <neil@linux.intel.com>
Indications are that if the colormask indicates a single bit set on
fermi, that value will always be read from $r0 instead of a potentially
higher register (if e.g. green is set). Not to upset the counting logic,
always set the header up with a full color mask for each RT. Such a
situation can basically only ever happen with generated blit shaders.
Fixes the following piglit on Fermi (Kepler is unaffected):
fbo-stencil blit GL_DEPTH32F_STENCIL8
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
Commit 2126c68e5c killed the array elements parameter on load/store
intrinsics that was stored in const_index[1]. It looks like that
patch missed to remove this assignment in the UBO path.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
The sifm object has a limit of 1024x1024 for its input size and 2048x2048
for its output. The code checking this was trying to be clever resulting
in it seeing a surface of e.g 1024x256 being outside of the input size
limit.
This commit fixes this.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
X11_CFLAGS is never defined. Path to X11 headers is not needed here, so
just remove.
Future work: Using AM_CFLAGS here looks wrong, as this Makefile only builds
C++ files
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Make sure errors are correcly propagated.
Also don't flush during state emission if emission fails.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Like xa_surface_from_handle(), but takes a handle type, rather than
hard-coding 'shared' handle. This is needed to fix bugs seen with
xf86-video-freedreno with xrandr rotation, for example. The root issue
is that doing a GEM_OPEN ioctl on a bo that already has a GEM handle
associated with the drm_file will result in two unique handles for the
same bo. Which causes all sorts of follow-on fail.
v2:
- Add support for for fd handles.
- Avoid duplicating code.
- Bump xa version minor.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
My earlier attempt to fix this missed the fact that there was a #else
clause that assumes that you have openssh. This moves the whole thing
under #ifdef HAVE_SHA1 which should avoid this issue.
Fixes: 13bfa5201 (util: always include sha1 into the build)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91898
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Dave Airlie <airlied@gmail.com>
SHA1 is now used in all builds when HAVE_SHA1 is defined. Adjust src to
do the same thing, rather than predicating on shader cache.
Fixes: 04e201d0c0 ("mesa: change 'SHADER_SUBST' facility to work with env variables")
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Dave Airlie <airlied@gmail.com>
Nothing in the spec allows for the reduced precision, and this also
fixes st_QuerySamplesForFormat for nv50, which does not allow MS8 on
RGBA32F. Now this will be respected instead of reporting MS8 as
supported with an assumption that the format used will be RGBA16F.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
vbuf is never null. We want to make sure that a resource was allocated
for the vbuf, which is *vbuf.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The hardware only generates vertexid when vertices come from a VBO. This
fixes:
vertexid-drawelements
vertexid-drawarrays
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
The stride was being set to 0, which is illegal (and also non-sensical).
Also we must wait for the buffer to become available for reading as
otherwise a wrong value may be prefetched. Since we must wait for the
buffer anyways, and it's mapped and in GART, we may as well avoid the
annoyance of the indirect pushbuf submit.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
Gen9 changes the meaning of this to coarse LOD quality mode. Although that's a
desirable thing to be setting, it doesn't match the gen8 behavior and this was
unintentional. More importantly, we don't ever use this field. So instead of
getting it "wrong" drop it entirely.
This is a respin of a patch which only [incorrectly] tried to address gen9.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
round(val*dscale) produces a double result, as val and dscale are double.
However, LLVMConstInt receives unsigned long long, so there is an
implicit conversion from double to unsigned long long.
This is an undefined behavior. Therefore, we need to first explicitly
convert the round result to long long, and then let the compiler handle
conversion from that to unsigned long long.
This bug manifests itself in POWER, where all IMM values of -1 are being
converted to 0 implicitly, causing a wrong LLVM IR output.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
CC: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Note this is not ideal. Since the sifm can only do source sizes upto
1024x1024 we end up using the blitter on nv4x, which is not that fast.
And on nv3x we end up using the cpu which is really slow.
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Scanout buffers on nv30 must always be non-swizzled and have special
width alignment constraints.
These constrains have been taken from the xf86-video-nouveau
src/nv_accel_common.c: nouveau_allocate_surface() function.
nouveau_allocate_surface() applies these width constraints only when a
tiled attribute is set, which it sets for all surfaces allocated via
dri, and this "tiling" is not the same as swizzling, scanout surfaces
must be linear / have a uniform_pitch or only complete garbage is shown.
This commit fixes dri3 on nv30 showing a garbled display, with dri3 the
scanout buffers are allocated by mesa, rather then by the ddx, and the
wrong stride of these buffers was causing the garbled display.
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
The tiled memcpy fast paths perform a simple blit (with only a couple of
trivial pixel conversion routines) and do not accommodate PixelTransfer
operations. Therefore if any are set, fallback to the regular routines.
Note that PixelTransfer only applies to TexImage and ReadPixels, not to
GetTexImage.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Jason Ekstrand <jason.ekstrand@intel.com>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: mesa-stable@lists.freedesktop.org
If we have spilled/unspilled a register in the current instruction, avoid
emitting unspills for the same register in the same instruction or consecutive
instructions following the current one as long as they keep reading the spilled
register. This should allow us to avoid emitting costy unspills that come with
little benefit to register allocation.
v2:
- Apply the same logic when evaluating spilling costs (Curro).
v3:
- Abstract the logic that decides if a register can be reused in a function.
that can be used from both spill_reg and evaluate_spill_costs (Curro).
v4:
- Do not disallow reusing scratch_reg in predicated reads (Curro).
- Track if previous sources in the same instruction read scratch_reg (Curro).
- Return prev_inst_read_scratch_reg at the end (Curro).
- No need to explicitily skip scratch read/write opcodes in spill_reg (Curro).
- Fix the comments explaining what happens when we hit an instruction that
does not read or write scratch_reg (Curro)
- Return true early when the current or previous instructions read
scratch_reg with a compatible mask.
v5:
- Do not return true early, the loop should not be expensive anyway
and this adds more complexity (Curro).
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
We now print out the name of the message instead of its numerical
value, and label the message control and surface numbers.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
The entire VUE map is computed based on the slots_valid bitfield;
calling brw_compute_vue_map on the same bitfield will return the
same result. So we can simply compare those.
struct brw_vue_map is 136 bytes; doing a single 8-byte comparison is
much cheaper and should work just as well.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
These were only for legacy userclipping, which we no longer support
in geometry shaders.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
According to the GLSL 1.50 specification, page 76:
"The shader must also set all values in gl_ClipDistance that have been
enabled via the OpenGL API, or results are undefined."
With this patch, we only enable clip distance writes when the shader
actually writes them. We no longer force a value to be written when
clip planes are enabled in the API. This could mean the first varying
slot would be used as clip distances - I believe it should be the safe
kind of undefined behavior.
Empirically, it doesn't seem to cause a problem.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
The legacy userclip fields are only used for the vertex shader, and at
that point there's only program_string_id and the tex struct, which are
common to all keys. So there's no need for a "VUE" key base class.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
This avoids a downcast of key, which won't exist in the base class soon.
I'm not a huge fan of this patch, but given that we're currently using
inheritance, this seems like the "right" way to do it. The alternative
is to make key a void pointer in the parent class and continue
downcasting.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
I'm about to remove the base class for VS/GS/HS/DS program keys, at
which point we won't be able to use key->tex anymore. Instead, we'll
need to store a direct pointer (like we do in the FS backend).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
This is now only used for the vertex shader, so it makes sense to get it
out of any paths run by the geometry shader.
Instead of passing the gl_clip_plane array into the run() method (which
is shared among all subclasses), we add it as a vec4_vs_visitor
constructor parameter. This eliminates the bogus NULL parameter in the
GS case.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
There are two uses of this flag.
The primary use is checking whether we need to emit code to convert
legacy gl_ClipVertex/gl_Position clipping to clip distances. In this
case, we also have to upload the clip planes as uniforms, which means
setting nr_userclip_plane_consts to a positive value. Checking if it's
> 0 works for detecting this case.
Gen4-5 also wants to know whether we're doing clipping at all, so it can
emit user clip flags. Checking if output_reg[VARYING_SLOT_CLIP_DIST0]
is set to a real register suffices for this.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
We only support geometry shaders in core profiles, where gl_ClipVertex
doesn't exist. Presumably the even older behavior of clipping to
gl_Position isn't supported either. In fact, GLSL 1.50 page 76 claims:
"The shader must also set all values in gl_ClipDistance that have been
enabled via the OpenGL API, or results are undefined."
So we don't need to handle legacy clipping in geometry shaders. I think
Paul added this back when we were considering supporting the old
GL_ARB_geometry_shader4 extension.
This removes a non-orthagonal state dependency on GS compilation.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
This living in brw_fs.{h,cpp} is a historical artifact of us supporting
texturing for fragment shaders before any other stages. It's kind of
awkward given that we use it for all stages.
This avoids having to include brw_fs.h in geometry shader code in order
to access this function.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Patch modifies existing shader source and replace functionality to work
with environment variables rather than enable dumping on compile time.
Also instead of _mesa_str_checksum, _mesa_sha1_compute is used to avoid
collisions.
Functionality is controlled via two environment variables:
MESA_SHADER_DUMP_PATH - path where shader sources are dumped
MESA_SHADER_READ_PATH - path where replacement shaders are read
v2: cleanups, add strerror if fopen fails, put all functionality
inside HAVE_SHA1 since sha1 is required
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Suggested-by: Eero Tamminen <eero.t.tamminen@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
commit 472ef9a02f introduced code to
change the types of SEL and MOV instructions for moves that simply
"copy bits around". It didn't account for type conversion moves,
however. So it would happily turn this:
mov(8) vgrf6:D, -vgrf5:D
mov(8) vgrf7:F, vgrf6:UD
into this:
mov(8) vgrf6:D, -vgrf5:D
mov(8) vgrf7:D, -vgrf5:D
which erroneously drops the conversion to float.
Cc: "11.0 10.6" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
As far as I can tell, the behavior is preserved from the previous generations.
Before we set a single bit to tell the FS whether or not we'll be using an input
coverage mask. Now we have some options which are implementing various
extensions. These bits are used for the various conservative rasterization
mechanisms (for collision detection, binning, and whatever else).
I believe that the behavior is preserved because the problem which conservative
rasterization is attempting to fix would go away with the "NORMAL" mode (at the
cost of performance, I believe).
This patch serves as documentation of the change by creating the enums, as well
as giving some of the history with the links here so that the next person who
comes along and looks at it doesn't spend as long as I had to in order to
determine if there is an issue or not.
Previously, this algorithm had been done in software, and this can still be used
as long as we don't export an extension stating otherwise.
References: https://www.opengl.org/registry/specs/NV/conservative_raster.txt
References: https://http.developer.nvidia.com/GPUGems2/gpugems2_chapter42.html
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This must be done before exporting a buffer as dmabuf fds, because
we lose track of who is using it and can't trust the reference counter.
Cc: 11.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
This is probably the most called util function. It does almost nothing,
yet it can consume 10% of the CPU on the profile. This drops it down to 5%.
Reviewed-by: Brian Paul <brianp@vmware.com>
There doesn't seem any reason to start from 4.
Start from 1 instead (0 is left reserved to catch uninitialized atoms).
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Similarly to scissor states, we can use single atom to track all viewport
states. This will allow to simplify dirty atom handling later.
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
During review of the "r600g: make all scissor states use single atom" patch
Marek Olšák noticed that scissor disable workaround should be applied on
all scissor states and not just first one, so let's do so.
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
As suggested by Marek Olšák, we can use single atom to track all scissor
states. This will allow to simplify dirty atom handling later.
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
It's legal to call glTexSubImage with zero values for the width,
height or depth. Previously this was breaking the PBO access
validation because it tries to work out the last pixel accessed by
getting the pixel at height-1 and depth-1 which would end up with
bogus values.
This was causing GL errors to be generated during the Piglit
texsubimage test, although the test was passing anyway.
v2: Also check for width == 0. Don't validate the start pointer if any
of the dimensions are zero.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
In various versions of OpenGL and GLSL, it's possible to declare
multiple VS input variables with aliasing attribute locations.
So, when computing the storage requirements for vertex attributes,
we can't simply add up the sizes. Instead, we need to look at the
enabled slots.
This patch begins tracking which attributes are double types that
are larger than 128-bits (i.e. take up two vec4 slots). We then
count normal attributes once, and count the double-size attributes
a second time.
Fixes deQP functional.attribute_location.bind_aliasing.max_cond_* tests
on i965, which regressed with commit ad208d975a.
No Piglit changes on llvmpipe (which actually supports dvecs).
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
Tested-by: Mark Janes <mark.a.janes@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Previously we would allow glUniformMatrix4fv on a dmat4 and
glUniformMatrix4dv on a mat4. Both are illegal. That later also
overwrites the storage for the mat4 and causes bad things to happen.
Should fix the (new) arb_gpu_shader_fp64-wrong-type-setter piglit test.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
Cc: Dave Airlie <airlied@redhat.com>
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
All of the other state upload functions are static because the only use
is in the brw_tracked_state structure.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
All of the other state upload functions are static because the only use
is in the brw_tracked_state structure.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Because the compiler already has enough things to complain about.
grep -rl 'const static' src/ | while read f
do
sed --in-place -e 's/const static/static const/g' $f
done
brw_eu_emit.c: In function 'brw_reg_type_to_hw_type':
brw_eu_emit.c:98:7: warning: 'static' is not at beginning of declaration [-Wold-style-declaration]
const static int imm_hw_types[] = {
^
brw_eu_emit.c:120:7: warning: 'static' is not at beginning of declaration [-Wold-style-declaration]
const static int hw_types[] = {
^
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
brw_upload_cs_push_constants was based on gen6_upload_push_constants.
v2:
* Add FINISHME comments about more efficient ways to push uniforms
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Check for a valid framebuffer cbuf pointer before accessing its
associated surface.
Fix piglit test fbo-drawbuffers-none.
Reviewed-by: Brian Paul <brianp@vmware.com>
Commit b9ba8492 removes an unneeded pipe_surface_release() from
st_render_texture(). This implies a surface can now be reused for a
render buffer. Currently, when we render to a texture, we mark the
surface as dirty. But in svga_mark_surface_dirty(), if the surface
is already marked as dirty, it does not increment the texture age.
Any view to this texture might not be updated properly then.
With this patch, the texture age is incremented regardless of whether
the surface is already marked as dirty or not.
Fix bug 1499181.
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
Commit b9ba8492 removes an unneeded pipe_surface_release() from
st_render_texture() and exposes a bug in the backed surface view
creation. Currently a backed surface view for a conflicted surface view
is created at framebuffer emit time. But if shader sampler views are changed
but framebuffer surface views remain unchanged, emit_framebuffer() will not
be called and conflicted surface views will not be detected.
To fix this, also check for conflicted surface views when setting sampler
views. If there is any conflicted surface views, enable the
framebuffer dirty bit so that the framebuffer emit code has a chance to
create a backed surface view for the conflicted surface view.
Fix cinebench-r11-test regression.
Reviewed-by: Brian Paul <brianp@vmware.com>
The lowered code reads from the destination, which isn't possible from
message registers.
Fixes the following dEQP tests on SNB:
dEQP-GLES3.functional.shaders.precision.int.highp_mul_fragment
dEQP-GLES3.functional.shaders.precision.int.mediump_mul_fragment
dEQP-GLES3.functional.shaders.precision.int.lowp_mul_fragment
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
Tested-by: Mark Janes <mark.a.janes@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
This is a squash commit of roughly two years of development work.
Authors include:
Brian Paul
Charmaine Lee
Thomas Hellstrom
Jakob Bornecrantz
Sinclair Yeh
Mingcheng Chen
Kai Ninomiya
MengLin Wu
The driver supports OpenGL 3.3.
Signed-off-by: Brian Paul <brianp@vmware.com>
The VMware svga driver doesn't directly support pipe_screen::get_timestamp()
but we can do a work-around. However, we need a gallium context to do so.
This patch adds a new pipe_context::get_timestamp() function that will only
be called if the pipe_screen::get_timestamp() function is NULL.
Signed-off-by: Brian Paul <brianp@vmware.com>
This involves a few driver modifications to keep things building.
The driver may not actually run properly at this point.
Signed-off-by: Brian Paul <brianp@vmware.com>
If the user is specifying a subregion of a buffer using SKIP_ROWS and
SKIP_PIXELS, we must compute the buffer size carefully as the end of the
last row may be much shorter than stride*image_height*depth. The current
code tries to memcpy from beyond the end of the user data, for example
causing:
==28136== Invalid read of size 8
==28136== at 0x4C2D94E: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:915)
==28136== by 0xB4ADFE3: brw_bo_write (brw_batch.c:1856)
==28136== by 0xB5B3531: brw_buffer_data (intel_buffer_objects.c:208)
==28136== by 0xB0F6275: _mesa_buffer_data (bufferobj.c:1600)
==28136== by 0xB0F6346: _mesa_BufferData (bufferobj.c:1631)
==28136== by 0xB37A1EE: create_texture_for_pbo (meta_tex_subimage.c:103)
==28136== by 0xB37A467: _mesa_meta_pbo_TexSubImage (meta_tex_subimage.c:176)
==28136== by 0xB5C8D61: intelTexSubImage (intel_tex_subimage.c:195)
==28136== by 0xB254AB4: _mesa_texture_sub_image (teximage.c:3654)
==28136== by 0xB254C9F: texsubimage (teximage.c:3712)
==28136== by 0xB2550E9: _mesa_TexSubImage2D (teximage.c:3853)
==28136== by 0x401CA0: UploadTexSubImage2D (teximage.c:171)
==28136== Address 0xd8bfbe0 is 0 bytes after a block of size 1,024 alloc'd
==28136== at 0x4C28C20: malloc (vg_replace_malloc.c:296)
==28136== by 0x402014: PerfDraw (teximage.c:270)
==28136== by 0x402648: Draw (glmain.c:182)
==28136== by 0x8385E63: ??? (in /usr/lib/x86_64-linux-gnu/libglut.so.3.9.0)
==28136== by 0x83896C8: fgEnumWindows (in /usr/lib/x86_64-linux-gnu/libglut.so.3.9.0)
==28136== by 0x838641C: glutMainLoopEvent (in /usr/lib/x86_64-linux-gnu/libglut.so.3.9.0)
==28136== by 0x8386C1C: glutMainLoop (in /usr/lib/x86_64-linux-gnu/libglut.so.3.9.0)
==28136== by 0x4019C1: main (glmain.c:262)
==28136==
==28136== Invalid read of size 8
==28136== at 0x4C2D940: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:915)
==28136== by 0xB4ADFE3: brw_bo_write (brw_batch.c:1856)
==28136== by 0xB5B3531: brw_buffer_data (intel_buffer_objects.c:208)
==28136== by 0xB0F6275: _mesa_buffer_data (bufferobj.c:1600)
==28136== by 0xB0F6346: _mesa_BufferData (bufferobj.c:1631)
==28136== by 0xB37A1EE: create_texture_for_pbo (meta_tex_subimage.c:103)
==28136== by 0xB37A467: _mesa_meta_pbo_TexSubImage (meta_tex_subimage.c:176)
==28136== by 0xB5C8D61: intelTexSubImage (intel_tex_subimage.c:195)
==28136== by 0xB254AB4: _mesa_texture_sub_image (teximage.c:3654)
==28136== by 0xB254C9F: texsubimage (teximage.c:3712)
==28136== by 0xB2550E9: _mesa_TexSubImage2D (teximage.c:3853)
==28136== by 0x401CA0: UploadTexSubImage2D (teximage.c:171)
==28136== Address 0xd8bfbe8 is 8 bytes after a block of size 1,024 alloc'd
==28136== at 0x4C28C20: malloc (vg_replace_malloc.c:296)
==28136== by 0x402014: PerfDraw (teximage.c:270)
==28136== by 0x402648: Draw (glmain.c:182)
==28136== by 0x8385E63: ??? (in /usr/lib/x86_64-linux-gnu/libglut.so.3.9.0)
==28136== by 0x83896C8: fgEnumWindows (in /usr/lib/x86_64-linux-gnu/libglut.so.3.9.0)
==28136== by 0x838641C: glutMainLoopEvent (in /usr/lib/x86_64-linux-gnu/libglut.so.3.9.0)
==28136== by 0x8386C1C: glutMainLoop (in /usr/lib/x86_64-linux-gnu/libglut.so.3.9.0)
==28136== by 0x4019C1: main (glmain.c:262)
==28136==
Fixes regression from commit 7f396189f0
Author: Jason Ekstrand <jason.ekstrand@intel.com>
Date: Mon Jan 5 18:17:04 2015 -0800
meta: Add a BlitFramebuffers-based implementation of TexSubImage
v2: However, the teximage we create does need to be width x full_height x 1
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Jason Ekstrand <jason.ekstrand@intel.com>
Cc: Neil Roberts <neil@linux.intel.com>
Reviewed-by Neil Roberts <neil@linux.intel.com>
The src_reg constructor that received the glsl_type was using it
only to build the swizzle, but not to fill this->type as dst_reg
is doing.
This caused some type mismatch between movs and alu operations
on the NIR path, so copy propagation optimization was not applied
to remove unneeded movs if negate modifier was involved. This was
first detected on minus (negate+add) operations.
Shader DB results (taking into account only vec4):
total instructions in shared programs: 20019 -> 19934 (-0.42%)
instructions in affected programs: 2918 -> 2833 (-2.91%)
helped: 79
HURT: 0
GAINED: 0
LOST: 0
Reviewed-by: Matt Turner <mattst88@gmail.com>
Only a subset of AMD GPUs supported by r600g support doubles,
CAYMAN and CYPRESS are probably all we'll try and support, however
I don't have a CYPRESS so ignore that for now.
This disables SB support for doubles, as we think we need to
make the scheduler smarter to introduce delay slots.
[airlied: pushing this to avoid pain of rebasing, it mostly
works on cayman only so far, Glenn has some ideas about
delay slot issues we need to look into. turned off by
default for now]
Signed-off-by: Dave Airlie <airlied@redhat.com>
This patch is taken from work by Glenn and myself,
and I've spent some time making it all work here.
This adds support for the multiple streams part of
ARB_gpu_shader5 to r600g.
It doesn't enable ARB_gpu_shader5 yet.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This adds a peephole and removes an assert that isn't
actually valid with some of the stream emit instructions.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This just adds support to the assembler dumper and allows
stream instructions to be generated. Also fix up the stream
debugging to add stream info.
Signed-off-by: Dave Airlie <airlied@redhat.com>
The CTS packed_pixels test checks that readpixels doesn't write
into the space between rows, however we fail that here unless
we check the format and stride match.
This fixes all the core mesa problems with CTS packed_pixels
tests.
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
The fastpath currently checks the RowLength != width, but
if you have a RowLength of 7, and Alignment of 4, then
that shouldn't match.
align the rowlength to the pack alignment before comparing.
This fixes compressed cases in CTS packed_pixels_pixelstore
test when SKIP_PIXELS is enabled, which causes row length
to get set.
v1.1: add fxt1 fix (Iago)
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
We don't need to use the 3d image address here as that will
include SKIP_IMAGES, and we are only blitting a single
2D anyways, so just use the 2D path.
This fixes some memory overruns under CTS
packed_pixels.packed_pixels_pixelstore when PACK_SKIP_IMAGES
is used.
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
GL3.3 added GL_ARB_texture_rgb10_a2ui, which specifies
a lot more things than just rgb10/a2ui.
While playing with ogl conform one of the tests must
attempted all valid formats for GL3.3 and hits the
unreachable here.
This adds the first chunk of formats that hit the
assert.
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
In a number of places the SwapBytes handling didn't handle cases with
GL_(UN)PACK_ALIGNMENT set and 7 byte width cases aligned to 8 bytes.
This adds a common routine to swap bytes a 2D image and uses this
code in:
texture storage
texture get
readpixels
swrast drawpixels.
[airlied: updated with Brian's nitpicks].
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Let it be defined externally instead, allowing setting mechanisms other
than environment variables.
Reviewed-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Matthew McClure <mcclurem@vmware.com>
The first function translates prim restart indexes to be 0xffff or
0xffffffff.
The second splits indexed primitives with restart indexes into sub-
primitives without restart indexes.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
This adds a tgsi utility tgsi_add_aa_point to transform a fragment shader
to support anti-aliased wide point by computing the fragment distance from
the point center. This utility assumes the geometry shader is emitting
an extra generic output with point coord data. The semantic index of
this generic output is passed to the tgsi_add_aa_point utility.
Reviewed-by: Brian Paul <brianp@vmware.com>
This adds a tgsi utility tgsi_add_point_sprite to transform a geometry
shader to emulate wide points by drawing quads. This utility adds an
extra output for the original point position if the point position is
to be written to a stream output buffer. It also assumes the driver will
add a constant for inverse viewport scale after the user defined constants.
Reviewed-by: Brian Paul <brianp@vmware.com>
This could be used by any driver where the device doesn't directly
support two-sided lighting. This code modifies a fragment shader
to accecpt back-face colors and choose between the front/back colors
depending on the triangle's front-face sign.
These functions deal with inclusive coordinates, hence a 0/0/0/0 rect
returned when there's no intersection doesn't actually represent an empty
rectangle. Hence return 0/-1/0/-1 instead.
This fixes some problems in llvmpipe with empty scissor rects (which up
to now didn't really matter because while the intersect test returned the
wrong result all pixels were scissored away later anyway).
It isn't really obvious if intersection test should take into account empty
rectangles or if the caller should do it. But it looks like most callers
actually verified one of the rects but not the other, but since correctly
returning an empty rect that other rect could actually be empty leading to
more bugs. Hence just verify both rects for emptyness in the intersection
test itself which makes the code easier in the caller (though it will be
slower if the caller knows the rectangles are non-empty).
Reviewed-by: Zack Rusin <zackr@vmware.com>
This patch adds some more helper functions such as
. tgsi_transform_temps_decl
. tgsi_transform_output_decl
. tgsi_transform_dst_reg
. tgsi_transform_src_reg
Reviewed-by: Brian Paul <brianp@vmware.com>
The border colors are uploaded only once when the state is created.
This brings truly immutable sampler descriptors, because they don't have
to be updated every time a sampler state is re-bound.
It also moves the TA_BC_BASE_ADDR registers to init_config, removing one
more state. The catch is there is now a limit: only 4096 border colors can
be used by one context. I don't think that will be a problem.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Since we don't put any resource descriptors in IBs, the space used by draw
calls is quite small.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
The main idea is to avoid setting CB_COLORi_INFO = 0 for i>0 repeatedly
when those colorbuffers aren't used. This is mainly for glamor.
Same for DB. Z_INFO and STENCIL_INFO need to be cleared only once.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
This mainly removes the cache misses when checking the dirty flags.
Not much else though.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
I need to initialize more atom IDs.
This adds 4 more si_init_atom calls, which simplifies the code.
(si_init_atom needs a different context type of the emit functions though)
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
- convert 16 states to 1 atom
- only emit 1 scissor if VIEWPORT_INDEX isn't written
- use only one packet when emitting consecutive scissors
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Fixes regression from
commit 8c17d53823
Author: Kenneth Graunke <kenneth@whitecape.org>
Date: Wed Apr 15 03:04:33 2015 -0700
i965: Make intel_emit_linear_blit handle Gen8+ alignment restrictions.
which adjusted the coordinates to be relative to the nearest cacheline.
However, this then offsets the coordinates by up to 63 and this may then
cause them to overflow the BLT limits. For the well aligned large
transfer case, we can use 32bpp pixels and so reduce the coordinates by
4 (versus the current 8bpp pixels). We also have to be more careful
doing the last line just in case it may exceed the coordinate limit.
Reported-and-tested-by: kaillasse91@hotmail.fr
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90734
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: Ian Romanick <ian.d.romanick@intel.com>
Cc: Anuj Phogat <anuj.phogat@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
v2: fix detecting if the loop has any phi nodes after it.
v2: use nir_foreach_ssa_def() instead of nir_foreach_dest() when
checking for values live after the loop to catch const_load
instructions.
v2: fix handling return instructions
v2: add some documentation to loop_is_dead()
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
We were already doing this internally for iterating over a function
implementation, so just expose it directly.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v2: use nir_cf_node_remove_after().
v2: use foreach_list_typed() instead of hardcoding a list walk.
v3: update to new control flow modification helpers.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v2: use nir_cf_node_remove_after() instead of our own broken thing.
v3: use the new control flow modification helpers.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
I've been chasing a geom shader hang on rv635 since I wrote
r600 geom code, and finally I hacked some values from fglrx
in and I could run texelfetch without failures.
This is totally my fault as well, maths fail 101.
This makes geom shaders on r600 not fail heavily.
Cc: "10.6" "11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
According to OpenGL ES 3.1 specification, section 9.2.1 for
glFramebufferParameter and section 9.2.3 for glGetFramebufferParameteriv:
"An INVALID_ENUM error is generated if pname is not FRAMEBUFFER_DEFAULT_WIDTH,
FRAMEBUFFER_DEFAULT_HEIGHT, FRAMEBUFFER_DEFAULT_SAMPLES, or
FRAMEBUFFER_DEFAULT_FIXED_SAMPLE_LOCATIONS."
Therefore exclude OpenGL ES 3.1 from using the GL_FRAMEBUFFER_DEFAULT_LAYERS
parameter.
Signed-off-by: Marta Lofstedt <marta.lofstedt@intel.com>
Reviewed-by: Kevin Rogovin <kevin.rogovin at intel.com>
This *should* ensure that the cursor gets properly advanced in all cases.
We had a problem before where, if the cursor was created using
nir_after_cf_node on a non-block cf_node, that would call nir_before_block
on the block following the cf node. Instructions would then get inserted
in backwards order at the top of the block which is not at all what you
would expect from nir_after_cf_node. By just resetting to after_instr, we
avoid all these problems.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This function's cases for non-generic compressed formats duplicate
the GL to MESA translation in _mesa_glenum_to_compressed_format().
This patch replaces the switch cases with a call to the translation
function. This change teaches this function about ASTC, thus enabling
ASTC for glTex*Storage*() calls.
Reviewed-by: Chad Versace <chad.versace@intel.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
MESA_FORMAT_RGBA_DXT5 should actually be reserved for GL_RGBA[4]_DXT5_S3TC.
Also, Gallium and other dri drivers (radeon and nouveau) follow this mapping
scheme.
Reviewed-by: Chad Versace <chad.versace@intel.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
As Glenn did for finalize_loop we need to update_cf when we
add a POP at the end of a shader.
I think this fixes one of the earlier shader going off end
of memory problems we've stopped.
Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
Cc: "10.6" "11.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
The docs specifically call out SEL with .l and .ge as the
implementations of MIN and MAX respectively. Among other things,
SEL with these conditional mods are commutative.
See commit 3b7f683f.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Certain compressed formats require this setting. The docs don't go into much
detail as to why it's needed exactly.
This patch introduces no piglit regressions on gen9 (bsw is untested). Note that
the SKL "regressions" are fixed tests, and the egl_khr_gl_colorspace tests are
WTF. The patch also fixes nothing I can find.
http://otc-mesa-ci.jf.intel.com/job/Leeroy/127820/
v2:
Reworded commit message (Matt); Added piglit results link.
Restructured condition (Matt)
Moved check out to function (Nanley). I left the setting of the bit in the
surface state open coded because it seems to go better with the existing code.
v3:
Use and inline function only in gen8_emit_texture_surface_state() (Matt).
Cc: Matt Turner <mattst88@gmail.com>
Cc: Nanley Chery <nanleychery@gmail.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This can be done with a single pass for the instruction base,
and takes renumber_registers out of its spot on the profile.
Acked-by: Marek Olšák <marek.olsak@amd.com
Signed-off-by: Dave Airlie <airlied@redhat.com>
The glsl->tgsi convertor does some temporary register reduction
however in profiling shader-db this shows up quite highly,
so optimise things to reduce the number of loops through
all the instructions we do. This drops merge_registers
from 4-5% on the profile to 1%. I think this can be reduced
further by possibly optimising the renumber pass.
Acked-by: Marek Olšák <marek.olsak@amd.com
Signed-off-by: Dave Airlie <airlied@redhat.com>
Instead of looking this up lots, lets just cache it in the instruction
translation up front. I just noticed this function what high in a profile
of shader-db on radeonsi.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
The selector is shared by all shader variants, so the
individual shaders shouldn't change it. Use tgsi_shader_scan()
results to set geometry properties within a
r600_create_shader_state() call and treat said propertices in
the selector as read-only within r600_shader_from_tgsi().
Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Note that 'geometry shader properties' should be carried in the
selector state over the shader state in any case.
Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
The hardware is capable of dealing with GL1-style user clip planes.
No clip vertex, no clip distances. Fixes a number of ucp tests, as well
as neverball.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
According to NVIDIA, local performance counters (MP) are prefixed
with SM, while global performance counters (PCOUNTER) are called PM.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Otherwise this will crash on 32-bit, and it gets rid of
warnings building on 32-bit.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This greatly improves generated code, especially for the snorm variants,
since it is able to get rid of the lshift/rshift for sext, as well as
replacing each shift + mask with a single op.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Matt Turner <mattst88@gmail.com>
It is fairly tricky to detect the proper conditions for using bitfield
insert, but easy to just use it up front. This removes a lot of
instructions on nvc0 when invoking the packing builtins.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Matt Turner <mattst88@gmail.com>
No need to walk through instructions in blocks we know don't contain our
registers' live ranges.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
I always thought that the is_control_flow() -> return false check was a
bad hack, and some previous attempts to remove it have failed and have
been reverted.
The previous two patches fix some problems that caused register
coalescing to not notice some interference between registers, which the
is_control_flow() check apparently works around.
With that fixed, we can calculate interference more accurately.
total instructions in shared programs: 6261319 -> 6257917 (-0.05%)
instructions in affected programs: 346282 -> 342880 (-0.98%)
helped: 1552
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
equals() returns false for registers with different types, using it
isn't appropriate to determine whether an is overwriting a register.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Noticed when debugging things that lead to the next patch.
On G45 (and presumably ILK) this helps register coalescing:
total instructions in shared programs: 4077373 -> 4077340 (-0.00%)
instructions in affected programs: 43751 -> 43718 (-0.08%)
helped: 52
HURT: 2
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Zero sized uniforms can exist in the list, but they don't get get any space
allocated in prog_data->params or in the param_size array, so the size
should not be set for them. This was previously fixed in:
commit: 781dc7c0e1.
However,
commit: 259f7291de
removed the fix.
Signed-off-by: Marta Lofstedt <marta.lofstedt@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
If the sampler object has been deleted in the same context the binding
will have been cleared. If it has been deleted in another context, the
spec does not say what should returned. None of the other binding point
queries check for deletion in another context.
Also, as names of deleted objects are free for reuse, the current code
didn't even work reliably.
Reviewed-by: Fredrik Höglund <fredrik@kde.org>
Signed-off-by: Fredrik Höglund <fredrik@kde.org>
This adds index queries (glGet*i_v) for GL_TEXTURE_BINDING_* and
GL_SAMPLER_BINDING, as well as textue queries
(glGetTex{,ture}Parameter*) for GL_TEXTURE_TARGET.
CC: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Fredrik Höglund <fredrik@kde.org>
Signed-off-by: Fredrik Höglund <fredrik@kde.org>
Shaders that contain instruction data after an instruction with EOP could end
up parsing that as an instruction, leading to various crashes and asserts in
SB as it gets very confused if it sees for instance a loop start instruction
jumping off to some random point.
Add a couple of asserts, and print EOP bit if set in old asm printer.
Signed-off-by: Glenn Kennard <glenn.kennard@gmail.com>
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Cube maps are special in that they have separate teximages for each
face. We handled that by copying the data to them separately, but in
case zoffset != 0 or depth != 6 we would read off the end of the client
array or modify the wrong images.
zoffset/depth have already been verified by the time the code gets to
this stage, so no need to double-check.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Brian Paul <brianp@vmware.com>
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
The NIR cursor API is exactly what we want for the builder's insertion
point. This simplifies the API, the implementation, and is actually
more flexible as well.
This required a bit of reworking of TGSI->NIR's if/loop stack handling;
we now store cursors instead of cf_node_lists, for better or worse.
v2: Actually move the cursor in the after_instr case.
v3: Take advantage of nir_instr_insert (suggested by Connor).
v4: vc4 build fixes (thanks to Eric).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net> [v1]
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> [v4]
Acked-by: Connor Abbott <cwabbott0@gmail.com> [v4]
This patch implements a general nir_instr_insert() function that takes a
nir_cursor for the insertion point. It then reworks the existing API to
simply be a wrapper around that for compatibility.
This largely involves moving the existing code into a new function.
Suggested by Connor Abbott.
v2: Make the legacy functions static inline in nir.h (requested by
Connor Abbott).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Acked-by: Connor Abbott <cwabbott0@gmail.com>
We want to use this for normal instruction insertion too, not just
control flow. Generally these functions are going to be extremely
useful when working with NIR, so I want them to be widely available
without having to include a separate file.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Acked-by: Connor Abbott <cwabbott0@gmail.com>
Jumps must be the last instruction in a block, so inserting another
instruction after a jump is illegal.
Previously, we only checked this when the new instruction being inserted
was a jump. This is a red herring - inserting *any* kind of instruction
after a jump is illegal.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Acked-by: Connor Abbott <cwabbott0@gmail.com>
Previously, we used PROGRAM_ARRAY only for variables which were
arrays or matrices. But if the variable is a structure containing
an array or matrix, we need to use PROGRAM_ARRAY for that too.
Before, we failed an assertion:
state_tracker/st_glsl_to_tgsi.cpp:4900:
Assertion `src_reg->file != PROGRAM_TEMPORARY' failed.
when running the piglit test
glsl-1.20/execution/fs-const-array-of-struct-of-array.shader_test
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The split_virtual_grfs code doesn't properly rewrite reladdr so we need to
make sure that any uniform indirects are lowered away first.
This fixes the glsl-fs-uniform-indexed-by-swizzled-vec4.shader_test in piglit
Cc: "10.6" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Now that all constant locations are assigned in a single function, we can
refactor it a bit to unify things. In particular, we now handle
pull_constant_loc and push_constant_loc more similarly and we only modify
stage_prog_data->params[] in one place at the end of the function.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
driParseDebugString() doesn't have actual code to parse comma separated
lists (or any other supported options?); instead it dumbly uses strstr().
This means that INTEL_DEBUG="vec4vs" will trigger both DEBUG_VEC4VS and
DEBUG_VS, as "vs" is also a substring.
We should probably improve the driconf parsing, but for now, just rename
the option so it's usable in the meantime.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Jason Ekstrand <jason.ekstrand@intel.com>
Acked-by: Kristian Høgsberg <krh@bitplanet.net>
v2: use ARB_texture_multisample enable bit
Patch adds extension enable bit and enables required keywords
and builtin functions for the extension.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
This patch creates a new macro, FETCH_COMPRESSED - similar in nature
to the other FETCH_* macros. This reduces repetition in the code that
deals with compressed textures.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
In agreement with the coding style, functions that aren't directly visible
to the GL API should prefer the use of bool over GLboolean.
Suggested-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Remove redundant checks and comments by grouping our calculations for
align_w and align_h wherever possible.
v2: reintroduce brw.
don't include functional changes.
don't adjust function parameters or create a new function.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
An ASTC block takes up 16 bytes for all block width and height configurations.
This size is not integrally divisible by all ASTC block widths. Therefore cpp
is changed to mean bytes per block if the texture is compressed.
Because the original definition was bytes per block divided by block width, all
references to the mipmap width must be divided the block width. This keeps the
address calculation formulas consistent. For example, the units for miptree_level
x_offset and miptree total_width has changed from pixels to blocks.
v2: reuse preexisting ALIGN_NPOT macro located in an i965 driver file.
v3: move ALIGN_NPOT into seperate commit.
simplify cpp assignment in copy_image_with_blitter().
update miptree width and offset variables in: intel_miptree_copy_slice(),
intel_miptree_map_gtt(), and brw_miptree_layout_texture_3d().
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
In agreement with commit 4ab8d59a23, vertical alignment values are equal to
four times the block height on Gen9+.
v2: add newlines to separate declarations, statments, and comments.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Neil Roberts <neil@linux.intel.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
ALIGN is changed to ALIGN_NPOT because alignment values are sometimes not
powers of two when working with ASTC.
v2: handle texture arrays and LDR-only systems.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Aligning with a non-power-of-two number is a general task that can be used in
various places. This commit is required for the next one.
v2: add greater than 0 assertion (Anuj).
convert the macro to a static inline function.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
ALIGN and ROUND_DOWN_TO both require that the alignment value passed
into the macro be a power of two in the comments. Using software assertions
verifies this to be the case.
v2: use static inline functions instead of gcc-specific statement expressions (Brian).
v3: fix indendation (Brian).
v4: add greater than zero requirement (Anuj).
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Define two-thirds of the 2D Intel ASTC surface formats (LDR-only). This allows
a 1-to-1 mapping from the mesa format to the Intel format.
ASTC textures will default to being processed in LDR mode. If there is
hardware support for HDR/Full mode and the texture is not sRGB, add the
format bit necessary to process it in HDR/Full mode.
v2: remove extra newlines.
v3: follow existing coding style in translate_tex_format().
v4: expound on the GEN9_SURFACE_ASTC_HDR_FORMAT_BIT comment.
update SF table - ASTC is actually supported in Gen8.
v5: conform the ASTC MESA_FORMAT enums to the existing naming convention.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
This is necesary to initialize the gl_texture_image struct.
From the KHR_texture_compression_astc_ldr spec:
"Added to Section 3.8.6, Compressed Texture Images
Add the tokens specified above to Table 3.16, Compressed Internal Formats.
In all cases, the base internal format will be RGBA. The encoding allows
images to be encoded with fewer channels, but this is always presented as
RGBA to the sampler."
v2. use _mesa_is_astc_format().
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
The ASTC spec was revised as follows:
Revision 2, April 28, 2015 - added CompressedTex{Sub,}Image3D to
commands accepting ASTC format tokens in the New Tokens section [...].
Support only exists in the HDR submode:
Add a second new column "3D Tex." which is empty for all non-ASTC
formats. If only the LDR profile is supported by the implementation,
this column is also empty for all ASTC formats. If both the LDR and HDR
profiles are supported only, this column is checked for all ASTC
formats.
LDR-only systems should generate an INVALID_OPERATION error when
attempting to call CompressedTexImage3D with the TEXTURE_3D target.
v2. return the proper error for LDR-only systems.
v3. update is_astc_format().
v4. use _mesa_is_astc_format().
v5. place logic in _mesa_target_can_be_compressed.
v6. fix issues handling ASTC formats.
Reviewed-by: Chad Versace <chad.versace@intel.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
In agreement with the ASTC spec, this makes calls to TexImage*D unsuccessful.
Implied by the spec, Generate[Texture]Mipmap and [Copy]Tex[Sub]Image*D calls
must be unsuccessful as well.
v2. actually force attempts to compress online to fail.
v3. indentation (Matt).
v4. update copytexture_error_check to account for CopyTexImage*D (Chad).
Reviewed-by: Chad Versace <chad.versace@intel.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
v2: correct the spelling of the sRGB variants.
remove spaces around "=" when setting the enum value.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Define the mesa formats and make changes necessary for compilation
without errors. Also add support for _mesa_get_srgb_format_linear().
v2. conform the ASTC MESA_FORMAT enums to the existing naming convention.
v3. remove ASTC cases for _mesa_get_uncompressed_format(). This function is
only used for generating mipmaps - something ASTC formats do not support
due to lack of online compression.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
If the packet encoding is defined in the same format as register definitions,
the python script can process them automatically and the parser support
becomes trivial.
Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
This adds trace points to all IBs and the parser prints them and also
prints which trace points were reached (executed) by the CP.
This can help pinpoint a problematic packet, draw call, etc.
Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
GPU hang detection must be enabled by setting: GALLIUM_DDEBUG=[timeout in ms]
This may print too much information that we might not understand yet,
but some of the bits are very useful.
Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
This makes writing a good IB parser a lot easier.
It generates 2 tables:
- packet3 table
- register table with all registers, fields, and named values
Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Instruction encoding isn't needed in Mesa.
The border color address registers were duplicated.
Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
v2: lots of improvements
This is like identity or trace, but simpler. It doesn't wrap most states.
Run with:
GALLIUM_DDEBUG=1000 [executable]
where "executable" is the app and "1000" is in miliseconds, meaning that
the context will be considered hung if a fence fails to signal in 1000 ms.
If that happens, all shaders, context states, bound resources, draw
parameters, and driver debug information (if any) will be dumped into:
/home/$username/dd_dumps/$processname_$pid_$index.
Note that the context is flushed after every draw/clear/copy/blit operation
and then waited for to find the exact call that hangs.
You can also do:
GALLIUM_DDEBUG=always
to do the dumping after every draw/clear/copy/blit operation without
flushing and waiting.
Examples of driver states that can be dumped are:
- Hardware status registers saying which hw block is busy (hung).
- Disassembled shaders in a human-readable form.
- The last submitted command buffer in a human-readable form.
v2: drop pipe-loader changes, drop SConscript
rename dd.h -> dd_pipe.h
Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
This works if drivers upsample on upload (like all radeon ones do).
The alternative is an unexpected GL error from anything calling
_mesa_update_state and possibly other issues.
Cc: 10.6 11.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Otherwise we get:
warning: 'num_user_sgprs' may be used uninitialized in this function
...
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Patch refactors existing parameters check to first check common enums
between desktop GL and GLES 3.1 and modifies get_tex_level_parameter_image
to be compatible with enums specified in 3.1.
v2: remove extra is_gles31() checks (suggested by Ilia)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com> (v1)
Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com> (v1)
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
According to OpenGL ES specification section 7.12,
GL_COMPUTE_WORK_GROUP_SIZE, is supported by the
glGetProgramiv function.
Signed-off-by: Marta Lofstedt <marta.lofstedt@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
According to the OpenGL ES 3.1 specification chapter 17, the
MAX_COMPUTE_WORK_GROUP_COUNT and MAX_COMPUTE_WORK_GROUP_SIZE
is available for glGetIntegeri_v.
Signed-off-by: Marta Lofstedt <marta.lofstedt@linux.intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
commit 26c549e69d
Author: Nanley Chery <nanley.g.chery@intel.com>
Date: Fri Jul 31 10:26:36 2015 -0700
mesa/formats: remove compressed formats from matching function
caused a regression in my CTS testing, this looks like a clear
thinko.
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
sSigned-off-by: Dave Airlie <airlied@redhat.com>
I used this as some testing ground for investigating some compiler
bits initially (e.g. lrint calls etc.), figured I could do much better
in the end just for fun...
This is mathematically equivalent, but uses some tricks to avoid
doubles and also replaces some float math with ints. Good for another
performance doubling or so. As a side note, some quick tests show that
llvm's loop vectorizer would be able to properly vectorize this version
(which it failed to do earlier due to doubles, producing a mess), giving
another 3 times performance increase with sse2 (more with sse4.1), but this
may not apply to mesa.
No piglit change.
Acked-by: Marek Olšák <marek.olsak@amd.com>
This code (lifted straight from the extension) was doing things the most
inefficient way you could think of.
This drops some of the more expensive float operations, in particular
- int-cast floors (pointless, values always positive)
- 2 raised to (signed) integers (replace with simple exponent manipulation),
getting rid of a misguided comment in the process (implement with table...)
- float division (replace with mul of reverse of those exponents)
This is like 3 times faster (measured for float3_to_rgb9e5), though it depends
(e.g. llvm is clever enough to replace exp2 with ldexp whereas gcc is not,
division is not too bad on cpus with early-exit divs).
Note that keeping the double math for now (float x + 0.5), as the results may
otherwise differ.
Acked-by: Marek Olšák <marek.olsak@amd.com>
GetTexImage can read to stencil8 but only from
a stencil or depthstencil textures.
This fixes a bunch of failures in CTS
GL33-CTS.gtf32.GL3Tests.packed_pixels
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Enables _mesa_target_can_be_compressed to return the appropriate GL error
depending on it's inputs. Use the parameter to return the appropriate GL error
for ETC2 formats on GLES3.
Suggested-by: Chad Versace <chad.versace@intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
All compressed formats return GL_FALSE and there isn't any evidence to
support that this behaviour would change. Remove all switch cases for
compressed formats.
v2. Since the exhaustive switch is removed, add a gtest to ensure
all formats are handled.
v3. Ensure that GL_NO_ERROR is set before returning.
v4. Fix an arg to _mesa_uncompressed_format_to_type_and_comps();
fix formatting and misc improvements (Chad).
Reviewed-by: Chad Versace <chad.versace@intel.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
We currently check that our format info table is sane during context
initialization in debug builds. Perform this check during
`make check` instead. This enables format testing in release builds
and removes the requirement of an exhuastive switch for
_mesa_uncompressed_format_to_type_and_comps().
v2. indentation and conditional inclusion fixes (Chad).
allow tests to continue running if any format fails
and display the failing format name.
Reviewed-by: Chad Versace <chad.versace@intel.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
I intend to remove nir_builder::cf_node_list, so I can't have this code
poking at it directly. The proper way is to set the insertion point and
then simply insert things there.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
I intend to remove nir_builder::cf_node_list, so I can't have this code
poking at it directly. The proper way is to set the insertion point and
then simply insert things there.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
This makes it easy for NIR passes to inspect what kind of shader they're
operating on.
Thanks to Michel Dänzer for helping me figure out where TGSI stores the
shader stage information.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
The comment above move_uniform_array_access_to_pull_constants was
completely bogus because it has nothing to do with lowering instructions.
Instead, it's assiging locations of pull constants.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Previously, we treated the entire UNIFORM file as if it had two elements:
One for direct things and one for indirect. This is substantially
different from how the old visitor code handled it where each element was
effectively its own uniform. This commit makes the NIR path more like the
old ir_visitor path where each uniform is separate. This should allow us
to more easily make decisions about what to push.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
In the i965 backend, we want to be able to "pull apart" the uniforms and
push some of them into the shader through a different path. In order to do
this effectively, we need to know which variable is actually being referred
to by a given uniform load. Previously, it was completely flattened by
nir_lower_io which made things difficult. This adds more information to
the intrinsic to make this easier for us.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Previously, there were four type_size() functions in play - the i965
compiler backend defined scalar and vec4 type_size() functions, and
nir_lower_io contained its own similar functions.
In fact, the i965 driver used nir_lower_io() and then looped over the
components using its own type_size - meaning both were in play. The
two are /basically/ the same, but not exactly in obscure cases like
subroutines and images.
This patch removes nir_lower_io's functions, and instead makes the
driver supply a function pointer. This gives the driver ultimate
flexibility in deciding how it wants to count things, reduces code
duplication, and improves consistency.
v2 (Jason Ekstrand):
- One side-effect of passing in a function pointer is that nir_lower_io is
now aware of and properly allocates space for image uniforms, allowing
us to drop hacks in the backend
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
v2 Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
If there are no parameters, we don't need to create a nir_variable to
hold them...and allocating an array of length 0 is pretty bogus.
Should avoid i965 backend assertions in future patches Jason and I are
working on.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
I want to use C function pointers to these, and they don't use anything
in the visitor classes anyway.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
This way they don't implicitly increment the uniforms variable and don't
have to be called in-sequence during uniform setup.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The commit:
commit b49371b8ed
Author: Connor Abbott <cwabbott0@gmail.com>
AuthorDate: Tue Jul 21 19:54:18 2015 -0700
nir: move control flow modification to its own file
split out some control flow related APIs into a separate header, but did
not update drivers.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
The commit:
commit 8e0d4ef341
Author: Kenneth Graunke <kenneth@whitecape.org>
AuthorDate: Thu Aug 6 18:18:40 2015 -0700
nir: Delete the nir_function_impl::start_block field.
removed the start_block field without fixing up drivers..
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Connor introduced this helper recently; we should use it here too.
I had to move the function earlier in the file for it to be available.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
This gives us some testing of it. Also, the old nir_cf_node_remove()
wasn't handling phi nodes correctly and was calling cleanup_cf_node()
too late.
Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
These will help us do a number of things, including:
- Early return elimination.
- Dead control flow elimination.
- Various optimizations, such as replacing:
if (foo) {
...
}
if (!foo) {
...
}
with:
if (foo) {
...
} else {
...
}
Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This is a helper that will be shared between the new control flow
insertion and modification code.
Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
For now, it allows us to refactor the control flow insertion API's so
that there's a single entrypoint (with some wrappers). More importantly,
it will allow us to reduce the combinatorial explosion in the extract
function. There, we need to specify two points to extract, which may be
at the beginning of a block, the end of a block, or in the middle of a
block. And then there are various wrappers based off of that (before a
control flow node, before a control flow list, etc.). Rather than having
9 different functions, we can have one function and push the actual
logic of determining which variant to use down to the split function,
which will be shared with nir_cf_node_insert().
In the future, we may want to make the instruction insertion API's as
well as the builder use this, but that's a future cleanup.
Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
When we insert a single basic block A into another basic block B, we
will split B into C and D, insert A in the middle, and then splice
together C, A, and D. When we splice together C and A, we need to move
the successors of A into C -- except A has no successors, since it
hasn't been inserted yet. So in move_successors(), we need to handle the
case where the block whose successors are to be moved doesn't have any
successors. Fixing link_blocks() here prevents a segfault and makes it
work correctly.
Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We may delete a control flow node which contains structured jumps to
other parts of the program. We need to remove the jump as a predecessor,
as well as remove any phi node sources which reference it. Right now,
the same problem exists for blocks that don't end in a jump instruction,
but with the new API it shouldn't be an issue, since blocks that don't
end in a jump must either point to another block in the same extracted
CF list or not point to anything at all.
Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Unlike calling nir_instr_remove(), calling nir_cf_node_remove() (and
later in the series, the nir_cf_list_delete()) implies that you're
removing instructions that may still have uses, except those
instructions are never executed so any uses will be undefined. When
cleaning up a CF node for deletion, we must clean up any uses of the
deleted instructions by making them point to undef instructions instead.
Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
In particular, handle the case where the earlier block ends in a jump
and the later block is empty. In that case, we want to preserve the jump
and remove any traces of the later block. Before, we would only hit this
case when removing a control flow node after a jump, which wasn't a
common occurance, but we'll need it to handle inserting a control flow
list which ends in a jump, which should be more common/useful.
Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Before, we would only split a block with a jump at the end if we were
inserting something after a block with a jump, which never happened in
practice. But now, we want to use this to extract control flow lists
which may end in a jump, in which case we really need to do the correct
patching up. As a side effect, when removing jumps we now correctly
insert undef phi sources in some corner cases, which can't hurt.
Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Before, the process of removing a jump and wiring up the remaining block
correctly was atomic, but with the new control flow modification it's
split into two parts: first, we extract the jump, which creates a new
block with re-wired successors as well as a free-floating jump, and then
we delete the control flow containing the jump, which removes the entry
in the predecessors and any phi node sources. Split up
nir_handle_remove_jumps() to accomodate this, and add the missing
support for removing phi node sources.
Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We want to start reworking and expanding this code, but it'll be a lot
easier to do once we disentangle it from the rest of the stuff in nir.c.
Unfortunately, there are a few unavoidable dependencies in nir.c on
methods we'd rather not expose publicly, since if not used in very
specific situations they can cause Bad Things (tm) to happen. Namely, we
need to do some magical control flow munging when adding/removing jumps.
In the future, we may disallow adding/removing jumps in
nir_instr_insert_*() and nir_instr_remove(), and use separate functions
that are part of the control flow modification code, but for now we
expose them and put them in a separate, private header.
Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
cleanup_cf_node() is part of the control flow modification code, which
we're going to split into its own file, but remove_defs_uses() is an
internal function used by nir_instr_remove(). Break the dependency by
making cleanup_cf_node() use nir_instr_remove() instead, which simply
calls remove_defs_uses() and then removes the instruction from the list.
nir_instr_remove() does do extra things for jumps, though, so we avoid
calling it on jumps which matches the previous behavior (this will be
fixed later in the series).
Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
It was being used to initialize function impls and loops, even though
it's really a control flow modification helper. It's pretty trivial, so
just inline it to avoid the dependency.
Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
It's simply the first nir_cf_node in the nir_function_impl::body list,
which is easy enough to access - we don't to store a pointer to it
explicitly. Removing it means we don't need to maintain the pointer
when, say, splitting the start block when modifying control flow.
Thanks to Connor Abbott for suggesting this.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Only uncompressed formats have a non-void type and actual
components per pixel. Rename _mesa_format_to_type_and_comps
to _mesa_uncompressed_format_to_type_and_comps and require
callers to check if the format is not compressed.
v2. include compressed format cases to avoid gcc warnings (Chad).
Reviewed-by: Chad Versace <chad.versace@intel.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
On the older platforms where we don't have logical contexts preserving
state across batches, we emit the invariant state setup on every batch
using the brw_invariant_state atom. This includes the pipeline selection
which is cached with the introduction of
commit 0e0e23ef53
Author: Jordan Justen <jordan.l.justen@intel.com>
Date: Wed Apr 22 11:43:50 2015 -0700
i965/state: Emit pipeline select when changing pipelines
However, we do not reset the cache between batches on context-less
platforms resulting in us not setting the pipeline selection and can
cause GPU hangs if a media pipelined was loaded in the meantime (e.g.
mixing mplayer/gstreamer using libva and gnome-shell). A simple solution
is to just forcibly re-emit the pipeline select along with the invariant
state and reset the cache at that point.
Reported-and-tested-by: Tomasz C. <tomaszc@o2.pl>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91254
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
This reverts commit 567394112d.
It regressed performance. It looks like smaller IBs are better, because
the GPU goes idle quicker and there is less waiting for buffers and fences.
Cc: 11.0 <mesa-stable@lists.freedesktop.org>
When the edge flag element is enabled then the elements are slightly
reordered so that the edge flag is always the last one. This was
confusing the code to upload the 3DSTATE_VF_INSTANCING state because
that is uploaded with a separate loop which has an instruction for
each element. The indices used in these instructions weren't taking
into account the reordering so the state would be incorrect.
v2: Use nr_elements instead of brw->vb.nr_enabled so that it will cope
when gl_VertexID is used.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91292
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Tested-by: Mark Janes <mark.a.janes@intel.com>
The edge flag data on Gen6+ is passed through the fixed function hardware as
an extra attribute. According to the PRM it must be the last valid
VERTEX_ELEMENT structure. However if the vertex ID is also used then another
extra element is added to source the VID. This made it so the vertex ID is in
the wrong register in the vertex shader and the edge attribute is no longer in
the last element.
v2: Also implement for BDW+
v3 [by Ben]: Remove 10.5 tag. Too late.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=84677
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Tested-by: Ben Widawsky <ben@bwidawsk.net>
Tested-by: Mark Janes <mark.a.janes@intel.com>
The build/file was removed with an earlier commit while the EXTRA_DIST
was forgotten.
Fixes: 66d77cd71c (scons: don't build the kms-dri winsys)
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
The files are not referenced in any other place in whole of
mesa. They are likely remnants of the early development stage.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
vc4 conflicts with ilo, when build on x86 as it's build for emulation
purposes. In that mode a i965-like symbol is exported by vc4, which
conflicts with the ilo one in the gallium-dri megadriver.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
The nv_conditional_render piglits were sporadically failing. Moving
the control flush from the write and placing it just before the read
was sufficient to make the piglits pass a 1000/1000 times. The bspec
says that the flush enable bit "waits until all previous writes of
immediate data from post sync circles are complete before executing the
next command" - the operative word being previous!
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90691
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Neil Roberts <neil@linux.intel.com>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
I switched us to tracking whether the results *could* go to r4, but then
didn't make a separate register class for the class bits that included r4.
Switch the "any" class to actually be "any", and name the "any but r4"
class more appropriately.
total instructions in shared programs: 96798 -> 94680 (-2.19%)
instructions in affected programs: 62736 -> 60618 (-3.38%)
We had several reports of users hitting bugs
with the other path to upload constants,
and switching to the user constant buffer
path solves the bugs.
User constant buffers are expected to be slower
for Nvidia cards, so ideally this patch should be
reverted when the path is fixed.
Reviewed-by: Axel Davy <axel.davy@ens.fr>
Signed-off-by: Krzysztof Sobiecki <sobkas@gmail.com>
It is very common for d3d9 apps to set again the constants
they need before every draw call, even if nothing changed.
Since we are mostly gpu bound, it is better to check
for change, and upload constants again (and thus use
gpu bandwith) only if the constants changed.
Signed-off-by: Axel Davy <axel.davy@ens.fr>
The number of texture stages is 8.
'tex_stage' array was too big, and thus
the checks with 'Elements(state->ff.tex_stage)' were passing,
causing some invalid API calls to pass, and crash because of
out of bounds write since bumpmap_vars was just the correct size.
Signed-off-by: Axel Davy <axel.davy@ens.fr>
The CSO cache unbinds views that are not needed anymore,
which we don't do.
It checks for change before committing the views.
Signed-off-by: Axel Davy <axel.davy@ens.fr>
nine_update_state called every draw call.
This patch attemps to change the order
of the checks to have better control flow
Signed-off-by: Axel Davy <axel.davy@ens.fr>
There were flags all sm3 cards do advertise,
and we weren't.
Some games can trigger buggy rendering path
if the caps are not what they expect.
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Always use a user constant buffer for ff.
It means we have to:
. commit the user constant buffer for ff when we use it
. commit back the non-ff constant buffer when we stop using it
Signed-off-by: Axel Davy <axel.davy@ens.fr>
We have two paths:
. One that uses a fixed constant buffer, and updates it when needed
. One that uses a user constant buffer, and uploads it when needed.
This patch separates the preparation of the constant buffer
and the commit.
It also removes NineDevice9_RestoreNonCSOState, which was
used to restore all states. Instead the commit of the constant
buffer is moved to nine_state, and the other field settings
moved to other functions where more appropriate.
Signed-off-by: Axel Davy <axel.davy@ens.fr>
For now the path updated is only used by Amd drivers, but a later
patch will make it used by all drivers. Some drivers like llvmpipe
doesn't support the uploading of constants from user buffers, so improve
the path to work for all drivers
Inspired from the gl state tracker.
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Instead of mixing state preparation (filling pipe_****)
and state commit (pipe->set_*****),
begin doing so in two separate functions.
This will allow to implement efficient Stateblocks,
and eventually lead to optimisation where the complete
pipe_*** structure is only partially updated.
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Set all values to 0 after allocation. Found using valgrind.
Reviewed-by: Axel Davy <axel.davy@ens.fr>
Signed-off-by: Patrick Rudolph <siro@das-labor.org>
In case NineBaseTexture9_ctor returns an error
This->surfaces[l] might be NULL.
Reviewed-by: Axel Davy <axel.davy@ens.fr>
Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Allow more than two errors, and return D3DERR_INVALIDCALL
for failed display resolution changes.
Reviewed-by: Axel Davy <axel.davy@ens.fr>
Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Tests showed, that in case of errors, the pBits pointer is set to NULL.
The pBits field isn't set to NULL in case of an already locked object.
Reviewed-by: Axel Davy <axel.davy@ens.fr>
Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Return 0 for non MANAGED textures and surfaces.
Fixes failing wine d3d9 tests device.c test_resource_priority.
Reviewed-by: Axel Davy <axel.davy@ens.fr>
Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Move the assert to return error codes in the correct order.
Always set the pSizeOfData to the required buffer size.
Fixes failing wine test device.c test_private_data()
Reviewed-by: Axel Davy <axel.davy@ens.fr>
Signed-off-by: Patrick Rudolph <siro@das-labor.org>
This fixes wine test device.c test_lockrect_invalid()
Mimic WindowsXp behaviour and allow negative values in the rectangle passed.
Add comment to point out behaviour used.
Reviewed-by: Axel Davy <axel.davy@ens.fr>
Signed-off-by: Patrick Rudolph <siro@das-labor.org>
For the case of D3DPOOL_MANAGED textures, This->base.resource can be NULL
at the start of the function. In This case, UploadSelf will take care
of the defining. Assign resource after the UploadSelf call
to prevent NULL pointer exception.
Reviewed-by: Axel Davy <axel.davy@ens.fr>
Signed-off-by: Patrick Rudolph <siro@das-labor.org>
IT is better check if the surface was created with RT flag,
instead of checking capability (llvmpipe was complaining)
Signed-off-by: Axel Davy <axel.davy@ens.fr>
EvictManagedResources is used by apps to free
the gpu memory of MANAGED textures (which have
a cpu memory backing)
Signed-off-by: Axel Davy <axel.davy@ens.fr>
UpdateTexture is supposed to optimise by uploading only for the
dirty region of the source (d3d9 doc, wine tests).
This patch adds the behaviour for surfaces, but not entirely for
volumes.
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Dirty regions should be tracked for both MANAGED
and SYSTEMMEM.
Until now we didn't bother to track for SYSTEMMEM,
because we hadn't implemented using the dirty region
to avoid some copies
Signed-off-by: Axel Davy <axel.davy@ens.fr>
If the texture is bound and dirty_mip is true,
BASETEX_REGISTER_UPDATE adds the texture to the list
of things to update before the next draw call.
Some calls to it were missing.
Signed-off-by: Axel Davy <axel.davy@ens.fr>
NineSurface9_CopySurface was supporting more cases than what
we needed, and doing checks that were innapropriate for
some NineSurface9_CopySurface use cases.
This patch splits it into two for the two use cases, and moves
the checks to the caller.
This patch also adds a few checks to NineDevice9_UpdateSurface
Signed-off-by: Axel Davy <axel.davy@ens.fr>
For sw cursor we do not tell wine the cursor position (the app
tells us directly). We shouldn't use ID3DPresent_GetCursorPos.
device->cursor.pos already contains the coordinates the app
gave us.
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: David Heidelberg <david@ixit.cz>
We have either hardware cursor or software cursor.
When we use software cursor, we should hide the hardware
cursor.
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: David Heidelberg <david@ixit.cz>
D3DRS_DITHERENABLE was assigned to the rasterizer state
group, but it was used for the blend group.
Assign it to the blend group.
Signed-off-by: Axel Davy <axel.davy@ens.fr>
When using D3DRS_POINTSIZE make sure the value is at least
D3DRS_POINTSIZE_MIN but not greater than D3DRS_POINTSIZE_MAX.
Fixes some Wine tests.
Reviewed-by: Axel Davy <axel.davy@ens.fr>
Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Align texture memory on 32 byte boundry to allow
SSE/AVX memcpy to work on locked rects.
This fixes some crashes with games using SSE.
Reviewed-by: David Heidelberg <david@ixit.cz>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
Signed-off-by: Patrick Rudolph <siro@das-labor.org>
We had red and green in the wrong channels
for the ATI2 format (RGTC2).
Found thanks to wine tests.
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: David Heidelberg <david@ixit.cz>
Add support for multiple cards and fill in Win
like card name, driver name and version info.
Use fallback for unknown vendors and unknown card names.
Reviewed-by: Axel Davy <axel.davy@ens.fr>
Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Nine code uses some C11 features, and this
leads to compile error on gcc <= 4.5
Another way would have been to use the
-fms-extensions CFLAG
Signed-off-by: David Heidelberg <david@ixit.cz>
Cc: "10.4 10.5 10.6" <mesa-stable@lists.freedesktop.org>
This got missed because the piglit test only tested int images to avoid a
combinatiorial explosion of format, targets, stages and sizes which
takes more than 5 minutes to test on nvidia's driver.
This patch also drops the IMAGE_FUNCTION_AVAIL_ATOMIC which is not applicable
to the image_size codepath but was not hurting in any way.
Signed-off-by: Martin Peres <martin.peres@linux.intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
There is no MDOperand in llvm 3.5.
v2: Check if kernel metadata is present to avoid crash (EdB).
v3: Second attempt to avoid crash: switch off metadata query for llvm < 3.6.
Reviewed-by: Serge Martin (EdB) <edb+mesa@sigluy.net>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
We're generating rcps as part of backend lowering of the packed coordinate
in the CS, and we don't want to lower them in NIR because of the extra
newton-raphson steps in the common case. However, GLB2.7 is moving a
vertex attribute with a 1.0 W component to the position, and that makes us
produce some silly RCPs.
total instructions in shared programs: 97590 -> 97580 (-0.01%)
instructions in affected programs: 74 -> 64 (-13.51%)
I had QPU emit code to do it, but forgot to flag the register class.
total instructions in shared programs: 97974 -> 97590 (-0.39%)
instructions in affected programs: 25291 -> 24907 (-1.52%)
Now that we do non-SSA QIR instructions, we can take a NIR SSA src that's
only used by the unorm packing and just stuff the pack bits into it.
total instructions in shared programs: 98136 -> 97974 (-0.17%)
instructions in affected programs: 4149 -> 3987 (-3.90%)
This helps ensure that the register allocator doesn't force the later pack
operations to insert extra MOVs.
total instructions in shared programs: 98170 -> 98159 (-0.01%)
instructions in affected programs: 2134 -> 2123 (-0.52%)
Now that we have NIR, most of the optimization we still need to do is
peepholes on instruction selection rather than general dataflow
operations. This means we want to be able to have QIR be a lot closer to
the actual QPU instructions, just with virtual registers. Allowing
multiple instructions writing the same register opens up a lot of
possibilities.
The constant folding pass can take a long time to complete
so rather than running through the entire pass each time
a new constant is propagated (and vice versa) interleave them.
This change helps ES31-CTS.arrays_of_arrays.InteractionFunctionCalls1
go from around 2 min -> 23 sec.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Due to a quirk in how the nv50 opt passes run, the algebraic
optimization that looks for these BFE's happens before the constant
folding pass. Rearranging these passes isn't a great idea, but this is
easy enough to fix. Allows a following cvt to eliminate the bfe in
certain situations.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
See issue from the ARB_texture_query_lod spec for LOD vs Lod confusion:
(3) The core specification uses the "Lod" spelling, not "LOD". Should
this extension be modified to use "Lod"?
RESOLVED: The "Lod" spelling is the correct spelling for the core
specification and the preferred spelling for use. However, use of
"LOD" also exists, as the extension predated the core specification,
so this extension won't remove use of "LOD".
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
This reverts commit ffe6c6ad5f.
_mesa_format_num_components() does not include the padding bits in mesa formats
containing 'X' channels. This could cause mipmap generation for certain
uncompressed formats to underestimate the number of channels in the source
image by 1.
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
FLT_TO_INT goes in the vector pipes on evergreen/NI,
not the trans unit as on earlier chips.
Signed-off-by: Glenn Kennard <glenn.kennard@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This struct was getting a bit crowded, following the lead of
radeonsi, mirror the idea of having sub-structures for each
shader type. Turning 'r600_shader_key' into an union saves
some trivial memory and CPU cycles for the shader keys.
[airlied: drop as_ls, and reorder so larger fields at start.]
Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This happens with unpackSnorm lowering. There's yet another
bitfield-extract behind it, but there's too much variation to be worth
cutting through.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
unpackUnorm* lowering doesn't AND the high byte/word as it's
unnecessary. Detect that situation as well.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Some Unigine shaders have been observed to unpack bytes out of 32-bit
integers and convert them to floats. I2F/I2I can handle this sort of
thing directly. Detect the handleable situations.
This misses 16-bit word capabilities in nv50, but I haven't seen shaders
that would actually make use of that.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Some shaders appear to extract bits using shift/and combos. Detect
(some) of those and convert to EXTBF instead.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
If build with C++11 standard, use std::unordered_set.
Otherwise if build on old Android version with stlport,
use std::tr1::unordered_set with a wrapper class.
Otherwise use std::tr1::unordered_set.
Signed-off-by: Chih-Wei Huang <cwhuang@linux.org.tw>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
I pushed a half-baked version of "i965: handle nir_intrinsic_image_size" by
accident. Not having the Reviewed-by: tags on the last two commits should
have been a red flag but I somehow missed it after the QA check.
This patch should fix image-size for non-int images. I will add support to
the piglit test for all the other image types.
Sorry for the noise.
Signed-off-by: Martin Peres <martin.peres@linux.intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
v2, Review from Francisco Jerez:
- avoid the camelCase for the booleans
- init the booleans using the sampler type
- force the initialization of all the components of the output register
v3:
- Rename a variable from CubeMapArray to CubeArray to re-use GLSL's name (Ilia)
- Fix some indentation and drop parenthesis (Topi)
- Fix a signed/unsigned comparaison warning
Signed-off-by: Martin Peres <martin.peres@linux.intel.com>
v2, review from Francisco Jerez:
- make the destination variable as large as what the nir instrinsic
defines (4) instead of the size of the return variable of glsl. This
is still safe for the already existing code because all the intrinsics
affected returned the same amount of components as expected by glsl IR.
In the case of image_size, it is not possible to do so because the
returned number of component depends on the image type and this case
is not well handled by nir.
v3:
- Style fix
Signed-off-by: Martin Peres <martin.peres@linux.intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
The code is heavily inspired from Francisco Jerez's code supporting the
image_load_store extension.
Backends willing to support this builtin should handle
__intrinsic_image_size.
v2: Based on the review of Ilia Mirkin
- Enable the extension for GLES 3.1
- Fix indentation
- Fix the return type (float to int, number of components for CubeImages)
- Add a warning related to GLES 3.1
v3: Based on the review of Francisco Jerez
- Refactor the code to share both add_image_function and _image with the other
image-related functions
v4: Based on Topi Pohjolainen's comments
- Do not add parenthesis for the return value
v5: based on Francisco Jerez's comments:
- Fix a few indent issues
- Reduce the size of a condition by testing the dimension and array properties
instead of enumerating all the formats.
Signed-off-by: Martin Peres <martin.peres@linux.intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
I'll mark the OES_shader_image_atomic extension entry with this tag to
make sure that we don't expose it on earlier GLES API versions
accidentally, because according to the extension:
"OpenGL ES 3.1 and GLSL ES 3.10 are required."
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
This includes the minimum required desktop/ES GLSL version in the
format qualifier table in anticipation of new GLSL versions extending
the set of supported image formats. According to section 4.4.7 of the
GLSL ES 3.1 spec:
"The format layout qualifier identifiers for image variable
declarations are:
[...]
rgba32f
rgba16f
r32f
rgba8
rgba8_snorm
[...]
rgba32i
rgba16i
rgba8i
r32i
[...]
rgba32ui
rgba16ui
rgba8ui
r32ui"
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
These seem to have been re-added at some point during the
ARB_tessellation_shader implementation work. AFAICT the second
(correct) definition of each constant would have had no effect because
the symbols were already defined.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
These are a subset of the image types supported by desktop GL,
excluding 1D, 1D array, rectangle, buffer, cube array, 2D MS and 2D
MS array texture targets.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
From the GLSL ES 3.1 spec, section 4.7.3:
"Any floating point, integer, opaque type declaration can have the
type preceded by one of these precision qualifiers: [...] highp
[...], mediump [...], lowp [...]."
Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Note that this is slightly more permissive than the spec language
requires: "Any image variable must specify a format layout qualifier."
The GLSL ES spec seems really sketchy regarding format layout
qualifiers on function formal parameters -- On the one hand they are
required, but on the other hand it doesn't provide any syntax to
specify them (see section 6.1.1), they don't participate in parameter
type matching for overload resolution, and are in fact explictly
forbidden ("Layout qualifiers cannot be used on formal function
parameters"). Of course none of the image built-in functions defined
by the spec specify format layout qualifiers (and they probably
couldn't sensibly), to contradict its own requirement.
This probably qualifies for a spec bug, but in the meantime do the
sensible thing and require layout qualifiers on uniforms *only*.
Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
Support for binding an image to an image unit explicitly in the shader
source is required by both GLSL 4.2 and GLSL ES 3.1, but not by the
original ARB_shader_image_load_store extension.
Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
The GLES 3.1 spec removed support for updating the image unit bound to
an image uniform using glUniform1i() calls.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
There is no GL_R8 image format in GLES, according to the state table
20.32 of the GLES 3.1 spec the default value should be GL_R32UI. The
ES31-CTS.shader_image_load_store.basic-api-bind Khronos conformance
test checks that this is the case.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The ES31-CTS.shader_image_load_store.basic-api-bind conformance test
expects the whole image unit state to be reset when the bound texture
object is deleted. The ARB_shader_image_load_store extension is
rather vague regarding what should happen with image unit state other
than the texture object in that case, but the GL 4.2 and GLES 3.1
specifications (section "Automatic Unbinding of Deleted Objects")
explicitly require it to be reset to the default values.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The spec requires that all layers of the image starting from the 0-th
are bound to the image unit regardless of the Layer parameter when
Layered is true, so I was setting gl_image_unit::Layer to zero in that
case for the convenience of the driver back-end. However the
ES31-CTS.shader_image_load_store.basic-api-bind conformance test
checks that the layer value returned by glGetInteger is the same that
was originally specified, regardless of the value of layered. Rename
Layer to _Layer as is usual for other derived state and keep track of
the original layer value as gl_image_unit::Layer.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The name of both the GLSL built-in variable and the glGetInteger param
with the same value changed in GLSL ES 3.1 and GL 4.5. Its semantics
also changed slightly, since the limit now also takes into account the
number of SSBs in use. Switch our internal data structures to the
up-to-date name.
Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Combine the adjacent cases which have the same GL type in the switch statemnt.
Reviewed-by: Chad Versace <chad.versace@intel.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Add the classes of compressed formats as layouts. This allows the detection
of compressed formats belonging to a certain category of compressed formats.
v2. simplify layout name construction (Ilia).
Reviewed-by: Chad Versace <chad.versace@intel.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
GL_ARB_compute_shader is limited for GLSL version 430.
This enables for GLSL ES version 310.
V2: Updated error string to also include GLSL 3.10
Signed-off-by: Marta Lofstedt <marta.lofstedt@linux.intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
According to Open GL ES 3.1 specification, section 8.10.2
GL_IMAGE_FORMAT_COMPATIBILITY_TYPE should be supported by
glGetTexParameterfv.
Signed-off-by: Marta Lofstedt <marta.lofstedt@linux.intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
This support should be removed in favor of something that actually works
in all the weird cases. However this is simple and is enough to allow
Bioshock Infinite to render properly on nvc0.
Since the functionality is not implemented correctly, the extension will
not appear in the extension string and mesa will still return
INVALID_OPERATION for any glCopyImageSubData calls. In order to make use
of this functionality, run with
MESA_EXTENSION_OVERRIDE=GL_ARB_copy_image
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Patch separates array samplers from the texture_multisample check so that we
can enable only [iu]sampler2DMS, [iu]sampler2DMSArray are not supported.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
v2: code cleanup
v3: check only dimensions, samples is checked separately later
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
This is done so that following patch can use it to verify dimensions
for multisample variants of glTex*Storage.
v2: move function to header, use bool instead GLboolean
v3: small changes, cleanup
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
When gl_VertexID or gl_InstanceID is used a 3DSTATE_VF_SGVS
instruction is sent to create a sort of element to store the generated
values. The last instruction in this chunk of code looks like it was
trying to set the instancing state for the element using the
3DSTATE_VF_INSTANCING instruction. However it was sending
brw->vb.nr_buffers instead of the element index. This instruction is
supposed to take an element index and that is how it is used further
down in the function so the previous code looks wrong. Perhaps
previously the number of buffers coincidentally matched the number of
enabled elements so the value was generally correct anyway. In a
subsequent patch I want to change a bit how it chooses the SGVS
element index so this needs to be fixed.
v2 [by Ben]
Remove stable 10.5 stable tag (it's too late now)
Commit update as follows:
The number of vertex buffers emitted is always <= the number of vertex elements.
To maximize reuse (actually, to minimize relocations - according to the code
comments), a vertex buffer is only emitted once, even when we setup multiple
components (3DSTATE_VERTEX_ELEMENT) from that buffer. This meant that the
previous code would use the wrong indexed element for these reuse cases. This
patch by itself prevents hangs on BSW in the linked bug. It doesn't make the
test pass, the remaining patches are needed for that.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91610
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Tested-by: Mark Janes <mark.a.janes@intel.com>
Cc: <mesa-stable@lists.freedesktop.org>
Since i965 is now using make_reg_conflicts_transitive and doesn't need
q-value computations, they are disabled on i965. They are enabled
everywhere else so that they get the old behavior. This reduces the time
spent in eglInitialize() on BDW by around 10-15%.
Reviewed-by: Eric Anholt <eric@anholt.net>
Instead of adding transitive conflicts as we go, we now add regular
conflicts and them make them all transitive at the end. This reduces
screen creation time substantially on BDW. The time spent in eglInitialize
is reduced from 27.78 ms/call to 9.92 ms/call in debug mode and from 13.15
ms/call to 4.54 ms/call in release mode (about 65% in either case).
Reviewed-by: Eric Anholt <eric@anholt.net>
They're used by glsl_to_nir.cpp, and I want to use them in TGSI-to-NIR as
well (our use of the var->index slot to store slot properties no longer
works since it got truncated).
The *_MAX defines are left in mtypes.h, because they depend on config.h.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
This code was split out into a separate function to be used also
by GL_EXT_separate_shader_objects which has since been removed from
Mesa, so move it back.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
To properly support the case of waiting on a fence with a 0 timeout, we
still need to call down to the kernel. Which requires the use of the
new fd_pipe_wait_timeout() API.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Don't take current timestamp/fence from current ring, as we might have
already rolled over to new rb.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
The positive and negative value of a float can only
be equal to each other if it is -0.0f and 0.0f.
This is safe for Nan and Inf, as -Nan != Nan, and -Inf != Inf
This gives no changes in my shader-db
Signed-off-by: Thomas Helland <thomashelland90@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
-NaN != NaN, and -Inf != Inf, so this should be safe.
Found while working on my VRP pass.
Shader-db results on my IVB:
total instructions in shared programs: 1698267 -> 1698067 (-0.01%)
instructions in affected programs: 15785 -> 15585 (-1.27%)
helped: 36
HURT: 0
GAINED: 0
LOST: 0
Some shaders was found to have the following pattern in NIR:
vec1 ssa_26 = fneg ssa_21
vec1 ssa_27 = fne ssa_21, ssa_26
Make that:
vec1 ssa_27 = fne ssa_21, 0.0f
This is found in Dota2 and Brutal Legend.
One shader is cut by 8%, from 323 -> 296 instructons in SIMD8
Signed-off-by: Thomas Helland <thomashelland90@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
mesa/src/mesa/drivers/dri/i965/gen7_sol_state.c: In function 'gen7_upload_3dstate_so_decl_list':
mesa/src/mesa/drivers/dri/i965/gen7_sol_state.c:119:22: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
for (int i = 0; i < linked_xfb_info->NumOutputs; i++) {
^
Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
mesa/src/mesa/drivers/dri/i965/gen6_vs_state.c: In function 'gen6_upload_push_constants':
mesa/src/mesa/drivers/dri/i965/gen6_vs_state.c:85:21: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
for (i = 0; i < prog_data->nr_params; i++) {
^
mesa/src/mesa/drivers/dri/i965/gen6_vs_state.c:92:17: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
for (i = 0; i < prog_data->nr_params; i++) {
^
Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
mesa/src/mesa/drivers/dri/i965/brw_vs_surface_state.c: In function 'brw_upload_pull_constants':
mesa/src/mesa/drivers/dri/i965/brw_vs_surface_state.c:84:18: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
for (i = 0; i < prog_data->nr_pull_params; i++) {
^
mesa/src/mesa/drivers/dri/i965/brw_vs_surface_state.c:89:21: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
for (i = 0; i < ALIGN(prog_data->nr_pull_params, 4) / 4; i++) {
^
Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
mesa/src/mesa/drivers/dri/i965/brw_wm_surface_state.c: In function 'brw_upload_abo_surfaces':
mesa/src/mesa/drivers/dri/i965/brw_wm_surface_state.c:961:22: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
for (int i = 0; i < prog->NumAtomicBuffers; i++) {
^
Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
mesa/src/mesa/drivers/dri/i965/brw_wm_surface_state.c: In function 'brw_upload_ubo_surfaces':
mesa/src/mesa/drivers/dri/i965/brw_wm_surface_state.c:901:22: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
for (int i = 0; i < shader->NumUniformBlocks; i++) {
^
Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
mesa/src/mesa/drivers/dri/i965/brw_tex_layout.c: In function 'brw_miptree_layout_texture_array':
mesa/src/mesa/drivers/dri/i965/brw_tex_layout.c:560:25: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
for (int q = 0; q < mt->level[level].depth; q++) {
^
Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
mesa/src/mesa/drivers/dri/i965/brw_state_cache.c: In function 'brw_try_upload_using_copy':
mesa/src/mesa/drivers/dri/i965/brw_state_cache.c:216:18: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
for (i = 0; i < cache->size; i++) {
^
Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
mesa/src/mesa/drivers/dri/i965/brw_primitive_restart.c: In function 'can_cut_index_handle_prims':
mesa/src/mesa/drivers/dri/i965/brw_primitive_restart.c:94:22: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
for (int i = 0; i < nr_prims; i++) {
^
Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
mesa/src/mesa/drivers/dri/i965/brw_draw_upload.c: In function 'brw_prepare_vertices':
mesa/src/mesa/drivers/dri/i965/brw_draw_upload.c:434:22: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
for (i = j = 0; i < brw->vb.nr_enabled; i++) {
^
mesa/src/mesa/drivers/dri/i965/brw_draw_upload.c:557:17: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
for (i = 0; i < nr_uploads; i++) {
^
mesa/src/mesa/drivers/dri/i965/brw_draw_upload.c:569:18: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
for (i = 0; i < nr_uploads; i++) {
^
Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
mesa/src/mesa/drivers/dri/i965/brw_draw.c: In function 'brw_draw_destroy':
mesa/src/mesa/drivers/dri/i965/brw_draw.c:630:18: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
for (i = 0; i < brw->vb.nr_buffers; i++) {
^
mesa/src/mesa/drivers/dri/i965/brw_draw.c:636:18: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
for (i = 0; i < brw->vb.nr_enabled; i++) {
^
Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
mesa/src/egl/drivers/dri2/platform_drm.c: In function 'release_buffer':
mesa/src/egl/drivers/dri2/platform_drm.c:73:18: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
for (i = 0; i < ARRAY_SIZE(dri2_surf->color_buffers); i++) {
^
mesa/src/egl/drivers/dri2/platform_drm.c: In function 'has_free_buffers':
mesa/src/egl/drivers/dri2/platform_drm.c:87:18: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
for (i = 0; i < ARRAY_SIZE(dri2_surf->color_buffers); i++)
^
mesa/src/egl/drivers/dri2/platform_drm.c: In function 'dri2_drm_destroy_surface':
mesa/src/egl/drivers/dri2/platform_drm.c:199:18: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
for (i = 0; i < ARRAY_SIZE(dri2_surf->color_buffers); i++) {
^
mesa/src/egl/drivers/dri2/platform_drm.c: In function 'get_back_bo':
mesa/src/egl/drivers/dri2/platform_drm.c:224:21: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
for (i = 0; i < ARRAY_SIZE(dri2_surf->color_buffers); i++) {
^
mesa/src/egl/drivers/dri2/platform_drm.c: In function 'dri2_drm_swap_buffers':
mesa/src/egl/drivers/dri2/platform_drm.c:425:24: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
for (i = 0; i < ARRAY_SIZE(dri2_surf->color_buffers); i++)
^
Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
mesa/src/gbm/main/backend.c: In function 'find_backend':
mesa/src/gbm/main/backend.c:70:18: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
for (i = 0; i < ARRAY_SIZE(backends); ++i) {
^
mesa/src/gbm/main/backend.c: In function '_gbm_create_device':
mesa/src/gbm/main/backend.c:95:18: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
for (i = 0; i < ARRAY_SIZE(backends) && dev == NULL; ++i) {
^
Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
mesa/src/glx/dri_common_query_renderer.c: In function 'dri2_convert_glx_query_renderer_attribs':
mesa/src/glx/dri_common_query_renderer.c:61:18: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
for (i = 0; i < ARRAY_SIZE(query_renderer_map); i++)
^
Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
mesa/src/glx/dri_common.c: In function 'scalarEqual':
mesa/src/glx/dri_common.c:259:18: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
for (i = 0; i < ARRAY_SIZE(attribMap); i++)
^
Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
mesa/src/mesa/drivers/dri/i965/intel_screen.c: In function 'intel_screen_make_configs':
mesa/src/mesa/drivers/dri/i965/intel_screen.c:1222:22: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
for (int i = 0; i < ARRAY_SIZE(formats); i++) {
^
mesa/src/mesa/drivers/dri/i965/intel_screen.c:1259:22: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
for (int i = 0; i < ARRAY_SIZE(formats); i++) {
^
mesa/src/mesa/drivers/dri/i965/intel_screen.c:1291:22: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
for (int i = 0; i < ARRAY_SIZE(formats); i++) {
^
Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
mesa/src/mesa/drivers/dri/i965/intel_fbo.c: In function 'intel_validate_framebuffer':
mesa/src/mesa/drivers/dri/i965/intel_fbo.c:734:18: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
for (i = 0; i < ARRAY_SIZE(fb->Attachment); i++) {
^
Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
mesa/src/mesa/drivers/dri/common/utils.c: In function 'driGetConfigAttrib':
mesa/src/mesa/drivers/dri/common/utils.c:457:19: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
for (i = 0; i < ARRAY_SIZE(attribMap); i++)
^
Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
mesa/src/mesa/drivers/dri/i965/intel_screen.c: In function 'aub_dump_bmp':
mesa/src/mesa/drivers/dri/i965/intel_screen.c:125:22: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
for (int i = 0; i < fb->_NumColorDrawBuffers; i++) {
^
Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
mesa/src/mesa/drivers/dri/i965/intel_fbo.c: In function 'intel_blit_framebuffer_with_blitter':
mesa/src/mesa/drivers/dri/i965/intel_fbo.c:836:21: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
for (i = 0; i < drawFb->_NumColorDrawBuffers; i++) {
^
V2 (Thomas Helland):
-Use unsigned instead of GLuint (trivial)
Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
mesa/src/mesa/drivers/dri/i965/brw_wm_state.c: In function 'brw_color_buffer_write_enabled':
mesa/src/mesa/drivers/dri/i965/brw_wm_state.c:53:18: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
for (i = 0; i < ctx->DrawBuffer->_NumColorDrawBuffers; i++) {
^
Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
mesa/src/mesa/drivers/dri/i965/brw_draw.c: In function 'brw_postdraw_set_buffers_need_resolve':
mesa/src/mesa/drivers/dri/i965/brw_draw.c:390:22: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
for (int i = 0; i < fb->_NumColorDrawBuffers; i++) {
^
Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
In the DRI2 path this event is magically synthesized from the
corresponding DRI2 event, but with Present, the server sends us the
event itself. The DRI2 path fills in the serial number, send_event, and
display fields of the XEvent struct that the app sees, but the Present
path did not.
This is likely related to a class of crashes seen in gtk/clutter apps:
https://bugzilla.redhat.com/attachment.cgi?id=1032631
Note that the crashing instruction is looking up the lock_fns slot in
the Display *, and %rdi (holding the Display *) is 0x1.
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Tessellation control shaders write to outputs as OUT[ADDR[0].x][0], make
sure to parse the indirect dimension on outputs.
Also tess control inputs/outputs and tess eval input declarations need
to receive the same treatment as geometry shader inputs.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
The previous patch replaces the other use case.
V2: remove the validation from it old location.
Cc: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
The GL 4.5 spec says its an GL_INVALID_VALUE error if samples equals 0 for
glTexImage*Multisample and an GL_INVALID_VALUE error if samples < 1 for
glTexStorage*Multisample.
The spec says its undefined what happens if glTexImage*Multisample is passed
a samples value < 0 but we currently already produced a GL_INVALID_VALUE error
in this case, this is also consistent with the Nvidia binary.
Cc: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
If an immediate is written to multiple channels, we can load it in a
single writemasked MOV.
total instructions in shared programs: 6285144 -> 6261991 (-0.37%)
instructions in affected programs: 718991 -> 695838 (-3.22%)
helped: 5762
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
This only appears in cubemaps which have have packed layers, so are very
sensitive to any layout disagreement between sw and hw.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
The hardware checks for multisampling being enabled, but does not have
the rule about cbuf0 being an integer format. Only enable
alpha-to-one/alpha-to-coverage if cbuf0 is not an integer format.
Fixes piglits
ext_framebuffer_multisample-int-draw-buffers-alpha-to-one
ext_framebuffer_multisample-int-draw-buffers-alpha-to-coverage
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
According to OpenGL version 4.5 and OpenGL ES 3.1 standards, section 7.3:
GL_INVALID_VALUE should be generated, if count is less than 0.
V2: Changed title, eased Open GL ES 3.1 restriction and added comments.
Signed-off-by: Marta Lofstedt <marta.lofstedt@intel.com>
Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
According to OpenGL specification version 4.5 table 23.46
and OpenGL ES specification version 3.1 table 20.31:
ATOMIC_COUNTER_BUFFER_START and ATOMIC_COUNTER_BUFFER_SIZE
should have the initial value of zero.
Signed-off-by: Marta Lofstedt <marta.lofstedt@intel.com>
Reviewed-by: Erik Faye-Lund <kusmabite@gmail.com>
With non-dsa functions we need to do target error checking before
_mesa_get_current_tex_object which would just call _mesa_problem without
raising GL_INVALID_ENUM error. In other places of Mesa, target gets checked
before this call.
Fixes failures in:
ES31-CTS.texture_storage_multisample.APIGLGetTexLevelParameterifv.*
v2: do the target check also for dsa functions (Timothy)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
GK110/GK208 have 256 registers, not 64. Find out the number of registers
from the target to avoid unnecessary iteration for pre-GK110.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
The function glMemoryBarrierByRegion is part of OpenGL ES 3.1
and OpenGL 4.5 core and compatibility profiles.
Signed-off-by: Marta Lofstedt <marta.lofstedt@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
There are separate line widths for smooth and aliased lines. The smooth
one is selected when multisampling is enabled even if line smoothing
isn't explicitly turned on.
Fixes the ext_framebuffer_multisample-line-smooth piglits
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Apparently this is necessary in order for tess factors to work in a tess
eval program without a tess control program bound. Probably because it
uses the fake program's shader header to work out the number of patch
constants.
Fixes vs-tes-tessinner-tessouter-inputs
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
There's a lot of functionality duplicated in the gm107 lowering pass
from the nvc0 pass. As that one gets updated, the gm107 one falls
behind. Avoid this by sharing the code.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
This fixes arb_get_texture_sub_image-get, and any situation where the 2d
engine was being used for multi-layer blits to a non-0 level.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.6" <mesa-stable@lists.freedesktop.org>
Do the same as in st_TexSubImage. This fixes
arb_get_texture_sub_image-get on llvmpipe when it is set to prefer
blits, and nouveau when it uses the 3d engine for blits.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The address calculations are all different (e.g. see GP), there appear
to be sync's in programs, and probably a bunch of other differences.
Just disable it for now.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
NIR instruction count results on i965:
total instructions in shared programs: 1261954 -> 1261937 (-0.00%)
instructions in affected programs: 455 -> 438 (-3.74%)
One in yofrankie, two in tropics. Apparently i965 had also optimized all
of these out anyway.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
There are so many flags in textures, that the CSE pass would have a hard
time referencing the correct set when figuring out if two texture ops are
the same. By zeroing, we can avoid that fragility.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
In order to move more of our lowering into NIR, we need the ability to
reference various pipeline state (like texture rectangle scaling factors
or blend colors), so we just set those up as a load_uniform with a big
offset to indicate that it's not within the shader's uniform storage and
is one of our state values.
Avoids regressions in vc4 when trying to do our blending in NIR.
v2: Add the other unpack ops I meant to when writing the original commit
message.
Reviewed-by: Matt Turner <mattst88@gmail.com>
We may find a cause to do more undef optimization in the future, but for
now this fixes up things after if flattening. vc4 was handling this
internally most of the time, but a GLB2.7 shader that did a conditional
discard and assign gl_FragColor in the else was still emitting some extra
code.
total instructions in shared programs: 100809 -> 100795 (-0.01%)
instructions in affected programs: 37 -> 23 (-37.84%)
v2: Use nir_instr_rewrite_src() to update def/use on src[0] (by Thomas
Helland).
v3: Make sure to flag metadata dirties, and copy the swizzle and abs/neg
over to src[0], too (by anholt).
Reviewed-by: Thomas Helland <thomashelland90@gmail.com> (v2)
Tested-by: Thomas Helland <thomashelland90@gmail.com> (v2)
VCE dual instances are encoding in parallel, it needs two frames for
encoding with their own parameters in one IB. Master instance will check
the task info to find another frame, assign it to the slave instance
Signed-off-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
The config task has own task ID, extract the configuration functions
into config task.
v2 (chk): calculate offset automatically
Signed-off-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
v2: -make tonga use new h264 performance HW decoder;
-integrate it scaling buffer to msg_fb buffer
Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
v2: (leo) add checking for driver backend
v3: (leo) change variable name from use_amdgpu to use_vm
v4: rebase by Marek
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
v2: (leo) add checking for driver backend
v3: (leo) change variable name from use_amdgpu to use_vm
v4: rebase by Marek
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
v2: incorporate comments from Marek
v3: add missing fiji case in winsys init
use tonga raster config (double check this)
v4: rebase on harvest patch
Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v3)
Reviewed-by: Christian König <christian.koenig@amd.com> (v3)
Reviewed-by: David Zhang <david1.zhang@amd.com> (v3)
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Properly calculate the PA_SC_RASTER_CONFIG[_1] settings
for harvest chips.
v2: - fix default raster config settings for CZ and KV
- Suggestions from Michel
v3: - handle multiple packers properly for CI+
- GRBM_GFX_INDEX is privileged on VI+
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com> (v2)
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Useful for debugging hangs with the read-register interface.
I checked that this adds the same register fields as the kernel driver.
Acked-by: Michel Dänzer <michel.daenzer@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
This is an internal project that Catalyst uses and now open source will do
too.
v2: squashed these commits in:
- winsys/amdgpu: fix warnings in addrlib
- winsys/amdgpu: set PIPE_CONFIG and NUM_BANKS in tiling_flags
v2: - lots of changes according to Emil Velikov's comments
- implemented radeon_winsys::read_registers
v3: - a lot of new work, many of them adapt to libdrm interface changes
Squashed patches:
winsys/amdgpu: implement radeon_winsys context support
winsys/amdgpu: add reference counting for contexts
winsys/amdgpu: add userptr support
winsys/amdgpu: allocate IBs like normal buffers
winsys/amdgpu: add IBs to the buffer list, adapt to interface changes
winsys/amdgpu: don't use KMS handles as reloc hash keys
winsys/amdgpu: sync buffer accesses to different rings
winsys/amdgpu: use dependencies instead of waiting for last fence v2
gallium/radeon: unify buffer_wait and buffer_is_busy in the winsys interface (amdgpu part)
winsys/amdgpu: track fences per ring and be thread-safe
winsys/amdgpu: simplify waiting on a variable in amdgpu_fence_wait
gallium/radeon: allow the winsys to choose the IB size (amdgpu part)
winsys/amdgpu: switch to new amdgpu_cs_query_fence_status interface
winsys/amdgpu: handle fence and dependencies merge
winsys/amdgpu follow libdrm change to move user fence into UMD
winsys/amdgpu: use amdgpu_bo_va_op for va map/unmap v2
winsys/amdgpu: use the new tiling flags
winsys/amdgpu: switch to new GTT_USWC definition
winsys/amdgpu: expose amdgpu_cs_query_reset_state to drivers
winsys/amdgpu: fix valgrind warnings
winsys/amdgpu: don't use VRAM with APUs that don't have much of it
winsys/amdgpu: require LLVM 3.6.1 for VI because of bug fixes there
winsys/amdgpu: remove amdgpu_winsys::num_cpus
winsys/amdgpu: align BO size to page size
winsys/amdgpu: reduce BO cache timeout
winsys/amdgpu: remove useless flushing and waiting in amdgpu_bo_set_tiling
winsys/amdgpu: use amdgpu_device_handle as a unique device ID instead of fd
winsys/amdgpu: use safer access to amdgpu_fence_wait::signalled
winsys/amdgpu: allow maximum IB size of 4 MB
winsys/amdgpu: add ip_instance into amdgpu_fence
gallium/radeon: add RING_COMPUTE instead of RADEON_FLUSH_COMPUTE
winsys/amdgpu: set the ring type at CS initilization
winsys/amdgpu: query the GART page size from the kernel
winsys/amdgpu: correctly wait for shared buffers to become idle
winsys/amdgpu: set the amdgpu_cs_fence structure only once at fence creation
winsys/amdgpu: add a specific error message for cs_submit -> -ENOMEM
winsys/amdgpu: check num_active_ioctls before calling amdgpu_bo_wait_for_idle
winsys/amdgpu: clear user fence BO after allocating it
winsys/amdgpu: fix user fences
winsys/amdgpu: make amdgpu_winsys_create public
winsys/amdgpu: remove thread offloading
winsys/amdgpu: flatten the amdgpu_cs_context structure and simplify more
v4: require libdrm 2.4.63
zMin and zMax can't use _DepthMaxF, because the test is done in Z32_UNORM.
Probably a useless patch given how popular swrast is nowadays, but it helped
create and validate the piglit test.
v2: add an explicit cast to GLuint
Reviewed-by: Brian Paul <brianp@vmware.com>
v2: Add compute mode flag to sampler state setup (Marek).
Drop branches which avoid reference counting (Marek).
Simplify unset branch condition (Marek).
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
raw_svector_ostream::flush() is now unnecessary and forbidden:
CXX llvm/libclllvm_la-invocation.lo
../../../../../src/gallium/state_trackers/clover/llvm/invocation.cpp: In function 'clover::module {anonymous}::build_module_llvm(llvm::Module*, unsigned int (&)[7])':
../../../../../src/gallium/state_trackers/clover/llvm/invocation.cpp:574:29: error: use of deleted function 'void llvm::raw_svector_ostream::flush()'
bitcode_ostream.flush();
^
In file included from /home/daenzer/src/llvm-git/llvm/include/clang/Basic/VirtualFileSystem.h:22:0,
from /home/daenzer/src/llvm-git/llvm/include/clang/Basic/FileManager.h:20,
from /home/daenzer/src/llvm-git/llvm/include/clang/Basic/SourceManager.h:38,
from /home/daenzer/src/llvm-git/llvm/include/clang/Frontend/CompilerInstance.h:16,
from ../../../../../src/gallium/state_trackers/clover/llvm/invocation.cpp:25:
/home/daenzer/src/llvm-git/llvm/include/llvm/Support/raw_ostream.h:512:8: note: declared here
void flush() = delete;
^
Makefile:862: recipe for target 'llvm/libclllvm_la-invocation.lo' failed
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
We do not want bug reports from this early stepping of SKL. Few if any were ever
shipped outside of Intel to early enabling partners, and none will be sold.
There is a functional change here. If you're using new mesa on an old
kernel/libdrm, the revid will be -1, and we'll use new SKL values instead of
early ones (a hopefully irrelevant improvement IMO).
v2: Remove hunk which warned before dying. Instead, default to normal SKL
support (Ken)
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Neil Roberts <neil@linux.intel.com>
The EGL 1.4 spec states for eglCreateContext:
"attribute EGL_CONTEXT_CLIENT_VERSION is only valid when the current
rendering API is EGL_OPENGL_ES_API"
Additionally, if the EGL_KHR_create_context EGL extension is supported
(this is mandatory in EGL 1.5) then the EGL_CONTEXT_MAJOR_VERSION_KHR,
which is an alias for EGL_CONTEXT_CLIENT_VERSION, and
EGL_CONTEXT_MINOR_VERSION_KHR attributes are also accepted by
eglCreateContext with the extension spec stating:
"The values for attributes EGL_CONTEXT_MAJOR_VERSION_KHR and
EGL_CONTEXT_MINOR_VERSION_KHR specify the requested client API
version. They are only meaningful for OpenGL and OpenGL ES
contexts, and specifying them for other types of contexts will
generate an error."
Add the necessary checks against the extension and rendering APIs when
validating these attributes as part of eglCreateContext.
Signed-off-by: Frank Binns <frank.binns@imgtec.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
[Emil Velikov: Add newline before the spec quote (Matt)]
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
When a buffer is provided to eglGetConfigs it's supposed to set the value
of the num_config parameter to the total number of configs that have been
copied into this buffer. For some reason the EGL spec doesn't consider it
to be an error to pass this function a buffer while specifying its size to
be less than 0. Given this, one would expect this combination to result in
the num_config parameter being set to 0 but this wasn't the case. This was
due to the buffer size being copied straight into num_configs without being
clamped to 0.
This was causing the following dEQP EGL test to fail:
dEQP-EGL.functional.query_config.get_configs.get_configs_bounds
Signed-off-by: Frank Binns <frank.binns@imgtec.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
When calling either eglCreateWindowSurface or eglCreatePixmapSurface it
was possible for an application to be aborted as a result of it failing
to create a DRI2 drawable on the server. This could happen due to an
application passing in an invalid native drawable handle, for example.
v2: Handle the case where an error has been set on the connection
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Frank Binns <frank.binns@imgtec.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Both eglCreatePixmapSurface and eglCreateWindowSurface were incorrectly
setting the EGL error to be EGL_BAD_ALLOC when an invalid native drawable
handle was being passed in. The EGL spec states the following for
eglCreatePixmapSurface:
"If pixmap is not a valid native pixmap handle, then an EGL_BAD_-
NATIVE_PIXMAP error should be generated."
(eglCreateWindowSurface has similar text)
Correctly set the EGL error value based on xcb_get_geometry_reply returning
an error structure containing something other than BadAlloc.
v2: Check for BadAlloc error and update commit message to reflect this
Signed-off-by: Frank Binns <frank.binns@imgtec.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Commit 4ed23fd590 introduced some calls to _eglError inappropriately
passing it EGL_BAD_NATIVE_WINDOW. This was actually harmless in two of the
cases as _eglError gets called later on with a more appropriate error code
but (just to be safe) switch these to _eglLog calls instead.
The final case is a little trickier as it actually needs to set an error
of which the following are available (according to the EGL spec):
EGL_BAD_MATCH, EGL_BAD_CONFIG, EGL_BAD_NATIVE_(PIXMAP|WINDOW) and
EGL_BAD_ALLOC.
Of these, EGL_BAD_ALLOC seems to be the most appropriate given that
failure can occur either as a result of xcb_get_setup failing due to an
earlier error on the connection (where the most commonly occurring error
code is XCB_CONN_CLOSED_MEM_INSUFFICIENT) or as a result of the
xcb_screen_iterator_t 'rem' field being 0.
In addition to this, commit af2aea40d2 unconditionally set the error to
EGL_BAD_NATIVE_WINDOW when creating a window or pixmap surface with a NULL
native handle. Change this to correctly set the error based on surface
type.
v2: Updated patch description (Emil Velikov)
Return EGL_BAD_NATIVE_PIXMAP when eglCreatePixmapSurface is called
with a NULL native pixmap handle
Signed-off-by: Frank Binns <frank.binns@imgtec.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Items in the program cache consist of three things: key, the data
representing the instructions and auxiliary data representing
uniform storage. The data consisting of instructions is stored into
a drm buffer object while the key and the auxiliary data reside in
malloced section. Now the cache uploading is equipped with a check
that iterates over existing items and seeks to find a another item
using identical instruction data than the one being just uploaded.
If such is found there is no need to add another section into the
drm buffer object holding identical copy of the existing one. The
item just being uploaded should instead simply point to the same
offset in the underlying drm buffer object.
Unfortunately the check for the matching instruction data is
coupled with a check for matching auxiliary data also. This
effectively prevents the cache from ever containing two items
that could share a section in the drm buffer object.
The constraint for the instruction data and auxiliary data to
match is, fortunately, unnecessary strong. When items are stored
into the cache they will anyway contain their own copy of the
auxiliary data (even if they matched - which they in real world
never will). The only thing the items would be sharing is the
instruction data and hence we should only check for that to match
and nothing else.
No piglit regression in jenkins.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Current logic re-writes the same data when existing data is found.
Not that this actually matters at the moment in practice, the
contraint for finding matching data is too severe to ever allow
data to be shared between two items in the cache.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
and simplify the interface to take directly the size and to return
the offset. The routine does nothing more than allocate, it doesn't
upload anything.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Extension spec originally required 2^24 but 2^27 is the minimum value
required by OpenGL 4.5 and OpenGL ES 3.1 specifications.
Fixes:
ES31-CTS.shader_storage_buffer_object.basic-max
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
_mesa_get_program_resource_name has logic to append '[0]' in name
if variable is an array, this should be skipped for XFB varyings
that have array index already appended.
v2: fix comment, change also GL_NAME_LENGTH query to match
the behaviour
Fixes:
ES31-CTS.program_interface_query.transform-feedback-types
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Martin Peres <martin.peres@linux.intel.com>
The value was copied from r300g, which uses 1/12 subpixels, but this hw
uses 1/16 subpixels.
Should fix piglit: gl-1.4-polygon-offset (formerly a glean test)
(untested, ported from radeonsi)
Reviewed-by: Edward O'Callaghan <eocallaghan at alterapraxis.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Cc: mesa-stable@lists.freedesktop.org
The value was copied from r300g, which uses 1/12 subpixels, but this hw
uses 1/16 subpixels.
Fixes piglit: gl-1.4-polygon-offset (formerly a glean test)
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Cc: mesa-stable@lists.freedesktop.org
It must be obtained from the VS.
The GS scenario A must be enabled for PrimID to be generated for the VS.
+ 4 piglits
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Basic texture buffer support. Should be straightforward to add first/
last_element support. And with a bit of work in ir3 emulate larger
texture buffer sizes. But this seems to be enough for stk gl31 render
paths.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
This shows up with a glamor shader, which does a TXF and uses the result
for conditional kill. Before we wouldn't group the fanin (collect)
neighbors which need to be allocated adjacently at RA, resulting in
badness.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
a4xx needs similar treatment as 995f55a6
Also fixup a few point-size and vpsrepl issues and drop fix_blit_fp()
hack previously needed for mem2gmem.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Move a few things around to group stuff that is common to a3xx/a4xx
together. Also, introduce is_ir3() for things that are more specific to
the compiler / shader-ISA than to the gpu generation.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
These extensions allow reading depth/stencil for GLES contexts, which is
useful for tools like apitrace.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This function would always report that a dimension or size error occurred
in glTexImage even when it was called from glCompressedTexImage. Replace
the static string with the dynamically determined caller name.
Reviewed-by: Tapani Palli <tapani.palli@intel.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Because we build here an array format, we don't need to swap the
bytes for big endian.
If it isn't an array format, the bytes will be swapped in
_mesa_format_convert.
v2: remove temp variable
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Cc: "10.5 10.6" <mesa-stable@lists.freedesktop.org>
Before, if we encountered an array format of 0 on a BE system, we would
flip all the channels even though it's an invalid format. This would
result in a mostly invalid format with a swizzle of yyyy or wwww. Instead,
we should just return 0 if the array format stashed in the format info is
invalid.
Cc: "10.6 10.5" <mesa-stable@lists.freedesktop.org>
The swizzle defines where in the format you should look for any given
channel. When we flip the format around for BE targets, we need to change
the destinations of the swizzles, not the sources. For example, say the
format is an RGBX format with a swizzle of xyz1 on LE. Then it should be
wzy1 on BE; however, the code as it was before, would have made it 1zyx on
BE which is clearly wrong.
Reviewed-by: Iago Toral <itoral@igalia.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Cc: "10.6 10.5" <mesa-stable@lists.freedesktop.org>
Cuts about 2k of .text.
text data bss dec hex filename
5017141 197160 27672 5241973 4ffc75 i965_dri.so before
5014981 197160 27672 5239813 4ff405 i965_dri.so after
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cuts about 1k of .text.
text data bss dec hex filename
5018165 197160 27672 5242997 500075 i965_dri.so before
5017141 197160 27672 5241973 4ffc75 i965_dri.so after
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
MESA_LLVM_VERSION_PATCH is undefined.
Reviewed-by: Edward O'Callaghan <eocallaghan at alterapraxis.com>
Tested-by: Benjamin Bellec <b.bellec@gmail.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
r600 currently has 73 atoms and looping through their dirty flags has
become costly because checking each flag requires a pointer
dereference before the read. To avoid having to do that add additional
bitfield which can be checked really quickly thanks to tzcnt instruction.
id field was added to struct r600_atom but that doesn't affect memory
usage for both 32 and 64 bit CPUs because it was stuffed into padding.
The performance improvement is ~2% for benchmarks that can have FPS in
the thousands but is hardly measurable in "real" programs.
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
This is analogous to r300_mark_atom_dirty() used by r300, and will
be used by later patches. For common radeon code, appropriate helper
is called through a function pointer.
No functional changes.
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
This fixes the spec@arb_shader_image_load_store@invalid index bounds
piglit tests on IVB, which were causing a GPU hang and then a crash
due to the invalid binding table index result of the array index
calculation. Other generations seem to behave sensibly when an
invalid surface is provided so it doesn't look like we need to care.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
v2: Drop VEC4 suport.
v3: Rebase.
v4: Move array coordinate workaround into the surface builder.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Define bitfield packing, unpacking and type conversion operations in
terms of which the image format conversion code will be implemented.
These don't directly know about image formats: The packing and
unpacking functions take a 4-tuple of bit shifts and a 4-tuple of bit
widths as arguments, determining the bitfield position of each
component. Most of the remaining functions perform integer, fixed
point normalized, and floating point type conversions, mapping between
a target type with per-component bit widths given by a parameter and a
matching native representation of the same type.
v2: Drop VEC4 suport.
v3: Rebase.
v4: Fix clamping of negative floats in the unsigned case of
emit_convert_to_scaled().
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Define some utility functions to query the bitfield layout of a given
image format and whether it satisfies a number of more or less
hardware-specific properties.
v2: Drop VEC4 suport.
v3: Add SKL support.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Define a function to calculate the memory address of the image
location given by a vector of coordinates. This is required in cases
where we need to fall back to untyped surface access, which take a raw
memory offset and know nothing about surface coordinates, type
conversion or memory tiling and swizzling. They are still useful
because typed surface reads don't support any 64 or 128-bit formats on
IVB, and they don't support any 128-bit formats on HSW and BDW.
The tiling algorithm is implemented based on a number of parameters
which are passed in as uniforms and determine whether the surface
layout is X-tiled, Y-tiled or untiled. This allows binding surfaces
of different tiling layouts to the pipeline without recompiling the
program.
v2: Drop VEC4 suport.
v3: Rebase.
v4: Add plenty of comments (Jason).
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
These utility functions check whether an image access is valid.
According to the spec an invalid image access should have no effect on
the image and yield well-defined results. Typically the hardware
implements correct bounds and surface checking by itself, but in some
cases (typed atomics on IVB and untyped messages elsewhere) we need to
implement it in software to work around lacking hardware support.
v2: Drop VEC4 suport.
v3: Rebase.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
v2: Store early fragment test mode in brw_wm_prog_data instead of
getting it from core mesa data structures (Ken).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Shaders with image uniforms may have side effects. Make sure that
fragment shader threads are dispatched if the shader has any image
uniforms.
v2: Use brw_stage_prog_data::nr_image_params to find out if the shader
has image uniforms instead of checking core mesa data structures
(Ken).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This will be used to pass image meta-data to the shader when we cannot
use typed surface reads and writes. All entries except surface_idx
and size are otherwise unused and will get eliminated by the uniform
packing pass. size will be used for bounds checking with some image
formats and will be useful for ARB_shader_image_size too. surface_idx
is always used.
v2: Add CS support. Move the image_params array back to
brw_stage_prog_data.
v3: Improve documentation.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
On BDW+, the negation source modifier on NOT, AND, OR, and XOR, is actually
a boolean negate and not an integer negate. However, NIR's soruce
modifiers are the integer version. We have to resolve it with a MOV prior
to emitting the actual instruction. This is basically the same thing we do
in the FS backend.
Reviewed-by: Matt Turner <mattst88@gmail.com>
The analysis code was already there and running, we just weren't doing
anything with the result of it yet.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Previously, we were explicitly listing every instruction that needs a
resolve. However, those instructions were precicely the ones that returned
booleans so there's no reason why we shouldn't just have that check. Also,
all of the reduction opcodes such as bany and ball were missing so it
didn't properly flag stuff on vec4. If an opcode gets added in the future
that returns a bool but doesn't need a resolve, we can special-case that.
Reviewed-by: Matt Turner <mattst88@gmail.com>
This better ensures that the src_bits == dst_bits case gets optimized away.
Reviewed-by: Neil Roberts <neil@linux.intel.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
For gmem restore (mem2gmem), we swap blit programs, in order to have a
different frag shader for depth vs color restore. But we weren't
actually clearing the cached fp, so it would not actually change the
frag shader as expected.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
For gmem restore (mem2gmem), we swap blit programs, in order to have a
different frag shader for depth vs color restore. But we weren't
actually clearing the cached fp, so it would not actually change the
frag shader as expected.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
This patch fixes a bug in big-endian treatment, where the previous
swizzle info wasn't cleared before a new swizzle info was inserted into
the format field using a bitwise-OR operation.
v2: use MESA_ARRAY_FORMAT_SWIZZLE_*_MASK instead of numeric constants
v3: align according to coding style
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
CC: "10.5 10.6" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Neither MSVC nor MinGW defines LONG_BIT. For MSVC this was not a problem as
it doesn't define __x86_64__ macro (it's GCC specific.)
However on Windows long type is guaranteed to be 32bits.
Also add an #error, as GCC will just warn, not throw any error, when no
value is returned.
Trivial.
To avoid collission with windows.h's PURE macro.
We could consider eventually renaming to __pure, but that would require
further care, so it's left to the future.
Reviewed-by: Brian Paul <brianp@vmware.com>
We don't need to free driverName string from dri2 reply, on the other
hand, the driver name acquired from loader doesn't need duplication.
Fixes: 45e110bad9 (egl/x11: trust our loader over the xserver for the
drivername)
Reported-by: Timothy Arceri <t_arceri@yahoo.com.au>
Signed-off-by: Boyan Ding <boyan.j.ding@gmail.com>
[Emil Velikov: use brackets for both branches of conditional]
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
This should have been a part of:
commit 7eaacc1678
Author: Ben Widawsky <benjamin.widawsky@intel.com>
Date: Wed Jul 29 12:35:24 2015 -0700
i965/skl: Add production thread counts and URB size
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Since the introduction of new gl_shader_stages in
commit a2af956963
Author: Fabian Bieler <fabianbieler@fastmail.fm>
Date: Fri Mar 7 10:19:09 2014 +0100
mesa: add tessellation shader enums
the translation table for the stage into the HW binding table edit
command was broken, and so we used illegal commands. Fix the array
initialisation to be impervious to changes in the gl_shader_stages enum
and add the asserts that would have caught the issue earlier.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This was causing a failure to build on SCons due to a missing
-Isrc/egl. Instead of adding in that path, lets just -Isrc/
and include "utils/u_atomic.h".
Reviewed-by: Matt Turner <mattst88@gmail.com>
Identical to commit 60e9c35b3a0(egl/x11: bail out if we cannot fetch
the xcb connection) but for the swrast codepath.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
No real change, apart from keeping the calls to the underlying winsys
(x11) next to each other. Just like platform_wayland.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
With the follow up commits we're about to further reshuffle things. Thus
we'll honour our our driver_name lookup (src/loader), and use the one
provided by xserver as a fall-back.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
The documentation of xcb_connection_has_error() does not mention
what will happen, if NULL is fed to the function.
Upon closer look (props to Matt), it seems that we'll crash as the
implementation dereferences conn.
This will also allow us to remove the dri2_dpy->conn checking with the
next commit.
v2: Reword commit message as per Matt's findings.
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
As sugested by Tom a long time ago
and in order to be able to create Piglit tests
v2:
replace NOT_SUPPORTED_BY_CL_1_1 macro with an inline function
remove extra space in clLinkProgram arg
v3:
use __func__
v4:
back to a macro, it make more sense to use it with __func__
[ Francisco Jerez: Rename to CLOVER_NOT_SUPPORTED_UNTIL and pass the
minimum API version required by the entry point so the error
messages don't become stale when support for additional CL versions
is introduced. ]
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
When a program is compiled, but linking failed the sh->InfoLog
could be NULL. This is expoloited by OpenGL ES 3.1 conformance tests.
Signed-off-by: Marta Lofstedt <marta.lofstedt@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
The dst is always written, in this case the predicate is only used to select
the value to write, so if we are spilling the dst we always want to write
whatever value we selected to scratch.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Stage ref cannot be queried for transform feedback.
Also simplify the build_stageref function by passing the
correct mode for uniforms.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Same idea as in libdrm_amdgpu.
A command stream can only be created for a specific context and it's always
submitted to that context.
This will mainly be used by amdgpu and it's required by the GPU reset status
query too.
(radeon only has a basic version of the query and thus doesn't need this)
Reviewed-by: Christian König <christian.koenig@amd.com>
Ben suggested that I rename MIPTREE_LAYOUT_ALLOC_ANY_TILED since it
needed to include no tiling at all, but the name
MIPTREE_LAYOUT_ALLOC_ANY is pretty nondescriptive. We can avoid
confusion by replacing "ALLOC" with "TILING" in the identifiers.
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Regression since commit 3a31876600, when tiling modes were moved into
layout_flags.
The relevant enum values are
MIPTREE_LAYOUT_ALLOC_YTILED = 1 << 5
MIPTREE_LAYOUT_ALLOC_XTILED = 1 << 6
MIPTREE_LAYOUT_ALLOC_ANY_TILED = MIPTREE_LAYOUT_ALLOC_YTILED |
MIPTREE_LAYOUT_ALLOC_XTILED
MIPTREE_LAYOUT_ALLOC_LINEAR = 1 << 7
so the expression (layout_flags & MIPTREE_LAYOUT_ALLOC_ANY_TILED) can
never produce a value of MIPTREE_LAYOUT_ALLOC_LINEAR.
The enum this replaced was
enum intel_miptree_tiling_mode {
INTEL_MIPTREE_TILING_ANY,
INTEL_MIPTREE_TILING_Y,
INTEL_MIPTREE_TILING_NONE,
};
where "ANY" means "Y" or "NONE" (i.e., linear). As such, remove the
unused (and worse, unhandled) MIPTREE_LAYOUT_ALLOC_XTILED and redefine
MIPTREE_LAYOUT_ALLOC_ANY_TILED to mean what it did before.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91513
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Luckily, there is a kernel query, so use the size from that.
It currently returns 256KB. It can be increased in the kernel.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
This extends the SIMD lowering pass to enforce the hardware limitation
that no directly-addressed source may read more than 2 physical GRFs.
One can easily go over this limit when doing 64-bit arithmetic
(e.g. FP64 or extended-precision integer MULs) or SIMD32, so it's nice
to be able to just emit an instruction of the intended execution size
from the visitor and let the lowering pass deal with this restriction
transparently.
Some hardware arithmetic instructions are not handled here, including
all instructions that use the accumulator implicitly (which the SIMD
lowering pass deliberately doesn't handle), instructions with
non-per-channel sources (e.g. LINE or PLANE) and SEND-like
instructions, which need special handling most likely as virtual
opcodes.
Reviewed-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Otherwise it would crash on Gen8 with scalar VS. The issue can easily
be reproduced with the following patch, but I don't see any reason why
it wouldn't be possible to end up with an ATTR argument here even
without it.
CC: mesa-stable@lists.freedesktop.org
Reviewed-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Translate MULH into the MUL/MACH sequence. This does roughly the same
thing that nir_emit_alu() used to do but we can now handle 16-wide by
taking advantage of the SIMD lowering pass. The force_sechalf
workaround near the bottom is required because the SIMD lowering pass
will emit instructions with non-zero quarter control and we need to
make sure we avoid that on integer arithmetic instructions with
implicit accumulator access due to a known hardware bug on IVB.
Reviewed-by: Matt Turner <mattst88@gmail.com>
In order to make room for the code that will lower the MULH virtual
instruction. Also move the hardware generation and execution type
checks into the same branch, they are going to have to be different
for MULH.
Reviewed-by: Matt Turner <mattst88@gmail.com>
AFAIK BXT has the same annoying alignment limitation as CHV on the
source register regions of 32x32 bit MULs, give it the same treatment.
Reviewed-by: Matt Turner <mattst88@gmail.com>
This instruction will translate to the MUL/MACH sequence that computes
the high 32-bits of the result of a 64-bit multiply. Before Gen8
integer operations that used the accumulator were limited to 8-wide,
but the SIMD lowering pass can easily be hooked up to sidestep this
limitation, we just need a virtual opcode to represent the MUL/MACH
sequence in the IR.
Reviewed-by: Matt Turner <mattst88@gmail.com>
There is apparently a subtle difference in C++ between
F f;
and
F f();
The former will use the default constructor. If there is no default
constructor specified, the compiler provides one that simply invokes the
default constructor for each field. For built-in basic types, the
default constructor does nothing. The later will, according to
http://stackoverflow.com/questions/2417065/does-the-default-constructor-initialize-built-in-types)
perform value-initialization of the type. For built-in types this means
initializing to zero.
The per_vertex_accumulator constructor is:
per_vertex_accumulator::per_vertex_accumulator()
: fields(),
num_fields(0)
{
}
This is the second form of constructor, so the glsl_struct_field
objects were previously zero initialized. With the addition of an empty
default constructor in commit 7ac946e5, per_vertex_accumulator::fields
receive no initialization.
Fixes a bunch of random (mostly tessellation related) piglit failures
since commit 7ac946e5 ("glsl: Add constuctors for the common cases of
glsl_struct_field").
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91544
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Argument validation for glTexSubImageXD is missing a check of format and type
against texture object's internal format when profile is OpenGL-ES 3.0+.
This patch also groups together all format and type checks on GLES into a
new function texture_format_error_check_gles(), to factorize similar
code in texture_format_error_check().
Fixes 2 dEQP tests:
* dEQP-GLES3.functional.negative_api.texture.texsubimage2d
* dEQP-GLES3.functional.negative_api.texture.texsubimage3d
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Page 161 of the OpenGL-ES 3.1 (PDF) spec, and page 207 of the OpenGL 4.5 (PDF),
both on section '8.6. ALTERNATE TEXTURE IMAGE SPECIFICATION COMMANDS', states:
"An INVALID_ENUM error is generated if an invalid value is specified for
internalformat".
It is currently returning INVALID_OPERATION error because
_mesa_get_read_renderbuffer_for_format() is called before the internalformat
argument has been validated. To fix this, we move this call down the validation
process, after _mesa_base_tex_format() has been called. _mesa_base_tex_format()
effectively serves as a validator for the internal format.
Fixes 1 dEQP test:
* dEQP-GLES3.functional.negative_api.texture.copyteximage2d_invalid_format
Fixes 1 piglit test:
* spec@oes_compressed_etc1_rgb8_texture@basic
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: 10.6 <mesa-stable@lists.freedesktop.org>
Currently, glTexSubImageXD attempt to resolve the texture object
(by calling _mesa_get_current_tex_object()) before validating the given
target. However, that method explicitly states that target must have been
validated before calling it, so it never returns a user error.
The target validation occurs later when texsubimage_error_check() is called.
This patch reorganizes target validation, taking it out from the error check
function and into a point before the texture object is resolved.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: 10.6 <mesa-stable@lists.freedesktop.org>
Page 68, section 7.2 'Shader Binaries" of the of the OpenGL ES 3.1,
and page 88 of the OpenGL 4.5 specs state:
"An INVALID_VALUE error is generated if count or length is negative.
An INVALID_ENUM error is generated if binaryformat is not a supported
format returned in SHADER_BINARY_FORMATS."
Currently, an INVALID_OPERATION error is returned for all cases.
Fixes 1 dEQP test:
* dEQP-GLES3.functional.negative_api.shader.shader_binary
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: 10.6 <mesa-stable@lists.freedesktop.org>
Original purpose of these lines was to be more friendly against
GUI tools using the extension. However conformance suite explicitly
checks that buffers are not modified in error conditions.
Fixes:
ES31-CTS.program_interface_query.buff-length
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
Currently stage reference mask is built using the variable name
only. However it can happen that input of one stage has same name
as output from another stage. Adding check of variable mode makes
sure we do not pick wrong variable.
Fixes some subcases from
ES31-CTS.program_interface_query.no-locations
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
Calling eglQuerySurface on a window or pixmap with the EGL_LARGEST_PBUFFER
attribute resulted in the contents of the 'value' parameter being modified.
This is the wrong behaviour according to the EGL spec, which states:
"Querying EGL_LARGEST_PBUFFER for a pbuffer surface returns the
same attribute value specified when the surface was created with
eglCreatePbufferSurface. For a window or pixmap surface, the
contents of value are not modified."
Avoid this from happening by checking that the surface type is EGL_PBUFFER_BIT
before modifying the contents of the parameter.
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Frank Binns <frank.binns@imgtec.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Update the DRI image interface error codes to reflect the needs of the
EGL_EXT_image_dma_buf_import extension. This means updating the existing error
code documentation and adding a new __DRI_IMAGE_ERROR_BAD_ACCESS error code
so that drivers can correctly reject unsupported pitches and offsets. Hook
the new error code up in EGL to return EGL_BAD_ACCESS.
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Frank Binns <frank.binns@imgtec.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
This is useful to increase the CSE opportunities for a scalar backend. It
avoids regressions when dropping vc4's custom CSE implementation.
v2: Cleanups by Matt (decl in the for loop, and unreachable()).
Reviewed-by: Matt Turner <mattst88@gmail.com>
Since we just pulled it out of the destination as 8-bit unorm, we know
it's in [0, 1] already.
shader-db:
total instructions in shared programs: 100040 -> 98208 (-1.83%)
instructions in affected programs: 14084 -> 12252 (-13.01%)
Previously, SFU values always moved to a temporary, and TLB color reads
and texture reads always lived in r4. Instead, we can have these results
just be normal temporaries, and the register allocator can leave the
values in r4 when they don't interfere with anything else using r4.
shader-db results:
total instructions in shared programs: 100809 -> 100040 (-0.76%)
instructions in affected programs: 42383 -> 41614 (-1.81%)
Both a3xx and a4xx need the same logic to decide if half-precision can
be used for blit shaders. So move it to core and simplify things a bit
with a helper that considers all render targets.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Collapse dirty/reading bools into status bitmask (and drop writing which
should really be the same as dirty). And use 'used_resources' list for
all tracking, including zsbuf/cbufs, rather than special casing the
color and depth/stencil buffers.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
We hard-coded 4 or 8 as the max in various places. Switch it all to a
define since the limit will go up with a4xx (and maybe even again in the
future?)
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Functional change in which way half-way cases are rounded from towards
positive-infinity to even. The spec says "the passed value is rounded to
the nearest integer". Removes another case of bad half-up rounding.
gcc actually generates this for us now that we use -fno-math-errno
(which is weird, since lrintf()/lrint() don't set errno) but clang still
does not. Presumably helps MSVC as well.
Reduced .text size by 8.5k with gcc before -fno-math-errno.
text data bss dec hex filename
4935850 195136 26192 5157178 4eb13a i965_dri.so before
4927225 195128 26192 5148545 4e8f81 i965_dri.so after
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
vl/vl_mpeg12_bitstream.c: In function 'decode_slice':
vl/vl_mpeg12_bitstream.c:928:19: warning: unused variable 'extra' [-Wunused-variable]
unsigned extra = vl_vlc_get_uimsbf(&bs->vlc, 1);
^
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Christian König <christian.koenig@amd.com>
The util/hash_table was intended to be a fast hash table
replacement for the program/hash_table see 35fd61bd99 and
72e55bb688.
Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
Fixes a giant pile of GCC warnings:
builtin_types.cpp:60:1: warning: missing initializer for member 'glsl_struct_field::stream' [-Wmissing-field-initializers]
I had to add a default constructor because a non-default constructor
was added. Otherwise the only constructor would be the one with
parameters, and all the plases like
glsl_struct_field foo;
would fail to compile.
I wanted to do this in two patches. All of the initializers of
glsl_struct_field structures had to be converted to use the
constructor because C++ apparently forces you to do one or the other:
builtin_types.cpp:61:1: error: could not convert '{glsl_type::float_type, "near", -1, 0, 0, 0, GLSL_MATRIX_LAYOUT_INHERITED, 0, -1}' from '<brace-enclosed initializer list>' to 'glsl_struct_field'
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
When the NIR-vec4 pass is enabled, handles uniform and GRF array access
on ARB_vertex_program like it is done on vertex shaders.
When the old IR-vec4 pass is used, emit_program_code() emits pull constant
loads directly instead of using relative addressing, hence to call to
move_uniform_array_access_to_pull_constants() is not needed and it is enough
to call to split_uniform_registers().
The patch also calls to move_grf_array_access_to_scratch() like it is
done for shaders, however I suspect this is a no-op for vertex programs and
we could remove it.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
The implementation takes into account that on ARB_vertex_program
only a single nir variable is generated to support all the uniform data.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
So the implementation is independent of GLSL IR and the visit methods of the
gen6 GS visitor. This way we will be able to reuse that implementation directly
from the NIR vec4 backend.
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Outputs from the vertex shader become array inputs in the geomtry shader,
but the arrays are interleaved, so we need to map our inputs accordingly.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
So the implementation is independent of GLSL IR and the visit methods of the
vec4 visitor. This way we will be able to reuse that implementation directly
from the NIR vec4 backend.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Avoid copying an overwritten swizzle, use the original values.
Example:
Former swizzle[] = xyzw
src->swizzle[] = zyxx
The expected output swizzle = zyxx but if we reuse swizzle in the loop,
then output swizzle would be zyzz.
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Uses the nir structure to get all the info needed (sources,
dest reg, etc), and then it uses the common
vec4_visitor::emit_texture to emit the final code.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Splitted in two. The emission is moved to a new vec4_visitor
method, vec4_visitor::emit_texture, ir order to be reused
on the nir path.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
This is useful for the upcoming texture support in NIR->vec4 pass,
as we found several cases where the brw_type is available, but not
the glsl_type.
Without this new constructor, the alternative would be:
dst_reg reg(MRF, <reg>)
reg.type = <brw_type>
reg.writemask = <mask>
Adding a new constructor makes code easier to read.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
This patch changes the signature of swizzle_result() to accept lower
level arguments. The purpose is to reuse it in the upcoming NIR->vec4
pass.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
This patch changes the signature of gather_channel() to accept the gather
component directly instead of fetching it internally from ir_texture.
This will allow reuse in the upcoming NIR->vec4 pass.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
This patch changes the signature of emit_mcs_fetch() to accept lower level
arguments. The purpose is to reuse it in the upcoming NIR->vec4 pass.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
The is_high_sample() method is currently accessible only in the implementation of
vec4_visitor. Since we need to reuse it in the upcoming NIR->vec4 pass, lets make
it a method of the class instead.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
This method returns the glsl_base_type corresponding to a nir_alu_type.
It will factorize code currently present in fs_nir, that can be reused
in vec4_nir on its upcoming emit_texture support.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
NIR ALU operations:
* nir_op_fabs
* nir_op_iabs
* nir_op_fneg
* nir_op_ineg
* nir_op_fsat
should be lowered by lower_source mods
* nir_op_fdiv
should be lowered in the compiler by DIV_TO_MUL_RCP.
* nir_op_fmod
should be lowered in the compiler by MOD_TO_FLOOR.
* nir_op_fsub
* nir_op_isub
should be handled by ir_sub_to_add_neg.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Follows the vec4_visitor IR implementation but
sets the saturate value in addition.
Adds NIR ALU operations:
* nir_op_fsign
* nir_op_isign
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
* Lowered floating-point pack and unpack operations are not valid in VS.
* Pack and unpack 2x16 operations should be handled by lower_packing_builtins.
* Adds NIR ALU operations:
* nir_op_pack_half_2x16
* nir_op_unpack_half_2x16
* nir_op_unpack_unorm_4x8
* nir_op_unpack_snorm_4x8
* nir_op_pack_unorm_4x8
* nir_op_pack_snorm_4x8
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Used the same implementation than the vec4_visitor NIR.
Adds NIR ALU operations:
* nir_op_b2i
* nir_op_b2f
* nir_op_f2b
* nir_op_i2b
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
This method returns the brw_conditional_mod value used when emitting
comparative ALU operations.
It could be moved to brw_nir in the future to reuse it in fs_nir backend.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Implementation based on the vec4_visitor IR implementation
for the operations ir_binop_mul and ir_binop_imul_high.
Adds NIR ALU operations:
* nir_op_fmul
* nir_op_imul
* nir_op_imul_high
* nir_op_umul_high
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Disables nir_lower_alu_to_scalar when the shader stage being processed work
on vec4 vectors, like the upcoming NIR->vec4 backend.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
This patch resolves and initializes the destination and the source
registers that are common to most ALU operations.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Based on the vec4_visitor IR implementation for the ir_binop_load_ubo
operation. Notice that unlike the vec4_visitor IR, adding the !=0
comparison for UBO bools is not needed here because that comparison is
already added by the nir_visitor when processing the ir_binop_load_ubo
(in UBOs "true" is any value different from zero, but for us is ~0).
Adds NIR instrinsics:
* nir_intrinsic_load_ubo_indirect
* nir_intrinsic_load_ubo
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
For the indirect case we need to take the index delivered by
NIR and compute the parent uniform that we are accessing (the one
that we uploaded to a surface) and the constant offset into that
surface.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
These include:
nir_intrinsic_load_vertex_id_zero_base
nir_intrinsic_load_base_vertex
nir_intrinsic_load_instance_id
The source register is fetched from the nir_system_values map initialized
during nir_setup_system_values stage.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
This implementation is based on the current URB setup in vec4_visitor, which
requires the output register to be stored in the output_reg array at variable's
original shader location index. But since nir_lower_io() pass uses the value
in var->data.driver_location, we need to put there var->data.location instead,
prior to calling nir_lower_io(), so that we end up with the correct index
in const_index[0].
The driver_location is not used at all, so this patch also disables the
nir_assign_var_locations pass on non-scalar shaders.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Instead of relying on backends (currently vec4_visitor and soon NIR-vec4) to
store registers in output_reg with the correct type, this patch makes sure
that the common code in emit_urb_slot() always emit MOVs from output registers
using the same type on source and destination.
Since the actual type is not important, only that they match, we default to
float.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
The same we do in the FS NIR backend, only that here we need to consider
the number of components in the condition and adjust the swizzle
accordingly.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
These methods are essential for the implementation of the NIR->vec4 pass. They
work similar to their fs_nir counter-parts.
When processing instructions, these methods are invoked to resolve the
brw registers (source or destination) corresponding to the NIR sources
or destination. It uses the map of NIR register index to brw register for
all registers locally allocated in a block.
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Similar to fs_nir backend, a nir_local_values map will be filled with
newly allocated registers as the load_const instrinsic instructions are
processed. Later, get_nir_src() will fetch the registers from this map
for sources that are ssa.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
New method brw_writemask_for_size() will return a writemask with the first
'size' components activated.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
In the vec4 backend we want uniform locations to be assigned consecutively
since that way the offsets produced by nir_lower_io are exactly what we
need to implement nir_intrinsic_load_uniform. Otherwise we would need a
mapping to match the output of nir_lower_io to the actual uniform registers
we need to use.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
The current implementation operates in scalar mode only, so add a vec4
mode where types are padded to vec4 sizes.
This will be useful in the i965 driver for its vec4 nir backend
(and possbly other drivers that have vec4-based shaders).
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
The upcoming introduction of NIR->vec4 pass will require that some NIR
lowering passes are enabled/disabled depending on the type of shader
(scalar vs. vector).
With this patch we pass a 'is_scalar' variable to the process of
constructing the NIR, to let an external context decide how the shader
should be handled.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
It basically allocates registers local to a function in a nir_locals map,
then emits all its control-flow blocks.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Similar to other variable setups, system values will initialize the
corresponding register inside a 'nir_system_values' map, which will then
be queried later when processing the different system value intrinsics
for the appropriate register.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
The new virtual method is more flexible, it has a signature:
dst_reg *make_reg_for_system_value(int location, const glsl_type *type);
v2 (Jason Ekstrand):
Use the new version in unit tests so make check passes again
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
This implementation sets up a map of input variable offsets to source registers
that are already initialized with the corresponding register offset.
This map will then be queried when processing load_input intrinsic operations,
to obtain the correct register source from which the input data will be loaded.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
The type_size() method is currently accessible only in the implementation
of vec4_visitor. Since we need to reuse it in the upcoming NIR->vec4 pass,
lets make it a method of the class instead.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
The NIR->vec4 pass will be activated if both the following conditions are met:
* INTEL_USE_NIR environment variable is defined and is positive (1 or true)
* The stage is vertex shader (support for geometry shaders and
ARB_vertex_program will be added later).
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
This patch will add a brw_vec4_nir.cpp file filled with entry point methods to
the main functionality, following a structure similar to brw_fs_nir.cpp.
Subsequent patches in this series will be adding the implementations for these
methods, incrementally.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
I'm not sure what the true meaning of "The rounding mode may vary." is,
but it is the case that the IROUND() path rounds differently than the
other paths (and does it wrong, at that).
Like _mesa_roundeven{f,}(), just add an use _mesa_lroundeven{f,}() that
has known semantics.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Cuts about 1k of .text size.
text data bss dec hex filename
4983676 197808 26328 5207812 4f7704 i965_dri.so before
4982522 197800 26328 5206650 4f727a i965_dri.so after
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Cuts about 9k of .text size.
text data bss dec hex filename
4992804 197808 26328 5216940 4f9aac i965_dri.so before
4983676 197808 26328 5207812 4f7704 i965_dri.so after
Also, Darwin's libm does not ever set errno, so if we care about those
systems we shouldn't rely on errno anyway.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
To circumvent a problem occuring when LINEAR_ALIGNED array mode is
selected on a TEXTURE_2D RAT.
This configuration causes MEM_RAT STORE_TYPED to write to incorrect
locations.
The only values allowed are 0 and 1, and the value is checked before
assigning.
This is a copy of 8eeca7a56c that seems to have been made to the glsl
ir type after it was copied for use in nir but before nir landed.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Read-only and write-only image arguments are recognized and
distinguished.
Attributes of the image arguments are passed to the kernel as implicit
arguments.
parse_program_resource_name returns -1 when the index is invalid this needs to
be tested before assigning the value to the unsigned array_index.
In link_varyings.cpp (the other place parse_program_resource_name is used) after
the -1 check is done the value is just assigned to an unsigned variable so it
seems long is just used so we can return the -1 rather than actually expecting
index values to be ridiculously large.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
GLES 3.1 should be able to bind a texture with the target
GL_TEXTURE_2D_MULTISAMPLE.
Signed-off-by: Marta Lofstedt <marta.lofstedt@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
All three of GLX_NV_float_buffer, GLX_EXT_texture_from_pixmap and
GLX_MESA_query_renderer have been in glxext.h for a while now.
As such we can drop this workaround/hack from the header.
v2: Remove the comment about GLX_NV_float_buffer.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com> (v1)
Recently a few drivers have grown OpenGL 4+ support so we might as
well go all the way to... 11 ;-)
v2: Don't forget to update the version file (Ilia)
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Earlier commit added an extra dup(fd) to fix a ZaphodHeads issue.
Although it did not consider the (very unlikely) case where we might end
up with the valid fd == 0.
Fixes: 28dda47ae4d(winsys/radeon: Use dup fd as key in drm-winsys hash
table to fix ZaphodHeads.)
Cc: 10.6 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Mario Kleiner <mario.kleiner.de@gmail.com>
... and update the documentation to reflect reality.
null and gdi are gone, and surfaceless is a recent addition.
v2: s/platforms/platform/ (spotted by Thomas)
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
Without this this extension basically can't work in indirect contexts,
TexImage2D will compute the image size as 0 and we'll send no image data
to the server.
v2: Add EXT_texture_integer to the client extension list too (Ian)
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Adam Jackson <ajax@redhat.com>
Disabling the FP16 mode didn't help.
If needed, we can use this trick for blits too, but not for scaled blits.
+ 4 piglits
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
There are 2 reasons for this:
- LLVM optimization passes can work with floor
- there are patterns to select v_fract from floor anyway
There is no change in the generated code.
The patch has a better explanation. Just a summary here:
- The CPU always uploads a whole descriptor array to previously-unused memory.
- CP DMA isn't used.
- No caches need to be flushed.
- All descriptors are always up-to-date in memory even after a hang, because
CP DMA doesn't serve as a middle man to update them.
This should bring:
- better hang recovery (descriptors are always up-to-date)
- better GPU performance (no KCACHE and TC flushes)
- worse CPU performance for partial updates (only whole arrays are uploaded)
- less used IB space (no CP_DMA and WRITE_DATA packets)
- simpler code
- hopefully, some of the corruption issues with SI cards will go away.
If not, we'll know the issue is not here.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
With num_direct_uniforms == 0 there's no space allocated in the
param_size array for the one block of direct uniforms -- On the FS
stage this would be a harmless no-op because it would simply re-set
one of the param_size entries allocated for the sampler units to zero,
but on the VS stage it has been reported to cause memory corruption
followed by a crash -- Surprising how a full piglit run on Gen8 didn't
catch it.
Reported-and-reviewed-by: "Lofstedt, Marta" <marta.lofstedt@intel.com>
For SKL: These are the production values.
For BXT: These are low estimates to enable platforms.
This patch was originally part of
i965/skl: Add production thread counts and URB size
but was split out at Jordan's request (which I found to be reasonable).
Note on stable inclusion: 10.6 does not care about hs, and ds. It does care
about cs, but since Jordan was the one that asked me to extract it, I'll leave
it up to him to deal with a backport to stable is required.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Since we really do not know what may occur in the future, pick a more
conservative value for thread counts until we know better what values are
correct. As far as I can tell, the old values will work fine, but some of the
registers seem to indicate that going even lower is possible and the purpose of
having early support is to enable as many configurations that can possibly
exist (we can trim things down after platforms begin shipping later).
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
This patch adjusts the SKL values to the best known values we have.
v2: Remove HS/DS/CS fields. Adding this makes most sense to add to the
GEN9_FEATURES macro, however, doing that would require updating BXT values, and
Jordan requested I not do that. Conveniently, this request makes a lot of sense
wrt to stable backport as HS, and DS do not even exist there.
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
For now, this just splits up store_output intrinsics to be scalars, and
drops unused outputs in the coordinate shader. My goal is to be able to
drop a bunch of my VC4-specific optimization by letting NIR handle it.
if we get a request to take the count from feedback, but there
is no buffer to take it from, just draw as if we got 0 vertices
so nothing.
This fixes this assert killing the ogl conform, and a piglit
test I've sent.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This removes the need for multiple functions designed to validate an array
subscript and replaces them with a call to a single function.
The change also means that validation is now only done once and the index
is retrived at the same time, as a result the getUniformLocation code can
be simplified saving an extra hash table lookup (and yet another
validation call).
This chage also fixes some tests in:
ES31-CTS.program_interface_query.uniform
V3: rebase on subroutines, and move the resource index array == 0
check into _mesa_GetProgramResourceIndex() to simplify things further
V2: Fix bounds checks for program input/output, split unrelated comment fix
and _mesa_get_uniform_location() removal into their own patch.
Cc: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Previously it could end up using the “SKL early” device on BXT
depending on the revision number. This would probably break things
because for example has_llc would be wrong.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This fixes the remaining failing tests in:
ES31-CTS.program_interface_query.uniform-types
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
This enables GL4.1 for radeonsi, and updates the
docs in the correct places.
v2: enable only for llvm 3.7 which has fixes in place.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This is the final piece for ARB_gpu_shader5,
The code is based on the r600 code from Glenn Kennard,
and myself.
While developing this, I'm not 100% sure of all the calculations
made in the GS registers, this is why the max_stream is worked
out there and used to limit the changes in registers. Otherwise
my initial attempts either regressed GS texelFetch tests
or primitive-id-restart. The current code has no regressions
in piglit.
This commit doesn't enable ARB_gpu_shader5, since that just
bumps the glsl level to 4.00, so I'll just do a separate patch
for 4.10.
v1.1: fix bug introduced in rebase.
v2: Address Marek's review comments,
remove my llvm stream code for simpler C,
move gsvs_ring and gs_next_vertex to arrays.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Fixes following compiler warning:
brw_cs.cpp:386:27: warning: comparison between signed and unsigned
integer expressions [-Wsign-compare]
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This is easily accomplished by moving simd16 3src to GEN9_FEATURES.
v2: small cleanup to make it more similar to GEN8_FEATURES
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
There are a couple of unrelated changes in t_vb_lighttmp.h that I hope
you'll excuse -- there's a block of code that's duplicated modulo a few
trivial differences that I took the liberty of fixing.
Literals without an f/F suffix are of type double, and implicit
conversion rules specify that the float in (float op double) be
converted to a double before the operation is performed. I believe float
execution was intended (in nearly all cases) or is sufficient (in the
case of gen7_urb.c).
Removes a lot of float <-> double conversion instructions and replaces
many double instructions with float instructions which are cheaper.
text data bss dec hex filename
4928659 195160 26192 5150011 4e953b i965_dri.so before
4928315 195152 26192 5149659 4e93db i965_dri.so after
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
ARB_viewport_array specifies that DEPTH_RANGE consists of double-
precision parameters (corresponding commit d4dc35987), and a preparatory
commit (6340e609a) added _mesa_get_viewport_xform() which returned
double-precision scale[3] and translate[3] vectors, even though X, Y,
Width, and Height were still floats.
All users of _mesa_get_viewport_xform() immediately convert the double
scale and translation vectors into floats (which were floats originally,
but were converted to doubles in _mesa_get_viewport_xform(), sigh).
i965 at least cannot consume doubles (see SF_CLIP_VIEWPORT). If we want
to pass doubles to hardware, we should have a different function that
does that.
Acked-by: Mathias Froehlich <Mathias.Froehlich@web.de>
Not a typo. Replace the default builder with one of bogus width to
catch cases in which optimization passes assume that the default
dispatch width is good enough. The execution controls of instructions
emitted during optimization should in general match the original code
that is being manipulated. Many of the problems fixed in this series
were caught by the assertions introduced in this patch.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
This could have led to somewhat increased bandwidth usage for lowered
texturing instructions on Gen4 (which is the only case in which
lower_width may be greater than inst->exec_size). After the previous
patches the invariant mentioned in the comment should no longer be
assumed by any of the other optimization and lowering passes, so the
exec_all() call shouldn't be necessary anymore.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Instead of relying on the default one. This shouldn't lead to any
functional changes because DEP_RESOLVE_MOV overrides the execution
size of the instruction anyway and other execution controls are
irrelevant.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
The global CSE pass stinks and is unable to pull this out. Easy enough
to handle it here and avoid generating unnecessary special register
loads (which can allegedly be quite slow).
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
PFETCH retrieves the address for incoming vertices, not output vertices
in TCS. For output vertices, we must use the laneid as a base.
Fixes barrier piglit test, which was failing for entirely non-barrier
reasons, but rather that it was (a) trying to draw multiple patches and
(b) the incoming patch size was not the same as the outgoing patch size.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
This reverts commit a27ec5dc46. It
breaks the intended behaviour of pipe_loader_probe() with ndev==0 as
relied upon by clover to query the number of devices available to the
pipe loader in the system.
Acked-by: Emil Velikov <emil.l.velikov@gmail.com>
opt_sampler_eot() was relying on the default builder to have the same
width as the sampler and FB write opcodes it was eliminating, the
channel selects didn't matter because the builder was only being used
to allocate registers, no new instructions were being emitted with it.
A future commit will change the width of the default builder what will
break this assumption, so initialize it explicitly here.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
This wasn't taking into account the execution controls of the original
instruction, but it was most likely not a bug because control flow
instructions are typically full width.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Emit the SELs and MOVs with the same execution controls as the
original MOVs, and the CMP with the same execution controls as the IF.
Also explicitly check that the execution controls of any pair of MOVs
being folded into a SEL are compatible (which is almost always going
to be the case), since otherwise it would seem wrong to initialize the
builder object below from the then_mov instruction only.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
lower_integer_multiplication() was ignoring the execution controls of
the original MUL instruction. Fix it by using the new fs_builder
constructor.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
demote_pull_constants() was ignoring the execution size and channel
selects of the instruction that wanted the constant, which doesn't
matter for uniform pull constant loads because all channels get the
same scalar value, but it might for varying pull constant loads. Fix
it by using the new fs_builder() constructor that takes care of
setting execution controls compatible with the instruction passed as
argument.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
The execution size was being left equal to the default of 8/16, which
AFAICT would have overwritten components other than the one we wanted
to initialize and could potentially have corrupted other registers.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
We have a number of optimization passes that repeat the same pattern
before inserting new instructions into the program based on some
previous instruction: They point the default builder at the original
instruction, then call exec_all() and group() to select the same
execution controls the original instruction had, and then maybe call
annotate() to clone the debug annotation from the original
instruction.
In fact an optimization pass missing any of these steps is likely to
be broken if the intention was to emit new code based on a preexisting
instruction, so let's make it easy for passes to do the right thing by
having an fs_builder constructor that automates the task of setting up
a builder to emit a given instruction provided as argument.
The following patches fix all cases I've found in which we weren't
explicitly initializing the execution controls of the emitted
instructions, and clean-up optimization passes which were already
doing the right thing to use the new constructor.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Images take up zero uniform slots in the nir_shader::num_uniforms
calculation, but nir_setup_uniforms needs to be executed even if the
program has no non-image uniforms so the driver-specific image
parameters are uploaded. nir_setup_uniforms is a no-op if there are
really no uniforms, so checking the num_uniform count is useless in
any case.
The nir_setup_inputs and _outputs changes shouldn't lead to any
functional change, they are just meant to preserve the symmetry
between them and nir_setup_uniforms.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Image variables need to allocate additional uniform slots over
nir_shader::num_uniforms. nir_setup_uniforms() overwrites the values
imported from the SIMD8 visitor and then exits early before entering
the nir_shader::uniforms loop, so image uniforms are never re-created.
Instead leave the imported values alone, they *must* be the same for
the uniform layout of both runs to be compatible.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Rewrite the NIR atomic counter intrinsics translation code making use
of the recently introduced surface builder. This will allow the
removal of some of the functionality duplicated between the visitor
and surface builder.
v2: Drop VEC4 suport.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Implement helper functions that can be used to construct and send
untyped and typed surface read, write and atomic messages to the
shared dataport unit easily.
v2: Drop VEC4 suport.
v3: Reimplement in terms of logical send opcodes.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
This will be handy to avoid some ugly ternary operators in the next
patch, like:
fs_reg reg = (size == 0 ? null_reg_ud() : vgrf(..., size));
Because a zero-size register allocation is guaranteed not to ever be
read or written we can just return the null register. Another
possibility would be to actually allocate a zero-size VGRF what would
involve defining a zero-size register class in the register allocator
and a considerable amount of churn.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Each logical variant is largely equivalent to the original opcode but
instead of taking a single payload source it expects its arguments
separately as individual sources, like:
typed_surface_write_logical null, coordinates, source, surface,
num_coordinates, num_components
This patch defines the opcodes and usual instruction boilerplate,
including a placeholder lowering function provided mainly as
documentation for their source registers.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
This cleans up the VEC4 implementation of setup_uniform_values()
somewhat and will avoid duplication of the image uniform upload code
by having a common interface to upload a vector of uniforms on either
back-end.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This should match the set of cases in which we currently call fail()
or no16() from the emit_texture_*() methods and the ones in which
emit_texture_gen4() enables the SIMD16 workaround.
Hint for reviewers: It's not a big deal if I happen to have missed
some case here, it will just lead to an assertion failure down the
road which is easily fixable, however being stricter than necessary
won't cause any visible breakage, it would just decrease performance
silently due to the unnecessary message splitting, so feel free to
double-check that all cases listed here already cause a SIMD8/16
fall-back with the current texturing code -- You may want to skip over
the Gen5-6 cases though if you don't have pencil and paper at hand.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Unlike its Gen5 and Gen7 counterparts this patch isn't a plain
refactor of the previous Gen4 texturing code, it's more of a rewrite
largely based on emit_texture_gen4_simd16(). The reason is that on
the one hand the original emit_texture_gen4() code didn't seem easily
fixable to be SIMD width-invariant and had plenty of clutter to
support SIMD-width workarounds which are no longer required. On the
other hand emit_texture_gen4_simd16() was missing a number of
SIMD8-only opcodes. This should generalize both and roughly match
their current behaviour where there is overlap.
Incidentally this will fix the following piglits on Gen4:
arb_shader_texture_lod.execution.arb_shader_texture_lod-texgrad
arb_shader_texture_lod.execution.tex-miplevel-selection *gradarb 2d
arb_shader_texture_lod.execution.tex-miplevel-selection *gradarb 3d
arb_shader_texture_lod.execution.tex-miplevel-selection *projgradarb 2d
arb_shader_texture_lod.execution.tex-miplevel-selection *projgradarb 2d_projvec4
arb_shader_texture_lod.execution.tex-miplevel-selection *projgradarb 3d
Acked-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This should be largely equivalent to emit_texture_gen5() except for
slight codestyle changes and the use i965 opcodes instead of the
ir_texture_opcode enum, see "i965/fs: Implement lowering of logical
texturing opcodes on Gen7+." for the mapping between them.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
These weren't being handled by emit_texture_gen7() but we can easily
lower them here for consistency with other texturing opcodes.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This should be largely equivalent to emit_texture_gen7() except that
we now get i965 sampling opcodes directly rather than
ir_texture_opcode enum values. The mapping is as follows:
- ir_tex -> SHADER_OPCODE_TEX
- ir_txb -> FS_OPCODE_TXB
- ir_txl -> SHADER_OPCODE_TXL
- ir_txd -> SHADER_OPCODE_TXD
- ir_txf -> SHADER_OPCODE_TXF
- ir_txf_ms -> SHADER_OPCODE_TXF_CMS
- ir_txs -> SHADER_OPCODE_TXS
- ir_query_levels -> SHADER_OPCODE_TXS too, the visitor will make
sure that the provided lod value is zero in this
case.
- ir_lod -> SHADER_OPCODE_LOD
- ir_tg4 -> SHADER_OPCODE_TG4_OFFSET if the offset value is not
immediate, SHADER_OPCODE_TG4 otherwise.
Other than that there are only minor changes and style fixes like the
implementation now being factored out in static functions to improve
encapsulation.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This hasn't been overallocating space for the header for a long time.
It still leaves the header uninitialized though until the generator
fixes it.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
So that it's left uninitialized by LOAD_PAYLOAD, we only need to
reserve space for it in the message since it will be initialized
implicitly by the generator.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
dispatch_width is global for a single compilation and doesn't
necessarily match the desired execution width if we had to lower the
original full-width instruction due to hardware limitations. These
were all inside a Gen4-specific branch so this patch shouldn't have
any effect on more recent hardware.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Each logical variant is largely equivalent to the original opcode but
instead of taking a single payload source it expects the arguments
separately as individual sources, like:
tex_logical dst, coordinates, shadow_c, lod, lod2,
sample_index, mcs, sampler, offset,
num_coordinate_components, num_grad_components
This patch defines the opcodes and usual instruction boilerplate,
including a placeholder lowering function provided mostly as
documentation for their source registers.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The only non-trivial thing it still has to do is figure out where to
take the src/dst depth values from and predicate the instruction if
discard is in use. The manual SIMD unrolling logic in the dual-source
case goes away because this is now handled transparently by the SIMD
lowering pass.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
This does essentially the same thing as
fs_visitor::emit_single_fb_write(), with some slight differences:
- We don't have to worry about exec_size and use_2nd_half anymore,
16-wide sources have already been lowered to 8-wide thanks to the
previous commit and the manual argument unzipping is no longer
required.
- The src/dst_depth and sample_mask values are now explicit sources
of the instruction instead of being taken from the visitor state
directly. The same goes for the kill-pixel mask that will be
passed to the instruction explicitly as predicate.
- Everything is now done in static functions to improve
encapsulation.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
This shouldn't have any effect because we don't emit logical
framebuffer writes yet.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
There's no need to initialize the wrong half of oMask in the payload
when we're doing an 8-wide framebuffer write because it will be
ignored by the hardware anyway. By doing it this way we can let the
SIMD lowering pass split the sample_mask source as a regular
per-channel source, otherwise we would have to introduce some sort of
per-instruction source query or use fs_inst::header_size for the
lowering pass to be able to find out whether some source is
header-like, and leave the source untouched in that case.
As a bonus this achieves the same purpose as the previous code without
making use of the SET_OMASK pseudo-instruction, which will be removed
in a future commit.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Flatten the if ladder to match the way that the ordering of these
fields is specified in the hardware documentation a bit more closely.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
In cases where the color0 argument wasn't being provided,
emit_single_fb_writes() would take the alpha channel directly from the
visitor state instead of taking it from its arguments. This sort of
hack didn't fit nicely into the logical send-message approach because
all parameters of the instruction have to be visible to the SIMD
lowering pass for it to be able to split them into halves at all.
Fix it by using LOAD_PAYLOAD in fs_visitor::emit_fb_writes() to
provide an actual color0 vector with undefined contents except for the
alpha component to match the previous behavior when no color buffers
are enabled.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
It's surprising that we weren't checking for this already. A future
patch will cause code like the following to be emitted:
MOV(16) tmp<1>:uw, src
MOV(8) dst<1>:ud, tmp<8,8,1>:ud
The second MOV comes from the expansion of a LOAD_PAYLOAD header copy,
so I don't have control over its types. Copy propagation will happily
turn this into:
MOV(8) dst<1>:ud, src
Which has different semantics. Fix it by preventing propagation in
cases where a single channel of the instruction would span several
channels of the copy (this requirement could in fact be relaxed if the
copy is just a trivial memcpy, but this case is unusual enough that I
don't think it matters in practice).
I'm deliberately only checking if the type of the instruction is
larger than the original, because the converse case seems to be
handled correctly already in the code below.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We were previously guessing the half based on the EOT flag which seems
rather gross.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The logical variant is largely equivalent to the original opcode but
instead of taking a single payload source it expects its arguments
that make up the payload separately as individual sources, like:
fb_write_logical null, color0, color1, src0_alpha,
src_depth, dst_depth, sample_mask, num_components
This patch defines the opcode and usual instruction boilerplate,
including a placeholder lowering function provided mainly as
self-documentation.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This lowering pass implements an algorithm to expand SIMDN
instructions into a sequence of SIMDM instructions in cases where the
hardware doesn't support the original execution size natively for some
particular instruction. The most important use-cases are:
- Lowering send message instructions that don't support SIMD16
natively into SIMD8 (several texturing, framebuffer write and typed
surface operations).
- Lowering messages that don't support SIMD8 natively into SIMD16
(*cough*gen4*cough*).
- 64-bit precision operations (e.g. FP64 and 64-bit integer
multiplication).
- SIMD32.
The algorithm works by splitting the sources of the original
instruction into chunks of width appropriate for the lowered
instructions, and then interleaving the results component-wise into
the destination of the original instruction. The pass is controlled
by the get_lowered_simd_width() function that currently just returns
the original execution size making the whole pass a no-op for the
moment until some user is introduced.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
v2: Reverse order of the source transformations and split_inst emit
call to make the code a bit easier to understand.
Typically BAD_FILE sources are used to mark a source as not present
what implies that no registers are read. This will become much more
frequent with logical send opcodes which have a large number of
sources, many of them optionally used and marked as BAD_FILE when they
aren't applicable. It will prove to be useful to be able to rely on
the value of regs_read() regardless of whether a source is present or
not.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
And start using it in fs_builder::LOAD_PAYLOAD(). This will be used
to emit logical send message opcodes which have an unusually large
number of arguments.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
This pass will house ad-hoc lowering code for several send
message-like virtual opcodes that will represent their logically
independent arguments as separate instruction sources rather than as a
single payload blob. This pass will basically just take the separate
arguments that are supposed to be part of the payload and concatenate
them to construct a message in the form required by the hardware.
Virtual instructions in separate-source form will eventually allow
some simplification of the visitor code and make several
transformations easier like lowering SIMD16 instructions to SIMD8
algorithmically in cases where the hardware doesn't support the former
natively.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
This cleans up fs_inst::regs_read() slightly by disentangling the
calculation of "components" from the handling of message payload
arguments. This will also simplify the SIMD lowering and logical send
message lowering passes, because it will avoid expressions like
'regs_read * REG_SIZE / component_size' which are not only ugly, they
may be inaccurate because regs_read rounds up the result to the
closest register multiple so they could give incorrect results when
the component size is lower than one register (e.g. uniforms). This
didn't seem to be a problem right now because all such expressions
happen to be dealing with per-channel GRFs only currently, but that's
by no means obvious so better be safe than sorry.
v2: Split PIXEL_X/Y and LINTERP into separate case blocks.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
For some reason the loop that rewrites all occurrences of the
coalesced register was iterating over all possible offsets until it
would find one that compares equal to the offset of a source or
destination of any instruction in the program. Since the mapping
between old and new offsets is already available in the regs_to_offset
array and we know that the whole register has been coalesced we can
just look it up.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
The register coalesce pass wasn't rewriting the destination and
sources of instructions that accessed the second half of a coalesced
register previously copied with a 16-wide MOV instruction. E.g.:
| ADD (16) vgrf0:f, vgrf0:f, 1.0:f
| MOV (16) vgrf1:f, vgrf0:f
| MOV (8) vgrf2:f, vgrf0+1:f { sechalf }
would get incorrectly register-coalesced into:
| ADD (16) vgrf1:f, vgrf1:f, 1.0:f
| MOV (8) vgrf2:f, vgrf0+1:f { sechalf }
The reason is that the mov[i] pointer was being left equal to NULL for
every other register. The fact that we've made it to the rewrite loop
implies that the whole register will be coalesced, so it doesn't seem
right not to update something that uses it depending on whether mov[i]
is NULL or not. Fixes an amount of texturing and image_load_store
piglit tests on my SIMD-lowering branch.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
register_coalesce() was considering the exec_size of the MOV
instruction alone to decide whether the register at offset+1 of the
source VGRF was being copied to inst->dst.reg_offset+1 of the
destination VGRF, which is only a valid assumption if the move has a
32-bit execution type. Use regs_read() instead to find out the number
of registers copied by the instruction.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
This adds to the common radeon streamout code, support
for multiple streams.
It updates radeonsi/r600 to set the enabled mask up.
v2: update for changes in previous patch.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This will be used here later.
v2: update atom sizes
add check for old vs new enabled mask
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Page 497 of the PDF, section '17.4.3.1 Clearing Individual Buffers' of the
OpenGL 4.5 spec states:
"An INVALID_ENUM error is generated by ClearBufferiv and
ClearNamedFramebufferiv if buffer is not COLOR or STENCIL."
Fixes 1 dEQP test:
* dEQP-GLES3.functional.negative_api.buffer.clear_bufferiv
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
A simple shader such as
vec4 color;
color.xy.x = 1.0;
would cause ir_assignment::set_lhs() to generate bogus IR:
(swiz xy (swiz x (constant float (1.0))))
We were setting the number of components of each new RHS swizzle based
on the highest channel used in the LHS swizzle. So, .xy.y would
generate (swiz xy (swiz xx ...)), while .xy.x would break.
Our existing Piglit test happened to use .xzy.z, which worked, since
'z' is the third component, resulting in an xxx swizzle.
This patch sets the number of swizzle components based on the size of
the LHS swizzle's inner value, so we always have the correct number
at each step.
Fixes new Piglit tests glsl-vs-swizzle-swizzle-lhs-[23].
Fixes ir_validate assertions in in Metro 2033 Redux.
v2: Move num_components updating completely out of update_rhs_swizzle
(suggested by Timothy Arceri). Simplify.
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
Same check is made for glBindFragDataLocationIndexed but it was missing
when using layout qualifiers.
Fixes following Piglit test:
arb_blend_func_extended-output-location
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
Change function to get all gl_constants for inspection, this is used
by follow-up patch.
v2: rebase, update function documentation
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
It's a bunch of work for us to emit it (and its uniforms), more work for
the kernel to validate it, and additional work for the CLE to read
it. Improves es2gears framerate by about 50%.
Signed-off-by: Eric Anholt <eric@anholt.net>
Since the conversion to keeping validated shaders around for the BO's
lifetime, we haven't been checking that rendering doesn't happen to
shaders. Make vc4_use_bo check that always, and just don't use it for the
VC4_MODE_SHADER case (so now modes are unused)
We don't want anything to appear after we've kicked off the render (and
thus job flush), since that might then get written out to the tile
allocation state.
Signed-off-by: Eric Anholt <eric@anholt.net>
Glamor asks GBM for the handle of the BO, then flinks it itself. We
were marking the bo non-private in the flink and dmabuf (DRI3) paths,
but not the GEM handle path. As a result, non-pageflipping DRI2
swapbuffers (EGL apps, in particular) were never updating the texture.
The meta CopyImageSubData path uses BlitFramebuffers to do the actual copy.
The only thing that can affect BlitFramebuffers other than the currently
bound framebuffers is the scissor so we need to save that off and reset it.
If we don't do this, applications that use a scissor together with
CopyImageSubData will get accidentally scissored copies.
Tested-by: Markus Wick <markus at selfnet.de>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
This adds support for queries against the non-0 vertex streams.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
It only checks fragment textures and ignores other shaders, which makes it
incomplete, and textures are already finalized in update_single_texture.
There are no piglit regressions.
Reviewed-by: Brian Paul <brianp@vmware.com>
Fixes piglit:
spec@glsl-1.30@execution@fs-texture-sampler2dshadow-10
spec@glsl-1.30@execution@fs-texture-sampler2dshadow-11
v2: use st_shader_stage_to_ptarget
Reviewed-by: Brian Paul <brianp@vmware.com>
By using 'Tobias Klausmann' piglit test-suite patch. We obtain
a full 12/12 passes using this patch. By 'faking' to claim
support for this extension we obtain 7 fails and 5 passes.
Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Tested-by: Furkan Alaca <falaca@gmail.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
This is part of ARB_gpu_shader5, and this passes
all the piglit tests currently available.
v2: use macros from the fine derivs commit.
add comments.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
EGL_EXT_image_dma_buf_import now supports those formats.
Tests:
- Tested by Piglit ext_image_dma_buf_import-transcode-nv12-as-r8-gr88.
- Tested by Peter in Kodi/XBMC to obtain 60fps NV12 transcode at 4K.
Tested-by: Peter Frühberger <peter.fruehberger@gmail.com>
Signed-off-by: Chad Versace <chad.versace@intel.com>
The Kodi/XBMC developers want to transcode NV12 to RGB with OpenGL shaders,
importing the two source planes through EGL_EXT_image_dma_buf_import. That
requires importing the Y plane as an R8 EGLImage and the UV plane as either an
RG88 or GR88 EGLImage.
This patch teaches the driver-independent part of EGL about the new
formats. Real driver support is left for follow-up patches.
The new formats landed in airlied's kernel branch 'drm-next' on July 24.
Tested-by: Peter Frühberger <peter.fruehberger@gmail.com>
Signed-off-by: Chad Versace <chad.versace@intel.com>
It seems like they're never necessary, and actively cause harm. This
fixes some of the barrier-related piglits.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Gallium exposes it unconditionally, so do our best to support it. It
fails on the negative index cases, but those seem unlikely to be used in
the wild.
Previously we had a fixed array to track kills, since they don't
generate an SSA value, and then cheated by stuffing them in the
outputs array before sending things through depth/sched/etc. But
store instructions will need similar treatment. So convert this
over to a more general array of instructions that must be kept
and fix up the places that were previously relying on kills being
in the output array.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
For store instructions, the "dst" register is a read register, not a
written register. (Ie. it is the address to store to.) Lets not
confuse register allocation, scheduling, etc, with these details.
Instead just leave a dummy instr->regs[0], and take "dst" from
instr->regs[1] and srcs following.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Add 'enum ir3_driver_param' to track driver-param slots, and a
create_driver_param() helper to avoid having the knowledge about
where driver params are placed in const regs spread throughout
the code as we add additional driver-params.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
With stream-out (transform-feedback) we have the case where resources
are *written* by the gpu, which needs basically the same tracking to
figure out when rendering must be flushed.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Details of the cmdstream packets are different between a3xx and a4xx,
but the logic about the layout of const registers is the same, as that
is dictated by the ir3 shader compiler. So rather than duplicating
logic that is tightly coupled to ir3 between a3xx and a4xx, move this
into ir3 and use per-generation callbacks for to build the cmdstream
packets.
This should make it easier to pass additional const regs (such as for
transform feedback). And it also keeps the layout internal to ir3 in
case we want to make the layout more dynamic some day.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Since for transform-feedback, we'll need more than just the TGSI
tokens from the state object, just pass the entire state object to
ir3_shader_create(). This also cleans things up a bit for some
day in the future when we could take shader either as TGSI or
directly NIR (for ex, glsl2nir or spirv2nir paths). In the same
spirit, drop extra args from ir3_compile_shader_nir() (since it
can anyways get what it needs from the ir3_shader_variant).
Signed-off-by: Rob Clark <robclark@freedesktop.org>
This adds support for fine derivatives and enables
ARB_derivative_control on radeonsi.
(just fell out of my working out interpolation)
v2: cleanup some bits, write a comment
v2.1: take Michel's comment from the mailing list
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This is required as part of ARB_gpu_shader5.
no backend changes are required for this, or if
any are, it's the same ones as for samplers.
v2: use get_indirect_index (Marek)
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This adds the frontend support, however the llvm
backend produces the wrong pattern, however
we can conditionalise enabling ARB_gpu_shader5
on whatever version of llvm we fix this in.
v2: drop unneeded sampler_indirect checks (Marek)
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This is prep work for using it in the interpolation code
later.
Also add storage for the input interpolation mode so we
can pick it up later.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This is prep work for reusing this in the interpolation
code later.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
The 420pack extension enables various GLSL rules that need to be applied
to any GLSL 4.20+ shader even if the extension is not explicitly
enabled.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Commit 17f714836 (mesa: rearrange texture error checking order) moved
the width/height/depth == 0 allowance before checking if the image was
there. This was in part due to depth having to be == 1 for 2D images and
width having to be == 1 for 1D images. Instead relax the height/depth
checks to also accept 0 as valid.
With this change,
bin/arb_direct_state_access-get-textures
starts passing again.
Fixes: 17f714836 (mesa: rearrange texture error checking order)
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Brian Paul <brianp@vmware.com>
This allows us to handle cases when texImage->_BaseFormat doesn't match
_mesa_format_get_base_format(texImage->Format). _BaseFormat is what we
care about in this function.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
After recent addition of pbo testing in piglit test getteximage-luminance,
it fails on i965. This patch makes a sub test pass.
This patch adds a clear color operation to meta pbo path, which I think is
better than falling back to software path.
V2: Fix color mask for GL_LUMINANCE_ALPHA
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Replace a call to mesa_base_tex_format() that handles only internal
formats with a call to the new _mesa_unpack_format_to_base_format()
function that handles allowed unpack formats and does not care for
internal formats at all.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
This is an optimization which avoids setting pixel transfer operations
when not required. _mesa_ReadPixels falls back to slower path if
transfer operations are set.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
_mesa_meta_pbo_GetTexSubImage() uses _mesa_meta_BlitFrameBuffer(),
which will do fragment clamping if enabled. But fragment clamping
doesn't affect ReadPixels and GetTexImage.
Without this patch, piglit test arb_color_buffer_float-clear fails,
when forced to use the meta pbo path.
v2: Apply this fix to both glReadPixels and glGetTexImage.
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Meta pbo path for ReadPixels rely on BlitFramebuffer which doesn't support
signed to unsigned integer conversions and vice versa.
Without this patch, piglit test fbo_integer_readpixels_sint_uint fails, when
forced to use the meta pbo path.
v2: Make need_signed_unsigned_int_conversion() a static function. (Iago)
Bump up the comment and the commit message. (Jason)
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Iago Toral <itoral@igalia.com>
Currently used ctx->_ImageTransferState check is not sufficient
because it doesn't include the read color clamping enabled with
GL_CLAMP_READ_COLOR. So, use the helper function
_mesa_get_readpixels_transfer_ops().
Also, transfer operations don't affect glGetTexImage(). So, do
the check only for glReadPixles.
Without this patch, arb_color_buffer_float-readpixels test fails, when
forced to use meta pbo path.
V2: Add a comment and bump up the commit message.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
I was mistaken, I thought we already had fixed this in the kernel a
couple of years ago. We had not, and the broken read (the hardware
shifts the register output on 64bit kernels, but not on 32bit kernels) is
now enshrined into the ABI. I also had the buggy architecture reversed,
believing it to be 32bit that had the shifted results. On the basis of
those mistakes, I wrote
commit c8d3ebaffc
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Wed Apr 29 13:32:38 2015 +0100
i965: Query whether we have kernel support for the TIMESTAMP register once
Now that we do have an extended register read interface for always
reporting the full 36bit TIMESTAMP (irrespective of whether the hardware
is buggy or not), make use of it and in the process fix my reversed
detection of the buggy reads for unpatched kernels.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Martin Peres <martin.peres@linux.intel.com>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Tested-and-acked-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Daniel Vetter <daniel@ffwll.ch>
There's no need to attempt to avoid overlapping generic i/o with patch
i/o. By the same token, we can't merge patch and non-patch loads/stores.
This fixes at least the
tes-both-input-array-*-index-rd
tessellation variable-indexing tests.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
There's a special AL2P instruction (called AFETCH in nv50 ir) which
computes a "physical" value to be used with indirect addressing with ALD.
Fixes
tcs-input-array-*-index-rd
tcs-output-array-*-index-wr
varying-indexing tessellation tests on Kepler.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
radeon_fbo.c: In function 'radeon_map_renderbuffer_s8z24':
radeon_fbo.c:162:9: warning: variable 'ret' set but not used [-Wunused-but-set-variable]
int ret;
^
radeon_fbo.c: In function 'radeon_map_renderbuffer_z16':
radeon_fbo.c:200:9: warning: variable 'ret' set but not used [-Wunused-but-set-variable]
int ret;
^
radeon_fbo.c: In function 'radeon_map_renderbuffer':
radeon_fbo.c:242:8: warning: variable 'ret' set but not used [-Wunused-but-set-variable]
int ret;
^
radeon_fbo.c: In function 'radeon_unmap_renderbuffer':
radeon_fbo.c:419:14: warning: variable 'ok' set but not used [-Wunused-but-set-variable]
GLboolean ok;
^
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Fix many instances of:
main/shaderapi.c: In function '_mesa_GetSubroutineUniformLocation':
main/shaderapi.c:2176:7: warning: format not a string literal and no format arguments [-Wformat-security]
_mesa_error(ctx, GL_INVALID_OPERATION, api_name);
^
Ideally, many of these error messages should be improved to indicate
which argument is incorrect as we do in other parts of Mesa.
Reviewed-by: Kai Wasserbäch <kai@dev.carbon-project.org>
Tested-by: Kai Wasserbäch <kai@dev.carbon-project.org>
When we're error-checking the target, we also need to check if the
corresponding extension is supported.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
The previous fix added GL_TEXTURE_CUBE_MAP_ARRAY but we also need
to support GL_TEXTURE_CUBE_MAP (via DSA).
So in the end, GL_TEXTURE_3D is the only (legal) target for
glCompressedTex*SubImage3D() which needs additional compression
format checking. GL_TEXTURE_2D_ARRAY, GL_TEXTURE_CUBE_MAP_ARRAY
and GL_TEXTURE_CUBE_MAP are basically 2D images which support all
compressed formats.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
This doesn't provide much value since it's all done. The qbo interaction
is fairly trivial.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Dave Airlie <airlied@redhat.com>
This extension is about setting expectation on GL4.1 implementations
rather than actually enforcing things. So once you support GLSL 410
then you support this in theory.
Signed-off-by: Dave Airlie <airlied@redhat.com>
This just adds some missing pieces to nir/i965,
it is lightly tested on my Haswell.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This moves the width/height/depth == 0 check to the front and avoids
doing any other checking when that is the case.
Also moves the dimensions check after the format/type checks so that we
don't bail out with success on a width/height/depth == 0 request when
the format/type don't match.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91425
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Brian Paul <brianp@vmware.com>
The current message makes it seem like the zoffset is invalid.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Brian Paul <brianp@vmware.com>
Basically, two different target error checks are chained consecutively, and the
second one is executed regardless the result of the first one. This produces an
incorrect error if the first check fails but is overrided by the second.
This patch conditions the execution of the second check to a successful pass of
the first one.
Fixes 1 dEQP test:
* dEQP-GLES3.functional.negative_api.texture.compressedtexsubimage3d
Reviewed-by: Laura Ekstrand <laura@jlekstrand.net>
All LLVM API calls that require an ostream object have been removed from
the disassemble() function, so we don't need to use this class to wrap
_debug_printf() we can just call this function directly.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Apparently a multi-word load can potentially overwrite the indirect
sources, so make sure that RA picks different registers for those.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Uncomment the various functionality that was already there and add in
obvious missing bits that parallel vp/gp/fp functionality.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
since this touches drivers, only enable it on gallium
for now for drivers reporting GLSL 1.30 or above.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Just add support for the subroutine type to the
glsl->tgsi convertor.
v1.1: add subroutine to int support.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This fleshes out the APIs, using the program resource
APIs where they should match.
It also sets the default values to valid subroutines.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Add support for the subroutine uniform type ir->mesa.cpp
v1.1: add subroutine to int to switch
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This fleshes out the ARB_program_query support for the
APIs that ARB_shader_subroutine introduces, leaving
some TODOs for later addition.
v2: reworked for lots of the ARB_program_interface_query
entry points and tests
v3: use common function to test for subroutine support
v3.1: add tess, fix missing breaks
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This adds linker support for subroutine uniforms, they
have some subtle differences from real uniforms, we also hide
them and they are given internal uniform names.
This also adds the subroutine locations and subroutine uniforms
to the program resource tracking for later use.
v1.1: drop is_subroutine_def
v2: handle explicit location properly, ARB_explicit_location
has a lot of language for subroutine shaders.
Calculate a link time the number of compatible subroutines
for a uniform, to make program resource easier later.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This adds the necessary storage for subroutine info to gl_shader.
v2: add comments, rename one member
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This lowers the enhanced ir_call using the lookaside table
of subroutines into an if ladder. This initially was done
at the AST level but it caused some ordering issues so a separate
pass was required.
v2: clone return value derefs.
v2.1: update for subroutine->int convert.
v2.2: add a clone for the array index
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This is the guts of the GLSL parser and AST support for
shader subroutines.
The code creates a subroutine type in the parser, and
uses that there to validate the identifiers. The parser
also distinguishes between subroutine types/function prototypes
/uniforms and subroutine defintions for functions.
Then in the AST conversion it recreates the types, and
stores the subroutine definition info or subroutine info
into the ir_function along with a side lookup table in
the parser state. It also converts subroutine calls into
the enhanced ir_call.
v2: move to handling method calls in
function handling not in field selection.
v3: merge Chris's previous parser patches in here, to
make it clearer what's changed in one place.
v3.1: add more documentation, drop unused include
v3.2: drop is_subroutine_def
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This adds a ir_variable which contains the subroutine uniform
and an array rvalue for the deref of that uniform, these
are stored in the ir_call and lowered later.
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Signed-off-by: Dave Airlie <airlied@redhat.com>
We need to store two sets of info into the ir_function,
if this is a function definition with a subroutine list
(subroutine_def) or if it a subroutine prototype.
v1.1: add some more documentation.
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This checks if core profile and shader subroutine extension
is enabled.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This handles converting the shader stages to the internal
prefix along with the program resource interfaces.
v2: add tess support
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This stops dead code from removing subroutines types,
we need these for the queries to work properly.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This type will be used to store the name of subroutine types
as in subroutine void myfunc(void);
will store myfunc into a subroutine type.
This is required to the parser can identify a subroutine
type in a uniform decleration as a valid type, and also for
looking up the type later.
Also add contains_subroutine method.
v2: handle subroutine to int comparisons, needed
for lowering pass.
v3: do subroutine to int with it's own IR
operation to avoid hacking on asserts (Kayden)
v3.1: fix warnings in this patch, fix nir,
fix tgsi
v3.2: fixup tests
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Signed-off-by: Dave Airlie <airlied@redhat.com>
tests: fix warnings
Add the shader subroutine to the core only API list,
and fixup dispatch_sanity to suit.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This is mainly needed for tessellation where a VS can be bound as VS, ES,
or LS, and TES (tess. evaluationshader) can be bound as VS or ES or neither.
Therefore we need the ability to move pointers to descriptors between
shaders arbitrarily.
The idea is that the context has a mapping from PIPE_SHADER_x to
SPI_SHADER_USER_DATA_x. After a shader is enabled or disabled,
si_shader_change_notify should be called to update this mapping accordingly.
There is a dirty flag for each shader pointer, but only one emit function
for all pointers in the whole context, whose code and logic is separated
from descriptors.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
emit_store will be reimplemented for tessellation control shader outputs
where only radeon_llvm_saturate will be used, but radeonsi will want to
fall back to radeon_llvm_emit_store for other register types.
This exposes both functions.
The idea is to allow 32 normal varyings and 32 patch varyings,
a total of 64. Previously, only a total of 32 was allowed.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
With the exception of always-taken switch cases (which are
indistinguishable from straight line code in our IR), this
disallows use of the builtin barrier() function in all the
places it may not appear.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Tessellation control outputs can be read in directly without first
having been written. Accessing these will require some special logic
anyways, so just let them through.
V2: Never lower tess control output reads, whether patch or not -- both
can be read back by other threads.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This is technically not needed, but it makes the compiler return a better
error message if tessellation is used with GLSL < 1.50.
Instead of:
error: syntax error, unexpected NEW_IDENTIFIER, expecting $end
It returns:
error: #version 150 layout qualifier `triangles' used
And the tessellation spec says:
OpenGL 3.2 and GLSL 1.50 are required.
So it makes perfect sense.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
There is no way to lower them, because the array sizes are unknown
at compile time.
Based on a patch from: Fabian Bieler <fabianbieler@fastmail.fm>
v2: add comments
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Dave Airlie <airlied@redhat.com>
This is to prevent a name conflict in tessellation shaders built-in interface
blocks.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Similar to gl_ClipDistance -> gl_ClipDistanceMESA
v2: - renamed is_mesa_var to lowered_builtin_array_variable
- moved LowerTessLevel into gl_constants
- cosmetic changes in lower_tess_level.cpp
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Marek: require a tess eval shader if a tess control shader is present
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Tessellation dependencies added by Marek.
v2: require tessellation in addition to atomics/images for some glGet queries
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This has been a no-op due to performance concerns. From now on, drivers
should decide when they don't want to unmap, not the winsys.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
buffer_unmap is currently a no-op on radeon and done correctly on amdgpu.
I plan to fix it for radeon, but before that, all occurences of buffer_unmap
that can negatively affect performance in the future must be removed.
There are 2 reasons for removing buffer_unmap calls:
- There is a likelihood that buffer_map will be called again, so we don't
want to unmap yet.
- The buffer is being released, which automatically unmaps it.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
None of the draw states are used here.
This fixes a crash in piglit: ext_framebuffer_blit/blit-early
Calling st_manager_validate_framebuffers is the minimum requirement here.
Cc: mesa-stable@lists.freedesktop.org
One of the plugins I use with vim "helpfully" added an underscore to the
front of mode for kicks.
Obviously this isn't a feature used very often because it's been broken
since d986cb7c70 (since May 20th), and no one has noticed.
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
The scons equivalent of the previous commit - just fold the almost
identical driver + main Sconscripts.
Cc: Alexander von Gluck IV <kallisti5@unixzen.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Matt Turner <mattst88@gmail.com>
Simplify things by merging the two makefiles. This way we can combine
the duplicated HAVE_PLATFORM_ checks, and build the library without
having a separate static library.
v2: use $() when referencing variables, use correct define (Matt)
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
The only user of it (libgbm.la) immediately links it. Just build it
directly into the library.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Support for Windows has been removed for a while now, and virtually
every POSIX compliant system provides strcasecmp, strdup and snprintf.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
It is simply not possible to use the dri backend without shared glapi,
as the alternative provider (libGL) is not always present. We have fixed
the build for a while now, so we can rip this out.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
It has been broken since 2011 with commit c98ea26e16b(egl: Make
egl_dri2 and egl_glx built-in drivers.). When the backends got merged
into the main library each entry point was guarded by a
_EGL_BUILT_IN_DRIVER_* define.
As the define was missing, the linker kindly removed the whole of the
dri2 backend, thus we did not notice any errors due to the unresolved
link to xcb and friends.
Cc: Chia-I Wu <olv@lunarg.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
As of last commit the only user of it (radeon/r200) no longer uses it.
As such let's remove it and cleanup the nasty hacks that we had in place
to support this.
v2: Leave LIBDRM_CFLAGS around.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com> (v1)
Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v1)
The original code only half considered hyperz as an option. As per
previous commit "major != 2 cannot occur" we can simply things, and
allow users to set the option if they choose to do so.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
As mentioned by Michel Dänzer
"FWIW though, any code which is specific to radeon DRM major version 1
can be removed, because that's the UMS major version."
and Marek Olšák
"major != 2" can't occur. You don't have to check the major version at
all and you can just assume it's always 2."
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Same as previous commit - unused (gbm is not a thing outside the
autotools build).
v2: Remove trailing HAVE_LIBDRM.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
GBM (the only user of kms-dri) is currently not available under Android.
Considering we have no way of testing/using this let's not bother
building it for now.
Cc: Chih-Wei Huang <cwhuang@linux.org.tw>
Cc: Eric Anholt <eric@anholt.net>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Double negatives in English language are normally avoided, plus the
former seems cleaner and more consistent.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
With the dri_interface.h clean of the macro, we can remove the final
only st/dri specific use of the very same.
Seemingly it was incorrectly used, as the build-time presence of dri2 is
not libdrm specific. At run-time, the code is already limited to dri2
use-cases plus returning true, when the extension is not present (or too
old) will likely lead to a crash as one tries to use it shortly after
the dri_with_format() call.
As a side effect this gives us a nice cleanup the builds.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
With the follow up commit we'll remove the __NOT_HAVE_DRM_H macro. As
requested by Ian HAVE_LIBDRM will be used instead, which will lead to
swrast including drm.h when libdrm package is available, even though we
don't need/make use of the header.
As the define is added after the AM_CFLAGS we cannnot use -UHAVE_LIBDRM,
but instead let's just add LIBDRM_CFLAGS. The latter of which will
expand to NULL when the libdrm package is not around.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
These conditionals are used to guard both dri modules and loader(s).
Currently if we try to build the gallium swrast dri module (without glx)
on a system that's missing libdrm the build will fail.
v2: Make sure we assign prior to checking the have_libdrm variable.
Cc: 10.6 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
The type stored in gl_uniform_storage is the type of a single array
element not the array type so size was always 1.
V2: Dont validate sampler units pointing to 0
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
This adds the new glGetTextureSubImage() and
glGetCompressedTextureSubImage() functions. Also update the
dispatch sanity test program.
v2: remove stray brace, move xi:include line in gl_API.xml, fix extension
number typo, s/program/texture/ in xml file.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
1. Reorganize the error checking code.
2. Lay groundwork for getting sub images by passing image offset and
dimensions to the error checking code.
3. Implement _mesa_GetnTexImageARB(), _mesa_GetTexImage() and
_mesa_GetTextureImage() all in terms of get_texture_image().
v2: pass offset/width/height/depth arguments to the error checking
function, avoid using magic width/height/depth values.
v3: remove unused bufSize param to get_texture_image()
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Needed for GL_ARB_get_texture_sub_image. But at this point, the
offsets are always zero and the sizes match the whole texture image.
v2: Fixes, suggestions from Laura Ekstrand:
* Fix calls to ctx->Driver.UnmapTextureImage() to pass the correct
slice value.
* Added comments and assertions to check zoffset+depth<=tex->Depth before
the 'img' loops.
* Added a new zoffset==0 assert in get_tex_memcpy().
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
The new driver hook has x/y/zoffset and width/height/depth parameters
for the new glGetTextureSubImage() function.
The meta code and gallium state tracker are updated to handle the
new parameters.
Callers to Driver.GetTexSubImage() pass in offsets=0 and sizes equal
to the whole texture size.
v2: update i965 driver code, s/GLint/GLsizei/ in GetTexSubImage hook
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Since s3tc works for cube maps and 2D arrays, it should also work for
cube arrays. NVIDIA's driver supports this too. Seems like the spec
should say this.
This is a minor follow-on fix for the commit "mesa: fix up some texture
error checks".
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Include stdarg.h for va_list. Unbreaks the build on OpenBSD:
In file included from mesa/program/dummy_errors.c:24:
../src/mesa/main/errors.h:85: error: expected declaration specifiers or '...' before 'va_list'
Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Acked-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Generated by running:
git grep -l INLINE src/gallium/ | xargs sed -i 's/\bINLINE\b/inline/g'
git grep -l INLINE src/mesa/state_tracker/ | xargs sed -i 's/\bINLINE\b/inline/g'
git checkout src/gallium/state_trackers/clover/Doxyfile
and manual edits to
src/gallium/include/pipe/p_compiler.h
src/gallium/README.portability
to remove mentions of the inline define.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Acked-by: Marek Olšák <marek.olsak@amd.com>
This fixes the following piglit test:
ext_transform_feedback-immediate-reuse-uniform-buffer
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
This fixes the following piglit test:
ext_transform_feedback-immediate-reuse-uniform-buffer
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
gcc says:
sb/sb_sched.cpp: In member function 'bool r600_sb::alu_group_tracker::try_reserve(r600_sb::alu_node*)':
sb/sb_sched.cpp:492:7: warning: suggest parentheses around operand of '!' or change '&' to '&&' or '!' to '~' [-Wparentheses]
if (!trans & fbs)
It happens to be harmless; if fbs is ever non-zero, it will be VEC_210,
which is 5, so (!trans & 5) == 1 and the branch works as expected. But
logical AND is clearly what was meant.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Adam Jackson <ajax@redhat.com>
Commit c9dbdc0 introduced some dead code which is supposed to be used
once we have Yf/Ys tiling working and performing better. Ken reported
the issue that static analysis tool now shows warnings due to the dead
code. To fix these warnings, this patch reverts the changes made in
commit c9dbdc0.
It'll be better to add the Yf/Ys tiling selection code later, when we
are ready to use it.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
This is essentially the same problem fixed in an earlier patch for
immediates. Setting the stride to zero will be particularly useful
for my future SIMD lowering pass, because we will be able to just
check whether the stride of a source register is zero and skip
emitting the copies required to unzip it in that case.
Instead of setting stride to zero in every caller of emit_uniformize()
I've changed the function to return the result as its return value
(previously it was being written into a caller-provided destination
register), because this way we can enforce that the result is used with
the correct regioning from the function itself.
The changes to the prototype of its VEC4 counterpart are mainly for
the sake of symmetry, VEC4 registers don't have stride.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
This fixes essentially the same problem as for immediates. Registers
of the UNIFORM file are typically accessed according to the formula:
read_uniform(r, channel_index, array_index) =
read_element(r, channel_index * 0 + array_index * 1)
Which matches the general direct addressing formula for stride=0:
read_direct(r, channel_index, array_index) =
read_element(r, channel_index * stride +
array_index * max{1, stride * width})
In either case if reladdr is present the access will be according to
the composition of two register regions, the first one determining the
per-channel array_index used for the second, like:
read_indirect(r, channel_index, array_index) =
read_direct(r, channel_index,
read(r.reladdr, channel_index, array_index))
where:
read(r, channel_index, array_index) = if r.reladdr == NULL
then read_direct(r, channel_index, array_index)
else read_indirect(r, channel_index, array_index)
In conclusion we can handle uniforms consistently with the other
register files if we set stride to zero. After lowering to a GRF
using VARYING_PULL_CONSTANT_LOAD in demote_pull_constant_loads() the
stride of the source is set to one again because the result of
VARYING_PULL_CONSTANT_LOAD is generally non-uniform.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
When the width field was removed from fs_reg the BROADCAST handling
code in opt_algebraic() started to miss a number of trivial
optimization cases resulting in the ugly indirect-addressing sequence
to be emitted unnecessarily for some variable-indexed texturing and
UBO loads regardless of one of the sources of BROADCAST being
immediate. Apparently the reason was that we were setting the stride
field to one for immediates even though they are typically uniform.
Width used to be set to one too which is why this optimization used to
work previously until the "reg.width == 1" check was removed.
The stride field of vector immediates is intentionally left equal to
one, because they are strictly speaking not uniform. The assertion in
fs_generator makes sure that immediates have the expected stride as
consistency check.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
We only consider a vgrf defined by a given block if the block writes to it
unconditionally. So far we have been checking this by testing that the
instruction is not predicated, however, in the case of BRW_OPCODE_SEL,
the predication is used to select the value to write, not to decide if
the write is actually done. The consequence of this was increased life
spans for affected vgrfs, which could lead to additional register pressure.
Since NIR generates selects for conditional writes this was causing massive
register pressure in a handful of piglit and dEQP tests that had a large
number of select operations with the NIR-vec4 backend.
Fixes the following piglit tests with the NIR-vec4 backend:
spec/glsl-1.50/execution/variable-indexing/vs-output-array-vec4-index-wr-before-gs
spec/glsl-1.50/execution/variable-indexing/gs-input-array-vec4-index-rd
spec/glsl-1.50/execution/variable-indexing/vs-output-array-vec2-index-wr-before-gs
spec/glsl-1.50/execution/variable-indexing/vs-output-array-vec3-index-wr-before-gs
spec/glsl-1.50/execution/variable-indexing/vs-output-array-float-index-wr-before-gs
Fixes 80 dEQP tests with the NIR-vec4 backend in the following category:
dEQP-GLES3.functional.ubo.*
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This fixes crashes in llvmpipe with LLVM 3.8 and also some piglit tests
on radeonsi that use the draw module.
This is just a temporary solution. The correct solution will require
creating a TargetMachine during gallivm initialization and pulling the
DataLayout from there. This will be a somewhat invasive change, and it
will need to be validatated on multiple LLVM versions.
https://llvm.org/bugs/show_bug.cgi?id=24172
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
This fixes a compilation warning introduced in commit 05a12c5
(gallium: add interface for writable shader images).
While we are at it, fix indentation and rename parameters according to
the gallium interface.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Only constbufs must be aligned to 0x100, but since all buffers can be
rebinded as constant buffers they must be also aligned.
This patch prevents this behaviour by aligning everything to 256-byte
increments at buffer creation.
This fixes dmesg fails for the following piglit test:
ext_transform_feedback-immediate-reuse-uniform-buffer -auto -fbo
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
NV50_3D_BIND_TSC only allows to bind 16 samplers, and since we don't
want to do anything with NV50_3D_BIND_TSC2, just limit the maximum
number of samplers to 16 like for nvc0.
This fixes dmesg fails with the following piglit test:
max-samplers
But the test still fails.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
As functions are inlined, and nir_lower_global_vars_to_local gets
run, all global variables are lowered to local variables.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
It appears that the G80 did not have support for the sampler view
first/last clamping. Put the view's last level in the place of the
texture's so that it doesn't go past what the sampler view allows.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
There's no need to deal with samplers for texture size queries. That
code also was accidentally setting an invalid sIndirectSrc position, but
it can now just be removed.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
Switch off hardware-generated binding tables and gather push
constants in the blorp. Blorp requires only a minimal set of
simple constants. There is no need for the extra complexity
to program a gather table entry into the pipeline.
Cc: kenneth@whitecape.org
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
When hardware-generated binding tables are enabled, use the hw-generated
binding table format when uploading binding table state.
Normally, the CS will will just consume the binding table pointer commands
as pipelined state. When the RS is enabled however, the RS flushes whatever
edited surface state entries of our on-chip binding table to the binding
table pool before passing the command on to the CS.
Note that the the binding table pointer offset is relative to the binding table
pool base address when resource streamer instead of the surface state base address.
v2: Fix possible buffer overflow when allocating a chunk out of the
hw-binding table pool (Ken).
v3: Remove extra newline and add missing brace around if-statement (Matt).
v4: Fix broken INTEL_DEBUG=shader_time for hw-generated binding tables.
Document PRM WaStateBindingTableOverfetch workaround.
Cc: kenneth@whitecape.org
Cc: mattst88@gmail.com
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
Unlike normal software binding tables where the driver has to manually
generate and fill a binding table array which are then uploaded to the
hardware, the resource streamer instead presents the driver with an option
to fill out slots for individual binding table indices. The hardware
accumulates the state for these combined edits which it then automatically
flushes to a binding table pool when the binding table pointer state
command is invoked.
v2: Clarify binding table edit bit aligment (Topi).
v3: Make comments and function names more clearer (Ken).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
This patch implements the binding table enable command which is also
used to allocate a binding table pool where where hardware-generated
binding table entries are flushed into. Each binding table offset in
the binding table pool is unique per each shader stage that are
enabled within a batch.
Also insert the required brw_tracked_state objects to enable
hw-generated binding tables in normal render path.
v2: - Use MOCS in binding table pool alloc for GEN8
- Fix spurious offset when allocating binding table pool entry
and start from zero instead.
v3: - Include GEN8 fix for spurious offset above.
v4: - Fixup wrong packet length in enable/disable hw-binding table
for GEN8 (Ville).
- Don't invoke HW-binding table disable command when we dont
have resource streamer (Chris).
v5: - Reorder the state cache invalidate flush so it happens in-between
enabling hw-generated binding tables and the previous sw-binding
table GPU state (Chris).
v6: - Do the same fix in v5 for gen7_disable_hw_binding_tables().
- Adhere to coding guidelines and make comments more informative.
Cc: kenneth@whitecape.org
Cc: syrjala@sci.fi
Cc: chris@chris-wilson.co.uk
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
Check first if the hardware and kernel supports resource streamer. If this
is allowed, tell the kernel to enable the resource streamer enable bit on
MI_BATCHBUFFER_START by specifying I915_EXEC_RESOURCE_STREAMER
execbuffer flags.
v2: - Use new I915_PARAM_HAS_RESOURCE_STREAMER ioctl to check if kernel
supports RS (Ken).
- Add brw_device_info::has_resource_streamer and toggle it for
Haswell, Broadwell, Cherryview, Skylake, and Broxton (Ken).
v3: - Update I915_PARAM_HAS_RESOURCE_STREAMER to match updated kernel.
v4: - Always inspect the getparam.value (Chris Wilson).
v5: - Fold redundant devinfo->has_resource_streamer check in context create
into init screen.
Cc: kenneth@whitecape.org
Cc: chris@chris-wilson.co.uk
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
This gives the kernel a chance to validate and lock down the data,
without having to deal with mmap zapping.
With this, GLBenchmark stops on a texture relocations, because we'd
recycled a shader BO as another shader and failed to revalidate, since we
weren't clearing the cached validation state on mmap faults.
In particular, we were incorrectly accepting s3tc (and lots of others)
for CompressedTexSubImage3D (but not CompressedTexImage3D) calls with 3d
targets. At this time, the only allowed formats for these calls are the
bptc ones, since none of the specific extensions allow it (astc hdr would).
Also, fix up a bug in _mesa_target_can_be_compressed - 3d target needs to
be allowed for bptc formats.
Reviewed-by: Brian Paul <brianp@vmware.com>
On a release build, this makes the rest of vc4_qpu_validate.c go away
(the compiler didn't know that our qpu helper function calls had no
side effects).
These are really useful hints to the compiler in the absence of link-time
optimization, and I'm going to use them in VC4.
I've made the const attribute be ATTRIBUTE_CONST unlike other function
attributes, because we have other things in the tree #defining CONST for
their own unrelated purposes.
v2: Alphabetize.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1)
Before, we were setting payload_last_use_ip for unused payload
registers to 0, which made them interfere with whatever the first
instruction wrote to due to the workaround for SIMD16 uniform arguments.
Just use -1 to mean "unused" instead, and then skip setting any
interferences for unused payload registers.
instructions in affected programs: 0 -> 0
helped: 0
HURT: 0
GAINED: 1
LOST: 0
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
regs_read() will handle LINTERP for us since the previous commit. In
addition, we were being too conservative, since it will only read 2
registers on SIMD8.
instructions in affected programs: 9061 -> 8893 (-1.85%)
helped: 10
HURT: 0
GAINED: 0
LOST: 0
All of the changes were due to spills being eliminated, mostly in KSP
shaders.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
If a source register in the push constant registers uses more than one
register, then we wouldn't update payload_last_use_ip for subsequent
registers.
Unlike most uniform data pushed into registers, the CS gl_LocalInvocationID
data varies per execution channel. Therefore for SIMD16 mode, we have vec16
data in the payload. In this case we then need to mark 2 registers in
payload_last_use_ip as last used by the instruction. There's a similar
situation for the z and w coordinates of gl_FragCoord for fragment shaders,
where it had only happened to work before because of some bogus interferences
which the next commit removes.
(Connor: added bit about gl_FragCoord to commit message)
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Connor Abbott <connor.w.abbott@intel.com>
This prevents an assertion failure in brw_fs_live_variables.cpp,
fs_live_variables::setup_one_write: Assertion `var < num_vars' failed.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
This prevents an assertion failure in brw_fs_live_variables.cpp,
fs_live_variables::setup_one_read: Assertion `var < num_vars' failed.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
A fragment program from "Pixel Piracy" contains redundant OPTION
directives:
!!ARBfp1.0
OPTION ARB_precision_hint_fastest;
OPTION ARB_fog_exp2;
OPTION ARB_precision_hint_fastest;
OPTION ARB_fog_exp2;
...
We already allow redundant ARB_precision_hint_fastest directives, but
disallow the redundant (yet consistent) ARB_fog_exp2 directives, failing
to compile the program.
The specification seems to contradict itself - the main text says that
only one fog application option may be specified, but then backpedals,
indicating the intent is to disallow /contradictory/ flags. One of the
issues suggests that specifying contradictory ones is stupid, but
allowed, and only the last one should take effect.
Accepting multiple redundant (but consistent) directives seems harmless,
and like a reasonable interpretation of the specification. It also
fixes a fragment program found in the wild.
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
With the last few patches a way was provided to influence lower layer miptree
layout and allocation decisions via flags (replacing bools). For simplicity, I
chose not to touch the tiling requests because the change was slightly less
mechanical than replacing the bools.
The goal is to organize the code so we can continue to add new parameters and
tiling types while minimizing risk to the existing code, and not having to
constantly add new function parameters.
v2: Rebased on Anuj's recent Yf/Ys changes
Fix non-msrt MCS allocation (was only happening in gen8 case before)
v3: small fix in assertion requested by Chad
v4: Use parens to get the order right from v3.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
With the last few patches a way was provided to influence lower layer miptree
layout and allocation decisions via flags (replacing bools). For simplicity, I
chose not to touch the tiling requests because the change was slightly less
mechanical than replacing the bools.
The goal is to organize the code so we can continue to add new parameters and
tiling types while minimizing risk to the existing code, and not having to
constantly add new function parameters.
v2: Rebased on Anuj's recent Yf/Ys changes
Fix non-msrt MCS allocation (was only happening in gen8 case before)
v3: small fix in assertion requested by Chad
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> (v2)
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com> (v2)
Reviewed-by: Chad Versace <chad.versace@intel.com> (v2)
This in principle simple calculation was being open-coded in a number
of places (in a series I haven't yet sent for review there will be a
couple more), all of them were subtly broken in one way or another:
None of them were handling the HW_REG case correctly as pointed out by
Connor, and fs_inst::regs_read() was handling the stride=0 case rather
naively. This patch solves both problems and factors out the
calculation as a new fs_reg method.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
This gets rid of two no16() fall-backs and should allow better
scheduling of the generated IR. There are no uses of usubBorrow() or
uaddCarry() in shader-db so no changes are expected. However the
"arb_gpu_shader5/execution/built-in-functions/fs-usubBorrow" and
"arb_gpu_shader5/execution/built-in-functions/fs-uaddCarry" piglit
tests go from 40 to 28 instructions. The reason is that the plain ADD
instruction can easily be CSE'ed with the original addition, and the
b2i negation can easily be propagated into the source modifier of
another instruction, so effectively both operations are performed with
just one instruction.
v2: Rely on carry_to_arith() and borrow_to_arith() to lower these
(Ilia Mirkin).
Reviewed-by: Matt Turner <mattst88@gmail.com>
Booleans are represented as 0/-1 on modern hardware which means we can
just negate them to convert them into a numeric type. Negation has
the benefit that it can be implemented using a source modifier which
can easily be propagated into some other instruction. shader-db
results on HSW:
total instructions in shared programs: 6349082 -> 6346693 (-0.04%)
instructions in affected programs: 40948 -> 38559 (-5.83%)
helped: 123
HURT: 1
GAINED: 1
LOST: 0
Reviewed-by: Matt Turner <mattst88@gmail.com>
PIPE_CAPs and TGSI support will be added later. The TGSI support should be
straightforward. We only need to split TGSI_FILE_RESOURCE into TGSI_FILE_IMAGE
and TGSI_FILE_BUFFER, though duplicating all opcodes shouldn't be necessary.
The idea is:
* ARB_shader_image_load_store should use set_shader_images.
* ARB_shader_storage_buffer_object should use set_shader_buffers(slots 0..M-1)
if M shader storage buffers are supported.
* ARB_shader_atomic_counters should use set_shader_buffers(slots M..N)
if N-M+1 atomic counter buffers are supported.
PIPE_CAPs can describe various constraints for early DX11 hardware.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Instead of relying on hardware defaults the i915 kernel driver is
going program custom MOCS tables system-wide on Gen9 hardware. The
"WT" entry previously used for renderbuffers had a number of problems:
It disabled caching on eLLC, it used a reserved L3 cacheability
setting, and it used to override the PTE controls making renderbuffers
always WT on LLC regardless of the kernel's setting. Instead use an
entry from the new MOCS tables with parameters: TC=LLC/eLLC, LeCC=PTE,
L3CC=WB.
The "WB" entry previously used for anything other than renderbuffers
has moved to a different index in the new MOCS tables but it should
have the same caching semantics as the old entry.
Even though the corresponding kernel change ("drm/i915: Added
Programming of the MOCS") is in a way an ABI break it doesn't seem
necessary to check that the kernel is recent enough because the change
should only affect Gen9 which is still unreleased hardware.
v2: Update MOCS values for the new Android-incompatible tables
introduced in v7 of the kernel patch.
Cc: 10.6 <mesa-stable@lists.freedesktop.org>
Reference: http://lists.freedesktop.org/archives/intel-gfx/2015-July/071080.html
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
s/build_error/compile_error in order to match the stored OpenCL status code.
Make program::build catch and log every OpenCL error.
Make tgsi error triggering uniform with the llvm one.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
This is done by returning an rvalue of type void in the
ast_function_expression::hir function instead of a void expression.
This produces (in the case of the ternary) an hir with a call
to the void returning function and an assignment of a void variable
which will be optimized out (the assignment) during the optimization
pass.
This fix results in having a valid subexpression in the many
different cases where the subexpressions are functions whose
return values are void.
Thus preventing to dereference NULL in the following cases:
* binary operator
* unary operators
* ternary operator
* comparison operators (except equal and nequal operator)
Equal and nequal had to be handled as a special case because
instead of segfaulting on a forbidden syntax it was now accepting
expressions with a void return value on either (or both) side of
the expression.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=85252
Signed-off-by: Renaud Gaubert <renaud@lse.epita.fr>
Reviewed-by: Gabriel Laskar <gabriel@lse.epita.fr>
Reviewed-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
The formats chosen (both by texture format choser, fbo storage allocation)
are different for big endian not just for rgba8 but also lower bit width
formats (why I don't actually know). Even the function to test for renderable
formats used different formats, however the actual colorbuffer setup did not.
And the blitter did not take that into account neither.
Untested (what could possibly go wrong...).
Same as for r100.
Acked-by: Marek Olšák <marek.olsak@amd.com>
The formats chosen (both by texture format choser, fbo storage allocation)
are different for big endian not just for rgba8 but also lower bit width
formats (why I don't actually know). Even the function to test for renderable
formats used different formats, however the actual colorbuffer setup did not.
And the blitter did not take that into account neither.
Untested (what could possibly go wrong...).
Acked-by: Marek Olšák <marek.olsak@amd.com>
Blit submits lots of packets which are usually handled by state atoms, so
these must be dirtied.
Not sure if this fixes anything, but it was a concern raised by bug 51658
(with this all issues there seen as actual bugs should be fixed, with the
exception of the patch to upload non-used texenv state atoms which I just
don't understand).
Acked-by: Marek Olšák <marek.olsak@amd.com>
It is rather unfortunate that we don't know if a texture is going to be used
as a rt later, and we lack the means to do something about a format chosen
which we can't render to directly, so disable this and always chose renderable
format for rgba8 textures.
This addresses an issue raised on (old) bug,
https://bugs.freedesktop.org/show_bug.cgi?id=51658 with gnome-shell, don't
know if that's still applicable but it might fix other things as well.
Acked-by: Marek Olšák <marek.olsak@amd.com>
Along with fixing the type of pitch parameter, patch also changes
the types of few local variables and function return type.
Warnings fixed are:
intel_mipmap_tree.c:671:7: warning: passing argument 3 of
'intel_get_yf_ys_bo_size' from incompatible pointer type
intel_mipmap_tree.c:563:1: note: expected 'uint64_t *' but
argument is of type 'long unsigned int *'
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Previously OUT_BATCH was just a macro around an inline function which
does
brw->batch.map[brw->batch.used++] = dword;
When making consecutive calls to intel_batchbuffer_emit_dword() the
compiler isn't able to recognize that we're writing consecutive memory
locations or that it doesn't need to write batch.used back to memory
each time.
We can avoid both of these problems by making a local pointer to the
next location in the batch in BEGIN_BATCH().
Cuts 18k from the .text size.
text data bss dec hex filename
4946956 195152 26192 5168300 4edcac i965_dri.so before
4928956 195152 26192 5150300 4e965c i965_dri.so after
This series (including commit c0433948) improves performance of Synmark
OglBatch7 by 8.01389% +/- 0.63922% (n=83) on Ivybridge.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
So that everything writing to the batch between BEGIN_BATCH() and
ADVANCE_BATCH() goes through OUT_BATCH.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
BEGIN_BATCH() and ADVANCE_BATCH() will contain "do {" and "} while (0)"
respectively to allow declaring local variables used by intervening
OUT_BATCH macros. As such, BEGIN_BATCH() and ADVANCE_BATCH() need to be
in the same control flow.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
This field should always be set for gen8. In the bdw PRM, Volume 2d:
Command Reference: Structures under INTERFACE_DESCRIPTOR_DATA, DWORD
6, Bits 9:0, Number of Threads in GPGPU Thread Group:
"This field should not be set to 0 even if the barrier is disabled,
since an accurate value is needed for proper pre-emption."
In the HSW PRM, the it doesn't mention that it must always be set, but
it should not hurt.
Reported-by: Kristian Høgsberg <krh@bitplanet.net>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
I needed to rewrite this a bit for safety checking in the next commit.
Despite being a static inline of the same thing that was being done, we
lose 36 bytes of code for some reason.
Now that RCL generation is in the kernel, we don't have any other
callers. Oddly, the compiler generates another 8 bytes of code for
this, but the simplification is worth it.
Now that we don't resize the CL as we build (it's set up at the top by
vc4_start_draw()), we can store the pointers instead of offsets from
the base. Saves a bit of math in emitting relocs (about 60 bytes of
code).
Extend the existing lower_ubo_reference pass to also detect SSBO loads
and lower them to __intrinsic_load_ssbo intrinsics.
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Extend the existing lower_ubo_reference pass to also detect SSBO writes
and lower them to __intrinsic_store_ssbo intrinsics.
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Since the backing storage for these is shared we cannot ensure that
the value won't change by writes from other threads. Normally SSBO
accesses are not guaranteed to be syncronized with other threads,
except when memoryBarrier is used. So, we might be able to optimize
some SSBO accesses, but for now we always take the safe path and emit
the SSBO access.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Since the backing storage for these is shared we cannot ensure that
the value won't change by writes from other threads. Normally SSBO
accesses are not guaranteed to be syncronized with other threads,
except when memoryBarrier is used. So, we might be able to optimize
some SSBO accesses, but for now we always take the safe path and emit
the SSBO access.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Since the backing storage for these is shared we cannot ensure that
the value won't change by writes from other threads. Normally SSBO
accesses are not guaranteed to be syncronized with other threads,
except when memoryBarrier is used. So, we might be able to optimize
some SSBO accesses, but for now we always take the safe path and emit
the SSBO access.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
If we kill dead assignments we lose the buffer writes.
Also, we never kill UBO declarations even if they are never referenced
by the shader, they are always considered active. Although the spec
does not seem say this specifically for SSBOs, it is probably implied
since SSBOs are pretty much the same as UBOs, only that you can write
to them.
v2:
- Fix the comment (Jordan)
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Due to GL_ARB_shader_storage_buffer_object extension, shader storage blocks
have the same limitations as uniform blocks.
This patch fixes the corresponding error messages.
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Section 4.3.7 "Buffer Variables", GLSL 4.30 spec:
"Buffer variables may only be declared inside interface blocks
(section 4.3.9 “Interface Blocks”), which are then referred to as
shader storage blocks. It is a compile-time error to declare buffer
variables at global scope (outside a block)."
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
See GLSL 4.30 spec, section 4.4.5 "Uniform and Shader Storage Block
Layout Qualifiers".
v2:
- Add whitespace in an error message. Delete period '.' at the end of that
error message (Jordan).
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
This includes the array of bindings, the current buffer bound to the
GL_SHADER_STORAGE_BUFFER target and a set of general limits and default
values for shader storage buffers.
v2:
- Use spec values for the new defined constants (Jordan)
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
This is used to identify shader storage buffer interface blocks where
buffer variables are declared.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
This will be used to identify buffer variables inside shader storage
buffer objects, which are very similar to uniforms except for a few
differences, most important of which is that they are writable.
Since buffer variables are so similar to uniforms, we will almost always
want them to go through the same paths as uniforms.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
The util/hash_table was intended to be a fast hash table
replacement for the program/hash_table see 35fd61bd99 and 72e55bb688.
This change replaces some more uses of the old hash table.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Most of the data stored(duplicated) was unused, and for the one that is
follow the approach set by other drivers.
This eliminates the use of legacy (dri1) types.
Cc: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
It was only useful for st/egl, although I've never got to merging the
pipe-loader and inline-helpers before it was removed. There are no users
for it ATM.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Do not iterate and (attempt to) open the render device, if we're over
the requested number of devices.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Render nodes have been around for quite some time. Removing support via
the master/primary node allows us to clean up the conditional
compilation and simplify the build greatly.
For example currently we the pipe-loader, which explicitly links against
xcb and friends (for X auth) if found at compile-time. That
would cause problems as one will be forced to use X/xcb, even if it's a
headless system that is used for opencl.
v2: Clarify the linking topic in the commit message.
Cc: Tom Stellard <thomas.stellard@amd.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
This adds the translation from TGSI to AMDGPU llvm backend, for the
64-bit opcodes. The backend pretty much handles everything for us
fine. There is one patch required for SI DFRAC support, that I know
off.
[airlied: fixed missing comma, updated relnotes]
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
temp_reg needs to be last, as we increment things
away from it, otherwise on cayman some tests were overwriting
the index regs.
Fixes 2 piglit with ARB_gpu_shader5 forced on cayman.
Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Cayman needs a different method to upload the CF IDX0/1
This fixes 31 piglits when ARB_gpu_shader5 is forced on
with cayman.
Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
When binding a layered texture, the layer is already 0. There's no need
to special case this.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
This ports over Chris Forbes' equivalent fixes in gen7_misc_state.c
from commit 77d55ef481.
No Piglit changes on Sandybridge.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Color clears can be performed via two separate shaders - one is the
generic "meta clear" shader (in meta.c); the other is the i965 specific
"repclear" shader (in brw_meta_fast_clear.c).
Giving them separate names makes them distinguishable when reading
INTEL_DEBUG=shader_time output.
v2: Call it "meta repclear", as suggested by Jason.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Paul's original code had emit_control_data_bits() skip the URB write if
vertex_count was 0. This meant wrapping every control data write in a
conditional write.
We accumulate control data bits in a single UD (32-bit) register. For
simple shaders that don't emit many vertices, the control data header
will be <= 32-bits long, so we only need to write it once at the end of
the shader.
For shaders with larger headers, we write out batches of control data
bits at EmitVertex(), when (vertex_count * bits_per_vertex) % 32 == 0.
On the first EmitVertex() call, the above expression will evaluate to
true simply because vertex_count == 0. But we want to avoid emitting
the control data bits, because we haven't accumulated 32-bits worth yet.
In other words, the vertex_count != 0 check is really only necessary in
the EmitVertex() batching case, not the end-of-thread case.
This saves a CMP/IF/ENDIF in every shader that uses EndPrimitive() or
multiple streams. The only downside is that a shader which emits no
vertices at all will execute an additional URB write---but such shaders
are pointless and not worth optimizing.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
When the new hash table implementation was added to Mesa it claimed to be much
faster, see commits 35fd61bd99 and 72e55bb688.
The set implementation follows the same implementation strategy so this should
be faster and there was no need to store a data field.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Don't assume that $(top_srcdir)/.git is a directory. It may be a
gitlink file [1] if $(top_srcdir) is a submodule checkout or a linked
worktree [2].
[1] A "gitlink" is a text file that specifies the real location of
the gitdir.
[2] Linked worktrees are a new feature in Git 2.5.
Cc: "10.6, 10.5" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
If we split addr/pred, the original instruction could have originated
from a different block. If we don't fixup the block ptr we hit asserts
later (in debug builds).
NOTE: perhaps we don't want to try to preserve addr/pred reg's across
block boundaries.. this at least needs some thought in case addr/pred
writes end up inside a conditional block..
Signed-off-by: Rob Clark <robclark@freedesktop.org>
The address and predicate register are special, they don't get assigned
in RA. So do a better job of ignoring them rather than hitting later
asserts.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Some, but not all, state trackers will explicitly unref (and set to
NULL) the previous *fence before calling pipe->flush(). So driver
should use fence_ref() which will unref the old fence if not NULL.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Acked-by: Eric Anholt <eric@anholt.net>
Some, but not all, state trackers will explicitly unref (and set to
NULL) the previous *fence before calling pipe->flush(). So driver
should use fence_ref() which will unref the old fence if not NULL.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Acked-by: Chia-I Wu <olvaffe@gmail.com>
Some, but not all, state trackers will explicitly unref (and set to
NULL) the previous *fence before calling pipe->flush(). So driver
should use fence_ref() which will unref the old fence if not NULL.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
XA was never unref'ing last_fence in the various call paths to
pipe->flush(). Add this to xa_context_flush() and update the other
open-coded calls to pipe->flush() to use xa_context_flush() instead.
This fixes a memory leak reported with xf86-video-freedreno.
Reported-by: Nicolas Dechesne <nicolas.dechesne@linaro.org>
Cc: "10.5 10.6" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Rob Clark <robclark@freedesktop.org>
After tearing it out another level or two, and just passing the key and
vp directly, we can finally remove this struct. It also eliminates a
pointless memcpy() of the key.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
At this point, the brw_vs_compile structure only contains the key and
gl_vertex_program pointer. We may as well pass and store them directly;
it's simpler and more convenient (key-> instead of vs_compile->key...).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Nothing outside of vec4_visitor uses it, so we may as well keep it
internal.
Commit db9c915abc for the vec4 backend.
(The empty class will be going away soon.)
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
This is more consistent with how we do it in the FS backend, and reduces
a tiny bit of duplication. It'll also allow for a bit more tidying.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
This patch makes us only issue the performance warning about register
spilling if we actually spilled registers. We also use scratch space
for indirect addressing and the like.
This is basically commit c51163b0cf for
the vec4 backend.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Jason plumbed this through a while back in the FS backend, but
apparently we were just passing NULL in the vec4 backend.
This patch passes brw in as intended.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Adding new shader stages to a switch statement is less confusing than an
if-else-if ladder where all but the first case are fragment shader
specific (but don't claim to be).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
It's only used inside #ifdef DEBUG. Cuts ~1.7k of .text, and more
importantly prevents a larger code size regression in the next commit
when the .used field is replaced and calculated on demand.
text data bss dec hex filename
4945468 195152 26192 5166812 4ed6dc i965_dri.so before
4943740 195152 26192 5165084 4ed01c i965_dri.so after
And surround the emit and total fields with #ifdef DEBUG to prevent
such mistakes from happening again.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
This patch can cause an infinite recursion if the previous patch titled, "i965:
Track finished batch state" isn't present (backporters take notice).
v2: Sent out the wrong patch originally. This patches switches the order of
flushes, doing the generic flush before the CC_STATE, and the required
workaround flush afterwards
v3: Only perform workaround for render ring
Add text to the BATCH_RESERVE comments
v4 (By Ken): Rebase; update citation to mention PRM and Wa name; combine two
blocks.
http://otc-mesa-ci.jf.intel.com/job/bwidawsk/171/
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We need to check what the 3D pipe is able to handle for the mixer, not what
the decoder is able to decode. This fixes output of resolutions like 720x1280.
Signed-off-by: Christian König <christian.koenig@amd.com>
CC: mesa-stable@lists.freedesktop.org
Before validating vertex arrays we need to check if a VBO is present.
Checking if vb->buffer is not NULL fixes the issue.
Fixes the following piglit test:
gl-3.1-vao-broken-attrib
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
According to nv50, this should be src->ms_y instead of src->ms_x. This
code is here since 2012, so it's probably a typo error which has never
been detected since a long time. I didn't do a full piglit run to check
if it fixes some other weird issues.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
I also created an bug in Khronos 's bugzilla as you suggested:
https://www.khronos.org/bugzilla/show_bug.cgi?id=1356
I'll let you know if I get feedback from this bug or else where.
Patch with updated error messages:
[PATCH] eglplatform: treat __APPLE__ the same way as __unix__ to handle X11 types
CC eglapi.lo
./egldisplay.h:258:19: error: unknown type name 'Display'
_eglGetX11Display(Display *native_display, const EGLint *attrib_list);
eglapi.c:290:4: error: array size is negative
STATIC_ASSERT(sizeof(void*) == sizeof(nativeDisplay));
eglapi.c:291:25: warning: cast to 'void *' from smaller integer type
'EGLNativeDisplayType' (aka 'int') [-Wint-to-void-pointer-cast]
native_display_ptr = (void*) nativeDisplay;
eglapi.c:307:32: error: use of undeclared identifier 'Display'
dpy = _eglGetX11Display((Display*) native_display, attrib_list);
eglapi.c:776:35: error: use of undeclared identifier 'Window'
native_window = (void*) (* (Window*) native_window);
eglapi.c:847:35: error: use of undeclared identifier 'Pixmap'
native_pixmap = (void*) (* (Pixmap*) native_pixmap);
Bugzilla Mesa: https://bugs.freedesktop.org/show_bug.cgi?id=90249
Bugzilla Khronos: https://www.khronos.org/bugzilla/show_bug.cgi?id=1356
Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
This patch and its description are inspired from Jose Fonseca
explanations and suggestions.
With this patch the following logic applies and only if __APPLE__:
When building mesa, GLhandleARB is defined as unsigned long and
at some point casted to GLuint in gl fuction implementations.
These exact points are where these errors and warnings appear.
When building an application GLhandleARB is defined as void*.
Later when calling a gl function, for example glBindAttribLocationARB,
it will be dispatched to _mesa_BindAttribLocation. So internally
void* will be treated as unsigned long which has the same size.
So the same truncation happens when casting it to GLuint.
Same when GLhandleARB appears as return value.
For mesa it will be GLuint -> unsigned long.
For an application it will be GLuint -> unsigned long -> void*.
Note that the value will be preserved when casting back to GLuint.
When GLhandleARB appears as a pointer there are also separate
entry-points, i.e. _mesa_FuncNameARB. So the same logic can
be applied.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=66346
Signed-off-by: Julien Isorce <julien.isorce@gmail.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
With the exception of gen8, the sole user of the workaround bo are for
emitting pipe controls. Move it out of the purview of the batchbuffer
and into the pipecontrol.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Martin Peres <martin.peres@linux.intel.com>
Since there was an ABI break and linking twice against libudev.so.0 and
libudev.so.1 causes the application to quickly crash, we first check if
the application is currently linked against libudev before dlopening a
local handle. However for backwards/forwards compatability, we need to
inspect the application for current linkage against all known versions
first. Not doing so causes a crash when both libraries are present and
so mesa chooses libudev.so.1 but the application was linked against
libudev.so.0.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Emil Velikov:
I'm ever so slightly conserned that RTLD_NOLOAD is not part of the POSIX
standard, thus it's missing on some platforms (*BSD seems ok, while
Solaris, MacOS are not).
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
Move the query for the TIMESTAMP register from context init to the
screen, so that it is only queried once for all contexts.
On 32bit systems, some old kernels trigger a hw bug resulting in the
TIMESTAMP register being shifted and the low 32bits always zero. Detect
this by repeating the read a few times and check the register is
incrementing every 80ns as expected and not stuck on zero (as would be
the case with the buggy kernel/hw.).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Martin Peres <martin.peres@linux.intel.com>
Reviewed-by: Martin Peres <martin.peres@linux.intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Matrix vertex attributes have their columns padded out to vec4s, which
I was failing to account for. Scalar NIR expects them to be packed,
however.
Fixes 1256 dEQP tests on Broadwell.
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Mark Janes <mark.a.janes@intel.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
This allows drivers to report queries in units of microseconds and
have the HUD display "us" (microseconds), "ms" (milliseconds) or "s"
(seconds) on the graph.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Instead of using a boolean 'is bytes' value, use the pipe_driver_query_type
enum type. This will let is add support for time values in the next patch.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This was probably disabled due to a combination of several bugs in the
generator code (fixed earlier in this series) and a misunderstanding
of the hardware spec. The documentation for most control flow
instructions mentions among other restrictions:
"Instruction compression is not allowed."
This however doesn't have any implications on 16 wide not being
supported, because none of the control flow instructions have
multi-register operands (control flow instructions are not compressed
on more recent hardware either, except maybe SNB's IF with inline
compare). In fact Gen4-5 had 16-wide control flow masks and stacks,
and the spec mentions in several places that control flow instructions
push and pop 16 channels worth of data -- Otherwise there doesn't seem
to be any indication that it shouldn't work.
Causes no piglit regressions, and gives the following shader-db
results on ILK:
total instructions in shared programs: 4711384 -> 4711384 (0.00%)
instructions in affected programs: 0 -> 0
helped: 0
HURT: 0
GAINED: 1215
LOST: 0
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
From the hardware docs for the DO instruction:
"Execution size is ignored for this instruction."
My observation on ILK hardware contradicts the spec though, channels
over the execution size of a DO instruction won't enter the loop, and
channels over the execution size of a WHILE instruction will exit the
loop after the first iteration -- The latter is consistent with the
spec though, there's no claim about the execution size being ignored
for the WHILE instruction so it's not completely unexpected that it
has an influence on the evaluation of EMask.
The execute_size argument of brw_DO() shouldn't have any effect on
Gen6 and newer hardware. On Gen4-5 WHILE instructions inherit the
execution size from the matching DO, so this patch should fix them
too. The execution size of BREAK and CONT instructions was already
being set correctly.
Fixes some 50 piglit tests on Gen4-5 when forced to run shaders with
conditional and loop instructions 16-wide,
e.g. shaders/glsl-fs-continue-inside-do-while.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The hardware docs don't mention explicitly what these fields should
be, but I've verified experimentally on ILK that using a GRF as
destination causes the register to be corrupted when the execution
size of an ENDIF instruction is higher than 8 -- and because the
destination we were using was g0, eventually a hang.
Fixes some 150 piglit tests on Gen4-5 when forced to run shaders with
if conditionals 16-wide, e.g. shaders/glsl-fs-sampler-numbering-3.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This eliminates the error prone logic in si_shader_vs recalculating this
value.
It also fixes TGSI_SEMANTIC_CLIPDIST outputs incorrectly not being
counted for VS exports. They need to be counted because they are passed
to the pixel shader as parameters as well.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91193
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The expansion should always be to the same width as the input arguments
no matter what, since these functions should work with any bit width of
the arguments (the sext is a no-op on any sane simd architecture).
Thus, fix the caller expecting differently.
This fixes https://bugs.freedesktop.org/show_bug.cgi?id=91222
Tested-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
In the kernel, this is called __must_check; all our attribute macros in
Mesa appear to be uppercase, so I went with that.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matt Turner <mattst88@gmail.com>
In this bit of code point_five can be NULL if the expression is not a
constant. This fixes it to match the pattern of the rest of the chunk
of code so that it checks for NULLs.
Cc: Matt Turner <mattst88@gmail.com>
Cc: "10.6" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
There is a piece of code that is trying to match expressions of the
form (mul (floor (add (abs x) 0.5) (sign x))). However the check for
the add expression wasn't checking whether it had the expected
operation. It looks like this was just an oversight because it doesn't
match the pattern for the rest of the code snippet. The existing line
to check whether add_expr!=NULL was added as part of a coverity fix in
3384179f.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91226
Cc: Matt Turner <mattst88@gmail.com>
Cc: "10.6" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Ben noticed that I said each PIPE_CONTROL was 4 DWords, but it's
actually 5 DWords on Gen6-7. We've been reserving insufficient space
for performance monitoring on Sandybridge, which means it would likely
break if you used that functionality. (Thankfully, no one does...)
Also, the existing number of 146 was the result of me flubbing up the
arithmetic: it should have actually been 140.
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
On Gen9+ there is a new bit in 3DSTATE_PS_EXTRA that must be set if
the shader sends a message to the pixel interpolator. This fixes the
interpolateAt* tests on SKL, apart from interpolateatsample-nonconst
but that is not implemented anywhere so it's not a regression.
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "10.6 10.5" <mesa-stable@lists.freedesktop.org>
Absolute timeouts are used with the amdgpu kernel driver.
It also makes waiting for several variables and fences at the same time
easier (the timeout doesn't have to be recalculated after every wait call).
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
This will be used by radeon and amdgpu winsyses.
Copied from the amdgpu winsys.
v2: use volatile and p_atomic_read
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Assigns a new array type based on the max access of
unsized array members. This is to support arrays of arrays.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
The first argument to UCMP needs to be compared against 0, but the
latter arguments are treated as float and need to be able to properly
apply neg/abs arguments. Adjust the inferSrcType function accordingly.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.5 10.6" <mesa-stable@lists.freedesktop.org>
After c61bc6e ("util: port _mesa_strto[df] to C"), "make check"
fails due to a missing _mesa_locale_init. Fixup this oversight,
by moving the stand-alone compiler initializer inside
initialize_context_to_defaults().
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Erik Faye-Lund <kusmabite@gmail.com>
Same problem and fix as for nouveau's ZaphodHeads trouble.
See patch ...
"nouveau: Use dup fd as key in drm-winsys hash table to fix ZaphodHeads."
... for reference.
Cc: "10.3 10.4 10.5 10.6" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
If an instruction using address register value gets eliminated, we need
to remove it from the indirects list, otherwise it causes mayhem in
sched for scheduling address register usage.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
A handful of fixes and cleanups:
1) If we split addr/pred, we need the newly created instruction to
end up in the unscheduled_list
2) Avoid scheduling a write to the address register if there is no
instruction using the address register that is otherwise ready
to schedule. Note that I currently don't bother with the same
logic for predicate register, since the only instructions using
predicate (br/kill) don't take any other src registers, so this
situation should not arise.
3) few other cosmetic cleanups
Signed-off-by: Rob Clark <robclark@freedesktop.org>
cp would update instr->address but not update the indirects array
resulting in sched getting confused when it had to 'spill' the address
register. Add an ir3_instr_set_address() helper to set instr->address
and also update ir->indirects, and update all places that were writing
instr->address to use helper instead.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
We need to distinguish a shader that has separate writes to each MRT
from one which is supposed to write the data from MRT 0 to all the MRTs.
In TGSI this is done with a property. NIR doesn't have that, so encode
it as a funny location and decode on the other end.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Eric Anholt <eric@anholt.net>
There was a comment saying that in SIMD16 mode the pixel interpolator
returns coords interleaved 8 channels at a time and that this requires
extra work to support. However, this interleaved format is exactly
what the PLN instruction requires so I don't think anything needs to
be done to support it apart from removing the line to disable it and
to ensure that the message lengths for the send message are correct.
I am more convinced that this is correct because as it says in the
comment this interleaved output is identical to what is given in the
thread payload. The code generated to apply the plane equation to
these coordinates is identical on SIMD16 and SIMD8 except that the
dispatch width is larger which implies no special unmangling is
needed.
Perhaps the confusion stems from the fact that the description of the
PLN instruction in the IVB PRM seems to imply that the src1 inputs are
not interleaved so it wouldn't work. However, in the HSW and BDW PRMs,
the pseudo-code is different and looks like it expects the interleaved
format. Mesa doesn't seem to generate different code on IVB to
uninterleave the payload registers and everything is working so I can
only assume that the PRM is wrong.
I tested the interpolateAt tests on HSW and did a full Piglit run on
IVB on there were no regressions.
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
When there are no color buffer render targets, gen6 and gen7 still
use the first BLEND_STATE element to determine alpha test.
gen6_upload_blend_state was allocating zero elements when
ctx->Color.AlphaEnabled was false.
That left _3DSTATE_CC_STATE_POINTERS or _3DSTATE_BLEND_STATE_POINTERS
pointing to random data from some previous brw_state_batch().
That sometimes suppressed depth rendering when those bits
happened to mean COMPAREFUNC_NEVER.
This produced flickering shadows for dota2 reborn.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=80500
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
These checks were in Mesa prior to commit fbba25bba, but they were
not necessary for the purpose that Mesa intended (check if we could
resolve ReadPixels via memcpy), so that commit took them away.
Unfortunately, it seems that some Gallium drivers rely on these
checks to make the decision of whether they should fallback to Mesa's
implementation of ReadPixels correctly. Michel Dänzer reported that
the following piglit test would fail on radeonsi after commit
fbba25bba:
spec@ext_texture_integer@fbo_integer_readpixels_sint_uint
This patch puts the checks back in Gallium, where they are needed.
Tested-by: Michel Dänzer <michel.daenzer@amd.com>
In the immediate form, src2 == dst, so it does not need to be emitted.
Otherwise it overlaps with the immediate value's low bits.
Fixes: 09ee907266 (nv50/ir: Fold IMM into MAD)
Cc: "10.6" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Prefer blit-based texture transfers only if the chip has dedicated VRAM
since it would translate to a copy into the same memory on shared-memory
chips.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Reported-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
This is required on non-coherent architectures to ensure the value of
the fence is correct at all times. Failure to do this results in the
display freezing for a few seconds every now and then on Tegra.
The NOUVEAU_BO_COHERENT is a no-op for coherent architectures, so behavior
on x86 should not be affected by this patch.
Also bump the required libdrm version to 2.4.62, which introduced this
flag.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Reviewed-by: Martin Peres <martin.peres@free.fr>
Although the horizontal and vertical alignment fields are ignored here,
0 is a reserved value for them and may cause undefined behavior. Change
the default value to an abitrary valid one.
v2: add comment about chosen value (Topi).
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Now that we can create builders with a bigger width than their parent as
long as it's exec_all, we don't need to create the instruction manually.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
This assertion was meant to catch code inadvertently escaping the
control flow jail determined by the group of channel enable signals
selected by some caller, however it seems useful to be able to
increase the default execution size as long as force_writemask_all is
enabled, because force_writemask_all is an explicit indication that
there is no longer a one-to-one correspondence between channels and
SIMD components so the restriction doesn't apply.
In addition reorder the calls to fs_builder::group and ::exec_all in a
couple of places to make sure that we don't temporarily break this
invariant in the future for instructions with exec_size higher than
the dispatch width.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
We want to require different versions for nouveau and nouveau_vieux.
autoconf will only check for NOUVEAU once if both drivers are enabled,
meaning both version checks don't get executed. Rename the nouveau_vieux
one to NVVIEUX to avoid the issue.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Tested-by: Alexandre Courbot <acourbot@nvidia.com>
Tested-by: Martin Peres <martin.peres@free.fr>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Instead of using symbol table, build mask by inspecting IR. This
change is required by further patches to move resource list creation
to happen later when symbol table does not exist anymore.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Martin Peres <martin.peres@linux.intel.com>
The current implementation only moves the joinAt when splitting after
the given instruction, not before it. So if you have a BB with
foo
instr
bar
joinat
and thus with joinAt set, we end up first splitting before instr, at
which point the instr's bb is updated to the new bb. Since that bb
doesn't have a joinAt set (despite containing one), when splitting after
the instr, there is nothing to copy over. Since the joinat will be in
the "split" bb irrespective of whether we're splitting before or after
the instruction, move it over in either case.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91124
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.5 10.6" <mesa-stable@lists.freedesktop.org>
This adds support for ARB_gpu_shader_fp64 and ARB_vertex_attrib_64bit to
llvmpipe.
Two things that don't mix well are SoA and doubles, see
emit_fetch_double, and emit_store_double_chan in this.
I've also had to split emit_data.chan, to add src_chan,
which can be different for doubles.
It handles indirect double fetches from temps, inputs, constants
and immediates. It doesn't handle double stores to indirects,
however it appears the mesa/st doesn't currently emit these,
it always does UARL/MOV combos, which will work fine.
tested with piglit, no regressions, all the fp64 tests seem to pass.
v2:
switch to using shuffles for fetch/store (Roland)
assert on indirect double stores - mesa/st never emits these (it uses MOV)
fix indirect temp/input/constant/immediates (Roland)
typos/formatting fixes (Roland)
v2.1:
cleanup some long lines, emit_store_double_chan cleanups.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
As of now, the width field is no longer used for anything. The width field
"seemed like a good idea at the time" but is actually entirely redundant
with the instruction's execution size. Initially, it gave us the ability
to easily set the instructions execution size based entirely on register
widths. With the builder, we can easiliy set the sizes explicitly and the
width field doesn't have as much purpose. At this point, it's just
redundant information that can get out of sync so it really needs to go.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Acked-by: Francisco Jerez <currojerez@riseup.net>
There are a variety of places where we use dst.width / 8 to compute the
size of a single logical channel. Instead, we should be using exec_size.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Acked-by: Francisco Jerez <currojerez@riseup.net>
Now that all of the non-explicit constructors are gone, we don't need to
guess anymore.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Acked-by: Francisco Jerez <currojerez@riseup.net>
Previously, we were just depending on register widths to ensure that
various things were exec_size of 1 etc. Now, we do so explicitly using the
builder.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Acked-by: Francisco Jerez <currojerez@riseup.net>
Shortly, offset() will depend on the builder so we need it moved to some
place where it has access to that.
Reviewed-by: Iago Toral Quiroga <itoral@igali.com>
Acked-by: Francisco Jerez <currojerez@riseup.net>
Soon we will start using the builder to explicitly set all the execution
sizes. We could make a 32-wide builder, but the builder asserts that we
never grow it which is usually a reasonable assumption. Since this one
instruction is a bit of an odd-ball, we just set the exec_size explicitly.
v2: Explicitly new the fs_inst instead of using the builder and setting
exec_size after the fact.
v3: Set force_writemask_all with the builder instead of directly. The
builder over-writes it if we set it manually. Also, if we don't have
force_writemask_all in the builder it will assert-fail on SIMD32.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Acked-by: Francisco Jerez <currojerez@riseup.net>
Previously, fs_inst::regs_read() fell back to depending on the register
width for the second source. This isn't really correct since it isn't a
SIMD8 value at all, but a SIMD4x2 value. This commit changes it to
explicitly be always one register.
v2: Use mlen for determining the number of registers read
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Acked-by: Francisco Jerez <currojerez@riseup.net>
Previously, we were allocating the payload with different sizes per gen and
then figuring out the mlen in the generator based on gen. This meant,
among other things, that the higher level passes knew nothing about it.
Acked-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This makes things a little simpler, more efficient, and quite a bit more
readable.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Before, we would lazily emit a MOV whenever we encountered a use of a
constant. Now that we have a dedicated file for SSA values, we can
instead only emit the MOV's once, which is more consistent and prevents
us from relying on CSE to re-combine the constants when they aren't
absorbed into the instruction.
total instructions in shared programs: 6078991 -> 6073118 (-0.10%)
instructions in affected programs: 402221 -> 396348 (-1.46%)
helped: 1527
HURT: 0
GAINED: 8
LOST: 2
v2: split this out from the previous commit (Jason)
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Before, we would use registers, but set a magical "parent_instr" field
to indicate that it was actually purely an SSA value (i.e., it wasn't
involved in any phi nodes). Instead, just use SSA values directly, which
lets us get rid of the hack and reduces memory usage since we're not
allocating a nir_register for every value. It also makes our handling of
load_const more consistent compared to the other instructions.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
We already don't convert constants out of SSA, and in our backend we'd
like to have only one way of saying something is still in SSA.
The one tricky part about this is that we may now leave some undef
instructions around if they aren't part of a phi-web, so we have to be
more careful about deleting them.
v2: rename and flip meaning of flag (Jason)
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
0 is not used as a valid drawable id, as such there is no point in
attempting to query its geometry. Just bail out early and provide the
more meaningful EGL_BAD_NATIVE_WINDOW to the user.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Raise EGL_BAD_NATIVE_WINDOW instead of crashing.
v2: s/Rise/Raise/ (spotted by Michel)
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Free the memory for dri2_surf in the unlikely case that one provides
NULL for native_window. Also set the relevant EGL_ERROR to provide
feedback to the user.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Previously we were unconditionally doing ttn_get_src() even for
instructions with no src's. Which created a lot of unnecessary
load_const instructions. These were mostly harmless since NIR opt
passes would strip them back out. But for an ENDIF following a
BRK, it would result in load_const instructions created after the
NIR break instruction. Which nir_validate dislikes.
But we can actually just dtrt by using NumSrcRegs instead.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
It isn't quite yet practical to enable TGSI_ANY_INOUT_DECL_RANGE shader
cap yet, at least not in drivers that need lower_to_scalar pass (which
right now is all of the ttn users), since the register arrays do not get
converted to SSA, which angers nir_lower_alu_to_scalar.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
It is silly to traverse back to find first instruction that writes part
of a larger "virtual" register many times per instruction (plus per use
as a src to later instructions). Cache this information so we only
figure it out once.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
The fanin source could be grouped, for example with shaders like:
VERT
DCL IN[0]
DCL IN[1]
DCL OUT[0], POSITION
DCL OUT[1], GENERIC[9]
DCL SAMP[0]
DCL SVIEW[0], 2D, FLOAT
DCL TEMP[0], LOCAL
0: MOV TEMP[0].xy, IN[1].xyyy
1: MOV TEMP[0].w, IN[1].wwww
2: TXF TEMP[0], TEMP[0], SAMP[0], 2D
3: MOV OUT[1], TEMP[0]
4: MOV OUT[0], IN[0]
5: END
The second arg to the isaml is IN[1].w, so we need to look at the fanin
source to get the correct offset.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Split out most of dump_info() from ir3_cmdline compiler into a function
that can be used both by cmdline compiler and also for the disasm debug
option. This way, for FD_MESA_DEBUG=disasm we also get to see intput/
output registers, etc.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Some piglit tests, like arb_fragment_program-sparse-samplers, result in
having a null samp#0 but valid samp#1.
TODO: a3xx probably needs similar fix
Signed-off-by: Rob Clark <robclark@freedesktop.org>
We can't rely on what we get from the assembler if we have indirect
addressing of constant file, since the assembler doesn't know the array
index. This got lost in the transition to NIR.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Desktop GLSL < 130 and GLSL ES < 300 allow sampler array indexing where
index can contain a loop induction variable. This extra check will warn
during linking if some of the indexes could not be turned in to constant
expressions.
v2: warning instead of error for backends that did not enable
EmitNoIndirectSampler option (have dynamic indexing)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Cc: "10.5" and "10.6" <mesa-stable@lists.freedesktop.org>
Patch provides new compiler option for backend to force unroll loops
that have non-constant expression indexing on sampler arrays.
This makes sure that we can never end up with a shader that uses loop
induction variable as sampler array index but does not unroll because
of having too much instructions. This would not work without dynamic
indexing support.
v2: change option name as EmitNoIndirectSampler
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Cc: "10.5" and "10.6" <mesa-stable@lists.freedesktop.org>
Dynamic indexing of sampler arrays is prohibited by GLSL ES 3.00.
Earlier versions allow 'constant-index-expression' indexing, where
index can contain a loop induction variable.
Patch allows dynamic indexing for sampler arrays when GLSL ES < 3.00.
This change makes 'sampler-array-index.frag' parser test in Piglit
pass + fishgl.com works when running Chrome on OpenGL ES 2.0 backend
v2: small change and some more commit message (Tapani)
v3: refactor checks to make it more readable (Ian Romanick)
v4: change warning comment in GLSL ES case (Curro)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Signed-off-by: Kalyan Kondapally <kalyan.kondapally@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Cc: "10.5" and "10.6" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=84225
From the "apparently I don't know C" files...GCC apparently supports:
x ?: y
which is equivalent to
x ? x : y
except that it doesn't cause side-effects to occur twice. See:
https://gcc.gnu.org/onlinedocs/gcc/Conditionals.html#Conditionals
This was confusing and looked like a typo. It doesn't really buy us
anything, so just write the obvious code in normal C.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
A clear will do a partial validate, which will in turn reference all the
buffers in the bufctx again. However the fragprog last validated might
have already been deleted. So reset the bufctx when updating state.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
This patch enables using XY_FAST_COPY_BLT only for Yf/Ys tiled buffers.
It can be later turned on for other tiling patterns (X,Y) too.
V3: Flush in between sequential fast copy blits.
Fix src/dst alignment requirements.
Make can_fast_copy_blit() helper.
Use ffs(), is_power_of_two()
Move overlap computation inside intel_miptree_blit().
V4: Use _mesa_regions_overlap() function.
Add check for src_buffer == dst_buffer.
Simplify horizontal and vertical alignment computations.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
In case of I915_TILING_{X,Y} we need to pass tiling format to libdrm
using drm_intel_bo_alloc_tiled(). But, In case of YF/YS tiled buffers
libdrm need not know about the tiling format because these buffers
don't have hardware support to be tiled or detiled through a fenced
region. libdrm still need to know buffer alignment value for its use
in kernel when resolving the relocation.
Using drm_intel_bo_alloc_for_render() for YF/YS tiled buffers
satisfy both the above conditions.
V2: Delete min/max buffer size restrictions not valid for i965+.
Remove redundant align to tile size statements.
Remove some redundant code now when there are no min/max buffer size.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Buffers with Yf/Ys tiling end up using meta upload / download
paths or the blitter for cases where they used tiled_memcpy paths
in case of Y tiling. This has exposed some bugs in meta path. To
avoid any piglit regressions on SKL this patch keeps the Yf/Ys
tiling disabled at the moment.
V3: Make brw_miptree_choose_tr_mode() actually choose TRMODE. (Ben)
Few cosmetic changes.
V4: Get rid of brw_miptree_choose_tr_mode().
Take care of all tile resource modes {Yf, Ys, none} for all
generations at one place.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
In order to save a small leak if mesa is continously loaded and
unloaded, let's free the locale when the shared object is unloaded.
Signed-off-by: Erik Faye-Lund <kusmabite@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
_mesa_strtod and _mesa_strtof are only used from the GLSL compiler and
the ARB_[vertex|fragment]_program code, meaning that the locale doesn't
need to be initialized before the first OpenGL context gets initialized.
So let's use explicit initialization from the one-time init code instead
of depending on a C++ compiler to initialize at image-load time.
Signed-off-by: Erik Faye-Lund <kusmabite@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
This function only gets called while mesa is unloading, so there's
no potential of racing or multiple calls at the same time. So let's
just get rid of the locking.
Signed-off-by: Erik Faye-Lund <kusmabite@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
There's no point in calling _mesa_destroy_shader_compiler multiple
times on exit; the resources will only be released once anyway.
So let's move the atexit-call into the part that is only called
once.
Signed-off-by: Erik Faye-Lund <kusmabite@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
This function is for deleting per-screen resources, and the shader
compiler resources are not of such nature. Besides, dri shouldn't
need to even know about the presence of a shader compiler.
These resources will already be released when mesa gets unloaded,
and that should be sufficient.
Signed-off-by: Erik Faye-Lund <kusmabite@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
All of these enums are now in use around in the code, so there's no need
to explicitly use them here any more.
Signed-off-by: Erik Faye-Lund <kusmabite@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Work-group size should always be aligned to subgroup size; this is a
basic requirement, otherwise some work-items will be no-operation.
It might make sense to refine the value according to a kernel's
resource usage, but that's a possible optimization for the future.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Since commit 104c8fc2c2 the GLSL IR will be freed if NIR is
being used. This was causing it to segfault if INTEL_DEBUG=wm is set.
This patch just makes it avoid dumping the GLSL IR in that case.
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
This reverts commit c2ff3485b3.
Ilia and I noticed a memory leak caused by this patch: at least with
fixed-function programs, we clone things using ProgramResourceList as
the context before reralloc makes it non-NULL.
I believe Tapani found other bugs with these patches, so I'm just going
to revert them for now and let him pursue them further.
Still appears to have issues with negative indices less than -1M, but
that's a corner case of a corner case.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Legacy user clipping (using gl_Position or gl_ClipVertex) is handled by
turning those into the modern gl_ClipDistance equivalents.
This is unnecessary in Core Profile: if user clipping is enabled, but
the shader doesn't write the corresponding gl_ClipDistance entry,
results are undefined. Hence, it is also unnecessary for geometry
shaders.
This patch moves the call up to run_vs(). This is equivalent for VS,
but removes the need to pass clip distances into emit_urb_writes().
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
According to the "URB SIMD8 Write > Write Data Payload" documentation,
"The write data payload can be between 1 and 8 message phases long."
Apparently, the simulator considers it an error if you issue an URB
SIMD8 message with only a header and no actual data to write.
v2: Try to put in a better PRM citation, now that the Broadwell docs
actually exist (requested by Jordan).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
The HUD doesn't check if query_create() fails and it calls other
pipe_query functions with NULL pointer instead of a valid query object.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The dup'ed fd owned by the nouveau_screen for a device node
must also be used as key for the winsys hash table, instead
of using the original fd passed in for a screen, to make
multi-x-screen ZaphodHeads configurations work on nouveau.
The original fd's lifetime differs from that of the nouveau_screen stored
in the hash. The hash key is the fd, and in order to compare hash entries
we fstat them, so the fd must be around for as long as the screen is.
This is an extension of the fix in commit a59f2bb1 (nouveau: dup fd
before passing it to device).
Cc: "10.3 10.4 10.5 10.6" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
The meta code was setting a default depth range for all viewports
and 'restoring' all viewports to depth range values saved from viewport 0.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This isn't pretty and I'd suggest it the pm4 interface builder
could be tweaked to do this more efficently, but I'd need
guidance on how that would look.
This seems to pass the few piglit tests I threw at it.
v2: handle passing layer/viewport index to fragment shader.
fix crash in blit changes,
add support to io_get_unique_index for layer/viewport index
update docs.
v3: avoid looking up viewport index and layer in es (Marek).
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
brw_miptree_layout_2d tries to ensure that mt->total_width is a
multiple of the compressed block size, presumably because it wouldn't
be possible to make an image that has a fraction of a block. However
it was doing this by aligning mt->total_width to align_w. Previously
align_w has been used as a shortcut for getting the block width
because before Gen9 the block width was always equal to the alignment.
Commit 4ab8d59a2 tried to fix these cases to use the block width
instead of the alignment but it missed this case.
I think in practice this probably won't make any difference because
the buffer for the texture will be allocated to be large enough to
contain the entire pitch and libdrm aligns the pitch to the tile width
anyway. However I think the patch is worth having to make the
intention clearer.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Since the addition of ilo_vma, it was used only to pad a bo for sampling
engine surfaces. Replace it entirely with these functions
ilo_state_surface_buffer_size()
ilo_state_vertex_buffer_size()
ilo_state_index_buffer_size()
ilo_state_sol_buffer_size()
readpixels_can_use_memcpy will later call _mesa_format_matches_format_and_type
which does much tighter checks than these to decide if we can use
memcpy for readpixels.
Also, the checks do not seem to be extensive enough anyway, since we are
checking for signed/unsigned conversion only when the framebuffer has integers,
but the same checks could be done for other types anyway, since as long as
there is a signed/unsigned conversion we can't memcpy.
No regressions observed on i965/llvmpipe.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
From Muchnick's Advanced Compiler Design and Implementation:
"To determine which variables are live at each point in a flowgraph, we
perform a backward data-flow analysis"
Previously, we were walking the blocks forwards and updating the livein and
then the liveout. However, the livein calculation depends on the liveout
and the liveout depends on the successor blocks. The net result is that it
takes one full iteration to go from liveout to livein and then another
full iteration to propagate to the predecessors. This works out to an
O(n^2) computation where n is the number of blocks. If we run things in
the other order, it's O(nl) where l is the maximum loop depth which is
practically bounded by 3.
In b2c6ba0c4b, we made this same change in
the FS backend to great effect. Might as well keep it consistent and make
the same change for vec4. Also, this took the time to run the test:
ES31-CTS.arrays_of_arrays.InteractionFunctionCalls1
from 6:49.62 to 3:31.40 on Timothy Arceri's machine.
Reviewed-by: Matt Turner <mattst88@gmail.com>
gen8 had some special restrictions which don't seem to carry over to gen9.
Quoting the spec for SKL:
"The Z_Height and Z_Width values must equal those present in
3DSTATE_DEPTH_BUFFER incremented by one."
This fixes nothing in piglit (and regresses nothing).
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
This is always 0 - only brw_workaround_depthstencil_alignment ever sets
it, and that doesn't run on Gen6+. My initial Broadwell depth state
commit had this mistake.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
We already recognize min(max(a, 0.0), 1.0) as a saturate, but neglected
this variant (which is also handled by the GLSL IR pass).
shader-db results on Broadwell:
total instructions in shared programs: 7363046 -> 7362788 (-0.00%)
instructions in affected programs: 11928 -> 11670 (-2.16%)
helped: 64
HURT: 0
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
The thread counts and URB information are all speculative numbers that were
based on some CHV numbers at the time.
v2:
Originally this patch had PCI IDs. I've moved that to a new patch at the end of
the series.
Remove is_cherryview hack.
Add PCI ids. These match the ones defined in the kernel. The only one tested by
us is 0x0a84.
Capitalize the hex string (Mark)
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Tested-by: "Lecluse, Philippe" <Philippe.Lecluse@intel.com>
Reviewed-by: Mark Janes <mark.a.janes@intel.com>
Commit b765119c changed the default value of all the counter bits to
64. However, older hardware only has 32 counter bits.
This has only been build-tested. We don't have any tests that verify
the advertised value against implementation behavior, so I don't know
what additional testing could be done.
NOTE: It appears that many Gallium drivers (at least r300 and i915g)
have the same problem, but I don't see a way for the state-tracker to
determine the counter size. Marek says, "For Gallium, a new PIPE_CAP or
new get_xxx_param function will be needed."
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
From Muchnick's Advanced Compiler Design and Implementation:
"To determine which variables are live at each point in a flowgraph, we
perform a backward data-flow analysis"
Previously, we were walking the blocks forwards and updating the livein and
then the liveout. However, the livein calculation depends on the liveout
and the liveout depends on the successor blocks. The net result is that it
takes one full iteration to go from liveout to livein and then another
full iteration to propagate to the predecessors. This works out to an
O(n^2) computation where n is the number of blocks. If we run things in
the other order, it's O(nl) where l is the maximum loop depth which is
practically bounded by 3.
On my HSW desktop, one particular shadertoy test gets a 20% improvement in
compile times:
N Min Max Median Avg Stddev
x 10 15.965 16.884 16.026 16.1822 0.34736846
+ 10 12.813 13.052 12.876 12.8891 0.06913666
Difference at 95.0% confidence
-3.2931 +/- 0.235316
-20.3501% +/- 1.45417%
(Student's t, pooled s = 0.250444)
Reviewed-by: Matt Turner <mattst88@gmail.com>
This is based on Kenneth's patch to delete 'most of the IR'. Due to
linker changes to clone variables, we can now free all of IR.
Saves 58MB of memory when replaying a Dota 2 trace on Broadwell.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: mesa-stable@lists.freedesktop.org
This increases memory pressure during linking but makes it easier
for backend to free IR after it is not needed anymore.
v2: use resource list as ralloc context in case of relink (Kenneth)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: mesa-stable@lists.freedesktop.org
Start trimming the fat from intel_batchbuffer.c. First by moving the set
of routines for emitting PIPE_CONTROLS (along with the lore concerning
hardware workarounds) to a separate brw_pipe_control.c
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This caused us to always free the pipe_surface for the renderbuffer.
The subsequent call to st_update_renderbuffer_surface() would typically
just recreate it. Remove the call to pipe_surface_release() and let
st_update_renderbuffer_surface() take care of freeing the old surface
if it needs to be replaced (because of change to mipmap level, etc).
This can save quite a few calls to pipe_context::create_surface() and
surface_destroy().
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Fixes the following build issue, when building without libudev.
CCLD libGL.la
./.libs/libglx.a(dri2_glx.o): In function `dri2CreateScreen':
src/glx/dri2_glx.c:1186: undefined reference to `loader_open_device'
collect2: ld returned 1 exit status
CCLD libEGL.la
Undefined symbols for architecture x86_64:
"_loader_open_device", referenced from:
_dri2_initialize_x11_dri2 in libegl_dri2.a(platform_x11.o)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91077
Signed-off-by: Julien Isorce <j.isorce@samsung.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
1000 ms is an extreme value for typical interactive loads. A large
cache has some disadvantages. Search for reusable BOs can take a long
time and memory might get exhausted.
Let's be rather conservative and use half of the old value,
500ms. This is beneficial to some loads on my test system and there
are no regressions.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This is the basic granularity for BO allocations. The alignment also
helps with BO reuse by the cached bufmgr.
This results in a huge 45% speedup in Metro 2033 Redux on my test
system. The game relies on buffer orphaning with very small buffers
(hundreds of bytes in size) and that did not work efficiently
before. This change may also affect other applications and games.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Matt, Jason, and I haven't found this useful in a long time.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
It returns a new value for each sample in the TLB. We've already avoided
trying to get the same index's color multiple times at the vc4_program.c
level, so we're not losing anything by doing this.
As of this commit, nothing actually needs the brw_context.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
This way we can stop doing is_gles3 checks inside of the compiler.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Previously, these were pulled out of the GL context conditionally based on
whether we were running ff/ARB or a GLSL program. Now, we just pass them
in so that the visitor doesn't have to grab them itself.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Previously, each shader took 3 shader time indices which were potentially
at arbirary points in the shader time buffer. Now, each shader gets a
single index which refers to 3 consecutive locations in the buffer. This
simplifies some of the logic at the cost of having a magic 3 a few places.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
This creates the options at screen cration time and then we just copy them
into the context at context creation time. We also move is_scalar to the
brw_compiler structure.
We also end up manually setting some values that the core would have set by
default for us. Fortunately, there are only two non-zero shader compiler
option defaults that we aren't overriding anyway so this isn't a big deal.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
While we're at it, we'll drop the note about 10-20% performance loss.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
v2 (Ken): Make shader_debug_log a printf-like function.
v3 (Jason): Add a void * to pass the brw_context through
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Coverity sees the if (mode >= BRW_PRIM_OFFSET (128)) test and assumes
that the else-branch might execute for mode to up 127, which out be out
of bounds.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Coverity sees that the functions immediately below the new assertions
dereference these pointers, but is unaware that an ENDIF always follows
an IF, etc.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
On gen9+ MOCS is an index into a table. It is 7 bits, and AFAICT, bit 0 is for
doing encrypted reads.
I don't recall how I decided to do this for BXT. I don't know this patch was
ever needed, since it seems nothing is broken today on SKL. Furthermore, this
patch may no longer be needed because of the ongoing changes with MOCS setup. It
is what is being used/tested, so it's included in the series.
The chosen values are the old values left shifted. That was also an arbitrary
choice.
v2: Use shift in MOCS to make it clear what we're doing. (Ken)
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Without first running the bo through pushbuf_refn, the nouveau drm
library will have uninitialized structures regarding this bo, and will
insert incorrect data.
This fixes supertuxkart 0.9 crash on start (where it ends up doing a lot
of indirect draws).
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.5 10.6" <mesa-stable@lists.freedesktop.org>
Since we clear the TFB bufctx binding point above, we need to put all of
the active tfb's back in, even if they haven't changed since last time.
Otherwise the tfb may get moved into sysmem and the underlying mapping
will generate write errors.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.5 10.6" <mesa-stable@lists.freedesktop.org>
The only reason we touch glapi is to dlopen it in order to:
- make sure that the unresolved _glapi* symbols in the dri modules are
provided.
- fetch glFlush() and use it at various stages in the dri2 driver.
Cc: Chih-Wei Huang <cwhuang@linux.org.tw>
Cc: Eric Anholt <eric@anholt.net>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
The whole of GBM does not rely on even a single symbol from the GL
dispatch library, unsuprisingly. The only need for it comes from the
unresolved symbols in the DRI modules, which are now correctly handled
with Frank's commit.
Cc: "10.5 10.6" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Dri driver libs are not linked to pull in libglapi so gbm_create_device()
fails when it tries to dlopen them (unless the application is linked
with something that does pull in libglapi, like libGL).
Until dri drivers can be fixed properly, dlopen libglapi before trying
to dlopen them.
Cc: "10.5 10.6" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Frank Henigman <fjhenigman@google.com>
[Emil Velikov: Drop misleading bugzilla link, mention that libname differs]
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Already handled in the Makefile which includes the drivers/x11 subdir.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
We've moved the open with CLOEXEC idiom into a helper function, so
call it instead of duplicating the code.
This also replaces a couple of opens that didn't properly do CLOEXEC.
Signed-off-by: Derek Foreman <derekf@osg.samsung.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
We've moved the open with CLOEXEC idiom into a helper function, so
call it instead of duplicating the code here.
Signed-off-by: Derek Foreman <derekf@osg.samsung.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
This is already our common idiom for opening files with CLOEXEC and
it's a little ugly, so let's share this one implementation.
Signed-off-by: Derek Foreman <derekf@osg.samsung.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Replacing dup() with fcntl F_DUPFD_CLOEXEC creates the duplicate
file descriptor with CLOEXEC so it won't be leaked to child
processes if the process fork()s later.
Signed-off-by: Derek Foreman <derekf@osg.samsung.com>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
If the shader doesn't specify number of invocations, assume one.
This fixes geometry shaders on state trackers other than Mesa (and
probably graw tests too.)
Trivial.
This extends the draw code to add support for invocations.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This is just for softpipe, llvmpipe won't work without
some changes.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This is required for ARB_gpu_shader5 support in softpipe.
v2: add support to txd/txf/txq paths.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
These are basically just moves, so they should be safe as well.
When disabling i965's GLSL IR level scalarizer (channel expressions)
pass, I started seeing NIR code like this:
if ssa_21 {
block block_1:
/* preds: block_0 */
vec4 ssa_120 = vec4 ssa_82, ssa_83, ssa_84, ssa_30
/* succs: block_3 */
} else {
block block_2:
/* preds: block_0 */
/* succs: block_3 */
}
block block_3:
/* preds: block_1 block_2 */
vec4 ssa_33 = phi block_1: ssa_120, block_2: ssa_2
Previously, the GLSL IR scalarizer pass would break the vec4 into a
series of fmovs, which were allowed by the peephole pass. But with
the vec4 operation, they were not. We want to keep getting selects.
Normal i965 on Broadwell:
instructions in affected programs: 200 -> 176 (-12.00%)
helped: 4
With brw_fs_channel_expressions() disabled:
instructions in affected programs: 1832 -> 1646 (-10.15%)
helped: 30
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
This was originally only used by the vertex shader, but it's now used by
the geometry shader as well, and will also eventually be used for
tessellation control and evaluation shaders.
I suspect it will be easier to find in a file named after the concept.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
This implements a workaround (exact excerpt as a comment in the code). The docs
specify [clearly, after you struggle for a while] that the offset isn't relative
to state base. This actually makes sense. This fixes hangs on SKL.
Buffer #0 is meant to be used for normal uniforms.
Buffer #1 is typically used for gather constants when using RS.
Buffer #1-#3 could be used to push a bunch of UBO data which would just be
somewhere in memory, and not relative to the dynamic state.
NOTE: I've moved away from the ternary operator for the new gen9 conditions.
Admittedly it's probably not great to do this, but I really want to fix this all
up in the subsequent patch and doing it here makes that diff a lot nicer. I want
to split out the gen8/9 code to make the function a bit more readable, but to
keep this easily cherry-pickable I am doing this fix first. If we decide not to
merge the cleanup patch then I can revisit this.
Cc: "10.5 10.6" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Tested-by: Valtteri Rantala <Valtteri.rantala@intel.com>
It allows us to remove ilo_ib_state::draw_start_offset and
ILO_PRIM_RECTANGLES. gen6_3d_translate_pipe_prim() is also replaced by
ilo_translate_draw_mode().
Use the newly-introduced NV_VRAM_DOMAIN() macro to support alternative
VRAM domains for chips that do not have dedicated video memory.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Martin Peres <martin.peres@free.fr>
Some GPUs (e.g. GK20A, GM20B) do not embed VRAM of their own and use
the system memory as a backend instead. For such systems, allocating
objects in VRAM results in errors since the kernel will not allow
VRAM objects allocations.
This patch adds a vram_domain member to struct nouveau_screen that can
optionally be initialized to an alternative domain to use for VRAM
allocations. If left untouched, NOUVEAU_BO_VRAM will be used for
systems that embed VRAM, and NOUVEAU_BO_GART will be used for VRAM-less
systems.
Code that uses GPU objects is then expected to use the NV_VRAM_DOMAIN()
macro in place of NOUVEAU_BO_VRAM to ensure correct behavior on
VRAM-less chips.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Martin Peres <martin.peres@free.fr>
For query_levels, we generate a getinfo with writemask of (z), which RA
will consider as size==3. But we were still generating four fanouts.
Which meant that RA would see it as two different register classes,
depending on the path to definer. Ie. on the getinfo instruction itself
it would see size==3, but when chasing back through the fanouts it would
see size==4.
Easiest way to solve that is to just generate the chain of neighboring
fanouts to have the correct size in the first place.
Note: we may eventually want split_dest() to take start/end or wrmask
instead, since really we only need size==1. But RA is not clever enough
for that, query_levels is not that common, and the other two registers
that get allocated are never used so those register slots can be
immediately re-used. So bunch of work for probably no real gain.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
We get this information from NIR (which gets it from sview decl in tgsi
when translating from tgsi), so no need to maintain shader variants for
this.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
This shuffles things around to allow the shader to have multiple basic
blocks. We drop the entire CFG structure from nir and just preserve the
blocks. At scheduling we know whether to schedule conditional branches
or unconditional jumps at the end of the block based on the # of block
successors. (Dropping jumps to the following instruction, etc.)
One slight complication is that variables (load_var/store_var, ie.
arrays) are not in SSA form, so we have to figure out where to put the
phi's ourself. For this, we use the predecessor set information from
nir_block. (We could perhaps use NIR's dominance frontier information
to help with this?)
Signed-off-by: Rob Clark <robclark@freedesktop.org>
These belong in the shader, rather than the block. Mostly a lot of
churn and nothing too interesting. But splitting this out from the
rest of ir3_block reshuffling to cut down the noise in the later
patch.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Right now, just provides a cleaner way to get at the gpu-id, given the
separation between compiler and context. But we will need this also to
hold the reg-set for new register allocation.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Use a more standard priority-queue based scheduling algo. It is simpler
and will make things easier once we have multiple basic blocks and flow
control.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Use standard list_head double-linked list and related iterators,
helpers, etc, rather than weird combo of instruction array and next
pointers depending on stage. Now block has an instrs_list. In
certain stages where we want to remove and re-add to the blocks list
we just use list_replace() to copy the list to a new list_head.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
At least for now.. right now the instruction and instruction list
printing should suffice, and the re-working of ir3_block would require
a lot of changes in that code.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Use ir3_MOV() builder in a couple of spots, rather than open-coding the
instruction construction. Also add ir3_NOP() builder and use that
instead of open coding.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Freedreno needs sampler type information to deal with int/uint textures.
To accomplish this, start creating sampler-view declarations, as
suggested here:
http://lists.freedesktop.org/archives/mesa-dev/2014-November/071583.html
create a sampler-view with index matching the sampler, to encode the
texture type (ie. SINT/UINT/FLOAT). Ie:
DCL SVIEW[n], 2D, UINT
DCL SAMP[n]
TEX OUT[1], IN[1], SAMP[n]
For tgsi texture instructions which do not take an explicit SVIEW
argument, the SVIEW index is implied by the SAMP index.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Some hardware needs to know the sampler type. Update the blit related
shaders to include SVIEW decl.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
To allow for shaders which use SVIEW decls for TEX* instructions, we
need to preserve the constraint that the shader either has no SVIEW's or
it has one matching SVIEW for each SAMP.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
To allow for shaders which use SVIEW decls for TEX* instructions, we
need to preserve the constraint that the shader either has no SVIEW's or
it has one matching SVIEW for each SAMP.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This was a hack as part of debugging some glamor-on-GLES2 behavior that
ended up being an xserver bug. I suspect we can just flip this extension
on for GLES2, but the spec says it requires 3.1.
We need to make sure that when we store the aligned box, we've got
initialized contents in the border. We could potentially just load the
border area, but for now let's get text rendering working in X (and fix
the GL_TEXTURE_2D errors in piglit's texsubimage test and
gl-2.1-pbo/test_tex_image)
Being a parameter-like state, we may want to get rid of
ilo_state_vertex_buffer_info or ilo_state_vertex_buffer eventually. But we
want them now as they are how we do cross-validation right now.
3DSTATE_VF_INSTANCING specifies instancing enable and step rate. They are
specified along with 3DSTATE_VERTEX_BUFFERS instead prior to Gen8. Both
commands are added.
3DSTATE_VF specifies cut index enable and cut index. Cut index enable is
specified in 3DSTATE_INDEX_BUFFER instead prior to Gen7.5. Both commands are
added.
ilo_shader.c: In function ‘ilo_shader_select_kernel_sbe’:
ilo_shader.c:1140:27: warning: ‘src_skip’ may be used uninitialized in this
function [-Wmaybe-uninitialized]
The original code meant to do this, but was only checking num_samples == 1 to
figure out if a surface was fast clear capable. However, we can allocate single
sample miptrees with num_samples == 0 (when it's an internally created buffer).
This fixes a bunch of the piglit tests on gen8. Other gens should have been
fine.
Here is the order of events that allowed this to slip through:
t0: I wrote halign patches and tested them. These alignment assertions are for
gen8 fast clear surfaces, basically.
t1: I pushed bogus perf patch which made fast clears never happen
t2: Reworked halign patches based on Chad's feedback and introduced the bug this
patch fixes.
t2.5: I tested reworked patches, but assertion wasn't hit because of t1.
t3. Matt fixed issue in t1 which made fast clears happen here:
commit 22af95af83
Author: Matt Turner <mattst88@gmail.com>
Date: Thu Jun 18 16:14:50 2015 -0700
i965: Add missing braces around if-statement.
This logic should match that of the v1 of my halign patch series.
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: Matt Turner <mattst88@gmail.com>
Reported-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Tested-by: Mark Janes <mark.a.janes@intel.com>
When adding EXT_polygon_offset_clamp, I first made it core-only, and
never moved the enum getter back to the GL/GL_CORE section. Similarly,
ARB_gs5 is a core-only extension, so move its getters to the GL_CORE
section.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Matt Turner <mattst88@gmail.com>
If the driver says PIPE_CAP_VERTEX_ELEMENT_SRC_OFFSET_4BYTE_ALIGNED_ONLY=1,
the driver should never receive a pipe_vertex_element::src_offset value
that's not a multiple of four. But the vbuf code wasn't actually adjusting
the src_offset value when creating the vertex element state object.
We just need to align the src_offset values put in the driver_attribs[]
array.
See the piglit gl-1.5-vertex-buffer-offsets test.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
There are three possible return values (not two): WGL_SWAP_COPY_ARB,
WGL_SWAP_EXCHANGE_EXT and WGL_SWAP_UNDEFINED_ARB.
VMware bug 1431184
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Also, print a warning if we do return NULL from wglGetProcAddress() to
help spot this sort of problem in the future.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Viewperf 12 calls wglGetProcAddress() to get pointers to some unsupported
DSA and half-float functions. We return NULL but Viewperf doesn't check
for null before trying to jump through the pointer. That causes a crash.
This patch adds no-op functions to call instead (used by the next patch).
This avoids the crash but the rendering is incorrect.
Some DSA functions are being added to Mesa at this time so we may be
able to remove some of these no-ops in the future.
More no-op functions may be added as needed.
VMware PR1383421
Reviewed-by: José Fonseca <jfonseca@vmware.com>
WGL_CONTEXT_PROFILE_MASK_ARB doesn't apply to desktop OpenGL versions
less than 3.2 -- applications can't specify whether they want a core or
a compat 3.1 context -- instead they are supposed the check whether the
returned context advertises GL_ARB_compatibility extension.
Mesa doesn't support compatability contexts for version higher than 3.1,
so we used to return core profile context, but this makes several Windows
applications unhappy, because they just assume they got a compatability
context without checking.
So it seems safer to on Windows to never return core profile for 3.1,
ie, just fail the context creation.
VMware PR1365920.
Reviewed-by: Brian Paul <brianp@vmware.com>
Create pixel formats with 0, 4, 8 and 16 samples per pixel.
Add a SVGA_FORCE_MSAA env var to force creating all pixel formats
with a particular sample count. This is useful for testing Mesa/GLUT/
etc. programs which don't ordinarily use multisample.
Reviewed-by: Matthew McClure <mcclurem@vmware.com>
Update piglit link to the current Piglit website.
Add note about updating patchwork when sending patch revisions.
Acked-by: Matt Turner <mattst88@gmail.com>
Tested with Ilia Mirkin's gzdoom.trace and
"arb_uniform_buffer_object-maxuniformblocksize fsexceed" piglit test
without my earlier fix to fail linkage when UBO exceeds
GL_MAX_UNIFORM_BLOCK_SIZE.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
It's not totally clear whether other Mesa drivers can safely cope with
over-sized UBOs, but at least for llvmpipe receiving a UBO larger than
its limit causes problems, as it won't fit into its internal display
lists.
This fixes piglit "arb_uniform_buffer_object-maxuniformblocksize
fsexceed" without regressions for llvmpipe.
NVIDIA driver also fails to link the shader from
"arb_uniform_buffer_object-maxuniformblocksize fsexceed".
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=65525
PS: I don't recommend cherry-picking this for Mesa stable, as some app
might inadvertently been relying on UBOs larger than
GL_MAX_UNIFORM_BLOCK_SIZE to work on other drivers, so even if this
commit is universally accepted it's probably best to let it mature in
master for a while.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
gl_NumSamples should only be enabled when ARB_sample_shading is enabled.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Although we don't support SIMD32, krh pointed out that the left shift
by 32 is undefined by C/C++ for 32-bit integers.
Suggested-by: Kristian Høgsberg <krh@bitplanet.net>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This was apparently missed when ARB_sso support was added.
Add label support to pipeline objects just like all the other
debug-related objects.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
Cc: "10.5 10.6" <mesa-stable@lists.freedesktop.org>
This avoids a security issue where userspace could have written the tile
state/tile alloc behind the GPU's back, and will apparently be necessary
for fixing stability bugs (tile state buffers are missing some top bits
for the tile alloc's address).
We can't use sampler messages with gradient information (like
sample_g or sample_d) to deal with this scenario because according
to the PRM:
"The r coordinate and its gradients are required only for surface
types that use the third coordinate. Usage of this message type on
cube surfaces assumes that the u, v, and gradients have already been
transformed onto the appropriate face, but still in [-1,+1] range.
The r coordinate contains the faceid, and the r gradients are ignored
by hardware."
Instead, we should lower this to compute the LOD manually based on the
gradients and use a different sample message that takes the computed
LOD instead of the gradients. This is already being done in
brw_lower_texture_gradients.cpp, but it is restricted to shadow
samplers only, although there is a comment stating that we should
probably do this also for samplerCube and samplerCubeArray.
Because of this, both dEQP and Piglit test cases for textureGrad with
cube maps currently fail.
This patch does two things:
1) Activates the texturegrad lowering pass for all cube samplers.
2) Corrects the computation of the LOD value for cube samplers.
I had to do 2) because for cube maps the calculations implemented
in the lowering pass always compute a value of rho that is twice
the value we want (so we get a LOD value one unit larger than we
want). This only happens for cube map samplers (all kinds). I am
not sure about why we need to do this, but I suspect that it is
related to the fact that cube map coordinates, when transported
to a specific face in the cube, are in the range [-1, 1] instead of
[0, 1] so we probably need to divide the derivatives by 2 when
we compute the LOD. Doing that would produce the same result as
dividing the final rho computation by 2 (or removing a unit
from the computed LOD, which is what we are doing here).
Fixes the following piglit tests:
bin/tex-miplevel-selection textureGrad Cube -auto -fbo
bin/tex-miplevel-selection textureGrad CubeArray -auto -fbo
bin/tex-miplevel-selection textureGrad CubeShadow -auto -fbo
Fixes 10 dEQP tests in the following category:
dEQP-GLES3.functional.shaders.texture_functions.texturegrad.*cube*
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Triggers an INVALID_OPCODE warning on GK208. Seems rare enough to not
warrant verification on other chips. Fixes the new piglits:
ubo_array_indexing/fs-nonuniform-control-flow.shader_test
ubo_array_indexing/vs-nonuniform-control-flow.shader_test
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.5 10.6" <mesa-stable@lists.freedesktop.org>
Ensure that the GPU spawns the fragment shader thread for those
fragment shaders with atomic buffer access.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Signed-off-by: Kevin Rogovin <kevin.rogovin@intel.com>
Add helper function that checks if current fragment shader active
of gl_context has atomic buffer access.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Kevin Rogovin <kevin.rogovin@intel.com>
Change references to gl_framebuffer::Width, Height, MaxNumLayers
and Visual::samples to use the _mesa_geometry_ convenience functions
for those places where the geometry of the gl_framebuffer is needed
(in contrast to the geometry of the intersection of the attachments
of the gl_framebuffer).
This patch is to pave the way to enable GL_ARB_framebuffer_no_attachments
on Gen7 and higher in i965.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Kevin Rogovin <kevin.rogovin@intel.com>
Add helper convenience function that intersects the scissor values
against a passed bounding box. In addition, to avoid replicated code,
make the function _mesa_scissor_bounding_box() use this new function.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Kevin Rogovin <kevin.rogovin@intel.com>
Add convenience helper functions for fetching geometry of gl_framebuffer
that return the geometry of the gl_framebuffer instead of the geometry of
the buffers of the gl_framebuffer when then the gl_framebuffer has no
attachments.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Kevin Rogovin <kevin.rogovin@intel.com>
Implement GL_ARB_framebuffer_no_attachments in Mesa core
- changes to conditions for framebuffer completenss
- implement set/get functions for framebuffers for
new functions in GL_ARB_framebuffer_no_attachments
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Kevin Rogovin <kevin.rogovin@intel.com>
Define the enumeration constants, function entry points and
glGet for the GL_ARB_framebuffer_no_attachments.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Kevin Rogovin <kevin.rogovin@intel.com>
Define the infrastructure for the extension GL_ARB_framebuffer_no_attachments:
- extension table
- additions to gl_framebuffer
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Kevin Rogovin <kevin.rogovin@intel.com>
I was thinking of the MIN opcode in terms of unsigned math, but it's
signed, so if you used a negative array index, you could read before the
UBO. Fixes segfaults under simulation in piglit array indexing tests with
mprotect-based guard pages.
I wanted to assert that src1 came from a non-unspilled register in shader
validation, and this easily gets us that. And, as a bonus:
total instructions in shared programs: 93347 -> 92723 (-0.67%)
instructions in affected programs: 60524 -> 59900 (-1.03%)
Disabling miptails fixed the buffer corruption happening in FBO
which use YF/YS tiled renderbuffer or texture as color attachment.
Spec recommends disabling mip tails only for non-mip-mapped surfaces.
But, without disabling miptails I couldn't get correct data out of
mipmapped YF/YS tiled surface.
We need better understanding of miptails before start using them.
For now this patch helps move things forward.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Patch sets the alignments for texture and renderbuffer surfaces.
V3: Make changes inside horizontal_alignment() and
vertical_alignment() (Topi)
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
This function will be utilised in later patches.
V2: Make both pointers constants (Topi)
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
This patch sets the tiled resource mode for texture and renderbuffer
surfaces.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
The surfaceless platform is for off-screen rendering only. Render node support
is required.
Only consider the render nodes. Do not use normal nodes as they require
auth hooks.
v3: change platform_null to platform_surfaceless
v4: make libdrm required for surfaceless
v5: remove modified include guards with defined(HAVE_SURFACELESS_PLATFORM)
v6: use O_CLOEXEC for drm fd
Signed-off-by: Haixia Shi <hshi@chromium.org>
Signed-off-by: Zach Reizner <zachr@google.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
Previously when setting up the sample instruction for an indirect
sampler the vec4 backend was directly passing the pseudo opcode's
src0. However vec4_visitor::visit(ir_texture *) doesn't set the
texture operation's src0 -- it's left as BAD_FILE, which when
translated into a brw_reg gives the null register. In brw_SAMPLE,
gen6_resolve_implied_move() inserts a MOV from the inst->base_mrf and
sets the src0 appropriately. The indirect sampler case did not have a
call to gen6_resolve_implied_move().
The fs backend avoids this because the platforms that support dynamic
indexing of samplers (IVB+) have been converted to not use the
fake-MRF hack, and instead send from proper GRFs.
This patch makes it call gen6_resolve_implied_move before setting up
the indirect message. This is similar to what is done for constant
sampler numbers in brw_SAMPLE.
The Piglit tests for sampler array indexing didn't pick this up
because they were using a texture with a solid colour so it didn't
matter what texture coordinates were actually used. The tests have now
been changed to be more thorough in this commit:
http://cgit.freedesktop.org/piglit/commit/?id=4f9caf084eda7
With that patch the tests for gs and vs are currently failing on
Ivybridge, but this patch fixes them. There are no other changes to a
Piglit run on Ivybridge.
On Skylake the gs tests were failing even without the Piglit patch
because Skylake needs the source registers to work correctly in order
to send a message header to select SIMD4x2 mode.
(The explanation in the commit message is partially written by Matt
Turner)
Tested-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This reverts commit adee54f826.
Further down in the GLSL ES 3.10 spec it say:
"If an array is declared as the last member of a shader storage block
and the size is not specified at compile-time, it is sized at run-time.
In all other cases, arrays are sized only at compile-time."
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
This function was trying to align the width and height to a multiple
of the block size for compressed textures. It was using align_w/h as a
shortcut to get the block size as up until Gen9 this always happens to
match. However in Gen9+ the alignment values are expressed as
multiples of the block size so in effect the alignment values are
always 4 for compressed textures as that is the minimum value we can
pick. This happened to work for most compressed formats because the
block size is also 4, but for FXT1 this was breaking because it has a
block width of 8.
This fixes some Piglit tests testing FXT1 such as
spec@3dfx_texture_compression_fxt1@fbo-generatemipmap-formats
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
The state tracker will pass through requests from buggy applications
which will have the buffer size larger than the max allowed (64k). Clamp
the size to 64k so that we don't get errors when uploading the constbuf
data.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.5 10.6" <mesa-stable@lists.freedesktop.org>
One of the places we have to insert texbars is in situations where the
result of the tex gets overwritten by a different instruction (e.g. in a
conditional statement). However in some situations it can actually
appear as though the original tex itself is an overwriting instruction.
This can naturally never really happen, so just ignore the tex
instruction when it comes up.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90347
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.5 10.6" <mesa-stable@lists.freedesktop.org>
Back in 2013, a patch was added (with 2 reviewers!) at the end of the
block to early exit the loop in this case, without noticing that the loop
already did. I added another early exit case, again without noticing, but
Rob caught me. Just drop the loop condition that apparently surprises
most of us, instead of leaving the end of the loop conspicuously not
exiting on success.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Rob Clark <robdclark@gmail.com>
This was part of gallium_egl, and we now have the normal libEGL Android
winsys support to handle it.
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
v2: Add a comment explaining why we link libmesa_glsl. Drop warning
option from freedreno. Add vc4 to the documentation for
BOARD_GPU_DRIVERS.
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
in case of glTexImage{1,2,3}D(). Texture has already been allocated
at this point and we have no data to upload. With out this patch,
with create_pbo = true, we end up creating a temporary pbo and then
uploading uninitialzed texture data.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Neil Roberts <neil@linux.intel.com>
We have an assert() in intel_miptree_map_movntdqa() which expects
the pitch to be 16 byte aligned.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
OP_JOIN instructions are assumed to be flow instructions and mercilessly
casted to FlowInstruction.
This patch fixes an instance where an OP_JOIN is created as a plain
instruction. This can cause crashes in the ir printer.
[imirkin: add ->fixed = 1]
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
We want to replace ilo_zs_surface with ilo_state_zs. One noteworthy
difference is that ilo_state_zs always aligns level 0 to 8x4 when HiZ is
enabled. HiZ will not be enabled for 1D surfaces as a result.
It needs be set to R/W only when using certain messages via DP render cache.
Since we only use RT wrties with the render cache, we never need to set it.
The only use of lp_profile() is wrapped in #if defined(PROFILE),
so there is no reason to build it unless this macro is defined.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This helped find the incorrect HALIGN values from the previous patches.
v2: Add PRM references for assertions (Chad)
v3: Remove duplicated part of commit message, assert num_samples > 1, instead of
num_samples > 0. (Chad)
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Chad Versace <chad.versace@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Just like the previous patch, but for the GEN9 constraints.
v2:
bugfix: Gen9 HALIGN was being set for all miptree buffers (Chad). To address
this, move the check to where the gen8 check is, and do the appropriate
conditional there.
v3:
Remove stray whitespace introduced in v2 (Chad)
Rework comment to show AUX_CCS and AUX_MCS specifically. Remove misworded part
about gen7 (Chad).
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com> (v1)
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> (v1)
Reviewed-by: Chad Versace <chad.versace@intel.com>
This restriction was attempted in this commit:
commit 4705346463
Author: Anuj Phogat <anuj.phogat@gmail.com>
Date: Fri Feb 13 11:21:21 2015 -0800
i965/gen8: Use HALIGN_16 if MCS is enabled for non-MSRT
However, the commit itself doesn't achieve the desired goal as determined by the
asserts which the next patch adds. mcs_mt is NULL (never set) we're in the
process of allocating the mcs_mt miptree when we get to this function. I didn't
check, but perhaps this would work with blorp, however, meta clears allocate the
miptree structure (which AFAICT needs the alignment also) way before it
allocates using meta clears where the renderbuffer is allocated way before the
aux buffer.
The restriction is referenced in a few places, but the most concise one [IMO]
from the spec is for Gen9. Gen8 loosens the restriction in that it only requires
this for non-msrt surface.
When Auxiliary Surface Mode is set to AUX_CCS_D or AUX_CCS_E, HALIGN 16 must
be used.
With the code before the miptree layout flag rework (patches preceding this),
accomplishing this workaround is very difficult.
v2:
bugfix: Don't set HALIGN16 for gens before 8 (Chad)
v3:
non-trivial rebase
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Cc: Neil Roberts <neil@linux.intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
There are several constraints when determining if one can fast clear a surface.
Some of these are alignment, pixel density, tiling formats, and others that vary
by generation. The helper function which exists today does a suitable job,
however it conflates "BO properties" with "Miptree properties" when using
tiling. I consider the former to be attributes of the physical surface, things
which are determined through BO allocation, and the latter being attributes
which are derived from the API, and having nothing to do with the underlying
surface.
Determining tiling properties and creating miptrees are related operations
(when we allocate a BO for a miptree) with some disjoint constraints. By
extracting the decisions into two distinct choices (tiling vs. miptree
properties), we gain flexibility throughout the code to make determinations
about when we can or cannot fast clear strictly on the miptree.
To signify this change, I've also renamed the function to indicate it is a
distinction made on the miptree. I am torn as to whether or not it was a good
idea to remove "non_msrt" since it's a really nice thing for grep.
v2:
Reword some comments (Chad)
intel_is_non_msrt_mcs_tile_supported->intel_tiling_supports_non_msrt_mcs (Chad)
Make full if ladder for gens in above function (Chad)
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Cc: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
For GEN9, much of the logic to use X-Tiled buffers has been stripped out. It is
still supported in some places, but it's never desirable. Unfortunately we don't
yet have the ability to have Y-Tiled scanout (see:
http://patchwork.freedesktop.org/patch/46984/),
NOTE: This patch shouldn't actually do anything since SKL doesn't yet use fast
clears (they are disabled because they are causing regressions). THerefore, the
only case we can get to this function on SKL is by way of
intel_update_winsys_renderbuffer_miptree.
v2: Update commit message to be more clear that the NOTE is for SKL only.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Chad Versace <chad.versace@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
I think pretty much everyone agrees that having more than a single bool as a
function argument is bordering on a bad idea. What sucks about the current
code is in several instances it's necessary to propagate these boolean
selections down to lower layers of the code. This requires plumbing (mechanical,
but still churn) pretty much all of the miptree functions each time. By
introducing the flags paramater, it is possible to add miptree constraints very
easily.
The use of this, as is already the case, is sometimes we have some information
at the time we create the miptree that needs to be known all the way at the
lowest levels of the create/allocation, disable_aux_buffers is currently one
such example. There will be another example coming up in a few patches.
v2:
Tab fix. (Ben)
Long line fixes (Topi)
Use anonymous enum instead of #define for layout flags (Chad)
Use 'X != 0' instead of !!X (everyone except Chad)
v3:
Some non-trivial conflict resolution on top of Anuj's patches.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Cc: "Pohjolainen, Topi" <topi.pohjolainen@intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
This will be used to implement the Gateway Barrier SEND needed to implement
the barrier function.
v2:
* notify => gateway_notify (Ken)
* combine short lines of brw_barrier proto/decl (mattst88)
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
This will be used to implement the barrier function.
v2:
* Rename to brw_WAIT (mattst88)
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This will be used by the wait instruction when implementing the barrier()
function.
v2:
* Changes suggested by mattst88
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Matt Turner <mattst88@gmail.com>
These fields will be used when emitting a send for the barrier function.
Reference: IVB PRM Volume 4, Part 2, Section 1.1.1 Message Descriptor
v2:
* notify => gateway_notify (Ken)
* define bits for gen4-gen6 (bwidawsk, Ken)
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This prevents an assertion from being hit with SIMD16:
Assertion `inst->exec_size == dispatch_width() || force_writemask_all' failed.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Silence the warnings about the future incompatibility with automake 2.0
Cc: Eric Anholt <eric@anholt.net>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Rather than forcing everyone to provide their own definition of the symbol
provide a common (dummy) one.
This helps us resolve the build of the standalone pipe-drivers (amongst
others), which are missing the symbol.
Cc: Rob Clark <robclark@freedesktop.org>
Cc: "10.6" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Drop the stub/unused function haiku_create_surface() and add some basic implementation for destroy_surface()
Cc: Alexander von Gluck IV <kallisti5@unixzen.com>
Acked-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
It serves little to no purpose. As the driver gets updated, one can
look at the existing implementation (dri2) for reference rather than
letting the commented functions bitrot.
Cc: Alexander von Gluck IV <kallisti5@unixzen.com>
Acked-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Earlier commit folded the two separate variables into one, but forgot to
update the haiku driver.
Fixes: 0e4b564ef28(egl: combine VersionMajor and VersionMinor into one
variable)
Cc: Marek Olšák <marek.olsak@amd.com>>
Cc: Alexander von Gluck IV <kallisti5@unixzen.com>
Acked-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Instead use fs_builder::null_reg_f() which has the correct register
width. Avoids the assertion failure in fs_builder::emit() hit by the
"ES3-CTS.shaders.loops.for_dynamic_iterations.unconditional_break_fragment"
GLES3 conformance test introduced by 4af4cfba9e.
Reported-and-reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
This reverts commit f3b709c0ac.
The "dEQP-GLES3.functional.rasterization.fbo.rbo_multisample_4.
interpolation.lines_wide" test appears to be broken on Cherryview when
we expose line widths greater than 12.0. I'm not sure why.
For now, just go back to the limits we used on older platforms.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90902
Acked-by: Matt Turner <mattst88@gmail.com>
This makes the SSA definitions use sequential numbers (0, 1, 2, ...)
instead of seemingly random ones. There's not much point normally,
but it makes debug output much easier to read.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
The other PIPE_CAPF_ and PIPE_SHADER_CAP_ enums don't have explicit values.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
It should only be used as an initializer expression.
Trivial, and fixes Windows builds.
Nevertheless, overwriting an once_flag like this seems dangerous and
should be revised.
The same fix Marius implemented for gen6 (commit a9b04d8a) and
gen7 (commit 24ecf37a).
Also, we need the same code to handle special cases of line width
in gen6, gen7 and now gen8, so put that in the helper function
we use to compute the line width.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Without this patch, the following constructs (not an extensive list)
would crash mesa:
- mat2 foo = mat2(1); vec4 bar = vec4(foo);
- mat3 foo = mat3(1); vec4 bar = vec4(foo);
- mat3 foo = mat3(1); ivec4 bar = ivec4(foo);
The first case is explicitely allowed by the GLSL spec, as seen on
page 101 of the GLSL 4.40 spec:
"vec4(mat2) // the vec4 is column 0 followed by column 1"
The other cases are implicitely allowed also.
The actual changes are quite minimal. We first split each column of
the matrix to a list of vectors and then use them to initialize the
vector. An additional check to make sure that we are not trying to
copy 0 elements of a vector fix the (i)vec4(mat3) case as the last
vector (3rd column) is not needed at all.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Signed-off-by: Martin Peres <martin.peres@linux.intel.com>
If _mesa_hash_table_create failed we'd get null pointer. Report
error and go away.
Signed-off-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
In commit fe74fee8fa we rounded the line width to the nearest integer to
match the GLES3 spec requirements stated in section 13.4.2.1, but that seems
to break a dEQP test that renders wide lines in some multisampling scenarios.
Ian noted that the Open 4.4 spec has the following similar text:
"The actual width of non-antialiased lines is determined by rounding the
supplied width to the nearest integer, then clamping it to the
implementation-dependent maximum non-antialiased line width."
and suggested that when ES removed antialiased lines, they removed
"non-antialised" from that paragraph but probably should not have.
Going by that note, this patch restricts the quantization implemented in
fe74fee8fa only to regular aliased lines. This seems to keep the
tests fixed with that commit passing while fixing the broken test.
v2:
- Drop one of the clamps (Ken, Marius)
- Add a rule to prevent advertising line widths that when rounded go beyond
the limits allowed by the hardware (Ken)
- Update comments in the code accordingly (Ian)
- Put the code in a utility function (Ian)
Fixes:
dEQP-GLES3.functional.rasterization.fbo.rbo_multisample_max.primitives.lines_wide
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90749
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: "10.6" <mesa-stable@lists.freedesktop.org>
Broadwell's stencil blitting code attempts to bind a renderbuffer as a
texture, using dd->BindRenderbufferTexImage().
This calls _mesa_init_teximage_fields(), which then attempts to set
img->_BaseFormat = _mesa_base_tex_format(ctx, internalFormat), which
assert fails if internalFormat is GL_STENCIL_INDEX8 but
ARB_texture_stencil8 is unsupported.
To work around this, just pretend to support the extension momentarily,
during the blit. Meta has already munged a variety of other things in
the context (including the API!), so it's not that much worse than what
we're already doing.
Fixes regressions since commit f7aad9da20
(mesa/teximage: use correct extension for accept stencil texture.).
v2: Add an XXX comment explaining the situation (requested by Jason
Ekstrand and Martin Peres), and an assert that we don't support
the extension so we remember to remove this hack (requested by
Neil Roberts).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Just implement it in terms of util_resource_copy_region(). Both the
original code and util_resource_copy_region() boil down to mapping,
calling util_copy_box() and unmapping.
No piglit regressions. This will also help to implement GL_ARB_copy_image.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Mesa supports EXT_texture_rg and OES_texture_float. This patch adds
support for using unsized enums GL_RED and GL_RG for floating point
targets and writes proper checks for internalformat when format is
GL_RED or GL_RG and type is of GL_FLOAT or GL_HALF_FLOAT.
Later, internalformat will get adjusted by adjust_for_oes_float_texture
after these checks.
v2: simplify to check vs supported enums
v3: follow the style and break out if internalFormat ok (Kenneth)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90748
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
we don't check the validity of pscreen until dri_init_screen_helper
hit this trying to init glamor on a device with no driver (udl).
Acked-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
When we import a dma-buf fd from another driver the kernel
gives us the right info, and this trashes it.
Convert the kernel bo flags into the domain flags.
This helps getting reverse prime and glamor working.
Cc: mesa-stable@lists.freedesktop.org
Acked-by: Ben Skeggs <bskeggs@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
On Lollipop, apparently stlport is gone and libcxx must be used instead.
We still support stlport when building on earlier android releases.
Signed-off-by: Chih-Wei Huang <cwhuang@linux.org.tw>
Reviewed-by: Eric Anholt <eric@anholt.net>
Based on the nice work of Paulo Sergio Travaglia <pstglia@gmail.com>.
The main modifications are:
- Include paths for LLVM header files and shared/static libraries
- Set C++ flag "c++11" to avoid compiling errors on LLVM header files
- Set defines for LLVM
- Add GALLIVM source files
- Changes path of libelf library for lollipop
Signed-off-by: Chih-Wei Huang <cwhuang@linux.org.tw>
Acked-by: Eric Anholt <eric@anholt.net>
Use the pre-defined macro es-gen to generate new added files
instead of writing new rules manually. The handmade rules
that may generate the files before the directory is created
result in such an error:
/bin/bash: out/target/product/x86/gen/STATIC_LIBRARIES/libmesa_st_mesa_intermediates/main/format_pack.c: No such file or directory
make: *** [out/target/product/x86/gen/STATIC_LIBRARIES/libmesa_st_mesa_intermediates/main/format_pack.c] Error 1
Signed-off-by: Chih-Wei Huang <cwhuang@linux.org.tw>
Reviewed-by: Eric Anholt <eric@anholt.net>
This avoids needing hardlinks between all of the DRI driver .so names,
since we're the only loader on the system.
v2: Add early exit on success (like previous block) and log message on
failure.
Signed-off-by: Chih-Wei Huang <cwhuang@linux.org.tw>
Reviewed-by: Eric Anholt <eric@anholt.net>
The Android gallium build used to use gallium_egl, which was removed back
in March. Instead, we will now use a normal Mesa libEGL loader with
dlopen()ing of a DRI module.
v2: add a clean step to rebuild all dri modules properly.
v3: Squish the 2 patches doing this together (change by anholt).
Signed-off-by: Chih-Wei Huang <cwhuang@linux.org.tw>
Reviewed-by: Eric Anholt <eric@anholt.net>
This single .so includes all of the enabled gallium drivers.
Signed-off-by: Chih-Wei Huang <cwhuang@linux.org.tw>
Reviewed-by: Eric Anholt <eric@anholt.net>
The include paths of libmesa_dri_common are also used by modules
that need libmesa_dri_common.
Signed-off-by: Chih-Wei Huang <cwhuang@linux.org.tw>
Reviewed-by: Eric Anholt <eric@anholt.net>
Previously the number needed to be divided by 4 to get the proper results. Now
the hardware does the right thing. Through experimentation it seems Braswell
(CHV) does also need the division by 4.
Fixes piglit test:
arb_pipeline_statistics_query-frag
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Previously, we just put the message for the EOT send as high in the file as
it would go. This is because the register pre-filling hardware will stop
all over the early registers in the file in preparation for the next thread
while you're still sending the last message. However, if something happens
to spill, then the MRF hack interferes with the EOT send message and, if
things aren't scheduled nicely, will stomp on it.
Cc: "10.5 10.6" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90520
Reviewed-by: Neil Roberts <neil@linux.intel.com>
The explicit call to fs_builder::group() in emit_single_fb_write() is
required by the builder (otherwise the assertion in fs_builder::emit()
would fail) because the subsequent LOAD_PAYLOAD and FB_WRITE
instructions are in some cases emitted with a non-native execution
width. The previous code would always use the channel enables for the
first quarter, which is dubious but probably worked in practice
because FB writes are never emitted inside non-uniform control flow
and we don't pass the kill-pixel mask via predication in the cases
where we have to fall-back to SIMD8 writes.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Yes, it's incorrect to use the 0-th channel enable group
unconditionally without considering the execution and regioning
controls of the instruction that uses the spilled value, but it
matches the previous behaviour exactly, the builder just makes the
preexisting problem more obvious because emitting an instruction of
non-native SIMD width without having called .group() or .exec_all()
explicitly would have led to an assertion failure.
I'll fix the problem in a follow-up series, as the solution is going
to be non-trivial.
Reviewed-by: Matt Turner <mattst88@gmail.com>
This simplifies opt_peephole_sel() slightly by emitting the SEL
instructions immediately after they are created, what makes the
sel_inst and mov_imm_inst arrays unnecessary and will make it possible
to get rid of the explicit inserts when the pass is migrated to the IR
builder.
Reviewed-by: Matt Turner <mattst88@gmail.com>
LOAD_PAYLOAD instructions need the same treatment as any other
generator instructions, at least FB writes and typed surface messages
will need a payload built with non-zero execution controls.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Most of these fields affect the behaviour of the instruction, but
apparently we currently don't CSE the kind of instructions for which
these fields could make a difference in the VEC4 back-end. That's
likely to change soon though when we start using send-from-GRF for
texture sampling and surface access messages.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Most of these fields affect the behaviour of the instruction so it
could actually break the program if we CSE a pair of otherwise
matching instructions with different values of these fields.
Reviewed-by: Matt Turner <mattst88@gmail.com>
The purpose of this change is threefold: First, it improves the
modularity of the compiler back-end by separating the functionality
required to construct an i965 IR program from the rest of the visitor
god-object, what in turn will reduce the coupling between other
components and the visitor allowing a more modular design. This patch
doesn't yet remove the equivalent functionality from the visitor
classes, as it involves major back-end surgery.
Second, it improves consistency between the scalar and vector
back-ends. The FS and VEC4 builders can both be used to generate
scalar code with a compatible interface or they can be used to
generate natural vector width code -- 1 or 4 components respectively.
Third, the approach to IR construction is somewhat different to what
the visitor classes currently do. All parameters affecting code
generation (execution size, half control, point in the program where
new instructions are inserted, etc.) are encapsulated in a stand-alone
object rather than being quasi-global state (yes, anything defined in
one of the visitor classes is effectively global due to the tight
coupling with virtually everything else in the compiler back-end).
This object is lightweight and can be copied, mutated and passed
around, making helper IR-building functions more flexible because they
can now simply take a builder object as argument and will inherit its
IR generation properties in exactly the same way that a discrete
instruction would from the same builder object.
The emit_typed_write() function from my image-load-store branch is an
example that illustrates the usefulness of the latter point: Due to
hardware limitations the function may have to split the untyped
surface message in 8-wide chunks. That means that the several
functions called to help with the construction of the message payload
are themselves required to set the execution width and half control
correctly on the instructions they emit, and to allocate all registers
with half the default width. With the previous approach this would
require the used helper functions to be aware of the parameters that
might differ from the default state and explicitly set the instruction
bits accordingly. With the new approach they would get a modified
builder object as argument that would influence all instructions
emitted by the helper function as if it were the default state.
Another example is the fs_visitor::VARYING_PULL_CONSTANT_LOAD()
method. It doesn't actually emit any instructions, they are simply
created and inserted into an exec_list which is returned for the
caller to emit at some location of the program. This sort of two-step
emission becomes unnecessary with the builder interface because the
insertion point is one more of the code generation parameters which
are part of the builder object. The caller can simply pass
VARYING_PULL_CONSTANT_LOAD() a modified builder object pointing at the
location of the program where the effect of the constant load is
desired. This two-step emission (which pervades the compiler back-end
and is in most cases redundant) goes away: E.g. ADD() now actually
adds two registers rather than just creating an ADD instruction in
memory, emit(ADD()) is no longer necessary.
v2: Drop scalarizing VEC4 builder.
v3: Take a backend_shader as constructor argument. Improve handling
of debug annotations and execution control flags.
v4: Drop Gen6 IF with inline comparison. Rename "instr" variable.
Initialize cursor to NULL by default and add method to explicitly
point the builder at the end of the program.
Reviewed-by: Matt Turner <mattst88@gmail.com>
simple_list.h defines a number of macros with short non-namespaced
names that can easily collide with other declarations (first_elem,
last_elem, next_elem, prev_elem, at_end), and according to the comment
it was only being included because of struct simple_node, which is no
longer used in this file.
Reviewed-by: Brian Paul <brianp@vmware.com>
v3: Use ffs() and a switch loop in
tr_mode_horizontal_texture_alignment() (Ben)
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
v3: Use ffs() and a switch loop in
tr_mode_vertical_texture_alignment() (Ben)
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
and change the name to brw_miptree_choose_tiling().
V3: Remove redundant function parameters. (Topi)
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Since the introduction of
commit 536003c11e
Author: Boyan Ding <boyan.j.ding@gmail.com>
Date: Wed Mar 25 19:36:54 2015 +0800
i965: Add XRGB8888 format to intel_screen_make_configs
winsys buffers no longer have an alpha channel. This causes
_mesa_format_matches_format_and_type() to reject previously working BGRA
uploads from using the BLT fast path. Instead of using the generic
routine for matching formats exactly, export the slightly more relaxed
check from intel_miptree_blit() which importantly allows the blitter
routine to apply a small number of format conversions.
References: https://bugs.freedesktop.org/show_bug.cgi?id=90839
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Jason Ekstrand <jason@jlekstrand.net>
Cc: Alexander Monakov <amonakov@gmail.com>
Cc: Kristian Høgsberg <krh@bitplanet.net>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: mesa-stable@lists.freedesktop.org
The blitter already has code to accommodate filling in the alpha channel
for BGRX destination formats, so expand this to also allow filling the
alpha channgel in RGBX formats.
More importantly for the next patch is moving the test into its own
function for the purpose of exporting the check to the callers.
v2: Fix alpha expansion as spotted by Alexander with the fix suggested by
Kenneth
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Jason Ekstrand <jason@jlekstrand.net>
Cc: Alexander Monakov <amonakov@gmail.com>
Cc: Kristian Høgsberg <krh@bitplanet.net>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by Kenneth Graunke <kenneth@whitecape.org>
Cc: mesa-stable@lists.freedesktop.org
The BLT pitch is specified in bytes for linear surfaces and in dwords
for tiled surfaces. In both cases the programmable limit is 32,767, so
adjust the check to compensate for the effect of tiling.
v2: Tweak whitespace for functions (Kenneth)
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Kristian Høgsberg <krh@bitplanet.net>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by Kenneth Graunke <kenneth@whitecape.org>
Cc: mesa-stable@lists.freedesktop.org
Based on the corresponding SI support. Same as that, this is currently
only enabled for one-dimensional buffer copies due to issues with
multi-dimensional SDMA copies.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
In the ARB_fragment_program specification, the result.depth output
variable is treated as a vec4, where the fragment depth is stored in the
.z component, and the other three components are undefined.
This is different than GLSL, which uses a scalar value (gl_FragDepth).
To make this consistent for driver backends, this patch makes
prog_to_nir use a scalar output variable for FRAG_RESULT_DEPTH,
moving result.depth.z into the first component.
Fixes Glean's fragProg1 "Z-write test" subtest.
Cc: mesa-stable@lists.freedesktop.org
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90000
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Previously we were leaving this at the default of 64K, which meets the
spec but is too small for some real uses. The hardware can handle up to
128M.
User was complaining about this on freenode ##OpenGL today.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
In an ideal world I would just implement this instead of adding the perf debug.
There are some errata involved which lead me to believe it won't be so simple as
flipping a few bits.
There is room to add a thing for Gen9s flexibility, but since I am actively
working on that I have opted to ignore it.
Example:
Multi-LOD fast clear - giving up (256x128x8).
v2: Use braces for if statements because they are multiple lines (Ken)
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
When we cannot do the optimized fast clear it's important to know the buffer
size since a small buffer will have much less performance impact.
A follow-on patch could restrict printing the message to only certain sizes.
Example:
Failed to fast clear 1400x1056 depth because of scissors. Possible 5% performance win if avoided.
Recommended-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
These are just wrappers around the existing extension functions.
v2: return BAD_ALLOC if _eglConvertAttribsToInt fails
Reviewed-by: Chad Versace <chad.versace@intel.com>
v2: - don't modify "value" in eglGetSyncAttribKHR after an error
- rename _egl_api::GetSyncAttribKHR -> GetSyncAttrib
- rename GetSyncAttribKHR_t -> GetSyncAttrib_t
- rename _eglGetSyncAttribKHR to _eglGetSyncAttrib
Reviewed-by: Chad Versace <chad.versace@intel.com>
v2: include mesa and chromium extensions in eglext.h so as not to break
existing users
v3: keep PFNEGLSWAPBUFFERSREGIONNOK because piglit uses it
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
There's no reason to use our own definition.
Tessellation will use the NV definitions too.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
This is for user assembly shaders only (not GLSL). We won't support those.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
These states are for GS assembly shaders only. We don't support those.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Softpipe, llvmpipe, r300g, and radeonsi pass tests. Other drivers need testing.
Freedreno and nv30 are definitely broken. Other drivers seem to be alright.
Patch fixes special cases with gl_VertexID and sets all builtin
variables locations as '-1' as specified by the extension spec.
Fixes ES 3.1 conformance test failure:
ES31-CTS.program_interface_query.input-built-in
v2: comments + use is_gl_identifier() (Martin)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Martin Peres <martin.peres@linux.intel.com>
Flagged by Oracle's parfait static analyzer:
Error: Format string argument mismatch (CWE 628)
In call to printf with format string "usage: %s [options] <file.vert | file.geom | file.frag>\n\nPossible options are:\n"
Too many arguments for format string (got more than 1 arguments)
at line 285 of src/glsl/main.cpp in function 'usage_fail'.
Signed-off-by: Alan Coopersmith <alan.coopersmith@oracle.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This probably got broken when the samplers were converted to be indexed
by shader type.
Seen when looking at bug 89819 though I'm not sure if that really was what
the bug was about...
Cc: "10.5 10.6" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
I just botched this when writing the original code.
From the ARB_vertex_program specification:
"The RSQ instruction approximates the reciprocal of the square root of
the absolute value of the scalar operand and replicates it to all four
components of the result vector."
Fixes a Glean vertProg1 subtest:
RSQ test 2 (reciprocal square root of negative value)
Cc: mesa-stable@lists.freedesktop.org
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90547
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
This change introduces a new field in gl_uniform_storage to
explicitely say that a uniform is built-in. In the case where it is,
no storage is defined to make it clear that it is read-only from the
mesa side. I fixed all the places in the code that made use of the
structure that I changed. Any place making a wrong assumption and using
the storage straight away will just crash.
This patch seems to implement the path of least resistance towards
listing built-in uniforms in GL_ACTIVE_UNIFORM (and other APIs).
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Signed-off-by: Martin Peres <martin.peres@linux.intel.com>
Pretty trivial, fixes the issue that we're expected to be able to blit
stencil surfaces (as the blit just relies on util blitter code which needs
stencil export to do it).
2 piglits skip->pass, 11 fail->pass
v2: prettify, keep different stencil ref value handling out of depth/stencil
test itself.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Some hardware reads only the low 16-bits even if the type is UD, but
other hardware like Cherryview can't handle this.
Fixes spec@arb_gpu_shader5@execution@sampler_array_indexing@fs-simple on
Cherryview.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90830
Reviewed-by: Neil Roberts <neil@linux.intel.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
It was 2 bits to accommodate SATURATE_PLUS_MINUS_ONE (removed by commit
09b566e1). A similar change was made to TGSI recently in commit
e1c4e8aa.
Reducing the size from 2 bits to 1 reduces the size of the bit fields
from 17 bits to 16, which is a much nicer number.
Reviewed-by: Brian Paul <brianp@vmware.com>
If the incoming func matches the current state it must be a legal
value so we can do this before the switch statement.
Signed-off-by: Brian Paul <brianp@vmware.com>
If the glPushAttrib() mask value was zero we didn't actually push
anything onto the attribute stack. A subsequent glPopAttrib() call
would generate a GL_STACK_UNDERFLOW error. Now push a dummy attribute
in that case to prevent the error.
Mesa now matches nvidia's behavior.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
lower_phis_to_scalar() pass recurses the instruction dependence graph to
determine if all the sources of a given instruction are scalarizable.
To prevent cycles, it temporary marks the phi instruction before recursing in,
then updates the entry with the resulting value. However, it does not consider
that the entry value may have changed after a recursion pass, hence causing
a use-after-free situation and a crash.
This patch fixes this by reloading the entry corresponding to the 'phi'
after recursing and before updating its value.
The crash can be reproduced ~20% of times with the dEQP test:
dEQP-GLES3.functional.shaders.loops.while_constant_iterations.nested_sequence_fragment
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Now that Jason's LOAD_PAYLOAD improvements have landed, we don't need
this. Passing 1 for the number of header registers already takes care
of setting force_writemask_all on the header copy.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
See the corresponding code in brw_vs_surface_state.c.
v2: const more things (requested by Topi Pohjolainen)
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
We used to store the GS dispatch mode in brw_gs_prog_data while
separately storing the VS dispatch mode in brw_vue_prog_data::simd8.
This patch introduces an enum to represent all possible dispatch modes,
and stores it in brw_vue_prog_data::dispatch_mode, unifying the two.
Based on a suggestion by Matt Turner.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
The documentation makes it pretty clear that we shouldn't use this:
"Under normal conditions SW shall specify DMask, as the GS stage
will provide a Dispatch Mask appropriate to SIMD4x2 or SIMD8 thread
execution (as a function of dispatch mode). E.g., for SIMD4x2
execution, the GS stage will generate a Dispatch Mask that is equal
to what the EU would use as the Vector Mask. For SIMD8 execution
there is no known usage model for use of Vector Mask (as there is
for PS shaders)."
I also managed to find descriptions of DMask and VMask, in the "State
Register" (sr0.2/3) field descriptions:
"Dispatch Mask (DMask). This 32-bit field specifies which channels
are active at Dispatch time."
"Vector Mask (VMask). This 32-bit field contains, for each 4-bit
group, the OR of the corresponding 4-bit group in the dispatch
mask."
SIMD4x2 shaders process one or two vec4 values, with each 4-bit group
corresponding to xyzw channel enables (either all on, or all off).
Thus, DMask = VMask in SIMD4x2 mode. But in SIMD8 mode, 4-bit groups
are meaningless, so it just messes up your values.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
It's incompletete -- it wasn't filling ReferenceType so it was causing
garbagge on the disassembly. Furthermore it seems impossible to get the
jump information through this interface.
The solution for function size problem is to effectively book-keep the
machine code start and end address while JIT'ing.
When calculating the binding table index for non-constant sampler
array indexing it needs to add the base binding table index which is a
constant within the generated code. Often this base is zero so we can
avoid a redundant instruction in that case.
It looks like nothing in shader-db is doing non-constant sampler array
indexing so this patch doesn't make any difference but it might be
worth having anyway.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Acked-by: Ben Widawsky <ben@bwidawsk.net>
Previously when generating the send instruction for a sample
instruction with an indirect sampler it would use the destination
register as a temporary store. This breaks when used in combination
with the opt_sampler_eot optimisation because that forces the
destination to be null. This patch fixes that by avoiding the temp
register altogether.
The reason the temporary register was needed was because it was trying
to ensure the binding table index doesn't overflow a byte by and'ing
it with 0xff. The result is then or'd with samper_index<<8. This patch
instead just and's the whole thing by 0xfff. This will ensure that a
bogus sampler index won't overflow into the rest of the message
descriptor but unlike the previous code it won't ensure that the
binding table index doesn't overflow into the sampler index. It
doesn't seem like that should matter very much though because if the
shader is generating a bogus sampler index then it's going to just get
garbage out either way.
Instead of doing sampler_index<<8|(sampler_index+base_table_index) the
new code avoids one operation by doing
sampler_index*0x101+base_table_index which should be equivalent.
However if we wanted to avoid the multiply for some reason we could do
this by adding an extra or instruction still without needing the
temporary register.
This fixes a number of Piglit tests on Skylake that were using
indirect samplers such as:
spec@arb_gpu_shader5@execution@sampler_array_indexing@fs-simple
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Acked-by: Ben Widawsky <ben@bwidawsk.net>
Tested-by: Anuj Phogat <anuj.phogat@gmail.com>
We were returning the most recently freed BO, without checking if it
was idle yet. This meant that we generally stalled immediately on the
previous frame when generating a new one. Instead, allocate new BOs
when the *oldest* BO is still busy, so that the cache scales with how
much is needed to keep some frames outstanding, as originally
intended.
Note that if you don't have some throttling happening, this means that
you can accidentally run the system out of memory. The kernel is now
applying some throttling on all execs, to hopefully avoid this.
AFAICT, there is no real way to make sure a send message with EOT is properly
ignored from compact, nor can I see a way to actually encode EOT while
compacting. Before the single send optimization we'd always bail because we hit
the is_immediate && !is_compactable_immediate case. However, with single send,
is_immediate is not true, and so we end up trying to compact the un-compactible.
Without this, any compacting single send instruction will hang because the EOT
isn't there. I am not sure how I didn't hit this when I originally enabled the
optimization. I didn't check if some surrounding code changed.
I know Neil and Matt were both looking into this. I did a quick search and
didn't see any patches out there to handle this. Please ignore if this has
already been sent by someone. (Direct me to it and I will review it).
Reported-by: Neil Roberts <neil@linux.intel.com>
Reported-by: Mark Janes <mark.a.janes@intel.com>
Tested-by: Mark Janes <mark.a.janes@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Pure integer formats cannot be sampled with linear tex / mip filters. In GL
such a setup would make the texture incomplete.
We shouldn't rely on the state tracker though to filter that out, just return
all zeros instead of dying in the lerp.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
gallivm now depends on it. And depending on particular LLVM version /
configure options, the build can fail without this change due to
undefined reference to `LLVM*Disasm*' symbols.
Trivial.
It doesn't do everything we want. In particular it doesn't allow to
detect jumps or return opcodes. Currently we detect the x86's RET
opcode.
Even though it's worse for LLVM 3.3, it's an improvement for LLVM 3.7,
which was totally busted.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Roland pointed out my previous attempt was lacking, so I enhanced the
texwrap piglit test, and tested them. This fixes the offset calculations
in a number of areas by adding the offset first, it also fixes the fastpaths,
which I forgot to address in the previous commit.
v2: try and avoid divides in most paths, the repeat mirror path
really was ugly no matter which way I went, so I left it having
the divide.
Also fix the gather lod calculation bug.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Whether or not to use NIR is now equivalent to brw->scalar_vs. We can
simplify the logic and make it far less confusing.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Now that everything is running through NIR, this is all dead.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Now that everything is running through NIR, this is all dead.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This is using multiple inheritance in C++. However, ir_visitor is really
just an interface with no data so it shouldn't be so bad.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The backend_shader class really is a representation of a shader. The fact
that it inherits from ir_visitor is somewhat immaterial.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Currently on the functions that are exclusive to core-profile are
implemented. The remainder continue to live in the XML. Additional
functions can be moved later.
The functions for GL_ARB_draw_indirect and GL_ARB_multi_draw_indirect
are put in the dispatch table inside the VBO module, so they do not need
to be moved over.
The diff of src/mesa/main/api_exec.c before and after this patch is as
expected. All of the functions listed in apiexec.py moved out of a 'if
(_mesa_is_desktop(ctx))' block into a new 'if (ctx->API ==
API_OPENGL_CORE)' block.
v2: Remove stray shebang line in apiexec.py. Suggested by Ilia.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Dylan Baker <baker.dylan.c@gmail.com>
Cc: "10.6" <mesa-stable@lists.freedesktop.org>
The string is only applied when the context is API_OPENGLES2.
The bulk of the change is to prevent overriding the context to
API_OPENGL_CORE based on the requested version. If the context is
API_OPENGL_ES2, don't change it.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Remove _mesa_get_gl_version_override. We don't need two functions that
do basically the same thing. This change seemed easier (esp. with the
next patch) than going the other way.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
This is a bit of a hack for now. Several of the extensions required for
OpenGL ES 3.1 have no support, at all, in Mesa. However, with this
patch and a patch to allow MESA_GL_VERSION_OVERRIDE to work with ES
contexts, people can begin testing the ES "version" of the functionality
that is supported.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
A couple functions are missing because there are no implementations of
them yet. These are:
glFramebufferParameteri (from GL_ARB_framebuffer_no_attachments)
glGetFramebufferParameteriv (from GL_ARB_framebuffer_no_attachments)
glMemoryBarrierByRegion
v2: Rebase on updated dispatch_sanity.cpp test.
v3: Add support for glDraw{Arrays,Elements}Indirect in vbo_exec_array.c.
The updated dispatch_sanity.cpp test discovered this omission.
v4: Rebase on glapi changes.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
If the multiplication's result is unused, except by a conditional_mod,
the destination will be null. Since the final instruction in the lowered
sequence is a partial-write, we can't put the conditional mod on it and
we have to store the full result to a register and do a MOV with a
conditional mod.
Cc: "10.6" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90580
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
When we compute the output swizzle we want to consider the number of
components in the add operation. So far we were using the writemask
of the multiplication for this instead, which is not correct.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Temporarily undefine DEBUG macro while including LLVM C++ headers,
leveraging the push/pop_macro pragmas, which are supported both by GCC
and MSVC.
https://bugs.freedesktop.org/show_bug.cgi?id=90621
Trivial.
The idea I had when I wrote the original shadow code was that you'd see a
set_index_buffer to the IB, then a bunch of draws out of it. What's
actually happening in openarena is that set_index_buffer occurs at every
draw, so we end up making a new shadow BO every time, and converting more
of the BO than is actually used in the draw.
While I could maybe come up with a better caching scheme, for now just
do the simple thing that doesn't result in a new shadow IB allocation
per draw.
Improves performance of isosurf in drawelements mode by 58.7967% +/-
3.86152% (n=8).
We'd sometimes try to reallocate something that X was using as a new
pipe_resource, and potentially conflict in our rendering. But even
worse, if we reallocated the BO as a shader, the kernel would reject
rendering using the shader.
Not sure what happened in my testing that made the previous shadow
code fix glxgears swapbuffering, but this also fixes lots of CopyArea
in X (like dragging xlogo around in metacity).
I forgot to make the change in 96f164f6f0.
This fixes a warning with GCC and probably an error with Clang.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Starting with GEN8, there is documentation that the multisample state command
must be emitted before the 3DSTATE_WM_HZ_OP command any time the multisample
count changes. The 3DSTATE_WM_HZ_OP packet gets emitted as a result of a
intel_hix_exec(), which is called upon a fast clear and/or a resolve. This can
happen before the state atoms are checked, and so the multisample state must be
put directly in the function.
v1:
- In v0, I was always emitting the command, but Ken came up with the condition to
determine whether or not the sample count actually changed.
- Ken's recommendation was to set brw->num_multisamples after emitting
3DSTATE_MULTISAMPLE. This doesn't work. I put my best guess as to why in the XXX
(it was causing 7 regressions on BDW).
v2:
Flag NEW_MULTISAMPLE state. As Ken found, in state upload we check for the
multisample change to determine whether or not to emit certain packets. Since
the hiz code doesn't actually care about the number of multisamples, set the
flag and let the later code take care of it.
Jenkins results:
http://otc-mesa-ci.jf.intel.com/view/dev/job/bwidawsk/136/
Fixes around 200 piglit tests on SKL. I'm somewhat surprised that it seems to
have no impact on BDW as the restriction is needed there as well.
Cc: "10.5 10.6" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Neil Roberts <neil@linux.intel.com> (v0)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v2)
TargetOptions::NoFramePointerElim was removed in llvm-3.7.0svn r238244
"Remove NoFramePointerElim and NoFramePointerElimOverride from
TargetOptions and remove ExecutionEngine's dependence on CodeGen. NFC."
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
This workaround is documented in the 3DSTATE_GS documentation. It
appears to only apply to early steppings of Broadwell and Skylake.
I don't think it ever affected production hardware, so at this point it
probably makes sense to delete it.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Commit 4bdbb588a9 introduced new _glapi_new_nop_table() and
_glapi_set_nop_handler() functions in the glapi dispatcher (which
live in libGL.so). The calls to those functions from context.c
would be undefined (i.e. an ABI break) if the libGL used at runtime
was older.
For the time being, use the old single generic_nop() function for
non-Windows builds to avoid this problem. At some point in the future
it should be safe to remove this work-around. See comments for more
details.
v2: Incorporate feedback from Emil. Use _WIN32 instead of
GLX_DIRECT_RENDERING to control behavior, move comments.
Cc: 10.6 <mesa-stable@lists.freedesktop.org>
Reviewed-and-tested-by: Ian Romanick <ian.d.romanick@intel.com>
This hasn't been updated in a long time and from recent discussion on
the mailing list, it's not always clear what's expected. Hopefully,
this will help a bit.
v2: document function brace placement, per Thomas Helland.
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
In case the glproto.h file isn't up to date, we provide the #define
for X_GLXCreateContextAttribsARB.
v2: fix other occurances, improve #ifndef test, per Jose.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
argparse type is a nice type saver for simple data types, but it doesn't
look a good fit for the input XML file:
- Certain implementations of argparse (particularly python 2.7.3's)
invoke the type constructor for the default argument even when an
option is passed in the command line. Causing `No such file or
directory: 'gl_API.xml'` when the current dir is not
src/mapi/glapi/gen.
- The parser takes multiple arguments. This is currently worked around
using lambdas, but that unnecessarily complex and hard to read.
Furthermore it's odd to have a side-effect as heavy as parsing XML
happening deep inside the argument parsing.
https://bugs.freedesktop.org/show_bug.cgi?id=90600
Reviewed-by: Brian Paul <brianp@vmware.com>
Without it, texcoords are mapped to GENERIC[0..7], PointCoord is mapped to
GENERIC[8], and user-defined varyings start from GENERIC[9]. Since texcoords
can only be used between VS and PS, and PointCoord is PS-only, it's silly to
always start from GENERIC[9] in all other shaders (such as LS, HS, ES, GS).
This adds support for TEXCOORD and PCOORD semantics. As a result, st/mesa
will use GENERIC[0] as a base for user-defined varyings, which should make
linking ES and GS as well as tessellation shaders at runtime easier.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
When using SIMD4x2 on Skylake, the sampler instructions need a message
header to select the correct mode. This was added for most sample
instructions in 0ac4c2727 but the TXF_MCS instruction is emitted
separately and it was missed.
This fixes a bunch of Piglit tests which test texelFetch in a geometry
shader, for example:
spec/arb_texture_multisample/texelfetch/2-gs-sampler2dms
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The problem is that the EDGEFLAG has to be toggled at vertex submission
time. This can be done from either the draw or the regular paths. Avoid
falling back to draw just because there's an edgeflag.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.5 10.6" <mesa-stable@lists.freedesktop.org>
Commit 8acaf862df switched things over to use TEXCOORD instead of
GENERIC, but did not update the nv30 swtnl draw paths. This teaches the
draw logic about TEXCOORD.
Among other things, this fixes a crash in demos/arbocclude when using
swtnl. Curiously enough, the point-sprite piglit works without this.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.5 10.6" <mesa-stable@lists.freedesktop.org>
These are only used once per draw, so it makes sense to keep them in
GART. Also take this opportunity to modernize the buffer mapping API
usage.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Ben Skeggs <bskeggs@redhat.com>
Cc: "10.5 10.6" <mesa-stable@lists.freedesktop.org>
Apparently some compilers think we probably wanted to do !(x == y) instead
and issue a warning, so just shut it up... No functional change, obviously.
Cc: <mesa-stable@lists.freedesktop.org>
This fixes glxgears with NV30_SWTNL=1 forced on. Probably fixes a bunch
of other situations where we fall back to the swtnl path.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.5 10.6" <mesa-stable@lists.freedesktop.org>
nv30_validate_clip depends on the rasterizer state. Also we should
upload all the new clip planes on change since next time the plane data
won't have changed, but the enables might.
This fixes fixed-clip-enables and vs-clip-vertex-enables shader tests.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de>
Cc: "10.5 10.6" <mesa-stable@lists.freedesktop.org>
Clearing can happen at a time when various state objects are incoherent
and not ready for a draw. Some of the validation functions don't handle
this well, so only flush the framebuffer state. This has the advantage
of also not doing extra work.
This works around some crashes that can happen when clearing.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de>
There can be scenarios where the "indirect" arg of a PFETCH becomes
known, and so the code will attempt to propagate it. Use this
opportunity to just fold it into the first argument, and prevent the
load propagation pass from touching PFETCH further.
This fixes gs-input-array-vec4-index-rd.shader_test and
vs-output-array-vec4-index-wr-before-gs.shader_test on nvc0 at least.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de>
Cc: "10.5 10.6" <mesa-stable@lists.freedesktop.org>
According to spec, CL_MEM_USE_HOST_PTR should directly use host memory,
if possible. This is just what userptr is for, so use it.
In case the memory cannot be mapped, a fallback similar to
CL_MEM_COPY_HOST_PTR is used.
v2: constify, drop unneeded cast
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
This flag is typically used to request pinned host memory, to avoid
any copies between GPU and CPU.
This improves throughput with an older OpenCL app which I unfortunately
can't publish due to its licensing.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Previously, the prog_to_nir pass was directly generating uniform load/store
intrinsics. This converts it to use a single giant "parameters" variable
and we now depend on lowering to get the uniform load/store intrinsics.
One advantage of this is that we now have one code-path after we do the
initial conversion into NIR.
No shader-db changes.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
PIPE_QUERY_TIMESTAMP_DISJOINT could not work because q->ready was always
set to FALSE. To fix this issue, add more different states for queries
according to nvc0.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
We forgot to convert to VFETCH in case of indirect access. Fix that.
This avoids crashes on the new gs-input-array-vec4-index-rd and
vs-output-array-vec4-index-wr-before-gs but they still fail.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.5 10.6" <mesa-stable@lists.freedesktop.org>
When we get something like IN[ADDR[0].x+5], we will now guess that we
should look at IN[5] for the "base" information.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.5 10.6" <mesa-stable@lists.freedesktop.org>
This covers the pattern where a KILL_IF is used, which triggers a
comparison of -x to 0. This can usually be folded into the comparison whose
result is being compared to 0, however it may, itself, have already been
combined with another comparison. That shouldn't impact the logic of
this pass however. With this and the & 1.0 change, code like
00000020: 001c0001 80081df4 set b32 $r0 lt f32 $r0 0x3e800000
00000028: 001c0000 201fc000 and b32 $r0 $r0 0x3f800000
00000030: 7f9c001e dd885c00 set $p0 0x1 lt f32 neg $r0 0x0
00000038: 0000003c 19800000 $p0 discard
becomes
00000020: 001c001d b5881df4 set $p0 0x1 lt f32 $r0 0x3e800000
00000028: 0000003c 19800000 $p0 discard
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
This is roughly equivalent to the original getopt, except that it
removes the '-h' short option, which argparse reserves for
auto-generated help messages. It does retain the long option specified
by the getopt version, and changes the makefile to use that.
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Acked-by: Matt Turner <mattst88@gmail.com>
Also drop -m switch, which only accepted a single value or raised an
error, and was unused in the makefile.
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Acked-by: Matt Turner <mattst88@gmail.com>
Also removes the redundant -m argument, which could only be set to
'generic', or it would raise an exception. This option wasn't used in
the makefile.
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Acked-by: Matt Turner <mattst88@gmail.com>
This makes the tools shut up about a bunch of problems, making them more
useful for catching actual problems.
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Acked-by: Matt Turner <mattst88@gmail.com>
This results in slightly less code, but code that is much more readable.
It has the advantage of putting everything together in one place, all of
the code is self documenting, help messages are auto-generated, choices
are automatically enforced, and the syntax is much less C like, taking
advantage of python features and idioms.
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Acked-by: Matt Turner <mattst88@gmail.com>
Making the tools shut up about worthless errors so you can see real ones
is very useful
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Acked-by: Matt Turner <mattst88@gmail.com>
Some shaders in Civilization V and Beyond Earth do
pow(pow(x, 2.2), 0.454545)
which is converting to and from sRGB colorspace.
A more general rule that replaces pow(pow(a, b), c) with pow(a, b * c)
actually regresses two shaders in Sun Temple in which the result of the
inner pow is used twice, once by another pow and once by another
instruction. Also, since 2.2 * 0.454545 isn't exactly one, the more
general pattern would have still left us with a pow, and I'm 2.2 *
0.454545 percent sure that's not what they want.
instructions in affected programs: 934 -> 886 (-5.14%)
helped: 16
A sequence number is written for 32-bits queries to make sure they are
ready, but not for 64-bits queries. Instead, we have to use a fence in
order to fix the HUD because it doesn't wait until the result is ready.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
v2: make this also compatible with original released firmware
v3 (chk): switch to original idea of separate files for fw versions
Signed-off-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com> (v2)
We now have is_array() and without_array() that make the
code much clearer and remove the need for this.
For all remaining calls to this we already knew that
the type was an array so returning a null wasn't adding any value.
v2: use without_array() in _mesa_ast_array_index_to_hir() and don't use
without_array() in lower_clip_distance_visitor() as we want to make sure the
array is 2D.
Reviewed-by: Matt Turner <mattst88@gmail.com>
get_immediate will return a const reference, the requested immediate
isn't necessarily in the x slot. Make sure to use the swizzle.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
Passing -module to glibtool causes the resulting library to be called
libSomething.so rather than libSomething.dylib on darwin.
Regardless if libOSMesa is a library or a module, it has been used as
the former for quite some time. Update the build to reflect that and
resolve the naming issue.
Cc: "10.5 10.6" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Jeremy Huddleston Sequoia <jeremyhu@apple.com>
[Emil Velikov: Tweak the commit message.]
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Previously, we used intrinsic->const_index[1] to represent "the number of
array elements to load" for load/store intrinsics. However, this set to 1
by every pass that ever creates a load/store intrinsic. Also, while it
might make some sense for registers, it makes no sense whatsoever in SSA.
On top of that, the i965 backend was the only backend to ever support it;
freedreno and vc4 just assert that it's always 1. Let's just delete it.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Rob Clark <robclark@freedesktop.org>
It's a remnant of some old NV extension. Unused.
I also have a patch that removes predicates if anyone is interested.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
From ARB_program_interface_query:
"Note that if an interface enumerates a single active resource list
entry for an array variable (e.g., "a[0]"), a <name> identifying
any array element other than the first (e.g., "a[1]") is not
considered to match."
It doesn't apply to arrays of interface blocks but just to array
variables.
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
This adds both ARB_texture_gather and the enhanced gather
for ARB_gpu_shader5.
This passes all the piglit tests, it relies on the GLSL
lowering pass to make textureGatherOffsets work.
v2: use inline to get gather component (Brian)
fix function name, add asserts (Brian)
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This is a prep change for gather, and it makes more sense
to use an array in these cases.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This just adds a new modifier interface for drivers to implement.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This was an oversight when GLSL1.30 was enabled, I think my
misunderstanding.
This fixes a bunch of tex-miplevel-selection tests under softpipe,
and is required for textureGather support.
I'm not sure this won't make sampling slowering, but its softpipe,
correctness first and all that.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This moves some of the image filter args into a struct,
and passes that instead, this is prep work for adding texture
gather support which needs new arguments.
review: make filter args const.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
GL_OES_geometry_shader not started (based on GL_ARB_geometry_shader4, which is done for all drivers)
GL_OES_gpu_shader5 not started (based on parts of GL_ARB_gpu_shader5, which is done for some drivers)
GL_OES_primitive_bounding box not started
GL_OES_sample_shading not started (based on parts of GL_ARB_sample_shading, which is done for some drivers)
GL_OES_sample_variables not started (based on parts of GL_ARB_sample_shading, which is done for some drivers)
GL_OES_shader_image_atomic not started (based on parts of GL_ARB_shader_image_load_store, which is done for some drivers)
GL_OES_shader_io_blocks not started (based on parts of GLSL 1.50, which is done)
GL_OES_shader_multisample_interpolation not started (based on parts of GL_ARB_gpu_shader5, which is done)
GL_OES_tessellation_shader not started (based on GL_ARB_tessellation_shader, which is done for some drivers)
GL_OES_texture_border_clamp not started (based on GL_ARB_texture_border_clamp, which is done)
GL_OES_texture_buffer not started (based on GL_ARB_texture_buffer_object, GL_ARB_texture_buffer_range, and GL_ARB_texture_buffer_object_rgb32 that are all done)
GL_OES_texture_cube_map_array not started (based on GL_ARB_texture_cube_map_array, which is done for all drivers)
GL_OES_texture_stencil8 not started (based on GL_ARB_texture_stencil8, which is done for some drivers)
GL_OES_texture_storage_multisample_2d_array DONE (all drivers that support GL_ARB_texture_multisample)
More info about these features and the work involved can be found at
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=86792">Bug 86792</a> - [NVC0] Portal 2 Crashes in Wine</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=90147">Bug 90147</a> - swrast: build error undeclared _SC_PHYS_PAGES on osx</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=90350">Bug 90350</a> - [G96] Portal's portal are incorrectly rendered</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=90363">Bug 90363</a> - [nv50] HW state is not reset correctly when using a new GL context</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=90310">Bug 90310</a> - Fails to build gallium_dri.so at linking stage with clang because of multiple redefinitions</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=90347">Bug 90347</a> - [NVE0+] Failure to insert texbar under some circumstances (causing bad colors in Terasology)</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=90520">Bug 90520</a> - Register spilling clobbers registers used elsewhere in the shader</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=90839">Bug 90839</a> - [10.5.5/10.6 regression, bisected] PBO glDrawPixels no longer using blit fastpath</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=90873">Bug 90873</a> - Kernel hang, TearFree On, Mate desktop environment</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=91056">Bug 91056</a> - The Bard's Tale (2005, native) has rendering issues</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=91117">Bug 91117</a> - Nimbus (running in wine) has rendering issues, objects are semi-transparent</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=91124">Bug 91124</a> - Civilization V (in Wine) has rendering issues: text missing, menu bar corrupted</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=90621">Bug 90621</a> - Mesa fail to build from git</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=91526">Bug 91526</a> - World of Warcraft (on Wine) has UI corruption with nouveau</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=91719">Bug 91719</a> - [SNB,HSW,BYT] dEQP regressions associated with using NIR for vertex shaders</li>
</ul>
<h2>Changes</h2>
<p>Alejandro Piñeiro (1):</p>
<ul>
<li>i965/vec4: fill src_reg type using the constructor type parameter</li>
</ul>
<p>Antia Puentes (1):</p>
<ul>
<li>i965/vec4: Fix saturation errors when coalescing registers</li>
</ul>
<p>Emil Velikov (2):</p>
<ul>
<li>docs: add sha256 checksums for 10.6.7</li>
<li>cherry-ignore: add commit non applicable for 10.6</li>
</ul>
<p>Hans de Goede (4):</p>
<ul>
<li>nv30: Fix creation of scanout buffers</li>
<li>nv30: Implement color resolve for msaa</li>
<li>nv30: Fix max width / height checks in nv30 sifm code</li>
<li>nv30: Disable msaa unless requested from the env by NV30_MAX_MSAA</li>
</ul>
<p>Ian Romanick (2):</p>
<ul>
<li>mesa: Pass the type to _mesa_uniform_matrix as a glsl_base_type</li>
<li>mesa: Don't allow wrong type setters for matrix uniforms</li>
</ul>
<p>Ilia Mirkin (5):</p>
<ul>
<li>st/mesa: don't fall back to 16F when 32F is requested</li>
<li>nvc0: always emit a full shader colormask</li>
<li>nvc0: remove BGRA4 format support</li>
<li>st/mesa: avoid integer overflows with buffers >= 512MB</li>
<li>nv50, nvc0: fix max texture buffer size to 128M elements</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=38109">Bug 38109</a> - i915 driver crashes if too few vertices are submitted (Mesa 7.10.2)</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=55552">Bug 55552</a> - Compile errors with --enable-mangling</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=66346">Bug 66346</a> - shader_query.cpp:49: error: invalid conversion from 'void*' to 'GLuint'</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=73512">Bug 73512</a> - [clover] mesa.icd. should contain full path</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=73528">Bug 73528</a> - Deferred lighting in Second Life causes system hiccups and screen flickering</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=74329">Bug 74329</a> - Please expose OES_texture_float and OES_texture_half_float on the ES3 context</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=80500">Bug 80500</a> - Flickering shadows in unreleased title trace</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=82186">Bug 82186</a> - [r600g] BARTS GPU lockup with minecraft shaders</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=84677">Bug 84677</a> - Triangle disappears with glPolygonMode GL_LINE</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=85252">Bug 85252</a> - Segfault in compiler while processing ternary operator with void arguments</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=89131">Bug 89131</a> - [Bisected] Graphical corruption in Weston, shows old framebuffer pieces</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=90073">Bug 90073</a> - Leaks in xcb_dri3_open_reply_fds() and get_render_node_from_id_path_tag</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=90249">Bug 90249</a> - Fails to build egl_dri2 on osx</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=90310">Bug 90310</a> - Fails to build gallium_dri.so at linking stage with clang because of multiple redefinitions</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=90347">Bug 90347</a> - [NVE0+] Failure to insert texbar under some circumstances (causing bad colors in Terasology)</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=90466">Bug 90466</a> - arm: linker error ndefined reference to `nir_metadata_preserve'</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=90520">Bug 90520</a> - Register spilling clobbers registers used elsewhere in the shader</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=90797">Bug 90797</a> - [ALL bisected] Mesa change cause performance case manhattan fail.</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=90817">Bug 90817</a> - swrast fails to load with certain remote X servers</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=90830">Bug 90830</a> - [bsw bisected regression] GPU hang for spec.arb_gpu_shader5.execution.sampler_array_indexing.vs-nonzero-base</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=90839">Bug 90839</a> - [10.5.5/10.6 regression, bisected] PBO glDrawPixels no longer using blit fastpath</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=90873">Bug 90873</a> - Kernel hang, TearFree On, Mate desktop environment</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=90887">Bug 90887</a> - PhiMovesPass in register allocator broken</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=90895">Bug 90895</a> - [IVB/HSW/BDW/BSW Bisected] GLB2.7 Egypt, GfxBench3.0 T-Rex & ALU and many SynMark cases performance reduced by 10-23%</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=91047">Bug 91047</a> - [SNB Bisected] Messed up Fog in Super Smash Bros. Melee in Dolphin</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=91056">Bug 91056</a> - The Bard's Tale (2005, native) has rendering issues</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=91077">Bug 91077</a> - dri2_glx.c:1186: undefined reference to `loader_open_device'</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=91117">Bug 91117</a> - Nimbus (running in wine) has rendering issues, objects are semi-transparent</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=91124">Bug 91124</a> - Civilization V (in Wine) has rendering issues: text missing, menu bar corrupted</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=91222">Bug 91222</a> - lp_test_format regression on CentOS 7</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=91226">Bug 91226</a> - Crash in glLinkProgram (NEW)</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=91231">Bug 91231</a> - [NV92] Psychonauts (native) segfaults on start when DRI3 enabled</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=91254">Bug 91254</a> - (regresion) video using VA-API on Intel slow and freeze system with mesa 10.6 or 10.6.1</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=91292">Bug 91292</a> - [BDW+] glVertexAttribDivisor not working in combination with glPolygonMode</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=91461">Bug 91461</a> - gl_TessLevel* writes have no effect for all but the last TCS invocation</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=91513">Bug 91513</a> - [IVB/HSW/BDW/SKL Bisected] Lightsmark performance reduced by 7%-10%</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=91526">Bug 91526</a> - World of Warcraft (on Wine) has UI corruption with nouveau</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=91544">Bug 91544</a> - [i965, regression, bisected] regression of several tests in 93977d3a151675946c03e</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=91551">Bug 91551</a> - DXTn compressed normal maps produce severe artifacts on all NV5x and NVDx chipsets</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=91570">Bug 91570</a> - Upgrading mesa to 10.6 causes segfault in OpenGL applications with GeForce4 MX 440 / AGP 8X</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=91591">Bug 91591</a> - rounding.h:102:2: error: #error "Unsupported or undefined LONG_BIT"</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=91610">Bug 91610</a> - [BSW] GPU hang for spec.shaders.point-vertex-id gl_instanceid divisor</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=91673">Bug 91673</a> - Segfault when calling glTexSubImage2D on storage texture to bound FBO</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=91726">Bug 91726</a> - R600 asserts in tgsi_cmp/make_src_for_op3</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=91847">Bug 91847</a> - glGenerateTextureMipmap not working (no errors) unless glActiveTexture(GL_TEXTURE1) is called before</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=91857">Bug 91857</a> - Mesa 10.6.3 linker is slow</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=91881">Bug 91881</a> - regression: GPU lockups since mesa-11.0.0_rc1 on RV620 (r600) driver</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=38109">Bug 38109</a> - i915 driver crashes if too few vertices are submitted (Mesa 7.10.2)</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=91716">Bug 91716</a> - [bisected] piglit.shaders.glsl-vs-int-attrib regresses on 32 bit BYT, HSW, IVB, SNB</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=91719">Bug 91719</a> - [SNB,HSW,BYT] dEQP regressions associated with using NIR for vertex shaders</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=55552">Bug 55552</a> - Compile errors with --enable-mangling</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=71789">Bug 71789</a> - [r300g] Visuals not found in (default) depth = 24</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=91044">Bug 91044</a> - piglit spec/egl_khr_create_context/valid debug flag gles* fail</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=91342">Bug 91342</a> - Very dark textures on some objects in indoors environments in Postal 2</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=91596">Bug 91596</a> - EGL_KHR_gl_colorspace (v2) causes problem with Android-x86 GUI</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=86720">Bug 86720</a> - [radeon] Europa Universalis 4 freezing during game start (10.3.3+, still broken on 11.0.2)</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=91788">Bug 91788</a> - [HSW Regression] Synmark2_v6 Multithread performance case FPS reduced by 36%</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=91993">Bug 91993</a> - Graphical glitch in Astromenace (open-source game).</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=92214">Bug 92214</a> - Flightgear crashes during splashboot with R600 driver, LLVM 3.7.0 and mesa 11.0.2</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=92437">Bug 92437</a> - osmesa: Expose GL entry points for Windows build, via .def file</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=92623">Bug 92623</a> - Differences in prog_data ignored when caching fragment programs (causes hangs)</li>
</ul>
<h2>Changes</h2>
<p>Alex Deucher (1):</p>
<ul>
<li>radeon/uvd: don't expose HEVC on old UVD hw (v3)</li>
</ul>
<p>Ben Widawsky (1):</p>
<ul>
<li>i965/skl: Add GT4 PCI IDs</li>
</ul>
<p>Emil Velikov (4):</p>
<ul>
<li>docs: add sha256 checksums for 11.0.4</li>
<li>cherry-ignore: ignore a possible wrong nomination</li>
<li>Revert "mesa/glformats: Undo code changes from _mesa_base_tex_format() move"</li>
<li>Update version to 11.0.5</li>
</ul>
<p>Emmanuel Gil Peyrot (1):</p>
<ul>
<li>gbm.h: Add a missing stddef.h include for size_t.</li>
</ul>
<p>Eric Anholt (1):</p>
<ul>
<li>vc4: When the create ioctl fails, free our cache and try again.</li>
</ul>
<p>Ian Romanick (1):</p>
<ul>
<li>i965: Fix is-renderable check in intel_image_target_renderbuffer_storage</li>
</ul>
<p>Ilia Mirkin (3):</p>
<ul>
<li>nvc0: respect edgeflag attribute width</li>
<li>nouveau: set MaxDrawBuffers to the same value as MaxColorAttachments</li>
<li>nouveau: relax fence emit space assert</li>
</ul>
<p>Ivan Kalvachev (1):</p>
<ul>
<li>r600g: Fix special negative immediate constants when using ABS modifier.</li>
</ul>
<p>Jason Ekstrand (2):</p>
<ul>
<li>nir/lower_vec_to_movs: Pass the shader around directly</li>
<li>nir: Report progress from lower_vec_to_movs().</li>
</ul>
<p>Jose Fonseca (2):</p>
<ul>
<li>gallivm: Translate all util_cpu_caps bits to LLVM attributes.</li>
<li>gallivm: Explicitly disable unsupported CPU features.</li>
</ul>
<p>Julien Isorce (4):</p>
<ul>
<li>st/va: pass picture desc to begin and decode</li>
<li>nvc0: fix crash when nv50_miptree_from_handle fails</li>
<li>st/va: do not destroy old buffer when new one failed</li>
<li>st/va: add more errors checks in vlVaBufferSetNumElements and vlVaMapBuffer</li>
</ul>
<p>Kenneth Graunke (6):</p>
<ul>
<li>i965: Fix missing BRW_NEW_*_PROG_DATA flagging caused by cache reuse.</li>
<li>nir: Report progress from nir_split_var_copies().</li>
<li>nir: Properly invalidate metadata in nir_split_var_copies().</li>
<li>nir: Properly invalidate metadata in nir_opt_copy_prop().</li>
<li>nir: Properly invalidate metadata in nir_lower_vec_to_movs().</li>
<li>nir: Properly invalidate metadata in nir_opt_remove_phis().</li>
</ul>
<p>Marek Olšák (1):</p>
<ul>
<li>radeonsi: add register definitions for Stoney</li>
</ul>
<p>Nanley Chery (1):</p>
<ul>
<li>mesa/glformats: Undo code changes from _mesa_base_tex_format() move</li>
</ul>
<p>Nicolai Hähnle (1):</p>
<ul>
<li>st/mesa: fix mipmap generation for immutable textures with incomplete pyramids</li>
</ul>
<p>Nigel Stewart (1):</p>
<ul>
<li>osmesa: Expose GL entry points for Windows build via DEF file.</li>
</ul>
<p>Roland Scheidegger (1):</p>
<ul>
<li>gallivm: disable f16c when not using AVX</li>
</ul>
<p>Samuel Li (2):</p>
<ul>
<li>radeonsi: add support for Stoney asics (v3)</li>
<li><ahref="https://bugs.freedesktop.org/show_bug.cgi?id=92900">Bug 92900</a> - [regression bisected] About 700 piglit regressions is what could go wrong</li>
</ul>
<h2>Changes</h2>
<p>Alex Deucher (1):</p>
<ul>
<li>radeonsi: enable optimal raster config setting for fiji (v2)</li>
@@ -152,25 +197,39 @@ if they're not installed in your system. You should be told what's missing.
<br>
<br>
<li>xf86-video-vmware: Now, once libxatracker is installed, we proceed with building and replacing the current Xorg driver. First check if your system is 32- or 64-bit. If you're building for a 32-bit system, you will not be needing the --libdir=/usr/lib64 option to autogen.
<li>xf86-video-vmware: Now, once libxatracker is installed, we proceed with
building and replacing the current Xorg driver.
First check if your system is 32- or 64-bit.
<pre>
cd $TOP/xf86-video-vmware
./autogen.sh --prefix=/usr --libdir=/usr/lib64
./autogen.sh --prefix=/usr --libdir=${LIBDIR}
make
sudo make install
</pre>
<li>vmwgfx kernel module. First make sure that any old version of this kernel module is removed from the system by issuing
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.