Comparing 8d3ccdbb9b...4e9d510044 - mesa

fran/mesa

Author	SHA1	Message	Date
Erico Nunes	71fb721ca5	lima/ppir: use ra_get_best_spill_node to select spill node ra_get_best_spill_node is what other users of the mesa register allocator use. Switching to it now also fixes an infinite loop issue with ppir regalloc with the ppir control flow patchset, and also provides a small gain over the previous heuristic on number of spilled nodes testing with shader-db. Signed-off-by: Erico Nunes <nunes.erico@gmail.com> Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>	2019-08-20 21:16:02 +00:00
Eric Anholt	c1dc84e71d	tgsi: Remove unused tgsi_check_soa_dependencies(). Acked-by: Eric Engestrom <eric@engestrom.ch> Reviewed-By: Gert Wollny <gert.wollny@collabora.com>	2019-08-20 13:31:13 -07:00
Eric Anholt	4ebe6b2e72	tgsi: Drop the SSE2 constants setup that's been dead code since 2011. The SSE2 executor was removed in `4eb3225b38` ("Remove tgsi_sse2.") Acked-by: Eric Engestrom <eric@engestrom.ch> Reviewed-By: Gert Wollny <gert.wollny@collabora.com>	2019-08-20 13:31:13 -07:00
Eric Anholt	98c58355d3	tgsi: drop a stale comment This was fixed in `912ed84f83` ("tgsi: move to using vector for system values.") Acked-by: Eric Engestrom <eric@engestrom.ch> Reviewed-By: Gert Wollny <gert.wollny@collabora.com>	2019-08-20 13:31:13 -07:00
Eric Anholt	553cd82d64	gitlab-ci: Enable the GLES2/3 CTS on softpipe. The GLES2 CTS takes about 8 minutes of total runtime (at parallel 4 is ~2 minutes in the test stage if runners are free), while GLES3 takes about 25. Since the GLES3 run is pretty expensive, just do a cheap touch test of 1 out of every 10 tests in the test list on MRs, until we can get the runtime down. v2: Drop the full run for now until we can bring runtime down or bring up a dedicated mesa runner. Reviewed-by: Eric Engestrom <eric@engestrom.ch> (v1) Reviewed-By: Gert Wollny <gert.wollny@collabora.com> (v1)	2019-08-20 13:31:13 -07:00
Jose Maria Casanova Crespo	6c904773fe	mesa: reverse no_error on compressed_tex_sub_image for TEX_MODE_CURRENT This fixes the regression introduced on "mesa: refactor compressed_tex_sub_image function" that started to crash KHR-GLES2.texture_3d.compressed_texture.negative_compressed_tex_sub_image Fixes: `7df233d68d` ("mesa: refactor compressed_tex_sub_image function") Reviewed-by: Eric Anholt <eric@anholt.net>	2019-08-20 20:45:21 +01:00
Adam Jackson	b283919398	glx: Eliminate glx_config::{rgb,float,colorIndex}Mode These are redundant with glx_config::renderType, let's just use that consistently.	2019-08-20 14:05:07 -04:00
Adam Jackson	74ca87e4bc	glx: Remove unused glx_config::pixmapMode Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-08-20 14:05:03 -04:00
Adam Jackson	35fc7bdf0e	glx: convert glx_config_create_list to one big calloc Simpler, less failure prone, less malloc overhead, what's not to like. Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-08-20 14:05:01 -04:00
Adam Jackson	97d58eabcc	glx: convert a malloc+memset to calloc Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-08-20 14:04:59 -04:00
Adam Jackson	cabd09c9e7	glx: Fix parameter documentation of glx_config_create_list 'minimum_size' is not, in fact, an argument to this function. Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-08-20 14:04:56 -04:00
Arcady Goldmints-Orlov	3835535537	anv: inline uniforms blocks don't count toward descriptor set limits In a descriptor set inline uniform blocks don't use up any bindings. However, the presence of any inline uniform blocks does require the use of the descriptor buffer, which takes up one binding. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-08-20 16:48:45 +00:00
Daniel Schürmann	df86c5ffb3	nir: add divergence analysis pass. This pass expects the shader to be in LCSSA form. The algorithm is based on 'The Simple Divergence Analysis' from Diogo Sampaio, Rafael De Souza, Sylvain Collange, Fernando Magno Quintão Pereira. Divergence Analysis. ACM Transactions on Programming Languages and Systems (TOPLAS) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2019-08-20 17:40:13 +02:00
Rhys Perry	7b07034931	nir/subgroups: Lower clustered reductions with cluster_size >= subgroup_size into reductions The behavior for reductions with cluster_size >= subgroup_size is implementation defined. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-08-20 17:40:10 +02:00
Rhys Perry	911a1dfad2	nir/lcssa: allow to create LCSSA phis for loop-invariant booleans ACO depends on LCSSA phis for divergent booleans to work correctly. Reviewed-by: Connor Abbott <cwabbott0@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-08-20 17:40:05 +02:00
Daniel Schürmann	9c40ad49d5	nir/lcssa: Skip loop invariant variables when converting to LCSSA. Co-authored-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Connor Abbott <cwabbott0@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-08-20 17:40:01 +02:00
Rhys Perry	8a6cfaa15a	nir: make nir_to_lcssa() a general NIR pass. Reviewed-by: Connor Abbott <cwabbott0@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-08-20 17:39:54 +02:00
Daniel Schürmann	204846ad06	nir/lcssa: handle deref instructions properly Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Fixes: `414148cdc1` "nir: Support deref instructions in loop_analyze"	2019-08-20 17:39:52 +02:00
Jose Maria Casanova Crespo	7c56a68c8b	tgsi_to_nir: only update TGSI properties of the current shader stage The implementation introduced in "tgsi_to_nir: be careful about not losing any TGSI properties silently (v2)" updates all the TGSI properties, but it didn't take into account that the shader_info structure uses a union to store the different attributes for each shader stage. Now we only update the attributes if they affect the current shader stage, to avoid overwriting members of the union that should not be overwritten. This has created hundreds of regressions in v3d. For example the TGSI_PROPERTY_VS_BLIT_SGPRS_AMD was overwriting the same position used by TGSI_PROPERTY_CS_FIXED_BLOCK_DEPTH. Fixes: `e300365197` ("tgsi_to_nir: be careful about not losing any TGSI properties silently (v2)") Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-08-20 10:30:21 +00:00
Samuel Pitoiset	83a63a5b12	radv/gfx10: do not emit PA_SC_TILE_STEERING_OVERRIDE twice CLEAR_STATE emits it for us. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-08-20 12:13:44 +02:00
Samuel Pitoiset	2ca8629fa9	radv: do not emit PKT3_CONTEXT_CONTROL with AMDGPU 3.6.0+ It's emitted by the kernel. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-08-20 12:13:41 +02:00
Gert Wollny	6a09405368	mesa/program: Take ARB_framebuffers_no_attachments into account in wpos correction If a drawbuffer is an fbo without an attachment then its 'Height' will be zero, and we have to take its 'DefaultGeometry.Height' into account. Fixes on softpipe (with the exception of tests that use multisample): dEQP-GLES31.functional.fbo.no_attachments.* Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-08-20 10:04:24 +02:00
Sagar Ghuge	fe0e9db797	iris: Enable non coherent framebuffer fetch on broadwell v2: Use GEN_GEN in iris_state (Kenneth Graunke) Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-08-20 00:50:58 -07:00
Sagar Ghuge	57ce422e20	iris: Free resource if failed to allocate surface state Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-08-20 00:50:55 -07:00
Sagar Ghuge	02244bc515	iris: Pass isl_surf to fill_surface_state Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Suggested-by: Kenneth Graunke <kenneth@whitecape.org>	2019-08-20 00:50:45 -07:00
Sagar Ghuge	638a157e02	iris: Add infrastructure to support non coherent framebuffer fetch Create separate SURFACE_STATE for render target read in order to support non coherent framebuffer fetch on broadwell. Also we need to resolve framebuffer in order to support CCS_D. v2: Add outputs_read check (Kenneth Graunke) v3: 1) Import Curro's comment from get_isl_surf 2) Rename get_isl_surf method 3) Clean up allocation in case of failure Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-08-20 00:50:44 -07:00
Sagar Ghuge	61c0637afb	iris: Add helper functions to get tile offset All helper functions are ported from i965 driver. Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-08-20 00:50:43 -07:00
Sagar Ghuge	7e816991cc	iris: Add helper function to get isl dim layout v2: Add missing space (Caio) Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-08-20 00:50:41 -07:00
Sagar Ghuge	58471e20d2	iris: Add render target read entry in binding table This will be used in next patches for supporting non coherent framebuffer fetch on Broadwell. v2: Fix comment (Kenneth Graunke) v3: 1) Fix a few nits (Caio) 2) Add comment (Caio) Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-08-20 00:50:31 -07:00
Kai Wasserbäch	1abe87383e	build: Bump C++ standard requirement to C++14 to fix FTBFS with LLVM 10 When building Mesa against a recent LLVM 10 with C++11, the build fails if the AMD common code is built as well due to "std::index_sequence" being undeclared. LLVM requires a minimum of C++14. Signed-off-by: Kai Wasserbäch <kai@dev.carbon-project.org> Acked-by: Eric Engestrom <eric@engestrom.ch>	2019-08-20 05:39:19 +00:00
Rob Herring	d0ec5d38f6	panfrost: Add madvise support to BO cache The kernel now supports madvise ioctl to indicate which BOs can be freed when there is memory pressure. Mark BOs purgeable when they are in the BO cache. The BOs must also be munmapped when they are in the cache or they cannot be purged. We could optimize avoiding the madvise ioctl on older kernels once the driver version bump lands, but probably not worth it given the other driver features also being added. Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com> Signed-off-by: Rob Herring <robh@kernel.org>	2019-08-19 19:33:20 -05:00
Rob Herring	c45c2d7960	panfrost: Sync UAPI header from kernel Sync the panfrost_drm.h UAPI header with the latest from the kernel. This adds madvise ioctl and GPU feature params. Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com> Signed-off-by: Rob Herring <robh@kernel.org>	2019-08-19 19:33:20 -05:00
Pierre-Eric Pelloux-Prayer	0f07d18e48	mesa: add ext_dsa GetMultiTexLevelParameterEXT Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-08-19 18:50:08 -04:00
Pierre-Eric Pelloux-Prayer	e8c5dc9c24	mesa: add EXT_dsa glCompressedMultiTex* functions display list support Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-08-19 18:50:07 -04:00
Pierre-Eric Pelloux-Prayer	1cb8e12717	mesa: add EXT_dsa glCompressedMultiTex* functions Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-08-19 18:50:05 -04:00
Pierre-Eric Pelloux-Prayer	a886025ef5	mesa: add EXT_dsa glCompressedTex* functions display list support Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-08-19 18:50:03 -04:00
Pierre-Eric Pelloux-Prayer	8c76221886	mesa: add EXT_dsa glCompressedTexture(Sub)Image1D/2D/3D functions Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-08-19 18:49:57 -04:00
Pierre-Eric Pelloux-Prayer	7df233d68d	mesa: refactor compressed_tex_sub_image function Combine compressed_tex_sub_image, compressed_tex_sub_image_error and compressed_tex_sub_image_no_error in a single function. The added "enum tex_mode mode" parameter allows to implement the DSA / non-DSA variants and their error/no_error combination. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-08-19 18:49:43 -04:00
Bas Nieuwenhuizen	6c5d983865	radv: Add Renoir support. Took the freedom to enable dfsm even though I don't have benchmark results yet, but it seems Raven-like. Rest is from radeonsi. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-08-19 22:34:11 +00:00
Marek Olšák	223b3174bd	radeonsi/nir: always lower ballot masks as 64-bit, codegen handles it This fixes KHR-GL45.shader_ballot_tests.ShaderBallotBitmasks. This solution is better, because the IR isn't dependent on wave32.	2019-08-19 17:23:38 -04:00
Marek Olšák	5d37194d43	radeonsi: remove the unsafemath debug option unlikely to be used in the future Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>	2019-08-19 17:23:38 -04:00
Marek Olšák	5586411de4	radeonsi/nir: fix counting shader inputs & outputs	2019-08-19 17:23:38 -04:00
Marek Olšák	452cb7055f	radeonsi/nir: fix assertion in si_nir_load_sampler_desc	2019-08-19 17:23:38 -04:00
Marek Olšák	1f8a661748	radeonsi: clean up si_llvm_context_set_tgsi Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>	2019-08-19 17:23:38 -04:00
Marek Olšák	43f8b5642b	radeonsi: allocate and resize global_buffers as needed Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>	2019-08-19 17:23:38 -04:00
Marek Olšák	c315cb509d	radeonsi/gfx10: don't set PA_SC_TILE_STEERING_OVERRIDE if CLEAR_STATE sets it Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>	2019-08-19 17:23:38 -04:00
Marek Olšák	5a2e65be89	radeonsi: don't emit PKT3_CONTEXT_CONTROL on amdgpu Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>	2019-08-19 17:23:38 -04:00
Marek Olšák	8d0d753bd0	radeonsi: fix an assertion failure: assert(!res->b.is_shared) This only appears to happen on Raven2. Possible way to reproduce: resource_get_handle(WINSYS_HANDLE_TYPE_KMS) --> sets is_shared = true resource_get_handle(WINSYS_HANDLE_TYPE_DMABUF) --> fail Cc: 19.1 19.2 <mesa-stable@lists.freedesktop.org>	2019-08-19 17:23:38 -04:00
Marek Olšák	bdcbac9459	radeonsi: handle the use_ngg_streamout flag in si_update_ngg	2019-08-19 17:23:38 -04:00
Marek Olšák	a6b3ca1c70	radeonsi: move the tess factor ring size assertion to a place where it matters Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>	2019-08-19 17:23:38 -04:00
Marek Olšák	21217efdfe	ac/nir: set image=true when loading FMASK for images Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>	2019-08-19 17:23:38 -04:00
Christian Gmeiner	f52b9218ff	etnaviv: rs: add support for 64bpp clears Starting with HALTI2 the RS supports 64bpp clears. Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com> Reviewed-by: Philipp Zabel <philipp.zabel@gmail.com> Reviewed-by: Jonathan Marek <jonathan@marek.ca>	2019-08-19 22:36:45 +02:00
Christian GMEINER	7492685b1b	etnaviv: update headers from rnndb Update to etna_viv commit c51353e. Signed-off-by: Christian GMEINER <christian.GMEINER@bachmann.info> Reviewed-by: Philipp Zabel <philipp.zabel@gmail.com> Reviewed-by: Jonathan Marek <jonathan@marek.ca>	2019-08-19 22:36:45 +02:00
Eric Anholt	1395503424	swrast: Make the fetch funcs table sparse. This shrinks the table, avoids needing to update the table with NULL entries on every MESA_FORMAT addition, and removes a surprising, non-unit-tested format number ordering dependency. Acked-by: Jose Fonseca <jfonseca@vmware.com> Reviewed-by: Charmaine Lee <charmainel@vmware.com>	2019-08-19 11:48:03 -07:00
Eric Anholt	c45c33a5a2	gallium: Remove manual defining of PIPE_FORMAT enum values. Now that SVGA doesn't have a table that has to be in PIPE_FORMAT order, we can let the enums have whatever values they naturally would without worrying about holes. Acked-by: Jose Fonseca <jfonseca@vmware.com> Reviewed-by: Charmaine Lee <charmainel@vmware.com>	2019-08-19 11:48:01 -07:00
Eric Anholt	84db6ba740	svga: Drop unsupported formats from the format table. Now that we're using the array initializers, we don't need to manually fill out all these stub entries. Produced with "sed -i '/.INVALID.INVALID.*INVALID/d' src/gallium/drivers/svga/svga_format.c" Acked-by: Jose Fonseca <jfonseca@vmware.com> Reviewed-by: Charmaine Lee <charmainel@vmware.com>	2019-08-19 11:43:02 -07:00
Eric Anholt	ef37da52c0	svga: Remove duplication in the format table. By using the [ ] = {} array initializer syntax, we no longer need the entries to be listed in PIPE_FORMAT_* value order. This means that people adding new gallium formats don't need to cargo-cult changes to this driver or regress that non-unit-tested requirement. While I'm here, drop the lines for formats that no longer exist (the numbered ones in the table). Acked-by: Jose Fonseca <jfonseca@vmware.com> Reviewed-by: Charmaine Lee <charmainel@vmware.com>	2019-08-19 11:42:55 -07:00
Eric Anholt	42efa789b5	svga: Factor out the format conversion table entry lookup. Seemed like a sensible cleanup, while I was looking at whether I could make the table sparse. To make the svga table not require fixups on every new gallium format, we may want to change how it's populated. Acked-by: Jose Fonseca <jfonseca@vmware.com> Reviewed-by: Charmaine Lee <charmainel@vmware.com>	2019-08-19 11:42:36 -07:00
Jason Ekstrand	5167e94f23	nir: Add more source types to nir_tex_instr_src_type Reviewed-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-19 17:03:34 +00:00
Alyssa Rosenzweig	2bb4dc4054	pan/midgard: Compute liveness per-block Rather than using a regalloc based on live intervals, computed hastily with repeated invocations of a forward-analysis pass, we switch to compute liveness information on a per-block basis. Within a given basic block, we compute liveness backwards with a linear-time algorithm; for common shaders, this may help RA terminate quicker. Across blocks, we use a work list (really a work set) and check if we're making progress. This isn't terribly efficient, but it gets the job done. Point is, we get the live_in/live_out for each block. From there, it's simple to rerun the linear-time update algorithm to compute the interference graph. The benefit of this technique is the ability to ignore "gaps" in liveness across intermediate blocks that are never executed. On simple shaders like the loops in glmark, this results in a minor reduction in register pressure. The motivation was a complex shader in Krita that failed register allocation due to an unfortunate interaction between texture pipeline registers and control flow. This shader now compiles successfully. total instructions in shared programs: 3439 -> 3438 (-0.03%) instructions in affected programs: 22 -> 21 (-4.55%) helped: 1 HURT: 0 total bundles in shared programs: 2077 -> 2076 (-0.05%) bundles in affected programs: 12 -> 11 (-8.33%) helped: 1 HURT: 0 total quadwords in shared programs: 3457 -> 3456 (-0.03%) quadwords in affected programs: 20 -> 19 (-5.00%) helped: 1 HURT: 0 total registers in shared programs: 341 -> 338 (-0.88%) registers in affected programs: 9 -> 6 (-33.33%) helped: 3 HURT: 0 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 33.33% max: 33.33% x̄: 33.33% x̃: 33.33% Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-19 08:32:17 -07:00
Alyssa Rosenzweig	24c91bb54b	pan/midgard: Analyze load/store for swizzle propagation If there's a nontrivial swizzle fed into an extra (shortened) argument, we bail on copyprop. No glmark changes (since it doesn't use fancy texturing/loads). Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-19 08:32:17 -07:00
Alyssa Rosenzweig	9ae4d3653e	pan/midgard: Treat cubemaps "stores" as loads It's always been ambiguous which they are, but their primary register is their output, not their input; therefore, they are loads. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-19 08:32:17 -07:00
Alyssa Rosenzweig	20dd482668	pan/midgard: Clamp cubemap swizzle to XYXX Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-19 08:32:17 -07:00
Alyssa Rosenzweig	2788721cc4	pan/midgard: Clamp st_vary swizzle by number of components Same issue with liveness analysis. If we store out a vec3, we should not reference the .w component. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-19 08:32:17 -07:00
Alyssa Rosenzweig	edc8e41566	pan/midgard: Use type-appropriate swizzle for texture coordinate The texture coordinate for a 2D texture could be a vec2 or a vec3, depending if it's an array texture or not. If it's vec2 (non-array texture), we should not reference the z component; otherwise, liveness analysis will get very confused when z is never written. v2: Fix typo (Ilia). Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-19 08:32:17 -07:00
Alyssa Rosenzweig	2bcb3d9226	pan/midgard: Set mask for lowered read-hazard moves If we need to lower a move for a read from a vec2 texture coordinate, we shouldn't write zw, even incidentally. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-19 08:32:17 -07:00
Alyssa Rosenzweig	739e09c297	pan/midgard: Fix texw lowering with complex control flow Fixes shaders with control flow like: out = 0; if (A) { if (B) out = texture(A, ...) } else { out = texture(B, ...) } Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-19 08:32:17 -07:00
Alyssa Rosenzweig	6f1c8c148d	pan/midgard: Add mir_rewrite_index_dst_single helper Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-19 08:32:17 -07:00
Alyssa Rosenzweig	d68019ad1f	pan/midgard: Print predecessors in MIR Just as a sanity check. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-19 08:32:17 -07:00
Alyssa Rosenzweig	e3a418fe86	pan/midgard: Index blocks for printing Better than having pointers flying about. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-19 08:32:17 -07:00
Alyssa Rosenzweig	2f92479ffc	pan/midgard: Add mir_foreach_src This is repeated often enough. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-19 08:32:17 -07:00
Alyssa Rosenzweig	84580c6dbc	pan/midgard: Add mir_foreach_instr_in_block_rev Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-19 08:32:17 -07:00
Alyssa Rosenzweig	c8c4471a92	pan/midgard: Add mir_foreach_successor helper Now we should be able to walk the control-flow graph naturally. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-19 08:32:17 -07:00
Alyssa Rosenzweig	b8e526c520	pan/midgard: Add mir_foreach_predecessor utility It's ugly, but c'est la vie. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-19 08:32:17 -07:00
Alyssa Rosenzweig	b4b2e111f8	pan/midgard: Link exit block The exit block has been 'dangling' in the successors graph, so let's ensure it's linked in. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-19 08:32:17 -07:00
Alyssa Rosenzweig	07c960cac0	pan/midgard: Add mir_exit_block helper The exit block is guaranteed to be empty, signaling the end of the program. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-19 08:32:17 -07:00
Alyssa Rosenzweig	aeeeef1242	pan/midgard: Maintain block predecessor set While we already compute the successors array, for backwards data flow analysis, it is useful to walk the control flow graph backwards based on predecessors, so let's compute that information as well. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-19 08:32:17 -07:00
Alyssa Rosenzweig	4fa09329c1	pan/midgard: Use ralloc on ctx/blocks This will allow us to get some level of automatic memory management. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-19 08:32:17 -07:00
Alyssa Rosenzweig	b59b1793b8	pan/midgard: Shrink successors[] to 2 length A block can't have more. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-19 08:32:17 -07:00
Roman Stratiienko	fdd6151612	nir: Add missing dependency in Android.nir.gen.mk Fixes incremental build with Android Signed-off-by: Roman Stratiienko <roman.stratiienko@globallogic.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2019-08-19 09:53:18 +03:00
Erico Nunes	99d5bdcfa5	meson: build lima tools as part of 'all' tools This is primarily so that this build gets tested in CI and we don't break it again. Signed-off-by: Erico Nunes <nunes.erico@gmail.com> Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com> Reviewed-by: Qiang Yu <yuq825@gmail.com>	2019-08-18 22:27:55 +02:00
Connor Abbott	c550d367a7	ac/nir: Fix store_scratch with a non-full writemask By adding one more helper to ac_llvm_build, we can also easily keep vector stores together. Fixes the tests/spec/glsl-1.30/execution/fs-large-local-array-vec4.shader_test piglit test. Fixes: `74470baebb` ("ac/nir: Lower large indirect variables to scratch") Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-08-18 15:15:45 +02:00
Vasily Khoruzhick	0e394cda0d	glsl/standalone: init shader stage in init_gl_program() Otherwise lima standalone compiler fails when trying to compile fragment shader with: lima_compiler: ../src/compiler/nir/nir.c:55: nir_shader_create: Assertion `si->stage == stage' failed Reviewed-by: Qiang Yu <yuq825@gmail.com> Reviewed-by: Matt Turner <mattst88@gmail.com> Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>	2019-08-17 11:14:40 -07:00
Jason Ekstrand	16edd02bfa	iris: Only request an input mask if the shader needs it Fixes: `aebca3961b` "iris: Fix handling of SIMD32 fragment shaders" Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-08-16 19:59:42 -05:00
Xiong, James	dcad15ff54	gallium: add back YVU support PIPE_FORMAT_YV12 is not handled so switching to PIPE_FORMAT_IYUV and adding back YVU support. Signed-off-by: James Xiong <james.xiong@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-08-16 13:24:49 -07:00
Erico Nunes	7a51abab42	lima: actually wait for bo in lima_bo_wait PIPE_TIMEOUT_INFINITE is unsigned and gets assigned to signed fields where it ends up as -1. When this reaches the kernel as a timeout it gets translated as no timeout, which causes the waiting functions to return immediately and not actually wait for a completion. This seems to cause unstable results with lima where even piglit tests randomly fail. Handle this by setting the signed max value in case of infinite timeout. Signed-off-by: Erico Nunes <nunes.erico@gmail.com> Reviewed-by: Qiang Yu <yuq825@gmail.com>	2019-08-16 16:31:29 +02:00
Rhys Perry	0a790c3019	nir/algebraic: add a few masking-before-unpack optimizations Helps some Dawn of War 3 and F1 2017 shaders with ACO: Totals from affected shaders: SGPRS: 2136 -> 2128 (-0.37 %) VGPRS: 1624 -> 1628 (0.25 %) Spilled SGPRs: 0 -> 0 (0.00 %) Spilled VGPRs: 0 -> 0 (0.00 %) Private memory VGPRs: 0 -> 0 (0.00 %) Scratch size: 0 -> 0 (0.00 %) dwords per thread Code Size: 168068 -> 164332 (-2.22 %) bytes LDS: 44 -> 44 (0.00 %) blocks Max Waves: 222 -> 221 (-0.45 %) Wait states: 0 -> 0 (0.00 %) Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-08-16 12:13:01 +01:00
Vasily Khoruzhick	861c2b8d31	lima: fix compilation of standalone compiler Fixes: e0aeee946004("lima: add summary report for shader-db") Reviewed-by: Qiang Yu <yuq825@gmail.com> Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>	2019-08-15 16:59:51 -07:00
Bas Nieuwenhuizen	b9fb90e6d3	Revert "radv/gfx10: Enable DCC for storage images." Quite useless without DCC for LAYOUT_GENERAL. Fixes: `b4dad3afaa` Revert "radv: Do not decompress on LAYOUT_GENERAL." Acked-by: Dave Airlie <airlied@redhat.com>	2019-08-16 01:22:54 +02:00
Bas Nieuwenhuizen	b4dad3afaa	Revert "radv: Do not decompress on LAYOUT_GENERAL." Causes issues with a bunch of games with DXVK. Fixes: `50add1b33a` "radv: Do not decompress on LAYOUT_GENERAL." Acked-by: Dave Airlie <airlied@redhat.com>	2019-08-16 01:22:35 +02:00
Dave Airlie	f3af7886fe	mesa: add support for CET to x86/x86-64 asm files. Control-flow enforcement technology is a new set of instructions on x86 processors to denote where indirect jumps can land. Gcc automatically adds the instruction (which encodes as a NOP on older CPUs) to entrypoints, but assembler files need it added manually. This adds it to all the entry points in the mesa x86/x86-64 assembler files. This will only happen if mesa is built with the -fcf-protection flag to gcc as some distros are wanting to do. Acked-by: Eric Anholt <eric@anholt.net>	2019-08-16 09:00:35 +10:00
Alyssa Rosenzweig	78eda70892	pan/bifrost: Manually constant fold register class Fixes errors for some people building Mesa: ../src/panfrost/bifrost/bifrost_sched.c:32:31: error: initializer element is not constant const unsigned max_vec2_reg = max_primary_reg / 2; ../src/panfrost/bifrost/bifrost_sched.c:33:31: error: initializer element is not constant const unsigned max_vec3_reg = max_primary_reg / 4; // XXX: Do we need to align vec3 to vec4 boundary? ../src/panfrost/bifrost/bifrost_sched.c:34:31: error: initializer element is not constant const unsigned max_vec4_reg = max_primary_reg / 4; ../src/panfrost/bifrost/bifrost_sched.c:35:32: error: initializer element is not constant const unsigned max_registers = max_primary_reg + ../src/panfrost/bifrost/bifrost_sched.c:40:28: error: initializer element is not constant const unsigned vec2_base = primary_base + max_primary_reg; ../src/panfrost/bifrost/bifrost_sched.c:41:28: error: initializer element is not constant const unsigned vec3_base = vec2_base + max_vec2_reg; ../src/panfrost/bifrost/bifrost_sched.c:42:28: error: initializer element is not constant const unsigned vec4_base = vec3_base + max_vec3_reg; ../src/panfrost/bifrost/bifrost_sched.c:43:27: error: initializer element is not constant const unsigned vec4_end = vec4_base + max_vec4_reg; Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-15 19:06:35 +00:00
Erik Faye-Lund	18ab42644b	gallium/util: widen type before multiplication This method returns size_t, but the multiplication multiplies two integers, leading to overflow rather than type widening. Noticed by compiling with MSVC, which emits a warning. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-08-15 20:23:53 +02:00
Erik Faye-Lund	0091f62978	mesa: avoid warning on Windows On Windows, p_atomic_inc_return returns an unsigned long long rather than the type the pointer refers to, so let's make sure we cast the result to the right type. Otherwise, we'll trigger a warning about the wrong format-string for the type. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-08-15 20:23:49 +02:00
Erik Faye-Lund	544b088616	win32: unify strcasecmp definitions There were two incompatible definitions of strcasecmp, which led to a compiler warning. Let's clean this up by only leaving one of them, and using that one all the time. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-08-15 20:23:44 +02:00
Erik Faye-Lund	ecd312be96	mesa/main: avoid warning when casting offset to pointer This generates a warning on some 64-bit systems, so let's cast to a properly sized integer first. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-08-15 20:23:39 +02:00
Erik Faye-Lund	c646cd4bac	nir: avoid warning when casting bogus pointer This intentionally-bogus pointer generates a warning on some 64-bit systems, so let's cast to a properly-sized integer first. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-08-15 20:23:35 +02:00
Erik Faye-Lund	b355eef920	glsl: fixup u64-warning Similarly to the unsigned-version, we need to first cast the result to a suitable integer type before negating the number, otherwise we'll trigger a warning. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-08-15 20:23:13 +02:00
Kenneth Graunke	f741de236b	isl: Enable Unorm Path in Color Pipe Improves performance on my Icelake 8x8 locked to 700Mhz. For example, some GfxBench5 subtests have the following results: - [i965] gl_manhattan: ................ 7.01119% +/- 0.180971% (n=5) - [i965] gl_4 (Car Chase): 4.24351% +/- 0.175622% (n=5) - [i965] gl_blending: ................ 3.36327% +/- 0.180267% (n=5) - [i965] gl_5_normal (Aztec Ruins): 1.67962% +/- 0.243534% (n=10) - [iris] gl_manhattan: ................ 3.92357% +/- 0.073965% (n=25) - [iris] gl_4 (Car Chase): 2.17746% +/- 0.0826858% (n=5) - [iris] gl_blending: ................ 2.79599% +/- 0.803652% (n=15) - [iris] gl_5_normal (Aztec Ruins): 1.30930% +/- 0.106523% (n=25) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-08-15 10:39:09 -07:00
Rafael Antognolli	ceeaf93c8e	anv: Properly initialize device->slice_hash. When subslices_delta == 0 and we take the early return, device->slice_hash is not initialized on GEN11. It then causes a segfault when going through anv_DestroyDevice, if compiled with valgrind. Fixes: `7bc022b4bb` ("anv/gen11: Emit SLICE_HASH_TABLE when pipes are unbalanced.") Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-08-15 09:42:48 -07:00
Danylo Piliaiev	72354d43d4	intel/compiler: Fix resource leak in error path CID: 1452261 Fixes: `04a99515` "intel/compiler: add ability to override shader's assembly" Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2019-08-15 08:17:36 +00:00
Alyssa Rosenzweig	44a6c38bd6	panfrost: Implement native RECT textures We started honouring the normalized_coords flag in the texture descriptor, but a bisection revealed that broke RECT textures -- since we were also lowering them in the shader. So just remove the shader-based lowering, use native RECT textures, and enjoy the nominal reduction in complexity and performance boost. Fixes: `3e47a1181b` ("panfrost: Add MALI_SAMP_NORM_COORDS flag") Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-14 16:57:42 -07:00
Alyssa Rosenzweig	6fe4822cca	panfrost: Add R10G10B10A2_SSCALED vertex format Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-14 16:57:24 -07:00
Alyssa Rosenzweig	e823a47f02	pan/midgard: Disassemble UBO index explicitly It's a bit of a special case but that's fine. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-14 16:57:24 -07:00
Alyssa Rosenzweig	3d54ed2488	pan/midgard: Account for unaligned UBOs when promoting uniforms We only know how to promote aligned accesses, although theoretically we should be able to promote unaligned to swizzles in the future. Check this. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-14 16:57:24 -07:00
Alyssa Rosenzweig	03350eb8b8	pan/midgard: Add mir_ubo_shift helper Different UBO reads have different shift requirements. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-14 16:57:24 -07:00
Alyssa Rosenzweig	cf3bb10f51	pan/midgard: Address emit_ubo_read offset in bytes We'll want to be smarter about unaligned reads, so let's get this code all in one place. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-14 16:57:24 -07:00
Alyssa Rosenzweig	65e6cb4eb0	pan/midgard: Wire writemask into UBO reads Helps the disassembly be clearer and maybe regalloc be smarter. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-14 16:57:24 -07:00
Alyssa Rosenzweig	ec2f0b580f	pan/midgard: Identify UBO/SSBO op symmetry It's the same thing, just shifted. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-14 16:57:24 -07:00
Alyssa Rosenzweig	375d4c2c74	panfrost: Extend blending to MRT Our hardware supports independent (per-RT) blending, but we need to route those settings through from Gallium. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-14 16:42:40 -07:00
Alyssa Rosenzweig	dff4986b1a	pan/midgard: Emit store_output branch just-in-time We'll need multiple branches for MRT, so we can't defer. Also, we need to track dependencies to ensure r0 is set to the correct value for each store_output. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-14 16:42:40 -07:00
Alyssa Rosenzweig	2fc44c4dc8	pan/midgard: Add dont_eliminate flag We need to treat fragment writes specially. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-14 16:42:40 -07:00
Alyssa Rosenzweig	6ed3843224	pan/mfbd: Stuff in RT count Fixes DATA_INVALID_FAULTs with multiple render targets. We do always allocate space for 4 cbufs just to keep things sane. This may not be strictly necessary. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-14 16:42:40 -07:00
Alyssa Rosenzweig	716be7862e	pan/decode: Dump FBD tagged pointer Turns out the rt count is stuffed in here.. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-14 16:42:40 -07:00
Alyssa Rosenzweig	358372b256	pan/decode: Decode invalid access type upon fault We don't have a good way to confirm this, but it parallels the kernel definitions for MMU faults nicely. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-14 16:42:39 -07:00
Alyssa Rosenzweig	f5cc5ef404	pan/decode: Fix duplicate heap_end property This was supposed to read heap_start. It's the same value but still, better get this right. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-14 16:42:39 -07:00
Alyssa Rosenzweig	b78e04c17b	panfrost: Note "MFBD preload disable" bit It's a chicken bit, as far as I can tell. Buck buck. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-14 16:39:57 -07:00
Alyssa Rosenzweig	64720d1e9e	pan/bifrost: Link in compiler We enable the standalone compiler, build the new files, and let it blast. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-14 22:54:07 +00:00
Alyssa Rosenzweig	b93fa7d232	pan/bifrost: Check in remainder of the Bifrost compiler What it says on the tin. Signed-off-by: Ryan Houdek <Sonicadvance1@gmail.com> Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-14 22:54:07 +00:00
Alyssa Rosenzweig	0e126aa0f0	pan/bifrost: Add bifrost_print.c/h IR printers. Signed-off-by: Ryan Houdek <Sonicadvance1@gmail.com> Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-14 22:54:07 +00:00
Alyssa Rosenzweig	d8d8b08fe5	pan/bifrost: Style format the disassembler $ astyle .c .h --style=linux -s8 Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-14 22:54:07 +00:00
Alyssa Rosenzweig	fca491c0e1	pan/bifrost: Stub out standalone compiler We don't actually have a standalone compiler in-tree yet, but let's get prepared for when we do. Signed-off-by: Ryan Houdek <Sonicadvance1@gmail.com> Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-14 22:54:07 +00:00
Alyssa Rosenzweig	62bbc23da5	pan/bifrost: Sync disassembler with Ryan's tree The disassembler was updated to move common code with the compiler into a shared header. Additionally, some new ops and control registers relating to rounding were added. Signed-off-by: Ryan Houdek <Sonicadvance1@gmail.com> Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-14 22:54:07 +00:00
Alyssa Rosenzweig	b73cbd6880	panfrost: Remove standalone pandecode tool Now that panwrap has gained the ability to trace directly without dumping to the filesystem, there's no need to lug around this tool. I can assure you nobody will miss it. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-14 15:09:17 -07:00
Alyssa Rosenzweig	6f4d796911	pan/midgard: Fix disassembly termination condition Fixes: `863bdd1f8d` ("pan/midgard: Break, not return, in disassembler") Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-14 15:09:17 -07:00
Alyssa Rosenzweig	de2efd5ea7	panfrost: Ensure we upload at least 1 blend RT Otherwise we'll get memory junk. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-14 15:09:17 -07:00
Alyssa Rosenzweig	54438267c3	panfrost: Zero tripipe on initialize I don't think the hardware cares, but this adds a lot of noise to traces that we would rather not need to look at. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-14 15:09:17 -07:00
Alyssa Rosenzweig	1ab6290746	pan/midgard: Improve disassembler robustness Some memory corruption / etc issues led to an accidental "fuzzing" of the disassembler ;) This uncovered some issues leading to a disassembler hang, so let's fix that. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-14 15:09:17 -07:00
Alyssa Rosenzweig	9c4c7211a3	pan/decode: Split public.h out We want a defined ABI for tracing; this set of functions should be as small as strictly necessary to minimize panwrap shenanigans. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-14 15:09:17 -07:00
Alyssa Rosenzweig	4f03728fb7	pan/decode: Prefer uint64_t to mali_ptr This removes an unwanted dependency on panfrost-job.h Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-14 15:09:17 -07:00
Alyssa Rosenzweig	6c84a2665c	pan/midgard: Allocate spill_slot once Multiple spill moves share a single spill slot. Issue found in Krita. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-14 14:58:34 -07:00
Alyssa Rosenzweig	2a9031ea44	pan/midgard: Use hint on midgard_instruction for spill_move This allows us to have multiple spill moves, whereas otherwise for N spill moves, the first N-1 would be clobbered. Issue found in Krita. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-14 14:58:34 -07:00
Alyssa Rosenzweig	3e6f2e7aba	panfrost: Remove panfrost_add_dependency asserts It doesn't... make a ton of sense to need to assert and this routine is hotter than you might expect. Doesn't matter for release builds, of course. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-14 14:58:34 -07:00
Marek Olšák	aafc95ceb6	radeonsi: add support for Renoir Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-08-14 17:31:04 -04:00
Eric Engestrom	a3d6024199	meson: add nir tests to the compiler/nir test suite Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-08-14 22:17:06 +01:00
Eric Engestrom	d0916edfcb	EGL: sync headers with Khronos Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>	2019-08-14 21:48:23 +01:00
Christian Gmeiner	2c4fe6af78	relnotes: Add new ext on etnaviv for 19.2. Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com> Reviewed-by: Jonathan Marek <jonathan@marek.ca>	2019-08-14 21:47:35 +02:00
Christian Gmeiner	17200bb67a	etnaviv: fix weird indentation Fixes: `797a2e4fd0` ("etnaviv: update logic to determine uniform limits") Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com> Reviewed-by: Jonathan Marek <jonathan@marek.ca>	2019-08-14 21:29:48 +02:00
Ian Romanick	0e6581b87d	nir/algebraic: Reassociate shift-by-constant of shift-by-constant v2: After some review discussion with Alyssa, the replacements now correctly account for cases where (b+c) >= bitsize. v3: Use a temporary to simplify the Python code quite a bit. Suggested by Jason. Haswell and all Gen8+ platforms had similar results. (Ice Lake shown) total instructions in shared programs: 16251155 -> 16249576 (<.01%) instructions in affected programs: 232627 -> 231048 (-0.68%) helped: 547 HURT: 1 helped stats (abs) min: 1 max: 15 x̄: 2.89 x̃: 3 helped stats (rel) min: 0.04% max: 7.84% x̄: 1.14% x̃: 1.06% HURT stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2 HURT stats (rel) min: 0.12% max: 0.12% x̄: 0.12% x̃: 0.12% 95% mean confidence interval for instructions value: -3.12 -2.65 95% mean confidence interval for instructions %-change: -1.20% -1.06% Instructions are helped. total cycles in shared programs: 365924392 -> 365372103 (-0.15%) cycles in affected programs: 59207053 -> 58654764 (-0.93%) helped: 497 HURT: 34 helped stats (abs) min: 1 max: 29300 x̄: 1118.16 x̃: 16 helped stats (rel) min: <.01% max: 10.59% x̄: 1.82% x̃: 1.82% HURT stats (abs) min: 2 max: 424 x̄: 101.03 x̃: 63 HURT stats (rel) min: 0.07% max: 46.17% x̄: 4.72% x̃: 2.06% 95% mean confidence interval for cycles value: -1426.41 -653.77 95% mean confidence interval for cycles %-change: -1.66% -1.15% Cycles are helped. total spills in shared programs: 8870 -> 8871 (0.01%) spills in affected programs: 104 -> 105 (0.96%) helped: 0 HURT: 1 Ivy Bridge and all pre-Gen7 platforms had similar results.
(Ivy Bridge shown) total instructions in shared programs: 11956236 -> 11955635 (<.01%) instructions in affected programs: 94110 -> 93509 (-0.64%) helped: 106 HURT: 0 helped stats (abs) min: 1 max: 14 x̄: 5.67 x̃: 4 helped stats (rel) min: 0.12% max: 4.71% x̄: 1.96% x̃: 0.76% 95% mean confidence interval for instructions value: -6.62 -4.72 95% mean confidence interval for instructions %-change: -2.27% -1.64% Instructions are helped. total cycles in shared programs: 179296340 -> 178788044 (-0.28%) cycles in affected programs: 51009603 -> 50501307 (-1.00%) helped: 82 HURT: 7 helped stats (abs) min: 5 max: 27820 x̄: 6199.00 x̃: 16 helped stats (rel) min: 0.30% max: 8.16% x̄: 2.58% x̃: 3.11% HURT stats (abs) min: 2 max: 8 x̄: 3.14 x̃: 2 HURT stats (rel) min: 0.02% max: 1.40% x̄: 0.34% x̃: 0.10% 95% mean confidence interval for cycles value: -7649.38 -3773.00 95% mean confidence interval for cycles %-change: -2.71% -1.99% Cycles are helped. Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> [v2] Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-08-14 11:15:37 -07:00
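The shift-by-constant reassociation in this commit rests on a simple identity for left shifts, including the `(b+c) >= bitsize` case mentioned in v2. A sketch in C for 32-bit unsigned values, assuming each individual shift count is below 32; the function names are hypothetical and this is not the Python rule from the patch:

```c
#include <assert.h>
#include <stdint.h>

/* Two successive left shifts by constants b and c. */
static uint32_t
shl_twice(uint32_t a, unsigned b, unsigned c)
{
   assert(b < 32 && c < 32);
   return (a << b) << c;
}

/* Equivalent single shift. When b + c reaches the bit size, every bit
 * has been shifted out, so the result is 0. A plain a << (b + c) would
 * be undefined in C for counts of 32 or more, hence the explicit check. */
static uint32_t
shl_combined(uint32_t a, unsigned b, unsigned c)
{
   unsigned total = b + c;

   assert(b < 32 && c < 32);
   return total >= 32 ? 0 : a << total;
}
```

Logical right shifts follow the same pattern; arithmetic right shifts would need the combined count clamped rather than the result zeroed.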
Ian Romanick	73aaeac0a3	nir/algebraic: Reassociate add-and-shift to be shift-and-add A common thing in many shaders: uniform vs { vec4 bones[...]; }; ... x = some_calculation(bones[i + 0]); y = some_calculation(bones[i + 1]); z = some_calculation(bones[i + 2]); This turns into stuff like vec1 32 ssa_12 = iadd ssa_11, ssa_0 vec1 32 ssa_13 = ishl ssa_12, ssa_3 vec1 32 ssa_14 = intrinsic load_ssbo (ssa_7, ssa_13) (16, 4, 0) vec1 32 ssa_15 = iadd ssa_11, ssa_1 vec1 32 ssa_16 = ishl ssa_15, ssa_3 vec1 32 ssa_17 = intrinsic load_ssbo (ssa_7, ssa_16) (16, 4, 0) vec1 32 ssa_18 = iadd ssa_11, ssa_2 vec1 32 ssa_19 = ishl ssa_18, ssa_3 vec1 32 ssa_20 = intrinsic load_ssbo (ssa_7, ssa_19) (16, 4, 0) By reassociating the shift and the add, we can reduce this to vec1 32 ssa_12 = ishl ssa_11, ssa_3 vec1 32 ssa_13 = iadd ssa_12, ssa_0 vec1 32 ssa_14 = intrinsic load_ssbo (ssa_7, ssa_13) (16, 4, 0) vec1 32 ssa_16 = iadd ssa_12, ssa_1 vec1 32 ssa_17 = intrinsic load_ssbo (ssa_7, ssa_16) (16, 4, 0) vec1 32 ssa_19 = iadd ssa_12, ssa_2 vec1 32 ssa_20 = intrinsic load_ssbo (ssa_7, ssa_19) (16, 4, 0) v2: Add some commentary from Rhys Perry's nearly identical patch. All Intel platforms had similar results. (Ice Lake shown) total instructions in shared programs: 16277758 -> 16250704 (-0.17%) instructions in affected programs: 1440284 -> 1413230 (-1.88%) helped: 4920 HURT: 6 helped stats (abs) min: 1 max: 69 x̄: 5.50 x̃: 4 helped stats (rel) min: 0.10% max: 18.33% x̄: 2.21% x̃: 1.79% HURT stats (abs) min: 1 max: 12 x̄: 4.50 x̃: 3 HURT stats (rel) min: 0.18% max: 3.23% x̄: 1.91% x̃: 2.55% 95% mean confidence interval for instructions value: -5.67 -5.31 95% mean confidence interval for instructions %-change: -2.26% -2.16% Instructions are helped. 
total cycles in shared programs: 367118526 -> 365895358 (-0.33%) cycles in affected programs: 93504145 -> 92280977 (-1.31%) helped: 2754 HURT: 1269 helped stats (abs) min: 1 max: 47039 x̄: 460.66 x̃: 16 helped stats (rel) min: <.01% max: 34.93% x̄: 3.77% x̃: 1.12% HURT stats (abs) min: 1 max: 1500 x̄: 35.85 x̃: 9 HURT stats (rel) min: 0.01% max: 17.35% x̄: 2.18% x̃: 0.75% 95% mean confidence interval for cycles value: -387.31 -220.78 95% mean confidence interval for cycles %-change: -2.11% -1.68% Cycles are helped. LOST: 1 GAINED: 1 Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-08-14 11:15:32 -07:00
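The add-and-shift reassociation relies on left shift distributing over wrapping addition, so the shifted base (`ssa_12` in the message) can be computed once and shared. A sketch in C with 32-bit unsigned arithmetic; the function names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Original form: add the constant index, then shift. */
static uint32_t
add_then_shift(uint32_t base, uint32_t index, unsigned shift)
{
   assert(shift < 32);
   return (base + index) << shift;
}

/* Reassociated form: (base << shift) is common to every index and can
 * be computed once; (index << shift) is a constant when index and
 * shift are constants. Both forms agree modulo 2^32. */
static uint32_t
shift_then_add(uint32_t base, uint32_t index, unsigned shift)
{
   assert(shift < 32);
   return (base << shift) + (index << shift);
}
```

The two forms agree even when the addition wraps, because unsigned arithmetic in C is defined modulo 2^32.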
Andrii Simiklit	ff2225cf88	nir/find_array_copies: Reject copies with mismatched lengths copy_deref for wildcard dereferences requires the same array lengths otherwise it leads to a crash in optimizations like 'nir_opt_copy_prop_vars' because these optimizations expect 'copy_deref' just for arrays with the same lengths. v2: check was moved to 'try_match_deref' to fix aoa cases (Jason Ekstrand <jason@jlekstrand.net>) v3: -fixed comment -the condition merged with other one (Jason Ekstrand <jason@jlekstrand.net>) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111286 Signed-off-by: Andrii Simiklit <andrii.simiklit@globallogic.com>	2019-08-14 18:11:31 +00:00
Alyssa Rosenzweig	c4a4f3db5a	pan/midgard: Prefix blobber-db output for grepping Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-14 10:31:09 -07:00
Alyssa Rosenzweig	5f0f9e1333	pan/midgard: Implement blobber-db We wire through some shader-db-style stats on the current shader in the disassemble so we can get a quick estimate of shader complexity from a trace. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Suggested-by: Rob Clark <robdclark@chromium.org>	2019-08-14 10:31:09 -07:00
Alyssa Rosenzweig	863bdd1f8d	pan/midgard: Break, not return, in disassembler We'll want to dump some stats after the shader, and I refuse to use one teensy little goto. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-14 10:31:09 -07:00
Ian Romanick	f2965fde9b	nir/range-analysis: Fail gracefully on non-SSA sources Tested-by: Rob Clark <robdclark@chromium.org> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-08-14 09:02:38 -07:00
Christian Gmeiner	1290cc3e27	etnaviv: split destroy_shader Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com> Reviewed-by: Jonathan Marek <jonathan@marek.ca>	2019-08-14 15:10:07 +02:00
Christian Gmeiner	f90b23b8c4	etnaviv: split link_shader Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com> Reviewed-by: Jonathan Marek <jonathan@marek.ca>	2019-08-14 15:10:07 +02:00
Christian Gmeiner	0765a1dd0e	etnaviv: split dump_shader Also this adds the missing impl for etna_dump_shader_nir(..). Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com> Reviewed-by: Jonathan Marek <jonathan@marek.ca>	2019-08-14 15:10:07 +02:00
Christian Gmeiner	a36d04daa1	etnaviv: mv etnaviv_compiler.c etnaviv_compiler_tgsi.c Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com> Reviewed-by: Jonathan Marek <jonathan@marek.ca>	2019-08-14 15:10:07 +02:00
Christian Gmeiner	b2da8a8357	etnaviv: correct PIPE_SHADER_CAP_MAX_CONST_BUFFER_SIZE handling Have a correct answer to GL_MAX_FRAGMENT_UNIFORM_VECTORS and GL_MAX_VERTEX_UNIFORM_VECTORS. Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com> Reviewed-by: Lucas Stach <l.stach@pengutronix.de>	2019-08-14 12:29:56 +02:00
Christian Gmeiner	797a2e4fd0	etnaviv: update logic to determine uniform limits Taken 1:1 from the header file. Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com> Reviewed-by: Lucas Stach <l.stach@pengutronix.de>	2019-08-14 12:29:56 +02:00
Christian Gmeiner	45cb5eee5d	etnaviv: put uniform limit determination into own function Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com> Reviewed-by: Lucas Stach <l.stach@pengutronix.de>	2019-08-14 12:29:56 +02:00
Marek Vasut	8f97262cdd	etnaviv: Use reentrant screen lock around flush The flush callback may be called on the same pipe context, and thus the same stream, from two different threads of execution. However, etna_cmd_stream_flush{,2}() must not be called on the same stream from two different threads of execution as that would mess up the etna_bo refcounting and likely have other ugly side effects. Fix this by using a reentrant screen lock around the flush callback. Signed-off-by: Marek Vasut <marex@denx.de> Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com> Reviewed-by: Lucas Stach <l.stach@pengutronix.de>	2019-08-14 10:36:36 +02:00
Marek Vasut	6bb4b6d078	etnaviv: Add valgrind support Add Valgrind support for etnaviv to track BO leaks. Signed-off-by: Marek Vasut <marex@denx.de> Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com> Reviewed-by: Lucas Stach <l.stach@pengutronix.de>	2019-08-14 10:36:20 +02:00
Marek Vasut	cf92074277	etnaviv: Use hash table to track BO indexes Use hash table instead of ad-hoc arrays. Signed-off-by: Marek Vasut <marex@denx.de> Reviewed-by: Lucas Stach <l.stach@pengutronix.de>	2019-08-14 10:36:04 +02:00
Marek Vasut	23f5f126d5	etnaviv: Fix double-free in etna_bo_cache_free() The following situation can happen in a multithreaded OpenGL application. A BO is submitted from etna_cmd_stream #1 with flags set for read. A BO is submitted from etna_cmd_stream #2 with flags set for write. This triggers a flush on stream #1 and clears the BO's current_stream pointer. If at this point, stream #2 attempts to queue BO again, which does happen, the BO will be added to the submit list twice. The Linux kernel driver correctly detects this and warns about it with "BO at index %u already on submit list" kernel message. However, when cleaning the BO cache in etna_bo_cache_free(), the BO which was submitted twice will also be free()d twice, thus triggering a glibc double free detector. The fix is easy, even if the BO does not have current_stream set, iterate over current streams' list of BOs before adding the BO to it and verify that the BO is not yet there. Signed-off-by: Marek Vasut <marex@denx.de> Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com> Reviewed-by: Lucas Stach <l.stach@pengutronix.de>	2019-08-14 10:35:48 +02:00
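The fix described above amounts to scanning the stream's list of BOs before appending. A minimal sketch of that idea, assuming a fixed-size array; the `bo` and `submit_list` types and the function name are hypothetical stand-ins, not the etnaviv structures:

```c
#include <stddef.h>

#define SUBMIT_LIST_CAPACITY 8

/* Hypothetical stand-ins for a buffer object and a stream's BO list. */
struct bo {
   int handle;
};

struct submit_list {
   struct bo *bos[SUBMIT_LIST_CAPACITY];
   size_t count;
};

/* Return the index of bo in the list, appending it only if it is not
 * already present. Returns -1 if the list is full. */
static int
submit_list_add_unique(struct submit_list *list, struct bo *bo)
{
   for (size_t i = 0; i < list->count; i++) {
      if (list->bos[i] == bo)
         return (int)i;   /* already queued: do not add a second entry */
   }

   if (list->count == SUBMIT_LIST_CAPACITY)
      return -1;

   list->bos[list->count] = bo;
   return (int)list->count++;
}
```

Because each BO appears at most once in the list, a later cleanup pass that frees every entry frees each BO only once.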
Roman Stratiienko	1ea95e37cc	kmsro: Add missing definitions to Android.mk Signed-off-by: Roman Stratiienko <roman.stratiienko@globallogic.com> Reviewed-by: Rob Herring <robh@kernel.org>	2019-08-14 07:39:53 +00:00
Gert Wollny	742d3c918f	softpipe: Add support for ARB_derivative_control Enables and passes piglits: spec/ARB_derivative_control/ dfdx-coarse dfdx-dfdy dfdx-fine dfdy-coarse dfdy-fine Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2019-08-14 07:03:15 +00:00
Vasily Khoruzhick	b579af77f3	lima/ppir: print srcs and dests in ppir_node_print_prog() Now that we have accessors for ppir src, it's possible to easily print all srcs and dests while dumping ppir representation. Reviewed-by: Qiang Yu <yuq825@gmail.com> Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>	2019-08-13 22:44:07 -07:00
Vasily Khoruzhick	6920710af5	lima/ppir: use src accessors in ppir regalloc Get rid of most switch/case by using src accessors Reviewed-by: Qiang Yu <yuq825@gmail.com> Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>	2019-08-13 22:44:07 -07:00
Vasily Khoruzhick	a5e7c12ced	lima/ppir: add ppir_node to ppir_src We'll need it if we want to walk through node sources Reviewed-by: Qiang Yu <yuq825@gmail.com> Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>	2019-08-13 22:43:58 -07:00
Vasily Khoruzhick	afa64a2105	lima/ppir: introduce accessors for ppir_node sources Sometimes we need to walk through ppir_node sources, common accessor for all node types will simplify code a lot. Reviewed-by: Qiang Yu <yuq825@gmail.com> Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>	2019-08-13 22:38:07 -07:00
Jordan Justen	0f5be81edd	iris: Expose aux buffer as 2nd plane w/modifiers Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-08-13 15:20:47 -07:00
Jordan Justen	246eebba4a	iris: Export and import surfaces with modifiers that have aux data The DRI interface for modifiers with aux data treats the aux data as a separate plane of the main surface. When the dri layer requests the plane associated with the aux data, we save the required information into the dri aux plane image. Later when the image is used, the dri plane image will be available in the pipe_resource structure's `next` field. Therefore in iris, we reconstruct the aux setup from this separate dri plane image when the image is used. Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-08-13 15:20:47 -07:00
Kenneth Graunke	99c8eb997d	iris: Do proper format checks for Y+CCS modifier support We need to ensure that the DRI image format supports CCS. Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>	2019-08-13 15:20:47 -07:00
Jordan Justen	51f941c20c	iris: Create single bo for surfaces with modifiers and aux data Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-08-13 15:20:47 -07:00
Jordan Justen	2c7b577e13	iris: Split iris_resource_alloc_aux to enable aux modifiers Reworks: * If the aux-state is not ISL_AUX_STATE_AUX_INVALID, then use memset even when memset_value is zero. The hiz buffer initial aux-state will be set to invalid, and therefore we can skip the memset. But, for CCS it will be set to ISL_AUX_STATE_PASS_THROUGH, and therefore the aux data must be cleared to 0 with the memset. Previously we would use BO_ALLOC_ZEROED with the CCS aux data, so this memset wasn't required. Now, the CCS aux data may be part of the main surface. We prefer to not use BO_ALLOC_ZEROED excessively, so the memset is needed for the CCS case. (Nanley) Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-08-13 15:20:46 -07:00
Jordan Justen	aad36dfd16	iris: Add aux offset into hiz_address This is not currently required because the hiz buffer is in a separate buffer, and therefore the offset is 0. If we combine the aux buffer with the main surface buffer, then the hiz offset may become non-zero. Suggested-by: Nanley Chery <nanley.g.chery@intel.com> Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Nanley Chery <nanley.g.chery@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-08-13 15:20:39 -07:00
Marek Olšák	f5e1f9ccef	tgsi_to_nir: add assertions for max varying slots Nine uses GENERIC slots > 31. Trivial.	2019-08-13 18:15:53 -04:00
Marek Olšák	fad962eddc	tgsi_to_nir: expand vec3 system values to vec4 for nir_intrinsic_load_work_group_id Reviewed-by: Eric Anholt <eric@anholt.net>	2019-08-13 18:15:53 -04:00
Marek Olšák	88a511bd42	tgsi_to_nir: fix incorrect number of image src1 components Reviewed-by: Eric Anholt <eric@anholt.net>	2019-08-13 18:15:53 -04:00
Mauro Rossi	37841f52b2	i965/gen11: fix genX_bits.h include path Instead of "genX_bits.h" use "genxml/genX_bits.h" as already done in other similar cases Besides being more correct, it also fixes building error in Android. Fixes: `f0d2923` ("i965/gen11: Emit SLICE_HASH_TABLE when pipes are unbalanced.") Signed-off-by: Mauro Rossi <issor.oruam@gmail.com> Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>	2019-08-13 23:58:25 +02:00
Alyssa Rosenzweig	0c56330361	panfrost: Workaround bug in partial update implementation We can't intersect with empty regions. Fixes: `65ae86b854` ("panfrost: Add support for KHR_partial_update()") Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-13 11:13:48 -07:00
Eric Anholt	46daaca55e	gitlab-ci: Run the GLES2 CTS on llvmpipe. This is the start of doing CTS tests on merges to Mesa master. We use the surfaceless platform so that we don't need to bother bringing up weston or X11. The surface size is kept low to reduce runtime, but this comes at the cost of many rendering tests skipping due to too-small render targets (as we see the impact of Mesa on the shared runner pool, we can reevaluate this and what set of CTS tests we want to run). We split the job up across 4 runners (each at 4 llvmpipe threads), so that the job can load-balance across our shared runners and finish sooner (since dEQP is very single-thread-performance bound). Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-08-13 10:30:01 -07:00
Eric Anholt	ab49873b44	gitlab-ci: Switch the meson-main build type to debugoptimized. Now that we're running the drivers we build, building with optimization is important for keeping our runtime down. Shaves about 4 minutes of runtime off of GLES2 CTS of llvmpipe at 64x64. v2: Only switch meson-main until we enable CTS for other builds on request by Michel. Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-08-13 10:30:01 -07:00
Eric Anholt	9605749f99	gitlab-ci: Set the prefix to ./install instead of the DESTDIR. If we don't set DESTDIR, then the DEFAULT_DRIVER_DIR built into the libraries is correct and we don't need to use LIBGL_DRIVERS_PATH and friends for CI usage. Incidentally, this moves our installed paths from /builds/anholt/mesa/install/usr/local/lib (for example) to /builds/anholt/mesa/install/lib for simplicity. Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-08-13 10:30:01 -07:00
Eric Anholt	f417ced5cc	gitlab-ci: Build the CTS in the debian build image. This will let us reuse the image for test runs. Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-08-13 10:30:01 -07:00
Eric Anholt	86ae3c2186	surfaceless: Fix swrast-path segfault when loader doesn't know driver name. If we're hitting the swrast fallback path here, it's probably because we stumbled across a KMS-only device (such as the ASpeed that some of our CI runners have) that will then return a NULL driver_name. Don't crash in that case. Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-08-13 10:30:01 -07:00
Eric Anholt	6a8d39dccd	surfaceless: Fix swrast path. We get a getDrawableInfo() call in the MakeCurrent path, which platform_device was handling correctly by returning the pbuffer's width/height but platform_surfaceless segfaulted for. Reuse platform_device's implementation. Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>	2019-08-13 10:29:34 -07:00
Eric Anholt	030aa6e184	gitlab-ci: Move around which builds cover which swrast. I want to enable CI of llvmpipe out of the meson-main build. So, kick classic swrast/osmesa to meson-i386, then promote llvmpipe to meson-main (along with nine, now that classic osmesa isn't keeping it out of there). Reviewed-by: Michel Dänzer <michel.daenzer@amd.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-08-13 10:29:34 -07:00
Eric Anholt	b816edcbf4	meson: Don't require DRI classic swrast for OSMesa. OSMesa doesn't care about this build option, it links against src/mesa/swrast regardless. Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2019-08-13 10:29:34 -07:00
Alyssa Rosenzweig	29cfd154e3	panfrost: Implement transform feedback Midgard has no hardware support for transform feedback, so we simulate it in software. Lucky us. What Midgard does do is write out vertex shader outputs to main memory unconditionally. Fragment shaders read varyings back from main memory; there's no on-chip storage for varyings. Whether this was a reasonable design is a question I will not be engaging in this commit message. What that does mean is that, in some sense, Midgard always does transform feedback unconditionally, and there's no way to turn off transform feedback. Normally, we would allocate some scratch memory every frame to store the varyings in an arbitrary format (interleaved for simplicity), and then feed that scratch to the fragment shader and discard when the rendering completes. The only difference now is that sometimes, for some buffers, we use a BO provided to us by Gallium and a format provided by Gallium, instead of allocating the memory and choosing the format ourselves. This has some limitations -- in particular, it only works at vec4 granularity, so a corresponding GLSL linkage patch is needed to correctly implement transform feedback for non-vec4 types. Nevertheless, given the hardware already works in this admittedly-bizarre fashion, transform feedback is "free". Or, at least, it's no more expensive than any other rendering. Specifically not implemented is dynamically-sized transform feedback (i.e. with geometry/tessellation shaders). Spoiler alert: Midgard has no support for geometry or tessellation shaders, despite advertising support. They get compiled to massive compute shaders. How's that for checkbox compliance? Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>	2019-08-13 09:43:41 -07:00
Alyssa Rosenzweig	7c29588c07	panfrost: Increment offsets[] per draw We have to maintain the internal offset ourselves. Per v3d. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>	2019-08-13 09:43:39 -07:00
Alyssa Rosenzweig	e7a05a601e	panfrost: Fixup stream out information per variant We could probably get away with doing this once per pipe_shader_state but let's not jump down that rabbit hole quite yet. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>	2019-08-13 09:43:32 -07:00
Alyssa Rosenzweig	5b0a1a4e49	panfrost: Route outputs_written through the compiler It's there in shader_info, but we need to access it from pan_context.c Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>	2019-08-13 09:43:17 -07:00
Alyssa Rosenzweig	f714eab882	panfrost: Import stream out utility from iris We'll need this in a moment. Ken's implementation, lightly edited for Panfrost. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Suggested-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>	2019-08-13 09:43:14 -07:00
Alyssa Rosenzweig	9b2514d6c6	panfrost: Flush when using transform feedback This is a huge hack to workaround incomplete BO flushing logic, but it's enough for the dEQP transform feedback tests, and doing the resource management to get this right is out-of-scope for this patch series. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>	2019-08-13 09:43:11 -07:00
Alyssa Rosenzweig	4b0001c42d	panfrost: Set PIPE_CAP_TGSI_TEXCOORD It doesn't really make sense, since we don't have special texture coordinate varyings, but it'll make some code simpler for XFB and it doesn't hurt us, even if I lose a bit of my soul setting it. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>	2019-08-13 09:43:09 -07:00
Alyssa Rosenzweig	72fc06df9c	panfrost: Wire up statistics for primitives GL_TRANSFORM_FEEDBACK_PRIMITIVES_WRITTEN should now be handled. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>	2019-08-13 09:43:04 -07:00
Alyssa Rosenzweig	7c224c1008	panfrost: Implement callbacks for PRIMITIVES queries We're just going to compute them in the driver but let's get the structures setup to handle them. Implementation from v3d. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>	2019-08-13 09:42:48 -07:00
Rob Clark	72d086fc36	freedreno/a6xx: move SSBO/image consts to IBO stateobj Signed-off-by: Rob Clark <robdclark@chromium.org>	2019-08-13 08:11:26 -07:00
Rob Clark	ab01ab4d4f	freedreno/a6xx: move VS driverparams to its own stateobj If driver-params are required, we really should emit it on every draw for correctness. And if not required, we should emit a DISABLE so that un-applied state updates from previous draws don't corrupt the const state. Signed-off-by: Rob Clark <robdclark@chromium.org>	2019-08-13 08:11:26 -07:00
Rob Clark	882d53d8e3	freedreno/ir3+a6xx: same VBO state for draw/binning Worth ~+20% on gl_driver2 Signed-off-by: Rob Clark <robdclark@chromium.org>	2019-08-13 08:11:26 -07:00
Rob Clark	4b82d1bbb7	freedreno/a6xx: add fd_emit_take_group() Which takes ownership of the stateobj. Useful for streaming stateobjs, to avoid an extra ref/unref. Worth ~5% at gl_driver2 Signed-off-by: Rob Clark <robdclark@chromium.org>	2019-08-13 08:11:26 -07:00
Rob Clark	4a188e4215	freedreno/ir3: track # of driver params To avoid emitting unneeded const state. Signed-off-by: Rob Clark <robdclark@chromium.org>	2019-08-13 08:11:26 -07:00
Rob Clark	7f1e3391c6	freedreno/a6xx: move immediates to program stateobj Signed-off-by: Rob Clark <robdclark@chromium.org>	2019-08-13 08:11:26 -07:00
Rob Clark	f0b91730a1	freedreno/a6xx: stop using ir3_emit_{vs,fs}_consts() Should be no functional change. Next step is to re-arrange various const state into different stateobjs. Signed-off-by: Rob Clark <robdclark@chromium.org>	2019-08-13 08:11:26 -07:00
Rob Clark	53667a43c4	freedreno/ir3: push ctx further up call chain Move more of the code to deal just w/ screen, without requiring ctx. Signed-off-by: Rob Clark <robdclark@chromium.org>	2019-08-13 08:11:26 -07:00
Rob Clark	4080dfb8af	freedreno/ir3: move ring_wfi() further up call chain Hoist them out of code-paths that will eventually be called directly for various a6xx+ const related stateobjs. This ends up duplicating one constlen check in ir3_emit_vs_consts(), to avoid what could otherwise be an unnecessary WFI on older gens. Signed-off-by: Rob Clark <robdclark@chromium.org>	2019-08-13 08:11:26 -07:00
Rob Clark	c6fab232c8	freedreno/all: move more emit helpers to screen framebuffer_barrier() still depends on the ctx, but the rest can move to screen. Signed-off-by: Rob Clark <robdclark@chromium.org>	2019-08-13 08:11:26 -07:00
Rob Clark	684f4b5843	freedreno/a3xx-a6xx+ir3: move emit_const* to screen These don't need to be in context, and we'll need them in screen in a later patch. Plus it's a good cleanup. Signed-off-by: Rob Clark <robdclark@chromium.org>	2019-08-13 08:11:26 -07:00
Rob Clark	566f2281c5	freedreno/a6xx: add fd6_emit_init_screen() Signed-off-by: Rob Clark <robdclark@chromium.org>	2019-08-13 08:11:26 -07:00
Rob Clark	e89255b0a5	freedreno/a5xx: add fd5_emit_init_screen() Signed-off-by: Rob Clark <robdclark@chromium.org>	2019-08-13 08:11:25 -07:00
Rob Clark	d256e3f34a	freedreno/a3xx: add fd3_emit_init_screen() Signed-off-by: Rob Clark <robdclark@chromium.org>	2019-08-13 08:11:25 -07:00
Rob Clark	b9d3f39728	freedreno/a2xx: add fd2_emit_init_screen() Signed-off-by: Rob Clark <robdclark@chromium.org>	2019-08-13 08:08:07 -07:00
Rob Clark	ec0ec641d8	freedreno/a4xx: add fd4_emit_init_screen() Signed-off-by: Rob Clark <robdclark@chromium.org>	2019-08-13 08:08:07 -07:00
Rob Clark	2f94de2372	freedreno/a2xx: call fd2_emit_ib() directly from fd2 Signed-off-by: Rob Clark <robdclark@chromium.org>	2019-08-13 08:08:07 -07:00
Rob Clark	eb45422c5f	freedreno/a5xx: call fd5_emit_ib() directly from fd5 Signed-off-by: Rob Clark <robdclark@chromium.org>	2019-08-13 08:08:07 -07:00
Rob Clark	50e15e1c6f	freedreno/a4xx: call fd4_emit_ib() directly from fd4 Signed-off-by: Rob Clark <robdclark@chromium.org>	2019-08-13 08:08:07 -07:00
Rob Clark	4326eeac97	freedreno/a3xx: call fd3_emit_ib() directly from fd3 No reason for the indirection when called from a3xx specific code. Signed-off-by: Rob Clark <robdclark@chromium.org>	2019-08-13 08:08:07 -07:00
Rob Clark	32014afa44	freedreno/ir3: move VS driver-param emit Move DP emit to its own function. No functional change, just code motion to prepare for splitting up const state into multiple stateobjs on a6xx. Signed-off-by: Rob Clark <robdclark@chromium.org>	2019-08-13 08:08:07 -07:00
Rob Clark	5722149bf1	freedreno/ir3: drop unneeded ir3_ra() args Signed-off-by: Rob Clark <robdclark@chromium.org>	2019-08-13 08:08:07 -07:00
Boris Brezillon	65ae86b854	panfrost: Add support for KHR_partial_update() Implement ->set_damage_region() to support partial updates. This is a dummy implementation in that it does not try to merge damage rects. It also does not deal with distinct regions and instead picks the largest quad as the only damage rect and generates up to 4 reload rects out of it (the left/right/top/bottom regions surrounding the biggest damage rect). We also do not try to reduce the number of draws by passing all quad vertices to the blit request (would require extending u_blitter). Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-13 14:41:10 +02:00
Daniel Stone	492ffbed63	st/dri2: Implement DRI2bufferDamageExtension Add a pipe_screen->set_damage_region() hook to propagate set-damage-region requests to the driver, it's then up to the driver to decide what to do with this piece of information. If the hook is left unassigned, the buffer-damage extension is considered unsupported. Signed-off-by: Daniel Stone <daniels@collabora.com> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Reviewed-by: Qiang Yu <yuq825@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-08-13 14:40:45 +02:00
Harish Krupo	a4a8ebe156	egl/dri: Use __DRI2_BUFFER_DAMAGE extension for KHR_partial_update Use the DRI2 interface callback to pass the damage rects to the driver. Signed-off-by: Harish Krupo <harishkrupo@gmail.com> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com> Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Reviewed-by: Qiang Yu <yuq825@gmail.com> Tested-by: Qiang Yu <yuq825@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-08-13 14:40:31 +02:00
Daniel Stone	bd08a83b09	dri_interface: add DRI2_BufferDamage interface Add a new DRI2_BufferDamage interface to support the EGL_KHR_partial_update extension, informing the driver of an overriding scissor region for a particular drawable. Based on a commit originally authored by: Harish Krupo <harish.krupo.kps@intel.com> renamed extension, retargeted at DRI drawable instead of context, rewritten description Signed-off-by: Daniel Stone <daniels@collabora.com> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com> Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Reviewed-by: Qiang Yu <yuq825@gmail.com> Tested-by: Qiang Yu <yuq825@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-08-13 14:40:14 +02:00
Harish Krupo	b4345da876	egl/android: Delete set_damage_region from egl dri vtbl The intention of KHR_partial_update was not to send the damage back to the platform but to send the damage to the driver to ensure that the following rendering could be restricted to those regions. This patch removes the set_damage_region from the egl_dri vtbl and all the platform_*.c files. Then upcoming patches add a new dri2 interface for the drivers to implement. Signed-off-by: Harish Krupo <harishkrupo@gmail.com> Reviewed-by: Daniel Stone <daniels@collabora.com> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com> Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Reviewed-by: Qiang Yu <yuq825@gmail.com> Tested-by: Qiang Yu <yuq825@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-08-13 14:39:38 +02:00
Jordan Justen	fc12fd05f5	iris: Implement pipe_screen::resource_get_param Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-08-13 01:12:30 -07:00
Jordan Justen	3198c5b7bf	gallium/dri2: Use pipe_screen::resource_get_param in image queries Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Acked-by: Eric Anholt <eric@anholt.net>	2019-08-13 01:12:29 -07:00
Jordan Justen	2decad495f	gallium/dri2: Support images with multiple planes for modifiers Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Acked-by: Eric Anholt <eric@anholt.net>	2019-08-13 01:12:29 -07:00
Jordan Justen	6e749a6b2b	gallium/dri2: Refactor image property queries This refactor will let us more easily use pipe_screen::resource_get_param as an alternative to pipe_screen::resource_get_handle. Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Acked-by: Eric Anholt <eric@anholt.net>	2019-08-13 01:12:29 -07:00
Jordan Justen	c5c2365455	state_tracker/winsys_handle: Add plane input field Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Acked-by: Eric Anholt <eric@anholt.net>	2019-08-13 01:12:29 -07:00
Jordan Justen	2066966c10	gallium/dri2: Support creating multi-planar modifier images Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Acked-by: Eric Anholt <eric@anholt.net>	2019-08-13 01:12:29 -07:00
Jordan Justen	fe06655e86	gallium/dri2: Implement dri2ImageExtension.queryDmaBufFormatModifierAttribs Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Acked-by: Eric Anholt <eric@anholt.net>	2019-08-13 01:12:29 -07:00
Jordan Justen	0346b70083	gallium/screen: Add pipe_screen::resource_get_param This function retrieves individual parameters selected by enum pipe_resource_param. It can be used as a more direct alternative to pipe_screen::resource_get_handle. Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Acked-by: Eric Anholt <eric@anholt.net>	2019-08-13 01:12:24 -07:00
Iago Toral Quiroga	2353f7f7ef	vc4: clamp gl_PointSize to a minimum of 1.0 The OpenGL ES spec requires that the value of gl_PointSize is clamped to an implementation-dependent range matching what is advertised by GL_ALIASED_POINT_SIZE_RANGE. For VC4 this is [1.0, 512.0], but the hardware won't clamp to the minimum side of the range and won't render points with a size strictly smaller than 1.0 either, so we need to clamp manually. For points larger than the maximum size of the range the hardware clamps automatically. Fixes piglit test: spec/!opengl 2.0/vs-point_size-zero Reviewed-by: Eric Anholt <eric@anholt.net>	2019-08-13 09:44:54 +02:00
Iago Toral Quiroga	3539bd63dd	v3d: clamp gl_PointSize to a minimum of 1.0 The OpenGL ES spec requires that the value of gl_PointSize is clamped to an implementation-dependent range matching what is advertised by GL_ALIASED_POINT_SIZE_RANGE. For V3D this is [1.0, 512.0], but the hardware won't clamp to the minimum side of the range and won't render points with a size strictly smaller than 1.0 either, so we need to clamp manually. For points larger than the maximum size of the range the hardware clamps automatically. Fixes piglit test: spec/!opengl 2.0/vs-point_size-zero Reviewed-by: Eric Anholt <eric@anholt.net>	2019-08-13 09:44:54 +02:00
Iago Toral Quiroga	48f5c34301	nir: add a pass to clamp gl_PointSize to a range The OpenGL and OpenGL ES specs require that implementations clamp the value of gl_PointSize to an implementation-dependent range. This pass is useful for any GPU hardware that doesn't do this automatically for either one or both sides of the range, such as V3D. v2: - Turn into a generic NIR pass (Eric). - Make the pass work before lower I/O so we can use the deref variable to inspect if we are writing to gl_PointSize (Eric). - Make the pass take the range to clamp as parameter and allow it to clamp to both sides of the range or just one side. - Make the pass report progress. v3: - Fix copyright header (Eric) - use fmin/fmax instead of bcsel to clamp (Eric) Reviewed-by: Eric Anholt <eric@anholt.net>	2019-08-13 09:44:12 +02:00
Iago Toral Quiroga	62e0ca3064	v3d: line length style fixes Reviewed-by: Eric Anholt <eric@anholt.net>	2019-08-13 08:38:19 +02:00
Iago Toral Quiroga	99e9809cab	v3d: honor the write mask on store operations v2: - Fix incremental update of the const offset when we need to emit a sequence with more than one write because of the writemask. - Do not move the tmu write emission to a separate helper. v3: - Get the store writemask before the loop, use ffs to get the first component to write and clear writemask bits as we process the components (Eric). - Simplified the code that figured out the number of components for the TMU config based on the number of tmu writes for stores and atomics. v4: - Code clean-ups (Eric). Fixes: KHR-GLES31.core.shader_image_load_store.advanced-cast-cs KHR-GLES31.core.shader_image_load_store.advanced-cast-fs KHR-GLES31.core.shader_storage_buffer_object.advanced-switchBuffers-cs KHR-GLES31.core.shader_storage_buffer_object.advanced-switchPrograms-cs KHR-GLES31.core.shader_storage_buffer_object.basic-operations-case1-cs Reviewed-by: Eric Anholt <eric@anholt.net>	2019-08-13 08:38:19 +02:00
Iago Toral Quiroga	3d65d2a488	v3d: refactor ntq_emit_tmu_general() slightly When we implement write masks on store operations we might need to emit multiple write sequences for a given store intrinsic. To make that easier, let's split the emission of the tmud instructions to their own block after we are done with the code that only needs to run once no matter how many write sequences we need to emit. Reviewed-by: Eric Anholt <eric@anholt.net>	2019-08-13 08:38:19 +02:00
Iago Toral Quiroga	b594796f1b	v3d: do not automatically flush current job for SSBOs and shader images If the current job has a sequence of draw calls involving SSBOs and/or shader images, we would flush the job in between each draw call. With this change, we won't flush the current job and we rely on the application inserting correct barriers by issuing glMemoryBarrier() when needed. v2 (Eric): - When mapping a buffer for writing, we always need to flush. Reviewed-by: Eric Anholt <eric@anholt.net>	2019-08-13 08:25:15 +02:00
Iago Toral Quiroga	f1cf1153e8	v3d: only process glMemoryBarrier() for SSBOs and images PIPE_BARRIER_UPDATE is defined as: PIPE_BARRIER_UPDATE_BUFFER | PIPE_BARRIER_UPDATE_TEXTURE Which means we were flushing for any flags other than these two, but this was intended to only flush for ssbos and images. Actually, the driver automatically flushes jobs as we need, including writes/reads involving SSBOs and images, so we don't really need to flush anything when the program emits a barrier. However, this may lead to excessive flushing in some cases, so we will soon change this to avoid automatic flushing of the current job for SSBOs and images, meaning that we will rely on the application to emit correct memory barriers for these that we should make sure to process here. Reviewed-by: Eric Anholt <eric@anholt.net>	2019-08-13 08:25:15 +02:00
Iago Toral Quiroga	f1559ca922	v3d: fix flushing of SSBOs and shader images If the current draw call includes SSBOs, then we must flush any jobs that are writing to the same SSBOs (so that our SSBO reads are correct), as well as jobs reading from the same SSBO (so that our SSBO writes don't stomp previous SSBO reads). The exact same logic applies to shader images. In this case we were already flushing previous writes, but we should also flush previous reads. Note that we don't need to call v3d_flush_jobs_reading_resource() and v3d_flush_jobs_writing_resource() separately though, since flushing jobs that read a resource also flushes those writing to it. Suggested-by: Eric Anholt <eric@anholt.net> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-08-13 08:25:15 +02:00
Caio Marcelo de Oliveira Filho	1021abab07	intel/tools: Fix aub_file initialization in intel_dump_gpu The `device` can be set earlier either by a command line or by intercepting an ioctl call to get the I915_PARAM_CHIPSET_ID done by the application early. In both cases `aub_file` and `devinfo` would not be initialized. Fix by splitting the conditions - `device == 0`: use the FD to get both device and devinfo. - Or `devinfo.gen == 0`: use `device` to initialize it. And separately, initialize aub_file the first time it is needed. Fixes: `d594d2a052` ("intel/tools: use device info initializer") Acked-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-08-12 19:18:26 -07:00
Rafael Antognolli	f0d29238df	i965/gen11: Emit SLICE_HASH_TABLE when pipes are unbalanced. If the pixel pipes have a different number of subslices, emit a slice hashing table that will ensure proper workload distribution. v2: Set Mask field to 0xffff for workaround (Ken).	2019-08-12 16:19:08 -07:00
Rafael Antognolli	7bc022b4bb	anv/gen11: Emit SLICE_HASH_TABLE when pipes are unbalanced. If the pixel pipes have a different number of subslices, emit a slice hashing table that will ensure proper workload distribution. v2: Don't need to set the mask - it's mbo (Ken).	2019-08-12 16:19:08 -07:00
Rafael Antognolli	a1a499e7fe	iris/gen11: Emit SLICE_HASH_TABLE when pipes are unbalanced. If the pixel pipes have a different number of subslices, emit a slice hashing table that will ensure proper workload distribution. v2: Don't need to set the mask - it's mbo (Ken). v3: Don't keep a reference to the resource used for emitting the table (Ken).	2019-08-12 16:19:08 -07:00
Rafael Antognolli	ad513fd386	intel: Get information about pixel pipes subslices. v2: Use 1 instead of 1UL (Ken).	2019-08-12 16:19:08 -07:00
Rafael Antognolli	32344dc581	intel/gen_decoder: Decode SLICE_HASH_TABLE.	2019-08-12 16:19:08 -07:00
Rafael Antognolli	e1cb71c182	intel/genxml: Update 3D_MODE and add SLICE_HASH_TABLE. Add these fields and the 3DSTATE_SLICE_TABLE_STATE_POINTERS instruction so we can properly configure the slice and subslice hashing on ICL+ v2: Make 'Mask' field a mbo (Ken).	2019-08-12 16:19:08 -07:00
Jason Ekstrand	d787a2d05e	anv: Implement VK_KHR_pipeline_executable_properties Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-08-12 22:56:07 +00:00
Jason Ekstrand	67cb55ad11	anv: Add a ralloc context to anv_pipeline Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-08-12 22:56:07 +00:00
Jason Ekstrand	fec4bdff40	anv: Force a full re-compile when CAPTURE_INTERNAL_REPRESENTATION_TEXT is set Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-08-12 22:56:07 +00:00
Jason Ekstrand	651fbbf9b8	anv/pipeline: Split setting up per-stage keys into its own loop Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-08-12 22:56:07 +00:00
Jason Ekstrand	78f3dfb4a2	anv: Record shader compile stats in the pipeline cache We're going to want these to be available regardless of caching. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-08-12 22:56:07 +00:00
Jason Ekstrand	2af380d20f	anv/pipeline: Stash generated code in the pipeline stage Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-08-12 22:56:07 +00:00
Jason Ekstrand	8d3cbd0393	intel/fs: Add SLM size to brw_cs_prog_data We don't need it for state setup but it's a useful statistic we want to pass on to developers. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-08-12 22:56:07 +00:00
Jason Ekstrand	134607760a	intel/compiler: Fill a compiler statistics struct This commit is all annoying plumbing work which just adds support for a new brw_compile_stats struct. This struct provides a binary driver readable form of the same statistics we dump out to stderr when INTEL_DEBUG is set with a shader stage. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-08-12 22:56:07 +00:00
Khaled Emara	2720ad5fd9	freedreno: disable tiling for cubemaps Tiling doesn't work quite well with cubemaps. Revert to linear textures, until it's fixed.	2019-08-12 22:30:54 +00:00
Khaled Emara	0ae16fb565	freedreno: add tiling parameters for 2D/2DArray/3D	2019-08-12 22:30:54 +00:00
Khaled Emara	aeaba3e4a6	freedreno: simplified slices setup for a3xx a3xx doesn't support ASTC and layout_first always returns false	2019-08-12 22:30:54 +00:00
Khaled Emara	e11a239e8c	freedreno: enable tiled textures for debug builds	2019-08-12 22:30:54 +00:00
Paulo Zanoni	866bb775de	intel/fs: add 64 bit integer multiplication lowering While NIR's lower_imul64() solves the case of 64 bit integer multiplications generated early, we don't have a way to lower such instructions when they are generated by our own backend, such as the scan/reduce intrinsics. We'll need this soon, so implement it now. An easy way to test this is to simply disable nir_lower_imul64 to let those operations reach the backend. v2: - Fix Q/UQ copy/paste errors (Caio). - Transform an 'if' into 'else if' (Caio). - Add an extra comment to clarify the need for 64b = 32b * 32b (Caio). - Make private functions private (Caio). v3: - Remove ambiguity with 'b' and 'd' variables (Caio). - Allocate potentially less regs for the dwords (Caio). Cc: Jason Ekstrand <jason.ekstrand@intel.com> Cc: Matt Turner <matt.turner@intel.com> Cc: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>	2019-08-12 15:16:23 -07:00
Paulo Zanoni	9217cf3b5e	intel/compiler: invert the logic of lower_integer_multiplication() Invert the logic of how progress is handled: remove the continue statements and mark progress inside the places where it actually happens. We're going to add a new lowering that also looks for BRW_OPCODE_MUL, so inverting the logic here makes the resulting code much easier to follow. Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>	2019-08-12 15:16:23 -07:00
Paulo Zanoni	6ba4717924	intel/compiler: don't instantiate a builder for each instruction Don't instantiate a builder for each instruction during lower_integer_multiplication(). Instantiate one only when needed. On the other hand, these unneeded builders don't seem to cost much to init, so I don't expect any significant difference in performance: this is mostly about code organization. Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>	2019-08-12 15:16:23 -07:00
Paulo Zanoni	75b3868dcc	intel/compiler: extract subfunctions of lower_integer_multiplication() The lower_integer_multiplication() function is already a little too big. I want to add more to it, so let's reorganize the existing code first. Let's start with just extracting the current code to subfunctions. Later we'll change them a little more. v2: Make private functions private (Caio). v3: Fix typo (Caio). Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>	2019-08-12 15:16:23 -07:00
Rhys Perry	7740149852	nir: merge and extend nir_opt_move_comparisons and nir_opt_move_load_ubo v2: add to series v3: update Makefile.sources v4: don't remove a comment and break statement v4: use nir_can_move_instr Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-08-12 22:01:30 +00:00
Rhys Perry	da8ed68aca	nir: replace nir_move_load_const() with nir_opt_sink() This is mostly the same as nir_move_load_const() but can also move undef instructions, comparisons and some intrinsics (being careful with loops). v2: actually delete nir_move_load_const.c v3: fix nir_opt_sink() usage in freedreno v3: update Makefile.sources v4: replace get_move_def with nir_can_move_instr and nir_instr_ssa_def v4: handle if uses v4: fix handling of nested loops v5: re-write adjust_block_for_loops v5: re-write setting of use_block for if uses Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Co-authored-by: Daniel Schürmann <daniel@schuermann.dev> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-08-12 22:01:30 +00:00
Francisco Jerez	c2fe7a0fb8	anv/gen9: Optimize slice and subslice load balancing behavior. See "i965/gen9: Optimize slice and subslice load balancing behavior." for the rationale. According to Jason, improves Aztec Ruins performance by 2.7%. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1) v2: Undo CPU performance micro-optimization done in i965 and iris due to lack of data justifying it on anv. Use cmd_buffer_apply_pipe_flushes wrapper instead of emitting pipe control command directly. (Jason) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-08-12 14:40:21 -07:00
Andreas Baierl	1c45541c7f	lima/ppir: Add fddx and fddy Lower fddx and fddy and set the right bits in codegen. Signed-off-by: Andreas Baierl <ichgeh@imkreisrum.de> Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com> Reviewed-by: Erico Nunes <nunes.erico@gmail.com>	2019-08-12 23:20:04 +02:00
Bas Nieuwenhuizen	f1da129220	radv: Enable VK_KHR_pipeline_executable_properties. Reviewed-by: Dave Airlie <airlied@redhat.com>	2019-08-12 23:00:24 +02:00
Bas Nieuwenhuizen	afad67cd7a	radv: Implement radv_GetPipelineExecutableStatisticsKHR. Reviewed-by: Dave Airlie <airlied@redhat.com>	2019-08-12 23:00:24 +02:00
Bas Nieuwenhuizen	35302f0189	radv: Implement radv_GetPipelineExecutableInternalRepresentationsKHR. Reviewed-by: Dave Airlie <airlied@redhat.com>	2019-08-12 23:00:24 +02:00
Bas Nieuwenhuizen	86864eedd2	radv: Implement radv_GetPipelineExecutablePropertiesKHR. Reviewed-by: Dave Airlie <airlied@redhat.com>	2019-08-12 23:00:24 +02:00
Bas Nieuwenhuizen	8874af8ef4	radv: Keep shader info when needed. This allows enabling the shader info keeping on a per shader basis. Also disables the cache on a per shader basis. Reviewed-by: Dave Airlie <airlied@redhat.com>	2019-08-12 23:00:24 +02:00
Bas Nieuwenhuizen	e8a256eb54	radv: Add VK_KHR_pipeline_executable_properties in disabled state. So we can add the functions. Reviewed-by: Dave Airlie <airlied@redhat.com>	2019-08-12 23:00:24 +02:00
Bas Nieuwenhuizen	5444d3e0c2	radv: Use string for nir dumping. Reviewed-by: Dave Airlie <airlied@redhat.com> Allows us to easily dump all nir shaders for combined variants in vega and simplifies ownership.	2019-08-12 23:00:24 +02:00
Bas Nieuwenhuizen	739a2880f5	radv: Get max workgroup size without nir. Reviewed-by: Dave Airlie <airlied@redhat.com>	2019-08-12 23:00:24 +02:00
Bas Nieuwenhuizen	290ca0c4dd	radv: Add utility function to calculate max waves. Not AC because a lot of it is data extraction out of radv structs. Reviewed-by: Dave Airlie <airlied@redhat.com>	2019-08-12 23:00:24 +02:00
Francisco Jerez	026773397b	iris/gen9: Optimize slice and subslice load balancing behavior. See "i965/gen9: Optimize slice and subslice load balancing behavior." for the rationale. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-08-12 13:17:58 -07:00
Francisco Jerez	03cba9f5d9	intel/genxml: Add GT_MODE hashing defs for Gen9. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-08-12 13:17:58 -07:00
Francisco Jerez	9406b3a5c1	i965/gen9: Optimize slice and subslice load balancing behavior. The default pixel hashing mode settings used for slice and subslice load balancing are far from optimal under certain conditions (see the comments below for the gory details). The top-of-the-line GT4 parts suffer from a particularly severe performance problem currently due to a subslice load balancing issue. Fixing this seems to improve graphics performance across the board for most of the benchmarks in my test set, up to ~20% in some cases, e.g. from SKL GT4: unigine/valley: 3.44% ±0.11% gfxbench/gl_manhattan31: 3.99% ±0.13% gputest/pixmark_piano: 7.95% ±0.33% synmark/OglTexFilterAniso: 15.22% ±0.07% synmark/OglTexMem128: 22.26% ±0.06% Lower-end platforms are also affected by some subslice load imbalance to a lesser degree, especially during CCS resolve and fast clear operations, which are handled specially here due to rasterization occurring in reduced CCS coordinates, which changes the semantics of the pixel hashing mode settings. No regressions seen during my tests on some SKL, KBL and BXT configurations. Additional benchmark reports welcome on any Gen9 platforms (that includes anything with Skylake, Broxton, Kabylake, Geminilake, Coffeelake, Whiskey Lake, Comet Lake or Amber Lake in your renderer string). P.S.: A similar problem is likely to be present on other non-Gen9 platforms, especially for CCS resolve and fast clear operations. Will follow-up with additional patches fixing the hashing mode for those once I have enough performance data to justify it. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-08-12 13:17:58 -07:00
Alyssa Rosenzweig	b1965831e4	pan/midgard: Handle 64-bit address in mir_mask_of_read_components This is a bit of a hack, but it'll hold us over until we have 64-bit support wired through. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-12 12:43:03 -07:00
Alyssa Rosenzweig	41e68094f8	pan/midgard: Allocate separate spill indices for lowered moves This helps RA be slightly more reasonable. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-12 12:43:03 -07:00
Alyssa Rosenzweig	14b5b9ac38	pan/midgard: Extend liveness analysis to trinary ops Fixes RA fails with multiple indirect SSBO writes. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-12 12:43:03 -07:00
Alyssa Rosenzweig	c690b37d76	pan/midgard: Fix load/store pairing This used a delicate hack to try to find indirect inputs and skip them as candidates for pairing. Let's use a better criterion -- no sources -- and pair based on that. We could do better, but that would require more complex data flow analysis than we're interested in doing here. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-12 12:43:02 -07:00
Alyssa Rosenzweig	15954ab6ca	pan/midgard: Implement nir_intrinsic_load_num_work_groups Just a sysval to route through. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-12 12:43:02 -07:00
Alyssa Rosenzweig	7229af794b	pan/midgard: Implement some compute builtins We implement gl_WorkGroupID and gl_LocalInvocationID, which map to ld_compute_id with special sources. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-12 12:43:02 -07:00
Alyssa Rosenzweig	2b4e579585	pan/midgard: Rename ld_global_id -> ld_compute_id It's used for more general loads within a compute shader. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-12 12:43:02 -07:00
Alyssa Rosenzweig	a5059f2cba	pan/midgard: Handle partial writes in liveness analysis This allows liveness analysis within a loop to be more fine grained, fixing RA failures with partial spilled movs within a loop, as well as enabling a slight reduction of register pressure more generally: total registers in shared programs: 350 -> 347 (-0.86%) registers in affected programs: 12 -> 9 (-25.00%) helped: 3 HURT: 0 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 25.00% max: 25.00% x̄: 25.00% x̃: 25.00% Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-12 12:43:01 -07:00
Alyssa Rosenzweig	e333bf606f	pan/midgard: Dump "no spill"? Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-12 12:43:01 -07:00
Alyssa Rosenzweig	cc3df917d3	pan/midgard: Absorb nonexistent sources Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-12 12:43:01 -07:00
Alyssa Rosenzweig	0a7cc239bd	pan/midgard: Pretty-print destinations They're not "sources" but they follow the same conventions. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-12 12:43:01 -07:00
Alyssa Rosenzweig	ba8ec19a64	pan/midgard: Pretty-print units Since we are seeing some use of MIR post-scheduling, let's get this printed right. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-12 12:43:01 -07:00
Alyssa Rosenzweig	73f54f286a	pan/midgard: Print mask in dumped MIR Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-12 12:43:01 -07:00
Alyssa Rosenzweig	2ec4f9a74b	pan/midgard: Add no_spill flag Hint for the RA to avoid infinite spilling loops. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-12 12:43:01 -07:00
Alyssa Rosenzweig	7090971f2f	pan/midgard: Generalize mir_mask_of_read_components This now works for load/store and texture instructions as well as ALU. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-12 12:43:01 -07:00
Alyssa Rosenzweig	419ddd63b0	pan/midgard: Implement SSBO access Just laying the groundwork. Reads and writes should be supported (both direct and indirect, either int or float, vec1/2/3/4), but no bounds checking is done at the moment. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-12 12:43:01 -07:00
Alyssa Rosenzweig	a8639b91b5	pan/midgard: Pipe uniform mask through when spilling This is a corner case that happens a lot with SSBOs. Basically, if we only read a few components of a uniform, we need to only spill a few components or otherwise we try to spill what we spilled and RA hangs. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-12 12:43:00 -07:00
Alyssa Rosenzweig	63e240dd05	pan/midgard: Clamp sysval component count We don't want to load a 128-bit sysval when 64-bits will do. Fixes RA failures with SSBO indirect writes. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-12 12:42:59 -07:00
Alyssa Rosenzweig	e7ac46be7a	pan/midgard: Pass uploaded midgard_instruction through We want to edit it after emission in some cases. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-12 12:42:59 -07:00
Alyssa Rosenzweig	fa68740187	pan/midgard: Allow sysval destination override Sometimes a sysval is used to facilitate an instruction but is not the instruction itself. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-12 12:42:59 -07:00
Alyssa Rosenzweig	60d80157d1	panfrost: Force flush every compute job This is of course suboptimal for performance, forcing each glDispatchCompute call to be submitted separately to the kernel and finish to completion. However, for the initial bring-up of compute jobs, this simplifies quite a bit. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-12 12:42:59 -07:00
Alyssa Rosenzweig	2efa025b05	panfrost: Add SSBO system value For each SSBO index we get from Gallium/NIR, we need two pieces of information in the shader: 1. The address of the SSBO in GPU memory. Within the shader, we'll be accessing it with raw memory load/store, so we need the actual address, not just an index. 2. The size of the SSBO. This is not strictly necessary, but at some point, we may like to do bounds checking on SSBO accesses. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-12 12:42:59 -07:00
Alyssa Rosenzweig	e881aa8c12	gallium/util: Add u_stream_outputs_for_vertices helper This u_prim.h helper determines the number of outputs for stream output, given a particular primitive type and a vertex count. This is useful for statically calculating sizes of stream output buffers (i.e. when there is no geometry/tessellation shader in use). This helper will be used in Panfrost's transform feedback implementation, as you can probably guess since why else would I be submitting it.... See also dEQP's getTransformFeedbackOutputCount routine. v2: Simplify definition using new helpers, which also extends to non-ES2 primitive types (Eric). Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-08-12 12:22:54 -07:00
Marek Olšák	8ce4f9bbc3	radeonsi: remove the always_nir option tgsi_to_nir is no longer optional if NIR is enabled.	2019-08-12 14:52:17 -04:00
Marek Olšák	4e545f934f	radeonsi/nir: implement default tess level system values Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2019-08-12 14:52:17 -04:00
Marek Olšák	9c7746ceae	compiler: add SYSTEM_VALUE_TESS_LEVEL_OUTER/INNER_DEFAULT TCS system values for internal passthru TCS, needed by radeonsi NIR support Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2019-08-12 14:52:17 -04:00
Marek Olšák	5167ca27fa	gallium: add TGSI_SEMANTIC_DEFAULT_OUTER/INNER_LEVEL for radeonsi NIR support.	2019-08-12 14:52:17 -04:00
Marek Olšák	f8d4198998	tgsi_to_nir: handle tess level inner/outer varyings for internal radeonsi shaders Reviewed-By: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2019-08-12 14:52:17 -04:00
Marek Olšák	8ac2583cd8	tgsi_to_nir: add support for the stencil FS output Reviewed-By: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2019-08-12 14:52:17 -04:00
Marek Olšák	f3f1d0dfd0	tgsi_to_nir: add support for TEX_LZ Reviewed-By: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-08-12 14:52:17 -04:00
Marek Olšák	1b881852bc	compiler: add SYSTEM_VALUE_USER_DATA_AMD for internal radeonsi shaders	2019-08-12 14:52:17 -04:00
Marek Olšák	f0ccc5457a	compiler: add shader_info.cs.user_data_components_amd	2019-08-12 14:52:17 -04:00
Marek Olšák	155789c8e7	tgsi_to_nir: add basic compute shader support Reviewed-By: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2019-08-12 14:52:17 -04:00
Marek Olšák	5a0adfd9f0	tgsi_to_nir: add support for LOAD & STORE with SSBOs and images Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2019-08-12 14:52:17 -04:00
Marek Olšák	0b121cb89a	tgsi_to_nir: make setup_texture_info reusable Reviewed-By: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-08-12 14:52:17 -04:00
Marek Olšák	70fd85172b	tgsi_to_nir: add support for TXF_LZ Reviewed-By: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2019-08-12 14:52:17 -04:00
Marek Olšák	028dbd35ba	compiler: add shader_info.vs.blit_sgprs_amd for internal radeonsi shaders	2019-08-12 14:52:17 -04:00
Marek Olšák	e300365197	tgsi_to_nir: be careful about not losing any TGSI properties silently (v2) v2: squash with Timur Kristof's commit Reviewed-By: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2019-08-12 14:52:17 -04:00
Marek Olšák	8b6814211a	tgsi/scan: don't set GS_INVOCATIONS for all shader stages Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2019-08-12 14:52:17 -04:00
Marek Olšák	9fb2fd0b43	compiler: add ACCESS_STREAM_CACHE_POLICY radeonsi will use this. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2019-08-12 14:52:17 -04:00
Marek Olšák	902dd50cf0	gallium: add AMD-specific compute TGSI enums for tgsi_to_nir	2019-08-12 14:52:17 -04:00
Marek Olšák	6a2bdb8d01	gallium: add TGSI_PROPERTY_VS_BLIT_SGPRS_AMD for tgsi_to_nir needed by radeonsi NIR support	2019-08-12 14:52:17 -04:00
Marek Olšák	d1ad4fda31	st/mesa: don't allocate mipmapped texture for NEAREST_MIPMAP_LINEAR Reviewed-by: Brian Paul <brianp@vmware.com>	2019-08-12 14:52:17 -04:00
Kenneth Graunke	5180a222c0	glsl: Optimize the SoftFP64 shader when first creating it. By optimizing the shader before inlining, we avoid having to redo this work for each inlined copy of a function. It should also reduce the memory consumption a bit. This cuts the KHR-GL46.arrays_of_arrays_gl.SubroutineFunctionCalls2 runtime by 25% on my Icelake. That test compiles many shaders, which contain large types (dmat4) and division (expensive operations). Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-08-12 10:42:32 -07:00
Christian Gmeiner	914ecc9384	etnaviv: fix compile warnings in release build [27/31] Compiling C object 'src/gallium/drivers/etnaviv/df32d18@@etnaviv@sta/etnaviv_compiler_nir.c.o'. In file included from ../../src/gitlab_mesa/src/gallium/drivers/etnaviv/etnaviv_compiler_nir.c:552: ../../src/gitlab_mesa/src/gallium/drivers/etnaviv/etnaviv_compiler_nir_emit.h: In function 'ra_assign': ../../src/gitlab_mesa/src/gallium/drivers/etnaviv/etnaviv_compiler_nir_emit.h:903:9: warning: unused variable 'ok' [-Wunused-variable] bool ok = ra_allocate(g); ^~ ../../src/gitlab_mesa/src/gallium/drivers/etnaviv/etnaviv_compiler_nir.c: In function 'etna_compile_shader_nir': ../../src/gitlab_mesa/src/gallium/drivers/etnaviv/etnaviv_compiler_nir.c:663:9: warning: unused variable 'ok' [-Wunused-variable] bool ok = emit_shader(c->nir, &options, &v->num_temps, &num_consts); ^~ Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com> Reviewed-by: Eric Engestrom <eric@engestrom.ch> Reviewed-by: Jonathan Marek <jonathan@marek.ca>	2019-08-12 16:58:13 +00:00
Bas Nieuwenhuizen	e040c1b274	radv: Do not setup attachments without a framebuffer. Test that found this: dEQP-VK.geometry.layered.1d_array.secondary_cmd_buffer Fixes: `49e6c2fb78` "radv: Store color/depth surface info in attachment info instead of framebuffer." Reviewed-by: Dave Airlie <airlied@redhat.com>	2019-08-12 17:19:24 +02:00
Jason Ekstrand	14c96a6300	anv: Implement VK_EXT_subgroup_size_control version 2 The version bump adds a proper features struct. Fixes: `d10de25309` "anv: Implement VK_EXT_subgroup_size_control" Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-08-12 14:56:33 +00:00
Jason Ekstrand	8aef89cc2d	vulkan: Update the XML and headers to 1.1.119 Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-08-12 14:56:33 +00:00
Bas Nieuwenhuizen	d062bec48d	radv: Hash Wave32 settings in shader key. Can result in different shaders. Fixes: `8a86908e9a` "radv/gfx10: add Wave32 support for vertex, tessellation and geometry shaders" Reviewed-by: Dave Airlie <airlied@redhat.com>	2019-08-12 13:32:18 +00:00
Bas Nieuwenhuizen	ba8d3c362b	radv: Properly use Wave64 for non-NGG GS and copy shader. Fixes: `8a86908e9a` "radv/gfx10: add Wave32 support for vertex, tessellation and geometry shaders" Reviewed-by: Dave Airlie <airlied@redhat.com>	2019-08-12 13:32:18 +00:00
Bas Nieuwenhuizen	035406ecf7	radv: Put wave size in shader options/info. Instead of having the three values everywhere. This is also more future proof if we want the driver to make those decisions eventually. Reviewed-by: Dave Airlie <airlied@redhat.com>	2019-08-12 13:32:18 +00:00
Bas Nieuwenhuizen	71621e877f	relnotes: Make entries for radv more consistent. Always use 'on' as for the rest of the drivers. Reviewed-by: Dave Airlie <airlied@redhat.com>	2019-08-12 13:29:27 +00:00
Bas Nieuwenhuizen	38961729a8	relnotes: Add new exts on radv for 19.2. Reviewed-by: Dave Airlie <airlied@redhat.com>	2019-08-12 13:29:27 +00:00
Tapani Pälli	d4b574f26a	iris: reorder arguments as expected by the function CID: 1452262 Fixes: `b4c54894bb` "iris: Handle vertex shader with window space position" Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>	2019-08-12 13:08:26 +03:00
Tapani Pälli	590ba15d6e	iris/android: move iris_query.c to 'per gen' LIBIRIS_SRC_FILES Fixes Iris build on Android. Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>	2019-08-12 10:06:36 +03:00
Kenneth Graunke	0f3768bc5d	iris: Free query on error path CID: 1452276	2019-08-11 14:04:31 -07:00
Kenneth Graunke	661be3fef9	iris: Add missing 'break' We don't want to fall through to unreachable(). CID: 1452277	2019-08-11 14:04:31 -07:00
Caio Marcelo de Oliveira Filho	5ed4e31c08	spirv: Drop lower_workgroup_access_to_offsets Intel drivers are not using this anymore, and turnip still doesn't have Compute Shaders, so it won't make a difference to stop using this option. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Acked-by: Rob Clark <robdclark@chromium.org>	2019-08-10 22:15:35 -07:00
Caio Marcelo de Oliveira Filho	925e9142bd	i965/spirv: Lower shared memory later Instead of asking spirv_to_nir to lower the workgroup (shared memory) to offsets, keep them as derefs longer, then lower it later on. Because Workgroup memory doesn't have explicit offsets, we need to set those using nir_lower_vars_to_explicit_types before calling the I/O lowering pass. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-08-10 22:15:35 -07:00
Danylo Piliaiev	61d6be84f3	i965: Use force_compat_profile driconf option Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-08-10 11:39:29 -07:00
Eric Engestrom	d7eb40962b	i965: fix mem leak in error path Fixes: `8ae6667992` ("intel/perf: move query_object into perf") Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Mark Janes <mark.a.janes@intel.com>	2019-08-10 12:14:56 +01:00
Eric Engestrom	1c82fa0a92	gitlab-ci: simplify $CROSS option Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>	2019-08-10 12:11:28 +01:00
Kenneth Graunke	f1dba99639	iris: minor restyling	2019-08-10 00:16:45 -07:00
Mark Janes	9c597514d4	iris/query: enable amd performance monitors Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-08-09 19:28:34 -07:00
Mark Janes	469af7fdc9	iris/perf: get monitor results Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-08-09 19:28:32 -07:00
Mark Janes	1cb4fc184f	iris/perf: add begin/end hooks Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-08-09 19:28:24 -07:00
Mark Janes	8c4c346665	iris/perf: add delete query Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-08-09 19:28:17 -07:00
Mark Janes	aca42759ff	iris/perf: implement iris_create_monitor_object This is the first call that provides the iris context to the monitor implementation. On the first call, use the iris context to initialize the monitor context. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-08-09 19:28:14 -07:00
Mark Janes	0fd4359733	iris/perf: implement routines to return counter info With this commit, Iris will report that AMD_performance_monitor is supported, and will allow the caller to query the available metrics. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-08-09 19:28:03 -07:00
Eric Engestrom	e4aa0fc63a	anv: add missing `break` Fixes: `f6e7de41d7` ("anv: Implement VK_EXT_line_rasterization") Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-08-09 23:34:31 +01:00
Lionel Landwerlin	e2d761de03	util: drop final reference to p_compiler.h Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Acked-by: Eric Engestrom <eric.engestrom@intel.com>	2019-08-09 22:59:43 +03:00
Lionel Landwerlin	85bf1dc2de	util: os_misc: drop p_compiler.h include Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Acked-by: Eric Engestrom <eric.engestrom@intel.com>	2019-08-09 22:59:43 +03:00
Lionel Landwerlin	c44c3948c7	util: u_math: drop p_compiler.h include This file was moved from gallium so drop depending on gallium headers. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Acked-by: Eric Engestrom <eric.engestrom@intel.com>	2019-08-09 22:59:43 +03:00
Lionel Landwerlin	8818db8f2c	vc4: prepare for p_compiler.h dependency removal Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Acked-by: Eric Engestrom <eric.engestrom@intel.com>	2019-08-09 22:59:43 +03:00
Lionel Landwerlin	8a884a25c5	amd: prepare dropping include of p_compiler.h Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Acked-by: Eric Engestrom <eric.engestrom@intel.com>	2019-08-09 22:59:43 +03:00
Lionel Landwerlin	a233a3a74e	mesa: be consistent on GL_TRUE/GL_FALSE & TRUE/FALSE Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Acked-by: Eric Engestrom <eric.engestrom@intel.com>	2019-08-09 22:59:43 +03:00
Lionel Landwerlin	8f4dea20fc	mesa: drop some p_compiler.h types Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Acked-by: Eric Engestrom <eric.engestrom@intel.com>	2019-08-09 22:50:29 +03:00
Lionel Landwerlin	7abac7a8bf	mesa: add stddef include in preparation for dropping p_compiler.h Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Acked-by: Eric Engestrom <eric.engestrom@intel.com>	2019-08-09 22:50:17 +03:00
Lionel Landwerlin	6637395073	panfrost: prepare for p_compiler.h dependency removal Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Acked-by: Eric Engestrom <eric.engestrom@intel.com>	2019-08-09 22:50:03 +03:00
Lionel Landwerlin	351c2ad157	i965: don't use p_compiler.h types Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Acked-by: Eric Engestrom <eric.engestrom@intel.com>	2019-08-09 22:49:48 +03:00
Eric Engestrom	9be5ce1d73	gitlab-ci: generate meson cross-files earlier Suggested-by: Michel Dänzer <michel@daenzer.net> Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-08-09 20:07:50 +01:00
Alyssa Rosenzweig	9bc99e60a8	panfrost: Assign varying buffers dynamically Rather than hardcoding certain varying buffer indices "by convention", work it out at draw time. This added flexibility is needed for futureproofing and will enable streamout. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-09 11:53:21 -07:00
Alyssa Rosenzweig	46dae9ef58	panfrost: Assign indices at draw-time This will allow us to shuffle buffers. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-09 11:53:21 -07:00
Alyssa Rosenzweig	af6d3f7cb5	panfrost: Break out pan_varyings.c This code is fairly self-contained, so let's factor it out of the giant pan_context.c monster. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-09 11:53:21 -07:00
Alyssa Rosenzweig	4dba493fd7	panfrost: Enable PIPE_CAP_STREAM_OUTPUT_INTERLEAVE_BUFFERS Just as easy/hard as the rest of XFB. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-09 11:53:21 -07:00
Alyssa Rosenzweig	5ff7973560	panfrost: Import streamout data structures Pretty much copypasted from v3d to jumpstart us. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-09 11:53:21 -07:00
Alyssa Rosenzweig	c82672c9c1	pan/midgard: Account for swizzle/mask in st_vary Register allocation for varying stores is a bit different, since the instructions ignore the writemask (varyings are normalized packed/vectorized..) Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-09 11:50:45 -07:00
Alyssa Rosenzweig	5ad83015cd	pan/decode: Resolve crash with NULL attr/varyings This case needs more investigation, but this was found with geometry shaders. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-09 11:50:45 -07:00
Krzysztof Raszkowski	c0ab268f9c	gallium/swr: Fix glClear when it's used with glEnable/glDisable GL_SCISSOR_TEST When GL_SCISSOR_TEST is enabled glClear is handled by state tracker and there is no need to do this in gallium driver. Reviewed-by: Alok Hota <alok.hota@intel.com>	2019-08-09 18:56:13 +02:00
Gurchetan Singh	d6f8ce1c96	util: Revert "util: added missing headers in anon-file" This reverts commit `c73988300f`. Reason: Made a fix for this, then saw @eric's change ("util/anon_file: add missing"), but some sequence of events I don't really remember caused this to get merged. So revert ;-) Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-08-09 09:13:45 -07:00
Marek Vasut	bb47bedc85	etnaviv: Remove etna_bo_from_handle() prototype Remove etna_bo_from_handle() as there are no known users. Signed-off-by: Marek Vasut <marex@denx.de> Signed-off-by: Lucas Stach <l.stach@pengutronix.de>	2019-08-09 17:21:55 +02:00
Lionel Landwerlin	cefb4341b7	anv: drop unused code We stopped using this when we moved to Jason's mi_builder. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-08-09 17:01:38 +03:00
Christian Gmeiner	889e752965	etnaviv: fix typo Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com> Reviewed-by: Jonathan Marek <jonathan@marek.ca>	2019-08-09 13:08:20 +00:00
Christian Gmeiner	de5070ea8d	etnaviv: add gpu_supports_texture_target(..) Currently I am seeing a handful of the following debug message: translate_texture_target:495: Unhandled texture target: 0 PIPE_BUFFER is not handled in translate_texture_target(..) which makes sense as it is used to translate from PIPE_XXX to GPU specific value during etna_create_sampler_view_state(..). To fix this problem introduce gpu_supports_texture_target(..) which just checks if the texture target is supported. Fixes: `dfe048058f` ("etnaviv: support 3D and 2D array textures") Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com> Reviewed-by: Jonathan Marek <jonathan@marek.ca>	2019-08-09 13:08:20 +00:00
Jon Turney	0141b7c6b2	util: Cygwin has linux-style pthread_setname_np Fixes: `dcf9d91a` ("util: Handle differences in pthread_setname_np")	2019-08-09 12:46:43 +00:00
Tapani Pälli	5e38db0c47	anv/android: disable shared presentable image support explicitly Android 9 loader conditionally advertises VK_KHR_shared_presentable_image extension based on this property and it looks like it does not initialize the struct before query. Pragmas are added to ignore warnings with Android specific structure types in same manner as commit `8d386e6eef` did. Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Acked-by: Jason Ekstrand <jason@jlekstrand.net>	2019-08-09 08:53:54 +03:00
Vasily Khoruzhick	39a90749af	lima: introduce a struct describing texture descriptor Use a struct with bitfields to construct texture descriptor instead of poking bits in array of uint32_t. It improves code readability and makes it easier to experiment with unknown fields. Also fix mipmapping while we're at it - Utgard can have up to 13 levels, but 64 bytes is enough only for 10. Calculate descriptor size dynamically to account for extra levels if we need them. Reviewed-by: Qiang Yu <yuq825@gmail.com> Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>	2019-08-08 19:17:20 -07:00
Vasily Khoruzhick	edf008c04e	lima: add texel format table Introduce a table for supported texel formats and use it to check whether format is supported and for converting pipe format to lima texel format. Reviewed-by: Qiang Yu <yuq825@gmail.com> Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>	2019-08-08 19:17:20 -07:00
Gurchetan Singh	c73988300f	util: added missing headers in anon-file Otherwise I get: ../src/util/anon_file.c: In function ‘create_tmpfile_cloexec’: ../src/util/anon_file.c:75:9: error: implicit declaration of function ‘mkostemp’ [-Werror=implicit-function-declaration] fd = mkostemp(tmpname, O_CLOEXEC); ^~~~~~~~ ../src/util/anon_file.c:133:7: error: implicit declaration of function ‘asprintf’ [-Werror=implicit-function-declaration] asprintf(&name, "%s/mesa-shared-%s-XXXXXX", path, debug_name); ^~~~~~~~ ../src/util/anon_file.c:141:4: error: implicit declaration of function ‘free’ [-Werror=implicit-function-declaration] free(name) Fixes: c0376a ("util: add anon_file.h for all memfd/temp file usage")	2019-08-08 16:21:57 -07:00
Gurchetan Singh	42759dc986	virgl: check scanout mask Otherwise, virgl will report renderable or texturable formats as also scan-out formats. v2: drop host feature check (@kusma) Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>	2019-08-08 16:21:57 -07:00
Gurchetan Singh	3da029ac1a	virgl: fixup_readback_format --> fixup_formats This function is generalizable. Suggested-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>	2019-08-08 16:21:57 -07:00
Gurchetan Singh	bf0ca99ec7	virgl: access caps in a less verbose way in virgl_is_format_supported Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>	2019-08-08 16:21:57 -07:00
Alyssa Rosenzweig	5a898e2a65	pan/midgard: Disassemble load/store barrel shift Arm assembly intensifies. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-08 15:49:12 -07:00
Eric Engestrom	525a917c6c	util/anon_file: const string param Fixes: `c0376a1234` ("util: add anon_file.h for all memfd/temp file usage") Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Tested-by: Eric Anholt <eric@anholt.net> Tested-by: Andreas Baierl <ichgeh@imkreisrum.de>	2019-08-08 22:02:54 +01:00
Eric Engestrom	8a028b0df2	util/anon_file: drop unused #include Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Tested-by: Eric Anholt <eric@anholt.net> Tested-by: Andreas Baierl <ichgeh@imkreisrum.de>	2019-08-08 22:02:54 +01:00
Eric Engestrom	60af7f5a81	util/anon_file: add missing #include Fixes: `c0376a1234` ("util: add anon_file.h for all memfd/temp file usage") Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Tested-by: Eric Anholt <eric@anholt.net> Tested-by: Andreas Baierl <ichgeh@imkreisrum.de>	2019-08-08 22:02:54 +01:00
Greg V	ac1561088d	intel/perf: use MAJOR_IN_SYSMACROS/MAJOR_IN_MKDEV Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> Fixes: `134e750e16` ("i965: extract performance query metrics")	2019-08-08 21:44:33 +01:00
Greg V	0233372581	util: fix cpuset support on FreeBSD Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-08-08 21:44:33 +01:00
Greg V	c00ee00031	i965/tiled_memcpy: avoid creating bswap32 if it exists as a macro (e.g. on FreeBSD) Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-08-08 21:44:33 +01:00
Greg V	7b520dc74f	anv: add MAP_POPULATE fallback define for portability FreeBSD does not have MAP_POPULATE Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-08-08 21:44:33 +01:00
Greg V	2be3f16600	anv: remove unused Linux-specific include Fixes: `4201cc2dd3` ("anv: Implement VK_KHX_external_semaphore_fd") Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-08-08 21:44:33 +01:00
Greg V	c0dc5c1859	meson: define ETIME to ETIMEDOUT if not present Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-08-08 21:44:33 +01:00
Roman Stratiienko	28061e0ab0	lima: Fix Android.mk 1. Update LOCAL_SRC_FILES according to commit `54434fe670` ("lima/gpir: Rework the scheduler"). 2. Add libpanfrost_shared.a dependency. 3. Generate lima_nir_algebraic.c with Android.mk Fixes Android build error introduced by commit `5adfc8602c` ("lima/ppir: move sin/cos input scaling into NIR") Signed-off-by: Roman Stratiienko <roman.stratiienko@globallogic.com> Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com> Acked-by: Qiang Yu <yuq825@gmail.com>	2019-08-08 17:47:22 +00:00
Roman Stratiienko	26a01a6797	Add libpanfrost_shared to Android build 1. Add missing directory to ./Android.mk 2. Fix ./src/panfrost/Android.shared.mk Signed-off-by: Roman Stratiienko <roman.stratiienko@globallogic.com> Reviewed-by: Icenowy Zheng <icenowy@aosc.io> Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com> Acked-by: Qiang Yu <yuq825@gmail.com>	2019-08-08 17:47:22 +00:00
Rhys Perry	c52c54a746	anv,i965,iris: deduplicate setting of total_shared v5: add patch Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-08-08 12:10:39 -05:00
Rhys Perry	024a46a407	anv: use derefs for shared memory access vkpipeline-db for my Skylake GPU: total instructions in shared programs: 8847602 -> 8847896 (<.01%) instructions in affected programs: 10165 -> 10459 (2.89%) helped: 8 HURT: 2 total cycles in shared programs: 1606273555 -> 1606251634 (<.01%) cycles in affected programs: 2201803 -> 2179882 (-1.00%) helped: 7 HURT: 3 The shaders with more instructions are due to a loop over a shared array in Three Kingdoms being unrolled (and creating a lot of nested ifs). Not sure if that's good or bad. One of the shaders with worse cycles is only worse by 0.04% and the other two are the shaders with loops unrolled. v2: add patch v4: don't set spirv_options.shared_addr_format v4: move comment concerning the shared address format used and NULL v4: add vkpipeline-db results v5: rename to nir_lower_vars_to_explicit_types v5: move setting of total_shared to outside brw_compile_cs v6: set shared_addr_format v6: formatting changes Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> (v5) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-08-08 12:10:39 -05:00
Rhys Perry	fd73ed1bd7	nir: add nir_lower_to_explicit() v2: use glsl_type_size_align_func v2: move get_explicit_type() to glsl_types.cpp/nir_types.cpp v2: use align() instead of util_align_npot() v2: pack arrays a bit tighter v2: rename mem_* to field_* v2: don't attempt to handle when struct offsets are already set v2: use column_type() instead of recreating it v2: use a branch instead of \|= in nir_lower_to_explicit_impl() v2: assign locations to variables and update shared_size and num_shared v2: allow the pass to be used with nir_var_{shader_temp,function_temp} v4: rebase v5: add TODO v5: small formatting changes v5: remove incorrect assert in get_explicit_type() v5: rename to nir_lower_vars_to_explicit_types v5: correctly update progress when only variables are updated v5: rename get_explicit_type() to get_explicit_shared_type() v5: add comment explaining how get_explicit_shared_type() is different v5: update cast strides v6: update progress when lowering nir_var_function_temp variables v6: formatting changes v6: add more detailed documentation comment for get_explicit_shared_type v6: rename get_explicit_shared_type to get_explicit_type_for_size_align v7: fix comment in nir_lower_vars_to_explicit_types_impl() Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> (v5) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-08-08 12:10:39 -05:00
Rhys Perry	8bd2e138f5	nir/lower_explicit_io: add nir_var_mem_shared support v2: require nir_address_format_32bit_offset instead v3: don't call nir_intrinsic_set_access() for shared atomics Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-08-08 12:10:39 -05:00
Erik Faye-Lund	1e21bb4123	mesa: avoid warning on Windows On Windows, p_atomic_inc_return returns an unsigned long long rather than the type the pointer refers to, so let's make sure we cast the result to the right type. Otherwise, we'll trigger a warning about the wrong format-string for the type. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Acked-by: Eric Engestrom <eric@engestrom.ch>	2019-08-08 18:20:29 +02:00
Erik Faye-Lund	e0a740c633	mesa/main: cast away constness This avoids a warning about implicitly casting away the constness of the pointer. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Acked-by: Eric Engestrom <eric@engestrom.ch>	2019-08-08 18:20:29 +02:00
Erik Faye-Lund	75097114d9	spirv: fixup signature This avoids a warning on some compilers, which complain about implicitly casting the function-pointer. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Fixes: `d482a8f` "spirv: Update the OpenCL.std.h header" Acked-by: Eric Engestrom <eric@engestrom.ch>	2019-08-08 18:20:29 +02:00
Lucas Stach	68c24b09c2	etnaviv: remember data offset into BO Imported resources might not start at offset 0 into the buffer object. Make sure to remember the offset that is provided with the handle on import. Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de> Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>	2019-08-08 16:11:34 +02:00
Danylo Piliaiev	b8842bc312	i965: Emit a dummy MEDIA_VFE_STATE before switching from GPGPU to 3D There is an object-level preemption workaround which requires this. However, even without object-level preemption, we seem to have issues with geometry flickering when 3D and compute are combined in the same batch and this appears to fix it. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110395 Suggested-by: Jason Ekstrand <jason@jlekstrand.net> Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Cc: mesa-stable@lists.freedesktop.org	2019-08-08 13:39:15 +00:00
Bas Nieuwenhuizen	23a9d20997	radv: Avoid VEGA/RAVEN scissor bug in binning. CC: <mesa-stable@lists.freedesktop.org> Reviewed-by: Dave Airlie <airlied@redhat.com>	2019-08-08 14:08:21 +02:00
Bas Nieuwenhuizen	4a3f987afd	radv: Avoid binning RAVEN hangs. Mirroring radeonsi. CC: <mesa-stable@lists.freedesktop.org> Reviewed-by: Dave Airlie <airlied@redhat.com>	2019-08-08 14:08:21 +02:00
Bas Nieuwenhuizen	66ecc3eac8	radv: Fix off by one for S_028C48_MAX_ALLOC_COUNT. Reviewed-by: Dave Airlie <airlied@redhat.com>	2019-08-08 14:08:21 +02:00
Jan Zielinski	207026d29e	swr/rasterizer: modernize thread TLB Reviewed-by: Alok Hota <alok.hota@intel.com>	2019-08-08 12:33:21 +02:00
Jan Zielinski	387599a661	swr/rasterizer: Refactor events collection mechanism Several improvements and cleanups in events and statistics mechanisms Reviewed-by: Alok Hota <alok.hota@intel.com>	2019-08-08 11:15:07 +02:00
Jan Zielinski	ff75c35846	swr/rasterizer: improvements in simdlib 1. fix build issues with MSVC 2019 compiler The MSVC 2019 compiler seems to have an issue with optimized code-gen when using the _mm256_and_si256() intrinsic. Only disable use of integer vpand on buggy versions MSVC 2019. Otherwise allow use of integer vpand intrinsic. 2. Remove unused vec/matrix functionality Reviewed-by: Alok Hota <alok.hota@intel.com>	2019-08-08 10:53:47 +02:00
Jan Zielinski	b55a93fdd4	swr/rasterizer: Events are now grouped and enabled by knobs All events are now grouped as follows: -Framework (i.e. ThreadStart) [always ON] -Api (i.e. SwrSync) [always ON] -Pipeline [default ON] -Shader [default ON] -SWTag [default OFF] -Memory [default OFF] Reviewed-by: Alok Hota <alok.hota@intel.com>	2019-08-08 10:33:25 +02:00
Jan Zielinski	982d99490f	swr/rasterizer: do not mark tiles dirty until actually rendered Reviewed-by: Alok Hota <alok.hota@intel.com>	2019-08-08 10:16:20 +02:00
Jan Zielinski	4f04f260d9	swr/rasterizer: enable size accumulation in mem stats Small refactoring is also performed Reviewed-by: Alok Hota <alok.hota@intel.com>	2019-08-08 10:16:20 +02:00
Jan Zielinski	365ad367f1	swr/rasterizer: enable using AOS vertex data format Reviewed-by: Alok Hota <alok.hota@intel.com>	2019-08-08 10:16:20 +02:00
Iago Toral Quiroga	fb9f7872e7	v3d: handle wait requirement when retrieving query results correctly Reviewed-by: Eric Anholt <eric@anholt.net>	2019-08-08 08:36:52 +02:00
Iago Toral Quiroga	0f2d1dfe65	v3d: use the GPU to record primitives written to transform feedback We can use the PRIMITIVE_COUNTS_FEEDBACK packet to write various primitive counts to a buffer, including the number of primitives written to transform feedback buffers, which will handle buffer overflow correctly. There are a couple of caveats with this: Primitive counters are reset when we emit a 'Tile Binning Mode Configuration' packet, which can happen in the middle of a primitives query, so we need to read the buffer when we submit a job and accumulate the counts in the context so we don't lose them. We also need to do the same when we switch primitive type during transform feedback so we can compute the correct number of recorded vertices from the number of primitives. This is necessary so we can provide an accurate vertex count for draw from transform feedback. v2: - When computing the number of vertices for a primitive, pass in the base primitive, since that is what the hardware will count. - No need to update primitive counts when switching primitive types if the base primitives are the same. - Log perf warning when mapping the primitive counts BO for readback (Eric). - Only emit the primitive counts packet once at job end (Eric). - Use u_upload mechanism for the primitive counts buffer (Eric). - Use the XML to generate indices into the primitive counters buffer (Eric). 
Fixes piglit tests: spec/ext_transform_feedback/overflow-edge-cases spec/ext_transform_feedback/query-primitives_written-bufferrange spec/ext_transform_feedback/query-primitives_written-bufferrange-discard spec/ext_transform_feedback/change-size base-shrink spec/ext_transform_feedback/change-size base-grow spec/ext_transform_feedback/change-size offset-shrink spec/ext_transform_feedback/change-size offset-grow spec/ext_transform_feedback/change-size range-shrink spec/ext_transform_feedback/change-size range-grow spec/ext_transform_feedback/intervening-read prims-written Reviewed-by: Eric Anholt <eric@anholt.net>	2019-08-08 08:36:52 +02:00
Iago Toral Quiroga	cf8986bce0	gallium/util: add a helper to compute vertex count from primitive count v2: - Only compute vertex counts for base primitives. - Add a unit test (Eric) Reviewed-by: Eric Anholt <eric@anholt.net>	2019-08-08 08:36:52 +02:00
Iago Toral Quiroga	9eb8699e0f	v3d: be more explicit about the query types supported Reviewed-by: Eric Anholt <eric@anholt.net>	2019-08-08 08:36:52 +02:00
Iago Toral Quiroga	9b316ab57a	v3d: generate packet unpack functions These were not being compiled because of the lack of __gen_unpack_address. v2: - Shift raw address correctly (Eric). Reviewed-by: Eric Anholt <eric@anholt.net>	2019-08-08 08:36:52 +02:00
Iago Toral Quiroga	5ffb8b1716	v3d: add header guards in v3d_packet_helpers.h Reviewed-by: Eric Anholt <eric@anholt.net>	2019-08-08 08:36:52 +02:00
Tomeu Vizoso	e7eac8a1e8	panfrost: Print errors from kernel Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-08 07:42:52 +02:00
Tomeu Vizoso	7c8434889d	panfrost: Mark buffers as PANFROST_BO_HEAP What we call GROWABLE in Mesa corresponds to the HEAP BO flag in the kernel. These buffers cannot be memory mapped in the CPU side at the moment, so make sure they are also marked INVISIBLE. This allows us to allocate a big heap upfront (16MB) without actually reserving space unless it's needed. Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-08 07:42:52 +02:00
Tomeu Vizoso	19afd41e65	panfrost: Mark BOs as NOEXEC Unless a BO has the EXECUTABLE flag, mark it as NOEXEC. v2: - Rework version detection (Alyssa). Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-08 07:42:52 +02:00
Tomeu Vizoso	9398932c2d	panfrost: Take into account flags when looking up in the BO cache This will be useful right now so we avoid retrieving a non-executable buffer when an executable one is needed. As we support more flags, this logic will need to be extended to consider the different trade-offs to be made when matching BO specifications to BOs in the cache. Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-08 07:42:52 +02:00
Tomeu Vizoso	950b5fc596	panfrost: Allocate shaders in their own BOs Instead of all shaders being stored in a single BO, have each shader in its own. This removes the need for a 16MB allocation per context, and allows us to place transient blend shaders in BOs marked as executable (before they were allocated in the transient pool, which shouldn't be executable). v2: - Store compiled blend shaders in a malloc'ed buffer, to avoid reading from GPU-accessible memory when patching (Alyssa). - Free struct panfrost_blend_shader (Alyssa). - Give the job a reference to regular shaders when emitting (Alyssa). v3: - Split out the allocation flags change (Rob). Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-08 07:42:52 +02:00
Tomeu Vizoso	5804d75b9c	util/hash_table: Fix hashing in clears on 32-bit Some hash functions (eg. key_u64_hash) will attempt to dereference the key, causing an invalid access when passed DELETED_KEY_VALUE (0x1) or FREED_KEY_VALUE (0x0). On a 32-bit arch a 64-bit key value doesn't fit into a pointer, so hash_table_u64 internally uses a pointer to a struct containing the 64-bit key value. Fix _mesa_hash_table_u64_clear() to handle the 32-bit case by creating a temporary hash_key_u64 to pass to the hash function. Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com> Suggested-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Cc: Samuel Pitoiset <samuel.pitoiset@gmail.com> Cc: Nicolai Hähnle <nicolai.haehnle@amd.com>	2019-08-08 07:42:52 +02:00
Tapani Pälli	aba57b11ee	anv: support GetSwapchainGrallocUsage2ANDROID for Android New function supports gralloc1 usage flags that get set separately for producer and consumer. As we still need to support old method too, let's share common code and use android_convertGralloc0To1Usage helper. Bump the VK_ANDROID_native_buffer version to indicate support for the new call. Changes were tested on Android Celadon P with Basemark GPU and various Sascha Willems Vulkan demos. Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-08-08 05:08:01 +00:00
Mark Janes	51c3ab618b	st/mesa: eliminate unnecessary redirection Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-08-07 21:33:56 -07:00
Mark Janes	61c54a8878	intel/perf: fix debug typo Misspelling was seen with INTEL_DEBUG=perfmon. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-08-07 21:33:56 -07:00
Mark Janes	2df1ab4d48	intel/perf: make gen_perf_query_object private Encapsulate the details of this structure within the perf implementation. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-08-07 21:33:56 -07:00
Mark Janes	deea3798b6	intel/perf: make perf context private Encapsulate the details of this data structure. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-08-07 21:33:56 -07:00
Mark Janes	1f4f421ce0	intel/perf: print debug information INTEL_DEBUG=perfmon will iterate over the perf queries, printing information about the state of each query. Some of this information will be private to intel/perf, and needs a dump routine that can be called from i965. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-08-07 21:33:56 -07:00
Mark Janes	a663c8c26e	intel/perf: make internal methods private Now that all references from i965 have been moved to perf, we can make internal methods private again. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-08-07 21:33:56 -07:00
Mark Janes	be8b466cff	intel/perf: make oa_sample_buffers private All references to this data structure have been moved inside the perf subsystem. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-08-07 21:33:56 -07:00
Mark Janes	f2a049b4e3	intel/perf: expose method to create query By encapsulating this implementation within perf, we can eventually make struct gen_perf_ctx private. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-08-07 21:33:56 -07:00
Mark Janes	9f5c160d82	intel/perf: move initialization of pipeline statistics metrics to gen_perf Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-08-07 21:33:56 -07:00
Mark Janes	9f84efb452	intel/perf: move get_query_data into gen_perf This refactor moves several helper functions for get_query_data as well: - accumulate_oa_reports - read_gt_frequency - get_pipeline_stats_data - get_oa_counter_data Functions which are no longer referenced in brw_performance_query.c have been removed. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-08-07 21:33:56 -07:00
Mark Janes	73eccdc4a5	intel/perf: move delete_query to gen_perf Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-08-07 21:33:56 -07:00
Mark Janes	8c9eac1234	intel/perf: move is_query_ready to gen_perf Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-08-07 21:33:56 -07:00
Mark Janes	a9be292722	intel/perf: move wait_query to perf The following methods have duplicate implementations in brw_performance_query.c: - read_oa_samples_for_query - read_oa_samples_until They are still referenced by other methods in the file and will be removed in a subsequent commit. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-08-07 21:33:56 -07:00
Mark Janes	3c8ed58486	intel/perf: create a vtable entry for bo_busy Iris and i965 variants of this method need to be called by perf routines. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-08-07 21:33:56 -07:00
Mark Janes	6fed756388	intel/perf: create a vtable entry for bo_wait_rendering Iris and i965 variants of this method need to be called by perf routines. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-08-07 21:33:56 -07:00
Mark Janes	511bb15d4b	intel/perf: create a vtable entry for batch_references Iris and i965 variants of this method need to be called by perf routines. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-08-07 21:33:56 -07:00
Mark Janes	3ecb23092e	intel/perf: refactor gen_perf_end_query into gen_perf Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-08-07 21:33:56 -07:00
Mark Janes	018f9b81e5	intel/perf: refactor gen_perf_begin_query into gen_perf Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-08-07 21:33:55 -07:00
Mark Janes	52d3db9ab6	intel/perf: move perf-related state into gen_perf_context To move more operations into intel/perf, several state items are needed. Save references to that state in the perf_ctxt, rather than passing them in for every operation. This commit includes an initializer for gen_perf_context, to set those references and also encapsulate the initialization of the sample buffer state. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-08-07 21:33:55 -07:00
Mark Janes	df18acee78	intel/perf: create a vtable entries for buffer object map/unmap These operations are needed to refactor subsequent methods into perf Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-08-07 21:33:55 -07:00
Mark Janes	a330d759c5	intel/perf: move client reference counts into perf Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-08-07 21:33:55 -07:00
Mark Janes	4d0d4aa1b5	intel/perf: move open_perf into perf Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-08-07 21:33:55 -07:00
Mark Janes	79ded7cc8f	intel/perf: move close_perf into perf Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-08-07 21:33:55 -07:00
Mark Janes	f57c8a6dc1	intel/perf: create a vtable entry for emit_mi_flush This method is needed to move subsequent methods into perf. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-08-07 21:33:55 -07:00
Mark Janes	52f7a0bff7	intel/perf: use temporary pointers to simplify access to perf state Most accesses to perf state were made through repeated dereferences of brw_context members. Preferring temporary variables of perf_ctx and perf_cfg has the following advantages: - more concise implementation - easier refactor when moving subsequent methods to perf Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-08-07 21:33:55 -07:00
Mark Janes	a157f5acb1	intel/perf: move snapshot_statistics_registers into perf Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-08-07 21:33:55 -07:00
Mark Janes	8ae6667992	intel/perf: move query_object into perf Query objects can now be encapsulated within the perf subsystem. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-08-07 21:33:55 -07:00
Mark Janes	7e890ed476	intel/perf: create a vtable entry for store_register_mem64 This method is needed to move subsequent methods into perf. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-08-07 21:33:55 -07:00
Mark Janes	4b2c885207	intel/perf: move free_sample_bufs into perf Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-08-07 21:33:55 -07:00
Mark Janes	2f712d21b9	intel/perf: move reap_old_sample_buffers into perf Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-08-07 21:33:55 -07:00
Mark Janes	31758bd36c	intel/perf: move get_free_sample_buf into perf Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-08-07 21:33:55 -07:00
Mark Janes	e08a69b7f4	intel/perf: move the perf context into perf The "context" that is necessary to submit and process perf commands to the hardware was previously present in the brw_context.perfquery struct. This commit moves it into perf and provides a more understandable name. The intention is for this struct to be private, when all methods that access it are migrated into perf. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-08-07 21:33:55 -07:00
Mark Janes	fb622054f7	intel/perf: move get_metric_id to perf Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-08-07 21:33:55 -07:00
Mark Janes	b14e15e26a	intel/perf: move oa_sample_buf structure to perf oa_sample_buf holds the data provided by the kernel that will be collated into performance metrics. Since this functionality will be implemented in perf, the struct needs to be defined there. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-08-07 21:33:55 -07:00
Mark Janes	e091f33990	intel/perf: enumerate query-based metrics in perf Iris and i965 both need to enumerate the available metrics, so these routines must be located in perf. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-08-07 21:33:55 -07:00
Mark Janes	2446f5cfd8	intel/perf: move perf-related constants to common location The perf subsystem needs several macro definitions that were duplicated in Iris and i965 headers. Place these macros within perf, if the perf implementation contains the only references to the values. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-08-07 21:33:55 -07:00
Mark Janes	67675a5802	intel/perf: create a vtable entry for capture_frequency_stat_register In preparation for calling both Iris and i965 implementations from perf. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-08-07 21:33:55 -07:00
Mark Janes	ae3fac851d	intel/perf: create a vtable entry for batchbuffer_flush In preparation for calling both Iris and i965 implementations from perf. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-08-07 21:33:55 -07:00
Mark Janes	a921b215dd	intel/perf: create a vtable entry for emit_report_count In preparation for calling both Iris and i965 implementations from perf. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-08-07 21:33:55 -07:00
Mark Janes	9a2a2e8bea	intel/perf: create a vtable entry for bo_unreference In preparation for calling both Iris and i965 implementations from perf. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-08-07 21:33:55 -07:00
Mark Janes	439d5a3eff	intel/perf: create a vtable for low-level driver functions Performance metrics collection requires several actions (eg bo_map()) that have different implementations for Iris and i965. The perf subsystem needs a vtable for each of these actions, so it can invoke the corresponding implementation for each driver. The first call to be added to the table is bo_alloc. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-08-07 21:33:55 -07:00
Mark Janes	ea66484e86	intel/perf: use common ioctl wrapper There were multiple ioctl-wrapper functions, so a common implementation was put in gen_gem.h. With a common implementation, perf no longer needs the caller to configure one for it. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-08-07 21:33:55 -07:00
Mark Janes	07d3bd5c46	intel/perf: rename gen_perf to gen_perf_config This structure contains the configurations of the metrics for the current platform, and the settings needed for the perf subsystem to query that configuration from the device. This data is available without a rendering context, and needed to support MDAPI metrics for Vulkan. A gen_perf_context struct will be added later, which holds additional state from the rendering context necessary for metric data collection. The gen_perf struct needs a more precise name to reduce confusion. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-08-07 21:33:55 -07:00
Ilia Mirkin	9ff8da0e50	nvc0: fix program dumping, use _debug_printf This debug situation is unfortunate. debug_printf only does something with DEBUG set, but in practice all that needs to be moved to !NDEBUG. For now, use _debug_printf which always prints. However the whole function is guarded by !NDEBUG. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2019-08-07 22:32:02 -04:00
Ilia Mirkin	f6af104340	nvc0: add support for ATOMC_WRAP TGSI operations Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2019-08-07 22:32:02 -04:00
Ilia Mirkin	a2bb7b26a1	gallium: redefine ATOMINC_WRAP to be more hardware-friendly Both AMD and NVIDIA hardware define it this way. Instead of replicating the logic everywhere, just fix it up in one place. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-08-07 22:31:56 -04:00
Ilia Mirkin	582c86346d	st/mesa: relax EXT_shader_image_load_store enable There's no reason to bring format-less load requirement into this extension. It requires a size to be provided, and a compatible format is computed from the size + data type. For example layout(size1x32) uniform iimage1D image; becomes DCL IMAGE[0], 1D, PIPE_FORMAT_R32_SINT, WR whereas PIPE_CAP_IMAGE_LOAD_FORMATTED is designed to allow PIPE_FORMAT_NONE to be provided as a format and still enable LOAD operations to be performed. So the shader has all the information it needs about the format. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-08-07 22:31:38 -04:00
Mark Janes	a29bc3a3ad	i965/perf: restore mdapi statistics query metrics Registration of mdapi metrics based on statistics query registers was inadvertently removed in the commit that checks for OA kernel support. The statistics queries are not dependent on OA. Fixes: `96e1c945f2` ("i965: Move device info initialization to common code") Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-08-07 17:20:04 -07:00
Greg V	c0376a1234	util: add anon_file.h for all memfd/temp file usage Move the Weston os_create_anonymous_file code from egl/wayland into util, add support for Linux memfd and FreeBSD SHM_ANON, use that code in anv/aubinator instead of explicit memfd calls for portability. Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-08-07 22:57:55 +00:00
Pierre-Eric Pelloux-Prayer	519bebdb40	radeonsi: limit DPBB context_states_per_bin batches when using gfx9 workaround It seems that using 'context_states_per_bin = 1' for DPBB fixes the reported issue. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110214 Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-08-07 18:45:24 -04:00
Pierre-Eric Pelloux-Prayer	120d0ef937	radeonsi: reduce DPBB persistent_states_per_bin value for APUs Fixes some reported GPU hangs on RAVEN. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111231 Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-08-07 18:45:22 -04:00
Pierre-Eric Pelloux-Prayer	6bda9ca062	radeonsi: fix typo in DPBB register field Also only set FLUSH_ON_BINNING_TRANSITION for GPU families that need it (matches what si_emit_dpbb_disable is doing). Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-08-07 18:45:20 -04:00
Pierre-Eric Pelloux-Prayer	90bded140e	radeonsi: fix S_028C48_MAX_ALLOC_COUNT value This field uses "value minus 1" encoding. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-08-07 18:45:09 -04:00
Christian Gmeiner	323cda475b	etnaviv: drop struct etna_3d_state Also drop #if 0 code block. Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com> Reviewed-by: Philipp Zabel <philipp.zabel@gmail.com>	2019-08-07 22:12:00 +02:00
Yevhenii Kolesnikov	0325860e90	mesa: Use _mesa_delete_transform_feedback_object in drivers The function _mesa_delete_transform_feedback_object is now called from within drivers once driver-specific clean-up has been done. Brings into conformity with how other GL objects are handled. CC: Eric Anholt <eric@anholt.net> CC: Kenneth Graunke <kenneth@whitecape.org> Signed-off-by: Yevhenii Kolesnikov <yevhenii.kolesnikov@globallogic.com> Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-08-07 17:25:22 +00:00
Yevhenii Kolesnikov	4f767ded6e	mesa: use _mesa_delete_query in drivers Now drivers can call _mesa_delete_query once driver-specific clean-up has been done. Brings into conformity with how other GL objects are handled. CC: Eric Anholt <eric@anholt.net> CC: Kenneth Graunke <kenneth@whitecape.org> Suggested-by: Eric Anholt <eric@anholt.net> Signed-off-by: Yevhenii Kolesnikov <yevhenii.kolesnikov@globallogic.com> Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-08-07 17:25:22 +00:00
Juan A. Suarez Romero	4619535ab7	docs: update calendar, add news item and link release notes for 19.1.4 Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>	2019-08-07 18:51:32 +02:00
Juan A. Suarez Romero	a19d43ebd5	docs: add sha256 checksums for 19.1.4 Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com> (cherry picked from commit `7fcb69a33c`)	2019-08-07 18:49:25 +02:00
Juan A. Suarez Romero	8484fafc78	docs: add release notes for 19.1.4 Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com> (cherry picked from commit `b84ffa028d`)	2019-08-07 18:49:23 +02:00
Bas Nieuwenhuizen	5a26f528cb	meson,i965: Link with android deps when building for android. The DBG macro in brw_blorp.c ends up calling an android log function: error: undefined reference to '__android_log_print' v2: On suggestion from Lionel, hang the Android dependency onto a new libintel_common dependency. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-08-07 15:34:46 +02:00
Erik Faye-Lund	da9e2958ec	gallium/dump: add missing query-type to short-list Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Fixes: `3f6b3d9db7` ("gallium: add PIPE_QUERY_OCCLUSION_PREDICATE_CONSERVATIVE") Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-08-07 12:03:24 +00:00
Erik Faye-Lund	70a93922db	gallium/dump: add missing query-type to short-list Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Fixes: `a677799e51` ("gallium: add PIPE_QUERY_SO_OVERFLOW_ANY_PREDICATE and corresponding cap") Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-08-07 12:03:24 +00:00
Eric Engestrom	32ce010951	gitlab-ci: don't install autotools deps These could've been deleted a long time ago, but apparently we forgot. Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2019-08-07 10:18:25 +01:00
Eric Engestrom	5b10ddf358	util: fix mem leak of program path Fixes: `759b940389` ("util: Get program name based on path when possible") Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2019-08-07 08:42:42 +01:00
Eric Engestrom	991137144a	meson: build intel-ui tools as part of `all` tools Reported-by: Mark Janes <mark.a.janes@intel.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111289 Cc: Dylan Baker <dylan@pnwbakers.com> Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-08-07 08:19:31 +01:00
Eric Engestrom	c32ebfe003	gitlab-ci: add gtk3 dev files for `-D tools=intel-ui` We also need to update wayland-protocols and libXrandr (and randrproto), as they are too old for gdk3 (which gtk3 depends on). Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-08-07 08:19:30 +01:00
Jan Vesely	6b8269d0bb	clover: Fix build after clang r367864 v2: Drop special case of llvm-9 Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Acked-by: Dieter Nützel <Dieter@nuetzel-hh.de> Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Reviewed-by: Aaron Watry <awatry@gmail.com>	2019-08-06 23:33:55 -04:00
Timothy Arceri	d81e11332b	mesa: remove super old TODOs from shaderapi.c Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-08-07 13:31:40 +10:00
John Stultz	fcfa2d1447	mesa: freedreno: Android.registers.mk: Fix up register xml.h file generation The current Androdi.registers.mk file causes build failures that look like: FAILED: external/mesa3d/src/freedreno/Android.registers.mk:49: error: implicit rules are obsolete: out/target/product/linaro_db845c/gen/STATIC_LIBRARIES/libfreedreno_registers_intermediates/registers/%.xml.h Caused by the following Android build rule change: https://android.googlesource.com/platform/build/+/HEAD/Changes.md#implicit_rules I tried to replace this with something similar to the static pattern suggested in the URL above, but ended up getting all the xml.h files generated using only the first a2xx.xml source file. So I've fallen back to explicitly defining the make rules for each. Additionally, we needed to provide the proper LOCAL_EXPORT_C_INCLUDE_DIRS and add the defined static library to the components that depend on the register headers. Acked-by: Eric Anholt <eric@anholt.net> Signed-off-by: John Stultz <john.stultz@linaro.org>	2019-08-07 02:18:38 +00:00
John Stultz	96baf052b2	mesa: Add ir3/ir3_nir_imul.c generation to Android.mk With current master we're seeing build failures with AOSP: error: undefined symbol: ir3_nir_lower_imul This is due to the ir3_nir_imul.c file not being generated in the Android.mk files. This patch simply adds it to the Android build, after which things build and boot OK on db410c. Cc: Rob Clark <robdclark@chromium.org> Cc: Emil Velikov <emil.l.velikov@gmail.com> Cc: Amit Pundir <amit.pundir@linaro.org> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: Alistair Strachan <astrachan@google.com> Cc: Greg Hartman <ghartman@google.com> Cc: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Rob Clark <robdclark@gmail.com> Reviewed-by: Eric Anholt <eric@anholt.net> Signed-off-by: John Stultz <john.stultz@linaro.org>	2019-08-07 02:18:19 +00:00
Rohan Garg	16edd56fcc	panfrost: Take into account an index_bias for glDrawElementsBaseVertex calls Midgard does not accept an index_bias directly and relies instead on a bias correction offset (offset_bias_correction) in order to calculate the unbiased vertex index. We need to make sure we adjust offset_start and vertex_count in order to take into account the index_bias as required by a glDrawElementsBaseVertex call and then supply an additional offset_bias_correction to the hardware. Signed-off-by: Rohan Garg <rohan.garg@collabora.com> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-06 17:18:19 -07:00
Bas Nieuwenhuizen	4bb17c08ae	radv/gfx10: Enable DCC for storage images. v2: Hide it behind a perftest flag. Reviewed-by: Dave Airlie <airlied@redhat.com>	2019-08-07 02:13:07 +02:00
Bas Nieuwenhuizen	3a5950f501	radv: Add device argument for dcc compression check. Because it is about to be generation dependent. Reviewed-by: Dave Airlie <airlied@redhat.com>	2019-08-07 02:13:07 +02:00
Bas Nieuwenhuizen	8c63ffe54d	radv: Disable compression for compute DCC decompress store. Previously we relied on stores not using DCC but that is going to change, so disable compression explicitly. Reviewed-by: Dave Airlie <airlied@redhat.com>	2019-08-07 02:13:07 +02:00
Bas Nieuwenhuizen	216a9d8871	radv: Add extra struct to image view creation. For extra args. Unlike image creation, I'm not embedding the vk struct in there, so all the inline structs can be kept. Reviewed-by: Dave Airlie <airlied@redhat.com>	2019-08-07 02:13:07 +02:00
Bas Nieuwenhuizen	50add1b33a	radv: Do not decompress on LAYOUT_GENERAL. We handle render loops properly now and STORAGE still disables DCC/TC-compat HTILE in general. Reviewed-by: Dave Airlie <airlied@redhat.com>	2019-08-07 02:13:07 +02:00
Bas Nieuwenhuizen	66131ceb8b	radv: Pass through render loop detection to internal layout decisions. And do nothing with it yet. Everything outside a renderpass has no render loop. Reviewed-by: Dave Airlie <airlied@redhat.com>	2019-08-07 02:13:07 +02:00
Bas Nieuwenhuizen	a171a6663d	radv: Add render loop detection in renderpass. VK spec 7.3: "Applications must ensure that all accesses to memory that backs image subresources used as attachments in a given renderpass instance either happen-before the load operations for those attachments, or happen-after the store operations for those attachments." So the only render loops we can have are with input attachments. Detect these. Reviewed-by: Dave Airlie <airlied@redhat.com>	2019-08-07 02:13:07 +02:00
Timothy Arceri	a5b9394b87	drirc: Add vendor workaround for Divinity: Original Sin EE Reviewed-by: Marek Olšák <marek.olsak@amd.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93551	2019-08-07 10:12:49 +10:00
Timothy Arceri	dca119f12c	mesa/gallium: add drirc option to allow overriding GL vendor string Will be used in the following patch. Reviewed-by: Marek Olšák <marek.olsak@amd.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93551	2019-08-07 10:12:49 +10:00
Marek Olšák	c95e2a1c6b	relnotes/19.2: document EXT_texture_shadow_lod	2019-08-06 20:10:15 -04:00
Bas Nieuwenhuizen	04c6feb12c	radv: Fix config reg assert. Using the wrong bounds Fixes: "219d6939df8 radv: add more assertions to make sure packets are correctly emitted" Reviewed-by: Andres Rodriguez <andresx7@gmail.com> Reviewed-by: Dave Airlie <airlied@redhat.com>	2019-08-07 08:58:23 +10:00
Marek Olšák	16577f5002	tgsi_to_nir: add a few needed double opcodes for internal radeonsi shaders v2 (Connor): - Split out prep work from adding opcodes, and rewrite the former Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-08-06 18:03:26 -04:00
Marek Olšák	2207daf549	tgsi_to_nir: implement a few needed 64-bit integer opcodes for internal radeonsi shaders v2 (Connor): - Split this out from the prep work, and rework the former - Add support for U64SNE Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-08-06 18:03:24 -04:00
Connor Abbott	37f6350c1d	ttn: Prepare for 64-bit sources and destinations v2: Properly handle 32->64 bit conversions Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-08-06 18:03:22 -04:00
Connor Abbott	4b10949482	ttn: Use 1-bit NIR comparison opcodes We shouldn't be using the versions that output a 32-bit boolean, since nir_opt_algebraic won't optimize them as well. Drivers will lower these to the 32-bit versions after optimizing, if appropriate. Also, this will make implementing 64-bit comparisons easier. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-08-06 18:03:19 -04:00
Connor Abbott	e7fd90e8ef	nir/builder: Add nir_b2i Same as nir_b2f but for integers. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-08-06 18:03:10 -04:00
Pierre-Eric Pelloux-Prayer	f84c9ad17a	radeonsi: enable EXT_shader_image_load_store This depends on LLVM 10 because this needs https://reviews.llvm.org/D65283 Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-08-06 17:41:07 -04:00
Pierre-Eric Pelloux-Prayer	25fff591c1	radeonsi: add support for nir atomic_inc_wrap/atomic_dec_wrap Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-08-06 17:41:06 -04:00
Pierre-Eric Pelloux-Prayer	8789248541	radeonsi: add support for tgsi ATOMDEC_WRAP / ATOMINC_WRAP opcodes Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-08-06 17:41:04 -04:00
Pierre-Eric Pelloux-Prayer	704a6b5948	ac: add ac_atomic_inc_wrap / ac_atomic_dec_wrap support Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-08-06 17:41:03 -04:00
Pierre-Eric Pelloux-Prayer	a9ec718652	nir: add atomic_inc_wrap/atomic_dec_wrap image intrinsics Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-08-06 17:41:02 -04:00
Pierre-Eric Pelloux-Prayer	fc0a2e5d01	glsl: add EXT_shader_image_load_store new image functions This extension has 2 functions that are missing from the ARB versions: - imageAtomicIncWrap - imageAtomicDecWrap Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-08-06 17:41:00 -04:00
Pierre-Eric Pelloux-Prayer	70a47fb032	glsl: add EXT_shader_image_load_store keywords to lexer All of them already existed for ARB_shader_image_load_store. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-08-06 17:40:58 -04:00
Pierre-Eric Pelloux-Prayer	cfba168b6c	glsl: add size qualifiers from EXT_shader_image_load_store Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-08-06 17:40:56 -04:00
Pierre-Eric Pelloux-Prayer	cd45d09226	glsl: handle differences between ARB/EXT versions of shader_image_load_store Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-08-06 17:40:55 -04:00
Pierre-Eric Pelloux-Prayer	5db28b0cf7	mesa: add EXT_shader_image_load_store glBindImageTextureEXT function The implementation is almost identical to glBindImageTexture except for error checking. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-08-06 17:40:53 -04:00
Pierre-Eric Pelloux-Prayer	71e619a825	glapi: add EXT_shader_image_load_store Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-08-06 17:40:52 -04:00
Pierre-Eric Pelloux-Prayer	91924453ee	gallium: add PIPE_CAP_TGSI_ATOMINC_WRAP to indicate support Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-08-06 17:40:51 -04:00
Pierre-Eric Pelloux-Prayer	8b6bfed3d2	tgsi: add ATOMICINC_WRAP/ATOMICDEC_WRAP opcode Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-08-06 17:40:34 -04:00
Marek Olšák	1d8a71af57	radeonsi/gfx10: enable all CUs for GS if NGG is never used Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>	2019-08-06 17:09:03 -04:00
Marek Olšák	91227a1e17	radeonsi/gfx10: add global use_ngg and use_ngg_streamout flags Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>	2019-08-06 17:09:02 -04:00
Marek Olšák	f064b530f6	radeonsi/gfx10: remove an obsolete VGT_REUSE_OFF workaround Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>	2019-08-06 17:09:01 -04:00
Marek Olšák	37dd8ebcf7	radeonsi/gfx10: disable LATE_ALLOC_GS on Navi14 Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>	2019-08-06 17:08:59 -04:00
Marek Olšák	c5a6ecf61a	radeonsi/gfx10: implement a bug workaround for GE_PC_ALLOC Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>	2019-08-06 17:08:58 -04:00
Marek Olšák	8f8c28767e	radeonsi/gfx10: implement a bug workaround for NGG -> legacy transitions Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>	2019-08-06 17:08:57 -04:00
Marek Olšák	cb9d95623b	radeonsi/gfx10: implement a GE bug workaround Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>	2019-08-06 17:08:56 -04:00
Marek Olšák	e08b0d7ac4	radeonsi/gfx10: set GE_CNTL for tessellation correctly to match PAL Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>	2019-08-06 17:08:54 -04:00
Marek Olšák	71b53020b7	radeonsi/gfx10: simplify NGG code in si_update_shaders Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>	2019-08-06 17:08:53 -04:00
Marek Olšák	a232f5e07c	radeonsi/gfx10: fix input VGPRs for legacy VS Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>	2019-08-06 17:08:51 -04:00
Marek Olšák	8d90157d49	radeonsi: make sure that rasterizer state != NULL and remove all NULL checking Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>	2019-08-06 17:08:39 -04:00
Marek Olšák	8b8819e88a	radeonsi: make sure that DSA state != NULL and remove all NULL checking Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>	2019-08-06 17:08:39 -04:00
Marek Olšák	b758eed9c3	radeonsi: make sure that blend state != NULL and remove all NULL checking Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>	2019-08-06 17:08:39 -04:00
Marek Olšák	8b68511ebc	radeonsi: DCC MSAA blending bug - include logic op, limit to Navi14 and older Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>	2019-08-06 17:08:50 -04:00
Marek Olšák	e69c1c8b8f	radeonsi: determine accurately whether logic op is enabled Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>	2019-08-06 17:08:48 -04:00
Marek Olšák	b38f5eb17a	radeonsi: skip draw calls with 0-sized index buffers Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>	2019-08-06 17:08:39 -04:00
Marek Olšák	e777720173	radeonsi/nir: lower PS inputs before scanning the shader Lowering PS inputs can eliminate some of them, which messes up persp/linear barycentric coord usage info. Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>	2019-08-06 17:08:46 -04:00
Marek Olšák	f818d9ae3c	radeonsi/nir: handle key.mono.u.ps.interpolate_at_sample_force_center Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>	2019-08-06 17:08:39 -04:00
Marek Olšák	b3eed3cff9	radeonsi: add missing prints into si_dump_shader_key Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>	2019-08-06 17:08:15 -04:00
Marek Olšák	6b3ee86989	radeonsi: disable SDMA image copies on dGPUs to fix corruption in games Cc: 19.1 19.2 <mesa-stable@lists.freedesktop.org> Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>	2019-08-06 17:08:08 -04:00
Pierre-Eric Pelloux-Prayer	0556932f4a	mesa: add EXT_dsa glMultiTexCoordPointerEXT function Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-08-06 17:03:22 -04:00
Pierre-Eric Pelloux-Prayer	e364ddece3	mesa: add EXT_dsa glMultiTexGen* functions Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-08-06 17:03:21 -04:00
Pierre-Eric Pelloux-Prayer	e8e0de6a8f	mesa: add EXT_dsa glCopyMultiTexImage* and glCopyMultiTexSubImage* Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-08-06 17:03:19 -04:00
Pierre-Eric Pelloux-Prayer	f28d9ab1a3	mesa: add EXT_dsa glGetMultiTexParameteriv/fvEXT Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-08-06 17:03:18 -04:00
Pierre-Eric Pelloux-Prayer	989c375852	mesa: add EXT_dsa glMultiTexSubImage1D/2D/3DEXT Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-08-06 17:03:16 -04:00
Pierre-Eric Pelloux-Prayer	aac6578732	mesa: add EXT_dsa glMultiTexImage1D/2D/3DEXT + glGetMultiTexImageEXT Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-08-06 17:03:15 -04:00
Pierre-Eric Pelloux-Prayer	885dbe2e84	mesa: add glBindMultiTextureEXT display list support Fixes: `0972b0b059` ("mesa: add support for glBindMultiTextureEXT") Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-08-06 17:03:13 -04:00
Pierre-Eric Pelloux-Prayer	d9e26c3483	mesa: add EXT_dsa glMultiTexParameter* functions Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-08-06 17:03:12 -04:00
Pierre-Eric Pelloux-Prayer	e04f95057f	mesa: add EXT_dsa (Get)MultiTexEnv functions Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-08-06 17:03:10 -04:00
Pierre-Eric Pelloux-Prayer	04b8e50bb8	mesa: add _mesa_(get)texenvi(f)v_indexed helpers They are exactly like _mesa_GetTexEnvfv/_mesa_GetTexEnviv except they take a GLuint texunit parameter instead of relying on ctx->Texture.CurrentUnit. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-08-06 17:03:08 -04:00
Pierre-Eric Pelloux-Prayer	0e595326c4	mesa: add new helper _mesa_get_texobj_by_target_and_texunit Based on the 'static get_texobj_by_target' function from texparam.c, but extended to also take the texunit as a parameter. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-08-06 17:03:06 -04:00
Pierre-Eric Pelloux-Prayer	58030d2b3d	mesa: replace _mesa_get_current_fixedfunc_tex_unit with _mesa_get_fixedfunc_tex_unit The new function implements the same feature but doesn't depend on ctx->Texture.CurrentUnit. This change allows it to be used from indexed functions. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-08-06 17:02:52 -04:00
Danylo Piliaiev	b4c54894bb	iris: Handle vertex shader with window space position Iris advertises support for PIPE_CAP_TGSI_VS_WINDOW_SPACE_POSITION so let's actually implement it. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110657 Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-08-06 20:25:35 +00:00
Erico Nunes	b783f9f77e	lima: fix pipe_debug_callback warnings Signed-off-by: Erico Nunes <nunes.erico@gmail.com> Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>	2019-08-06 20:29:53 +02:00
Vasily Khoruzhick	5adfc8602c	lima/ppir: move sin/cos input scaling into NIR Reviewed-by: Erico Nunes <nunes.erico@gmail.com> Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>	2019-08-06 17:49:22 +00:00
Antia Puentes	954224b714	nir/spirv: Fix gl_BaseVertex for non-indexed draws for OpenGL Lowers BaseVertex to the correct system value for OpenGL. v2: use options->environment rather than adding a new flag to spirv_to_nir_options Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-08-06 09:11:27 -07:00
Kenneth Graunke	382f92a814	iris: Increase BATCH_SZ to 64kB This seems to improve performance by roughly ~1% across the board. Thanks to Rafael Antognolli and Dan Walsh for their help tuning.	2019-08-06 09:09:26 -07:00
Bas Nieuwenhuizen	2af00b1fdd	ac/nir: Use correct cast for readfirstlane and ptrs. Fixes: `028ce527` "radv: Add non-uniform indexing lowering." Reviewed-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-08-06 15:48:50 +00:00
Bas Nieuwenhuizen	2301b2e029	radv: Do non-uniform lowering before bool lowering. Since it can introduce comparisons. Fixes: `028ce52739` "radv: Add non-uniform indexing lowering." Reviewed-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-08-06 15:48:50 +00:00
Jonathan Marek	dfe048058f	etnaviv: support 3D and 2D array textures Signed-off-by: Jonathan Marek <jonathan@marek.ca> Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>	2019-08-06 10:37:36 -04:00
Jonathan Marek	3508f2fb18	etnaviv: fix 3d texture upload Fix uploading of 3D textures and 2D array textures: * Remove asserts in BLT and RS checking z * Use box->z/box->depth in etna_copy_resource_box and CPU tile/untile * Track mip level depth and use it in etna_copy_resource Signed-off-by: Jonathan Marek <jonathan@marek.ca> Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>	2019-08-06 10:37:36 -04:00
Jonathan Marek	ed7a27719a	etnaviv: add alternative NIR compiler enable with ETNA_MESA_DEBUG=nir Signed-off-by: Jonathan Marek <jonathan@marek.ca> Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>	2019-08-06 10:33:17 -04:00
Jonathan Marek	ee1ed59458	etnaviv: prep for UBOs Allow UBO relocs and only emitting uniforms that are actually used. GC7000Lite has no address register, so upload uniforms to a UBO object to LOAD from. I removed the code to check for changes to individual uniforms and just reupload the entire uniform state when the state is dirty. I think there was very limited benefit to it and it isn't compatible with relocs. Signed-off-by: Jonathan Marek <jonathan@marek.ca> Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de> Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>	2019-08-06 10:33:17 -04:00
Jonathan Marek	ca58c1120e	etnaviv: disasm: add dual16 bits, immediate decoding, and some opcodes Also use structs from etnaviv_asm since they hold the same information. Signed-off-by: Jonathan Marek <jonathan@marek.ca> Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de> Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>	2019-08-06 10:33:17 -04:00
Jonathan Marek	e9a5181ad6	etnaviv: asm: new features * Dual16 bits * Halti5 disable multiple uniform src * write_mask compose * Halti2+ immediates Signed-off-by: Jonathan Marek <jonathan@marek.ca> Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de> Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>	2019-08-06 10:33:17 -04:00
Jonathan Marek	98e59f0a0a	etnaviv: update headers from rnndb Update to etna_viv commit f38ba2d. Signed-off-by: Jonathan Marek <jonathan@marek.ca> Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de> Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>	2019-08-06 10:33:17 -04:00
Erico Nunes	e0aeee9460	lima: add summary report for shader-db Very basic summary, loops and gpir spills:fills are not updated yet and are only there to comply with the strings to shader-db report.py regex. For now it can be used to analyze the impact of changes in instruction count in both gpir and ppir. The LIMA_DEBUG=shaderdb setting can be useful to output stats on applications other than shader-db. Signed-off-by: Erico Nunes <nunes.erico@gmail.com> Reviewed-by: Qiang Yu <yuq825@gmail.com>	2019-08-06 15:43:31 +02:00
Erico Nunes	9e41a514a8	lima: add support for debug callback This adds support for glDebugMessageCallback which is required to support shader-db reports. Signed-off-by: Erico Nunes <nunes.erico@gmail.com> Reviewed-by: Qiang Yu <yuq825@gmail.com>	2019-08-06 15:43:26 +02:00
Tomeu Vizoso	67f4e1e787	panfrost/ci: Remove two tests from list of failures These tests have been fixed by: `b514f41183` ("glcpp: use pre-expansion line number for __LINE__") Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>	2019-08-06 15:19:43 +02:00
Jon Turney	84fae8e649	st/dri: Move dri2_format_mapping table and its accessors from dri2.c to dri_helpers.c `8af1990a` exposed dri2_get_mapping_by_fourcc() in dri_helpers.h, so it could be used by dri_get_egl_image(), but didn't move it. This breaks the build in the with_dri=false case (e.g. when building for a target which doesn't have libdrm, so swrast is the only dri driver built)	2019-08-06 12:21:56 +00:00
Jonathan Marek	b514f41183	glcpp: use pre-expansion line number for __LINE__ Fixes the following deqp tests: dEQP-GLES2.functional.shaders.preprocessor.predefined_macros.line_2_* I don't see the spec requiring this, but it seems to be better, as the clang preprocessor for example has this behavior. Signed-off-by: Jonathan Marek <jonathan@marek.ca> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-08-06 11:27:04 +00:00
Jason Ekstrand	bc612536eb	anv: Emit a dummy MEDIA_VFE_STATE before switching from GPGPU to 3D There is an object-level preemption workaround which requires this. However, even without object-level preemption, we seem to have issues with geometry flickering when 3D and compute are combined in the same batch and this appears to fix it. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109630 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111267 Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-08-06 05:46:28 +00:00
Ian Romanick	5544b2cbbd	nir/algebraic: Use value range analysis to eliminate useless unary ops Sandy Bridge is the big winner because it lies at something of a crossroads. It supports a fairly high OpenGL version, and it still has the old style math box. The high OpenGL version means a lot more shaders can run on it. The old style math box means extra moves are necessary to resolve source modifiers on operands to complex math instructions like COS, SQRT, and RCP. v2: Remove a couple patterns that are now redundant. All Gen7+ platforms had similar results. (Ice Lake shown) total instructions in shared programs: 16282006 -> 16278207 (-0.02%) instructions in affected programs: 174555 -> 170756 (-2.18%) helped: 661 HURT: 0 helped stats (abs) min: 1 max: 36 x̄: 5.75 x̃: 3 helped stats (rel) min: 0.06% max: 23.68% x̄: 2.81% x̃: 1.94% 95% mean confidence interval for instructions value: -6.16 -5.34 95% mean confidence interval for instructions %-change: -3.02% -2.60% Instructions are helped. total cycles in shared programs: 367168597 -> 367134284 (<.01%) cycles in affected programs: 1105276 -> 1070963 (-3.10%) helped: 460 HURT: 150 helped stats (abs) min: 1 max: 568 x̄: 96.60 x̃: 82 helped stats (rel) min: 0.02% max: 32.50% x̄: 7.99% x̃: 4.27% HURT stats (abs) min: 1 max: 901 x̄: 67.49 x̃: 39 HURT stats (rel) min: 0.07% max: 20.00% x̄: 4.90% x̃: 4.22% 95% mean confidence interval for cycles value: -65.68 -46.82 95% mean confidence interval for cycles %-change: -5.59% -4.05% Cycles are helped. Sandy Bridge total instructions in shared programs: 10824272 -> 10802557 (-0.20%) instructions in affected programs: 1237988 -> 1216273 (-1.75%) helped: 8199 HURT: 0 helped stats (abs) min: 1 max: 41 x̄: 2.65 x̃: 2 helped stats (rel) min: 0.12% max: 20.00% x̄: 2.04% x̃: 1.73% 95% mean confidence interval for instructions value: -2.70 -2.59 95% mean confidence interval for instructions %-change: -2.07% -2.00% Instructions are helped. 
total cycles in shared programs: 154009894 -> 153843598 (-0.11%) cycles in affected programs: 10650486 -> 10484190 (-1.56%) helped: 4973 HURT: 1533 helped stats (abs) min: 1 max: 3904 x̄: 40.20 x̃: 20 helped stats (rel) min: 0.02% max: 41.72% x̄: 2.63% x̃: 1.67% HURT stats (abs) min: 1 max: 453 x̄: 21.94 x̃: 8 HURT stats (rel) min: 0.02% max: 41.91% x̄: 1.54% x̃: 0.58% 95% mean confidence interval for cycles value: -28.02 -23.10 95% mean confidence interval for cycles %-change: -1.74% -1.56% Cycles are helped. LOST: 0 GAINED: 2 GM45 and Iron Lake had similar results. (Iron Lake shown) total instructions in shared programs: 8135196 -> 8134888 (<.01%) instructions in affected programs: 31920 -> 31612 (-0.96%) helped: 169 HURT: 0 helped stats (abs) min: 1 max: 12 x̄: 1.82 x̃: 2 helped stats (rel) min: 0.43% max: 3.23% x̄: 1.23% x̃: 1.16% 95% mean confidence interval for instructions value: -2.01 -1.64 95% mean confidence interval for instructions %-change: -1.32% -1.15% Instructions are helped. total cycles in shared programs: 188575724 -> 188574092 (<.01%) cycles in affected programs: 406840 -> 405208 (-0.40%) helped: 169 HURT: 0 helped stats (abs) min: 4 max: 72 x̄: 9.66 x̃: 10 helped stats (rel) min: 0.07% max: 2.16% x̄: 0.57% x̃: 0.47% 95% mean confidence interval for cycles value: -10.72 -8.59 95% mean confidence interval for cycles %-change: -0.63% -0.50% Cycles are helped. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-08-05 20:14:14 -07:00
Ian Romanick	8d14380971	nir/algebraic: Use value range analysis to convert fmin to fsat All Gen8+ platforms had similar results. (Ice Lake shown) total instructions in shared programs: 16297320 -> 16282006 (-0.09%) instructions in affected programs: 2434498 -> 2419184 (-0.63%) helped: 8091 HURT: 1 helped stats (abs) min: 1 max: 51 x̄: 1.89 x̃: 2 helped stats (rel) min: 0.04% max: 14.29% x̄: 0.98% x̃: 0.95% HURT stats (abs) min: 7 max: 7 x̄: 7.00 x̃: 7 HURT stats (rel) min: 0.28% max: 0.28% x̄: 0.28% x̃: 0.28% 95% mean confidence interval for instructions value: -1.94 -1.85 95% mean confidence interval for instructions %-change: -0.99% -0.96% Instructions are helped. total cycles in shared programs: 367221624 -> 367168597 (-0.01%) cycles in affected programs: 126409635 -> 126356608 (-0.04%) helped: 5612 HURT: 1023 helped stats (abs) min: 1 max: 2332 x̄: 31.11 x̃: 16 helped stats (rel) min: <.01% max: 30.31% x̄: 1.69% x̃: 1.42% HURT stats (abs) min: 1 max: 2372 x̄: 118.84 x̃: 16 HURT stats (rel) min: <.01% max: 46.98% x̄: 1.46% x̃: 0.35% 95% mean confidence interval for cycles value: -11.52 -4.46 95% mean confidence interval for cycles %-change: -1.26% -1.14% Cycles are helped. total spills in shared programs: 8868 -> 8870 (0.02%) spills in affected programs: 28 -> 30 (7.14%) helped: 0 HURT: 1 total fills in shared programs: 21903 -> 21904 (<.01%) fills in affected programs: 42 -> 43 (2.38%) helped: 0 HURT: 1 Haswell total instructions in shared programs: 13353925 -> 13338728 (-0.11%) instructions in affected programs: 2265850 -> 2250653 (-0.67%) helped: 8127 HURT: 5 helped stats (abs) min: 1 max: 51 x̄: 1.88 x̃: 2 helped stats (rel) min: 0.04% max: 20.00% x̄: 1.13% x̃: 1.07% HURT stats (abs) min: 5 max: 16 x̄: 9.00 x̃: 6 HURT stats (rel) min: 0.19% max: 0.52% x̄: 0.35% x̃: 0.28% 95% mean confidence interval for instructions value: -1.91 -1.83 95% mean confidence interval for instructions %-change: -1.15% -1.11% Instructions are helped. 
total cycles in shared programs: 375535444 -> 375536343 (<.01%) cycles in affected programs: 131206582 -> 131207481 (<.01%) helped: 5590 HURT: 1055 helped stats (abs) min: 1 max: 2844 x̄: 34.15 x̃: 16 helped stats (rel) min: <.01% max: 21.57% x̄: 2.08% x̃: 1.60% HURT stats (abs) min: 1 max: 2487 x̄: 181.78 x̃: 21 HURT stats (rel) min: <.01% max: 40.66% x̄: 1.96% x̃: 0.37% 95% mean confidence interval for cycles value: -4.74 5.01 95% mean confidence interval for cycles %-change: -1.51% -1.37% Inconclusive result (value mean confidence interval includes 0). total spills in shared programs: 23401 -> 23407 (0.03%) spills in affected programs: 248 -> 254 (2.42%) helped: 2 HURT: 5 total fills in shared programs: 34850 -> 34845 (-0.01%) fills in affected programs: 383 -> 378 (-1.31%) helped: 2 HURT: 5 Ivy Bridge total instructions in shared programs: 11975423 -> 11968117 (-0.06%) instructions in affected programs: 845703 -> 838397 (-0.86%) helped: 4071 HURT: 0 helped stats (abs) min: 1 max: 51 x̄: 1.79 x̃: 1 helped stats (rel) min: 0.08% max: 8.21% x̄: 1.04% x̃: 0.93% 95% mean confidence interval for instructions value: -1.87 -1.71 95% mean confidence interval for instructions %-change: -1.06% -1.02% Instructions are helped. total cycles in shared programs: 179674318 -> 179635552 (-0.02%) cycles in affected programs: 5100065 -> 5061299 (-0.76%) helped: 2650 HURT: 611 helped stats (abs) min: 1 max: 900 x̄: 21.85 x̃: 16 helped stats (rel) min: <.01% max: 21.55% x̄: 2.39% x̃: 1.40% HURT stats (abs) min: 1 max: 1841 x̄: 31.33 x̃: 6 HURT stats (rel) min: <.01% max: 58.71% x̄: 1.64% x̃: 0.37% 95% mean confidence interval for cycles value: -14.14 -9.64 95% mean confidence interval for cycles %-change: -1.75% -1.52% Cycles are helped. 
LOST: 3 GAINED: 7 Sandy Bridge total instructions in shared programs: 10828844 -> 10824272 (-0.04%) instructions in affected programs: 525678 -> 521106 (-0.87%) helped: 2386 HURT: 0 helped stats (abs) min: 1 max: 51 x̄: 1.92 x̃: 2 helped stats (rel) min: 0.11% max: 7.96% x̄: 1.05% x̃: 0.94% 95% mean confidence interval for instructions value: -2.04 -1.80 95% mean confidence interval for instructions %-change: -1.08% -1.03% Instructions are helped. total cycles in shared programs: 154024591 -> 154009894 (<.01%) cycles in affected programs: 4005766 -> 3991069 (-0.37%) helped: 1245 HURT: 506 helped stats (abs) min: 1 max: 585 x̄: 21.07 x̃: 16 helped stats (rel) min: 0.02% max: 11.57% x̄: 1.98% x̃: 0.83% HURT stats (abs) min: 1 max: 639 x̄: 22.81 x̃: 6 HURT stats (rel) min: 0.01% max: 26.21% x̄: 1.07% x̃: 0.26% 95% mean confidence interval for cycles value: -10.57 -6.21 95% mean confidence interval for cycles %-change: -1.23% -0.97% Cycles are helped. GM45 and Iron Lake had similar results. (Iron Lake shown) total instructions in shared programs: 8137248 -> 8135196 (-0.03%) instructions in affected programs: 148322 -> 146270 (-1.38%) helped: 992 HURT: 0 helped stats (abs) min: 1 max: 32 x̄: 2.07 x̃: 2 helped stats (rel) min: 0.41% max: 9.73% x̄: 1.74% x̃: 1.51% 95% mean confidence interval for instructions value: -2.16 -1.98 95% mean confidence interval for instructions %-change: -1.80% -1.67% Instructions are helped. total cycles in shared programs: 188583424 -> 188575724 (<.01%) cycles in affected programs: 4409620 -> 4401920 (-0.17%) helped: 956 HURT: 6 helped stats (abs) min: 2 max: 168 x̄: 8.09 x̃: 8 helped stats (rel) min: 0.04% max: 6.76% x̄: 0.27% x̃: 0.18% HURT stats (abs) min: 6 max: 6 x̄: 6.00 x̃: 6 HURT stats (rel) min: 0.10% max: 0.10% x̄: 0.10% x̃: 0.10% 95% mean confidence interval for cycles value: -8.41 -7.60 95% mean confidence interval for cycles %-change: -0.29% -0.25% Cycles are helped. 
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-08-05 20:14:14 -07:00
Ian Romanick	b77070e293	nir/algebraic: Use value range analysis to eliminate tautological compares It's only one application on one platform (Haswell) that's affected, but spills and fills increase quite dramatically. :( All Gen8+ platforms had similar results. (Ice Lake shown) total instructions in shared programs: 16320850 -> 16297320 (-0.14%) instructions in affected programs: 448012 -> 424482 (-5.25%) helped: 1938 HURT: 0 helped stats (abs) min: 2 max: 264 x̄: 12.14 x̃: 10 helped stats (rel) min: 0.35% max: 43.75% x̄: 5.85% x̃: 5.38% 95% mean confidence interval for instructions value: -12.80 -11.48 95% mean confidence interval for instructions %-change: -5.99% -5.72% Instructions are helped. total cycles in shared programs: 367496943 -> 367221624 (-0.07%) cycles in affected programs: 8557232 -> 8281913 (-3.22%) helped: 1907 HURT: 26 helped stats (abs) min: 4 max: 12802 x̄: 147.21 x̃: 48 helped stats (rel) min: 0.03% max: 75.85% x̄: 5.55% x̃: 3.94% HURT stats (abs) min: 4 max: 1870 x̄: 208.23 x̃: 20 HURT stats (rel) min: 0.16% max: 32.11% x̄: 8.31% x̃: 0.79% 95% mean confidence interval for cycles value: -165.38 -119.48 95% mean confidence interval for cycles %-change: -5.68% -5.04% Cycles are helped. LOST: 1 GAINED: 0 Haswell total instructions in shared programs: 13374211 -> 13353925 (-0.15%) instructions in affected programs: 349868 -> 329582 (-5.80%) helped: 1669 HURT: 1 helped stats (abs) min: 1 max: 264 x̄: 12.57 x̃: 10 helped stats (rel) min: 0.12% max: 46.81% x̄: 6.86% x̃: 6.49% HURT stats (abs) min: 700 max: 700 x̄: 700.00 x̃: 700 HURT stats (rel) min: 64.34% max: 64.34% x̄: 64.34% x̃: 64.34% 95% mean confidence interval for instructions value: -13.25 -11.04 95% mean confidence interval for instructions %-change: -7.01% -6.63% Instructions are helped. 
total cycles in shared programs: 375763544 -> 375535444 (-0.06%) cycles in affected programs: 6932686 -> 6704586 (-3.29%) helped: 1622 HURT: 48 helped stats (abs) min: 2 max: 12229 x̄: 148.31 x̃: 68 helped stats (rel) min: 0.06% max: 74.03% x̄: 5.94% x̃: 4.12% HURT stats (abs) min: 3 max: 7451 x̄: 259.44 x̃: 41 HURT stats (rel) min: 0.05% max: 54.99% x̄: 8.52% x̃: 2.88% 95% mean confidence interval for cycles value: -159.86 -113.31 95% mean confidence interval for cycles %-change: -5.86% -5.18% Cycles are helped. total spills in shared programs: 23258 -> 23401 (0.61%) spills in affected programs: 54 -> 197 (264.81%) helped: 4 HURT: 2 total fills in shared programs: 34775 -> 34850 (0.22%) fills in affected programs: 52 -> 127 (144.23%) helped: 4 HURT: 1 LOST: 5 GAINED: 0 Ivy Bridge total instructions in shared programs: 11996051 -> 11977964 (-0.15%) instructions in affected programs: 346679 -> 328592 (-5.22%) helped: 1508 HURT: 0 helped stats (abs) min: 2 max: 198 x̄: 11.99 x̃: 10 helped stats (rel) min: 0.26% max: 19.83% x̄: 5.73% x̃: 5.43% 95% mean confidence interval for instructions value: -12.65 -11.34 95% mean confidence interval for instructions %-change: -5.86% -5.60% Instructions are helped. total cycles in shared programs: 179891389 -> 179691339 (-0.11%) cycles in affected programs: 7869479 -> 7669429 (-2.54%) helped: 1485 HURT: 23 helped stats (abs) min: 1 max: 12615 x̄: 136.16 x̃: 54 helped stats (rel) min: 0.02% max: 71.84% x̄: 4.69% x̃: 3.49% HURT stats (abs) min: 1 max: 403 x̄: 93.48 x̃: 6 HURT stats (rel) min: 0.04% max: 34.01% x̄: 8.68% x̃: 0.81% 95% mean confidence interval for cycles value: -154.59 -110.73 95% mean confidence interval for cycles %-change: -4.79% -4.19% Cycles are helped. 
Sandy Bridge total instructions in shared programs: 10829247 -> 10828844 (<.01%) instructions in affected programs: 21258 -> 20855 (-1.90%) helped: 88 HURT: 0 helped stats (abs) min: 2 max: 17 x̄: 4.58 x̃: 5 helped stats (rel) min: 0.52% max: 3.92% x̄: 2.05% x̃: 2.21% 95% mean confidence interval for instructions value: -5.03 -4.13 95% mean confidence interval for instructions %-change: -2.21% -1.89% Instructions are helped. total cycles in shared programs: 154035437 -> 154024591 (<.01%) cycles in affected programs: 430176 -> 419330 (-2.52%) helped: 78 HURT: 10 helped stats (abs) min: 2 max: 4649 x̄: 143.06 x̃: 32 helped stats (rel) min: 0.05% max: 6.02% x̄: 2.03% x̃: 1.07% HURT stats (abs) min: 3 max: 265 x̄: 31.30 x̃: 6 HURT stats (rel) min: 0.10% max: 8.67% x̄: 1.03% x̃: 0.21% 95% mean confidence interval for cycles value: -232.53 -13.97 95% mean confidence interval for cycles %-change: -2.13% -1.23% Cycles are helped. Iron Lake and GM45 had similar results. (Iron Lake shown) total instructions in shared programs: 8137402 -> 8137248 (<.01%) instructions in affected programs: 2280 -> 2126 (-6.75%) helped: 10 HURT: 0 helped stats (abs) min: 12 max: 19 x̄: 15.40 x̃: 15 helped stats (rel) min: 3.90% max: 11.73% x̄: 7.19% x̃: 6.95% 95% mean confidence interval for instructions value: -17.69 -13.11 95% mean confidence interval for instructions %-change: -8.99% -5.39% Instructions are helped. total cycles in shared programs: 188538716 -> 188583424 (0.02%) cycles in affected programs: 69326 -> 114034 (64.49%) helped: 0 HURT: 10 HURT stats (abs) min: 2068 max: 7686 x̄: 4470.80 x̃: 4870 HURT stats (rel) min: 27.20% max: 173.66% x̄: 69.55% x̃: 59.41% 95% mean confidence interval for cycles value: 2830.86 6110.74 95% mean confidence interval for cycles %-change: 39.18% 99.91% Cycles are HURT. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-08-05 20:14:13 -07:00
Ian Romanick	96fcb3f95b	nir/algebraic: Use value range analysis to eliminate tautological compares not used by if-statements This just eliminates tautological / contradictory compares that are used for bcsel and other non-if-statement cases. If-statements are not affected because removing flow control can cause the i965 instruction scheduler to create some very long live ranges resulting in unnecessary spilling. This causes some shaders to fall off a performance cliff. Since many small if-statements are already flattened to bcsel, this optimization covers more than 68% of the possible cases (2417 shaders helped for instructions on Skylake vs. 3554). v2: Reorder and add whitespace to make the relationship between the patterns more obvious. Suggested by Caio. All Gen7+ platforms had similar results. (Ice Lake shown) total instructions in shared programs: 16333474 -> 16322028 (-0.07%) instructions in affected programs: 438559 -> 427113 (-2.61%) helped: 1765 HURT: 0 helped stats (abs) min: 1 max: 275 x̄: 6.48 x̃: 4 helped stats (rel) min: 0.20% max: 36.36% x̄: 4.07% x̃: 1.82% 95% mean confidence interval for instructions value: -6.87 -6.10 95% mean confidence interval for instructions %-change: -4.30% -3.84% Instructions are helped. total cycles in shared programs: 367608554 -> 367511103 (-0.03%) cycles in affected programs: 8368829 -> 8271378 (-1.16%) helped: 1541 HURT: 129 helped stats (abs) min: 1 max: 4468 x̄: 66.78 x̃: 39 helped stats (rel) min: 0.01% max: 45.69% x̄: 4.10% x̃: 2.17% HURT stats (abs) min: 1 max: 973 x̄: 42.25 x̃: 10 HURT stats (rel) min: 0.02% max: 64.39% x̄: 2.15% x̃: 0.60% 95% mean confidence interval for cycles value: -64.90 -51.81 95% mean confidence interval for cycles %-change: -3.89% -3.36% Cycles are helped.
total spills in shared programs: 8867 -> 8868 (0.01%) spills in affected programs: 18 -> 19 (5.56%) helped: 0 HURT: 1 total fills in shared programs: 21900 -> 21903 (0.01%) fills in affected programs: 78 -> 81 (3.85%) helped: 0 HURT: 1 All Gen6 and earlier platforms had similar results. (Sandy Bridge shown) total instructions in shared programs: 10829877 -> 10829247 (<.01%) instructions in affected programs: 30240 -> 29610 (-2.08%) helped: 177 HURT: 0 helped stats (abs) min: 1 max: 15 x̄: 3.56 x̃: 3 helped stats (rel) min: 0.37% max: 17.39% x̄: 2.68% x̃: 1.94% 95% mean confidence interval for instructions value: -3.93 -3.18 95% mean confidence interval for instructions %-change: -3.04% -2.32% Instructions are helped. total cycles in shared programs: 154036580 -> 154035437 (<.01%) cycles in affected programs: 352402 -> 351259 (-0.32%) helped: 96 HURT: 28 helped stats (abs) min: 1 max: 128 x̄: 14.73 x̃: 6 helped stats (rel) min: 0.03% max: 24.00% x̄: 1.51% x̃: 0.46% HURT stats (abs) min: 1 max: 117 x̄: 9.68 x̃: 4 HURT stats (rel) min: 0.03% max: 2.24% x̄: 0.43% x̃: 0.23% 95% mean confidence interval for cycles value: -13.40 -5.03 95% mean confidence interval for cycles %-change: -1.62% -0.53% Cycles are helped. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-08-05 20:14:13 -07:00
Ian Romanick	fa116ce357	nir/range-analysis: Range tracking for ffma and flrp A similar technique could be used for fmin3, fmax3, and fmid3. This could be squashed with the previous commit. I kept it separate to ease review. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-08-05 20:14:13 -07:00
Ian Romanick	586602c5d9	nir/range-analysis: Range tracking for bcsel This could be squashed with the previous commit. I kept it separate to ease review. v2: Add some missing cases. Use nir_src_is_const helper. Both suggested by Caio. Use a table for mapping source ranges to a result range. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-08-05 20:14:13 -07:00
Ian Romanick	3009cbed50	nir/range-analysis: Tighten the range of fsat based on the range of its source This could be squashed with the previous commit. I kept it separate to ease review. v2: Use a switch statement and add more comments. Both suggested by Caio. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-08-05 20:14:13 -07:00
Ian Romanick	405de7ccb6	nir/range-analysis: Rudimentary value range analysis pass Most integer operations are omitted because dealing with integer overflow is hard. There are a few things that could be smarter if there was a small amount more tracking of ranges of integer types (i.e., operands are Boolean, operand values fit in 16 bits, etc.). The changes to nir_search_helpers.h are included in this patch to simplify reordering the changes to nir_opt_algebraic.py. v2: Memoize range analysis results. Without this, some shaders appear to get stuck in infinite loops. v3: Rebase on many months of Mesa changes, including 1-bit Boolean changes. v4: Rebase on "nir: Drop imov/fmov in favor of one mov instruction". v5: Use nir_alu_srcs_equal for detecting (a*a). Previously just the SSA value was compared, and this incorrectly matched (a.x*a.y). v6: Many code improvements including (but not limited to) better names, more comments, and better use of helper functions. All suggested by Caio. Rework the handling of several opcodes to use a table for mapping source ranges to a result range. This change fixed a bug that caused fmax(gt_zero, ge_zero) to be incorrectly recognized as ge_zero. Slightly tighten the range of fmul by recognizing that x*x is gt_zero if x is gt_zero. Add similar handling for -x*x. v7: Use _______ in the tables as an alias for unknown. Suggested by Caio. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-08-05 20:14:13 -07:00
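The table-driven mapping and the memoization mentioned in this commit message can be illustrated with a small sketch. This is not Mesa's C implementation: the tuple-based expression encoding, the `analyze` function, the `FMAX_TABLE` contents, and the reduced set of ranges (only `unknown`, `ge_zero`, `gt_zero`) are assumptions made for illustration, and NaN handling is ignored.

```python
from enum import Enum


class Range(Enum):
    UNKNOWN = 0
    GE_ZERO = 1
    GT_ZERO = 2


UNK, GE, GT = Range.UNKNOWN, Range.GE_ZERO, Range.GT_ZERO

# Result range of fmax(a, b), indexed by (range of a, range of b).
# If either source is strictly positive, so is the maximum.
FMAX_TABLE = {
    (UNK, UNK): UNK, (UNK, GE): GE, (UNK, GT): GT,
    (GE, UNK): GE,   (GE, GE): GE,  (GE, GT): GT,
    (GT, UNK): GT,   (GT, GE): GT,  (GT, GT): GT,
}


def analyze(expr, var_ranges, memo=None):
    """Return the Range of a tuple-encoded expression (hypothetical encoding)."""
    if memo is None:
        memo = {}
    if expr in memo:  # memoized: shared subexpressions are analyzed once
        return memo[expr]

    op = expr[0]
    if op == "const":
        value = expr[1]
        result = GT if value > 0 else GE if value == 0 else UNK
    elif op == "var":
        result = var_ranges.get(expr[1], UNK)
    elif op == "fmax":
        a = analyze(expr[1], var_ranges, memo)
        b = analyze(expr[2], var_ranges, memo)
        result = FMAX_TABLE[(a, b)]
    elif op == "fmul":
        a = analyze(expr[1], var_ranges, memo)
        b = analyze(expr[2], var_ranges, memo)
        if expr[1] == expr[2]:
            # x*x: strictly positive if x is, otherwise non-negative.
            result = GT if a is GT else GE
        elif a is GT and b is GT:
            result = GT
        elif a is not UNK and b is not UNK:
            result = GE
        else:
            result = UNK
    else:
        result = UNK  # opcodes this sketch does not model

    memo[expr] = result
    return result
```

With `x` known to be `gt_zero` and `y` known to be `ge_zero`, `analyze(("fmax", ("var", "x"), ("var", "y")), ...)` yields `gt_zero`, which is the case the v6 note says was previously misclassified.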
Ian Romanick	d24edb4b8c	nir/algebraic: Simplify some comparisons like a+constant < constant v2: Remove unsafe integer versions of the optimizations. This change had no effect on shader-db results. Suggested by Caio. All Gen6+ platforms had similar results. (Ice Lake shown) total instructions in shared programs: 16333713 -> 16332631 (<.01%) instructions in affected programs: 258112 -> 257030 (-0.42%) helped: 1275 HURT: 407 helped stats (abs) min: 1 max: 7 x̄: 1.17 x̃: 1 helped stats (rel) min: 0.20% max: 8.33% x̄: 1.33% x̃: 0.86% HURT stats (abs) min: 1 max: 2 x̄: 1.00 x̃: 1 HURT stats (rel) min: 0.11% max: 2.94% x̄: 0.98% x̃: 0.98% 95% mean confidence interval for instructions value: -0.70 -0.59 95% mean confidence interval for instructions %-change: -0.84% -0.70% Instructions are helped. total cycles in shared programs: 367596791 -> 367601268 (<.01%) cycles in affected programs: 3420062 -> 3424539 (0.13%) helped: 1553 HURT: 783 helped stats (abs) min: 1 max: 742 x̄: 24.36 x̃: 6 helped stats (rel) min: 0.05% max: 21.12% x̄: 1.47% x̃: 0.65% HURT stats (abs) min: 1 max: 557 x̄: 54.04 x̃: 14 HURT stats (rel) min: 0.01% max: 33.66% x̄: 3.36% x̃: 1.43% 95% mean confidence interval for cycles value: -1.60 5.43 95% mean confidence interval for cycles %-change: -0.03% 0.33% Inconclusive result (value mean confidence interval includes 0). LOST: 0 GAINED: 2 Iron Lake total instructions in shared programs: 8137992 -> 8137874 (<.01%) instructions in affected programs: 17501 -> 17383 (-0.67%) helped: 104 HURT: 2 helped stats (abs) min: 1 max: 2 x̄: 1.17 x̃: 1 helped stats (rel) min: 0.25% max: 2.63% x̄: 0.87% x̃: 0.72% HURT stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2 HURT stats (rel) min: 0.45% max: 0.45% x̄: 0.45% x̃: 0.45% 95% mean confidence interval for instructions value: -1.22 -1.00 95% mean confidence interval for instructions %-change: -0.94% -0.76% Instructions are helped. 
total cycles in shared programs: 188540038 -> 188539650 (<.01%) cycles in affected programs: 704574 -> 704186 (-0.06%) helped: 125 HURT: 84 helped stats (abs) min: 2 max: 96 x̄: 6.45 x̃: 4 helped stats (rel) min: <.01% max: 3.47% x̄: 0.42% x̃: 0.25% HURT stats (abs) min: 2 max: 58 x̄: 4.98 x̃: 4 HURT stats (rel) min: 0.01% max: 2.75% x̄: 0.36% x̃: 0.33% 95% mean confidence interval for cycles value: -3.20 -0.52 95% mean confidence interval for cycles %-change: -0.19% -0.03% Cycles are helped. GM45 total instructions in shared programs: 5008889 -> 5008830 (<.01%) instructions in affected programs: 8824 -> 8765 (-0.67%) helped: 52 HURT: 1 helped stats (abs) min: 1 max: 2 x̄: 1.17 x̃: 1 helped stats (rel) min: 0.25% max: 2.38% x̄: 0.86% x̃: 0.72% HURT stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2 HURT stats (rel) min: 0.45% max: 0.45% x̄: 0.45% x̃: 0.45% 95% mean confidence interval for instructions value: -1.27 -0.95 95% mean confidence interval for instructions %-change: -0.96% -0.71% Instructions are helped. total cycles in shared programs: 128969426 -> 128969128 (<.01%) cycles in affected programs: 399798 -> 399500 (-0.07%) helped: 74 HURT: 30 helped stats (abs) min: 2 max: 22 x̄: 6.76 x̃: 6 helped stats (rel) min: <.01% max: 1.83% x̄: 0.46% x̃: 0.29% HURT stats (abs) min: 2 max: 58 x̄: 6.73 x̃: 6 HURT stats (rel) min: 0.06% max: 2.75% x̄: 0.42% x̃: 0.21% 95% mean confidence interval for cycles value: -4.60 -1.14 95% mean confidence interval for cycles %-change: -0.32% -0.08% Cycles are helped. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-08-05 20:14:13 -07:00
Ian Romanick	7c64cbf49d	nir/algebraic: Recognize (a < 0 || 0 < b) as min(a, -b) < 0 Similar to commit `97e6c1b9` and `f5cf74d8ba`. First apply 0 < b => -b < 0 to get (a < 0 || -b < 0), then apply some pre-existing rules to get min(a, -b) < 0. v2: Substantially update the comment explaining the use of is_used_once and the duplication of patterns. Suggested by Caio. Also, while flt and fge are not commutative, ior and iand are. Half of the original patterns were redundant, so delete them. As alternate justification for deleting them, fmin(a, -b) < 0 <=> 0 < fmax(-a, b). Proof left as an exercise for the reader. All Gen7+ platforms had similar results. (Ice Lake shown) total instructions in shared programs: 16333789 -> 16333713 (<.01%) instructions in affected programs: 11424 -> 11348 (-0.67%) helped: 32 HURT: 0 helped stats (abs) min: 1 max: 7 x̄: 2.38 x̃: 2 helped stats (rel) min: 0.20% max: 1.67% x̄: 0.76% x̃: 0.69% 95% mean confidence interval for instructions value: -3.03 -1.72 95% mean confidence interval for instructions %-change: -0.89% -0.62% Instructions are helped. total cycles in shared programs: 367598295 -> 367596791 (<.01%) cycles in affected programs: 141414 -> 139910 (-1.06%) helped: 23 HURT: 6 helped stats (abs) min: 3 max: 386 x̄: 72.52 x̃: 20 helped stats (rel) min: 0.15% max: 4.86% x̄: 1.01% x̃: 0.76% HURT stats (abs) min: 4 max: 88 x̄: 27.33 x̃: 12 HURT stats (rel) min: 0.22% max: 3.95% x̄: 1.08% x̃: 0.59% 95% mean confidence interval for cycles value: -93.51 -10.21 95% mean confidence interval for cycles %-change: -1.10% -0.05% Cycles are helped.
total instructions in shared programs: 10830836 -> 10830779 (<.01%) instructions in affected programs: 6895 -> 6838 (-0.83%) helped: 12 HURT: 0 helped stats (abs) min: 1 max: 14 x̄: 4.75 x̃: 1 helped stats (rel) min: 0.14% max: 1.61% x̄: 0.65% x̃: 0.33% 95% mean confidence interval for instructions value: -8.46 -1.04 95% mean confidence interval for instructions %-change: -1.03% -0.27% Instructions are helped. total cycles in shared programs: 154028477 -> 154032740 (<.01%) cycles in affected programs: 178433 -> 182696 (2.39%) helped: 3 HURT: 9 helped stats (abs) min: 3 max: 20 x̄: 11.00 x̃: 10 helped stats (rel) min: 0.07% max: 0.20% x̄: 0.12% x̃: 0.09% HURT stats (abs) min: 27 max: 1415 x̄: 477.33 x̃: 262 HURT stats (rel) min: 0.22% max: 6.45% x̄: 2.49% x̃: 1.76% 95% mean confidence interval for cycles value: 28.68 681.82 95% mean confidence interval for cycles %-change: 0.37% 3.30% Cycles are HURT. Iron Lake total instructions in shared programs: 8137966 -> 8137992 (<.01%) instructions in affected programs: 3281 -> 3307 (0.79%) helped: 0 HURT: 6 HURT stats (abs) min: 3 max: 7 x̄: 4.33 x̃: 3 HURT stats (rel) min: 0.63% max: 1.01% x̄: 0.76% x̃: 0.64% 95% mean confidence interval for instructions value: 2.17 6.50 95% mean confidence interval for instructions %-change: 0.56% 0.96% Instructions are HURT. total cycles in shared programs: 188539386 -> 188540038 (<.01%) cycles in affected programs: 103826 -> 104478 (0.63%) helped: 0 HURT: 7 HURT stats (abs) min: 16 max: 218 x̄: 93.14 x̃: 80 HURT stats (rel) min: 0.14% max: 0.95% x̄: 0.53% x̃: 0.46% 95% mean confidence interval for cycles value: 10.26 176.02 95% mean confidence interval for cycles %-change: 0.24% 0.81% Cycles are HURT. 
GM45 total instructions in shared programs: 5008876 -> 5008889 (<.01%) instructions in affected programs: 1645 -> 1658 (0.79%) helped: 0 HURT: 3 HURT stats (abs) min: 3 max: 7 x̄: 4.33 x̃: 3 HURT stats (rel) min: 0.63% max: 1.00% x̄: 0.76% x̃: 0.63% total cycles in shared programs: 128968950 -> 128969426 (<.01%) cycles in affected programs: 64854 -> 65330 (0.73%) helped: 0 HURT: 4 HURT stats (abs) min: 18 max: 218 x̄: 119.00 x̃: 120 HURT stats (rel) min: 0.14% max: 0.95% x̄: 0.60% x̃: 0.66% 95% mean confidence interval for cycles value: -62.92 300.92 95% mean confidence interval for cycles %-change: -0.05% 1.26% Inconclusive result (value mean confidence interval includes 0). Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-08-05 20:14:13 -07:00
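The equivalence behind commit 7c64cbf49d, including the identity the message leaves "as an exercise", can be spot-checked numerically. The sketch below is a brute-force check over a handful of finite sample values, not a proof and not part of Mesa. The function names and the sample set are hypothetical, and NaN is deliberately excluded because Python's `min`/`max` do not follow the fmin/fmax NaN rules.

```python
import itertools


def original(a, b):
    # The pattern being matched: (a < 0 || 0 < b)
    return a < 0 or 0 < b


def rewritten(a, b):
    # The replacement: min(a, -b) < 0
    return min(a, -b) < 0


def alternate(a, b):
    # The redundant mirror form: 0 < max(-a, b)
    return 0 < max(-a, b)


# Finite samples only, including both signed zeros.
SAMPLES = [-2.5, -1.0, -0.0, 0.0, 0.5, 3.0]


def all_agree(samples=SAMPLES):
    """True if all three forms agree on every pair of sample values."""
    return all(
        original(a, b) == rewritten(a, b) == alternate(a, b)
        for a, b in itertools.product(samples, repeat=2)
    )
```

Calling `all_agree()` compares the three forms on every pair drawn from `SAMPLES`; extending the sample list is a cheap way to gain more confidence before attempting an actual proof.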
Ian Romanick	92b75c126b	nir/algebraic: Replace checks that a value is between (or not) [0, 1] v2: Add an extra line to one of the proofs. Suggested by Caio. All Gen7+ platforms had similar results. (Ice Lake shown) total instructions in shared programs: 16329772 -> 16329427 (<.01%) instructions in affected programs: 41980 -> 41635 (-0.82%) helped: 110 HURT: 0 helped stats (abs) min: 1 max: 20 x̄: 3.14 x̃: 2 helped stats (rel) min: 0.19% max: 5.56% x̄: 1.12% x̃: 0.94% 95% mean confidence interval for instructions value: -4.10 -2.17 95% mean confidence interval for instructions %-change: -1.28% -0.96% Instructions are helped. total cycles in shared programs: 367551273 -> 367549979 (<.01%) cycles in affected programs: 492462 -> 491168 (-0.26%) helped: 76 HURT: 25 helped stats (abs) min: 1 max: 400 x̄: 42.86 x̃: 12 helped stats (rel) min: 0.06% max: 10.72% x̄: 1.23% x̃: 0.75% HURT stats (abs) min: 2 max: 730 x̄: 78.52 x̃: 16 HURT stats (rel) min: 0.17% max: 6.89% x̄: 2.08% x̃: 1.23% 95% mean confidence interval for cycles value: -37.79 12.16 95% mean confidence interval for cycles %-change: -0.90% 0.07% Inconclusive result (value mean confidence interval includes 0). LOST: 0 GAINED: 2 Sandy Bridge total instructions in shared programs: 10831115 -> 10830836 (<.01%) instructions in affected programs: 37830 -> 37551 (-0.74%) helped: 70 HURT: 0 helped stats (abs) min: 1 max: 20 x̄: 3.99 x̃: 2 helped stats (rel) min: 0.33% max: 7.14% x̄: 1.21% x̃: 0.97% 95% mean confidence interval for instructions value: -5.47 -2.50 95% mean confidence interval for instructions %-change: -1.49% -0.92% Instructions are helped. 
total cycles in shared programs: 154029323 -> 154028477 (<.01%) cycles in affected programs: 247909 -> 247063 (-0.34%) helped: 52 HURT: 6 helped stats (abs) min: 2 max: 254 x̄: 25.81 x̃: 4 helped stats (rel) min: 0.07% max: 4.39% x̄: 0.81% x̃: 0.19% HURT stats (abs) min: 4 max: 403 x̄: 82.67 x̃: 8 HURT stats (rel) min: 0.18% max: 1.60% x̄: 0.71% x̃: 0.53% 95% mean confidence interval for cycles value: -34.83 5.65 95% mean confidence interval for cycles %-change: -0.98% -0.32% Inconclusive result (value mean confidence interval includes 0). Iron Lake total instructions in shared programs: 8138007 -> 8137966 (<.01%) instructions in affected programs: 4060 -> 4019 (-1.01%) helped: 31 HURT: 0 helped stats (abs) min: 1 max: 2 x̄: 1.32 x̃: 1 helped stats (rel) min: 0.68% max: 8.33% x̄: 1.45% x̃: 0.90% 95% mean confidence interval for instructions value: -1.50 -1.15 95% mean confidence interval for instructions %-change: -2.11% -0.79% Instructions are helped. total cycles in shared programs: 188539492 -> 188539386 (<.01%) cycles in affected programs: 26280 -> 26174 (-0.40%) helped: 25 HURT: 0 helped stats (abs) min: 2 max: 8 x̄: 4.24 x̃: 4 helped stats (rel) min: 0.08% max: 2.11% x̄: 0.54% x̃: 0.50% 95% mean confidence interval for cycles value: -5.08 -3.40 95% mean confidence interval for cycles %-change: -0.70% -0.37% Cycles are helped. GM45 total instructions in shared programs: 5008897 -> 5008876 (<.01%) instructions in affected programs: 2096 -> 2075 (-1.00%) helped: 16 HURT: 0 helped stats (abs) min: 1 max: 2 x̄: 1.31 x̃: 1 helped stats (rel) min: 0.68% max: 7.69% x̄: 1.41% x̃: 0.89% 95% mean confidence interval for instructions value: -1.57 -1.06 95% mean confidence interval for instructions %-change: -2.32% -0.49% Instructions are helped. 
total cycles in shared programs: 128969020 -> 128968950 (<.01%) cycles in affected programs: 18490 -> 18420 (-0.38%) helped: 15 HURT: 0 helped stats (abs) min: 2 max: 8 x̄: 4.67 x̃: 4 helped stats (rel) min: 0.08% max: 2.11% x̄: 0.51% x̃: 0.48% 95% mean confidence interval for cycles value: -6.03 -3.30 95% mean confidence interval for cycles %-change: -0.78% -0.24% Cycles are helped. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-08-05 20:14:13 -07:00
Jonathan Marek	a44b4200f3	tgsi_to_nir: fix nir_gather_ssa_types for TGSI->NIR shaders Signed-off-by: Jonathan Marek <jonathan@marek.ca> Reviewed-By: Timur Kristóf <timur.kristof@gmail.com>	2019-08-05 22:09:47 -04:00
Jason Ekstrand	f6e7de41d7	anv: Implement VK_EXT_line_rasterization Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-08-06 02:05:28 +00:00
Jason Ekstrand	f03512f90b	genxml: Rename 3DSTATE_SF::Anti-Aliasing Enable This makes it consistent with the new name when it's moved to 3DSTATE_RASTER. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-08-06 02:05:28 +00:00
Jason Ekstrand	abf9e10488	anv: Use dirty bits for dynamic state tracking Previously, we assumed that the dirty bit was always 1 << VK_DYNAMIC_* and this assumption is about to be false. Extensions which define new VK_DYNAMIC_* enums won't be nice and tightly packed which this really requires. Instead, add functions to do the conversions and rework the bits a bit. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-08-06 02:05:28 +00:00
Jason Ekstrand	aa13f75f01	anv: Advertise the right line width range on gen9 and CHV Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-08-06 02:05:28 +00:00
Alyssa Rosenzweig	77295b1fdc	meson: Add panfrost to the --auto list Look ma, we're a real driver now! I was waiting until Panfrost stabilises a bit for this, but now that 19.2 is almost here, let's make us official :) Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Reviewed-by: Eric Engestrom <eric@engestrom.ch>	2019-08-05 17:42:05 -07:00
Erico Nunes	360bda0b1d	lima/ppir: enable lower_vector_cmp to lower fall_equal Signed-off-by: Erico Nunes <nunes.erico@gmail.com> Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>	2019-08-05 23:36:46 +02:00
Erico Nunes	9e8f8dbcd1	lima: re-run nir_opt_algebraic after int lowering nir_lower_int_to_float is currently only meant to run once, and some ops must be lowered after being converted from int ops to be implementable, so re-run nir_opt_algebraic after lowering ints to floats. Signed-off-by: Erico Nunes <nunes.erico@gmail.com> Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>	2019-08-05 23:36:35 +02:00
Alyssa Rosenzweig	3db4949197	pan/midgard: Extend SSA concurrency checks to other args No glmark changes, but this seems like a good idea. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-05 11:22:49 -07:00
Alyssa Rosenzweig	2869758355	pan/midgard: Rewrite bidirectionally when eliminating moves Symptom: the sky is black in SuperTuxKart (flashbacks to SMB/NES emulation intensify). Essentially, what happened is a fixed (special) move to r0 was eliminated but scheduling did not factor this in, so can_run_concurrent_ssa returned true even when there was a logical data dependency that needed to be resolved. Fixes: `20771ede1c` ("pan/midgard: Add post-RA move elimination") Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-05 10:58:39 -07:00
Danylo Piliaiev	04a9951580	intel/compiler: add ability to override shader's assembly When dumping a shader's assembly with INTEL_DEBUG=vs,tcs,..., the sha1 of the resulting assembly is also printed. If the environment variable INTEL_SHADER_ASM_READ_PATH is set, the driver will try to load a "%sha1%.bin" file from that path and substitute the current assembly with the one from the file. Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com> Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-08-05 17:19:09 +00:00
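The lookup described in this commit message can be sketched roughly as follows. This is an illustration in Python, not the driver's code: the function name `load_override` is hypothetical, and the assumptions that the sha1 is taken over the raw assembly bytes and formatted as lowercase hex are mine. Only the environment variable name and the `<sha1>.bin` file naming come from the commit message.

```python
import hashlib
import os


def load_override(assembly: bytes, env=os.environ) -> bytes:
    """Return replacement assembly from INTEL_SHADER_ASM_READ_PATH, if any.

    Falls back to the original assembly when the variable is unset or no
    matching "<sha1>.bin" file can be read.
    """
    read_path = env.get("INTEL_SHADER_ASM_READ_PATH")
    if not read_path:
        return assembly

    # Assumption: the file name is the hex sha1 of the generated assembly.
    sha1 = hashlib.sha1(assembly).hexdigest()
    candidate = os.path.join(read_path, sha1 + ".bin")
    try:
        with open(candidate, "rb") as override_file:
            return override_file.read()
    except OSError:
        return assembly
```

Passing the environment as a parameter keeps the sketch easy to exercise without modifying the real process environment.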
Danylo Piliaiev	430823c96b	intel/tools: add binary output type to i965_asm Add '-t,--type' command line option to specify the output type which can be 'bin', 'c_literal' or 'hex'. Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com> Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>	2019-08-05 17:19:09 +00:00
Alyssa Rosenzweig	1f8b653acb	panfrost: Add app blacklist In preparation for an initial 19.2 release, add a blacklist for apps known to be buggy under Panfrost to protect users. Panfrost is NOT a conformant implementation at this time. Distros: please do not revert this patch. If blacklisted apps are run using Panfrost, dragons will bite you. Thanks :) Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Acked-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>	2019-08-05 16:04:47 +00:00
Kenneth Graunke	64b73b770b	iris: Fix bad external BO hash table and zombie list interactions A while ago, we started deferring GEM object closure and VMA release until buffers were idle. This had some unforeseen interactions with external buffers. We keep imported buffers in hash tables, so if we have repeated imports of the same GEM object, we map those to the same iris_bo structure. This is critical for several reasons. Unfortunately, we broke this assumption. When freeing a non-idle external buffer, we would drop it from the hash tables, then move it to the zombie list. If someone reimported the same GEM object, we would not find it in the hash tables, and go ahead and make a second iris_bo for that GEM object. But the old iris_bo would still be in the zombie list, and so we would eventually call GEM_CLOSE on it - closing a BO that should have still been live. To work around this, we defer removing a BO from the hash tables until it's actually fully closed. This has the strange effect that an external BO may be on the zombie list, and yet be resurrected before it can be properly cleaned up. In this case, we remove it from the list so it won't be freed. Fixes severe instability in Weston, which was hitting EINVALs and ENOENTs from execbuf2, due to batches referring to a GEM object that had been closed, or at least had its VMA torched. Fixes: `457a55716e` ("iris: Defer closing and freeing VMA until buffers are idle.")	2019-08-05 08:53:41 -07:00
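The interaction between the import hash table and the zombie list described above can be modelled in a few lines. This is a toy model in Python, not the iris bufmgr: the class names, the `idle` flag, and the `closed` log are hypothetical stand-ins for the real GEM and VMA handling.

```python
class Bo:
    """Toy stand-in for a buffer object tied to one GEM handle."""

    def __init__(self, handle):
        self.handle = handle
        self.refcount = 1
        self.idle = False  # would be answered by the kernel in a real driver


class Bufmgr:
    def __init__(self):
        self.handle_table = {}  # GEM handle -> Bo, for imported buffers
        self.zombies = []       # unreferenced but still busy
        self.closed = []        # handles that were "GEM_CLOSE"d

    def import_bo(self, handle):
        bo = self.handle_table.get(handle)
        if bo is not None:
            if bo.refcount == 0:
                # Resurrect: it was waiting on the zombie list.
                self.zombies.remove(bo)
            bo.refcount += 1
            return bo
        bo = Bo(handle)
        self.handle_table[handle] = bo
        return bo

    def unreference(self, bo):
        bo.refcount -= 1
        if bo.refcount == 0:
            if bo.idle:
                self._close(bo)
            else:
                self.zombies.append(bo)  # stays in handle_table for now

    def reap_zombies(self):
        for bo in list(self.zombies):
            if bo.idle:
                self.zombies.remove(bo)
                self._close(bo)

    def _close(self, bo):
        # Only now is the table entry dropped, as the fix describes.
        del self.handle_table[bo.handle]
        self.closed.append(bo.handle)
```

Because the table entry survives until `_close`, a re-import of a busy, unreferenced handle returns the same object instead of creating a second one that would later be closed out from under its user.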
Kenneth Graunke	48e5a99d86	iris/bufmgr: Move iris_bo_reference into hash_find_bo, rename it Everybody importing an external buffer was looking it up in the hash table, then referencing it. We can just do that in the helper instead, which also gives us a convenient spot to stash extra code shortly.	2019-08-05 08:53:07 -07:00
Ahmad Fatoum	4f75ea57c2	gallium: add stm DRM entry point The STM32MP157 features a Vivante GC400 GPU supported by etnaviv. Add a DRM entry point for the STM display controller, so mesa can be used with it. Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de> Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-05 14:53:31 +00:00
Eric Engestrom	c251e2e662	gitlab-ci: don't remove a package we don't install anymore Fixes: `85dace1c0b` ("gitlab-ci: remove software-properties-common") Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2019-08-05 15:43:26 +01:00
Andrii Simiklit	dc471f2ef8	etnaviv: fix a null pointer dereference This issue was found by cppcheck Signed-off-by: Andrii Simiklit <andrii.simiklit@globallogic.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Lucas Stach <l.stach@pengutronix.de>	2019-08-05 15:31:43 +02:00
Connor Abbott	74470baebb	ac/nir: Lower large indirect variables to scratch results from radeonsi NIR: Totals from affected shaders: SGPRS: 704 -> 464 (-34.09 %) VGPRS: 2056 -> 672 (-67.32 %) Spilled SGPRs: 24 -> 0 (-100.00 %) Spilled VGPRs: 28406 -> 0 (-100.00 %) Private memory VGPRs: 0 -> 3182 (0.00 %) Scratch size: 1064 -> 3228 (203.38 %) dwords per thread Code Size: 935260 -> 40180 (-95.70 %) bytes LDS: 0 -> 0 (0.00 %) blocks Max Waves: 28 -> 70 (150.00 %) Wait states: 0 -> 0 (0.00 %) results from radv: Totals from affected shaders: SGPRS: 80 -> 48 (-40.00 %) VGPRS: 204 -> 108 (-47.06 %) Spilled SGPRs: 0 -> 0 (0.00 %) Spilled VGPRs: 0 -> 0 (0.00 %) Private memory VGPRs: 0 -> 0 (0.00 %) Scratch size: 0 -> 256 (0.00 %) dwords per thread Code Size: 15792 -> 9504 (-39.82 %) bytes LDS: 0 -> 0 (0.00 %) blocks Max Waves: 1 -> 2 (100.00 %) Wait states: 0 -> 0 (0.00 %) Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-08-05 11:45:18 +02:00
Timothy Arceri	3c9144f9e5	drirc: Add discard workaround for Divinity: Original Sin EE This adds an additional workaround for the game to fix the blocky shadows as reported in bug 105282. Acked-by: Eric Engestrom <eric.engestrom@intel.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105282	2019-08-05 15:35:00 +10:00
Erico Nunes	486b33558a	lima/ppir: simplify load uni/temp op lowering and scheduling The load uniform/temporary operations output only to a pipeline register, which must be consumed by another op in the same instruction later. The current implementation delays the decision of who will consume this result until the scheduling step. If the consumer node is not able to use the pipeline register, a mov node may have to be created during the scheduling step. As part of the ppir scheduler simplification, and now that the ppir scheduler supports pipeline register dependencies, this can be simplified by always creating a single mov node outputting to a normal register that can be used directly by all consumers. Signed-off-by: Erico Nunes <nunes.erico@gmail.com> Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com> Reviewed-by: Qiang Yu <yuq825@gmail.com>	2019-08-04 13:38:19 +02:00
Erico Nunes	fd29c4d6c5	lima/ppir: simplify select op lowering and scheduling The select operation relies on the select condition coming from the result of the alu scalar mult slot, in the same instruction. The current implementation creates a mov node to be the predecessor of select, and then relies on an exception during scheduling to ensure that both ops are inserted in the same instruction. Now that the ppir scheduler supports pipeline register dependencies, this can be simplified by making the mov explicitly output to the fmul pipeline register, and the scheduler can place it without an exception. Since the select condition can only be placed in the scalar mult slot, differently than a regular mov, define a separate op for it. Signed-off-by: Erico Nunes <nunes.erico@gmail.com> Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com> Reviewed-by: Qiang Yu <yuq825@gmail.com>	2019-08-04 13:38:18 +02:00
Erico Nunes	eb82637c2f	lima/ppir: support pipeline registers in scheduler The ppir scheduler grew to be rather complicated and to contain many exceptions, as it also has to take care of inserting additional nodes when it is mandatory for nodes to be in the same instruction. As such, the lima lowering and scheduling process can be difficult to understand and maintain. The ppir lowering step created nodes hoping that the scheduler would notice the exception and do the right thing. This proposal adds a simple refactor to the scheduler so that it places nodes with pipeline registers in the same instruction. With the scheduler handling this in a general way, it is possible to create same-instruction dependencies by using pipeline registers during the lowering stage. This is simpler to maintain because now we can make these dependencies explicit in a single place (lowering), and we can drop exceptions from scheduling. Reducing the complexity of the scheduler is also useful as preparatory work to support control flow in ppir. Signed-off-by: Erico Nunes <nunes.erico@gmail.com> Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com> Reviewed-by: Qiang Yu <yuq825@gmail.com>	2019-08-04 13:38:11 +02:00
Eric Engestrom	a1da8eccbe	docs: fix "empty array" meson syntax On recent versions of Meson (0.47+) these are synonymous, but we still support older versions than that, so let's use the correct syntax to avoid confusing users of old Meson versions. Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>	2019-08-04 12:21:19 +01:00
Eric Engestrom	1361ab3c82	egl: drop unnecessary function deref Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Adam Jackson <ajax@redhat.com>	2019-08-04 11:26:20 +01:00
Eric Engestrom	e7e3fd5c03	glx: drop unnecessary pointer deref for function calls Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Adam Jackson <ajax@redhat.com>	2019-08-04 11:26:20 +01:00
Eric Engestrom	9668d7f539	introduce c11_compat.h to provide C11 things in C99 Right now, all it does is provide the new standard `static_assert()` name. Fixes: `fbf7c38da3` ("egl/wayland: use bitset.h for `formats` bit set") Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Tested-by: Bhushan Shah <bshah@kde.org>	2019-08-04 11:14:25 +01:00
Eric Engestrom	64ffc289be	travis: add MacOS Scons build Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2019-08-04 11:11:32 +01:00
Eric Engestrom	8f1cdac793	symbols-check: fix `nm` invocation on MacOS According to Mac OSX's man page [1], this is how we should get the list of exported symbols: `nm -g -P foo.dylib`, where -g shows only the exported symbols and -P shows them in a "portable" format, i.e. readable by a script. Since this is supported by GNU nm as well, let's use that everywhere, although some care needs to be taken as there are some differences in the output. [1] https://www.unix.com/man-page/osx/1/nm/ Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Tested-by: Vinson Lee <vlee@freedesktop.org>	2019-08-04 11:06:27 +01:00
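A minimal sketch of consuming the "portable" output that the commit above describes. POSIX specifies that `nm -P` prints one symbol per line as name, type, then value and size; the function name `parse_nm_portable`, the sample text and the symbol names are made up for illustration and are not taken from the symbols-check script itself.

```python
def parse_nm_portable(output):
    """Return the names of defined symbols from `nm -g -P` style output.

    Assumes the POSIX portable layout: "name type [value [size]]".
    """
    symbols = []
    for line in output.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # blank line or a per-file header line
        name, sym_type = fields[0], fields[1]
        if sym_type == "U":
            continue  # undefined symbol: imported, not exported
        symbols.append(name)
    return symbols


# Hypothetical sample; real output differs between GNU nm and the macOS nm.
sample = """\
_eglGetDisplay T 0000000000001f40 0
_eglInitialize T 0000000000002a10 0
_malloc U
"""
exported = parse_nm_portable(sample)
```

The differences in output that the commit message warns about (for example, extra symbol types or header lines) would need additional filtering on top of this.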
Eric Engestrom	59f8809f3c	symbols-check: discard platform symbols early (as the comment there already claimed) Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Tested-by: Vinson Lee <vlee@freedesktop.org>	2019-08-04 11:06:27 +01:00
Eric Engestrom	81b3d141b3	symbols-check: skip test if we can't get the symbols list Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Tested-by: Vinson Lee <vlee@freedesktop.org>	2019-08-04 11:06:27 +01:00
Vasily Khoruzhick	c780af7771	lima/ppir: move alu vec to scalar lowering into NIR Utgard PP is vec4, but some operations are scalar. Utilize the NIR vec to scalar lowering pass and indicate the operations that we want to lower. Reviewed-by: Qiang Yu <yuq825@gmail.com> Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>	2019-08-04 02:17:12 +00:00
Jason Ekstrand	aebca3961b	iris: Fix handling of SIMD32 fragment shaders The brw_wm_prog_data_dispatch_grf_start_reg and _prog_offset helpers read the _NPixelDispatchEnable fields from 3DSTATE_PS to figure out which bits to pull out of the prog data and stuff where. Therefore, they need to be called with the final set of _NPixelDispatchEnable bits after we've done the workaround for SIMD32 and 16x MSAA. Otherwise, if you end up with a somewhat odd combination of enables, the GRF start reg and KSP data ends up in the wrong slots. In particular, running SIMD32-only is broken but several other combinations are as well. Fixes: `5445c176e2` "iris: Disable SIMD32 when using a 16x MSAA..." Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-08-03 22:24:40 +00:00
Bas Nieuwenhuizen	9f37c9903b	mesa: Rename GLX_USE_TLS to USE_ELF_TLS. These days it is not GLX only and it does not work with all TLS implementations. Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-08-03 20:18:17 +02:00
Bas Nieuwenhuizen	d7ca1efc6c	meson: Do not use GLX_USE_TLS on Android. The asm code expects a specific kind of implementation, but Android uses something different (emutls). Turns out mesa has a fallback with pthread_getspecific, with an optimization if only a single thread is used. emutls also uses getspecific, so let's just use the optimized mesa implementation. Fixes: `20294dceeb` "mesa: Enable asm unconditionally, now that gen_matypes is gone." Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-08-03 18:40:04 +02:00
Christian Gmeiner	2dd598c129	etnaviv: s/boolean/bool Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com> Reviewed-by: Philipp Zabel <philipp.zabel@gmail.com>	2019-08-03 12:32:28 +02:00
Andreas Baierl	5254e53deb	lima/ppir: Add gl_FrontFace handling Signed-off-by: Andreas Baierl <ichgeh@imkreisrum.de> Reviewed-by: Qiang Yu <yuq825@gmail.com>	2019-08-03 08:04:12 +00:00
Jason Ekstrand	b62b0cfa71	intel/nir: Add 1-bit opcodes to brw_cmod_for_nir_comparison_op Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-08-03 00:35:48 +00:00
Jason Ekstrand	c02c3ff612	intel/nir: Add a common nir comparison -> cmod helper We already had one in the vec4 code, we just had to move it. Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-08-03 00:35:48 +00:00
Eric Engestrom	2fd30e3722	util: fix pointer type on NetBSD NetBSD expects a `void *` argument [1] as the printf-style arguments to the formatting string, so we need to cast the `const` away. [1] https://netbsd.gw.com/cgi-bin/man-cgi?pthread_setname_np++NetBSD-current Suggested-by: Kamil Rytarowski <n54@gmx.com> Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-08-03 00:20:21 +00:00
Eric Engestrom	b558fa4dfe	meson: remove unused field Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Acked-by: Eric Anholt <eric@anholt.net> Tested-by: Vinson Lee <vlee@freedesktop.org>	2019-08-03 00:08:37 +00:00
Eric Engestrom	9a07606b84	meson: replace last uses of libxmlconfig with idep_xmlconfig Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Acked-by: Eric Anholt <eric@anholt.net> Tested-by: Vinson Lee <vlee@freedesktop.org>	2019-08-03 00:08:37 +00:00
Eric Engestrom	178811d8f6	meson: drop unused dep_{thread,dl} Unused as of last commit. Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Acked-by: Eric Anholt <eric@anholt.net> Tested-by: Vinson Lee <vlee@freedesktop.org>	2019-08-03 00:08:37 +00:00
Eric Engestrom	d2d85b950d	meson: replace libmesa_util with idep_mesautil This automates the include_directories and dependencies tracking so that all users of libmesa_util don't need to add them manually. Next commit will remove the ones that were only added for that reason. Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Acked-by: Eric Anholt <eric@anholt.net> Tested-by: Vinson Lee <vlee@freedesktop.org>	2019-08-03 00:08:37 +00:00
Alyssa Rosenzweig	8ddb38209d	pan/midgard: Print texture outmod I have no idea who thought this was a good idea. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-02 16:54:53 -07:00
Alyssa Rosenzweig	ad864a0bbb	pan/midgard: Promote all 16 uniforms Now that register spilling is in place, this is reasonable. It turns out for some shaders, it's actually better to cap at 8 work registers and extra >8 uniform registers and tolerate the spilling, since the extra resulting threads make up for the spillage. So incidentally, the shader that spills here is in -bterrain, which jumps from 19fps to 21fps as a result of this change. total instructions in shared programs: 3513 -> 3448 (-1.85%) instructions in affected programs: 776 -> 711 (-8.38%) helped: 20 HURT: 0 helped stats (abs) min: 1 max: 8 x̄: 3.25 x̃: 2 helped stats (rel) min: 3.57% max: 16.00% x̄: 8.37% x̃: 7.19% 95% mean confidence interval for instructions value: -4.28 -2.22 95% mean confidence interval for instructions %-change: -10.02% -6.73% Instructions are helped. total bundles in shared programs: 2067 -> 2024 (-2.08%) bundles in affected programs: 515 -> 472 (-8.35%) helped: 19 HURT: 1 helped stats (abs) min: 1 max: 6 x̄: 2.37 x̃: 2 helped stats (rel) min: 2.13% max: 17.86% x̄: 10.19% x̃: 11.11% HURT stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2 HURT stats (rel) min: 3.23% max: 3.23% x̄: 3.23% x̃: 3.23% 95% mean confidence interval for bundles value: -3.01 -1.29 95% mean confidence interval for bundles %-change: -12.13% -6.91% Bundles are helped. total quadwords in shared programs: 3468 -> 3426 (-1.21%) quadwords in affected programs: 764 -> 722 (-5.50%) helped: 19 HURT: 1 helped stats (abs) min: 1 max: 5 x̄: 2.26 x̃: 2 helped stats (rel) min: 1.41% max: 12.50% x̄: 6.76% x̃: 7.14% HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 HURT stats (rel) min: 1.08% max: 1.08% x̄: 1.08% x̃: 1.08% 95% mean confidence interval for quadwords value: -2.83 -1.37 95% mean confidence interval for quadwords %-change: -8.08% -4.65% Quadwords are helped. 
total registers in shared programs: 383 -> 360 (-6.01%) registers in affected programs: 112 -> 89 (-20.54%) helped: 19 HURT: 0 helped stats (abs) min: 1 max: 3 x̄: 1.21 x̃: 1 helped stats (rel) min: 12.50% max: 27.27% x̄: 20.63% x̃: 20.00% 95% mean confidence interval for registers value: -1.47 -0.95 95% mean confidence interval for registers %-change: -22.39% -18.87% Registers are helped. total threads in shared programs: 432 -> 451 (4.40%) threads in affected programs: 19 -> 38 (100.00%) helped: 11 HURT: 0 helped stats (abs) min: 1 max: 2 x̄: 1.73 x̃: 2 helped stats (rel) min: 100.00% max: 100.00% x̄: 100.00% x̃: 100.00% 95% mean confidence interval for threads value: 1.41 2.04 95% mean confidence interval for threads %-change: 100.00% 100.00% Threads are [helped]. total loops in shared programs: 4 -> 4 (0.00%) loops in affected programs: 0 -> 0 helped: 0 HURT: 0 total spills in shared programs: 0 -> 4 spills in affected programs: 0 -> 4 helped: 0 HURT: 2 total fills in shared programs: 0 -> 7 fills in affected programs: 0 -> 7 helped: 0 HURT: 2 Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-02 16:52:21 -07:00
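The percentages in the shader-db summary above are relative changes between the "before" and "after" totals. A small sketch of that arithmetic, using figures quoted in the commit message; the helper name `percent_change` is only illustrative and is not part of shader-db.

```python
def percent_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# Figures quoted in the commit message above.
total_instructions = percent_change(3513, 3448)     # reported as -1.85%
affected_instructions = percent_change(776, 711)    # reported as -8.38%
affected_threads = percent_change(19, 38)           # reported as 100.00%
```

Both instruction lines drop by the same 65 instructions; the affected-programs percentage is larger only because its baseline is smaller.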
Alyssa Rosenzweig	e94239b9a4	pan/midgard: Break mir_spill_register into its function No functional changes, just breaks out a megamonster function and fixes the indentation. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-02 16:52:21 -07:00
Alyssa Rosenzweig	d4bcca19da	pan/midgard: Switch sources to an array for trinary sources We need three independent sources to support indirect SSBO writes (as well as textures with both LOD/bias and offsets). Now is a good time to make sources just an array so we don't have to rewrite a ton of code if we ever needed a fourth source for some reason. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-02 16:48:54 -07:00
Alyssa Rosenzweig	513d02cfeb	pan/midgard: Remove "r27-only" register class As far as I know, there's no such thing as a load/store op that only takes its argument in r27. We just need to set the appropriate arg_1 field in the RA to specify other registers if we want them. To facilitate this, various RA-related changes are needed across the compiler ; this should also fix indirect offsets which were implicitly interpreted as "r27-only" despite not even passing through RA yet. One ripple effect change is switching the move insertion point and adjusting the liveness analysis accordingly, so while this was intended as a purely functional change, there are some shader-db changes: total instructions in shared programs: 3511 -> 3498 (-0.37%) instructions in affected programs: 563 -> 550 (-2.31%) helped: 12 HURT: 0 helped stats (abs) min: 1 max: 2 x̄: 1.08 x̃: 1 helped stats (rel) min: 0.93% max: 5.00% x̄: 2.58% x̃: 2.33% 95% mean confidence interval for instructions value: -1.27 -0.90 95% mean confidence interval for instructions %-change: -3.23% -1.93% Instructions are helped. total bundles in shared programs: 2067 -> 2067 (0.00%) bundles in affected programs: 398 -> 398 (0.00%) helped: 7 HURT: 4 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 1.54% max: 10.00% x̄: 5.04% x̃: 5.56% HURT stats (abs) min: 1 max: 2 x̄: 1.75 x̃: 2 HURT stats (rel) min: 2.13% max: 4.26% x̄: 3.72% x̃: 4.26% 95% mean confidence interval for bundles value: -0.95 0.95 95% mean confidence interval for bundles %-change: -5.21% 1.50% Inconclusive result (value mean confidence interval includes 0). 
total quadwords in shared programs: 3464 -> 3454 (-0.29%) quadwords in affected programs: 1199 -> 1189 (-0.83%) helped: 18 HURT: 4 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 1.03% max: 5.26% x̄: 2.44% x̃: 1.79% HURT stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2 HURT stats (rel) min: 2.56% max: 2.82% x̄: 2.63% x̃: 2.56% 95% mean confidence interval for quadwords value: -0.98 0.07 Inconclusive result (value mean confidence interval includes 0). total registers in shared programs: 383 -> 373 (-2.61%) registers in affected programs: 56 -> 46 (-17.86%) helped: 12 HURT: 2 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 9.09% max: 33.33% x̄: 29.58% x̃: 33.33% HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 HURT stats (rel) min: 20.00% max: 50.00% x̄: 35.00% x̃: 35.00% 95% mean confidence interval for registers value: -1.13 -0.29 95% mean confidence interval for registers %-change: -35.07% -5.63% Registers are helped. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-02 14:20:03 -07:00
Alyssa Rosenzweig	5d9b7a8ddb	pan/midgard: Handle get/set_swizzle for load/store arguments Load/store's main "argument 0" already has its swizzle handled correctly (for stores, that is). But the tinier arguments, the compact ones with a component select but not a full swizzle, those are not yet handled. Let's do something about that!	2019-08-02 14:20:03 -07:00
Alyssa Rosenzweig	9aeb726045	pan/midgard: Fix block successors Rather than an ersatz thing that sort of looks like successors but is in fact just the source order traversal with some backward jumps hacked in for loops... construct an actual flow graph so we can do analysis sanely. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-02 14:20:03 -07:00
Alyssa Rosenzweig	1a116037d8	pan/midgard: Add helper to pack load/store registers Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-02 14:20:03 -07:00
Alyssa Rosenzweig	e112d9d333	pan/midgard: Decode register/component in load/store argument 3-bits out of 8 down! Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-02 14:20:03 -07:00
Alyssa Rosenzweig	5a572f4b55	pan/midgard: Fix REGISTER_OFFSET r27 isn't the special one, usually. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-02 14:20:03 -07:00
Alyssa Rosenzweig	c908772ee4	pan/midgard: Split ld/st unknown to arg_1/arg_2 fields The 16-bit field can be decomposed to two independent 8-bit fields, each representing a single (additional) argument to the load/store op, generally used for encoding registers. Addressable registers here are substantially limited compared to the main register in a load/store op. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-02 14:20:02 -07:00
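As a rough illustration of the decomposition described in the commit above, a 16-bit value can be split into two independent 8-bit fields with a mask and a shift. Which byte corresponds to arg_1 and which to arg_2 is an assumption here (low byte first), and the function names are hypothetical.

```python
def split_ldst_args(field16):
    """Split a 16-bit value into two 8-bit fields.

    Assumption: arg_1 is the low byte and arg_2 the high byte.
    """
    if not 0 <= field16 <= 0xFFFF:
        raise ValueError("expected a 16-bit value")
    arg_1 = field16 & 0xFF
    arg_2 = (field16 >> 8) & 0xFF
    return arg_1, arg_2


def join_ldst_args(arg_1, arg_2):
    """Inverse of split_ldst_args under the same byte-order assumption."""
    return ((arg_2 & 0xFF) << 8) | (arg_1 & 0xFF)


arg_1, arg_2 = split_ldst_args(0x1E9F)
```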
Bas Nieuwenhuizen	2d54fdb563	radv: Expose VK_KHR_imageless_framebuffer. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-08-02 22:35:25 +02:00
Bas Nieuwenhuizen	9475782eac	radv: Implement VK_KHR_imageless_framebuffer. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-08-02 22:35:19 +02:00
Bas Nieuwenhuizen	a7041f3b4e	radv: Store image view also outside framebuffer. So we can use it with imageless framebuffers. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-08-02 22:19:16 +02:00
Bas Nieuwenhuizen	49e6c2fb78	radv: Store color/depth surface info in attachment info instead of framebuffer. That way we can use it for imageless framebuffers. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-08-02 22:18:51 +02:00
Alyssa Rosenzweig	cd98d94516	panfrost: Allocate polygon lists on-demand Rather than allocating a huge (64MB) polygon list on context creation and sharing it across framebuffers, we instead allocate polygon lists as BOs (which consistently hit the cache) sized appropriately; for about a month, we've known how to calculate the polygon list size so this has only recently become possible. The good news is we can render to truly massive framebuffers without crashing and, more importantly, we eliminate the 64MB upfront overhead. If a list that size isn't actually needed, it's not allocated. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>	2019-08-02 21:54:58 +02:00
Boris Brezillon	ed501c00cb	panfrost: Handle the bo == NULL case in panfrost_bo_[un]reference() Allows us to pass BOs without checking if they're NULL or not. Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-02 21:54:58 +02:00
Boris Brezillon	12f72175f3	panfrost: Get rid of the skippable param in attach_vt_framebuffer() The only user of this function always passes true. Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-02 21:54:58 +02:00
Boris Brezillon	8227d284f7	panfrost: Don't emit a new FB desc when setting a new FB state The FB desc will be emitted/attached on the first draw targeting this new FB. Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-02 21:54:58 +02:00
Boris Brezillon	95507a3dd4	panfrost: Bail out early when doing a wallpaper blit The wallpaper blit is a bit special in that the operation is targeting the current FB, but the u_blitter logic creates a new surface for it which makes util_framebuffer_state_equal() return false. In that case we don't want a new FB descriptor to be emitted/attached, so let's just copy the new state into ctx->pipe_framebuffer and exit the function. Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-02 21:54:58 +02:00
Boris Brezillon	8645afce4c	panfrost: Bail out early when new and current FB states are equal If the current FB matches the new one there's nothing to be done in panfrost_set_framebuffer_state(). By bailing out early in that case we avoid emitting new FB descriptors (the old ones are still valid). Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-02 21:54:58 +02:00
Boris Brezillon	17d6ee2bd1	panfrost: Delay FB descriptor allocation No need to emit SFBD/MFBD at frame invalidation. They can be emitted when the framebuffer is attached, which saves us a potential FB desc re-allocation if a new FB is bound after the swap. Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-02 21:54:58 +02:00
Boris Brezillon	b5ca1e5458	panfrost: Remove job from ctx->jobs at submission time This guarantees that new draws targeting the same framebuffer will get a new job instance. Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-02 21:54:58 +02:00
Boris Brezillon	20b00e1ff2	panfrost: Make ctx->job useful ctx->job is supposed to serve as a cache to avoid a hash table lookup every time we access the job attached to the currently bound FB, except it was never assigned to anything but NULL. Fix that by adding the missing assignment in panfrost_get_job_for_fbo(). Also add a missing NULL assignment in the ->set_framebuffer_state() path. While at it, add extra assert()s to make sure ctx->job is consistent. Fixes: `59c9623d0a` ("panfrost: Import job data structures from v3d") Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-02 21:52:56 +02:00
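The caching pattern described in the commit above can be sketched roughly as follows. This is a simplified Python model, not the panfrost C code: the class `Context`, its attributes and the `lookups` counter are hypothetical stand-ins for ctx->jobs, ctx->job and the hash table lookup.

```python
class Context:
    """Toy model of a context with a one-entry job cache (illustrative only)."""

    def __init__(self):
        self.jobs = {}        # framebuffer key -> job (stands in for the hash table)
        self.job = None       # cached job for the currently bound framebuffer
        self.bound_fb = None
        self.lookups = 0      # counts hash table lookups, for demonstration

    def set_framebuffer_state(self, fb):
        self.bound_fb = fb
        self.job = None       # invalidate the cache when the binding changes

    def get_job_for_fbo(self):
        if self.job is not None:
            return self.job   # fast path: no hash table lookup
        self.lookups += 1
        job = self.jobs.get(self.bound_fb)
        if job is None:
            job = {"fb": self.bound_fb}
            self.jobs[self.bound_fb] = job
        self.job = job        # without this assignment the cache never hits
        return job


ctx = Context()
ctx.set_framebuffer_state("fb0")
first = ctx.get_job_for_fbo()
second = ctx.get_job_for_fbo()
```

Here the second call is served from the cache, so only one lookup is counted until a different framebuffer is bound.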
Bas Nieuwenhuizen	72e7b7a00b	ac/nir,radv: Optimize bounds check for 64 bit CAS. When the application does not ask for robust buffer access. Only implemented the check in radv. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-08-02 21:21:55 +02:00
Roland Scheidegger	74baeacafc	gallivm: fix issue with AtomicCmpXchg wrapper on llvm 3.5-3.8 These versions still need wrapper but already have both success and failure ordering. (Compile tested on llvm 3.3, 3.7, 3.8.) v2: don't duplicate whole function (suggested by Brian). Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111102 Reviewed-by: Charmaine Lee <charmainel@vmware.com> Reviewed-by: Brian Paul <brianp@vmware.com>	2019-08-02 20:16:17 +02:00
Matt Turner	dcf9d91a80	util: Handle differences in pthread_setname_np There are a lot of unfortunate differences in the implementation of this function. NetBSD and Mac OS X in particular require different arguments. https://stackoverflow.com/questions/2369738/how-to-set-the-name-of-a-thread-in-linux-pthreads/7989973#7989973 provides a good overview of the differences. Fixes: `9c411e020d` ("util: Drop preprocessor guards for glibc-2.12") Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111264 Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> [Eric: use DETECT_OS_* instead of PIPE_OS_*] Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-08-02 18:38:52 +01:00
Eric Engestrom	55eadf971a	util/os_time: use detect_os.h to uncouple from gallium Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-08-02 18:38:52 +01:00
Eric Engestrom	bffa23313a	util/u_debug: use detect_os.h Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-08-02 18:38:52 +01:00
Eric Engestrom	7f12a66ad5	util/os_misc: use detect_os.h to start uncoupling from gallium Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-08-02 18:38:52 +01:00
Eric Engestrom	87adc898b3	util/os_memory: use detect_os.h to uncouple it from gallium While at it, remove p_compiler.h as well as it is unused. Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-08-02 18:38:52 +01:00
Eric Engestrom	9a5148190a	gallium: deduplicate os detection logic by using detect_os.h This allows us to avoid having to rename all the PIPE_OS_* at once while still making sure PIPE_OS_* and DETECT_OS_* are always in sync. Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-08-02 18:38:52 +01:00
Eric Engestrom	8c52bca112	gallium/utils: drop PIPE_SUBSYSTEM_WINDOWS_USER This is basically just an alias for PIPE_OS_WINDOWS. Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-08-02 18:38:52 +01:00
Eric Engestrom	e740e7a6f0	scons: rename PIPE_SUBSYSTEM_EMBEDDED to EMBEDDED_DEVICE It has nothing to do with the PIPE_SUBSYSTEM_* stuff from gallium. Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-08-02 18:38:52 +01:00
Eric Engestrom	8c63348c94	gallium: remove never-used PIPE_SUBSYSTEM_DRI PIPE_SUBSYSTEM_DRI was introduced in `dacfef1589` ("gallium: New configuration header.") 11 years ago, and was never used. Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-08-02 18:38:52 +01:00
Eric Engestrom	bfb70032d4	util: fix typo in comment Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-08-02 18:38:52 +01:00
Eric Engestrom	362e9d8682	util: introduce detect_os.h Mostly copied from src/gallium/include/pipe/p_config.h, so I kept its copyright and authorship. Other than the obvious rename, the big difference is that these are always defined, to be used as `#if DETECT_OS_LINUX`. Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-08-02 18:38:52 +01:00
Rob Clark	9d5beab441	freedreno/batch: fix dependency loop detection We can have a scenario like: A -> B A -> C -> B When adding the A->C dependency, it doesn't really matter that C depends on something that A depends on, that isn't a necessary condition for a dependency loop. Instead what we want to know is that nothing C depends on, directly or indirectly, depends on A. We can detect this by recursively OR'ing the dependents_mask of C and all its dependencies. Signed-off-by: Rob Clark <robdclark@chromium.org>	2019-08-02 10:24:14 -07:00
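A minimal sketch of the check described in the commit above, in Python rather than the driver's C. It assumes each batch has a small index and that `dependents_mask` holds one bit per batch it directly depends on; the `Batch` class, `add_dep` and `recursive_dependents_mask` are illustrative names, not the freedreno implementation.

```python
class Batch:
    def __init__(self, idx):
        self.idx = idx
        self.deps = []              # batches this batch directly depends on
        self.dependents_mask = 0    # assumed: one bit per direct dependency


def recursive_dependents_mask(batch):
    """OR together the masks of `batch` and everything it depends on."""
    mask = batch.dependents_mask
    for dep in batch.deps:
        mask |= recursive_dependents_mask(dep)
    return mask


def add_dep(batch, dep):
    """Record that `batch` depends on `dep`, rejecting dependency loops."""
    if dep is batch or recursive_dependents_mask(dep) & (1 << batch.idx):
        raise ValueError("dependency loop")
    if not batch.dependents_mask & (1 << dep.idx):
        batch.deps.append(dep)
        batch.dependents_mask |= 1 << dep.idx


# The scenario from the commit message: A -> B and A -> C -> B is not a loop.
a, b, c = Batch(0), Batch(1), Batch(2)
add_dep(a, b)
add_dep(c, b)
add_dep(a, c)   # allowed: nothing C depends on depends on A
```

Adding B -> A afterwards would be rejected, since A's recursive mask already contains B.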
Rob Clark	e1790c532a	freedreno/a6xx: add missing flush/invalidates for blit Various things we were missing for multiple blits in a single batch. Signed-off-by: Rob Clark <robdclark@chromium.org>	2019-08-02 10:24:14 -07:00
Rob Clark	d8379da19e	freedreno/a6xx: skip tiles with no geometry If no clear, and no geometry according to VSC_STATE[pipe] we can skip the tile entirely. If there is a fast-clear, we can't skip restore (clear) or resolve IBs, but we can still skip draw IB. Signed-off-by: Rob Clark <robdclark@chromium.org> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-08-02 10:24:14 -07:00
Rob Clark	de3e130fc9	freedreno/a6xx: VSC overflow detection/handling Check VSC_SIZE/VSC_SIZE2 regs from cmdstream to detect overflow, and skip use of VSC visibility stream when overflow is detected, to avoid GPU hangs. This is done w/ introduction of some CP_REG_TEST/ CP_COND_REG_EXEC packet pairs. In addition, eventually (after a frame or two) detect the condition and resize the VSC buffers until overflow no longer happens. Note that this significantly reduces the initial size of the VSC buffers, backing out a previous hack to make them 16x larger than what should be typically required (the previous "solution" for VSC overflow). Signed-off-by: Rob Clark <robdclark@chromium.org> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-08-02 10:24:14 -07:00
Rob Clark	401f532bea	freedreno/a6xx: remove USE/IGNORE_VISIBILITY draw patching Seems this isn't needed anymore on a6xx to control whether visibility stream is used. And it would be hard to deal with if it was, for disabling use of VSC stream in draw pass. So just remove it and simplify things. Signed-off-by: Rob Clark <robdclark@chromium.org> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-08-02 10:24:14 -07:00
Rob Clark	146d6e6463	freedreno/a6xx: cleanup "blit_mem" Rename to "control_mem", and switch to using a struct to manage the layout, rather than just ad-hoc hard-coded offsets. For recovering from VSC stream overflow, we'll need to add more, but best to clean it up first. Signed-off-by: Rob Clark <robdclark@chromium.org> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-08-02 10:24:14 -07:00
Rob Clark	1cbb7f7601	freedreno: refresh tile debug Fix some #ifdef'd bitrot, and get rid of #ifdef so it doesn't bitrot again. And add prints for per-tile state. Signed-off-by: Rob Clark <robdclark@chromium.org> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-08-02 10:24:14 -07:00
Rob Clark	44f3c1cf01	freedreno: update registers Pull in some updates of VSC regs Signed-off-by: Rob Clark <robdclark@chromium.org> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-08-02 10:24:14 -07:00
Rob Clark	c179ded9cb	freedreno/gmem: small cleanup Signed-off-by: Rob Clark <robdclark@chromium.org> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-08-02 10:24:14 -07:00
Rob Clark	e2bb3e84ab	freedreno/drm: convert ring_pool to child_pool Worth another couple percent at driver2 Signed-off-by: Rob Clark <robdclark@chromium.org>	2019-08-02 10:24:14 -07:00
Rob Clark	9ac23794c9	freedreno/drm: remove idx_lock Since it ends up contended, it is a bit of a bottleneck for workloads with high driver overhead. Worth nearly +10% at gfxbench driver2. Signed-off-by: Rob Clark <robdclark@chromium.org>	2019-08-02 10:24:14 -07:00
Rob Clark	e439f63467	freedreno/batch: always update last_fence Not all flush paths come thru fd_context_flush(), so we should also set last_fence in the batch flush path. This avoids some no-op flushes just to get a fence. For example when pctx->flush_resource() triggers a flush. We should probably keep the last_fence update in fd_context_flush() as well to handle deferred flush case. Signed-off-by: Rob Clark <robdclark@chromium.org> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-08-02 10:24:14 -07:00
Rob Clark	c93eae7f10	freedreno: drop unused fd_fence_ref param The pscreen param was just there to satisfy pipe_screen::fence_reference. But some of the internal uses passed NULL for screen, which is a bit ugly. Instead drop the param and add a shim function to plug into the screen. Signed-off-by: Rob Clark <robdclark@chromium.org> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-08-02 10:24:14 -07:00
Alyssa Rosenzweig	1637a53890	pan/midgard: Print invert modifier Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-02 09:57:15 -07:00
Alyssa Rosenzweig	62a5ee3bb4	pan/midgard: Flip conditionals We would like to flip ops to have a constant in the second place to enable inlining of the constant. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-02 09:57:15 -07:00
Alyssa Rosenzweig	d066ca3575	pan/midgard: Add bitwise src/invert fusing De Morgan's Laws and some special ops basically. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-02 09:57:15 -07:00
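The identities behind this kind of fusing are De Morgan's laws: inverting both sources of a bitwise op is equivalent to inverting the result of the dual op. A quick check on 32-bit values, as a sketch; the helper names mirror the op naming style used in these commits (`inand` is mentioned in the log, `inor` is assumed by analogy) and this is not the compiler pass itself.

```python
MASK32 = 0xFFFFFFFF


def inot(a):
    return ~a & MASK32


def iand(a, b):
    return a & b & MASK32


def ior(a, b):
    return (a | b) & MASK32


def inand(a, b):
    """Inverted AND: not (a and b)."""
    return inot(iand(a, b))


def inor(a, b):
    """Inverted OR: not (a or b); name assumed by analogy with inand."""
    return inot(ior(a, b))


a, b = 0x0F0F1234, 0xFF00ABCD
# (~a & ~b) == ~(a | b), and (~a | ~b) == ~(a & b)
and_of_inverted = iand(inot(a), inot(b))
or_of_inverted = ior(inot(a), inot(b))
```

So an `iand` whose sources are both inverted can be rewritten as a single inverted OR with the original sources, and likewise for `ior`.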
Alyssa Rosenzweig	620c2717cf	pan/midgard: Add .not propagation pass Essentially .pos propagation but for bitwise. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-02 09:57:15 -07:00
Alyssa Rosenzweig	b821e1b85e	pan/midgard: Fuse invert into bitwise ops We use the new invert flag to produce ops like inand. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-02 09:57:15 -07:00
Jonathan Marek	d8584c5cf2	freedreno: a2xx: implement texture tiling Signed-off-by: Jonathan Marek <jonathan@marek.ca> Reviewed-by: Rob Clark <robdclark@chromium.org>	2019-08-02 15:58:22 +00:00
Jonathan Marek	fb5c3db0ab	freedreno: a2xx: use nir_lower_alu_to_scalar instead of lowering pass nir_lower_alu_to_scalar can now be used to only lower certain ops, so we don't need the custom pass. And we can lower fall_equal/fany_nequal with lower_vector_cmp instead. Signed-off-by: Jonathan Marek <jonathan@marek.ca> Reviewed-by: Rob Clark <robdclark@chromium.org>	2019-08-02 15:58:22 +00:00
Jonathan Marek	e652ca4e0b	freedreno: a2xx: fix HW binning for batches with >256K vertices Signed-off-by: Jonathan Marek <jonathan@marek.ca> Reviewed-by: Rob Clark <robdclark@chromium.org>	2019-08-02 15:58:22 +00:00
Jonathan Marek	257957b026	freedreno: a2xx: fix fneg/fabs/fsat opcodes Previously we would get a fmov with modifiers, but now that mov has no type these opcodes need to be supported. Signed-off-by: Jonathan Marek <jonathan@marek.ca> Reviewed-by: Rob Clark <robdclark@chromium.org>	2019-08-02 15:58:22 +00:00
Jonathan Marek	43dbd7d603	freedreno: a2xx: fix order of NIR opts int_to_float needs to come after bool_to_float, and lower_to_source_mods needs to come after both, since they don't deal with source mods. Signed-off-by: Jonathan Marek <jonathan@marek.ca> Reviewed-by: Rob Clark <robdclark@chromium.org>	2019-08-02 15:58:22 +00:00
Jonathan Marek	57e980a4fb	freedreno: a2xx: fix non-etc1 cubemaps Not sure how this happened, but apparently all cubemaps need swapped XY. Signed-off-by: Jonathan Marek <jonathan@marek.ca> Reviewed-by: Rob Clark <robdclark@chromium.org>	2019-08-02 15:58:22 +00:00
Jonathan Marek	2e029acbe2	freedreno: a2xx: fix fast clear not being used for Z24X8 buffers Signed-off-by: Jonathan Marek <jonathan@marek.ca> Reviewed-by: Rob Clark <robdclark@chromium.org>	2019-08-02 15:58:22 +00:00
Jonathan Marek	e25388c97b	freedreno: align renderonly scanout buffers Signed-off-by: Jonathan Marek <jonathan@marek.ca> Reviewed-by: Rob Clark <robdclark@chromium.org>	2019-08-02 15:58:22 +00:00
Eric Engestrom	6125c93e00	gitlab-ci: just build all the tools This line was mistakenly added while there is already a `-D tools=all` a few lines below. Fixes: `f60defa72d` ("gitlab-ci: Add a shader-db run using v3d on drm-shim.") Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-08-02 16:41:19 +01:00
Sergii Romantsov	a86eccfb78	i965/clear: clear_value better precision Test-case with depth-clear 0.5 and format MESA_FORMAT_Z24_UNORM_X8_UINT fails due to inconsistent clear-value of 0.4999997. Maybe it's better to improve? CC: Jason Ekstrand <jason.ekstrand@intel.com> Fixes: `0ae9ce0f29` (i965/clear: Quantize the depth clear value based on the format) Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111113 Signed-off-by: Sergii Romantsov <sergii.romantsov@globallogic.com> Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-08-02 14:25:34 +00:00
Samuel Pitoiset	e8110e51c6	radv: fix image_has_{cmask,fmask}() helpers The driver should now rely on cmask_offset because CMASK can be disabled by the driver for some reasons (eg. mipmaps). Apply the same change for FMASK, although it should be useless. Fixes: `ad1bc8621d` ("radv: remove radv_get_image_fmask_info()") Fixes: `10d08da52c` ("radv/gfx10: add missing dcc_tile_swizzle tweak") Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-08-02 14:00:50 +02:00
Samuel Pitoiset	ad1bc8621d	radv: remove radv_get_image_fmask_info() It's unnecessary to duplicate fields in another struct. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-08-02 13:34:46 +02:00
Samuel Pitoiset	10d08da52c	radv/gfx10: add missing dcc_tile_swizzle tweak Fixes: `c90f46700d` ("radv/gfx10: mask DCC tile swizzle by alignment") Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-08-02 13:34:43 +02:00
Samuel Pitoiset	9c9745e8dd	radv: remove radv_get_image_cmask_info() It's unnecessary to duplicate fields in another struct. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-08-02 13:34:41 +02:00
Samuel Pitoiset	856487a280	radv: only account for tile_swizzle for color surfaces with DCC It's 0 for depth surfaces with TC compat HTILE enabled. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-08-02 13:34:39 +02:00
Bas Nieuwenhuizen	e1c5d8a364	radv: Enable VK_KHR_shader_atomic_int64 Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-08-02 12:26:32 +02:00
Bas Nieuwenhuizen	a17f2206d3	ac/nir: Implement LLVM9 64-bit buffer compare & exchange. LLVM 9 does not have a 64-bit buffer compswap intrinsic, so this extracts the ptr, does a bound check and then uses a cmpxchg LLVM instruction. Not ideal, but the earliest release we're going to get a proper intrinsic is LLVM 10. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-08-02 12:26:11 +02:00
Connor Abbott	73274c9ec2	Revert "ac/nir: handle negate modifier" This reverts commit `bfea7e4d29`.	2019-08-02 11:14:50 +02:00
Connor Abbott	4a382d66ee	Revert "ac/nir: handle abs modifier" This reverts commit `d3c80733cd`. These were only appearing due to memory corruption.	2019-08-02 11:14:08 +02:00
Timothy Arceri	06ec14d692	iris: bump compat profile support to 4.6 All of the current piglit compat profile tests pass. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-08-02 18:56:53 +10:00
Timothy Arceri	74f96b06d6	egl: fix OpenGL 3.1 context creation From the EGL_KHR_create_context spec: "* If OpenGL 3.1 is requested, the context returned may implement any of the following versions: * Version 3.1. The GL_ARB_compatibility extension may or may not be implemented, as determined by the implementation. * The core profile of version 3.2 or greater." Fixes CTS tests: dEQP-EGL.functional.create_context_ext.gl_31.rgb888_depth_stencil dEQP-EGL.functional.create_context_ext.robust_gl_31.rgb888_depth_stencil dEQP-EGL.functional.create_context_ext.gl_31.rgb888_depth_no_stencil dEQP-EGL.functional.create_context_ext.robust_gl_31.rgb888_depth_no_stencil dEQP-EGL.functional.create_context_ext.gl_31.rgba8888_depth_no_stencil dEQP-EGL.functional.create_context_ext.gl_31.rgb888_no_depth_no_stencil dEQP-EGL.functional.create_context_ext.robust_gl_31.rgba8888_depth_no_stencil dEQP-EGL.functional.create_context_ext.robust_gl_31.rgb888_no_depth_no_stencil dEQP-EGL.functional.create_context_ext.gl_31.rgba8888_no_depth_no_stencil dEQP-EGL.functional.create_context_ext.robust_gl_31.rgba8888_no_depth_no_stencil dEQP-EGL.functional.create_context_ext.gl_31.rgba8888_depth_stencil dEQP-EGL.functional.create_context_ext.robust_gl_31.rgba8888_depth_stencil Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-08-02 18:56:53 +10:00
Connor Abbott	f41516bdb5	nir/find_array_copies: Reject copies with mismatched type When we detect a scalar/vector copy through load_deref/store_deref, we have to be careful since those can bitcast an int to a float and vice-versa even though copy_deref can't. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111251 Fixes: `156306e5e6` ("nir/find_array_copies: Handle wildcards and overlapping copies") Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-02 10:34:29 +02:00
Samuel Pitoiset	7368000868	radv: re-apply "Optimize rebinding the same descriptor set." This makes it cheaper to just change the dynamic offsets with the same descriptor sets. This optimization has been reverted a while back because of random GPU hangs on GFX9, now it looks fine, at least CTS no longer hangs on GFX9 and it doesn't hang on GFX10 as well. It fixes a performance problem with Wolfenstein Youngblood. Suggested-by: Philip Rebohle <philip.rebohle@tu-dortmund.de>	2019-08-02 09:56:55 +02:00
Samuel Pitoiset	96a5445559	radv/gfx10: use the correct target machine for Wave32 Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-08-02 09:37:38 +02:00
Samuel Pitoiset	8a86908e9a	radv/gfx10: add Wave32 support for vertex, tessellation and geometry shaders It can be enabled with RADV_PERFTEST=gewave32. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-08-02 09:37:36 +02:00
Samuel Pitoiset	953bbacc23	radv/gfx10: add Wave32 support for fragment shaders It can be enabled with RADV_PERFTEST=pswave32. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-08-02 09:37:34 +02:00
Kenneth Graunke	18c2e09dc7	gallium: Implement GL_EXT_shader_samples_identical via a new capability This exposes the textureSamplesIdenticalEXT function in GLSL. We enable it for iris and radeonsi, because their compilers already have support for this. Tested on Intel Kabylake and AMD Vega 64. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-08-01 23:38:54 -07:00
Kenneth Graunke	adcc0a8fdc	intel/tools: Fix aubinator_viewer build. This function was recently renamed and not all callers were updated. Fixes: `086c486a75` ("intel/device: rename gen_get_device_info")	2019-08-01 23:36:41 -07:00
Francisco Jerez	54fbc625ea	intel/ir: Fix CFG corruption in opt_predicated_break(). Specifically the optimization of a conditional BREAK + WHILE sequence into a conditional WHILE seems pretty broken. The list of successors of "earlier_block" (where the conditional BREAK was found) is emptied and then re-created with the same edges for no apparent reason. On top of that the list of predecessors of the block immediately after the WHILE loop is emptied, but only one of the original edges will be added back, which means that potentially several blocks that still have it on their list of successors won't be on its list of predecessors anymore, causing all sorts of hilarity due to the inconsistency in the control flow graph. The solution is to remove the code that's removing valid edges from the CFG. cfg_t::remove_block() will already clean up after itself. The assert in bblock_t::combine_with() also needs to be removed since we will be merging a block with multiple children into the first one of them. Found the issue on a hardware enabling branch originally, but apparently somebody reproduced the same problem independently on master in the meantime. Fixes: `d13bcdb3a9` ("i965/fs: Extend predicated break pass to predicate WHILE.") Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111009 Cc: jiradet.jd@gmail.com Cc: Sergii Romantsov <sergii.romantsov@globallogic.com> Cc: Matt Turner <mattst88@gmail.com> Cc: mesa-stable@lists.freedesktop.org Tested-by: Paul Chelombitko <qamonstergl@gmail.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-08-01 16:56:48 -07:00
Mark Janes	ddb59cd20e	intel/device: make internal functions private The device info initializer makes several functions internal: - handling of device override - updating topology from kernel information The implementation file is slightly reordered due to the renamed functions being static. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-08-01 16:40:03 -07:00
Mark Janes	086c486a75	intel/device: rename gen_get_device_info Rename the original device info initialization routine so callers don't mistakenly call the wrong one: gen_get_device_info_from_fd: Queries kernel for full device info, including topology details. gen_get_device_info_from_pci_id: Partially initializes device info based on PCI ID lookup, when the kernel is not available. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-08-01 16:39:56 -07:00
Mark Janes	d594d2a052	intel/tools: use device info initializer Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-08-01 16:39:54 -07:00
Mark Janes	e4a0070db4	anv: use initialization routine for gen_device_info Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-08-01 16:39:51 -07:00
Mark Janes	49465f1330	iris/screen: use initialization routine for gen_device_info Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-08-01 16:39:48 -07:00
Mark Janes	96e1c945f2	i965: Move device info initialization to common code With perf queries, initializing the device info is much more complex than just getting a PCI ID and calling gen_get_device_info. This commit adds a new gen_get_device_info_from_fd helper in common code which does all of the requisite kernel queries to get device info including all of the topology information. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-08-01 16:39:44 -07:00
Mark Janes	1186f6ea69	i965/perf: verify kernel support before registering OA metrics When gen_device_info updates the topology in its initializer, the kernel queries will fail silently. Iris and anv have minimum kernel requirements that support the queries. i965 must verify kernel support before reporting OA metrics. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-08-01 16:39:41 -07:00
Mark Janes	7852fe5415	intel/common: provide common ioctl routine i965 links against libdrm for drmIoctl, but anv and iris both re-implement this routine to avoid the dependency. intel/dev also needs an ioctl wrapper, so let's share the same implementation everywhere. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-08-01 16:38:40 -07:00
Alyssa Rosenzweig	b40ba2db6c	panfrost: Remove unused argument A relic from when we didn't have an online compiler, hah. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-01 16:23:03 -07:00
Alyssa Rosenzweig	ff345d4a01	panfrost: Handle MESA_SHADER_COMPUTE in compile callback Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-01 16:23:03 -07:00
Alyssa Rosenzweig	73c40d6bbb	pan/midgard: Use standard list traversal to find initial tag Fixes a hang (and abort) on empty shaders, which you shouldn't have anyway but better safe than sorry. DCE going on the fritz is no reason to freeze the system. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-01 16:23:03 -07:00
Alyssa Rosenzweig	4647999327	panfrost: Use gl_shader_stage directly for compiles No need to add a third set of enums to the mix. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-01 16:23:03 -07:00
Alyssa Rosenzweig	d9eb65c60c	panfrost: Emit "draw" info for compute jobs Important fields relating to shader state and UBOs are filled out from this (misnomer) function. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-01 16:23:03 -07:00
Alyssa Rosenzweig	22a8f6de61	panfrost: Feed compute shaders into the compiler The path for compute shader compiles resembles the graphic shader compile path, although it is substantially simpler as we don't need any shader keying. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-01 16:23:03 -07:00
Alyssa Rosenzweig	1b284628ef	panfrost: Expose compute shaders as panfrost_shader_variants Whether variants are packed by graphics or compute is irrelevant. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-01 16:23:03 -07:00
Alyssa Rosenzweig	8b53230d47	panfrost: Remove shader state *base It is now unused. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-01 16:23:03 -07:00
Alyssa Rosenzweig	c228046b4b	panfrost: Remove CSO dependency from shader_compile We want this routine to be generic across graphics and compute, so let the caller deal with the typing. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-01 16:23:03 -07:00
Alyssa Rosenzweig	428bed3bde	panfrost: Generalize UBO upload for other shader stages Now that everything is unified, this generalization is nice and easy. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-01 16:23:03 -07:00
Alyssa Rosenzweig	a34370e855	panfrost: Guard vertex upload by ctx->vertex != NULL This is irrelevant for graphics but matters for compute workloads. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-01 16:23:03 -07:00
Alyssa Rosenzweig	3bfdb878aa	panfrost: Generalize vertex shader upload This allows us to reuse the same code path for compute. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-01 16:23:03 -07:00
Alyssa Rosenzweig	3b7224190e	panfrost: Share gl_enables between VERTEX/COMPUTE Catch-all for magic bits. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-01 16:23:03 -07:00
Alyssa Rosenzweig	871c02b12e	panfrost: Invoke compute shader according to grid info We already have helpers for packing invocations (due to its role in instanced vertex shaders), so we can reuse this drop in for compute shaders. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-01 16:23:03 -07:00
Alyssa Rosenzweig	748ccbc808	panfrost: Explain and include compute FBD Squint at it hard enough and you realize it's the beginning of an SFBD... I guess... A compute shader with register spilling would be able to confirm this, but we would expect to see the first field | 1 and an address splattered later, setting up TLS. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-01 16:23:03 -07:00
Alyssa Rosenzweig	3113be3127	panfrost: Unify-driven cleanup Again, now that stages are unified some logic goes away. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-01 16:23:03 -07:00
Alyssa Rosenzweig	ac6aa93f9e	panfrost: Unify ctx->vs and ctx->fs It's a little verbose, but this way we can support other shader stages without too much contortion. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-01 16:23:03 -07:00
Alyssa Rosenzweig	4b93152c29	panfrost: Flesh out launch_grid stub It's still incomplete, but we're able to hook into launch_grid to create a stub COMPUTE job. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-01 16:23:02 -07:00
Alyssa Rosenzweig	cd1be4605c	panfrost: Cleanup via payload unification Since these are now indexable, quite a bit of code cleans up. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-01 16:23:02 -07:00
Alyssa Rosenzweig	0da52015a1	panfrost: Unify payload_vertex/payload_tiler Rather than disparate variables, let's use an array of payloads indexed by the shader stage. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-01 16:23:02 -07:00
Alyssa Rosenzweig	902115f94f	panfrost: Only wallpaper if we drew something last_tiler.gpu may be NULL at flush time despite no clear and existing jobs -- if we executed a compute-only workload. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-01 16:23:02 -07:00
Alyssa Rosenzweig	2d86828243	panfrost: Adjust shader CAPs to expose dEQP compute Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-01 16:23:01 -07:00
Alyssa Rosenzweig	39fe9f5e2f	panfrost: Expose NIR as our PIPE_SHADER_CAP_SUPPORTED_IRS We could expose TGSI as well -- we pipe it through tgsi_to_nir for Gallium-internal shaders anyway -- but we'd rather not. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-01 16:23:01 -07:00
Alyssa Rosenzweig	1697760e05	panfrost: Copy freedreno's panfrost_get_compute_param Values reported here aren't remotely correct, but it's a start to just get the entrypoint stubbed out. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-01 16:23:01 -07:00
Alyssa Rosenzweig	c8bc664447	panfrost: Expose COMPUTE-related caps for GLES3.1 Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-01 16:23:01 -07:00
Alyssa Rosenzweig	5a8b83ca0b	panfrost: Stub out launch_grid Just dumps some information about the invocation for later debug. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-01 16:23:01 -07:00
Alyssa Rosenzweig	a8fc40aaf5	panfrost: Stub out compute CSO Doesn't do anything, just gets the functions there. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-01 16:23:01 -07:00
Alyssa Rosenzweig	e913986868	panfrost: Implement gl_FrontFacing Interestingly, this requires no compiler changes. It's just exposed as a special varying. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-01 16:15:03 -07:00
Alyssa Rosenzweig	f3e15122d4	panfrost: Add support for decoding gl_FrontFacing Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-01 16:15:03 -07:00
Alyssa Rosenzweig	9e66ff3ea9	pan/decode: Use max varying index as varying buffer count This allows us to decode asymmetric varyings correctly, which occurs with e.g. gl_FrontFacing. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-08-01 16:15:03 -07:00
Timothy Arceri	2afedfaf9a	iris: add support for gl_ClipVertex in tess eval shaders Required for OpenGL compat support. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-08-01 16:12:37 -07:00
Timothy Arceri	00b5bf2d72	iris: add support for gl_ClipVertex in geometry shaders This will enable us to support the OpenGL compat profile. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-08-01 16:12:27 -07:00
Jason Ekstrand	70dc017aec	nir: Stop whacking gl_FrontFacing to a system value We have a cap bit for gallium and a GLSL compiler flag to control this. Just trust what GLSL gives us and stop forcing it. In order for this to be safe, we have to advertise another cap in some of the gallium drivers. Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-08-01 21:59:37 +00:00
Alyssa Rosenzweig	4e736b88f3	panfrost: Implement panfrost_set_shader_buffers callback Just copy over the passed SSBO for now. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-08-01 14:32:08 -07:00
Alyssa Rosenzweig	898a18ea89	gallium/util: Add util_set_shader_buffers_mask helper Conceptually follows util_set_vertex_buffers_mask but for SSBOs. v2: Fix missing ~ when clearing mask. Adjust mask behaviour to match freedreno/v3d when buffer == NULL. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-08-01 14:31:56 -07:00
Jonathan Marek	3e33173200	kmsro: move entry points from etnaviv to kmsro These drivers are kmsro drivers so they should be part of the kmsro #if This fixes missing imx_drm driver when building with only freedreno+kmsro Signed-off-by: Jonathan Marek <jonathan@marek.ca> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-08-01 16:31:51 -04:00
Emil Velikov	85dace1c0b	gitlab-ci: remove software-properties-common Currently we use the python package to manage repositories. At the same time we also do that by hand - since it's a trivial echo to a file. Stay consistent, remove the package and manage things manually. Acked-by: Eric Engestrom <eric@engestrom.ch> Reviewed-by: Michel Dänzer <michel.daenzer@amd.com> Signed-off-by: Emil Velikov <emil.velikov@collabora.com>	2019-08-01 16:16:15 +00:00
Brian Paul	3307c85a7d	st/mesa: fix MSVC compile breakage Trivial.	2019-08-01 09:07:21 -06:00
Gert Wollny	9de00e74fe	virgl: Enable depth_clamp by lowering if the host is new enough. Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-08-01 05:58:53 +00:00
Gert Wollny	b2e92c45ce	gallium: Make PIPE_CAP_DEPTH_CLIP_DISABLE a tri-state value and use it Use value "2" to signal that lowering is needed and supported and enable it accordingly. v2: - Note in CAP description that this lowering currently requires TGSI - use "true" instead of GL_TRUE (both Erik) Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-08-01 05:58:53 +00:00
Gert Wollny	616f320745	mesa/st: Signal state changes when depth_clamp is emulated v1 implemented by Erik Faye-Lund <erik.faye-lund@collabora.com> v2: - Add GS and TES - fix constants state update flags (Erik) v3: don't update rasterizer when depth_clamp is lowered (Erik) v4: Correct NewDepthClamp and also set flags for NewClipControl (Erik) v5: Also set shader_has_one_variant property according to possible depth_clamp lowering (Marek) Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-08-01 05:58:53 +00:00
Gert Wollny	d004fcc04a	mesa/st: Add depth clamping to rasterizer code implemented by Erik Faye-Lund <erik.faye-lund@collabora.com> v2: Use current depth range values for clamping (Erik) v3: fix scons-win64 build Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-08-01 05:58:53 +00:00
Gert Wollny	57361d89fa	mesa/st: Tie depth_clamp code into other shaders (GS and TES) v2: Use file scope defined depth_range_state in common v3: - don't use the one_shader_variant property, as this is not correct (Marek) - also use tests on available shader stages to enable depth_clamp lowering v4: Don't use key.st, use st directly (Marek) Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-08-01 05:58:53 +00:00
Gert Wollny	d81ba38b02	mesa/st: Tie depth_clamp lowering into the FS v1 implemented by Erik Faye-Lund <erik.faye-lund@collabora.com> v2: Use different call for FS v3: Use file scope defined depth_range_state Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-08-01 05:58:53 +00:00
Gert Wollny	fefb152067	mesa/st: Tie depth clamp lowering in to the VP code v1: implemented by Erik Faye-Lund <erik.faye-lund@collabora.com> v2: Add handling of the ARB_clip_control depth mode v3: Move depth_range_state to file scope and remove trailing zeros (Erik) v4: - don't use the one_shader_variant property, as this is not correct (Marek) - also use tests on available shader stages to enable depth_clamp lowering v5: Don't use key.st, use st directly (Marek) Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-08-01 05:58:53 +00:00
Erik Faye-Lund	b048d8bf8f	mesa/st: add tgsi-lowering code for depth-clamp This is a TGSI pass that lowers depth-clamping into shader-operations, by replacing the depth-value with 0 (a z-coordinate of zero will always pass the OpenGL depth test conditions), and using a dedicated varying to interpolate the real depth-value instead. Finally we replace the depth-output in the fragment shader. v1 implemented by Erik Faye-Lund <erik.faye-lund@collabora.com> v2: Add support for handling depth clip mode, and refactor code v3: - Rename _vs functions to _last_vertex_stage (Erik) - Use 0.0 depth to avoid clipping (Erik) v4: Fix inversion of bool value for clip control property Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-08-01 05:58:53 +00:00
Gert Wollny	78ba12f40f	mesa/st: replace boolean declarations by bool Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-08-01 05:58:53 +00:00
Gert Wollny	7fb47195d8	Revert "softpipe: Don't draw when rasterizer_discard is set" This was too aggressive and breaks TF (Ilia) This reverts commit `4ee638cd78`. Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2019-08-01 05:57:41 +00:00
Eric Engestrom	a563bb9e28	docs: reword meson instructions Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-08-01 00:42:02 +01:00
Eric Engestrom	8a1e803643	travis: drop unnecessary Meson option for MacOS Those are already their default values on MacOS. Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>	2019-08-01 00:25:20 +01:00
Jason Ekstrand	b539157504	intel/vec4: Drop all of the 64-bit varying code Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-07-31 18:14:09 -05:00
Jason Ekstrand	d03ec807a4	intel/fs: Drop all of the 64-bit varying code Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-07-31 18:14:09 -05:00
Jason Ekstrand	942c759059	intel: Use NIR to lower 64-bit varying access Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-07-31 18:14:09 -05:00
Jason Ekstrand	078dcb7ccd	nir/lower_io: Add an option to lower 64-bit varyings Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-07-31 18:14:09 -05:00
Jorge Natz	a63e82deb5	docs: Update Platforms and Drivers page with more comprehensive information. Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-07-31 22:50:43 +00:00
Dave Airlie	7ad6ec80d9	nir: use common deref has indirect code in scratch lowering. This doesn't seem to need its own copy here. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-08-01 08:32:12 +10:00
Eric Engestrom	5d7bcac4e7	nir: remove explicit nir_intrinsic_index_flag values These were left after a rebase and happen to make NIR_INTRINSIC_SWIZZLE_MASK == NIR_INTRINSIC_SRC_ACCESS, which is how it was noticed. Fixes: `6f20643b47` ("nir: Allow qualifiers on copy_deref and image instructions") Cc: Connor Abbott <cwabbott0@gmail.com> Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-31 23:28:20 +01:00
Yevhenii Kolesnikov	830a8e6c47	state_tracker: Free Labels for query and transform_feedback Memory leaks were observed on iris with GL_KHR_debug. Signed-off-by: Yevhenii Kolesnikov <yevhenii.kolesnikov@globallogic.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-07-31 22:16:42 +00:00
Kenneth Graunke	b61f17d362	iris: Skip emitting 3DSTATE_INDEX_BUFFER if possible We were emitting 3DSTATE_INDEX_BUFFER on every indexed draw, even if back-to-back draws referred to the same index buffer. This improves drawoverhead scores in the DrawElements cases by about 10%, by giving us even more minimal batches.	2019-07-31 15:14:10 -07:00
Mike Blumenkrantz	8af1990ad7	st/dri: simplify dri_get_egl_image by reusing dri2_format_table this makes dri2_get_mapping_by_fourcc accessible from dri_helpers.h and does a direct lookup on the fourcc id to match the pipe format v2 (Ken): Allow map to be NULL, use img->texture->format. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-31 15:11:15 -07:00
Erico Nunes	82bf5a8aac	lima: enable lower_bitops in ppir The mali pp doesn't support integers and some nir_algebraic optimizations may result in ops that are not easily lowerable to floats, so disable optimizations resulting in bitops. Signed-off-by: Erico Nunes <nunes.erico@gmail.com> Reviewed-by: Jonathan Marek <jonathan@marek.ca>	2019-07-31 23:06:26 +02:00
Erico Nunes	b3676a6548	nir/algebraic: rename lower_bitshift to lower_bitops Optimizations that insert bitshift or bitwise operations should not be applied on GPUs that don't support integer operations. The .lower_bitshift could be used to control the bitshift related ones, but there was also one bitwise optimization uncovered. Since only lima and freedreno use this option and the use case is that no bit operations are wanted, let's rename it to .lower_bitops and use it to control all bitops related optimizations. Signed-off-by: Erico Nunes <nunes.erico@gmail.com> Reviewed-by: Jonathan Marek <jonathan@marek.ca>	2019-07-31 23:06:04 +02:00
Erico Nunes	99c956fb47	lima/ppir: lower fdot in nir_opt_algebraic Now that we have fsum in nir, we can move fdot lowering there. This helps reduce ppir complexity and enables the lowered ops to be part of other nir optimizations in the optimization loop. Signed-off-by: Erico Nunes <nunes.erico@gmail.com> Reviewed-by: Qiang Yu <yuq825@gmail.com>	2019-07-31 21:35:58 +02:00
Erico Nunes	4a407df682	nir/algebraic: add new fsum ops and fdot lowering The Mali400 pp doesn't implement fdot but has fsum3 and fsum4, which can be used to optimize fdot lowering. fsum2 is not implemented and can be further lowered to an add with the vector components. Currently lima ppir handles this lowering internally, however this happens in a very late stage and requires a big chunk of code compared to a nir_opt_algebraic lowering. By having fsum in nir, we can reduce ppir complexity and enable the lowered ops to be part of other nir optimizations in the optimization loop. Signed-off-by: Erico Nunes <nunes.erico@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-07-31 21:35:58 +02:00
Erico Nunes	7f8ff686b7	lima/ppir: refactor texture code to simplify scheduler The 'varying fetch' pp instruction deals only with coordinates, and 'texture fetch' deals only with the sampler index. Previously it was not possible to clearly map ppir_op_load_coords and ppir_op_load_texture to pp instructions as the source coordinates were kept in the ppir_op_load_texture node, making this harder to maintain. The refactor is made with the attempt to clearly map ppir_op_load_coords to the 'varying fetch' and ppir_op_load_texture to the 'texture fetch'. The coordinates are still temporarily kept in the ppir_op_load_texture node as nir has both sampler and coordinates in a single instruction and it is only possible to output one ppir node during emit. But now after lowering, the sources are transferred to the (always) created ppir_op_load_coords node, and it should be possible to directly map them to their pp instructions from there onwards. Signed-off-by: Erico Nunes <nunes.erico@gmail.com> Reviewed-by: Qiang Yu <yuq825@gmail.com>	2019-07-31 21:22:41 +02:00
Erico Nunes	d2901de09e	lima/ppir: lower texture projection Lower texture projection in ppir using nir_lower_tex. This will insert a mul with the coordinate division before the load varying. Even though the lima pp supports projection in the load varying instruction while loading the coordinates (from a register or a varying), it requires that both the coordinates and projector be components in a single register. nir currently handles them in separate ssa, and attempting to merge them manually may end up in worse code than just doing the coordinate division manually. So for now let's just lower the projection to add support for it in lima. In the future, an optimization pass may be implemented in lima to ensure that both coords and projector come in the same register, then this lowering may be disabled and in this case lima may use the built-in projection and save the mul instruction from lowering. Signed-off-by: Erico Nunes <nunes.erico@gmail.com> Reviewed-by: Qiang Yu <yuq825@gmail.com>	2019-07-31 21:22:41 +02:00
Vinson Lee	412e1b51fe	scons: Fix random_r check. Fixes: `597bddad47` ("scons: Test for random_r()") Signed-off-by: Vinson Lee <vlee@freedesktop.org> Reviewed-by: Eric Engestrom <eric@engestrom.ch> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-07-31 18:23:55 +00:00
Kenneth Graunke	3f9012839e	Revert "st/dri: simplify dri_get_egl_image by reusing dri2_format_table" This reverts commit `c47af8b95f`. It causes dEQP-EGL regressions. (I think there is an easy fix, but we'll have it go through review again.)	2019-07-31 11:06:32 -07:00
Alyssa Rosenzweig	91c4acedaf	pan/midgard: Don't special case inline_constant Another constant source of bugs. Ain't that special. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-31 10:59:19 -07:00
Alyssa Rosenzweig	29416a8599	pan/midgard: De-special-case branching It's not that special. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-31 10:59:18 -07:00
Alyssa Rosenzweig	3e47a1181b	panfrost: Add MALI_SAMP_NORM_COORDS flag Corresponds to the normalized coordinates? flag on images in OpenCL and evidently also shows up in GL, so let's wire it in. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-31 10:56:11 -07:00
Alyssa Rosenzweig	cf6cad3922	panfrost: Simplify filter_mode definition It's just a bit field containing some flags; there's no need for all the macro magic. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-31 10:56:11 -07:00
Alyssa Rosenzweig	160795429d	pan/midgard: Shrink "compute FBD" We still don't know what it is, but from a newer trace we now know it's half the size we thought it was. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-31 10:56:11 -07:00
Alyssa Rosenzweig	194b49ee28	panfrost: Flip texture/sampler fields We had them backwards in both the command stream and the Midgard stack. In OpenGL ES 2.0, they're always the same, but in Vulkan/later-GL/CL they diverge so we can fix this. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-31 10:56:11 -07:00
Alyssa Rosenzweig	a692126c93	panfrost: Add MALI_ATTR_IMAGE value Images are implemented (in part) as special attributes, so include support for decoding this. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-31 10:56:11 -07:00
Mike Blumenkrantz	c47af8b95f	st/dri: simplify dri_get_egl_image by reusing dri2_format_table this makes dri2_get_mapping_by_fourcc accessible from dri_helpers.h and does a direct lookup on the fourcc id to match the pipe format Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-07-31 09:50:06 -07:00
Mike Blumenkrantz	7404833c2e	gallium: add handling for YUV planar surfaces st/dri: this adds a table (similar to the one in i965) which provides mappings for turning various planar formats into multiple sampler views. whereas only NV12 and IYUV were supported, now many more formats are supported here: * P0XX * YUV4XX * YVU4XX * AYUV * XYUV * YUYV * UYVY the table is used directly to handle image creation, simplifying a lot of code and resolving related TODO/FIXME items where workarounds were previously in place to manage NV12 and IYUV formats exclusively st/mesa: the changes here relate to setting up samplers for the planar formats. this requires: * checking for driver support for all the sampler formats * creating the samplers with the corresponding formats and swizzling * running nir_lower_tex with the appropriate options to trigger the lowering for each plane->sampler fixes kwg/mesa#36 Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-07-31 09:50:06 -07:00
Mike Blumenkrantz	338a29b08f	gallium: add AYUV and XYUV formats this only adds the PIPE_FORMAT members, not any direct handling for them Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-07-31 09:50:06 -07:00
Alyssa Rosenzweig	7f75b2b5af	pan/midgard: Simplify discard logic The "branch offset" is, in fact, ignored. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-31 09:39:16 -07:00
Alyssa Rosenzweig	27524d1462	pan/midgard: Add units for more instructions For everything but freduce, we have some sense of what units the instruction takes. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-31 09:39:16 -07:00
Alyssa Rosenzweig	64235b1ecc	pan/midgard: Fix ball/bany opcode table These were seriously messed up beyond all recognition. How we're passing shaders.random.* is a mystery. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-31 09:39:16 -07:00
Alyssa Rosenzweig	13ee87c8b9	pan/midgard: Document branch combination LUT This took way longer to figure out than it should have. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-31 09:39:16 -07:00
Kenneth Graunke	2037478702	st/mesa: Skip scissor rect updates when scissor is entirely disabled. If any scissor rectangles are enabled, then we need to set proper scissor rectangles for all viewports. But if the scissor test is entirely disabled, then we can skip updating any scissor rectangles. Without this step, we were updating the scissor rectangles based on the current framebuffer size. So if an app rendered to a variety of render targets at different sizes, with scissor test disabled each time, we'd still be continually updating the scissor rectangles, even though it's not necessary. In Civilization VI, this drops us from 310-350 set_scissor_state calls per frame to 0, as it doesn't appear to use scissor testing. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-07-31 08:33:50 -07:00
Emil Velikov	72b97ad9b2	egl/drm: ensure the backing gbm is set before using it Currently, if we error out before gbm_dri is set (say due to a different name of the backing GBM implementation, or otherwise) the tear down will trigger a NULL ptr deref and crash out. Move the gbm_dri initialization as early as possible. v2: Drop check in dri2_teardown_drm (Eric) Reported-by: Christian Gmeiner <christian.gmeiner@gmail.com> Cc: Christian Gmeiner <christian.gmeiner@gmail.com> Cc: mesa-stable@lists.freedesktop.org Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-07-31 14:18:12 +01:00
Eric Engestrom	4bf7e7b170	docs: update required meson version Fixes: `f7b6a8d12f` ("meson: bump required version to 0.46") Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-07-31 11:50:39 +01:00
Samuel Pitoiset	c66021069e	radv/gfx10: implement a GE bug workaround Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-31 12:14:29 +02:00
Samuel Pitoiset	9a3fc7b6fa	radv/gfx10: remove an obsolete VGT_REUSE_OFF workaround Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-31 12:14:29 +02:00
Samuel Pitoiset	bb8f25233a	radv/gfx10: disable LATE_ALLOC_GS on Navi14 Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-31 12:14:29 +02:00
Samuel Pitoiset	e041a74588	radv/gfx10: implement a bug workaround for GE_PC_ALLOC Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-31 12:14:29 +02:00
Samuel Pitoiset	0e1724af61	radv/gfx10: implement a bug workaround for NGG -> legacy transitions Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-31 12:14:29 +02:00
Samuel Pitoiset	29cca5f381	radv: skip draw calls with 0-sized index buffers Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-31 12:14:29 +02:00
Eric Engestrom	fed6aa2fec	autotools: delete leftover script wrapper Randomly came across this file, which was likely only used by autotools to pass arguments to the test. Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-31 10:16:30 +01:00
Eric Engestrom	53b98b0185	virgl: make use of local variable Otherwise that variable is only used in an assert() and would need an ASSERTED to avoid the warning. Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-07-31 09:41:05 +01:00
Eric Engestrom	20c89b060f	mesa: add an ASSERTED Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-07-31 09:41:05 +01:00
Eric Engestrom	bbeb507543	compiler/nir: add an ASSERTED Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-07-31 09:41:05 +01:00
Eric Engestrom	7e2fe85a40	intel: add a couple of ASSERTED Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-07-31 09:41:05 +01:00
Eric Engestrom	abc226cf41	tree-wide: replace MAYBE_UNUSED with ASSERTED Suggested-by: Jason Ekstrand <jason@jlekstrand.net> Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-07-31 09:41:05 +01:00
Eric Engestrom	ab9c76769a	r600: replace MAYBE_UNUSED with specific #ifdef Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-07-31 09:41:05 +01:00
Eric Engestrom	745bae40ad	gallium/aux: replace MAYBE_UNUSED with UNUSED MAYBE_UNUSED is going away, so let's replace legitimate uses of it with UNUSED, which the former aliased to so far anyway. Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-07-31 09:41:05 +01:00
Eric Engestrom	513e67d2e4	mesa: replace MAYBE_UNUSED with UNUSED MAYBE_UNUSED is going away, so let's replace legitimate uses of it with UNUSED, which the former aliased to so far anyway. Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-07-31 09:41:05 +01:00
Eric Engestrom	c8a453a770	v3d: replace MAYBE_UNUSED with UNUSED MAYBE_UNUSED is going away, so let's replace legitimate uses of it with UNUSED, which the former aliased to so far anyway. Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-07-31 09:41:05 +01:00
Eric Engestrom	d470f1acce	v3d: drop incorrect MAYBE_UNUSED While at it, use that `screen` variable everywhere. Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-07-31 09:41:05 +01:00
Eric Engestrom	84b8a50540	st/tests: drop incorrect MAYBE_UNUSED Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-07-31 09:41:05 +01:00
Eric Engestrom	aed15fa799	radv: drop incorrect MAYBE_UNUSED `compressed` is clearly always used on the line right after. Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-07-31 09:41:05 +01:00
Eric Engestrom	21196ec927	r600: move variable to proper scope It helps show when it's actually used. Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-07-31 09:41:05 +01:00
Eric Engestrom	5febd4d575	compiler: replace MAYBE_UNUSED with UNUSED MAYBE_UNUSED is going away, so let's replace legitimate uses of it with UNUSED, which the former aliased to so far anyway. Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-07-31 09:41:05 +01:00
Eric Engestrom	bac5760e7b	mesa: drop MAYBE_UNUSED var Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-07-31 09:41:05 +01:00
Eric Engestrom	e1dd6c2575	anv: drop MAYBE_UNUSED var Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-07-31 09:41:05 +01:00
Eric Engestrom	644cca65d3	i965: drop unused MAYBE_UNUSED function Added in `1b85c605a6` but never used. Cc: Jordan Justen <jordan.l.justen@intel.com> Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-07-31 09:41:05 +01:00
Eric Engestrom	7a3fb14609	i965: replace MAYBE_UNUSED with GEN condition Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-07-31 09:41:05 +01:00
Eric Engestrom	eee70e09bf	intel: replace MAYBE_UNUSED with UNUSED MAYBE_UNUSED is going away, so let's replace legitimate uses of it with UNUSED, which the former aliased to so far anyway. Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-07-31 09:41:05 +01:00
Eric Engestrom	e775b938b2	intel: drop incorrect MAYBE_UNUSED All these are actually always used. Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-07-31 09:41:05 +01:00
Eric Engestrom	14be04fb2b	egl: replace MAYBE_UNUSED with UNUSED MAYBE_UNUSED is going away, so let's replace legitimate uses of it with UNUSED, which the former aliased to so far anyway. Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-07-31 09:41:05 +01:00
Samuel Pitoiset	ea38565011	radv/gfx10: add Wave32 support for compute shaders It can be enabled with RADV_PERFTEST=cswave32. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-31 09:35:04 +02:00
Kenneth Graunke	3a22a8bf49	iris: Skip repeated depth buffer disables. Often times, the depth buffer is entirely disabled, but color render targets change. For example, GenerateMipmaps will change the color render target for each miplevel, but there is no depth buffer. In the Civilization VI benchmark, this drops the median number of 3DSTATE_DEPTH_BUFFER etc. packets emitted per frame from 472 to 34.	2019-07-30 19:47:41 -07:00
Marek Olšák	665989d98b	radeonsi: release NIR in the right place to fix crashes	2019-07-30 22:06:23 -04:00
Marek Olšák	9ac7d0a0e2	radeonsi: fix packing of key.mono.u.ps	2019-07-30 22:06:23 -04:00
Marek Olšák	033c39a660	ac/nir: fix incorrect Phis if callbacks use control flow inside control flow	2019-07-30 22:06:23 -04:00
Marek Olšák	d3c80733cd	ac/nir: handle abs modifier	2019-07-30 22:06:23 -04:00
Marek Olšák	efe2d8c5f9	ac: fix a memory leak in the error path of ac_build_type_name_for_intr	2019-07-30 22:06:23 -04:00
Marek Olšák	f6eca14f1b	ac: allow control flow statements in NIR callbacks This fixes a crash when compiling geometry shaders on radeonsi.	2019-07-30 22:06:23 -04:00
Marek Olšák	bfea7e4d29	ac/nir: handle negate modifier	2019-07-30 22:06:23 -04:00
Marek Olšák	33a8eab7a9	radeonsi: don't use lp_build_if for the prim discard compute shader	2019-07-30 22:06:23 -04:00
Marek Olšák	5562b6b067	radeonsi: don't use lp_build_if for the wrapping if block in the VS prolog	2019-07-30 22:06:23 -04:00
Marek Olšák	0ef4c1c04d	radeonsi: don't use lp_build_if for the wrapping if block in merged shaders	2019-07-30 22:06:23 -04:00
Marek Olšák	6ec7d603f5	radeonsi: don't use lp_build_if (in most common places)	2019-07-30 22:06:23 -04:00
Marek Olšák	3406a57ff3	radeonsi: don't use lp_build_alloca	2019-07-30 22:06:23 -04:00
Marek Olšák	9234275320	radeonsi/nir: implement FBFETCH for KHR_blend_equation_advanced	2019-07-30 22:06:23 -04:00
Marek Olšák	925161c84c	radeonsi/nir: set input_interpolate_loc for color inputs Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2019-07-30 22:06:23 -04:00
Marek Olšák	5787bbf90d	radeonsi/nir: set tgsi_shader_info::num_memory_instructions	2019-07-30 22:06:23 -04:00
Marek Olšák	0993dbcbef	radeonsi/nir: accurately set input_usage_mask for doubles (v2) v2: fix doubles Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-07-30 22:06:23 -04:00
Marek Olšák	56e3c70b56	radeonsi/nir: accurately set output_usagemask (v2) v2: fix doubles	2019-07-30 22:06:23 -04:00
Marek Olšák	37527f8a11	radeonsi/nir: accurately set reads_*_outputs for TCS	2019-07-30 22:06:23 -04:00
Marek Olšák	6697e42c3c	radeonsi/nir: clean up gather_intrinsic_load_deref_input_info	2019-07-30 22:06:23 -04:00
Marek Olšák	5f16fdefdf	radeonsi/nir: add an option to convert TGSI to NIR Use at your own risk. Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-07-30 22:06:23 -04:00
Marek Olšák	eb43559bb8	radeonsi/nir: clean up some nir_scan_shader code Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-07-30 22:06:23 -04:00
Marek Olšák	34dc6ed2a5	radeonsi/gfx10: disable DCC image stores Uncompressed image stores are usually faster. Also, the driver didn't set WRITE_COMPRESS_ENABLE, so I don't know what the hw did for image stores.	2019-07-30 22:06:23 -04:00
Marek Olšák	17021efc74	radeonsi: adjust RB+ blend optimization settings based on PAL	2019-07-30 22:06:23 -04:00
Marek Olšák	27ac9a3326	ac/surface: allow linear swizzle mode automatic selection on gfx9 & 10 let addrlib make the decision to get the same result as PAL.	2019-07-30 22:06:23 -04:00
Pierre-Eric Pelloux-Prayer	a0ac0e2653	mesa: add EXT_dsa indexed generic queries Only GetPointerIndexedvEXT needs an implementation, the other functions are aliases of existing functions.	2019-07-30 22:04:26 -04:00
Pierre-Eric Pelloux-Prayer	ef84d93f3d	mesa: add EXT_dsa indexed texture commands functions Added functions: - EnableClientStateIndexedEXT - DisableClientStateIndexedEXT - EnableClientStateiEXT - DisableClientStateiEXT Implemented using the idiom provided by the spec: if (array == TEXTURE_COORD_ARRAY) { int savedClientActiveTexture; GetIntegerv(CLIENT_ACTIVE_TEXTURE, &savedClientActiveTexture); ClientActiveTexture(TEXTURE0+index); XXX(array); ClientActiveTexture(savedClientActiveTexture); } else { // Invalid enum }	2019-07-30 22:04:26 -04:00
Pierre-Eric Pelloux-Prayer	7534c536ca	mesa: add EXT_dsa (Named)Framebuffer functions These functions don't support display lists, as specified: Should the selector-free versions of various OpenGL 3.0 and EXT_framebuffer_object framebuffer object commands not be allowed in display lists [...]? RESOLVED: Yes	2019-07-30 22:04:26 -04:00
Pierre-Eric Pelloux-Prayer	e26c6764f2	mesa: add EXT_dsa NamedBuffer functions	2019-07-30 22:04:26 -04:00
Jason Ekstrand	9265e9d11a	i965/curbe: Look at SYSTEM_VALUE_FRAG_COORD instead of VARYING_SLOT_POS When transitioning gl_FragCoord over to a system value, we missed one instance of VARYING_SLOT_POS in i965. As of this commit, i965 has no references to VARYING_SLOT_POS. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111263 Fixes: `4bb6e6817e` "intel: Use a system value for gl_FragCoord" Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-07-30 19:21:09 -05:00
Jason Ekstrand	8fd2f2c276	intel/fs: Implement quad_swap_horizontal with a swizzle on gen7 This fixes dEQP-VK.subgroups.quad.compute.subgroupquadswaphorizontal_* on all gen7 platforms. Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-07-30 22:38:19 +00:00
Jason Ekstrand	499d760c6e	intel/fs: Use ALIGN16 instructions for all derivatives on gen <= 7 The issue here was discovered by a set of Vulkan CTS tests: dEQP-VK.glsl.derivate..dynamic_ These tests use ballot ops to construct a branch condition that takes the same path for each 2x2 quad but may not be uniform across the whole subgroup. They then test that derivatives work and give the correct value even when executed inside such a branch. Because the derivative isn't executed in uniform control-flow and the values coming into the derivative aren't smooth (or worse, linear), they nicely catch bugs that aren't uncovered by simpler derivative tests. Unfortunately, these tests require Vulkan and the equivalent GL test would require the GL_ARB_shader_ballot extension which requires int64. Because the requirements for these tests are so high, it's not easy to test on older hardware and the bug is only proven to exist on gen7; gen4-6 are a conjecture. Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-07-30 22:38:19 +00:00
Eric Engestrom	bf8b5de6b9	scons+meson: suppress spammy build warning on MacOS Originally introduced in `c7f3657450` ("darwin: Suppress type conversion warnings for GLhandleARB") to fix Bugzilla #66346 [1], this workaround was never ported to Scons or Meson. [1] https://bugs.freedesktop.org/66346 Cc: mesa-stable@lists.freedesktop.org Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2019-07-30 23:21:42 +01:00
Matt Turner	46a3ea06be	i965/fs: Print the scheduler mode. Line wrap some awfully long lines while we are here. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2019-07-30 14:35:43 -07:00
Matt Turner	dabb5d4bee	i965/fs: Add a shader_stats struct. It'll grow further, and we'd like to avoid adding an additional parameter to fs_generator() for each new piece of data. v2 (idr): Rebase on 17 months. Track a visitor instead of a cfg. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-07-30 14:35:43 -07:00
Connor Abbott	11a49f289d	lima/gp: Support exp2 and log2 log2 is tricky because there cannot be a move between complex1 and postlog2. We can't guarantee that scheduling complex1 will succeed when we schedule postlog2, so we try to schedule complex1 and if it fails we back out by rewriting the postlog2 as a move and introducing a new postlog2 so that we can try again later. Signed-off-by: Connor Abbott <cwabbott0@gmail.com> Acked-by: Qiang Yu <yuq825@gmail.com>	2019-07-30 23:01:15 +02:00
Connor Abbott	c2f48d8f32	lima/gpir: Always schedule complex2 and _impl right after complex1 See https://gitlab.freedesktop.org/lima/mesa/issues/94 for the gory details of why this is needed. For _impl this is easy, since it never increases register pressure and it goes in the complex slot hence it never counts against max nodes. It's a bit more challenging for complex2, since it does count against max nodes, so we need to change the reservation logic to reserve an extra slot for complex2 when scheduling complex1. This second part isn't strictly necessary yet, but it will be for exp2. Signed-off-by: Connor Abbott <cwabbott0@gmail.com> Acked-by: Qiang Yu <yuq825@gmail.com>	2019-07-30 23:00:41 +02:00
Bas Nieuwenhuizen	2b53c49d2f	radv: Fix descriptor set allocation failure. Set all the handles to VK_NULL_HANDLE: "If the creation of any of those descriptor sets fails, then the implementation must destroy all successfully created descriptor set objects from this command, set all entries of the pDescriptorSets array to VK_NULL_HANDLE and return the error." (Vulkan 1.1.117 Spec, section 13.2) CC: <mesa-stable@lists.freedesktop.org> Reviewed-by: Dave Airlie <airlied@redhat.com>	2019-07-30 22:33:24 +02:00
Andres Rodriguez	2b71b4e793	radv: fix queries with WAIT_BIT returning VK_NOT_READY When vkGetQueryPoolResults() is called with VK_QUERY_RESULT_WAIT_BIT set, the driver is supposed to wait for the query to become available before returning. Currently, radv returns once the query is indeed ready, but it returns VK_NOT_READY. It also fails to populate the results. The problem is a missing volatile in the secondary check for query availability. This patch removes the secondary check altogether since it is redundant with the preceding loop. This bug was found with an unreleased version of SteamVR. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-27 10:19:19 -04:00
Matt Turner	c9b86cf526	meson: Test for program_invocation_name program_invocation_name and program_invocation_short_name are both GNU extensions. I don't believe one can exist without the other, so only check for program_invocation_name. Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-07-30 11:49:09 -07:00
Matt Turner	597bddad47	scons: Test for random_r() Suggested-by: Eric Engestrom <eric.engestrom@intel.com>	2019-07-30 11:49:09 -07:00
Matt Turner	c96407f37e	meson: Test for random_r() It's better to test for needed functions instead of using external knowledge about presence in this or that C library. Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-07-30 11:49:09 -07:00
Matt Turner	9cc4311d86	st/nine: Drop preprocessor guards for glibc-2.12 Same rationale as the previous patch, but additionally these checks just seem entirely unnecessary. pthread_self() has been used in Mesa since at least 1999. Acked-by: Eric Engestrom <eric.engestrom@intel.com>	2019-07-30 11:49:09 -07:00
Matt Turner	9c411e020d	util: Drop preprocessor guards for glibc-2.12 glibc-2.12 was released in 2010. No one is building new Mesa against 9 year old glibc, and removing these checks allows the code to work on other C libraries like musl. Acked-by: Eric Engestrom <eric.engestrom@intel.com>	2019-07-30 11:49:09 -07:00
Alyssa Rosenzweig	a3c59f9f00	pan/midgard: Nothing to see here, move along folks Fixes: `dee1e18fe4` ("pan/midgard: Cleanup ops table") Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-30 10:49:13 -07:00
Lionel Landwerlin	7deb5ec0e8	spirv: don't discard access set by vtn_pointer_dereference We can have an access flag already set here so just augment the existing ones. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `0fb61dfdeb` ("spirv: propagate access qualifiers through ssa & pointer") Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-07-30 17:43:59 +00:00
Sagar Ghuge	587a497529	iris: Enable EXT_texture_shadow_lod Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-07-30 10:42:20 -07:00
Sagar Ghuge	adb9e18348	gallium: Add PIPE_CAP_TEXTURE_SHADOW_LOD v2: Line wrap to 80 char (Marek Olsak) Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-07-30 10:42:20 -07:00
Sagar Ghuge	6e04bd5f13	i965: Enable EXT_texture_shadow_lod Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-07-30 10:42:20 -07:00
Paulo Zanoni	25b03526c4	glsl: Add builtin functions for EXT_texture_shadow_lod With the help of Sagar, Ian and Ivan. v2: Fix dependencies (Ian Romanick) v3: 1) fix function name (Marek Olsak) 2) Add check for extension enable (Marek Olsak) Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-07-30 10:42:20 -07:00
Paulo Zanoni	154c789ad5	glsl: Allow _textureCubeArrayShadow function to accept ir_texture_opcode This will be used to support one of the functions from the EXT_texture_shadow_lod specification. With the help of Sagar, Ian and Ivan. Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-07-30 10:42:20 -07:00
Paulo Zanoni	d80a74fb99	mesa: extension boilerplate for EXT_texture_shadow_lod With the help of Sagar, Ian and Ivan. Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-07-30 10:42:20 -07:00
Alyssa Rosenzweig	dee1e18fe4	pan/midgard: Cleanup ops table Hopefully this should make a few ops make more sense. No functional changes. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-30 10:35:22 -07:00
Alyssa Rosenzweig	834aeb1e52	pan/midgard: Extend copy-propagation to swizzles We can compose them when we rewrite, which is.. more code.. but helps. total instructions in shared programs: 3611 -> 3513 (-2.71%) instructions in affected programs: 672 -> 574 (-14.58%) helped: 11 HURT: 2 helped stats (abs) min: 2 max: 14 x̄: 9.09 x̃: 10 helped stats (rel) min: 5.71% max: 24.56% x̄: 17.99% x̃: 18.87% HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 HURT stats (rel) min: 1.19% max: 2.08% x̄: 1.64% x̃: 1.64% 95% mean confidence interval for instructions value: -10.45 -4.62 95% mean confidence interval for instructions %-change: -20.07% -9.87% Instructions are helped. total bundles in shared programs: 2117 -> 2067 (-2.36%) bundles in affected programs: 356 -> 306 (-14.04%) helped: 11 HURT: 0 helped stats (abs) min: 1 max: 7 x̄: 4.55 x̃: 5 helped stats (rel) min: 4.55% max: 15.22% x̄: 13.63% x̃: 14.71% 95% mean confidence interval for bundles value: -5.64 -3.45 95% mean confidence interval for bundles %-change: -15.71% -11.55% Bundles are helped. total quadwords in shared programs: 3567 -> 3468 (-2.78%) quadwords in affected programs: 695 -> 596 (-14.24%) helped: 11 HURT: 1 helped stats (abs) min: 2 max: 14 x̄: 9.09 x̃: 10 helped stats (rel) min: 5.56% max: 21.88% x̄: 14.97% x̃: 15.15% HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 HURT stats (rel) min: 2.38% max: 2.38% x̄: 2.38% x̃: 2.38% 95% mean confidence interval for quadwords value: -10.96 -5.54 95% mean confidence interval for quadwords %-change: -17.42% -9.63% Quadwords are helped. 
total registers in shared programs: 391 -> 383 (-2.05%) registers in affected programs: 46 -> 38 (-17.39%) helped: 9 HURT: 1 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 25.00% max: 25.00% x̄: 25.00% x̃: 25.00% HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 HURT stats (rel) min: 10.00% max: 10.00% x̄: 10.00% x̃: 10.00% 95% mean confidence interval for registers value: -1.25 -0.35 95% mean confidence interval for registers %-change: -29.42% -13.58% Registers are helped. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-30 10:35:10 -07:00
Alyssa Rosenzweig	c45487b770	pan/midgard: Extract simple source mod check Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-30 10:35:09 -07:00
Alyssa Rosenzweig	2d2abb08d0	pan/midgard: Lower texr/texw mixed registers Conceptually, r28-r29 (as used for reading) and r28-r29 (as used for writing) aren't registers at all, merely push/pull arrangements. So you can't feed a texture result back into itself without explicitly moving in the middle. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-30 10:01:20 -07:00
Alyssa Rosenzweig	2b248af43e	pan/midgard: Always set .cont for derivatives in loops We need to keep the helper invocations alive. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-30 10:01:19 -07:00
Alyssa Rosenzweig	8f887329c0	pan/midgard: Implement derivatives Implement the fdd* and fdd* opcodes in the Midgard compiler. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-30 10:01:19 -07:00
Alyssa Rosenzweig	982134d22e	pan/midgard: Compose original texture swizzle in RA Used for lowering derivatives. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-30 10:01:19 -07:00
Alyssa Rosenzweig	79875a9a64	pan/midgard: Add new swizzles Used for derivatives. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-30 10:01:19 -07:00
Alyssa Rosenzweig	81e7782e30	pan/midgard: Add OP_IS_DERIVATIVE helper Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-30 10:01:19 -07:00
Alyssa Rosenzweig	ae6aea0d98	pan/midgard: Add make_compiler_temp_reg helper Corollary to make_compiler_temp (for SSA). Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-30 10:01:19 -07:00
Alyssa Rosenzweig	30b15a830a	pan/midgard: Move nir_*_src_index to compiler.h These helpers are useful for code emission everywhere. Share the love! Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-30 10:01:19 -07:00
Alyssa Rosenzweig	c9498b3c5e	pan/midgard: Disassemble unknown texture ops as hex I'm not sure why I ever thought decimal was a good idea. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-30 10:01:19 -07:00
Alyssa Rosenzweig	0714481894	pan/midgard: Add support for disassembling derivatives They're just texture ops. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-30 10:01:19 -07:00
Connor Abbott	a094928abc	nir/find_array_copies: Use correct parent array length instr->type is the type of the array element, not the type of the array being dereferenced. Rather than fishing out the parent type, just use parent->num_children which should be the length plus 1. While we're here add another assert for the issue fixed by the previous commit. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111251 Fixes: `156306e5e6` ("nir/find_array_copies: Handle wildcards and overlapping copies") Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-07-30 17:14:33 +02:00
Connor Abbott	7788992bc6	nir: Fix comparison for nir_deref_instr_is_known_out_of_bounds() There was an off-by-one error. Fixes: `156306e5e6` ("nir/find_array_copies: Handle wildcards and overlapping copies") Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-07-30 17:14:28 +02:00
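The class of bug fixed by `7788992bc6` can be sketched in a few lines. This is a generic illustration rather than the Mesa code: `index_is_out_of_bounds` is a hypothetical helper, and the sketch assumes the faulty comparison was of the `>` versus `>=` kind.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not a Mesa function.
 * Valid indices for an array of `length` elements are 0 .. length - 1,
 * so an index equal to `length` is already out of bounds. */
static bool index_is_out_of_bounds(uint64_t index, uint64_t length)
{
    /* Writing `index > length` here would be the off-by-one:
     * it would wrongly accept index == length. */
    return index >= length;
}
```

With a length of 4, index 3 is the last valid element and index 4 must be rejected.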
Samuel Pitoiset	9d7ead6f9b	radv/gfx10: only compile the GS copy shader on-demand Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-30 16:51:30 +02:00
Michel Dänzer	5229f27f06	gitlab-ci: Fix scons build directory path Fixes: `dd3d0b2897` "gitlab-ci: Only keep the build logs as artifacts." Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-07-30 16:18:50 +02:00
Jan Zielinski	4d2890e8f7	swr/rasterizer: Add memory tracking support Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>	2019-07-30 15:58:36 +02:00
Jan Zielinski	5dd9ad1570	swr/rasterizer: Better implementation of scatter Added support for avx512 scatter instruction. Non-avx512 will now call into a C function to do the scatter emulation. This has better jit compile performance than the previous approach of jitting scalar loops. Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>	2019-07-30 13:39:19 +00:00
Jan Zielinski	ad9aff5528	swr/rasterizer: cleanups for tessellation This commit introduces small fixes in preparation for tessellation support. Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>	2019-07-30 13:39:18 +00:00
Jan Zielinski	c5c05979f7	rasterizer/swr: move BucketMgr to SwrContext This move gets us back to parity with global manager in that we can dump render context buckets now. Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>	2019-07-30 13:39:18 +00:00
Alejandro Piñeiro	cda4c62893	v3d: take into account separate_stencil when checking if stencil should be cleared In most cases this is not needed because usually, when a separate stencil is written, the parent resource is also written. This is needed if we have a separate stencil, no depth buffer, and the source and destination are the same, as in that case the stencil can be updated, but not the parent source (like if you are blitting only the stencil buffer). In that situation, the next access to the stencil buffer would clear it (overwriting the previous blit) because the parent source has v3d_resource.writes set to 0. As far as I see, that situation only happens with the GL_DEPTH32F_STENCIL8 format. Note that one alternative would be to consider that if the separate_stencil has been written, the parent should also be considered written (and update its "writes" field accordingly). But I found this patch more natural. Fixes the following piglit tests: spec/arb_depth_buffer_float/fbo-stencil-gl_depth32f_stencil8-blit spec/arb_depth_buffer_float/fbo-stencil-gl_depth32f_stencil8-copypixels The latter regressed when the internal glCopyPixels implementation started to use blitting. So: Fixes: `131d40cfc9` ("st/mesa: accelerate glCopyPixels(STENCIL)") Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-30 12:05:23 +02:00
Daniel Schürmann	45638e14fb	radv: Don't include radv_private.h from radv_shader.h This patch decouples radv_shader.h from any LLVM dependency. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-30 10:29:11 +02:00
Rafael Antognolli	f27908152b	i965/gen10: Remove unnecessary workaround. In fact, the description of the workaround states that the mask field doesn't work correctly on gen10, and we need to set it to 0xffff even when we only want to update a single field: "The mask bits are not implemented properly on 3DSTATE_3D_MODE. Driver must always program bits 31:16 of DW1 a value of 0xFFFF. This means if it is only updating 1 field, it must update all the fields to the correct value." So unless we want to change any of the fields of 3DSTATE_3D_MODE, there's no need to emit. Additionally, it seems this workaround is not required on gen11. And last but not least, this workaround is not implemented on iris or anv, and it doesn't seem to be missed there. So let's just remove the whole thing. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-07-29 16:54:17 -07:00
Kenneth Graunke	44e713eddb	iris: Fix SO offset to be 32-bit in DrawTransformFeedback handling We accidentally started copying a full 64-bit value rather than copying a 32-bit offset and zeroing the top 32-bits. This caused us to compute bogus vertex counts which could lead to GPU hangs in some cases. Thanks to Clayton Craft for catching the regressions! Fixes: `0e24d10ff5` ("iris: Use gen_mi_builder to handle CS ALU operations.")	2019-07-29 16:38:19 -07:00
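The effect described in `44e713eddb` can be illustrated with plain integer arithmetic. This is a minimal sketch under stated assumptions, not the iris code: the helper names are hypothetical, and the sketch assumes the 32-bit offset sits in the low half of a 64-bit value whose upper half holds unrelated data.

```c
#include <stdint.h>

/* Hypothetical helper: keep the low 32 bits and zero the top 32 bits. */
static uint64_t so_offset_from_raw(uint64_t raw_qword)
{
    return raw_qword & 0xffffffffull;
}

/* Hypothetical helper: vertex count derived from a byte offset.
 * The caller is assumed to pass a non-zero stride. */
static uint64_t vertex_count(uint64_t offset_bytes, uint32_t stride_bytes)
{
    return offset_bytes / stride_bytes;
}
```

If the upper dword is not cleared, the division runs on a value with stray high bits and yields a vertex count far larger than intended.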
Jason Ekstrand	4bb6e6817e	intel: Use a system value for gl_FragCoord It's kind-of an anomaly that the Intel drivers are still treating gl_FragCoord as an input. It also makes zero sense because we have to special-case it in the back-end. Because ANV is the only user of nir_lower_wpos_center, we go ahead and just update it to look for nir_intrinsic_load_frag_coord as part of this patch. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-07-29 23:30:26 +00:00
Jason Ekstrand	44268b1c72	glsl: Treat gl_FragCoord as a varying even when it's a system value This fixes glsl-fcoord-invariant-pass.shader_test on drivers that set GLSLFragCoordIsSysVal which includes radeonsi among others. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-07-29 23:30:26 +00:00
Jason Ekstrand	169d896df2	mesa/spirv: Set frag_coord_is_sysval to GLSLFragCoordIsSysVal Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-07-29 23:30:26 +00:00
Jason Ekstrand	e401303597	intel/fs: Remove calculate_urb_setup from fs_visitor Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-07-29 23:30:26 +00:00
Rob Clark	010d255656	freedreno/a6xx: fix MSAA resolve hangs Seems like RB_BLIT_SCISSOR needs to be aligned to (minimum?) tile size. Fixes intermittent GPU hangs triggered by some of the three.js samples on https://threejs.org/ Signed-off-by: Rob Clark <robdclark@chromium.org>	2019-07-29 15:15:31 -07:00
Rob Clark	73cc2dc084	freedreno/ir3: fix for array/reg store vs meta instructions fishgl.com has a shader which does roughly: foo = texture(...); if (bar) foo = texture(...); After lowering phi webs to regs we end up w/ a vec4 reg (array). But since it was not an indirect access, we try to skip the extra mov. As a result, the per-component fanout (split) meta instructions store directly to the reg (array), which doesn't work out in RA. Signed-off-by: Rob Clark <robdclark@chromium.org>	2019-07-29 15:15:31 -07:00
Eric Engestrom	f7b6a8d12f	meson: bump required version to 0.46 0.45 has a few annoying bugs (like the one in !358 [1]), and 0.46 is well over a year old by now, so let's move to it. [1] https://gitlab.freedesktop.org/mesa/mesa/merge_requests/358 Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2019-07-29 23:07:30 +01:00
Leo Liu	8d7f2e2221	radeon/vcn/vp9: add Arcturus VP9 support Arcturus CHIP enum is less than Navi10, since it's still gfx9, but its VCN version belongs to VCN2.x Signed-off-by: Leo Liu <leo.liu@amd.com> Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>	2019-07-29 17:52:58 -04:00
Leo Liu	a439863918	radeon/vcn: add Arcturus decode support different internal registers offset from previous HW Signed-off-by: Leo Liu <leo.liu@amd.com> Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>	2019-07-29 17:52:56 -04:00
Marek Olšák	7708540363	amd: add support for Arcturus Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>	2019-07-29 17:52:54 -04:00
Marek Olšák	417ab8ef6b	radeonsi: add AMD_DEBUG=nogfx for testing Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>	2019-07-29 17:52:53 -04:00
Marek Olšák	19d04191c4	radeonsi: add support for compute-only chips Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>	2019-07-29 17:52:51 -04:00
Sonny Jiang	c82f338855	gallium/auxiliary/vl: add compute shaders for deint yuv Signed-off-by: Sonny Jiang <sonny.jiang@amd.com> Reviewed-by: Signed-off-by: James Zhu <James.Zhu@amd.com> Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>	2019-07-29 17:52:49 -04:00
Sonny Jiang	ef77a92bca	gallium/auxiliary/vl: don't call gfx functions on compute-only chips Signed-off-by: Sonny Jiang <sonny.jiang@amd.com> Reviewed-by: Signed-off-by: James Zhu <James.Zhu@amd.com> Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>	2019-07-29 17:52:46 -04:00
James Zhu	b618b65c98	gallium/auxiliary/vl: add PIPE_CAP_GRAPHICS check for vl compositor Init graphic shader Only when PIPE_CAP_GRAPHICS is true. Signed-off-by: James Zhu <James.Zhu@amd.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>	2019-07-29 17:52:42 -04:00
Marek Olšák	187cc07d05	gallium: create multimedia contexts as compute-only if graphics is unsupported Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>	2019-07-29 17:52:41 -04:00
Marek Olšák	ea7646dc13	gallium: add PIPE_CAP_GRAPHICS Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>	2019-07-29 17:52:39 -04:00
Samuel Pitoiset	372c3dcfdb	radv: implement VK_EXT_index_type_uint8 Natively supported on VI+. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-29 23:36:53 +02:00
Lionel Landwerlin	c6196f7025	anv: implement VK_EXT_index_type_uint8 Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-07-29 21:26:07 +00:00
Lionel Landwerlin	0d3a532a33	vulkan: Bump headers to 1.1.117 Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-07-29 21:26:07 +00:00
Lionel Landwerlin	161b5f00db	include/vulkan: bump vk_android_native_buffer Taken off https://android.googlesource.com/platform/frameworks/native/+/refs/tags/android-9.0.0_r45/vulkan/include/vulkan/vk_android_native_buffer.h Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-07-29 21:26:07 +00:00
Eric Engestrom	8486dbb066	intel/mi: only resolve to a temp register if source isn't in memory aka. fix a s/\|\|/&&/ typo Fixes: `74063ee61a` ("intel/mi: Add a new gen_mi_store_if() helper.") Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-07-29 13:35:42 -07:00
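Why a `||` in place of `&&` matters for a "source isn't in memory" check can be shown with a small sketch. The enum and function names below are hypothetical and the exact shape of the condition in `8486dbb066` is an assumption; only the `||` to `&&` change itself comes from the commit message.

```c
#include <stdbool.h>

/* Hypothetical value kinds for the sketch, not the gen_mi_builder types. */
enum value_type { VALUE_IMM, VALUE_REG, VALUE_MEM32, VALUE_MEM64 };

/* Buggy form: no value can equal both MEM32 and MEM64,
 * so at least one inequality always holds and this is always true. */
static bool needs_temp_buggy(enum value_type t)
{
    return t != VALUE_MEM32 || t != VALUE_MEM64;
}

/* Fixed form: true only when the source is in neither memory kind. */
static bool needs_temp_fixed(enum value_type t)
{
    return t != VALUE_MEM32 && t != VALUE_MEM64;
}
```

The buggy predicate asks for a temporary even for memory sources, which is the case the fix is meant to exclude.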
Eric Anholt	5596038e2f	gitlab-ci: Enable freedreno shader-db runs. Now that helgrind is less upset and I've completed many successful full shader-db runs, we should be able to enable freedreno shader-db runs for Mesa checkins on the tiny public shader-db. Reviewed-by: Rob Clark <robdclark@gmail.com>	2019-07-29 12:52:39 -07:00
Eric Anholt	3c46778b75	nir: Fix helgrind complaints about data race in trivial_swizzle init. Even if the data race wasn't real (I'm not great at reasoning about this), helgrind is a nice enough tool that keeping noise out of it is probably worthwhile. Besides, typing out the numbers keeps the data in the read-only data section instead of emitting code to initialize it every time. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2019-07-29 12:50:49 -07:00
Eric Anholt	91986fbbdb	freedreno: Fix data race on making the shader's id. The value is only used for IR3_DBG_DISASM, but it cleans up the helgrind output. Reviewed-by: Rob Clark <robdclark@gmail.com>	2019-07-29 12:50:49 -07:00
Eric Anholt	6f0521b78c	freedreno: Take a lock around shader variant creation. Shaders are shared across contexts in gallium (part of making it so that you get shader compile at link time, for shader-db and to reduce compiles at draw time). So, we need to protect from variant creation for a shader from multiple threads at the same time. Reviewed-by: Rob Clark <robdclark@gmail.com>	2019-07-29 12:50:49 -07:00
Eric Anholt	6e3b220ad3	freedreno: Fix data races with allocating/freeing struct ir3. There is a single ir3_compiler in the screen, and each context may be compiling ir3 shaders, which call ir3_create. ralloc doesn't do any locking on its own, so eventually you can end up racing to break ralloc's linked lists. We really don't want struct ir3 to live as long as the compiler (maybe struct ir3_shader's lifetime, if anything), so you'd better be freeing it anyway. Fixes: `8fe2076243` ("freedreno/ir3: convert over to ralloc") Reviewed-by: Rob Clark <robdclark@gmail.com>	2019-07-29 12:50:49 -07:00
Eric Anholt	65aeeae670	freedreno: Fix helgrind complaint on shader-db key setup. If the variable's going to be static, we shouldn't be memsetting it from every thread and instead just have it in the data section. Reviewed-by: Rob Clark <robdclark@gmail.com>	2019-07-29 12:50:49 -07:00
Bas Nieuwenhuizen	aac492901a	radv: Take variable descriptor counts into account for buffer entries. Fixes: `b5e04e9217` "radv: Support allocating variable size descriptor sets." Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111019 Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-07-29 20:42:53 +02:00
Jason Ekstrand	99d04a5bd6	anv: Don't claim support for 24 and 48-bit formats on IVB Cc: mesa-stable@lists.freedesktop.org	2019-07-29 11:34:30 -05:00
Jason Ekstrand	7c1b39cf18	isl/formats: R8G8B8_UNORM_SRGB isn't supported on HSW On Haswell, the format works but it doesn't properly do an sRGB decode. It appears to act identically to R8G8B8_UNORM. Only Vulkan uses this format so this only affects Vulkan on HSW. Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Eric Engestrom <eric@engestrom.ch>	2019-07-29 11:34:18 -05:00
Alyssa Rosenzweig	463164b325	pan/midgard: Fix alpha test w.r.t new indexing Fixes: `9beb3391b5` ("pan/midgard: Tag SSA/reg") Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-29 08:31:03 -07:00
Gert Wollny	4ee638cd78	softpipe: Don't draw when rasterizer_discard is set Fixes: dEQP-GLES3.functional.rasterizer_discard.basic.write_depth_points dEQP-GLES3.functional.rasterizer_discard.basic.write_stencil_points dEQP-GLES3.functional.rasterizer_discard.fbo.write_depth_points dEQP-GLES3.functional.rasterizer_discard.fbo.write_stencil_points dEQP-GLES3.functional.rasterizer_discard.scissor.write_depth_points dEQP-GLES3.functional.rasterizer_discard.scissor.write_stencil_points Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2019-07-29 15:47:34 +02:00
Gert Wollny	45ac0dfad4	softpipe: Fix cube arrays layer selection To select the correct layer the z-coordinate must be rounded before it is multiplied by six. Fixes a number of tests out of dEQP-GLES31.functional.texture.filtering.cube_array.formats.* Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2019-07-29 15:47:34 +02:00
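The ordering issue in `45ac0dfad4` can be sketched with two small functions. This is an illustration only: the function names are hypothetical, the rounding helper handles non-negative inputs only, and softpipe's actual rounding routine is not shown in this log.

```c
/* Round a non-negative float to the nearest integer (sketch only). */
static int round_nonneg(float x)
{
    return (int)(x + 0.5f);
}

/* Round the cube index first, then scale by the six faces per cube. */
static int first_face_round_then_scale(float z)
{
    return round_nonneg(z) * 6;
}

/* Scale first, then round: can land on a face in the middle of a cube. */
static int first_face_scale_then_round(float z)
{
    return round_nonneg(z * 6.0f);
}
```

For z = 1.4 the first form selects layer 6, the start of cube 1, while the second selects layer 8, which is not a multiple of six.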
Lionel Landwerlin	6659d11ff0	vulkan/wsi/wayland: implement acquire timeout v2: Eric's nits v3: Reuse timespec utils (Daniel) Deal with ppoll being interrupted by a signal (Daniel) v4: Remove unnecessary time check v5: Deal with EAGAIN from wl_display_prepare_read_queue() (Daniel) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> (v2) Reviewed-by: Daniel Stone <daniels@collabora.com>	2019-07-29 13:11:36 +00:00
Lionel Landwerlin	d2d70c3bb5	util: add a timespec helper Copied from Weston, upon Daniel's suggestion Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Suggested-by: Daniel Stone <daniels@collabora.com> Reviewed-by: Daniel Stone <daniels@collabora.com>	2019-07-29 13:11:36 +00:00
Eric Engestrom	ef57fb2350	intel: replace large stack buffer with heap allocation For now, this keeps the "100 bytes" allocation; we can try to figure out the correct size as a follow up. Suggested-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-07-29 13:58:57 +01:00
Samuel Pitoiset	58ee973e87	radv/gfx10: do not use the fast depth or stencil clear bytes path It causes issues on GFX10. This fixes rendering issues with vkmark and Wreckfest at least. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-29 14:47:13 +02:00
Samuel Pitoiset	4aa450193b	ac: do not crash when the buffer data format is invalid This might happen when a pipeline doesn't define the vertex input state, so the buffer data format is 0 (aka INVALID). This fixes crashes when compiling some shaders on GFX10. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-29 13:19:32 +02:00
Rhys Perry	a9f58af454	ac/nir: fix txf_ms with an offset Seems to fix some hair artifacts in Max Payne 3: https://github.com/daniel-schuermann/mesa/issues/76 Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Fixes: `f4e499ec79` ('radv: add initial non-conformant radv vulkan driver') Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-07-29 11:50:13 +01:00
Connor Abbott	a69ab1b7d2	radv: Delete unused local variables in optimization loop Totals from affected shaders: SGPRS: 376 -> 376 (0.00 %) VGPRS: 620 -> 560 (-9.68 %) Spilled SGPRs: 0 -> 0 (0.00 %) Spilled VGPRs: 0 -> 0 (0.00 %) Private memory VGPRs: 0 -> 0 (0.00 %) Scratch size: 292 -> 292 (0.00 %) dwords per thread Code Size: 20024 -> 20144 (0.60 %) bytes LDS: 0 -> 0 (0.00 %) blocks Max Waves: 25 -> 25 (0.00 %) Wait states: 0 -> 0 (0.00 %) Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-29 11:37:46 +02:00
Connor Abbott	156306e5e6	nir/find_array_copies: Handle wildcards and overlapping copies This commit rewrites opt_find_array_copies to be able to handle an array copy sequence with other intervening operations in between. In particular, this handles the case where we OpLoad an array of structs and then OpStore it, which generates code like: foo[0].a = bar[0].a foo[0].b = bar[0].b foo[1].a = bar[1].a foo[1].b = bar[1].b ... that wasn't recognized by the previous pass. In order to correctly handle copying arrays of arrays, and in particular to correctly handle copies involving wildcards, we need to use a tree structure similar to lower_vars_to_ssa so that we can walk all the partial array copies invalidated by a particular write, including ones where one of the common indices is a wildcard. I actually think that when factoring in the needed hashing/comparing code, a hash table based approach wouldn't be a lot smaller anyways. All of the changes come from tessellation control shaders in Strange Brigade, where we're able to remove the DXVK-inserted copy at the beginning of the shader. These are the result for radv: Totals from affected shaders: SGPRS: 4576 -> 4576 (0.00 %) VGPRS: 13784 -> 5560 (-59.66 %) Spilled SGPRs: 0 -> 0 (0.00 %) Spilled VGPRs: 0 -> 0 (0.00 %) Private memory VGPRs: 0 -> 0 (0.00 %) Scratch size: 8696 -> 6876 (-20.93 %) dwords per thread Code Size: 329940 -> 263268 (-20.21 %) bytes LDS: 0 -> 0 (0.00 %) blocks Max Waves: 330 -> 898 (172.12 %) Wait states: 0 -> 0 (0.00 %) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-07-29 11:36:25 +02:00
Connor Abbott	c6543efe7a	nir: Print array deref indices as decimal We print the size as decimal too, and using hex without a leading "0x" was very confusing. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-07-29 11:36:19 +02:00
Connor Abbott	6fc7384fd4	lima/gpir/sched: Handle more special ops in can_use_complex() We were missing handling for a few other ops that rearrange their sources somehow in codegen, namely complex2 and select. This should fix spec@glsl-1.10@execution@built-in-functions@vs-asin-vec3 and possibly other random regressions from the new scheduler which were supposed to be fixed in the commit right after. Fixes: `54434fe670` ("lima/gpir: Rework the scheduler") Signed-off-by: Connor Abbott <cwabbott0@gmail.com> Acked-by: Qiang Yu <yuq825@gmail.com>	2019-07-28 23:38:31 +02:00
Connor Abbott	af95f80a24	lima/gp: Clean up lima_program_optimize_vs_nir() a little Remove an unnecessary nir_lower_regs_to_ssa as that should be done by the state tracker, and add a missing DCE pass after running copy propagation in order to remove the dead copies. This shouldn't fix anything but the second part will reduce shader sizes. Signed-off-by: Connor Abbott <cwabbott0@gmail.com> Reviewed-by: Qiang Yu <yuq825@gmail.com>	2019-07-28 23:38:31 +02:00
Connor Abbott	d26d8c5617	lima/gpir/sched: Don't try to spill when something else has succeeded In try_node(), we assume that the node we pick can still be scheduled successfully after speculatively trying all the other nodes. Normally we always undo every node after speculating it, so that when we finally schedule best_node the scheduler state is exactly the same and it succeeds. However, we also try to spill nodes, which can change the state and in a corner case that can make scheduling best_node fail. In particular, the following sequence of events happened with piglit shaders@glsl-vs-if-nested: a partially-ready node N was spilled and a register store node S, which is a use of N, was created and then later the other uses of N were scheduled, so that S is now ready and N is partially ready. First we try to schedule S and succeed, then we try to schedule another node M, which fails, so we try to spill the remaining uses of N. This succeeds, but scheduling M still fails so that best_node is still S. However since one of the uses of N is one cycle ago, and therefore we inserted a read dependent on S one cycle ago when spilling N, S can no longer be scheduled as read-after-write latency is three cycles. While we could ad-hoc try to catch cases like this, or (the best option but very complicated) treat the spill as speculative and roll it back if we decide not to schedule the node, a simpler solution is to just give up on spilling if we've already successfully speculatively scheduled another node. We'd give up a few cases where we discover that by spilling even harder we could schedule a more desirable node, but that seems like it would be pretty rare in practice. With this we guarantee that nothing has been touched after best_node was successfully scheduled. We also cut down on pointless spilling, since if we already scheduled a node it's unlikely that spilling harder will let us schedule an even better node, and hence any spilling at this point is probably useless. 
While we're here, clean up the code around spilling by flattening the two if's and getting rid of the second unnecessary check for INT_MIN. Fixes: `54434fe670` ("lima/gpir: Rework the scheduler") Acked-by: Qiang Yu <yuq825@gmail.com> Signed-off-by: Connor Abbott <cwabbott0@gmail.com>	2019-07-28 23:38:31 +02:00
Ilia Mirkin	de17922b8a	nv50/ir: don't consider the main compute function as taking arguments With OpenCL, kernels can take arguments and return values (?). However in practice, there is no more TGSI compute implementation, and even if there were, it would probably have named functions and no explicit main. This improves RA considerably for compute shaders, since temps are not kept around as return values. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Karol Herbst <kherbst@redhat.com>	2019-07-27 18:24:11 -04:00
Ilia Mirkin	3e468ff2fe	nv50/ir: handle insn not being there for definition of CVT arg This can happen if it's e.g. a uniform or a function argument. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111217 Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Karol Herbst <kherbst@redhat.com> Cc: mesa-stable@lists.freedesktop.org	2019-07-27 18:24:11 -04:00
Ilia Mirkin	23dfff0669	nouveau: flip DEBUG -> !NDEBUG The meson conversion chose to change the meaning of DEBUG from "used for debugging" to "used for expensive things for debugging", primarily for nir_validate. Flip things over so that we get nice things with optimizations enabled. While we're at it, also kill off nouveau_statebuf.h which is unused (and has a mention of DEBUG which is how I found it). Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Karol Herbst <kherbst@redhat.com>	2019-07-27 18:24:11 -04:00
Ilia Mirkin	9f8ed5aa67	nvc0: allow a non-user buffer to be bound at position 0 Previously the code only handled it for positions 1 and up (as would be for UBO's in GL). It's not a lot of trouble to handle this, and vl or vdpau want this. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111213 Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Karol Herbst <kherbst@redhat.com> Cc: mesa-stable@lists.freedesktop.org	2019-07-27 18:24:11 -04:00
Ilia Mirkin	c52b057e00	nv50,nvc0: update sampler/view bind functions to accept NULL array Apparently vl (or vdpau) wants to pass that in now. Handle it. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111213 Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Karol Herbst <kherbst@redhat.com> Cc: mesa-stable@lists.freedesktop.org	2019-07-27 18:24:11 -04:00
Ilia Mirkin	face27fdc5	gallium/vl: fix compute tgsi shaders to not process undefined components This caused nouveau's function handling logic to think that the MAIN function was due to receive external parameters, and cascaded some failures after that. Instead avoid having the undefined components in the first place. Fixes: `f6ac0b5d71` (gallium/auxiliary/vl: Add compute shader to support video compositor render) Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111213 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111217 Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-07-27 18:24:11 -04:00
Alyssa Rosenzweig	159abd527e	pan/midgard: Introduce invert field This will enable us to fuse inverts in various ways. Marginal hurt: total instructions in shared programs: 3610 -> 3611 (0.03%) instructions in affected programs: 67 -> 68 (1.49%) helped: 0 HURT: 1 Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-26 13:38:41 -07:00
Alyssa Rosenzweig	9beb3391b5	pan/midgard: Tag SSA/reg Rather than putting registers after SSA in the MIR indexing, put them side-by-side, shifted 1, using the bottom bit as the SSA/reg select. This will allow us to generate SSA temps in the compiler. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-26 13:38:41 -07:00
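The indexing scheme described in `9beb3391b5` can be sketched as follows. The names are hypothetical, and which bit value means "register" is an assumption; the commit message only states that the index is shifted by one and the bottom bit selects SSA or reg.

```c
#include <stdbool.h>

/* Assumption for the sketch: bottom bit set means "register". */
#define TAG_IS_REG 1u

/* SSA index i becomes an even tagged index. */
static unsigned tag_ssa(unsigned i) { return i << 1; }

/* Register index i becomes an odd tagged index. */
static unsigned tag_reg(unsigned i) { return (i << 1) | TAG_IS_REG; }

/* Recover the kind and the original index from a tagged index. */
static bool tagged_is_reg(unsigned tagged) { return (tagged & TAG_IS_REG) != 0; }
static unsigned untag(unsigned tagged) { return tagged >> 1; }
```

Because SSA values and registers interleave in this scheme, a new SSA temporary can take the next free SSA index without renumbering any register.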
Boyuan Zhang	b0626c1f30	radeon/vcn: enable rate control for hevc encoding Set cu_qp_delta_enable_flag on when rate control is enabled, and set it off when rate control is disabled (e.g. constant qp). Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110673 Cc: mesa-stable@lists.freedesktop.org V2: fix typo and add bugzilla info Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com> Acked-by: Leo Liu <leo.liu@amd.com>	2019-07-26 14:33:09 -04:00
Boyuan Zhang	5115c25bb8	radeon/uvd: enable rate control for hevc encoding Set cu_qp_delta_enable_flag on when rate control is enabled, and set it off when rate control is disabled (e.g. constant qp). Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110673 Cc: mesa-stable@lists.freedesktop.org V2: fix typo and add bugzilla info Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com> Acked-by: Leo Liu <leo.liu@amd.com>	2019-07-26 14:33:09 -04:00
Boyuan Zhang	9aaf3aaf5d	radeon/vcn: fix poc for hevc encode MaxPicOrderCntLsb should be at least 16 according to the spec, therefore add minimum value check. Also use poc value passed from st instead of calculation in slice header encoding. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110673 Cc: mesa-stable@lists.freedesktop.org V2: Fix typo V3: Use MAX2 macro instead of coding. Also MaxPicOrderCntLsb should be power of 2 according to spec. Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com> Acked-by: Leo Liu <leo.liu@amd.com>	2019-07-26 14:33:09 -04:00
Boyuan Zhang	77cf700fa3	radeon/uvd: fix poc for hevc encode MaxPicOrderCntLsb should be at least 16 according to the spec, therefore add minimum value check. Also use poc value passed from st instead of calculation in slice header encoding. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110673 Cc: mesa-stable@lists.freedesktop.org V2: Fix typo V3: Use MAX2 macro instead of coding. Also MaxPicOrderCntLsb should be power of 2 according to spec. Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com> Acked-by: Leo Liu <leo.liu@amd.com>	2019-07-26 14:33:09 -04:00
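The two constraints named in the poc fixes (`9aaf3aaf5d`, `77cf700fa3`), a minimum of 16 and a power-of-two value, can be sketched like this. The function names are hypothetical and `MAX2` is defined locally for the sketch; the upper bound of 65536 follows from the HEVC range of `log2_max_pic_order_cnt_lsb_minus4` (0 to 12) and is not mentioned in the commit messages.

```c
#include <stdint.h>

/* Local definition for the sketch; Mesa provides its own MAX2 macro. */
#define MAX2(a, b) ((a) > (b) ? (a) : (b))

/* Smallest power of two >= v, capped at 65536 so the loop terminates. */
static uint32_t next_pow2_capped(uint32_t v)
{
    uint32_t p = 1;
    if (v > 65536u)
        v = 65536u;
    while (p < v)
        p <<= 1;
    return p;
}

/* Hypothetical helper: MaxPicOrderCntLsb as a power of two, at least 16. */
static uint32_t clamp_max_poc_lsb(uint32_t requested)
{
    return MAX2(16u, next_pow2_capped(requested));
}
```

A requested value of 5 is raised to 16, and 17 is rounded up to 32.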
Sagar Ghuge	d5992ab134	nir: Optimize umod lowering We don't have to calculate the final quotient in order to calculate the unsigned modulo result. Once we are done with error correction, we have a partial result which can be used to find the result of the modulo operation. Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-07-26 11:19:23 -07:00
Alyssa Rosenzweig	f8c71d7632	pan/midgard: Improve scheduling Make scalar scheduling onto vector units more aggressive (it can only help while we schedule strictly in order). Also, allow imov on VLUT. total bundles in shared programs: 2176 -> 2117 (-2.71%) bundles in affected programs: 901 -> 842 (-6.55%) helped: 24 HURT: 0 helped stats (abs) min: 1 max: 18 x̄: 2.46 x̃: 2 helped stats (rel) min: 2.08% max: 20.00% x̄: 8.68% x̃: 5.94% 95% mean confidence interval for bundles value: -3.93 -0.99 95% mean confidence interval for bundles %-change: -10.92% -6.45% Bundles are helped. total quadwords in shared programs: 3605 -> 3566 (-1.08%) quadwords in affected programs: 1984 -> 1945 (-1.97%) helped: 28 HURT: 5 helped stats (abs) min: 1 max: 3 x̄: 1.68 x̃: 2 helped stats (rel) min: 1.02% max: 14.29% x̄: 5.12% x̃: 2.94% HURT stats (abs) min: 1 max: 3 x̄: 1.60 x̃: 1 HURT stats (rel) min: 0.57% max: 9.09% x̄: 6.40% x̃: 9.09% 95% mean confidence interval for quadwords value: -1.67 -0.69 95% mean confidence interval for quadwords %-change: -5.37% -1.37% Quadwords are helped. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-26 10:28:46 -07:00
Alyssa Rosenzweig	94e281b9e0	pan/midgard: Specialize mod checking by type when checking constants Fixes inlining of integer constants. total quadwords in shared programs: 3585 -> 3568 (-0.47%) quadwords in affected programs: 625 -> 608 (-2.72%) helped: 13 HURT: 0 helped stats (abs) min: 1 max: 2 x̄: 1.31 x̃: 1 helped stats (rel) min: 1.27% max: 9.52% x̄: 3.84% x̃: 2.94% 95% mean confidence interval for quadwords value: -1.60 -1.02 95% mean confidence interval for quadwords %-change: -5.60% -2.07% Quadwords are helped. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-26 09:47:40 -07:00
Alyssa Rosenzweig	e823d33e77	pan/midgard: Use more aggressive writeout criteria We loosen the requirement of "no dependencies" to simply be "no non-pipelined dependencies", so we check for what could be pipelined. total bundles in shared programs: 2176 -> 2156 (-0.92%) bundles in affected programs: 779 -> 759 (-2.57%) helped: 20 HURT: 0 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 0.33% max: 20.00% x̄: 6.47% x̃: 2.78% 95% mean confidence interval for bundles value: -1.00 -1.00 95% mean confidence interval for bundles %-change: -9.44% -3.50% Bundles are helped. total quadwords in shared programs: 3605 -> 3585 (-0.55%) quadwords in affected programs: 1391 -> 1371 (-1.44%) helped: 20 HURT: 0 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 0.19% max: 14.29% x̄: 3.84% x̃: 1.64% 95% mean confidence interval for quadwords value: -1.00 -1.00 95% mean confidence interval for quadwords %-change: -5.73% -1.94% Quadwords are helped. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-26 09:47:40 -07:00
Alyssa Rosenzweig	c7fc5f3567	pan/midgard: Pipeline non-SSA registers Rather than bailing if we see something that's not SSA, do out the analysis to check if we can pipeline and do so if we can. total registers in shared programs: 392 -> 391 (-0.26%) registers in affected programs: 3 -> 2 (-33.33%) helped: 1 HURT: 0 Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-26 09:40:10 -07:00
Alyssa Rosenzweig	79f0896491	pan/midgard: Add mir_mask_of_read_components helper This facilitates analysis of vec4 registers (after going out-of-SSA). Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-26 09:37:28 -07:00
Alyssa Rosenzweig	481447cb00	pan/midgard: Add mir_is_written_before helper Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-26 09:20:52 -07:00
Alyssa Rosenzweig	95732cc9ef	pan/midgard: Obey fragment writeout criteria Rather than always emitting an extra move for fragments, check the actual criteria and emit accordingly. (This was lost during the RA improvements at the end of May). total bundles in shared programs: 2210 -> 2176 (-1.54%) bundles in affected programs: 501 -> 467 (-6.79%) helped: 34 HURT: 0 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 1.59% max: 33.33% x̄: 13.13% x̃: 12.50% 95% mean confidence interval for bundles value: -1.00 -1.00 95% mean confidence interval for bundles %-change: -16.06% -10.21% Bundles are helped. total quadwords in shared programs: 3639 -> 3605 (-0.93%) quadwords in affected programs: 795 -> 761 (-4.28%) helped: 34 HURT: 0 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 0.96% max: 33.33% x̄: 11.22% x̃: 8.33% 95% mean confidence interval for quadwords value: -1.00 -1.00 95% mean confidence interval for quadwords %-change: -14.31% -8.13% Quadwords are helped. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-26 08:37:09 -07:00
Alyssa Rosenzweig	20771ede1c	pan/midgard: Add post-RA move elimination Think of this pass as register coalescing part 2. After RA runs, but before scheduling, we scan for code of the form: mov rN, rN and delete the move, since it's totally redundant. This pass helps already, but it'd of course be much more effective paired with register coalescing to encourage moves in general to end up in this form. Nevertheless, even by itself: total instructions in shared programs: 3665 -> 3613 (-1.42%) instructions in affected programs: 2046 -> 1994 (-2.54%) helped: 52 HURT: 0 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 0.19% max: 25.00% x̄: 8.02% x̃: 4.00% 95% mean confidence interval for instructions value: -1.00 -1.00 95% mean confidence interval for instructions %-change: -10.26% -5.79% Instructions are helped. total bundles in shared programs: 2256 -> 2213 (-1.91%) bundles in affected programs: 1154 -> 1111 (-3.73%) helped: 43 HURT: 0 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 0.33% max: 25.00% x̄: 9.10% x̃: 5.56% 95% mean confidence interval for bundles value: -1.00 -1.00 95% mean confidence interval for bundles %-change: -11.60% -6.60% Bundles are helped. total quadwords in shared programs: 3689 -> 3642 (-1.27%) quadwords in affected programs: 2025 -> 1978 (-2.32%) helped: 47 HURT: 0 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 0.19% max: 25.00% x̄: 7.86% x̃: 3.85% 95% mean confidence interval for quadwords value: -1.00 -1.00 95% mean confidence interval for quadwords %-change: -10.30% -5.42% Quadwords are helped. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-26 08:37:09 -07:00
Alyssa Rosenzweig	cb6dea6b4d	pan/midgard: Share mir_nontrivial_outmod To be used with redundant move elimination. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-26 08:37:08 -07:00
Alyssa Rosenzweig	b6946d35c8	pan/midgard: Implement texture RA total instructions in shared programs: 3916 -> 3665 (-6.41%) instructions in affected programs: 1405 -> 1154 (-17.86%) helped: 35 HURT: 0 helped stats (abs) min: 1 max: 21 x̄: 7.17 x̃: 3 helped stats (rel) min: 3.00% max: 28.57% x̄: 20.11% x̃: 21.74% 95% mean confidence interval for instructions value: -9.35 -4.99 95% mean confidence interval for instructions %-change: -22.75% -17.46% Instructions are helped. total bundles in shared programs: 2472 -> 2256 (-8.74%) bundles in affected programs: 906 -> 690 (-23.84%) helped: 32 HURT: 0 helped stats (abs) min: 1 max: 18 x̄: 6.75 x̃: 3 helped stats (rel) min: 5.56% max: 32.26% x̄: 20.83% x̃: 16.67% 95% mean confidence interval for bundles value: -9.09 -4.41 95% mean confidence interval for bundles %-change: -23.77% -17.89% Bundles are helped. total quadwords in shared programs: 3965 -> 3689 (-6.96%) quadwords in affected programs: 1568 -> 1292 (-17.60%) helped: 35 HURT: 0 helped stats (abs) min: 1 max: 21 x̄: 7.89 x̃: 3 helped stats (rel) min: 2.08% max: 28.57% x̄: 19.87% x̃: 20.00% 95% mean confidence interval for quadwords value: -10.38 -5.39 95% mean confidence interval for quadwords %-change: -22.57% -17.17% Quadwords are helped. total registers in shared programs: 411 -> 392 (-4.62%) registers in affected programs: 76 -> 57 (-25.00%) helped: 15 HURT: 0 helped stats (abs) min: 1 max: 2 x̄: 1.27 x̃: 1 helped stats (rel) min: 9.09% max: 50.00% x̄: 30.97% x̃: 33.33% 95% mean confidence interval for registers value: -1.52 -1.01 95% mean confidence interval for registers %-change: -39.12% -22.82% Registers are helped. total threads in shared programs: 426 -> 432 (1.41%) threads in affected programs: 6 -> 12 (100.00%) helped: 3 HURT: 0 helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2 helped stats (rel) min: 100.00% max: 100.00% x̄: 100.00% x̃: 100.00% Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-26 08:37:08 -07:00
Alyssa Rosenzweig	13f61f24ea	pan/midgard: Fix backwards blend color load The source and destination were incorrectly flipped in the move, but some details of our internal regalloc made this function anyway. Now that we're changing the regalloc, we need to fix this to avoid regressing blend shaders. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-26 08:37:08 -07:00
Alyssa Rosenzweig	a99ecc2b2b	pan/midgard: Fix scheduling mishap We shouldn't try to schedule onto a vmul if the last unit was a smul; that would force a break ("traveling back in time"). total bundles in shared programs: 2519 -> 2472 (-1.87%) bundles in affected programs: 791 -> 744 (-5.94%) helped: 20 HURT: 0 helped stats (abs) min: 1 max: 9 x̄: 2.35 x̃: 1 helped stats (rel) min: 1.52% max: 11.76% x̄: 7.94% x̃: 7.69% 95% mean confidence interval for bundles value: -3.47 -1.23 95% mean confidence interval for bundles %-change: -9.36% -6.51% Bundles are helped. total quadwords in shared programs: 4028 -> 3965 (-1.56%) quadwords in affected programs: 1223 -> 1160 (-5.15%) helped: 17 HURT: 0 helped stats (abs) min: 1 max: 17 x̄: 3.71 x̃: 2 helped stats (rel) min: 2.97% max: 10.64% x̄: 6.97% x̃: 7.14% 95% mean confidence interval for quadwords value: -5.71 -1.70 95% mean confidence interval for quadwords %-change: -8.03% -5.91% Quadwords are helped. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-26 08:37:08 -07:00
Alyssa Rosenzweig	e4038f9445	pan/midgard: Fix vector->scalar swizzles The swizzle should be taken on the masked component, rather than unconditionally X. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-26 08:37:08 -07:00
Alyssa Rosenzweig	10324095d2	pan/midgard: Add dead move elimination pass This is a special case of DCE designed to run after the out-of-ssa pass to cleanup special register lowering. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-26 08:37:08 -07:00
Alyssa Rosenzweig	082485d663	pan/midgard: Move DCE into its own file Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-26 08:37:08 -07:00
Alyssa Rosenzweig	f9e619fa82	pan/midgard: Add mir_rewrite_dst_tag helper Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-26 08:37:08 -07:00
Alyssa Rosenzweig	b3cab85606	pan/midgard: Fix flipped register bias fields We mixed up component_lo and full, which made it appear that we had less freedom in RA than we actually do. Fix this to fix some disassemblies as well as prepare for RA with the bias field. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-26 08:37:08 -07:00
Alyssa Rosenzweig	be56840d5a	pan/midgard: Update RA for cubemap coords Following the RA work, we apply the same technique to eliminate the move to r27 when loading cubemaps. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-26 08:37:08 -07:00
Eric Engestrom	d2de5b6ba2	anv+tu+radv: delete unusable dev_icd.json As per previous commit, Meson doesn't support using uninstalled libs; they're simply not ready until `ninja install` is run, so delete them. Suggested-by: Jason Ekstrand <jason@jlekstrand.net> Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> # for anv Reviewed-by: Eric Anholt <eric@anholt.net> # for tu Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> # for radv	2019-07-26 14:47:53 +00:00
Eric Engestrom	2605e9fe46	docs: fix intel_icd.json path Meson doesn't support using uninstalled libs; they're simply not ready until `ninja install` is run, at which point one might as well use the proper icd.json file in the install folder. Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-07-26 14:47:53 +00:00
Bas Nieuwenhuizen	9653d80de1	vulkan/wsi/x11: Increase the effective min. images for mailbox. We need 5 images: 1) CPU work 2) GPU work 3) idle 4) queued for flip 5) presenting Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-07-26 16:37:28 +02:00
Bas Nieuwenhuizen	5eae9bfbfc	vulkan/wsi/x11: Wait for GPU work before present with mailbox. Otherwise the wait only happens at flip time, which messes with keeping idle buffers around if the GPU work makes the image miss the next flip. I decided not to use the wait fences as those are still xshm fences, so that means we'd still have to wait in the application. Just doing it before presenting makes things simpler. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-07-26 16:37:28 +02:00
Bas Nieuwenhuizen	cc6a72a002	vulkan/wsi/x11: Allow using thread present-only. This allows doing a potential long blocking operation before present. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-07-26 16:37:28 +02:00
Bas Nieuwenhuizen	55da4e1ec2	vulkan/wsi: Use one fence per image. Much easier to work with if we want to use them in the WS-specific WSI implementation. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-07-26 16:37:28 +02:00
Lionel Landwerlin	0fb61dfdeb	spirv: propagate access qualifiers through ssa & pointer Not only variables can be flagged as NonUniformEXT but also expressions. We're currently ignoring it in an expression such as: imageLoad(data[nonuniformEXT(rIndex)], 0) The associated SPIRV: OpDecorate %69 NonUniformEXT ... %69 = OpLoad %61 %68 This change propagates access qualifiers through ssa & pointers so that when it hits an OpLoad/OpStore style instruction, qualifiers are not forgotten. Fixes failures in the following tests: dEQP-VK.descriptor_indexing.* Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `8ed583fe52` ("spirv: Handle the NonUniformEXT decoration") Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-07-26 14:09:55 +00:00
Lionel Landwerlin	86b53770e1	spirv: wrap push ssa/pointer values This refactor allows for common code to apply decoration on all ssa/pointer values. In particular this will allow us to propagate access qualifiers. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Suggested-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-07-26 14:09:55 +00:00
Lionel Landwerlin	8c330728f3	nir: add access to image_deref intrinsics SPIRV added the ability to access variables and have expressions non dynamically uniform and because spirv_to_nir generates deref instructions, we'll need to have that access there. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: <mesa-stable@lists.freedesktop.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-07-26 14:09:55 +00:00
Yevhenii Kolesnikov	02ecf16a70	main: unreference ATIFragmentShader program before creating new one Old program was overwritten without release of memory. Signed-off-by: Yevhenii Kolesnikov <yevhenii.kolesnikov@globallogic.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-07-26 12:51:05 +00:00
Yevhenii Kolesnikov	fad848094f	state_tracker: Add destroying routine for feedback and select stages Fixes leaking memory on iris. Signed-off-by: Yevhenii Kolesnikov <yevhenii.kolesnikov@globallogic.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-07-26 15:35:03 +03:00
Iago Toral Quiroga	1a99fc0fd0	v3d: fix glDrawTransformFeedback{Instanced}() This needs to take the vertex count from the provided transform feedback buffer. v2: - don't take the vertex count from the underlying buffer, instead, take it from a v3d subclass of pipe_stream_output_target (Eric). Fixes piglit tests: spec/ext_transform_feedback2/draw-auto spec/ext_transform_feedback2/draw-auto instanced Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-26 08:29:41 +02:00
Iago Toral Quiroga	47eb74ae00	v3d: subclass pipe_streamout_output_target to record TF vertices written Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-26 08:29:41 +02:00
Iago Toral Quiroga	39df568ca1	v3d: refactor v3d_tf_statistics_record slightly Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-26 08:29:41 +02:00
Alyssa Rosenzweig	2f9236096a	Revert "panfrost: Don't DIY point size/coord fields" This reverts commit `4508f43eed`, which broke a bunch of dEQP tests (e.g. in dEQP-GLES2.functional.draw.draw_arrays.*) Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-25 13:17:22 -07:00
Jason Ekstrand	295e5a17da	anv: Disable transform feedback on gen7 It's totally implementable, it's just that the plumbing is a bit different and we never hooked it up. Don't advertise a broken feature. Fixes: `36ee2fd61c` "anv: Implement the basic form of VK_EXT_transform_feedback"	2019-07-25 14:58:14 -05:00
Pierre-Eric Pelloux-Prayer	cd02f60c1e	mesa: Fix GetTextureImage error reporting, again Iago Toral Quiroga fixed this in commit `94f740e3fc`, but it recently regressed in `0d8826f723`. Quoting Iago's original commit message for the fix: GetTex*Image should return INVALID_ENUM if target is not valid, however, GetTextureImage does not receive a target, and instead should return INVALID_OPERATION if the effective target is not valid. From the OpenGL 4.6 core profile spec, section 8.11 Texture Queries: "An INVALID_OPERATION error is generated by GetTextureImage if the effective target is not one of TEXTURE_1D, TEXTURE_2D, TEXTURE_3D, TEXTURE_1D_ARRAY, TEXTURE_2D_ARRAY, TEXTURE_CUBE_MAP_ARRAY, TEXTURE_RECTANGLE, or TEXTURE_CUBE_MAP (for GetTextureImage only)." Note that this differs from the original ARB_direct_state_access spec. However, the EXT_direct_state_access version does take a target parameter, so it should continue reporting INVALID_ENUM. Fixes KHR-GL45.direct_state_access.textures_image_query_errors. Fixes: `0d8826f723` ("mesa: refactor get_texture_image to remove duplicate code") Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-07-25 18:43:40 +00:00
Kenneth Graunke	0e24d10ff5	iris: Use gen_mi_builder to handle CS ALU operations. In a few cases, we switch to MI_MATH instead of MI_PREDICATE, just because we were already doing math and it's easier to chain together. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-07-25 18:42:55 +00:00
Kenneth Graunke	fe08aa67a8	intel/mi: Add a unit test for gen_mi_store_if(). This tests that predicated stores work. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-07-25 18:42:55 +00:00
Kenneth Graunke	74063ee61a	intel/mi: Add a new gen_mi_store_if() helper. This performs predicated MI_STORE_REGISTER_MEM commands, assuming that the condition is already loaded into MI_PREDICATE_DATA. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-07-25 18:42:55 +00:00
Kenneth Graunke	27b5817b6c	intel/mi: Add gen_mi_nz() and gen_mi_z() helpers. These provide comparisons against zero. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-07-25 18:42:55 +00:00
Kenneth Graunke	4e16b838ba	intel/mi: Add a gen_mi_ior() to go with gen_mi_iand() Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-07-25 18:42:55 +00:00
Kenneth Graunke	79b8e3c260	intel/mi: Optimize away LOAD_REGISTER_REG from a register to itself We might want to resolve something to be in a particular register, so we can access it outside of the gen_mi framework...but it may already be in that register, at which point there's no work to do. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-07-25 18:42:55 +00:00
Kenneth Graunke	fe7ed6b057	iris: Make iris_query.c a genxml-compiled file. This will let us use Jason's new MI-builder shortly. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-07-25 18:42:55 +00:00
Kenneth Graunke	975f7e4a59	iris: Move iris_resolve_conditional_render to the vtable. It's going to be in genxml code shortly. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-07-25 18:42:55 +00:00
Kenneth Graunke	6c4c7b600d	iris: Refactor genxml macros and inlines into iris_genx_macros.h. This will let us put the genxml boilerplate in one place, before we expand genxml to more files shortly. Like i965/genX_boilerplate.h. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-07-25 18:42:55 +00:00
Kenneth Graunke	204a3bb816	iris: Make an iris_genx_protos.h header for prototypes. This lets us specify the prototypes once, instead of cut and pasting them per generation. isl uses a similar approach (isl_genX_priv.h). Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-07-25 18:42:55 +00:00
Marek Olšák	068093e84c	radeonsi: fix DAL hang due to incorrect DCC offset on Raven Set the correct relative offset. Fixes: `f8b6c5a` "radeonsi: rewrite si_get_opaque_metadata, also for gfx10 support"	2019-07-25 14:09:11 -04:00
Jason Ekstrand	9d2aa67c47	anv: Disable subgroup arithmetic on gen7 Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-07-25 16:43:16 +00:00
Eric Anholt	f60defa72d	gitlab-ci: Add a shader-db run using v3d on drm-shim. This provides significant compiler coverage during CI at a fairly low cost in CPU time (~17s per thread for 4 threads on gst-gitlab-htz-runner3). I'm leaving wget in the docker image, as once this is in master I'm planning on having an automatic shader-db comparison between master and the branch included in the artifacts. I also haven't done freedreno yet, because it has some races when run in multithreaded mode that I'm still tracking down. Reviewed-by: Eric Engestrom <eric@engestrom.ch>	2019-07-25 08:56:55 -07:00
Eric Anholt	dd3d0b2897	gitlab-ci: Only keep the build logs as artifacts. On a build failure, we were tarring up the whole ccache directory, build.ninja, build products, etc. This was over 400MB compressed on a recent early meson-main build failure, which fd.o then has to hang on to for 4 weeks. The build logs are probably the interesting part, are potentially useful regardless ("how did CI's build flags differ from mine?"), and are <500k uncompressed on my personal meson build. Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2019-07-25 08:56:49 -07:00
Eric Anholt	f68b987387	gitlab-ci: Always set libdir to lib/ I introduced libdir for cross-builds so we could point at the resulting drivers without per-arch dependencies, but I'd rather not have to type x86_64-linux-whatever for non-cross-builds either. Reviewed-by: Eric Engestrom <eric@engestrom.ch>	2019-07-25 08:56:19 -07:00
Eric Anholt	494ecef6b4	freedreno: Add support for drm-shim. I'm using this for shader-db analysis on x86_64 systems. Reviewed-by: Rob Clark <robdclark@gmail.com>	2019-07-25 08:56:19 -07:00
Eric Anholt	82bf1979d7	v3d: Introduce a DRM shim for calling out to the simulator. The goal is to enable testing of parts of drivers without depending on any particular kernel version or hardware being present. Simply set LD_PRELOAD=$PREFIX/lib/libv3d_drm_shim.so in your environment, and we'll fake a /dev/dri/renderD128 (or whatever the next available node is) using v3dv3. That node can then be used with the surfaceless or gbm EGL platforms. Acked-by: Iago Toral Quiroga <itoral@igalia.com>	2019-07-25 08:56:19 -07:00
Erik Faye-Lund	c5f1432296	glsl: report no function instead of empty candidate list When generating the error message for a missing function error where all available overloads were missing due to a too low GLSL version, we used to report something like this: ---8<--- 0:224(14): error: no matching function for call to `textureCubeLod(samplerCube, vec3, float)'; candidates are: 0:224(14): error: type mismatch ---8<--- This is a pretty confusing error message, and can throw people off when debugging. So let's instead check if any overload is available before we decide what to print. This allows us to report something like this instead: ---8<--- 0:224(14): error: no function with name 'textureCubeLod' 0:224(14): error: type mismatch ---8<--- This is arguably easier to understand for programmers, and doesn't send you on a wild goose chase to figure out what argument is wrong just because you stopped reading the message prematurely. I'm of course referring to a friend, not me. For sure. I would never do that. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-07-25 17:20:10 +02:00
Bas Nieuwenhuizen	7e1fe81f56	radv: Set correct metadata size for GFX9+. Without correct size, radeonsi assumes the metadata is incorrect, which can and will cause issues. Since the metadata is really incorrect without the size, let us fix that. Fixes: `e43cc3e3af` "radv/gfx9: handle GFX9 opaque metadata" Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-07-25 17:07:53 +02:00
Arcady Goldmints-Orlov	832cedfdee	anv: report HOST_ALLOCATION as supported for images Report VK_EXTERNAL_MEMORY_HANDLE_TYPE_HOST_ALLOCATION_BIT_EXT as supported for images. It was being shown supported for buffers, but not images. Fixes: `69cc6272fb` ("anv: Implement VK_EXT_external_memory_host") Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-07-25 09:01:26 -05:00
Samuel Pitoiset	7d11bf2155	radv/gfx10: fix intensity formats by setting ALPHA_IS_ON_MSB This fixes dEQP-VK.rasterization.primitive_size.points.point_size_* This also fixes some black squares with the Sascha SSAO demo. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-25 15:48:24 +02:00
Samuel Pitoiset	6a504ab473	radv/gfx10: use L2 for DMA copy/fill operations It's coherent and faster. GFX7-GFX9 should also support this, but for now we only use L2 on GFX10 because it's untested on previous gens. This fixes dEQP-VK.memory.pipeline_barrier.transfer_* This also fixes some missing geometry in Dawn Of War III because VBOs weren't updated correctly. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-25 15:48:21 +02:00
Alyssa Rosenzweig	9ce75826cb	pan/midgard: Optimize varying projection We add a new opt pass fusing perspective projection with varyings. Minor win..? We don't combine non-varying projections, since if we're too aggressive, the extra load/store traffic will hurt us so it's not really a win in practice. total instructions in shared programs: 3915 -> 3913 (-0.05%) instructions in affected programs: 76 -> 74 (-2.63%) helped: 1 HURT: 0 total bundles in shared programs: 2520 -> 2519 (-0.04%) bundles in affected programs: 46 -> 45 (-2.17%) helped: 1 HURT: 0 total quadwords in shared programs: 4027 -> 4025 (-0.05%) quadwords in affected programs: 80 -> 78 (-2.50%) helped: 1 HURT: 0 Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-25 06:37:22 -07:00
Alyssa Rosenzweig	f6438d1e15	pan/midgard: Add perspective projection recombine pass We don't use it yet, since it's actually a shader-db regression. This is primarily helpful as an intermediate step for attaching projection to varyings. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-25 06:37:22 -07:00
Alyssa Rosenzweig	8ddb0eda42	pan/midgard: Force perspective ops to use vec4 It doesn't make sense to use them with anything less. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-25 06:37:22 -07:00
Alyssa Rosenzweig	b06951d343	pan/midgard: Add R27-only op handling We use a special conflicting register class. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-25 06:37:22 -07:00
Alyssa Rosenzweig	f55a760d0c	pan/midgard: Add OP_R27_ONLY helper While load/store ops like st_vary can take an argument in either r26/r27, ops like those for perspective projection must specifically take their argument in r27. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-25 06:37:22 -07:00
Alyssa Rosenzweig	233c0faadd	pan/midgard: Enable RA for st_vary Now that all the piping is in place to do so without regressions, we flip on automatic register allocation for varyings. Hooray! total instructions in shared programs: 4025 -> 3915 (-2.73%) instructions in affected programs: 1667 -> 1557 (-6.60%) helped: 62 HURT: 0 helped stats (abs) min: 1 max: 3 x̄: 1.77 x̃: 2 helped stats (rel) min: 0.93% max: 20.00% x̄: 10.80% x̃: 10.64% 95% mean confidence interval for instructions value: -1.89 -1.66 95% mean confidence interval for instructions %-change: -12.50% -9.11% Instructions are helped. total bundles in shared programs: 2683 -> 2520 (-6.08%) bundles in affected programs: 1066 -> 903 (-15.29%) helped: 62 HURT: 0 helped stats (abs) min: 1 max: 3 x̄: 2.63 x̃: 3 helped stats (rel) min: 2.94% max: 42.86% x̄: 23.85% x̃: 22.50% 95% mean confidence interval for bundles value: -2.83 -2.43 95% mean confidence interval for bundles %-change: -27.73% -19.97% Bundles are helped. total quadwords in shared programs: 4192 -> 4027 (-3.94%) quadwords in affected programs: 1584 -> 1419 (-10.42%) helped: 62 HURT: 0 helped stats (abs) min: 1 max: 4 x̄: 2.66 x̃: 3 helped stats (rel) min: 1.85% max: 30.00% x̄: 16.49% x̃: 16.52% 95% mean confidence interval for quadwords value: -2.87 -2.46 95% mean confidence interval for quadwords %-change: -19.14% -13.84% Quadwords are helped. total registers in shared programs: 433 -> 411 (-5.08%) registers in affected programs: 67 -> 45 (-32.84%) helped: 23 HURT: 1 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 25.00% max: 50.00% x̄: 41.30% x̃: 50.00% HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 HURT stats (rel) min: 14.29% max: 14.29% x̄: 14.29% x̃: 14.29% 95% mean confidence interval for registers value: -1.09 -0.74 95% mean confidence interval for registers %-change: -45.45% -32.52% Registers are helped. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-25 06:37:22 -07:00
Alyssa Rosenzweig	210dbe3fc1	pan/midgard: Remove check for `class` Fixes classes defaulting to vec4 in some cases. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-25 06:37:22 -07:00
Alyssa Rosenzweig	8842db3a7d	pan/midgard: Move uniforms to special registers The load/store pipes can't take a uniform register in, so an explicit move is necessary here. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-25 06:37:22 -07:00
Alyssa Rosenzweig	ae7acde91f	pan/midgard: Emit st_vary registers in install_registers Now that we have its registers handled normally like the rest of the IR. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-25 06:37:22 -07:00
Alyssa Rosenzweig	c3ad7500d2	pan/midgard: Add mir_lower_special_reads helper Given the constraints on special registers, we add a helper for lowering these by inserting moves (copies) where needed to satisfy the ISA constraints. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-25 06:37:22 -07:00
Alyssa Rosenzweig	e169301bd8	pan/midgard: Add emit_explicit_constant helper We generalize the constant emission helper used in fragment writeout as we'll also need it for vertex outputs. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-25 06:37:22 -07:00
Alyssa Rosenzweig	eedd6c1dd0	pan/midgard: Add mir_rewrite_index_src_tag Specialized version of a rewrite that only rewrites a certain type of instruction. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-25 06:37:22 -07:00
Alyssa Rosenzweig	5d5caf10af	pan/midgard: Add class check This ensures the rules for accessing special register classes are satisfied. This is asserted as a prepass should have lowered offending uses to something satisfying these rules. Special register classes are not work registers and cannot be used for RMW operations; they are essentially 1-way pipes straight into/from fixed-function logic in the shader cores. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-25 06:37:22 -07:00
Alyssa Rosenzweig	91195bdff1	pan/midgard: Implement class spilling We reuse the same register spilling mechanism as for work->memory to spill special->work registers, e.g. to allow writing out more than 2 vec4 varyings (without better scheduling anyway). Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-25 06:37:22 -07:00
Alyssa Rosenzweig	0f38f6466e	pan/midgard: Extend liveness analysis to st_vary These can consume sources now. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-25 06:37:22 -07:00
Alyssa Rosenzweig	dca0166ce1	pan/midgard: Implement load/store register classing This does not yet support special->work spilling, nor does it support multiclass breakup. These corner cases will be handled in succeeding commits. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-25 06:37:22 -07:00
Alyssa Rosenzweig	839b80aa89	pan/midgard: Allocate special register classes We'll want to also handle load/store and texture registers in our RA loop. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-25 06:37:22 -07:00
Alyssa Rosenzweig	480b502443	pan/midgard: Move copy propagation into its own file We also expose some utilities it uses as general MIR helpers. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-25 06:37:22 -07:00
Alyssa Rosenzweig	b8caaa3000	pan/midgard: Add mir_simple_swizzle helper Checks for x/xy/xyz/xyzw style swizzles (slightly more general but you get the idea). Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-25 06:37:21 -07:00
Alyssa Rosenzweig	63385a3fdb	pan/midgard: Add mir_single_use helper Helps as an optimization heuristic. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-25 06:37:21 -07:00
Alyssa Rosenzweig	5534fdb7bf	panfrost: Compute I/O counts from shader_info ...rather than exposing it in the vendored compiler region. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-25 06:34:21 -07:00
Alyssa Rosenzweig	4508f43eed	panfrost: Don't DIY point size/coord fields Again, it's in shader_info for us! Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-25 06:34:21 -07:00
Alyssa Rosenzweig	bab4f6c724	panfrost: Use nir_gather_info information about discards No need to track this ourselves! Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-25 06:34:21 -07:00
Alyssa Rosenzweig	48991c7a1f	panfrost: Use NIR helper invocations info We don't need to guesstimate this ourselves. This will help when we bring up derivatives. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-25 06:34:21 -07:00
Alyssa Rosenzweig	fb2fe6e7bc	panfrost/sfbd: Flesh out fragment job We include a zsbuf attachment function based on how the corresponding MFBD code works, as well as extending cbufs to mipmapped rendering while we're at it. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-25 06:34:21 -07:00
Alyssa Rosenzweig	e6802af8c3	panfrost: Disable tiled formats on SFBD systems Just because we don't have the format codes to render to them yet. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-25 06:34:20 -07:00
Alyssa Rosenzweig	990e24469c	panfrost: Move require_sfbd to screen We'll need it to specialize resource creation by chip. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-25 06:34:20 -07:00
Alyssa Rosenzweig	a9c73e825a	panfrost: Reserve, but do not upload, shader padding Fixes invalid read errors reported by valgrind. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-25 06:34:20 -07:00
Alyssa Rosenzweig	b2a3ca6bd5	util/ra: Add a getter for a node class Complements the existing getters and the setter for node class. To be used in the Panfrost RA refactor. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-25 06:14:12 -07:00
Tomeu Vizoso	688d9b4fb7	panfrost/ci: Update kernel to 5.2 Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com> Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-25 15:08:44 +02:00
Nicolas Dufresne	08f1cefecd	egl: Also query modifiers when exporting DMABuf This fixes eglExportDMABUFImageQueryMESA() so it will report the modifiers of the underlying image. Without this information, re-importing will likely be broken as it is rare these days that no modifiers are used. Reviewed-by: Eric Engestrom <eric@engestrom.ch> Reviewed-by: Daniel Stone <daniels@collabora.com> Fixes: `8f7338f284` ("egl: add initial EGL_MESA_image_dma_buf_export v2.4")	2019-07-25 05:14:36 +00:00
Heinrich Fink	4886924262	mesa: Enable GL_MESA_framebuffer_flip_y for GL 4.3 Extend MESA_framebuffer_flip_y to be used with OpenGL versions 4.3 and higher. OpenGL 4.3 adds FramebufferParameteri needed by this extension. Reviewed-by: Fritz Koenig <frkoenig@google.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-07-25 04:47:38 +00:00
Alyssa Rosenzweig	31c9fcbd0f	panfrost: Don't expose some atomic stuff even with dEQP Fixes dEQP crashes. Fixes: `2f93ecd654` ("panfrost: Fake CAPs for dEQP-GLES31") Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-24 17:21:12 -07:00
Dave Airlie	16fcbb2eba	gallium: fix windows build from params change. This is why we can't have nice things. I'm sure there's some way to do this with {0} but I really don't have time for that. Fixes: `2631fd3b0b` ("gallivm: rework lp_build_tgsi_soa to take a struct") Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-07-25 10:02:22 +10:00
Jonathan Marek	97c8314c5f	nir/algebraic: add scmp algebraic optimizations When 'x' is the result of a scmp op: x != 0.0 or x == 1.0: passthrough x == 0.0 or x != 1.0: invert Signed-off-by: Jonathan Marek <jonathan@marek.ca> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-07-24 17:36:21 -04:00
Jonathan Marek	9be902097c	nir/algebraic: add option to lower fall_equalN/fany_nequalN Add generic lowerings for fall_equalN/fany_nequalN. These should be optimal for vec4 backends that don't have any special instructions for it, as long as they support saturate. Signed-off-by: Jonathan Marek <jonathan@marek.ca> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-07-24 17:36:21 -04:00
Jonathan Marek	397375d3f3	nir/algebraic: add fdot2 optimizations Add simple fdot2 optimizations that are missing. Signed-off-by: Jonathan Marek <jonathan@marek.ca> Reviewed-by: Thomas Helland <thomashelland90@gmail.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-07-24 17:36:21 -04:00
Jonathan Marek	1e089d0575	nir/algebraic: add option to lower fdph For backends that don't have an 'fdph' instruction. Signed-off-by: Jonathan Marek <jonathan@marek.ca> Reviewed-by: Thomas Helland <thomashelland90@gmail.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-07-24 17:36:21 -04:00
Jonathan Marek	bc3b6168ba	nir: replace lower_sincos with algebraic opt This version has less ops for the same precision. Signed-off-by: Jonathan Marek <jonathan@marek.ca> Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com> Acked-by: Matt Turner <mattst88@gmail.com>	2019-07-24 17:36:21 -04:00
Jonathan Marek	5a4e71c082	nir/algebraic: allow swizzle in nir_algebraic replace expression This is to allow optimizations in nir_opt_algebraic not otherwise possible Signed-off-by: Jonathan Marek <jonathan@marek.ca> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Acked-by: Matt Turner <mattst88@gmail.com>	2019-07-24 17:36:21 -04:00
Rob Clark	b4f4768672	gallium/u_transfer_helper: fix assert in RGTC case Previously we'd hit the unreachable() for uploading RGTC. Signed-off-by: Rob Clark <robdclark@chromium.org> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-07-24 21:11:06 +00:00
Yevhenii Kolesnikov	53730ab32c	main: Free memory allocated for gl_bitmap_atlas structure Structure itself wasn't freed during context tear-down, causing a memory leak on iris. Signed-off-by: Yevhenii Kolesnikov <yevhenii.kolesnikov@globallogic.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Signed-off-by: Marek Olšák <marek.olsak@amd.com>	2019-07-24 15:31:26 -04:00
Daniel Schürmann	e272fdd508	nir,intel: lower if (cond) demote() to new intrinsic demote_if(cond) This will effectively enable the optimization in anv. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-07-24 13:02:18 -05:00
Kenneth Graunke	517005b4cf	i965: Use NIR to lower legacy userclipping. This allows us to drop legacy userclip plane handling in both the vec4 and FS backends, and simplifies a few interfaces. v2 (Jason Ekstrand): - Move brw_nir_lower_legacy_clipping to brw_nir_uniforms.cpp because it's i965-specific. - Handle adding the params in brw_nir_lower_legacy_clipping - Call brw_nir_lower_legacy_clipping from brw_codegen_vs_prog Co-authored-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-07-24 18:00:13 +00:00
Jason Ekstrand	d10de25309	anv: Implement VK_EXT_subgroup_size_control Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-07-24 12:55:40 -05:00
Jason Ekstrand	bcef32d49b	anv/pipeline: Plumb pipeline shader stage create flags Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-07-24 12:55:40 -05:00
Jason Ekstrand	2a236c76f8	intel/compiler: Allow for required subgroup sizes Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-07-24 12:55:40 -05:00
Jason Ekstrand	4397eb91c1	intel/compiler: Allow for varying subgroup sizes Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-07-24 12:55:40 -05:00
Jason Ekstrand	799f0f7b28	nir/lower_subgroups: Properly lower masks when subgroup_size == 0 Instead of building a constant mask (which depends on knowing the subgroup size), we build an expression. Because the pass uses the nir_shader_lower_instructions helper, subgroup lowering will be run on any newly emitted instructions as well as the previously existing instructions. In particular, if the subgroup size is known, the newly emitted subgroup_size intrinsic will get turned into a constant and a later constant folding pass will clean it up. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-07-24 12:55:40 -05:00
Jason Ekstrand	256e6c2d94	vulkan: Update the XML and headers to 1.1.116 Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-07-24 12:55:40 -05:00
Jason Ekstrand	c84b8eeeac	intel/compiler: Be more conservative about subgroup sizes in GL The rules for gl_SubgroupSize in Vulkan require that it be a constant that can be queried through the API. However, all GL requires is that it's a uniform. Instead of always claiming that the subgroup size in the shader is 32 in GL like we have to do for Vulkan, claim 8 for geometry stages, the maximum for fragment shaders, and the actual size for compute. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-07-24 12:55:40 -05:00
Jason Ekstrand	1981460af2	intel/compiler: Lower gl_SubgroupSize in postprocess_nir Instead of lowering the subgroup size so early, wait until we have more information. In particular, we're going to want different subgroup sizes from different stages depending on the API. We also defer lowering of subgroup masks because the ge/gt masks require the subgroup size to generate a subgroup mask. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-07-24 12:55:40 -05:00
Jason Ekstrand	f62227f2b7	intel/nir: Make brw_nir_apply_sampler_key more generic Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-07-24 12:55:40 -05:00
Sagar Ghuge	87cef718e1	nir: Add lowering for nir_op_irem and nir_op_imod Tested on Gen > 9. v2: 1) Fix lowering 2) Keep a consistent i/u order (Matt Turner) Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-07-24 10:33:09 -07:00
Yevhenii Kolesnikov	882fe09a74	main: Fix memleaks in mesa_use_program Add freeing of SubroutineIndexes to the _mesa_free_shader_state. Fixes: `4566aaaa5b` ("mesa/subroutines: start adding per-context subroutine index support (v1.1)") Signed-off-by: Yevhenii Kolesnikov <yevhenii.kolesnikov@globallogic.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-07-24 16:54:21 +00:00
Andrii Simiklit	fa2fc68de1	intel/compiler: don't use a keyword struct for a class fs_reg warning: struct 'fs_reg' was previously declared as a class Fixes: `e64be391` ("intel/compiler: generalize the combine constants pass") Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com> Signed-off-by: Andrii Simiklit <andrii.simiklit@globallogic.com>	2019-07-24 13:26:42 +00:00
Qiang Yu	280dfa02fa	lima/ppir: fix disassembler temp read/write print temp read/write use negative offset, and handle alignment==1 case. Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com> Signed-off-by: Qiang Yu <yuq825@gmail.com>	2019-07-24 20:39:39 +08:00
Eric Engestrom	e7e31b18d6	gallium+mesa: fix tgsi_semantic array type Fixes: `ed23335a31` ("gallium: use enums in p_shader_tokens.h (v2)") Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-07-24 09:33:29 +01:00
Eric Engestrom	f986741a91	util: fix no-op macro (bad number of arguments) Fixes: `b8e077daee` ("util: no-op __builtin_types_compatible_p() for non-GCC compilers") Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-07-24 09:13:58 +01:00
Samuel Pitoiset	4389e85dc9	radv/gfx10: enable VK_EXT_transform_feedback When a pipeline uses transform feedback, the driver falls back to the legacy path because NGG support for streamout is a non-trivial amount of work. AMDVLK also uses the legacy path for streamout, while RadeonSI uses the new NGG path. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-24 08:23:37 +02:00
Samuel Pitoiset	a3a4fa1860	radv/gfx10: do not enable NGG if a pipeline uses XFB NGG GS for streamout requires a bunch of work, so enable it with the legacy path only for now. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-24 08:23:34 +02:00
Samuel Pitoiset	09abe571a2	radv/gfx10: emit streamout shader config Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-24 08:23:32 +02:00
Samuel Pitoiset	383c2e625a	radv/gfx10: declare streamout user SGPRs Required for legacy streamout. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-24 08:23:30 +02:00
Samuel Pitoiset	fd195d8085	radv/gfx10: update streamout descriptors Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-24 08:23:27 +02:00
Samuel Pitoiset	ea337c8b7e	radv/gfx10: fix VS input VGPRs with the legacy path For some reason, InstanceID is VGPR3 although StepRate0 is set to 1. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-24 08:23:21 +02:00
Dave Airlie	2631fd3b0b	gallivm: rework lp_build_tgsi_soa to take a struct The parameters were getting messy and I have to add a few more for compute shaders, so clean it up before proceeding. Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2019-07-24 09:20:09 +10:00
Jason Ekstrand	9700e45463	nir/lower_io: Return SSA defs from helpers I can't find a single place where nir_lower_io is called after going out of SSA which is the only real reason why you wouldn't do this. Returning SSA defs is more idiomatic and is required for the next commit. Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-07-23 17:48:49 -05:00
Dylan Baker	7cf50af6f5	meson: allow building all glx without any drivers Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111016 Fixes: `a47c525f32` ("meson: build glx") Acked-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-23 15:34:23 -07:00
Jan Zielinski	3d6cffffcf	swr/rasterizer: Fix 3D resource copies. Ensure constant attributes stay constant with barycentric interpolation. Reviewed-by: Alok Hota <alok.hota@intel.com>	2019-07-23 21:55:09 +02:00
Jan Zielinski	ec4a5f5e13	swr/rasterizer: Fix return type on SIMD8 version of Clamp and Normalize utility functions Reviewed-by: Alok Hota <alok.hota@intel.com>	2019-07-23 21:55:09 +02:00
Jan Zielinski	47cdb0ac27	swr/rasterizer: small formatting changes Reviewed-by: Alok Hota <alok.hota@intel.com>	2019-07-23 21:55:09 +02:00
Jan Zielinski	ccc6b4f96b	swr/rasterizer: Adding support for unhandled clipEnable state Clipping is not correctly handled by the rasterizer - fixing this. Reviewed-by: Alok Hota <alok.hota@intel.com>	2019-07-23 21:55:09 +02:00
Bas Nieuwenhuizen	e5b3f0a867	radv/gfx10: Enable binning. Numbers for Talos: gfx10 without binning: 77.0 77.7 77.2 77.6 gfx10 with binning: 82.3 82.0 82.7 82.4 Reviewed-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-07-23 21:26:59 +02:00
Bas Nieuwenhuizen	3268c806fb	radv/gfx10: Implement bin size calculation. Reviewed-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-07-23 21:26:59 +02:00
Bas Nieuwenhuizen	4b757697e9	radv/gfx9: Select between depth/color bins based on area. Mirrors radeonsi. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Dave Airlie <airlied@redhat.com>	2019-07-23 21:26:59 +02:00
Bas Nieuwenhuizen	22f2f76789	radv: Generalize binning settings. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Dave Airlie <airlied@redhat.com>	2019-07-23 21:26:59 +02:00
Bas Nieuwenhuizen	793cbf6161	radv/gfx10: Use new scan converter. Reviewed-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-07-23 21:26:59 +02:00
Bas Nieuwenhuizen	4058b354c5	radv: Set FLUSH_ON_BINNING_TRANSITION. Reviewed-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-07-23 21:26:59 +02:00
Bas Nieuwenhuizen	906fcfccfd	radv: Use pbb_allow for framebuffer BREAK_BATCH. Ported from radeonsi. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Dave Airlie <airlied@redhat.com>	2019-07-23 21:26:59 +02:00
Marek Olšák	264ab6ffcd	radeonsi/nir: set tgsi_shader_info::uses_fbfetch for KHR_blend_equation_adv. This doesn't implement the color buffer load. Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>	2019-07-23 15:08:37 -04:00
Marek Olšák	45556731b6	tgsi/scan: add uses_fbfetch Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>	2019-07-23 15:08:37 -04:00
Marek Olšák	ee858871bd	radeonsi: fail if importing a texture with incorrect last_level or samples v2: don't fail if the texture comes from an incompatible driver. Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> (v1)	2019-07-23 15:08:27 -04:00
Marek Olšák	f8b6c5a1a6	radeonsi: rewrite si_get_opaque_metadata, also for gfx10 support Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>	2019-07-23 15:03:51 -04:00
Marek Olšák	e718f8e713	radeonsi: simplify si_get_input_prim and remove incorrect TODO comment u_vertices_per_prim(QUADS) is the same as TRIANGLES. Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>	2019-07-23 15:03:49 -04:00
Marek Olšák	16392cc3f3	radeonsi/gfx10: fix and enable CLEAR_STATE it was a driver bug. Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>	2019-07-23 15:03:47 -04:00
Marek Olšák	ad642d5b3a	radeonsi: stop using info.opcode_count[TGSI_OPCODE_INTERP_SAMPLE] Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>	2019-07-23 15:03:46 -04:00
Marek Olšák	6ac2146a98	ac/nir: implement nir_op_pack_{us}norm_2x16 Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>	2019-07-23 15:03:44 -04:00
Pierre-Eric Pelloux-Prayer	079e5f73d7	mesa/st: rewrite src var when lowering tex_src_plane The assign_extra_samplers() adds the needed extra samplers but they need to be used in the nir_tex_instr. Otherwise the plane information is simply lost and all nir_tex_instr use the same sampler. Here's an example of the bug: NIR before st_nir_lower_tex_src_plane: vec1 32 ssa_8 = load_const (0x00000000 /* 0.000000 */) vec4 32 ssa_9 = tex ssa_0 (texture_deref), ssa_0 (sampler_deref), ssa_5 (coord), ssa_8 (plane) vec1 32 ssa_10 = load_const (0x00000001 /* 0.000000 */) vec4 32 ssa_11 = tex ssa_0 (texture_deref), ssa_0 (sampler_deref), ssa_5 (coord), ssa_10 (plane) After: vec4 32 ssa_9 = tex ssa_0 (texture_deref), ssa_0 (sampler_deref), ssa_5 (coord) vec4 32 ssa_11 = tex ssa_0 (texture_deref), ssa_0 (sampler_deref), ssa_5 (coord) This fixes the following piglit test for radeonsi + NIR: - ext_image_dma_buf_import-sample_nv12 - ext_image_dma_buf_import-sample_yuv420 - ext_image_dma_buf_import-sample_yvu420 Reviewed-by: Eric Anholt <eric@anholt.net> Signed-off-by: Marek Olšák <marek.olsak@amd.com>	2019-07-23 15:00:43 -04:00
Pierre-Eric Pelloux-Prayer	e9cf8c1d30	u_blitter: add a msaa parameter to util_blitter_clear Fixes: `ea5b7de138` ("radeonsi: make gl_SampleMaskIn = 0x1 when MSAA is disabled") Tested-by: Michel Dänzer <michel.daenzer@amd.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-07-23 14:42:20 -04:00
Pierre-Eric Pelloux-Prayer	d811446e6c	u_blitter: enable msaa when dst num samples is > 1 Commit `ea5b7de138` broke some piglit tests on radeonsi (Bonaire hardware). This commit fixes half of the regression by enabling msaa if the dest surface has more than 1 sample (instead of hardcoding it to false). Fixes: `ea5b7de138` ("radeonsi: make gl_SampleMaskIn = 0x1 when MSAA is disabled") Tested-by: Michel Dänzer <michel.daenzer@amd.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-07-23 14:42:20 -04:00
Jason Ekstrand	ae392d73c9	nir/gather_info: Look for uses of helper invocations The one obvious omission here is gl_HelperInvocation itself. However, the spec doesn't require that we generate them when gl_HelperInvocation is used, it merely mandates that we report them if they are there. Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-07-23 13:40:41 -05:00
Jason Ekstrand	41ab92a327	nir/gather_info: Move setting uses_64bit out of the switch Otherwise, as we add things to the switch, we're going to forget and add some 64-bit op at some point in the future and it'll stop getting flagged. There's no reason why we can't do the check for derivatives. Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-07-23 13:40:41 -05:00
Jason Ekstrand	0e6cb481fa	nir: Add a nir_tex_instr_has_implicit_derivatives helper Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-07-23 13:40:41 -05:00
Jason Ekstrand	7a98c7804c	nir: Move nir_alu_instr_is_comparison to the ALU section Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-07-23 13:40:41 -05:00
Rafael Antognolli	1f4cbc9a06	intel/genxml: Add new test for subgroups. Make sure that a <group> tag within another <group> tag works just fine. v2: rename 'halfbyte' to 'byte' to match the size (Lionel). Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-07-23 17:45:19 +00:00
Rafael Antognolli	fe5ae96d66	intel/genxml: Add basic infra for encoding/decoding unit tests. Adding option to print quiet. v2: Add license header. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-07-23 17:45:19 +00:00
Rafael Antognolli	e25ebe2ec9	intel/gen_decoder: Decode <group> inside <group>. Now we can decode a <group> tag inside another <group> tag, and properly print its indices and content. v2: Use push/pop stack to fields, groups and iters (Lionel). v3: Add assert(iter->level < DECODE_MAX_ARRAY_DEPTH) (Lionel). Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-07-23 17:45:19 +00:00
Rafael Antognolli	f670c2e1ff	intel/gen_decoder: Add the concept of array "levels". We currently only support one level, which is the basic level of a <group> tag. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-07-23 17:45:19 +00:00
Rafael Antognolli	618d054283	intel/gen_decoder: Add array field. We currently use the group->next pointer to iterate through the <group> tags. This changes them to be a type of field, so we can descend into them while iterating, and then go back to the original position. Will be useful when we want to decode <group>'s inside <group>'s, and when there are more <field>'s after a <group> tag. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-07-23 17:45:19 +00:00
Rafael Antognolli	21bdd51942	intel/gen_decoder: Rename internally "group" to "array". A gen_group (group in most of the code) can be of several types: - instruction - struct - register - group (?!?) The <group> tag actually represents an array of elements. So at least in our code, let's call it an array to avoid confusion with gen_group. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-07-23 17:45:19 +00:00
Rafael Antognolli	69506cbb74	intel/gen_decoder: Add gen_spec_load_filename() function. Refactor the code from gen_spec_load_from_path() into a separate function, that can be used with a xml file that doesn't fit the genX.xml filename format. Will be used soon for implementing unit tests for gen_decoder. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-07-23 17:45:19 +00:00
Rafael Antognolli	1f2b22a6bd	intel/gen_decoder: Fix parsing of small genxml file. When using gen_spec_load_from_path, only abort decoding if the read length is 0. Previously, we were aborting if finding an EOF, even if something was read from the file. Also only kill the decoded file if no commands or structs were found, and print a message in such case. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-07-23 17:45:19 +00:00
Guido Günther	85996567f5	kmsro: Extend to include mxsfb-drm This allows using the LCDIF display controllers (with the mxsfb drm modesetting driver) along with the Etnaviv render-only drivers. LCDIF is found on i.MX SoCs. Signed-off-by: Guido Günther <agx@sigxcpu.org> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-23 17:12:10 +00:00
Sagar Ghuge	806e5a37ed	anv: Implement VK_KHR_imageless_framebuffer v2: Pass pointer instead of struct instance (Lionel) v3: 1) Fix small nits (Jason) 2) Add way to detect anv_framebuffer doesn't have attachments (Jason) 3) Get rid of unnecessary pNext chain walk (Jason) 4) Keep framebuffer instance in anv_cmd_state (Jason) v4: 1) Dump attachments from cmd_buffer (Jason) v5: 1) Fix condition check and add assertion (Lionel) Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-07-23 10:01:45 -07:00
Alyssa Rosenzweig	840b806d64	panfrost/midgard: Allocate registers once (per-screen) This should save a lot of per-compile time by using the RA the way it's actually supposed to be used. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-23 09:06:21 -07:00
Lionel Landwerlin	772a5f9814	anv: fix use of comma operator This doesn't fix any bug at the moment because the next statement is 'true' which happens to be APIMODE_D3D, but if that changes it could. The fixes tag is as far as I could go but the error predates it (2016 is probably far enough). Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `8db6f2e6eb` ("anv/pipeline: Roll genX_pipeline_util.h into genX_pipeline.c") Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-07-23 15:54:48 +00:00
Andrii Simiklit	79ab2c3e57	nir: use | instead of || operator warning: use of logical '||' with constant operand note: use '|' for a bitwise operation Fixes: `758fdce9fe` ("nir: Add some generic helpers for writing lowering passes") Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> Signed-off-by: Andrii Simiklit <andrii.simiklit@globallogic.com>	2019-07-23 18:08:58 +03:00
Arnaud Patard	397f9ba69f	panfrost: Fix T6XX Support While testing kmscube with mesa master, it turns out that kmscube is not working anymore. After bisecting, commit `5a7688fdec` is the culprit. A short trial and error session allowed to find the removed bit of code making kmscube working again. This patch adds it back. Fixes: `5a7688fde` ("panfrost: Use 64-bit descriptors globally") v2: Add comment pointing out this is magic. [Alyssa, trivial] Signed-off-by: Arnaud Patard <arnaud.patard@rtp-net.org> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-23 08:04:42 -07:00
Alyssa Rosenzweig	83a1d5544a	panfrost: Use correct definition for is_t6xx Rather than anything "early Midgard", limit us specifically to T6XX, as certain workarounds only apply to genuine T6XX, not T7XX. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-23 08:04:42 -07:00
Eric Engestrom	3acc4278ad	nir: don't return void Fixes: `14531d676b` ("nir: make nir_const_value scalar") Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Karol Herbst <kherbst@redhat.com>	2019-07-23 16:02:37 +01:00
Eric Engestrom	7797823afa	util: fix asprintf() fallback Fixes: `9607d499dc` ("util: add asprintf() wrapper for MSVC") Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-23 14:59:07 +00:00
Michel Dänzer	22c7738520	st/mesa: Try re-importing resource if necessary in st_vdpau_map_surface This can be the case if the resource was obtained from st_vdpau_output/video_surface_gallium. st_vdpau_output/video_surface_dma_buf do a similar dance internally. v2: * Pass PIPE_HANDLE_USAGE_FRAMEBUFFER_WRITE instead of 0 for usage. Bugzilla: https://bugs.freedesktop.org/111099 Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> # v1 Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-07-23 16:28:02 +02:00
Michel Dänzer	7499e7362d	radeonsi: Allow PIPE_TEXTURE_2D_ARRAY in si_texture_from_handle Needed for the following st/mesa fix. Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-07-23 16:26:04 +02:00
Alyssa Rosenzweig	2f93ecd654	panfrost: Fake CAPs for dEQP-GLES31 We still have some big ticket items left on GLES 3.0, but it's often helpful to be able to access higher dEQP levels for debugging features that just don't quite match a particular API. Plus, this opens up a whole slew of new features to poke at if boredom overtakes, ahem. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-23 06:36:48 -07:00
Mark Menzynski	7493fbf032	nvc0/ir: Fix assert accessing null pointer Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111007 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111167 Signed-off-by: Mark Menzynski <mmenzyns@redhat.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Tobias Klausmann <tobias.klausmann@freenet.de>	2019-07-23 15:08:25 +02:00
Samuel Pitoiset	d36af71f44	radv/gfx10: enable CLEAR_state It actually works. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-23 14:15:55 +02:00
Juan A. Suarez Romero	c41545c2f5	docs: update calendar, add news item and link release notes for 19.1.3 Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>	2019-07-23 11:20:00 +00:00
Juan A. Suarez Romero	3843c5f77a	docs: add sha256 checksums for 19.1.3 Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com> (cherry picked from commit `33e57d0ace`)	2019-07-23 11:18:31 +00:00
Juan A. Suarez Romero	fd965a3330	docs: add release notes for 19.1.3 Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com> (cherry picked from commit `09a1b2bdba`)	2019-07-23 11:18:29 +00:00
Erico Nunes	65e6c42d27	lima/ppir: fix branch codegen register encode The branch instruction has 6 bits per register operand which allows it to specify a component in the register. Fix codegen so that it outputs the right component, otherwise it always outputs the x component. Signed-off-by: Erico Nunes <nunes.erico@gmail.com> Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com> Reviewed-by: Qiang Yu <yuq825@gmail.com>	2019-07-23 08:49:19 +00:00
Erico Nunes	a255b49593	lima/ppir: fix debug logs in regalloc The macros already prepend "ppir: ", remove them from the actual strings so it doesn't appear duplicated. Signed-off-by: Erico Nunes <nunes.erico@gmail.com> Reviewed-by: Qiang Yu <yuq825@gmail.com>	2019-07-23 08:24:19 +00:00
Erico Nunes	9254059dd8	lima/ppir: fix alignment on regalloc spilling loads The spilling code spills entire vec4 registers regardless of the components used by the spilled uses. The inserted store code forces all 4 components, but these loads were using a variable number of components, causing bugs when loading the spilled registers. Signed-off-by: Erico Nunes <nunes.erico@gmail.com> Reviewed-by: Qiang Yu <yuq825@gmail.com>	2019-07-23 08:24:19 +00:00
Samuel Pitoiset	9343c93e34	radv: fix dumping disassembly with RADV_DEBUG=shaders Fixes: `a20a9d0c5e` ("radv: dont store disasm string unless keep_shader_info flag set") Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-23 10:22:29 +02:00
Eric Engestrom	b1c35fa6d6	st/nir: use asprintf() wrapper to fix MSVC issues Fixes: `856e84083e` ("mesa/st: add sampler uniforms") Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-23 08:57:27 +01:00
Eric Engestrom	9607d499dc	util: add asprintf() wrapper for MSVC Fixes: `856e84083e` ("mesa/st: add sampler uniforms") Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-23 08:57:27 +01:00
Ilia Mirkin	affb2da0f8	gallium: remove boolean from state tracker APIs Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-07-22 22:13:51 -04:00
Ilia Mirkin	0e30c6b8a7	gallium: switch boolean -> bool at the interface definitions This is a relatively minimal change to adjust all the gallium interfaces to use bool instead of boolean. I tried to avoid making unrelated changes inside of drivers to flip boolean -> bool to reduce the risk of regressions (the compiler will much more easily allow "dirty" values inside a char-based boolean than a C99 _Bool). This has been build-tested on amd64 with: Gallium drivers: nouveau r300 r600 radeonsi freedreno swrast etnaviv v3d vc4 i915 svga virgl swr panfrost iris lima kmsro Gallium st: mesa xa xvmc xvmc vdpau va Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-22 22:13:51 -04:00
Dave Airlie	365f24705f	st/nir: fix arb fragment stage conversion The comment even justifies the wrongness wrongly. We should be translating to pipe values properly here or else fragment maps to tess ctrl. Fixes: `3d7611e9a6` ("st/nir: use NIR for asm programs") Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-07-23 11:00:53 +10:00
Marek Olšák	cb9eb1834d	radeonsi: fix warning: ‘ret’ may be used uninitialized Reviewed-by: Dave Airlie <airlied@redhat.com>	2019-07-22 20:57:44 -04:00
Marek Olšák	850619117e	tgsi: fix warning: ‘interp’ may be used uninitialized Reviewed-by: Dave Airlie <airlied@redhat.com>	2019-07-22 20:57:44 -04:00
Marek Olšák	f257ef2bbb	gallivm: fix warning: ‘op’ may be used uninitialized Reviewed-by: Dave Airlie <airlied@redhat.com>	2019-07-22 20:57:44 -04:00
Kenneth Graunke	7cdde962c5	iris: Support storage images that have matching typed formats for reads Even if we don't directly support typed reads on a format, we can often translate them to a reasonable matching format. Advertise those too.	2019-07-22 17:30:13 -07:00
Kenneth Graunke	2f1c7fae9e	iris: Stop advertising MSAA storage images by mistake st_extensions.c sets const->MaxImageSamples (GL_MAX_IMAGE_SAMPLES) by looping over [16, 15, .. 1x] MSAA modes, and RGBA/BGRA/ARGB/ABGR 8888 color formats, calling pipe->is_format_supported() for each, with the usage set to PIPE_BIND_SHADER_IMAGE. If any are supported, it selects that number of samples. We were checking if sample_count <= 1, which meant that we were getting a value of 1x MSAA, rather than the expected 0x (feature doesn't exist). But, only on Icelake because Gen11 adds support for typed read messages for R8G8B8A8_UNORM. The lack of typed read messages for these formats was tricking the check on Gen9 to say no correctly. This caused some Icelake conformance failures, because we don't implement this feature. Just check for sample_count == 0 instead.	2019-07-22 17:30:13 -07:00
Kenneth Graunke	82607f8a90	egl: Only expose 565 pbuffer configs if X can export them as DRI3 images Glamor in xorg-server 1.20 cannot expose 16bpp pixmaps when running in the usual 24bpp mode. This meant our 565 pbuffer configs would ultimately fail to create a backing pixmap, leading to crashes. To hack around this, make a 16bpp pixmap and try and export it. If it works, expose the configs. Otherwise, just skip them. This also disables them on DRI2. These configs were only added to pass conformance requirements, and I doubt anybody cares about testing out 565 pbuffer visuals on DRI2-only drivers. v2: Don't leak the fds (caught by Eric Anholt) v3: Don't free(fds), it's not malloc'd Fixes: `dacb11a585` ("egl: Add a 565 pbuffer-only EGL config under X11.") Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-22 16:58:09 -07:00
Kenneth Graunke	6ad31c4ff3	egl: Make the 565 pbuffer-only config single buffered. In commit `dacb11a585`, Eric found the first matching 565 pbuffer config, and stopped. Our double-buffered configs come first in the list, so we added that, making a pbuffer-only config that claimed to be double buffered. This doesn't make sense, since pixmaps/pbuffers are fundamentally not double buffered. When using that config, every call to eglCreatePbufferSurface would fail with EGL_BAD_MATCH. The call chain looks like this: - eglCreatePbufferSurface - dri3_create_pbuffer_surface - dri3_create_surface - dri2_get_dri_config which eventually does: const bool double_buffer = surface_type == EGL_WINDOW_BIT; and then fails to find a matching config, because it ends up looking for a single-buffered config - and there aren't any. To fix this, make the 565 pbuffer config single-buffered. This fixes at least 51 dEQP-EGL.* tests. Fixes: `dacb11a585` ("egl: Add a 565 pbuffer-only EGL config under X11.") Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-22 16:58:09 -07:00
Kenneth Graunke	fc21394bc4	egl: Quiet warning about front buffer rendering for pixmaps/pbuffers pbuffer configs cause a million of these warnings to trigger, but when using pixmaps or buffers, there is only one surface, so this warning doesn't make much sense. Retain it for window surfaces for now. Fixes: `dacb11a585` ("egl: Add a 565 pbuffer-only EGL config under X11.") Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-22 16:58:09 -07:00
Kenneth Graunke	78164a3a6c	mesa: Fix ReadBuffers with pbuffers pbuffers are internally single-buffered. Marek fixed DrawBuffers to handle this case, but we need to fix ReadBuffers too. Otherwise, pretty much every conformance test fails because glReadPixels breaks. v2: Refactor the switch into a helper (suggested by Eric Anholt) Fixes: `35294f2eca` ("mesa: fix pbuffers because internally they are front buffers") Acked-by: Eric Engestrom <eric.engestrom@intel.com> (v1) Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-22 16:58:09 -07:00
Marek Olšák	c37df5feaa	mesa: fix assertion failure in TexImage Check the assertion after error checking. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111194 Fixes: `9dd1f7cec0` ("mesa: pass gl_texture_object as arg to not depend on state") Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-07-22 14:45:57 -07:00
Jason Ekstrand	5c5f11d1dd	nir: Remove a bunch of large stack arrays Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-07-22 16:17:18 -05:00
Jason Ekstrand	fa63fad333	intel/fs: Stop stack allocating large arrays Normally, we haven't worried too much about stack sizes as Linux tends to be fairly friendly towards large stacks. However, when running DXVK apps under wine, we're suddenly subject to Windows' more stringent stack limitations and can run out of space more easily. In particular, some of the shaders in Elite Dangerous: Horizons have quite a few registers and the arrays in split_virtual_grfs are large enough to blow a 1 MiB stack leading to crashes during shader compilation. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108662 Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Matt Turner <mattst88@gmail.com> Cc: mesa-stable@lists.freedesktop.org	2019-07-22 16:16:39 -05:00
Nataraj Deshpande	0661c357c6	egl/android: Update color_buffers querying for buffer age color_buffers[] is currently hard coded to 3 for android which fails in droid_window_dequeue_buffer when ANativeWindow creates color_buffers >3 while querying buffer age during dEQP partial_update tests on chromeOS. The patch removes static color_buffers[], queries for MIN_UNDEQUEUED_BUFFERS, sets native window buffer count and allocates the correct number of color_buffers as per android. Fixes dEQP-EGL.functional.partial_update* tests on chromebooks with enabling EGL_KHR_partial_update. v2: update comment instead of removing (Eric Engestrom) v3: change static array to dynamic allocated color_buffers querying MIN_UNDEQUEUED_BUFFERS (Chia-I Wu olv@chromium.org) Fixes: `2acc69da8c` "EGL/Android: Add EGL_EXT_buffer_age extension" Signed-off-by: Nataraj Deshpande <nataraj.deshpande@intel.com> Acked-by: Eric Engestrom <eric@engestrom.ch> Reviewed-by: Chia-I Wu <olvaffe@gmail.com>	2019-07-22 12:31:34 -07:00
Caio Marcelo de Oliveira Filho	0345aeeb40	intel/compiler: Use nir_opt_conditional_discard anv vkpipeline-db results for SKL: total instructions in shared programs: 3622461 -> 3611281 (-0.31%) instructions in affected programs: 396452 -> 385272 (-2.82%) helped: 2062 HURT: 1 total cycles in shared programs: 1458144669 -> 1458105320 (<.01%) cycles in affected programs: 4171830 -> 4132481 (-0.94%) helped: 1874 HURT: 180 total loops in shared programs: 2437 -> 2437 (0.00%) loops in affected programs: 0 -> 0 helped: 0 HURT: 0 total spills in shared programs: 8745 -> 8748 (0.03%) spills in affected programs: 8 -> 11 (37.50%) helped: 1 HURT: 1 total fills in shared programs: 23392 -> 23395 (0.01%) fills in affected programs: 8 -> 11 (37.50%) helped: 1 HURT: 1 LOST: 0 GAINED: 1 No changes to shader-db on i965 or iris. The glsl compiler already does a similar optimization. Improvement suggested by Daniel Schürmann. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-07-22 09:33:48 -07:00
Alyssa Rosenzweig	d07c846546	pan/decode: Disable magic divisor debugging Memory corruption (for both legitimate and illegitimate reasons) causes this to hang pantrace. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-22 08:34:26 -07:00
Alyssa Rosenzweig	e8dca7e1e1	pan/midgard: Report spills:fills to shader-db Route this info through so we can track how we're doing on register spilling. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-22 08:20:34 -07:00
Alyssa Rosenzweig	055aa9b1f4	panfrost/midgard: Reenable pipeline register creation This was disabled to permit regression-free RA work. Now that the spill code is in place, we can reenable, with some caveats about efficacy. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-22 08:20:34 -07:00
Alyssa Rosenzweig	f0d0061b18	panfrost/midgard: Report tls_size Pipe through the number of bytes of spilled memory used from the compiler into the main driver, where it will be used to allocate the Thread Local Storage buffer. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-22 08:20:34 -07:00
Alyssa Rosenzweig	f1dcaa0df6	panfrost: Set `initialized` in more cases Indirect linear writes were not being marked as initialized, causing the back blit to be dropped, breaking the listed tests. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-22 08:20:34 -07:00
Alyssa Rosenzweig	9e3dc703ff	panfrost/ci: Update expectations We've fixed some shader tests. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-22 08:20:34 -07:00
Alyssa Rosenzweig	bc741599f2	panfrost/midgard: Promote to move, not rewrite for non-SSA Fixes promoted uniform loads to registers. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-22 08:20:34 -07:00
Alyssa Rosenzweig	40abf11708	panfrost/midgard: Dump MIR of RA failure Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-22 08:20:34 -07:00
Alyssa Rosenzweig	a08e9511e3	pan/midgard: Dump successor graph when printing MIR We just use the pointers of the midgard_block*, which is crude, but it gets the point across and will help debug successor related issues. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-22 08:20:34 -07:00
Alyssa Rosenzweig	1aa556de2e	pan/midgard: Remove debug statement Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-22 08:20:34 -07:00
Alyssa Rosenzweig	21510c253c	panfrost/midgard: Implement register spilling Now that we run RA in a loop, before each iteration after a failed allocation we choose a spill node and spill it to Thread Local Storage using st_int4/ld_int4 instructions (for spills and fills respectively). This allows us to compile complex shaders that normally would not fit within the 16 work register limits, although it comes at a fairly steep performance penalty. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-22 08:20:34 -07:00
Alyssa Rosenzweig	533d65786f	panfrost/midgard: Add mir_has_arg helper Helps scan the MIR for uses of an index. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-22 08:20:34 -07:00
Alyssa Rosenzweig	076838ef0c	panfrost/midgard: Check write-before-read in liveness analysis If we write to an index before reading it, the old copy we're checking liveness for isn't live in this block, even if it does get read later. Fixes abnormally high register pressure in shaders with loops. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-22 08:20:34 -07:00
Alyssa Rosenzweig	997f85c136	panfrost/midgard/disasm: Check for certain tag errors Midgard bundles contain a tag, as well as a copy of the tag of the next bundle to facilitate prefetch. Do some simple static analysis to detect certain tag errors (particularly on shaders without branching). Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-22 08:20:34 -07:00
Alyssa Rosenzweig	d168b08d62	pan/midgard: Add OP_IS_CSEL helper Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-22 08:20:34 -07:00
Alyssa Rosenzweig	1f297471a0	pan/midgard: Add mir_rewrite_index_src_single helper Rather than rewriting an index away across the whole block, we expose finer (per-instruction) granularity for rewrites. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-22 08:20:34 -07:00
Alyssa Rosenzweig	16c8c354d0	pan/midgard: Ignore inline_constant in liveness It doesn't make any sense to look at it. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-22 08:20:34 -07:00
Alyssa Rosenzweig	d155168e6c	panfrost/midgard: Implement load/store scratch opcodes These are used to load/store from Thread Local Storage, which is memory allocated per-thread (corresponding to ctx->scratchpad in the command stream) and used for register spilling. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-22 08:20:34 -07:00
Alyssa Rosenzweig	3bb780ecb9	pan/midg/disasm: Check for int varying ops Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-22 08:20:34 -07:00
Alyssa Rosenzweig	7e052d9332	pan/midgard: Remove "aliasing" It was a crazy idea that didn't pan out. We're better served by a good copyprop pass. It's also unused now. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-22 08:20:34 -07:00
Alyssa Rosenzweig	3174bc9972	panfrost: Promote uniform registers late Rather than creating either a load or a uniform register read with a fixed beginning offset, we always create a load and then promote to a uniform register later. This will allow us to promote in a register pressure aware manner. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-22 08:20:34 -07:00
Alyssa Rosenzweig	aa03159120	pan/midgard: Call scheduler/RA in a loop This will allow us to insert instructions as a result of register allocation, permitting spilling to be implemented. As a side effect, with the assert commented out this would fix a bunch of glamor crashes (due to RA failures) so MATE becomes useable. Ideally we'll have scheduling or RA actually sorted out before the branch point but if not this gives us a one-line out to get X working... Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-22 08:20:33 -07:00
Alyssa Rosenzweig	1cabb8a706	pan/midgard: Remove custom register selection callback What we have is equivalent to the default callback; let's use that. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-22 08:20:33 -07:00
Samuel Pitoiset	b5116d3cb7	radv: fix crash in vkCmdClearAttachments with unused attachment depth_stencil_attachment and/or ds_resolve attachment can be NULL. This fixes crashes with dEQP-VK.renderpass.suballocation.unused_clear_attachments.* Cc: 19.1 <mesa-stable@lists.freedesktop.org> Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-22 14:25:54 +02:00
Sergii Romantsov	253be49402	i965: free object labels when deleting Some leaks detected with GL_KHR_debug on i965. CC: Timothy Arceri <t_arceri@yahoo.com.au> Signed-off-by: Sergii Romantsov <sergii.romantsov@globallogic.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-07-22 12:39:32 +03:00
Samuel Pitoiset	915abbe932	radv/gfx10: update descriptors for inline uniform blocks Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-22 09:02:42 +02:00
Samuel Pitoiset	d76746c1ff	radv/gfx10: emit the GS NGG prologue before the nested barrier Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-22 09:02:39 +02:00
Samuel Pitoiset	8c97a07967	radv/gfx10: do not allocate space for the ZPASS_DONE bug GFX10 isn't affected. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-22 09:02:35 +02:00
Samuel Pitoiset	1fb7bd046b	radv/gfx10: do not set ELEMENT_SIZE for buffer descriptors This field doesn't exist. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-22 09:02:31 +02:00
Samuel Pitoiset	1878090b68	radv: clean up fill_geom_tess_rings() Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-22 09:02:28 +02:00
Samuel Pitoiset	e7c356866e	radv: change a bunch of >= GFX9 to == GFX9 Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-22 09:02:26 +02:00
Samuel Pitoiset	6049745b13	ac/nir: do not clamp shadow reference on GFX10 RadeonSI only uses Z32_FLOAT_CLAMP for upgraded depth textures on GFX10 and RADV doesn't promote Z16 or Z24. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-22 09:02:22 +02:00
Daniel Schürmann	64b7386ee8	radv: move nir_opt_conditional_discard out of optimization loop This late optimization pass is only affected by nir_opt_if() and handles all cases in a single pass. It's enough to call it once after the optimization loop. No changes on vkpipeline-db. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-22 08:12:18 +02:00
Iago Toral Quiroga	dacaf7ec06	v3d: fill logicop_func in the fragment shader key when precompiling shaders Since logicop_func 0 is PIPE_LOGIOP_CLEAR, we were triggering lowering of logic ops on precompiled shaders, which we don't want to do. Also, this had the side effect of making shader-db crash, as during this lowering we would try to read the color format swizzle information from the fragment shader key that we don't populate in precompiled shaders because right now we only need it when logic operations are enabled. Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-22 08:05:59 +02:00
Jose Maria Casanova Crespo	9bf0bdf776	v3d: Avoid scheduling an instruction that stalls waiting for SFU retval If we detect that a scheduling candidate will stall because it has a register source that is written by the SFU unit in the previous instruction, we reduce its priority so that any non-stalling operation is chosen instead. The latency of SFU operations is defined as 2, so they are scheduled earlier if other candidates have the same priority. Finally, we won't merge an instruction that stalls into a previously chosen one, as the result of the previous one would be waiting for an extra cycle. Although shader-db results show that instructions are hurt with an increase of 0.35%, the sum of instructions + stalls is reduced by 0.52%, and the total of sfu-stalls is reduced by 63.51%. It also implies a small increase in the max-temps metric because SFU operations are scheduled earlier. total instructions in shared programs: 9102719 -> 9117851 (0.17%) instructions in affected programs: 4324628 -> 4339760 (0.35%) helped: 4162 HURT: 12128 helped stats (abs) min: 1 max: 10 x̄: 1.28 x̃: 1 helped stats (rel) min: 0.09% max: 4.76% x̄: 0.66% x̃: 0.51% HURT stats (abs) min: 1 max: 27 x̄: 1.69 x̃: 1 HURT stats (rel) min: 0.05% max: 7.69% x̄: 0.87% x̃: 0.68% 95% mean confidence interval for instructions value: 0.90 0.96 95% mean confidence interval for instructions %-change: 0.47% 0.50% Instructions are HURT. total max-temps in shared programs: 1327728 -> 1327812 (<.01%) max-temps in affected programs: 4730 -> 4814 (1.78%) helped: 61 HURT: 134 helped stats (abs) min: 1 max: 2 x̄: 1.08 x̃: 1 helped stats (rel) min: 2.70% max: 13.33% x̄: 4.89% x̃: 4.17% HURT stats (abs) min: 1 max: 3 x̄: 1.12 x̃: 1 HURT stats (rel) min: 1.54% max: 20.00% x̄: 6.10% x̃: 5.26% 95% mean confidence interval for max-temps value: 0.28 0.58 95% mean confidence interval for max-temps %-change: 1.80% 3.52% Max-temps are HURT. total sfu-stalls in shared programs: 99551 -> 36324 (-63.51%) sfu-stalls in affected programs: 95029 -> 31802 (-66.53%) helped: 25882 HURT: 0 helped stats (abs) min: 1 max: 27 x̄: 2.44 x̃: 2 helped stats (rel) min: 5.26% max: 100.00% x̄: 79.86% x̃: 100.00% 95% mean confidence interval for sfu-stalls value: -2.47 -2.42 95% mean confidence interval for sfu-stalls %-change: -80.18% -79.54% Sfu-stalls are helped. total inst-and-stalls in shared programs: 9202270 -> 9154175 (-0.52%) inst-and-stalls in affected programs: 5618516 -> 5570421 (-0.86%) helped: 22728 HURT: 855 helped stats (abs) min: 1 max: 31 x̄: 2.16 x̃: 1 helped stats (rel) min: 0.07% max: 16.67% x̄: 1.14% x̃: 0.92% HURT stats (abs) min: 1 max: 5 x̄: 1.25 x̃: 1 HURT stats (rel) min: 0.12% max: 5.26% x̄: 1.24% x̃: 0.86% 95% mean confidence interval for inst-and-stalls value: -2.07 -2.01 95% mean confidence interval for inst-and-stalls %-change: -1.07% -1.05% Inst-and-stalls are helped. v2: Rename v3d_qpu_generates_sfu_stalls to v3d_qpu_instr_is_sfu (Eric) Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-22 03:00:50 +02:00
Jose Maria Casanova Crespo	c341ab7ffb	v3d: add shader-db stat to count SFU stalls SFU operations have a latency of 2 cycles, so if their results are used in the cycle following an SFU instruction, the GPU stalls for an extra cycle until the result is available. This adds the number of stalls, and the sum of instructions + stalls, to the shader-db debug mode, to evaluate optimizations that schedule instructions to avoid generating sfu-stalls. v2: Rename v3d_qpu_generates_sfu_stalls to v3d_qpu_instr_is_sfu (Eric) Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-22 03:00:50 +02:00
Eric Engestrom	f7224014df	radv: replace memset()+strcpy() with snprintf() Just like the next line :) Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-21 10:38:17 +01:00
Eric Engestrom	29e8f15bdc	radv: drop unnecessary memset() before snprintf() snprintf() always terminates the string. Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-21 10:38:17 +01:00
Bas Nieuwenhuizen	451f030c06	radv: Fix uninitialized warning. For es_vgpr_comp_cnt. Fixes: `795adbbadd` "radv/gfx10: Add pipeline state support for tess." Reviewed-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-07-21 01:39:08 +02:00
Chia-I Wu	d31d25f634	virgl: fix a sync issue in virgl_buffer_transfer_extend In virgl_buffer_transfer_extend, when no flush is needed, it tries to extend a previously queued transfer instead if it can find one. Compared to virgl_resource_transfer_prepare, it fails to check if the resource is busy. The existence of a previously queued transfer normally implies that the resource is not busy, maybe except for when the transfer is PIPE_TRANSFER_UNSYNCHRONIZED. Rather than burdening us with a lengthy comment, and potential concerns over breaking it as the transfer code evolves, this commit makes the valid_buffer_range check the only condition to take the fast path. In the real world, we hit the fast path almost only because of the valid_buffer_range check. In micro benchmarks, the condition should always be true, otherwise the benchmarks are not very representative of meaningful workloads. I think this fix is justified. The recent change to PIPE_TRANSFER_MAP_DIRECTLY usage disables the fast path. This commit re-enables it as well. Signed-off-by: Chia-I Wu <olvaffe@gmail.com> Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>	2019-07-19 18:04:42 -07:00
Chia-I Wu	324c20304e	virgl: rework virgl_transfer_queue_extend Do not take a transfer and do the memcpy. Add a _buffer suffix to the function name to make it clear that it is only for buffers. Signed-off-by: Chia-I Wu <olvaffe@gmail.com> Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>	2019-07-19 18:04:37 -07:00
Chia-I Wu	2b8ad88078	virgl: fix virgl_buffer_transfer_extend Without setting hw_res, virgl_transfer_queue_extend never finds a match and always returns NULL. Signed-off-by: Chia-I Wu <olvaffe@gmail.com> Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>	2019-07-19 18:04:34 -07:00
Marek Olšák	bcabf75ab7	radeonsi: initialize scissor registers etc. without clear state Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-07-19 20:16:56 -04:00
Marek Olšák	47f41af06c	radeonsi: return success from vi_dcc_clear_level to simplify callers Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-07-19 20:16:54 -04:00
Marek Olšák	7a764b963a	radeonsi: fix compute-based culling regression in `1ce52c1e37` Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-07-19 20:16:50 -04:00
Marek Olšák	c741bed6e8	radeonsi/gfx10: fix VGT_PRIMITIVE_TYPE programming Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-07-19 20:16:19 -04:00
Marek Olšák	a0d330bedb	radeonsi/gfx10: enable Wave32 for vertex, geometry, and tessellation shaders Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-07-19 20:16:19 -04:00
Marek Olšák	1d82240f55	radeonsi/gfx10: add debug options to enable/disable Wave32 Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-07-19 20:16:19 -04:00
Marek Olšák	8f72f137ad	radeonsi/gfx10: add as_ngg variant for TES as ES to select Wave32/64 Legacy GS has to use Wave64, so TES before GS has to use Wave64 too. Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-07-19 20:16:19 -04:00
Marek Olšák	88efb63caf	radeonsi/gfx10: implement Wave32 Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-07-19 20:16:19 -04:00
Marek Olšák	54e6900ede	radeonsi/gfx10: use 32-bit wavemasks for Wave32 Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-07-19 20:16:19 -04:00
Marek Olšák	81091a5183	ac: create the LLVM builder in ac_llvm_context_init Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-07-19 20:16:19 -04:00
Marek Olšák	eb54b8c222	ac: create the LLVM module for Wave32 or Wave64 in ac_llvm_context_init Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-07-19 20:16:19 -04:00
Marek Olšák	921c1d24d5	ac/rtld: add support for Wave32 Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-07-19 20:16:19 -04:00
Marek Olšák	73aa04e40d	ac: add Wave32 LLVM target machine Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-07-19 20:16:19 -04:00
Marek Olšák	9e467d111b	ac: initial Wave32 support in LLVM build helpers Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-07-19 20:16:19 -04:00
Marek Olšák	c35e926a81	radeonsi: assume that selector != NULL for compute shaders Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-07-19 20:16:48 -04:00
Marek Olšák	bf0f0697a1	radeonsi: remove what appears to be legacy compute code Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-07-19 20:16:47 -04:00
Marek Olšák	be67a275b5	radeonsi: remove si_program::use_code_object_v2 Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-07-19 20:16:45 -04:00
Marek Olšák	fd92e65feb	radeonsi: add si_shader_selector into si_compute Now we can assume that shader->selector is always set. This will simplify some code. Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-07-19 20:16:43 -04:00
Marek Olšák	e2c8ff009e	radeonsi: set threadgroup size to 0 for threadgroups with only 1 wave This has no effect on Wave64. Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-07-19 20:16:39 -04:00
Marek Olšák	a8a526c5cb	radeonsi/gfx10: set as_ngg for GS prolog as_ngg is required by Wave32. Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-07-19 20:16:19 -04:00
Marek Olšák	d3a80f2dda	radeonsi/gfx10: remove the disable_ngg option because legacy VS hangs. Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-07-19 20:16:19 -04:00
Marek Olšák	0f30223cf4	radeonsi/gfx10: combine hw edgeflags with user edgeflags for correct behavior Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-07-19 20:16:19 -04:00
Marek Olšák	bfaca7259c	radeonsi/gfx10: deduplicate code for esvert_lds_size Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-07-19 20:16:19 -04:00
Marek Olšák	a6722285c2	radeonsi/gfx10: simplify a streamout loop in gfx10_emit_ngg_epilogue Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-07-19 20:16:19 -04:00
Marek Olšák	2683347ba0	radeonsi/gfx10: don't use MALLOC for outputs Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-07-19 20:16:19 -04:00
Marek Olšák	1b4354dab9	radeonsi/gfx10: clean up ESGS ring size computation Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-07-19 20:16:19 -04:00
Marek Olšák	37db9d2865	radeonsi/gfx10: fix unnecessary LDS overallocation for NGG GS Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-07-19 20:16:19 -04:00
Marek Olšák	985a59e0d1	radeonsi/gfx10: don't compile the GS copy shader if it's 100% not needed Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-07-19 20:16:19 -04:00
Marek Olšák	7f0ada3f3e	radeonsi/gfx10: set GE_CNTL.PACKET_TO_ONE_PA for NGG Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-07-19 20:16:19 -04:00
Marek Olšák	e08463ac22	radeonsi/gfx10: update a tunable max_es_verts_base for NGG We have to fix the computation so as not to break quads. Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-07-19 20:16:19 -04:00
Marek Olšák	79d56e6a4a	radeonsi/gfx10: implement ARB_post_depth_coverage Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-07-19 20:16:19 -04:00
Marek Olšák	a57f0f8a6b	radeonsi: fix leaked compute shader NIR Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-07-19 20:16:37 -04:00
Marek Olšák	98377d3450	radeonsi: save the enable_nir option in the shader cache correctly Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-07-19 20:16:35 -04:00
Marek Olšák	d227b91d2e	radeonsi/gfx10: enable SDMA no changes since gfx9 for buffers Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-07-19 20:16:19 -04:00
Marek Olšák	47dee97329	ac: use llvm.amdgcn.writelane Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-07-19 20:16:19 -04:00
Marek Olšák	39d0c68321	ac: fix shader clock on LLVM 9 Probably relevant commit: commit dd32dc3f72ec99b1794d62c74d2beb3b60468d50 Author: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com> Date: Tue Jul 9 03:10:18 2019 +0000 [AMDGPU] Always use s_memtime for readcyclecounter Differential Revision: https://reviews.llvm.org/D64369 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@365431 91177308-0d34-0410-b5e6-96231b3b80d8 Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-07-19 20:16:19 -04:00
Boyuan Zhang	26099bc35d	radeon/vcn: adding engine type for new fw interface Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-07-19 20:16:33 -04:00
Marek Olšák	936e9fa951	radeonsi: use the correct buffer size in si_vid_clear_buffer Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-07-19 20:16:19 -04:00
Pierre-Eric Pelloux-Prayer	b1efc9d05f	mesa: add EXT_dsa glEnabledIndexedEXT The implementation uses _mesa_ActiveTexture to change the active texture unit and then reset it. It causes an unnecessary _NEW_TEXTURE_STATE but: - adding an index argument to _mesa_set_enable causes a lot of changes (~140 callers) - enable_texture (called by _mesa_set_enable) might cause a _NEW_TEXTURE_STATE anyway. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-07-19 20:04:07 -04:00
Pierre-Eric Pelloux-Prayer	ff0cafc8f3	mesa: add EXT_dsa glGetTextureLevelParameter*vEXT functions Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-07-19 20:04:06 -04:00
Pierre-Eric Pelloux-Prayer	5fb9c9d628	mesa: add EXT_dsa gl(Copy)Texture(Sub)Image1D/2D/3DEXT functions Added functions: - glTextureImage1DEXT - glTextureImage2DEXT - glTextureImage3DEXT - glTextureSubImage1DEXT - glTextureSubImage3DEXT - glCopyTextureImage1DEXT - glCopyTextureImage2DEXT - glCopyTextureSubImage1DEXT - glCopyTextureSubImage2DEXT - glCopyTextureSubImage3DEXT - glGetTextureImageEXT All but the last one can be compiled in a display list. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-07-19 20:04:03 -04:00
Pierre-Eric Pelloux-Prayer	f8ad95c45f	mesa: move lookup_texture_ext_dsa up in teximage.c Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-07-19 20:04:01 -04:00
Pierre-Eric Pelloux-Prayer	9dd1f7cec0	mesa: pass gl_texture_object as arg to not depend on state This will allow using the same functions for the EXT_dsa implementation. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-07-19 20:03:57 -04:00
Pierre-Eric Pelloux-Prayer	0d8826f723	mesa: refactor get_texture_image to remove duplicate code Move shared code in a new function (_get_texture_image) and use it instead of duplicating the same lines. Will be also used by the EXT_dsa functions (GetTextureImageEXT and GetMultiTexImageEXT). Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-07-19 20:03:40 -04:00
Jeremy Newton	666ea30017	pipe-loader: use radeonsi for MM if amdgpu dri is used The amdgpu dri is used for the closed source AMD driver. Since this driver does not implement multimedia, we fall back to radeonsi in mesa to do multimedia. This corrects the dri driver name for when it is set to amdgpu. Reviewed-by: Michel Dänzer <michel.daenzer@amd.com> (v1) Signed-off-by: Jeremy Newton <Jeremy.Newton@amd.com> Signed-off-by: Marek Olšák <marek.olsak@amd.com>	2019-07-19 19:59:02 -04:00
Eric Engestrom	1a25980c46	egl: drop incorrect pkg-config file for glvnd With `b01524fff0` ("meson: don't build libGLES.so with GLVND") we dropped the incorrect pkg-config files for GLES. Since then, the glvnd issue of its missing files has become painfully apparent, since it breaks the build for everyone using glvnd. NVIDIA has had a fix for a few years now, but has yet to accept it: https://github.com/NVIDIA/libglvnd/pull/86 Since the breakage is already there, let's clean up everything on our side while we wait for NVIDIA to accept the fix. Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-07-20 00:07:06 +01:00
Eric Engestrom	e8febd6cba	docs: simplify `Fixes:` git command Reviewed-by: Eric Anholt <eric@anholt.net> Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>	2019-07-19 22:24:28 +00:00
Eric Engestrom	0e34e1a0ce	mesa/tests: add missing dep_thread Fixes: `f8c27c2775` ("state_tracker: Move the format test out to be an actual unit test.") Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Eric Anholt <eric@anholt.net> Tested-by: Vinson Lee <vlee@freedesktop.org>	2019-07-19 23:03:42 +01:00
Eric Engestrom	6f8b5872ab	util: drop strncat(), strcmp(), strncmp(), snprintf() & vsnprintf() MSVC fallbacks It would seem MSVC>=2015 is now C99-compliant wrt these functions: strncat: https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/strncat-strncat-l-wcsncat-wcsncat-l-mbsncat-mbsncat-l?view=vs-2017 strcmp: https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/strcmp-wcscmp-mbscmp?view=vs-2017 strncmp: https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/strncmp-wcsncmp-mbsncmp-mbsncmp-l?view=vs-2017 snprintf: https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/snprintf-snprintf-snprintf-l-snwprintf-snwprintf-l?view=vs-2017 vsnprintf: https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/vsnprintf-vsnprintf-vsnprintf-l-vsnwprintf-vsnwprintf-l?view=vs-2017 Suggested-by: Emil Velikov <emil.velikov@collabora.com> Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-19 22:39:38 +01:00
Eric Engestrom	085c3abf27	util: use standard name for vsnprintf() Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-19 22:39:38 +01:00
Eric Engestrom	dffeaa55dd	util: use standard name for snprintf() Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-19 22:39:38 +01:00
Eric Engestrom	00e23cd969	util: use standard name for vasprintf() Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-19 22:39:38 +01:00
Eric Engestrom	59c2dd1b8c	util: use standard name for sprintf() Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-19 22:39:38 +01:00
Eric Engestrom	321d971b08	util: use standard name for strcmp() Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-19 22:39:38 +01:00
Eric Engestrom	7abc739696	util: use standard name for strcasecmp() Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-19 22:39:38 +01:00
Eric Engestrom	88ddb2e186	util: use standard name for strncmp() Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-19 22:39:38 +01:00
Eric Engestrom	27b9eea557	util: use standard name for strncat() Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-19 22:39:38 +01:00
Eric Engestrom	3ba199abd1	util: use standard name for strdup() Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-19 22:39:38 +01:00
Eric Engestrom	09a8a39940	util: use standard name for strchrnul() Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-19 22:39:38 +01:00
Eric Engestrom	367bb55c17	util: drop unused vsprintf() wrapper Suggested-by: Emil Velikov <emil.velikov@collabora.com> Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-19 22:39:38 +01:00
Eric Engestrom	e7db1806af	util: drop unused strchr() wrapper Suggested-by: Emil Velikov <emil.velikov@collabora.com> Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-19 22:39:38 +01:00
Eric Engestrom	84e85035cf	util: drop unused strstr() wrapper Suggested-by: Emil Velikov <emil.velikov@collabora.com> Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-19 22:39:38 +01:00
Jason Ekstrand	6301f80b84	nir: Only rematerialize comparisons with all SSA sources Otherwise, you may end up moving a register read and that could result in an incorrect shader. This commit fixes a rendering issue in Elite: Dangerous. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111152 Fixes: `3ee2e84c60` "nir: Rematerialize compare instructions" Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2019-07-19 19:45:36 +00:00
Daniel Schürmann	e352b4d650	spirv: Fix order of barriers in SpvOpControlBarrier Semantically, the memory barrier has to come first to wait for the completion of pending memory requests. Afterwards, the workgroups can be synchronized. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-07-19 10:37:37 -07:00
Caio Marcelo de Oliveira Filho	4061a3f6c9	nir: use a switch when printing intrinsic indices Reviewed-by: Eric Engestrom <eric@engestrom.ch>	2019-07-19 10:04:52 -07:00
Rhys Perry	e8644122ed	nir/algebraic: mark a few comparison simplifications as precise No vkpipeline-db changes found. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Alyssa Rosenzweig alyssa.rosenzweig@collabora.com Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2019-07-19 16:33:01 +00:00
Rhys Perry	79801b9d7d	nir/algebraic: optimize contradictory iand operands Some of these were found in a few GTAV, Rise of the Tomb Raider and Shadow of the Tomb Raider shaders. Results from vkpipeline-db run with ACO: Totals from affected shaders: SGPRS: 376 -> 376 (0.00 %) VGPRS: 220 -> 220 (0.00 %) Spilled SGPRs: 0 -> 0 (0.00 %) Spilled VGPRs: 0 -> 0 (0.00 %) Private memory VGPRs: 0 -> 0 (0.00 %) Scratch size: 0 -> 0 (0.00 %) dwords per thread Code Size: 13492 -> 11560 (-14.32 %) bytes LDS: 6 -> 6 (0.00 %) blocks Max Waves: 69 -> 69 (0.00 %) Wait states: 0 -> 0 (0.00 %) v2: use False instead of 0 Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Alyssa Rosenzweig alyssa.rosenzweig@collabora.com Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2019-07-19 16:33:01 +00:00
Erico Nunes	32ced14bad	lima/ppir: handle all node types in ppir_node_replace_child ppir_node_replace_child is used by the const lowering routine in ppir. All types need to be handled here, otherwise the src node is not updated properly when one of the lowered nodes is a const, which results in, for example, regalloc not assigning registers correctly. Signed-off-by: Erico Nunes <nunes.erico@gmail.com> Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>	2019-07-19 16:01:45 +00:00
Erico Nunes	2292f0c4b5	lima/ppir: branch regalloc fixes The branch instruction has sources which must be handled in src handling paths so that regalloc assigns registers to them properly. Signed-off-by: Erico Nunes <nunes.erico@gmail.com> Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>	2019-07-19 16:01:45 +00:00
Yevhenii Kolesnikov	32b72cbca5	main: Destroy static hash table format_array_format_table has a static lifetime - it will be destroyed by an atexit handler. Signed-off-by: Yevhenii Kolesnikov <yevhenii.kolesnikov@globallogic.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-19 11:22:55 +03:00
Dave Airlie	248161123c	radv: reset the window scissor with no clear state. If we don't have clear state (which gfx10 doesn't currently) we have to reset the scissor. AMDVLK will leave it set to something else. Marek also has this fix for radeonsi pending. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-07-19 11:00:44 +10:00
Dave Airlie	2ac2b98780	radv: fix crash in shader tracing. Enabling tracing, and then having a vmfault, can lead to a segfault before we print out the traces, if a meta shader is executing and we don't have the NIR for it. Just pass the stage and give back a default. Fixes: `9b9ccee4d6` ("radv: take LDS into account for compute shader occupancy stats") Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-07-19 11:00:25 +10:00
Timothy Arceri	80c2c17e1e	iris: change last_vue_stage() to look at uncompiled shaders This allows us to find the last vue stage before we have compiled the shaders. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-07-19 09:25:47 +10:00
Timothy Arceri	30038dd5ec	nir/lower_clip: add support for geometry shaders This will be used to enable compat profile support for geometry shaders. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-07-19 09:25:47 +10:00
Timothy Arceri	4b08bb4770	nir/lower_clip: add lower_clip_outputs() helper This will be reused in the following patch to add support for clip vertex lowering in geometry shaders. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-07-19 09:25:47 +10:00
Timothy Arceri	a59926b3ca	nir/lower_clip: add create_clipdist_vars() helper Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-07-19 09:25:47 +10:00
Timothy Arceri	e38b930876	nir/lower_clip: add a find_clipvertex_and_position_outputs() helper This will allow code sharing in a following patch that adds support for lowering in geometry shaders. It also allows us to exit early if there is no lowering to do which allows a small code tidy up. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-07-19 09:25:47 +10:00
Alyssa Rosenzweig	0395b58c92	panfrost: Set rt_count This doesn't quite work yet, but it illustrates how MRT is implemented in the MFBD: rt_count is set appropriately based on the number of render targets, while additional render target descriptors are appended on with an index variable in them (not quite decoded since there's some aspects we don't understand there, but conceptually this should be right). Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-18 15:25:40 -07:00
Alyssa Rosenzweig	871ad7789f	panfrost: Trace invisible BOs Helps make the decode a little more readable (names instead of addresses). Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-18 15:25:40 -07:00
Alyssa Rosenzweig	17752bae8e	panfrost/decode: Preserve empty tiler heap symmetry If tiler_heap_end == tiler_heap_start, ensure it's printed the same rather than one erroring out as hex. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-18 15:25:40 -07:00
Alyssa Rosenzweig	e797caa0dd	panfrost: Zero polygon list body size for clears There's no polygons, so you can't have any size to the polygon list, although there is a minimal header. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-18 15:25:40 -07:00
Alyssa Rosenzweig	f475b79980	panfrost/mfbd: Unify depth-only with masked FBO path Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-18 15:25:40 -07:00
Alyssa Rosenzweig	629c7366a7	panfrost: Simplify set_framebuffer_state Most of the ad hoc logic is already in Gallium. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-18 15:25:40 -07:00
Alyssa Rosenzweig	227c395c00	panfrost: Check for NULL surface in places Fixes a bunch of NULL dereferences, although it does cause GPU faults of course. This is caused by color buffers masked out in MRT, which we'll eventually have to solve the right way... one thing at a time. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-18 15:25:40 -07:00
Alyssa Rosenzweig	79b13b4376	panfrost: Expose 4 render targets Hidden behind deqp flag as usual. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-18 15:25:40 -07:00
Alyssa Rosenzweig	d56f92502e	panfrost: Shrink tiler heap 128MB is excessive and 16MB is still plenty. Saves 112MB/context on kernels without growable/heap support. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-18 15:25:16 -07:00
Caio Marcelo de Oliveira Filho	b6d4753568	nir/large_constants: De-duplicate constants If a function has a constant and is called more than once, after inlining we may end up with different variables representing the same constant. This commit looks into the data and de-duplicates them. The first pass now collects the constant data in a per-variable buffer, then de-duplication happens (by sorting, then a linear walk), and the second pass uses the data in var->data.location. One side effect of the current implementation is that constants will be reordered. If this turns out to be a problem, it can be fixed. An alternative strategy considered was to perform this on a per-function basis and then merge the results; the problem is that we would have to fix up the offsets during the merge. Given the data we have, the current patch is good enough. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-07-18 12:24:24 -07:00
Caio Marcelo de Oliveira Filho	d9b67ad079	nir/large_constants: Use ralloc for var_infos This will be used later on to allocate constant data for each variable (and then deduplicate). Also drop initializing found_read, as it is already implicitly false in the literal. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-07-18 12:24:24 -07:00
Eric Anholt	0d8a4c67cf	freedreno: Convert nir_lower_tg4_to_tex to the NIR lowering helper. Cuts a bunch of boilerplate. Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>	2019-07-18 11:28:56 -07:00
Eric Anholt	56f4ede73d	freedreno: Convert load_barycentric_at_sample to the NIR lowering helper. Cuts out a ton of boilerplate. Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>	2019-07-18 11:28:56 -07:00
Eric Anholt	61098baf42	freedreno: Convert load_barycentric_at_offset to the NIR lowering helper. Cuts out a ton of boilerplate. Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>	2019-07-18 11:28:56 -07:00
Eric Anholt	cdc359c58e	v3d: Use nir_shader_lower_instructions() for txf_ms lowering. Cuts out a bunch of boilerplate. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2019-07-18 11:28:56 -07:00
Eric Anholt	251c64a53d	nir: Allow internal changes to the instr in nir_shader_lower_instructions(). v3d's NIR txf_ms lowering wants to swizzle around the input coordinates in NIR, but doesn't generate a new txf_ms instruction as replacement. It's pretty easy to allow that in nir_shader_lower_instructions, and it may be common in lowering passes. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-07-18 11:28:56 -07:00
Eric Anholt	c0640035fb	vc4: Convert vc4_nir_lower_txf_ms to nir_shader_lower_instructions(). Cuts out a bunch of boilerplate. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2019-07-18 11:28:56 -07:00
Eric Anholt	40e7609603	v3d: Fix assertion failures in debug builds. nir_lower_io leaves around deref_var instructions after lowering away deref intrinsics. This ends up breaking validation after v3d_nir_lower_io removes variables not actually being stored by the shader's store_output()s. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2019-07-18 11:28:56 -07:00
Alyssa Rosenzweig	1bced0fad2	panfrost: Handle Z24 textures Just use the Z32 code. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-18 10:42:43 -07:00
Alyssa Rosenzweig	f29c084960	panfrost/ci: Update expectations We just fixed some stencil tests. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-18 10:42:43 -07:00
Alyssa Rosenzweig	fad76470d5	panfrost: Make scissor test more robust See v3d implementation. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-18 10:42:43 -07:00
Alyssa Rosenzweig	5c554e235d	panfrost: Use correct NO_DITHER field on MFBD Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-18 10:42:43 -07:00
Alyssa Rosenzweig	676b9339dd	panfrost: Implement Z32F(_S8) support Z32F uses a dedicated float path. Z32F_S8 uses separate stencil planes in the hardware, lowered via u_transfer_helper. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-18 10:42:43 -07:00
Alyssa Rosenzweig	479185a1cd	panfrost/decode: Don't disassemble NULL shaders It is legal to load a shader from a NULL address, particularly when the TILER job is used strictly for effects on the Z/S buffer with 0x0 color mask. Don't crash the decoder in this case. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-18 10:42:43 -07:00
Alyssa Rosenzweig	65d89097b8	panfrost: Copy stencil front to back if back disabled When backside stenciling is disabled, backfacing primitives just do the same thing as frontfacing primitives. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-18 10:42:43 -07:00
Jan Zielinski	6f7306c029	swr/rast: Refactor memory API between rasterizer core and swr This commit cleans up API between the core of the rasterizer and swr. Some formatting changes are also done. Reviewed-by: Alok Hota <alok.hota@intel.com>	2019-07-18 16:17:00 +02:00
Andreas Baierl	4627a0c4eb	lima/ppir: Add gl_PointCoord handling Treat gl_PointCoord as a system value and add the necessary bits for correct codegen. Signed-off-by: Andreas Baierl <ichgeh@imkreisrum.de> Reviewed-by: Qiang Yu <yuq825@gmail.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-18 13:20:39 +00:00
Andreas Baierl	3523233027	gallium: Add PIPE_CAP_TGSI_FS_POINT_IS_SYSVAL This adds an option to treat gl_PointCoord as a system value. Signed-off-by: Andreas Baierl <ichgeh@imkreisrum.de> Reviewed-by: Qiang Yu <yuq825@gmail.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-18 13:20:39 +00:00
Andreas Baierl	3349a60f6f	nir/tgsi: Extend tgsi_to_nir.c to support gl_PointCoord as a system value. Signed-off-by: Andreas Baierl <ichgeh@imkreisrum.de> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-18 13:20:39 +00:00
Andreas Baierl	f5804f1768	nir: Add gl_PointCoord system value gl_PointCoord handling needs some special bits set in lima/ppir code generation. Treating gl_PointCoord as a system value makes it easier to distinguish from a regular varying. Signed-off-by: Andreas Baierl <ichgeh@imkreisrum.de> Reviewed-by: Qiang Yu <yuq825@gmail.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-18 13:20:39 +00:00
Andreas Baierl	24af57407c	glsl: Optionally declare gl_PointCoord as a system value Signed-off-by: Andreas Baierl <ichgeh@imkreisrum.de> Reviewed-by: Qiang Yu <yuq825@gmail.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-18 13:20:39 +00:00
Connor Abbott	b178fdf486	lima/gp: Fix problem with complex moves When writing the scheduler, we forgot that you can't read the complex unit in certain sources because it gets overwritten to 0 or 1. Fixing this turned out to be possible without giving up and reducing GPIR_VALUE_REG_NUM to 10, although it was difficult in a way I didn't expect. There can be at most 4 next-max nodes that can't have moves scheduled in the complex slot, so it actually isn't a problem for getting the number of next-max nodes at 5 or lower. However, it is a problem for stores. If a given node is a next-max node whose move cannot go in the complex slot and is used by a store that we decide to schedule, we have to reserve one of the non-complex slots for a move instead of all the slots, or we can wind up in a situation where only the complex slot is free and we fail the move. This means that we have to add another term to the reservation logic, for stores whose children cannot be in the complex slot. Acked-by: Qiang Yu <yuq825@gmail.com>	2019-07-18 14:33:23 +02:00
Connor Abbott	54434fe670	lima/gpir: Rework the scheduler Now, we do scheduling at the same time as value register allocation. The ready list now acts similarly to the array of registers in value_regalloc, keeping us from running out of slots. Before this, the value register allocator wasn't aware of the scheduling constraints of the actual machine, which meant that it sometimes chose the wrong false dependencies to insert. Now, we assign value registers at the same time as we actually schedule instructions, making its choices reflect reality much better. It was also conservative in some cases where the new scheme doesn't have to be. For example, in something like: 1 = ld_att 2 = ld_uni 3 = add 1, 2 It's possible that one of 1 and 2 can't be scheduled in the same instruction as 3, meaning that a move needs to be inserted, so the value register allocator needs to assume that this sequence requires two registers. But when actually scheduling, we could discover that 1, 2, and 3 can all be scheduled together, so that they only require one register. The new scheduler speculatively inserts the instruction under consideration, as well as all of its child load instructions, and then counts the number of live value registers after all is said and done. This lets us be more aggressive with scheduling when we're close to the limit. With the new scheduler, the kmscube vertex shader is now scheduled in 40 instructions, versus 66 before. Acked-by: Qiang Yu <yuq825@gmail.com>	2019-07-18 14:33:23 +02:00
Connor Abbott	12645e8714	lima/gp: Mark more add-only nodes as maybe-two-slot Reviewed-by: Qiang Yu <yuq825@gmail.com>	2019-07-18 14:33:23 +02:00
Connor Abbott	16de3dd7a6	lima/gpir: Fix some bugs in instruction handling Reviewed-by: Qiang Yu <yuq825@gmail.com>	2019-07-18 14:33:23 +02:00
Connor Abbott	cc78a42577	lima: Reintroduce the standalone compiler I used this to test things without needing to have a device handy. Acked-by: Qiang Yu <yuq825@gmail.com>	2019-07-18 14:33:23 +02:00
Connor Abbott	4423552ff0	nir/lower_viewport: Check variable mode first The location is unused for shader_temp and function_temp variables, and due to the way nir_lower_io_to_temporaries demotes shader_out variables to shader_temp variables, it happened to equal VARYING_SLOT_POS for the gl_Position temporary, which made this pass fail with the offline compiler due to this coming before vars_to_ssa. Reviewed-by: Qiang Yu <yuq825@gmail.com>	2019-07-18 14:21:41 +02:00
Samuel Pitoiset	6e5e4bf050	radv/gfx10: set BREAK_WAVE_AT_EOI if TES or GS enable the primitive ID Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-07-18 10:37:10 +02:00
Samuel Pitoiset	8c692ff512	radv/gfx10: move emitting VGT_PRIMITIVEID_EN into the NGG path And do not emit VGT_GS_MODE which is unnecessary on GFX10. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-07-18 10:36:38 +02:00
Samuel Pitoiset	8315dbe419	radv/gfx10: do not always execute a barrier before the second shader With NGG, empty waves may still be required to export data. This fixes dEQP-VK.ycbcr.format._unorm.geometry_. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-18 10:06:34 +02:00
Samuel Pitoiset	63d670e350	radv: fix VGT_GS_MODE if VS uses the primitive ID Found by inspection. Cc: <mesa-stable@lists.freedesktop.org> Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-18 10:03:12 +02:00
Iago Toral Quiroga	c23fa1ca07	v3d: emit correct lowering for logic operations with MSAA render targets v2: - Drop the writemask from the per-sample color intrinsic (Eric) Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-18 08:59:35 +02:00
Iago Toral Quiroga	93d05c1c1f	v3d: handle nir_intrinsic_store_tlb_sample_color_v3d v2: - Move handling of output intrinsics to ntq_emit_intrinsic() (Eric). Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-18 08:59:35 +02:00
Iago Toral Quiroga	50016d7718	nir: add a V3D-specific intrinsic for per-sample color writes For per-sample color writes we need the output intrinsic to pack the sample index, which is not provided with regular store_output intrinsics unless we figured out a way to encode it into the base or the offset. v2: - Drop the writemask (Eric) Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-18 08:59:35 +02:00
Iago Toral Quiroga	ba520b00c4	v3d: implement per-sample tlb color writes Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-18 08:59:35 +02:00
Iago Toral Quiroga	b96c2219ca	v3d: refactor the tlb color write code We want to split the tlb specifier setup from the color writes, because when we implement per-sample color writes we want to do the latter for all the samples, but the former only once. Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-18 08:59:35 +02:00
Iago Toral Quiroga	fd3ec6f55d	v3d: move tlb color write emission to a helper function We will soon be adding per-sample color writes which means additional complexity and more indentation (we will need another loop to emit the writes for each individual sample), so this will help keeping things simple and a bit more readable. Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-18 08:59:35 +02:00
Iago Toral Quiroga	0c9919710e	v3d: implement per-sample tlb color reads Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-18 08:59:35 +02:00
Lionel Landwerlin	3adc32df92	anv: fix format mapping for depth/stencil formats anv_format is supposed to have a pointer back to the associated VkFormat, but we missed this for depth/stencil formats. This doesn't fix anything afaict, but will be needed for future changes. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `465de47bad` ("anv: associate vulkan formats with aspects") Acked-by: Jason Ekstrand <jason@jlekstrand.net>	2019-07-18 09:40:01 +03:00
Dave Airlie	a68f593a0e	radv: put back VGT_FLUSH at ring init on gfx10 I can find no evidence that removing this is a good idea. Fixes: `9b116173b6` ("radv: do not emit VGT_FLUSH on GFX10") Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-07-18 16:24:44 +10:00
Gert Wollny	45951452aa	softpipe: Clamp border colors when needed unorm and snorm require that the border color values are clamped, so when picking the sampler view copy/clamp the border color from the sampler and use these adjusted values. Fixes: dEQP-GLES31.functional.texture.border_clamp.range_clamp.linear_compressed_color dEQP-GLES31.functional.texture.border_clamp.range_clamp.linear_snorm_color dEQP-GLES31.functional.texture.border_clamp.range_clamp.linear_srgb_color dEQP-GLES31.functional.texture.border_clamp.range_clamp.linear_unorm_color dEQP-GLES31.functional.texture.border_clamp.range_clamp.nearest_compressed_color dEQP-GLES31.functional.texture.border_clamp.range_clamp.nearest_snorm_color dEQP-GLES31.functional.texture.border_clamp.range_clamp.nearest_srgb_color dEQP-GLES31.functional.texture.border_clamp.range_clamp.nearest_unorm_color dEQP-GLES31.functional.texture.border_clamp.range_clamp.nearest_unorm_depth dEQP-GLES31.functional.texture.border_clamp.range_clamp.nearest_unorm_depth_uint_stencil_sample_depth Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2019-07-18 05:49:00 +02:00
Gert Wollny	230b99ce2f	softpipe: set a lower minimum clamp value for texture coordinate border clamp The value of -0.5f is not small enough to produce negative coordinates, so lower the minimum clamp value to -1.0f. This fixes a number of tests from dEQP-GLES31.functional.texture.border_clamp.* Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2019-07-18 05:47:23 +02:00
Gert Wollny	eae4c6df8d	softpipe: Correct repeat-mirror evaluation When mirroring the texture coordinates, the indices must be mirrored as well and the half pixel shift must be applied in reverse. Fixes a number of tests from: dEQP-GLES31.functional.texture.gather.offset.* dEQP-GLES31.functional.texture.gather.offsets.* Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2019-07-18 05:47:23 +02:00
Gert Wollny	fff624fca4	softpipe: Also mark textures as dirty when updating the framebuffer state At this point all the draw caches are flushed to the old attached textures, so the read caches of these textures will need to be updated too. Fixes: dEQP-GLES3.functional.fbo.color.repeated_clear.sample.tex2d.* Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2019-07-18 05:33:59 +02:00
Jonathan Marek	08514a9721	etnaviv: set DITHER_MODE This fixes a rendering glitch observed in SDL testscale test, where alpha blending samples with value (1.0, 1.0, 1.0, 0.0) whitens the target instead of having no effect. Signed-off-by: Jonathan Marek <jonathan@marek.ca> Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>	2019-07-17 23:07:50 -04:00
Jonathan Marek	aaf0c47c76	etnaviv: update headers from rnndb Update to etna_viv commit a16a418. Signed-off-by: Jonathan Marek <jonathan@marek.ca> Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>	2019-07-17 23:07:50 -04:00
Jonathan Marek	76adf041f2	etnaviv: fix blend color on newer GPUs Newer GPUs use the half float ALPHA_COLOR_EXT register. Signed-off-by: Jonathan Marek <jonathan@marek.ca> Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>	2019-07-17 23:07:50 -04:00
Jonathan Marek	5f73726013	etnaviv: fix alpha blending cases We need to check rgb_func/alpha_func when determining if blend or separate alpha is required. Signed-off-by: Jonathan Marek <jonathan@marek.ca> Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>	2019-07-17 23:07:35 -04:00
Jonathan Marek	6c3c05dc38	etnaviv: fix polygon offset Dividing the fui result by 65535 is obviously wrong, and from testing, on GC7000L at least there is no division by 65535. Fixes dEQP-GLES2.functional.polygon_offset.fixed16_displacement_with_units Signed-off-by: Jonathan Marek <jonathan@marek.ca> Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>	2019-07-17 23:07:07 -04:00
Timothy Arceri	a20a9d0c5e	radv: dont store disasm string unless keep_shader_info flag set This fixes the memory use regression from bug 111107. Fixes: `726a31df70` ("radv: Add the concept of radv shader binaries.") Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111107	2019-07-18 00:25:55 +00:00
Dave Airlie	82a2f10529	radv/gfx10: set the pgm rsrc3/4 regs using index sh reg set This is ported from AMDVLK; it's probably not required unless we want to use "real time queues", but it might be nice to just have in place. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-18 10:24:26 +10:00
Dave Airlie	de524b2c37	radv: use correct register setter for ngg hw addr This shouldn't matter, but it's good to be correct. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-07-18 10:17:37 +10:00
Eric Anholt	9689407c54	freedreno/a6xx: Drop the WFI in the program update stateobj. Rob Clark thinks this was likely a workaround for our const buffer update bugs, and now that it's passing tests, we should be able to drop it. renderdoc-traces results: traces/android/clashofclans.rdc: +6.1% +/- 1.1% traces/android/candycrush.rdc: +5.2% +/- 1.6% Reviewed-by: Rob Clark <robdclark@gmail.com>	2019-07-17 16:20:12 -07:00
Eric Anholt	2170822603	freedreno/a6xx: Drop the WFI in constant uploads. Now that the bin vs render constlen is fixed, we can skip these waits. Improves webgl aquarium performance at 10k fish from 27fps to 33. Some highlights from renderdoc-traces: traces/android/minecraft.rdc: +17.1% +/- 3.4% traces/glmark2/ideas-speed=duration.rdc: +11.6% +/- 2.4% traces/android/candycrush.rdc: +5.4% +/- 1.1% traces/android/clashofclans.rdc: +4.4% +/- 1.3% Reviewed-by: Rob Clark <robdclark@gmail.com>	2019-07-17 16:20:12 -07:00
Eric Anholt	85bbdaff6c	freedreno: Assert that we don't exceed constlen. We actually could go up to vs->constlen in the binning shader on a6xx, but for sanity let's make sure that we're always under constlen. This would have caught the bug fixed in `572c76fd88` ("freedreno: Clamp UBO uploads to the constlen decided by the shader.") Reviewed-by: Rob Clark <robdclark@gmail.com>	2019-07-17 16:20:12 -07:00
Eric Anholt	bc50ecfa7a	freedreno: Fix more constlen overflows. Fixes constlen overflow in dEQP-GLES31.functional.shaders.builtin_var.compute.num_work_groups and dEQP-GLES31.functional.image_load_store.buffer.image_size.readonly_32 and probably others. Reviewed-by: Rob Clark <robdclark@gmail.com>	2019-07-17 16:20:12 -07:00
Eric Anholt	b9f7f3e497	freedreno: Drop stale comment about skipping uploads. We already skip the upload if it's unused, due to the constlen > offset check. Reviewed-by: Rob Clark <robdclark@gmail.com>	2019-07-17 16:20:12 -07:00
Lepton Wu	6109df58e4	virgl: Set meta data for textures from handle. The set of meta data was removed by commit `8083464`. It broke lots of dEQP tests when running with pbuffer surface type. Fixes: `8083464013` ("virgl: remove dead code") Signed-off-by: Lepton Wu <lepton@chromium.org> Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Chia-I Wu <olvaffe@gmail.com>	2019-07-17 16:17:48 -07:00
Bas Nieuwenhuizen	f1a8967344	radv: Only save the descriptor set if we have one. After reset, if valid does not contain the relevant bit the descriptor can be != NULL but still not be valid. CC: <mesa-stable@lists.freedesktop.org> Reviewed-by: Dave Airlie <airlied@redhat.com>	2019-07-18 00:49:43 +02:00
Lionel Landwerlin	ce4c5474af	anv: report timestampComputeAndGraphics true Spec says : "timestampComputeAndGraphics specifies support for timestamps on all graphics and compute queues. If this limit is set to VK_TRUE, all queues that advertise the VK_QUEUE_GRAPHICS_BIT or VK_QUEUE_COMPUTE_BIT in the VkQueueFamilyProperties::queueFlags support VkQueueFamilyProperties::timestampValidBits of at least 36." On gen7+ this should be true (we only have 32bits of timestamp on gen6 and below). Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `802f00219a` ("anv/device: Update features and limits") Reported-by: Timothy Strelchun <timothy.strelchun@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-07-17 22:46:58 +00:00
Rafael Antognolli	393f659ed8	iris: Enable fast clears on other miplevels and layers than 0. Until now we only supported fast clear colors on the first miplevel and layer. The main reason for it is that we can't have different fast clear values at different levels/layers, since the surface state only supports one clear value. We can, however, enable it if we make sure we only use the same value for all levels/layers, and if one of them changes, we resolve all the others. We already do that for depth fast clears so hopefully it will be fine for color fast clears too. v2: Add check for partial clear too (Ken). Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-07-17 14:53:37 -07:00
Rafael Antognolli	8bbd4f32bf	iris: Allow resolving clear color of CCS_D surfaces. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-07-17 14:53:16 -07:00
Kenneth Graunke	df4c2ec5e1	iris: Make iris_has_color_unresolved non-static We want to use this in the transfer code and possibly for fast clears.	2019-07-17 13:43:04 -07:00
Andreas Bergmeier	f92290a8d9	broadcom: Move v3d_get_device_info to common In common we can use the implementation for Vulkan.	2019-07-17 20:02:34 +00:00
Caio Marcelo de Oliveira Filho	891a232214	nir/large_constants: Use dominance information to find more constants Relax the restriction that all the writes need to be in the first block: now accept variables that have all the writes in the same block, and all the reads are dominated by that block. This lets the pass identify large constants that are local to a helper function. The writes will be at the place that the function is inlined, possibly not in the first block (but still all in the same block). Results for vkpipeline-db in SKL: total instructions in shared programs: 3624891 -> 3623145 (-0.05%) instructions in affected programs: 79416 -> 77670 (-2.20%) helped: 16 HURT: 0 total cycles in shared programs: 1458149667 -> 1458147273 (<.01%) cycles in affected programs: 30154164 -> 30151770 (<.01%) helped: 14 HURT: 2 total loops in shared programs: 2437 -> 2437 (0.00%) loops in affected programs: 0 -> 0 helped: 0 HURT: 0 total spills in shared programs: 8813 -> 8745 (-0.77%) spills in affected programs: 2894 -> 2826 (-2.35%) helped: 8 HURT: 0 total fills in shared programs: 23470 -> 23392 (-0.33%) fills in affected programs: 12248 -> 12170 (-0.64%) helped: 6 HURT: 2 LOST: 0 GAINED: 0 Results for shader-db in SKL with Iris: total instructions in shared programs: 15379442 -> 15379392 (<.01%) instructions in affected programs: 837 -> 787 (-5.97%) helped: 2 HURT: 2 helped stats (abs) min: 27 max: 27 x̄: 27.00 x̃: 27 helped stats (rel) min: 10.47% max: 10.67% x̄: 10.57% x̃: 10.57% HURT stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2 HURT stats (rel) min: 1.23% max: 1.23% x̄: 1.23% x̃: 1.23% 95% mean confidence interval for instructions value: -39.14 14.14 95% mean confidence interval for instructions %-change: -15.51% 6.17% Inconclusive result (value mean confidence interval includes 0).
total loops in shared programs: 4880 -> 4880 (0.00%) loops in affected programs: 0 -> 0 helped: 0 HURT: 0 total cycles in shared programs: 370677237 -> 370676567 (<.01%) cycles in affected programs: 17852 -> 17182 (-3.75%) helped: 2 HURT: 1 helped stats (abs) min: 338 max: 356 x̄: 347.00 x̃: 347 helped stats (rel) min: 13.98% max: 14.64% x̄: 14.31% x̃: 14.31% HURT stats (abs) min: 24 max: 24 x̄: 24.00 x̃: 24 HURT stats (rel) min: 0.18% max: 0.18% x̄: 0.18% x̃: 0.18% total spills in shared programs: 11772 -> 11772 (0.00%) spills in affected programs: 0 -> 0 helped: 0 HURT: 0 total fills in shared programs: 24948 -> 24948 (0.00%) fills in affected programs: 0 -> 0 helped: 0 HURT: 0 LOST: 0 GAINED: 0 Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-07-17 12:50:32 -07:00
Jason Ekstrand	7ceec21b76	intel/fs: Use a strided MOV instead of a conversion for load_* destinations In many cases, the compiler can just copy-prop the strided MOV whereas the conversion is a bit trickier. This cuts 5% of the instructions off of one particular Vulkan CTS test which does lots of load_ssbo. Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-07-17 18:44:35 +00:00
Jason Ekstrand	812b341578	nir/algebraic: Optimize comparisons and up-casts These seem like obvious enough optimizations in the world of multiple integer bit sizes. The only known thing which hits these at the moment is some Vulkan CTS tests for 16-bit SSBO values which like to up-cast and check for equality. However, it's something that's bound to come up as we start seeing more integers in shaders. The optimizations of comparisons of casted values with constants are something which we would ideally do with range analysis. However, lacking that, we can do it in opt_algebraic as long as one side is a constant. In dEQP-VK.ssbo.phys.layout.random.16bit.scalar.13, this commit, along with the previous commit, reduce the number of instructions emitted on Skylake from 55328 to 44546, a reduction of 20%. Acked-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2019-07-17 18:44:35 +00:00
Jason Ekstrand	e8505e982a	nir/algebraic: Optimize comparing unpacked values We could, in theory, add the same optimization for 64-bit unpack operations but that's likely to fight with 64-bit integer lowering on platforms which require it so it will require more infrastructure before that will be a good idea. Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-07-17 18:44:35 +00:00
Jason Ekstrand	9fed031e4e	nir/algebraic: Print out the list of transforms in the C file This helps greatly when debugging algebraic transform generators because you can now actually see the output and verify that your transforms are getting generated. Acked-by: Matt Turner <mattst88@gmail.com>	2019-07-17 18:44:35 +00:00
Jason Ekstrand	68a4c796d5	intel/fs: Properly stride NULL replacement regs in DCE This fixes some validation errors generated by certain D->W conversions but is likely not a full solution. Calculating an actual register stride is a far more complex problem in general and should probably be handled by the brw_fs_generator. Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-07-17 18:44:35 +00:00
Eric Anholt	28a808a11b	nir: Fix nir_lower_alu_to_scalar's instr filtering. It was checking if the dest or src[0] SSA values were vectors, rather than whether the ALU op was using the source as a vector resulting in a nir_fdot4 making it through to vc4 and v3d: vec1 32 ssa_6 = fdot4 ssa_4.xxxx, ssa_5 Fixes: `c1cffa4249` ("nir/alu_to_scalar: Use the new NIR lowering framework") v2: Use Jason's recommendation to look at input_sizes. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-07-17 10:30:43 -07:00
Alyssa Rosenzweig	a301250ece	panfrost: Merge varyings_mem into transient buffers Theoretically we would like these split since varyings can have specially optimized flags (no map, coherent local). For now, since neither of these flags is particularly meaningful right now, merge them together instead of special casing varyings_mem. Saves upwards of 64MB of RAM per context. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-17 09:16:37 -07:00
Lionel Landwerlin	6f880f128f	vulkan/wsi: update swapchain status on vkQueuePresent With the following chain of events : vkQueuePresent() <- Surface resize vkQueuePresent() We should be able to report SUBOPTIMAL or OUT_OF_DATE on the second vkQueuePresent() call. Currently we only look at X11 events in the vkAcquireNextImage() path so we're not able to report this. This change checks the queue of events and processes any available ones to update the swapchain status. v2: Be consistent about reporting the current error state of the swapchain (Jason) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111097 Cc: <mesa-stable@lists.freedesktop.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-07-17 17:40:54 +03:00
Samuel Pitoiset	24b1b1f574	radv: add an option for disabling NGG on GFX10 Will be useful for testing the legacy path. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-17 15:43:36 +02:00
Erik Faye-Lund	d59c961af9	softpipe: pass stream-out targets to draw-module early This is essentially a port of `ed53e61bec` from LLVMpipe to softpipe, as it makes things a bit simpler and more performant. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-By: Gert Wollny <gert.wollny@collabora.com>	2019-07-17 10:43:06 +00:00
Alejandro Piñeiro	5a84960072	spirv_extensions: i965: initialize SPIR-V extensions v2: Rebase update after changes on previous patches. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-07-17 10:47:27 +02:00
Alejandro Piñeiro	6ed19dcf80	spirv_extensions: add spirv_supported_extensions on gl_constants We can use it to get real values for ARB_spirv_extensions methods. Signed-off-by: Alejandro Piñeiro <apinheiro@igalia.com> Signed-off-by: Arcady Goldmints-Orlov <agoldmints@igalia.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-07-17 10:45:58 +02:00
Alejandro Piñeiro	f6da2a5508	spirv_extensions: define spirv_extensions_supported Add a struct to maintain which SPIR-V extensions are supported, and a utility method to initialize it based on nir_spirv_supported_capabilities. v2: * Fixing code style (Ian Romanick) * Adding a prefix (spirv) to fill_supported_spirv_extensions (Ian Romanick) v3: rebase update (nir_spirv_supported_extensions renamed) v4: include AMD_gcn_shader support v5: move spirv_fill_supported_spirv_extensions to src/mesa/main/spirv_extensions.c Signed-off-by: Alejandro Piñeiro <apinheiro@igalia.com> Signed-off-by: Arcady Goldmints-Orlov <agoldmints@igalia.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-07-17 10:45:32 +02:00
Alejandro Piñeiro	06e5daf575	spirv_extensions: add list of extensions and to_string method Ideally this should be generated somehow. One option would be to gather all the extension dependencies listed in the core grammar, but there would be the possibility of not including some of the extensions. Note that spirv-tools is doing it just slightly better, as it has a hardcoded list of extensions manually taken from the registry, that they parse to get the enum and the to_string method (see generate_grammar_tables.py). v2: * Use a macro to improve readability. (Tapani Pälli) * Add unreachable on the switch, no default (Eric Engestrom) * No typedef enum (Ian Romanick) * Sort extensions names (Ian Romanick) * Don't add extensions unlikely to be supported by Mesa at any point (Ian Romanick) v3: rebase update v4: Include AMD_gcn_shader v5: move spirv_extensions_to_string to src/mesa/main/spirv_extensions.c Signed-off-by: Alejandro Piñeiro <apinheiro@igalia.com> Signed-off-by: Arcady Goldmints-Orlov <agoldmints@igalia.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-07-17 10:44:33 +02:00
Alejandro Piñeiro	a622aad869	spirv_extensions: add GL_ARB_spirv_extensions boilerplate v2: * Mention extension gap at gl_API.xml (Emil Velikov) * Bail with INVALID_ENUM if extension not available on getStringi (Emil Velikov) * Use EXTRA_EXT macro when defining the extension at get.c/get_hash_params.py (Emil Velikov) * Rename source files (spirvextensions.[ch] -> spirv_extensions.[ch]) (Ian) v3: * Fix GL_PROGRAM_BINARY_FORMATS glGet query, broken by error on a previous rebase v4: * Fix rebase conflicts on getstring.c after GL_SHADING_LANGUAGE_VERSION query was added v5: * Remove src/mapi/glapi/gen/Makefile.am as it no longer exists in master Signed-off-by: Alejandro Piñeiro <apinheiro@igalia.com> Signed-off-by: Arcady Goldmints-Orlov <agoldmints@igalia.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-07-17 10:41:44 +02:00
Samuel Pitoiset	07ff367442	radv/gfx10: implement VK_EXT_post_depth_coverage I did implement this extension a while ago but it didn't work on pre GFX10 for some reason. Now all CTS pass. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-17 08:32:39 +02:00
Samuel Pitoiset	ed53d2c4be	radv/gfx10: disable the TC compat zrange workaround Unnecessary. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-17 08:32:36 +02:00
Samuel Pitoiset	edf1af696f	radv/gfx10: fallback to the legacy path if tess and extreme geometry This is unsupported and hangs. This fixes GPU hangs with dEQP-VK.tessellation.geometry_interaction.limits.output_required_*. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-17 08:32:33 +02:00
Samuel Pitoiset	ae4b1fc095	radv/gfx10: always build the GS copy shader but uses it on-demand It should be possible to build it on-demand too but it requires more work. On GFX10, the GS copy shader is required when tess is enabled with extreme geometry. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-17 08:32:30 +02:00
Gert Wollny	9c611fb381	softpipe: Remove unused static function Thanks to Eric Engestrom for pointing out that there was something wrong with that function. Fixes: `724a73509e` softpipe: Prepare handling explicit gradients Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-17 04:52:27 +00:00
Caio Marcelo de Oliveira Filho	e2939dc5a1	spirv: Bail when we see CounterBuffer decoration This decoration can be ignored, so we can just skip the next steps. Otherwise we'd have to also handle it in apply_var_decoration. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-07-16 20:31:12 -07:00
Kenneth Graunke	1d5ee31553	iris: Drop copy and pasted iris_timebase_scale Lionel moved brw_timebase_scale to gen_device_info_timebase_scale a few months ago, so we should just use that, and not our own copy in iris.	2019-07-16 17:22:48 -07:00
Jason Ekstrand	6fb685fe4b	nir/regs_to_ssa: Handle regs in phi sources properly Sources of phi instructions act as if they occur at the very end of the predecessor block, not the block in which the phi lives. In order to handle them correctly, we have to skip phi sources on the normal instruction walk and handle them as a separate walk over the successor phis. While registers in phi instructions are a bit of an oddity, they can occur when we temporarily go out-of-SSA for control-flow manipulations. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111075 Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-07-16 23:28:03 +00:00
Jason Ekstrand	6394680f6b	spirv: Add a warning for ArrayStride on arrays of blocks It's disallowed according to the SPIR-V spec or at least I think that's what the spec says. It's in a section explicitly about explicit layout of things in the StorageBuffer, Uniform, and PushConstant storage classes so it's not 100% clear that it applies with other storage classes. However, it seems like it should apply in general and violating it can trigger (fairly harmless) asserts in NIR. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-07-16 17:02:08 -05:00
Caio Marcelo de Oliveira Filho	f07f516c56	anv: Increase state allocation size limit to 2MB When running on ICL the dEQP-VK.ssbo.phys.layout.random.16bit.scalar.13 needs more than 1M for the shader, so bump it. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-07-16 14:17:52 -07:00
Yevhenii Kolesnikov	3853871ef8	meta: leaking of BO with DrawPixels ctx->Unpack.BufferObj wasn't unreferenced. Fixes: `d492e7b017` (meta: Fix invalid PBO access from DrawPixels when trying to just alloc.) CC: Eric Anholt <eric@anholt.net> Signed-off-by: Yevhenii Kolesnikov <yevhenii.kolesnikov@globallogic.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-16 20:06:56 +00:00
Eric Anholt	e8360a64e4	swrast: Move _mesa_format_pack_colormask() to the only caller. This avoids needing format_pack to have access to the GLenum return functions for mesa_format. It seems like an odd function and unlikely to be reused. Reviewed-by: Thomas Helland <thomashelland90@gmail.com> Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>	2019-07-16 12:51:13 -07:00
Eric Anholt	4d23157a8b	mesa: Give _mesa_format_get_color_encoding a clearer name. It only returned one of two values. Reviewed-by: Thomas Helland <thomashelland90@gmail.com> Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>	2019-07-16 12:51:13 -07:00
Eric Anholt	35e2d31ba4	mesa: Drop redundant checks for sRGB before sRGB to linear conversion. _mesa_get_srgb_format_linear() just returns the original format if it wasn't sRGB. Reviewed-by: Thomas Helland <thomashelland90@gmail.com> Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>	2019-07-16 12:51:13 -07:00
Eric Anholt	ece03848c2	mesa: Fold _mesa_unpack_depth_stencil_row() into its only caller. This was the last bit of gl.h usage in format packing. Reviewed-by: Thomas Helland <thomashelland90@gmail.com> Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>	2019-07-16 12:51:13 -07:00
Eric Anholt	5956b46e16	mesa: Convert format_pack/unpack off of GL types. Reviewed-by: Rob Clark <robdclark@gmail.com> Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>	2019-07-16 12:51:13 -07:00
Eric Anholt	3e186af5e2	mesa: Port format_pack/unpack off of _mesa_problem(). unreachable() should be plenty of debug for these. Reviewed-by: Thomas Helland <thomashelland90@gmail.com> Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>	2019-07-16 12:51:13 -07:00
Eric Anholt	c6a0a976a4	mesa: Mostly switch Mesa format info off of GL types other than GLenum. I'm considering moving most of this code to src/util/, and I want that code to not expose GL types in its interfaces. Reviewed-by: Rob Clark <robdclark@gmail.com> Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>	2019-07-16 12:51:13 -07:00
Eric Anholt	93a7651d8d	mesa: Rename gl_pack typedefs to mesa_pack. These are packing mesa formats, not a GL format/type. Reviewed-by: Thomas Helland <thomashelland90@gmail.com> Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>	2019-07-16 12:51:13 -07:00
Eric Anholt	20ce56ad5b	mesa: Rename gl_format_info to mesa_format_info. It's about MESA_FORMATs, after all. Reviewed-by: Thomas Helland <thomashelland90@gmail.com> Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>	2019-07-16 12:51:13 -07:00
Eric Anholt	f8c27c2775	state_tracker: Move the format test out to be an actual unit test. We want errors in the table to show up as unit test failures in MRs. Also keeps unit test code out of the built drivers. Reviewed-by: Thomas Helland <thomashelland90@gmail.com> Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>	2019-07-16 12:51:13 -07:00
Eric Anholt	9eccae671e	u_format: Remove pointless comments. Reviewed-by: Thomas Helland <thomashelland90@gmail.com> Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>	2019-07-16 12:51:13 -07:00
Eric Anholt	628f55717b	src/util: Switch _mesa_half_to_float() to u_half.h's version. The two implementations differ across the entire input range only in that u_half.h preserves mantissa bits for NaNs. The u_half.h version shaves 15% off of the text size of half_float.o. Reviewed-by: Thomas Helland <thomashelland90@gmail.com> Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>	2019-07-16 12:51:13 -07:00
Eric Anholt	bb5801ad98	u_half_test: Turn it into an actual unit test. You could break the test and meson test wouldn't complain, since we returned success either way. Reviewed-by: Thomas Helland <thomashelland90@gmail.com> Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>	2019-07-16 12:51:13 -07:00
Mauro Rossi	3630988b1d	android: radv/gfx10: generate gfx10_format_table.h This patch adds the missing build rules for Android, to avoid the following build errors: In file included from external/mesa/src/amd/vulkan/radv_debug.c:35: In file included from external/mesa/src/amd/vulkan/radv_debug.h:27: external/mesa/src/amd/vulkan/radv_private.h:95:10: fatal error: 'gfx10_format_table.h' file not found ^~~~~~~~~~~~~~~~~~~~~~ 1 error generated. In file included from external/mesa/src/amd/vulkan/radv_android.c:31: external/mesa/src/amd/vulkan/radv_private.h:95:10: fatal error: 'gfx10_format_table.h' file not found ^~~~~~~~~~~~~~~~~~~~~~ 1 error generated. Fixes: `3dc5ec5d16` ("radv/gfx10: generate gfx10_format_table.h") Signed-off-by: Mauro Rossi <issor.oruam@gmail.com> Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-07-16 21:31:24 +02:00
Rob Clark	856e84083e	mesa/st: add sampler uniforms Add sampler uniforms for the UV plane(s), so driver can count the uniforms and get the correct sampler count. Fixes lowered YUV on a6xx which actually wants to know # of samplers. Signed-off-by: Rob Clark <robdclark@chromium.org> Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-16 18:14:44 +00:00
Rob Clark	a9f34b5631	egl/android: handle multi-fd native windows We can hit multi-fd EGL_NATIVE_BUFFER_ANDROID case when the native android buffer is YUV. So we need to handle that. Currently this went unnoticed because, even though we have two or three fd's for YUV native android buffers, they all reference the same backing buffer. But we really shouldn't rely on that. Signed-off-by: Rob Clark <robdclark@chromium.org> Reviewed-by: Eric Engestrom <eric@engestrom.ch> Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-16 18:14:44 +00:00
Jason Ekstrand	110669c85c	st,i965: Stop looping on 64-bit lowering Now that the 64-bit lowering passes do a complete lowering in one go, we don't need to loop anymore. We do, however, have to ensure that int64 lowering happens after double lowering because double lowering can produce int64 ops. Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-16 16:05:16 +00:00
Jason Ekstrand	548da20b22	nir/lower_doubles: Handle fdiv and fsub directly Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-16 16:05:16 +00:00
Jason Ekstrand	d7d35a9522	nir/lower_doubles: Use the new NIR lowering framework One advantage of this is that we no longer need to run in a loop because the new framework handles lowering instructions added by lowering. Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-16 16:05:16 +00:00
Jason Ekstrand	197a08dc69	nir/lower_doubles: Use "alu" for the nir_alu_instr Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-16 16:05:16 +00:00
Jason Ekstrand	d65902c179	nir/lower_int64: Use the core NIR lowering framework One advantage of this is that we no longer need to run in a loop because the new framework handles lowering instructions added by lowering. Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-16 16:05:16 +00:00
Jason Ekstrand	c1cffa4249	nir/alu_to_scalar: Use the new NIR lowering framework Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-16 16:05:16 +00:00
Jason Ekstrand	eb768b0a09	nir/alu_to_scalar: Use "alu" as the name for the nir_alu_instr Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-16 16:05:16 +00:00
Jason Ekstrand	998d84fca5	nir/lower_system_values: Support lowering more intrinsics Instead of only lowering system values from variables, lower most to intrinsics and let the lowering framework immediately lower the intrinsic. This will result in a bit more instruction churn but it means that NIR code builders can just use intrinsics instead of everything having to go through variables. Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-16 16:05:16 +00:00
Jason Ekstrand	ae8caaadee	nir/lower_system_values: Drop the context-aware builder functions Instead of having context-aware builder functions, just provide lowering for the system value intrinsics and let nir_shader_lower_instructions handle the recursion for us. This makes everything a bit simpler and means that the lowering can also be used if something comes in as a system value intrinsic rather than a load_deref. Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-16 16:05:16 +00:00
Jason Ekstrand	58ffd7fbf6	nir/lower_system_values: Use the new generic NIR lowering helpers Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-16 16:05:16 +00:00
Jason Ekstrand	ce3af830cb	nir/lower_subgroups: Use the new generic NIR lowering helpers Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-16 16:05:16 +00:00
Jason Ekstrand	758fdce9fe	nir: Add some generic helpers for writing lowering passes Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-16 16:05:16 +00:00
Jason Ekstrand	c74b98486a	nir: Add a helper for fetching the SSA def from an instruction Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-16 16:05:16 +00:00
Tomeu Vizoso	75b53a159d	pandecode: Add more addresses to trace When debugging, we're given the fault_pointer unresolved, so it is helpful to have more context in the decode. Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>	2019-07-16 08:40:59 -07:00
Tomeu Vizoso	5a7688fdec	panfrost: Use 64-bit descriptors globally Midgard supports two modes of operation, 32-bit mode and 64-bit mode. The GPU is natively 64-bit, but job descriptors can be submitted in 32-bit mode. Among other changes, 32-bit mode shortens pointer sizes to use 32-bit pointers rather than the full 64-bit range. The blob decides which mode to use based on the CPU bitness, so an armhf system uses 32-bit descriptors and an aarch64 system uses 64-bit descriptors. For a while, we mimicked this, but inevitably this caused the 32-bit support to lag behind as our reference platform is 64-bit. To combat the code staleness, we traced an older GPU paired with a 64-bit CPU (the Midgard T720 on-board the sunxi H64). From there, we could tell which fields were really about hardware and which fields were simply reflections of the descriptor bitness. From there, we decided to remove support for 32-bit descriptors entirely, using 64-bit descriptors unconditionally. There is minimal performance penalty for this in practice, and it allows us to unify these disparate code paths. This fixes: - T860 + armhf - T820 + armhf - T760 + aarch64 And will help bringup of 1st/2nd generation Midgard regardless of CPU. [Work done by Tomeu. Commit message written by Alyssa.] v2: Add comments preserving information about the old behaviour for future reference. Fix a compiler warning. (Alyssa) Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-16 08:40:59 -07:00
Jason Ekstrand	6a441151c2	anv: Account for dynamic stencil write disables in the PMA fix In `6ce8592836` we started looking at the dynamic stencil state and disabling stencil writes when the stencil mask is zero. Unfortunately, we never updated the PMA fix code accordingly so 3DSTATE_WM_DEPTH_STENCIL and the PMA fix were getting out-of-sync causing hangs. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109203 Fixes: `6ce8592836` "anv: Disable stencil writes when both write..." Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-07-16 15:12:45 +00:00
Alyssa Rosenzweig	5ad00fb3ed	panfrost: Implement opportunistic AFBC Rather than hardcoding a BO layout at creation-time, we implement the ability to hint layouts at various points in a BO's lifetime, potentially reallocating and switching layouts if it's heuristically deemed useful to do so. In this patch, we add a simple hinting implementation, opportunistically compressing FBOs. Support is hidden behind PAN_MESA_DEBUG=afbc as the implementation is incomplete (software access to AFBC is unimplemented at the moment) and therefore would regress significantly. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-16 07:21:08 -07:00
Alyssa Rosenzweig	d60994989e	panfrost/mfbd: Zero out framebuffer_stride We don't know what this is, so let's not pretend we do. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-16 07:19:29 -07:00
Alyssa Rosenzweig	e65e3cf596	panfrost: AFBC buffers must be cache-line aligned Fixes a DATA_INVALID_FAULT when AFBC is paired with mipmapping. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-16 07:19:28 -07:00
Alyssa Rosenzweig	f7621a8c5f	panfrost: Add Z/S and MRT BOs to the job Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-16 07:19:28 -07:00
Alyssa Rosenzweig	aaae6180bf	panfrost: Set usage2 during draw, not CSO It can change from a layout switch. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-16 07:19:28 -07:00
Sergii Romantsov	7417b43211	meta: memory leak of CopyPixels usage Meta of CopyPixel generates a buffer object but does not free it on cleanup. Fixes: `37d11b13ce` (meta: Don't pollute the buffer object namespace in _mesa_meta_setup_vertex_objects) Signed-off-by: Sergii Romantsov <sergii.romantsov@globallogic.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2019-07-16 13:48:47 +03:00
Samuel Pitoiset	afa102d65b	radv: add radv_emit_streamout_{begin,end} helpers Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Dave Airlie <airlied@redhat.com>	2019-07-16 11:17:00 +02:00
Samuel Pitoiset	17464d205c	radv: pass output values to radv_emit_stream_output() Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Dave Airlie <airlied@redhat.com>	2019-07-16 11:16:58 +02:00
Samuel Pitoiset	4dcdc4cdc5	radv: allow to select DST_SEL with RELEASE_MEM Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Dave Airlie <airlied@redhat.com>	2019-07-16 11:16:57 +02:00
Samuel Pitoiset	3c6d6bd71f	radv: allow to emit PS_DONE/CS_DONE with RELEASE_MEM Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Dave Airlie <airlied@redhat.com>	2019-07-16 11:16:55 +02:00
Samuel Pitoiset	219dc1b25c	radv: restore an assertion in handle_vs_outputs() The NGG GS epilogue no longer calls that function so the assertion is just useless now. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Dave Airlie <airlied@redhat.com>	2019-07-16 11:16:53 +02:00
Samuel Pitoiset	68603b767f	radv/gfx10: emit ES outputs of TES when it's not NGG Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Dave Airlie <airlied@redhat.com>	2019-07-16 11:16:51 +02:00
Samuel Pitoiset	b0f7a6e981	radv: update LATE_ALLOC_VS.LIMIT Mirror RadeonSI. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Dave Airlie <airlied@redhat.com>	2019-07-16 10:10:22 +02:00
Samuel Pitoiset	27d91062a8	radv/gfx10: support pixel shaders without exports Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Dave Airlie <airlied@redhat.com>	2019-07-16 10:10:21 +02:00
Samuel Pitoiset	1b2bfeaaaa	radv: fix gathering clip/cull distance masks for GS For NGG, the driver relies on the VS outinfo struct. This fixes dEQP-VK.clipping.user_defined.clip__vert_tess_geom_ Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Dave Airlie <airlied@redhat.com>	2019-07-16 10:09:37 +02:00
Samuel Pitoiset	361d549f87	Revert "radv/gfx10: don't set array pitch field on images" It introduces too many regressions. This reverts commit `6d50dcd80f`.	2019-07-16 09:37:56 +02:00
Iago Toral Quiroga	556c299430	v3d: flag dirty state when binding new sampler states We emit code to saturate texture coordinates when using clamp wrapping mode so if we don't flag the dirty state here we don't get to recompile the shaders when the wrapping mode changes. v2: - Do the same when setting sampler views (Eric) - Use a switch statement instead of an if ladder. - Swap the shader stage assertion with an unreachable. Fixes: spec/!opengl 1.1/texwrap 1d bordercolor/gl_rgba8, border color only spec/!opengl 1.1/texwrap 1d proj bordercolor/gl_rgba8, projected, border color only spec/!opengl 1.1/texwrap 2d bordercolor/gl_rgba8, border color only spec/!opengl 1.1/texwrap 2d proj bordercolor/gl_rgba8, projected, border color only spec/!opengl 1.1/texwrap formats bordercolor-swizzled/gl_alpha12, swizzled, border color only spec/!opengl 1.1/texwrap formats bordercolor-swizzled/gl_alpha16, swizzled, border color only spec/!opengl 1.1/texwrap formats bordercolor-swizzled/gl_alpha4, swizzled, border color only spec/!opengl 1.1/texwrap formats bordercolor-swizzled/gl_alpha8, swizzled, border color only spec/!opengl 1.1/texwrap formats bordercolor-swizzled/gl_intensity8, swizzled, border color only spec/!opengl 1.1/texwrap formats bordercolor-swizzled/gl_luminance4_alpha4, swizzled, border color only spec/!opengl 1.1/texwrap formats bordercolor-swizzled/gl_luminance6_alpha2, swizzled, border color only spec/!opengl 1.1/texwrap formats bordercolor-swizzled/gl_luminance8, swizzled, border color only spec/!opengl 1.1/texwrap formats bordercolor-swizzled/gl_luminance8_alpha8, swizzled, border color only spec/!opengl 1.1/texwrap formats bordercolor-swizzled/gl_r3_g3_b2, swizzled, border color only spec/!opengl 1.1/texwrap formats bordercolor-swizzled/gl_rgb10, swizzled, border color only spec/!opengl 1.1/texwrap formats bordercolor-swizzled/gl_rgb10_a2, swizzled, border color only spec/!opengl 1.1/texwrap formats bordercolor-swizzled/gl_rgb4, swizzled, border color only 
spec/!opengl 1.1/texwrap formats bordercolor-swizzled/gl_rgb5, swizzled, border color only spec/!opengl 1.1/texwrap formats bordercolor-swizzled/gl_rgb5_a1, swizzled, border color only spec/!opengl 1.1/texwrap formats bordercolor-swizzled/gl_rgb8, swizzled, border color only spec/!opengl 1.1/texwrap formats bordercolor-swizzled/gl_rgba4, swizzled, border color only spec/!opengl 1.1/texwrap formats bordercolor-swizzled/gl_rgba8, swizzled, border color only spec/!opengl 1.1/texwrap formats bordercolor/gl_alpha12, border color only spec/!opengl 1.1/texwrap formats bordercolor/gl_alpha16, border color only spec/!opengl 1.1/texwrap formats bordercolor/gl_alpha4, border color only spec/!opengl 1.1/texwrap formats bordercolor/gl_alpha8, border color only spec/!opengl 1.1/texwrap formats bordercolor/gl_intensity8, border color only spec/!opengl 1.1/texwrap formats bordercolor/gl_luminance4_alpha4, border color only spec/!opengl 1.1/texwrap formats bordercolor/gl_luminance6_alpha2, border color only spec/!opengl 1.1/texwrap formats bordercolor/gl_luminance8, border color only spec/!opengl 1.1/texwrap formats bordercolor/gl_luminance8_alpha8, border color only spec/!opengl 1.1/texwrap formats bordercolor/gl_r3_g3_b2, border color only spec/!opengl 1.1/texwrap formats bordercolor/gl_rgb10, border color only spec/!opengl 1.1/texwrap formats bordercolor/gl_rgb10_a2, border color only spec/!opengl 1.1/texwrap formats bordercolor/gl_rgb4, border color only spec/!opengl 1.1/texwrap formats bordercolor/gl_rgb5, border color only spec/!opengl 1.1/texwrap formats bordercolor/gl_rgb5_a1, border color only spec/!opengl 1.1/texwrap formats bordercolor/gl_rgb8, border color only spec/!opengl 1.1/texwrap formats bordercolor/gl_rgba4, border color only spec/!opengl 1.1/texwrap formats bordercolor/gl_rgba8, border color only spec/!opengl 1.2/texwrap 3d bordercolor/gl_rgba8, border color only spec/!opengl 1.2/texwrap 3d proj bordercolor/gl_rgba8, projected, border color only 
spec/arb_es2_compatibility/texwrap formats bordercolor-swizzled/gl_rgb565, swizzled, border color only spec/arb_es2_compatibility/texwrap formats bordercolor/gl_rgb565, border color only spec/arb_texture_compression/texwrap formats bordercolor-swizzled/gl_compressed_alpha, swizzled, border color only spec/arb_texture_compression/texwrap formats bordercolor-swizzled/gl_compressed_luminance_alpha, swizzled, border color only spec/arb_texture_compression/texwrap formats bordercolor-swizzled/gl_compressed_rgb, swizzled, border color only spec/arb_texture_compression/texwrap formats bordercolor/gl_compressed_alpha, border color only spec/arb_texture_compression/texwrap formats bordercolor/gl_compressed_luminance_alpha, border color only spec/arb_texture_compression/texwrap formats bordercolor/gl_compressed_rgb, border color only spec/arb_texture_float/texwrap formats bordercolor-swizzled/gl_alpha16f_arb, swizzled, border color only spec/arb_texture_float/texwrap formats bordercolor-swizzled/gl_intensity16f_arb, swizzled, border color only spec/arb_texture_float/texwrap formats bordercolor-swizzled/gl_luminance16f_arb, swizzled, border color only spec/arb_texture_float/texwrap formats bordercolor-swizzled/gl_luminance_alpha16f_arb, swizzled, border color only spec/arb_texture_float/texwrap formats bordercolor-swizzled/gl_rgb16f, swizzled, border color only spec/arb_texture_float/texwrap formats bordercolor-swizzled/gl_rgba16f, swizzled, border color only spec/arb_texture_float/texwrap formats bordercolor/gl_alpha16f_arb, border color only spec/arb_texture_float/texwrap formats bordercolor/gl_intensity16f_arb, border color only spec/arb_texture_float/texwrap formats bordercolor/gl_luminance16f_arb, border color only spec/arb_texture_float/texwrap formats bordercolor/gl_luminance_alpha16f_arb, border color only spec/arb_texture_float/texwrap formats bordercolor/gl_rgb16f, border color only spec/arb_texture_float/texwrap formats bordercolor/gl_rgba16f, border color only 
spec/arb_texture_rectangle/texwrap rect bordercolor/gl_rgba8, border color only spec/arb_texture_rectangle/texwrap rect proj bordercolor/gl_rgba8, projected, border color only spec/arb_texture_rg/texwrap formats bordercolor-swizzled/gl_r8, swizzled, border color only spec/arb_texture_rg/texwrap formats bordercolor-swizzled/gl_rg8, swizzled, border color only spec/arb_texture_rg/texwrap formats bordercolor/gl_r8, border color only spec/arb_texture_rg/texwrap formats bordercolor/gl_rg8, border color only spec/arb_texture_rg/texwrap formats-float bordercolor-swizzled/gl_r16f, swizzled, border color only spec/arb_texture_rg/texwrap formats-float bordercolor-swizzled/gl_rg16f, swizzled, border color only spec/arb_texture_rg/texwrap formats-float bordercolor/gl_r16f, border color only spec/arb_texture_rg/texwrap formats-float bordercolor/gl_rg16f, border color only spec/ext_packed_float/texwrap formats bordercolor-swizzled/gl_r11f_g11f_b10f, swizzled, border color only spec/ext_packed_float/texwrap formats bordercolor/gl_r11f_g11f_b10f, border color only spec/ext_texture_shared_exponent/texwrap formats bordercolor-swizzled/gl_rgb9_e5, swizzled, border color only spec/ext_texture_shared_exponent/texwrap formats bordercolor/gl_rgb9_e5, border color only spec/ext_texture_snorm/texwrap formats bordercolor-swizzled/gl_alpha8_snorm, swizzled, border color only spec/ext_texture_snorm/texwrap formats bordercolor-swizzled/gl_intensity8_snorm, swizzled, border color only spec/ext_texture_snorm/texwrap formats bordercolor-swizzled/gl_luminance8_alpha8_snorm, swizzled, border color only spec/ext_texture_snorm/texwrap formats bordercolor-swizzled/gl_luminance8_snorm, swizzled, border color only spec/ext_texture_snorm/texwrap formats bordercolor-swizzled/gl_r8_snorm, swizzled, border color only spec/ext_texture_snorm/texwrap formats bordercolor-swizzled/gl_rg8_snorm, swizzled, border color only spec/ext_texture_snorm/texwrap formats bordercolor-swizzled/gl_rgb8_snorm, swizzled, border 
color only spec/ext_texture_snorm/texwrap formats bordercolor-swizzled/gl_rgba8_snorm, swizzled, border color only spec/ext_texture_snorm/texwrap formats bordercolor/gl_alpha8_snorm, border color only spec/ext_texture_snorm/texwrap formats bordercolor/gl_intensity8_snorm, border color only spec/ext_texture_snorm/texwrap formats bordercolor/gl_luminance8_alpha8_snorm, border color only spec/ext_texture_snorm/texwrap formats bordercolor/gl_luminance8_snorm, border color only spec/ext_texture_snorm/texwrap formats bordercolor/gl_r8_snorm, border color only spec/ext_texture_snorm/texwrap formats bordercolor/gl_rg8_snorm, border color only spec/ext_texture_snorm/texwrap formats bordercolor/gl_rgb8_snorm, border color only spec/ext_texture_snorm/texwrap formats bordercolor/gl_rgba8_snorm, border color only spec/ext_texture_srgb/texwrap formats bordercolor-swizzled/gl_sluminance8, swizzled, border color only spec/ext_texture_srgb/texwrap formats bordercolor-swizzled/gl_sluminance8_alpha8, swizzled, border color only spec/ext_texture_srgb/texwrap formats bordercolor-swizzled/gl_srgb8, swizzled, border color only spec/ext_texture_srgb/texwrap formats bordercolor-swizzled/gl_srgb8_alpha8, swizzled, border color only spec/ext_texture_srgb/texwrap formats bordercolor/gl_sluminance8, border color only spec/ext_texture_srgb/texwrap formats bordercolor/gl_sluminance8_alpha8, border color only spec/ext_texture_srgb/texwrap formats bordercolor/gl_srgb8, border color only spec/ext_texture_srgb/texwrap formats bordercolor/gl_srgb8_alpha8, border color only Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-16 08:13:28 +02:00
Samuel Pitoiset	994253b400	radv/gfx10: add missing conversions for 16-bit exports This fixes dEQP-VK.spirv_assembly.instruction.graphics.16bit_storage.input_output_* Found with RADV_DEBUG=checkir Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-16 08:12:34 +02:00
Samuel Pitoiset	d8844533af	radv: remove unused code in radv_export_param() It was a hack for geometry shaders. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-16 08:12:20 +02:00
Dave Airlie	6d50dcd80f	radv/gfx10: don't set array pitch field on images Setting this seems to be broken, amdvlk only sets it for quilted textures which I'm not sure what those are. Fixes dEQP-VK.glsl.texture_functions.query.texturesize3d Fixes: `bf11f1c3a4` ("radv/gfx10: add gfx10_make_texture_descriptor") Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-16 10:41:27 +10:00
Vinson Lee	d1a55d9559	lima/ppir: Fix assert condition in ppir_codegen_encode_branch. Fixes: `af0de6b91c` ("lima/ppir: implement discard and discard_if") Reported-by: Coverity Scan Signed-off-by: Vinson Lee <vlee@freedesktop.org> Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>	2019-07-15 23:48:34 +00:00
Eric Anholt	82dc168f51	docs: Tell people how to easily generate the Fixes lines. v2: Include '-s' to suppress the diff. v3: use the git config command (Ken), use < (Eric) Reviewed-by: Matt Turner <mattst88@gmail.com> (v1) Acked-by: Eric Engestrom <eric@engestrom.ch> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-07-15 16:29:31 -07:00
Caio Marcelo de Oliveira Filho	1210e8caaf	spirv: Ignore ArrayStride for storage classes that should not use it The stride was already overridden when using lower_workgroup_access_to_offsets, so elaborate a bit on the comment there. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-07-15 16:18:57 -07:00
Caio Marcelo de Oliveira Filho	026cfa1099	spirv: Fix stride calculation when lowering Workgroup to offsets Use alignment to calculate the stride associated with the pointer types. That stride is used when the pointers are cast to arrays. Note that size alone is not sufficient, e.g. struct { vec2 a; vec1 b; } will have an element size of 12 bytes, but the stride needs to be 16 bytes to respect the 8 byte alignment. Fixes: `050eb6389a` "spirv: Ignore ArrayStride in OpPtrAccessChain for Workgroup" Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-07-15 16:18:46 -07:00
Alyssa Rosenzweig	329799257b	panfrost/ci: Blacklist flush finish tests We don't implement batch splitting quite yet which is necessary for the ludicrous number of draw calls these tests invoke. Blacklist them for now. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-15 16:16:19 -07:00
Alyssa Rosenzweig	c1125d0935	panfrost: Don't leak oversized transient allocations When we allocate them, we allocate with two references accidentally, causing them to leak uncontrollably. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-15 16:12:56 -07:00
Alyssa Rosenzweig	48f51e9dbb	panfrost: Implement panfrost_bo_cache_evict_all Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-15 16:12:56 -07:00
Alyssa Rosenzweig	f02278ae87	panfrost: Implement panfrost_bo_cache_get Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-15 16:12:56 -07:00
Alyssa Rosenzweig	525e5dc4ed	panfrost: Implement panfrost_bo_cache_put Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-15 16:12:56 -07:00
Alyssa Rosenzweig	9034b5586c	panfrost: Add pan_bucket helper Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-15 16:12:56 -07:00
Alyssa Rosenzweig	eb398683d7	panfrost: Implement pan_bucket_index helper We'll use this whenever we need to lookup a bucket. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-15 16:12:56 -07:00
Alyssa Rosenzweig	270733fe6a	panfrost: Add BO cache data structure Linked list of panfrost_bo* nested inside an array of buckets. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-15 16:12:56 -07:00
Alyssa Rosenzweig	f3464f7987	panfrost: Describe BO cache architecture Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-15 16:12:56 -07:00
Alyssa Rosenzweig	f3b7e1ddc7	panfrost: Stub out panfrost_bo_cache_evict This destructor will be used to legitimately free the BOs, now that a BO free with cacheable=0 is only a "fake" free. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-15 16:12:56 -07:00
Alyssa Rosenzweig	74ad5f89f8	panfrost: Stub out panfrost_bo_cache_put ..so we can intercept the BO free. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-15 16:12:56 -07:00
Alyssa Rosenzweig	b5a28f61ae	panfrost: Stub out panfrost_bo_cache_get We will use this function to fetch cached BOs instead of freshly allocating them. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-15 16:12:55 -07:00
Alyssa Rosenzweig	fea953e6c2	panfrost: Don't leak the blend CSO hash table Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-15 16:12:55 -07:00
Alyssa Rosenzweig	07a1f3d120	panfrost: Cleanup after scoreboarding Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-15 16:12:55 -07:00
Alyssa Rosenzweig	fae790ecfc	panfrost: Allocate UBOs on the stack, not the heap Saves a call to calloc (the maximum size is small and known at compile-time) and fixes a leak. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-15 16:12:55 -07:00
Jason Ekstrand	0ba508d7a3	nir,intel: Add support for lowering 64-bit nir_opt_extract_* We need this when doing full software 64-bit emulation. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110309 Fixes: `cbad201c2b` "nir/algebraic: Add missing 64-bit extract_[iu]8..." Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2019-07-15 16:08:37 -05:00
Jason Ekstrand	7a19e05e8c	nir/opt_if: Clean up single-src phis in opt_if_loop_terminator Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111071 Fixes: `2a74296f24` "nir: add opt_if_loop_terminator()" Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-07-15 19:58:51 +00:00
Pierre-Eric Pelloux-Prayer	ed98f8a63a	radeonsi: verify buffer_offset value before using it This buffer_offset can come directly from the application (e.g: when using glVertexAttribPointer) and can contain an invalid value. st_atom_array already makes sure that it's not negative so all that's left is to verify that it's smaller than the buffer size. Bugs related to this issue: Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105251#c52 Bugzilla: https://bugzilla.freedesktop.org/show_bug.cgi?id=109693 Signed-off-by: Marek Olšák <marek.olsak@amd.com>	2019-07-15 15:22:28 -04:00
Pierre-Eric Pelloux-Prayer	a9655f36fe	st/mesa: verify that vertex buffer offset isn't negative For drivers supporting PIPE_CAP_SIGNED_VERTEX_BUFFER_OFFSET the buffer_offset value will be interpreted as a signed int. An example of application code causing a negative offset: float b[] = { ... }; // 3 float for pos, 3 for color glBufferData(GL_ARRAY_BUFFER, ..., b, ...); glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, 6 * sizeof(float), 0); glVertexAttribPointer(1, 3, GL_FLOAT, GL_FALSE, 6 * sizeof(float), &b[3]); ^ should be 3 * sizeof(float) The offset is a ptr so when interpreted as a signed int it can be negative. This commit adds a verification that (int) buffer_offset is not negative - this would indicate an application bug. Since it's too late to emit a GL_INVALID_VALUE error, we replace the negative offset by 0 and emit a debug message. Signed-off-by: Marek Olšák <marek.olsak@amd.com>	2019-07-15 15:22:25 -04:00
Marek Olšák	ce04fbf67c	st/mesa: don't invalidate a buffer range that is mapped This is needed to fix an issue with OpenGL when a buffer is mapped and BufferSubData is called. In this case, we can't invalidate the buffer range.	2019-07-15 14:58:23 -04:00
Marek Olšák	fc4302d1df	gallium: use MAP_DIRECTLY to mean suppression of DISCARD in buffer_subdata This is needed to fix an issue with OpenGL when a buffer is mapped and BufferSubData is called. In this case, we can't invalidate the buffer range.	2019-07-15 14:58:23 -04:00
Kenneth Graunke	5e76c99923	iris: Better handle decoder base addresses It can be useful to call the decoder on a single batch. But, that batch may not contain STATE_BASE_ADDRESS, at which point the decoder will have no idea how to find any buffers. We can initialize the two static bases at the beginning of time, so it has them even if it never sees SBA. Surface base address changes dynamically, possibly in the middle of a batch. So we update it at the start of each batch, making it always start at the value we inherited from the previous one. SBA commands inside the batch can update it to a proper value. Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>	2019-07-15 11:49:19 -07:00
Samuel Pitoiset	ed12be1b8f	radv/gfx10: enable OC_LDS_EN for NGG GS if the ES stage is TES Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-15 20:05:21 +02:00
Bas Nieuwenhuizen	d4f0f1a6e2	anv: Add android dependencies on android. Specifically needed for nativewindow for some VK_EXT_external_memory_android_hardware_buffers functions, where we call into some AHardwareBuffer functions. The legacy Android ext did not have us call into any Android function at all and hence it was not noticed. Fixes: `755c633b8d` "anv: Fix vulkan build in meson." Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Chad Versace <chadversary@chromium.org>	2019-07-15 15:23:43 +00:00
Alyssa Rosenzweig	0b83005807	panfrost: Advertise more depth/stencil formats Fixes a regression in glmark's shadow/refract scenes. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-15 08:03:35 -07:00
Alyssa Rosenzweig	1aaf68d120	panfrost/mfbd: Add Z32 rendering support Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-15 08:03:35 -07:00
Alyssa Rosenzweig	f8e2219b08	panfrost: Fix blend_cso if nr_cbufs == 0 Fixes: `46396af1ec` ("panfrost: Refactor blend infrastructure") Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-15 08:03:35 -07:00
Alyssa Rosenzweig	318d641cd9	panfrost: Cleanup shader upload code The old algorithm is still used (and the same issue -- namely, leaking all shaders -- applies) but we're way more concise about it since we're only using the routine for shaders nowadays; everything else is a BO-proper or transient. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-15 08:03:34 -07:00
Alyssa Rosenzweig	1ffca961ab	panfrost: Remove all old allocators With the new refactor, this all becomes dead code. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-15 08:03:34 -07:00
Alyssa Rosenzweig	9981b6ef0f	panfrost: Use transient memory for occlusion queries These only last a frame anyway. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-15 08:03:34 -07:00
Alyssa Rosenzweig	594b47d917	panfrost: Remove bizarre hack I don't think this is still necessary, and if it is, we'll have to figure out how to fix it the right way. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-15 08:03:34 -07:00
Alyssa Rosenzweig	d375d127a9	panfrost: Upload vertex descriptors to transient memory It's not legal to reuse the vertex shader descriptor across frames now that we patch it at draw-time, so upload to transient memory. Ideally, we could be smarter about this such that subsequent draws with the same vertex shader and same patched state would reuse the descriptor, but for now, let's simply achieve correctness. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-15 08:03:34 -07:00
Alyssa Rosenzweig	c6b59db5b4	panfrost: Delay resource mmaps We use the new PAN_ALLOCATE_DELAY_MMAP flag to only map resources on-demand, which should avoid mapping FBOs. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-15 08:03:34 -07:00
Alyssa Rosenzweig	bd4986bafa	panfrost: Cleanup PAN_ALLOCATE_* While we're at it, prompted by a semantics issue around INVISIBLE, also add a separate DELAY_MMAP flag. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-15 08:03:34 -07:00
Alyssa Rosenzweig	2f783ede02	panfrost/drm: Don't mmap INVISIBLE buffers On the new kernel, mmaping doesn't hurt per se, but it's still wasteful for buffers explicitly marked as not needing an mmap. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-15 08:03:34 -07:00
Lionel Landwerlin	c9c8c2f7d7	anv: fix crash in vkCmdClearAttachments with unused attachment anv_render_pass_compile() turns an unused attachment into a NULL depth_stencil_attachment pointer so check that pointer before accessing it. Found with updates to existing CTS tests. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `208be8eafa` ("anv: Make subpass::depth_stencil_attachment a pointer") Reviewed-by: Eric Engestrom <eric@engestrom.ch> Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>	2019-07-15 16:47:41 +03:00
Samuel Pitoiset	b650f3d197	radv/gfx10: export the PrimitiveID for ES stages (VS or TES) Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-15 11:30:10 +02:00
Samuel Pitoiset	8175f6269b	radv/gfx10: declare an external symbol for the ESGS ring It will be used for stream output but for now only declares it if VS and if the PrimitiveID needs to be exported. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-15 11:30:08 +02:00
Samuel Pitoiset	f0a90eddb6	radv/gfx10: allocate ESGS ring space for exporting PrimitiveID Only VS needs that. We shouldn't hardcode these values but that's complicated to not do that for now. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-15 11:30:05 +02:00
Samuel Pitoiset	4478f14327	radv/gfx10: fix crash when emitting NGG GS prologue ac_nir_context is initialized after the driver emits the NGG GS prologue so it's likely to crash. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-15 08:51:53 +02:00
Vasily Khoruzhick	eb862c2365	lima/ppir: Fix branch codegen The "unknown_2" field is actually the size of the instruction that the branch points to. If it's set to a smaller size than the actual instruction, branch behavior is undefined (and it usually wedges the GPU). Fix it by setting this field correctly. Fixes: `af0de6b91c` ("lima/ppir: implement discard and discard_if") Reviewed-by: Qiang Yu <yuq825@gmail.com> Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>	2019-07-14 19:49:14 -07:00
Vasily Khoruzhick	8f0160ca24	lima/ppir: Fix assert condition in ppir_codegen_encode_discard Fixes: `af0de6b91c` ("lima/ppir: implement discard and discard_if") Reviewed-by: Qiang Yu <yuq825@gmail.com> Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>	2019-07-14 19:48:55 -07:00
Jonathan Marek	4e102a6de7	etnaviv: fix incorrect varying interpolation This corresponds to what the GC3000 blob does. The USED / UNUSED enums are wrong, at least for GC2000/GC3000. Without this the 3rd texture component is not interpolated correctly (flat?) in the following test (and others): dEQP-GLES2.functional.texture.mipmap.cube.generate.rgba8888_nicest Strangely, when the texture is sampled from OpenGL it works correctly, the problem only shows up for sampling by gallium/blitter. This fixes other cube map tests which use util_blitter_blit. Signed-off-by: Jonathan Marek <jonathan@marek.ca> Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>	2019-07-14 10:34:17 -04:00
Jonathan Marek	a9e78a44d1	etnaviv: reduce rs alignment requirement for two pixel pipes GPU The rs alignment doesn't have to be multiplied by # of pixel pipes. This works on GC2000 which doesn't have the SINGLE_BUFFER feature. This fixes some cubemaps (NPOT / small mipmap levels) because aligning by 8 breaks the expected alignment of 4 for tiled format. We don't want to mess with the alignment of tiled formats. Signed-off-by: Jonathan Marek <jonathan@marek.ca> Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>	2019-07-14 10:34:17 -04:00
Jonathan Marek	2c393053bf	etnaviv: fix nearest_linear / linear_nearest filtering on GC3000 The MIN filter is never used when not using mipmaps. This fixes that. Interestingly, only GC3000 needs this (GC2000 works without this fix). Signed-off-by: Jonathan Marek <jonathan@marek.ca> Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>	2019-07-14 10:34:17 -04:00
Jonathan Marek	63efb6ec6c	etnaviv: fix nearest filtering ROUND_UV rounding breaks nearest filtering. Enable it only when nearest filtering isn't used. Signed-off-by: Jonathan Marek <jonathan@marek.ca> Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>	2019-07-14 10:34:17 -04:00
Bas Nieuwenhuizen	1f58b6ffef	radv/gfx10: Fix DCC clears. Looks like if the reg clear bit is set, the hardware does not use the 0/1 clears for textures. Reviewed-by: Dave Airlie <airlied@redhat.com>	2019-07-14 07:30:04 +00:00
Vinson Lee	730ceeddb5	meson: Add dep_thread dependency. Fix this build error on Ubuntu 18.04. /usr/bin/ld: src/util/libmesa_util.a(u_cpu_detect.c.o): undefined reference to symbol 'pthread_once@@GLIBC_2.2.5' Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110663 Suggested-by: Eric Engestrom <eric@engestrom.ch> Signed-off-by: Vinson Lee <vlee@freedesktop.org> Reviewed-by: Eric Anholt <eric@anholt.net> Acked-by: Eric Engestrom <eric@engestrom.ch>	2019-07-13 11:39:26 -07:00
Eric Anholt	11aa32a447	gitlab-ci: Build i386 and ARM drivers in surfaceless mode. I don't particularly care about getting x86/ARM cross-build coverage of all the window systems, but we do want to be building src/mesa/ (for x86 asm) and gallium drivers (for vc4 NEON asm). I'm also hoping to use these build products for testing freedreno on actual HW (which we do using surfaceless). This increases the docker image from 1.4G to 1.5G. Reviewed-by: Michel Dänzer <michel.daenzer@amd.com> Acked-by: Eric Engestrom <eric@engestrom.ch>	2019-07-13 13:46:24 +00:00
Andreas Baierl	ce81c9a2e1	lima: Fix compiler warnings for unused functions. Signed-off-by: Andreas Baierl <ichgeh@imkreisrum.de> Reviewed-by: Qiang Yu <yuq825@gmail.com>	2019-07-13 13:15:05 +00:00
Caio Marcelo de Oliveira Filho	09c4037dda	anv: Fix pool allocator when first alloc needs to grow When using softpin, the first allocation was not calculating the padding and offset correctly when the first allocation needed to grow. We were not initializing state.end right after expanding the pool for the first time. This is not a problem for non-softpin since there we don't use leftover padding so the ends would re-arrange incrementally. This fixes running dEQP-VK.ssbo.phys.layout.random.16bit.scalar.13 in SKL -- the test uses a shader larger than the initial size for the instruction pool. Fixes: `dfc9ab2ccd` "anv/allocator: Add padding information." Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-07-12 22:25:37 -07:00
Kenneth Graunke	aa13921079	mesa: Port errors.c to util/list.h instead of simple_list. There is widespread consensus that simple_list should go away. This patch converts one more use to the modern kernel-style list. Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>	2019-07-12 21:58:40 -07:00
Jason Ekstrand	974fabe810	intel: Run the optimization loop before and after lowering int64 For bindless SSBO access, we have to do 64-bit address calculations. On ICL and above, we don't have 64-bit integer support so we have to lower the address calculations to 32-bit arithmetic. If we don't run the optimization loop before lowering, we won't fold any of the address chain calculations before lowering 64-bit arithmetic and they aren't really foldable afterwards. This cuts the size of the generated code in the compute shader in dEQP-VK.ssbo.phys.layout.random.16bit.scalar.13 by around 30%. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-07-13 02:59:28 +00:00
Alyssa Rosenzweig	7103baf01f	panfrost/decode: Drop _replay prefix We don't even support replay anymore; this is just wasting characters and adding clutter. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-12 16:23:53 -07:00
Alyssa Rosenzweig	0d5abfdec5	panfrost/decode: Drop _name suffixes Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-12 16:23:53 -07:00
Alyssa Rosenzweig	0c1874adad	panfrost/decode: Add MEMORY_PROP_DIR variant This allows dumping memory properties directly without dereferencing an address, allowing us to fix more -Waddress-of-packed-member warnings. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-12 16:23:52 -07:00
Alyssa Rosenzweig	9ffe061c5e	panfrost/decode: Copy embedded structs before using Fixes some, but not all, warnings from -Waddress-of-packed-member Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-12 16:23:52 -07:00
Alyssa Rosenzweig	23b230d72f	panfrost/decode: Remove pandecode_decode_fbd_type It is unused. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-12 16:23:52 -07:00
Alyssa Rosenzweig	9eea8423a0	panfrost/midgard: Use generic outmod type It could be midgard_outmod_float or midgard_outmod_int; don't assume it's one or the other. Fixes -Wenum-conversion warnings. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-12 16:23:52 -07:00
Alyssa Rosenzweig	e173d6b1b1	panfrost: Precompute scoreboard dependents Mali job dependency graphs, at least for GLES3.0, have the special property that a given node will only have at most a single dependent. This allows us to efficiently precompute the dependent array and replace an inner loop's O(N) search with an O(1) lookup, bringing the algorithmic complexity of scoreboarding from O(N^2) to O(N). Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-12 16:22:15 -07:00
Alyssa Rosenzweig	b68778e6de	panfrost: Remove transient pool abstraction Now that it has been totally replaced by the borrow mechanism, it is now unused code. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-12 15:31:48 -07:00
Alyssa Rosenzweig	ee32700f37	panfrost: Subdivide fixed-size transient slabs The whole purpose of the transient memory model is to make subdivision stupidly easy, so let's handle that. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-12 15:31:48 -07:00
Alyssa Rosenzweig	37097b2f38	panfrost: Recycle fixed-size transient BOs The usual case. We use the bitset to mark freedom and seize it. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-12 15:31:48 -07:00
Alyssa Rosenzweig	0f5ad9efcc	panfrost: Bookkeep transient indices The batch now temporarily possesses the transient buffer, so it'll need to remember that to free it later. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-12 15:31:48 -07:00
Alyssa Rosenzweig	00c9a1cb75	panfrost: Rewrite allocate_transient with new abstraction We use a fixed size slab if we can, otherwise we create a dedicated ("oversized") BO and add that to the job. In the latter case we'll get reference counting for free so we can forget about this corner case for the rest of the series. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-12 15:31:48 -07:00
Alyssa Rosenzweig	ba02cf0e75	panfrost: Add pan_bo_for_screen helper Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-12 15:31:48 -07:00
Alyssa Rosenzweig	330cd057ad	panfrost: Add panfrost_transient_bo array We would like transient allocations to occur on the screen (borrowed by the batch) rather than on the context. Add fields to track this. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-12 15:31:47 -07:00
Alyssa Rosenzweig	718ebfa225	panfrost: Don't upload vertex/tiler twice The latter upload is correct, but the former upload is unassociated with any particular FBO and therefore becomes orphaned. We do have to upload at draw-time at the latest, if we haven't by then. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-12 15:31:47 -07:00
Alyssa Rosenzweig	085004cc2c	panfrost/drm: Check allocation size is positive Zero-sized allocations will fail with an unhelpful errno from the kernel; check size explicitly in userspace before it gets that far. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-12 15:31:47 -07:00
Neil Roberts	8419621176	mesa/glspirv: Validate that compute shaders are not linked with other stages The test is based on link_shaders(). For example, it allows the following test (when run on SPIR-V mode) to pass: spec/arb_compute_shader/linker/mix_compute_and_non_compute.shader_test Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-07-12 23:42:42 +02:00
Neil Roberts	022e9ddd1a	mesa/glspirv: Validate that there is a VS when there is a TCS, TES or GS The shader combination tests are copied from link_shaders(). For example, it allows the following tests (when run on SPIR-V mode) to pass: spec/arb_tessellation_shader/linker/no-vs spec/arb_tessellation_shader/linker/tcs-no-vs spec/arb_tessellation_shader/linker/tes-no-vs spec/glsl-1.50/linker/gs-without-vs Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-07-12 23:42:41 +02:00
Alejandro Piñeiro	e4210b93e4	i965: don't use disk cache with SPIR-V shaders Right now we don't support disk cache for SPIR-V shaders (from ARB_gl_spirv), so let's avoid writing the program data to or reading it from the disk if any in-use shaders use SPIR-V. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-07-12 23:42:41 +02:00
Alejandro Piñeiro	bb3bbdfbbd	glsl/shader_cache: handle SPIR-V shaders Right now we don't have cache support for SPIR-V shaders (from ARB_gl_spirv). Right now they are properly skipped because they fall on the ff shader code path (no key, no name), but it would be better to update current comments, and add some guards. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-07-12 23:42:41 +02:00
Arcady Goldmints-Orlov	637b168470	nir/linker: Initialize UniformDataDefaults when using SPIR-V Allocate UniformDataDefaults and fill in the data defaults when linking a SPIR-V program. Among other things, this allows program serialization to work. It allows the following piglit test (when run on SPIR-V mode) to pass: spec/arb_get_program_binary/execution/uniform-after-restore.shader_test v2: use memcpy to initialize UniformDataDefaults Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-07-12 23:42:41 +02:00
Arcady Goldmints-Orlov	761b0fe95f	glsl/serialize: Update write_program_resource_data() to handle NULL input and output variable names Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-07-12 23:42:41 +02:00
Arcady Goldmints-Orlov	c3122d2431	glsl/serialize: Handle NULL uniform name in write_uniforms() Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-07-12 23:42:41 +02:00
Antia Puentes	0baa553fab	mesa/main: Fix UBO/SSBO ACTIVE_VARIABLES query (ARB_gl_spirv) When querying MAX_NUM_ACTIVE_VARIABLES, NUM_ACTIVE_VARIABLES and ACTIVE_VARIABLES over SSBO and UBO interfaces, we filter the variables which are active using the variable's name and looking for it in the program resource list. If it is in the program resource list, the variable will be considered active. However, due to ARB_gl_spirv, where name reflection information is not mandatory, we can use the UBO/SSBO binding and variable offset to filter which variables are active. v2: use RESOURCE_UBO/UNI macros instead of direct castings, update comment (Alejandro) v3: Change signature of _mesa_program_resource_find_active_variable to simplify calling it. Also, squash the fix for find_binding_offset for arrays of blocks (Arcady) Signed-off-by: Antia Puentes <apuentes@igalia.com> Signed-off-by: Alejandro Piñeiro <apinheiro@igalia.com> Signed-off-by: Arcady Goldmints-Orlov <agoldmints@igalia.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-07-12 23:42:41 +02:00
Antia Puentes	161de77e0f	mesa/shader_query: Fix LOCATION_INDEX query (ARB_gl_spirv) When querying GL_LOCATION_INDEX using glGetProgramResourceiv we already know the index of the resource, we do not need to find it using the name, which is convenient for shaders coming from SPIR-V binaries where names are optional. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-07-12 23:42:41 +02:00
Antia Puentes	8818553f18	mesa/shaderapi: Fix TRANSFORM_FEEDBACK_VARYING program query Fixes the program queries API (glGetProgramiv): TRANSFORM_FEEDBACK_VARYINGS and TRANSFORM_FEEDBACK_VARYING_MAX_LENGTH in two cases: 1. ARB_enhanced_layouts: The queries were not working for GLSL shaders which specify the varyings using enhanced layouts. We were returning the info as if the varyings could only be specified using the API. 2. ARB_gl_spirv: TRANSFORM_FEEDBACK_VARYING_MAX_LENGTH should return 1 if there is no name reflection information available. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-07-12 23:42:41 +02:00
Antia Puentes	8792abff9d	mesa/uniforms: Fix GetUniformLocation (ARB_gl_spirv) From the ARB_gl_spirv specification, glGetUniformLocation should return -1 when no name reflection is available. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-07-12 23:42:41 +02:00
Antia Puentes	96d6156678	mesa/shader_query: Fix NAME_LENGTH queries (ARB_gl_spirv) For shaders constructed from SPIR-V binaries, it is possible that no name reflection information is available. In that case, - glGetProgramInterfaceiv(.., pname=MAX_NAME_LENGTH, ..) - glGetProgramResourceiv(.., props=NAME_LENGTH, ..) should return 1. Signed-off-by: Antia Puentes <apuentes@igalia.com> Signed-off-by: Andres Gomez <agomez@igalia.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-07-12 23:42:41 +02:00
Alejandro Piñeiro	3ebd60b491	mesa: Fix ACTIVE_*_MAX_LENGTH program queries (ARB_gl_spirv) Since ARB_gl_spirv it is possible to miss a lot of name reflection information, so we need to add NULL name checks for several queries, and return a specific value in those cases. This commit adds them for ACTIVE_UNIFORM_BLOCK_MAX_NAME_LENGTH, ACTIVE_ATTRIBUTE_MAX_LENGTH and ACTIVE_UNIFORM_MAX_LENGTH. From ARB_gl_spirv spec: "If pname is ACTIVE_UNIFORM_BLOCK_MAX_NAME_LENGTH, the length of the longest active uniform block name, including the null terminator, is returned. If no active uniform blocks exist, zero is returned. If no name reflection information is available, one is returned. If pname is ACTIVE_ATTRIBUTE_MAX_LENGTH, the length of the longest active attribute name, including a null terminator, is returned. If no active attributes exist, zero is returned. If no name reflection information is available, one is returned. If pname is ACTIVE_UNIFORM_MAX_LENGTH, the length of the longest active uniform name, including a null terminator, is returned. If no active uniforms exist, zero is returned. If no name reflection information is available, one is returned." Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-07-12 23:42:41 +02:00
Antia Puentes	cafc1a40d4	nir/types: Add glsl_type_is_unsized_array helper Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-07-12 23:42:41 +02:00
Antia Puentes	bfc5e46746	nir/linker: Fill TOP_LEVEL_ARRAY_SIZE and STRIDE From the ARB_program_interface_query specification: "For the property TOP_LEVEL_ARRAY_SIZE, a single integer identifying the number of active array elements of the top-level shader storage block member containing to the active variable is written to <params>. If the top-level block member is not declared as an array, the value one is written to <params>. If the top-level block member is an array with no declared size, the value zero is written to <params>." "For the property TOP_LEVEL_ARRAY_STRIDE, a single integer identifying the stride between array elements of the top-level shader storage block member containing the active variable is written to <params>. For top-level block members declared as arrays, the value written is the difference, in basic machine units, between the offsets of the active variable for consecutive elements in the top-level array. For top-level block members not declared as an array, zero is written to <params>." v2: move top_level_array_size and stride into nir_link_uniforms_state Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-07-12 23:42:41 +02:00
Antia Puentes	ae2ea5ec1f	nir/linker: Compute the offset for non-trivial uniform types. ARB_gl_spirv points out that the offset must be explicit; however, this is only true for 'root' types. For complex types, like struct members or arrays of arrays, it needs to be computed. We are not using the offset stored in the gl_buffer_variables during the uniform blocks linking because currently we do not have a way to relate a gl_buffer_variable with its corresponding gl_uniform_storage. The GLSL path uses the name for that, but we cannot rely on that because names are optional in SPIR-V. Notice that uniforms not backed by a buffer object will have an offset equal to -1, like in the GLSL path. v2: add offset and var_is_in_block as per-variable state in nir_link_uniforms_state (Arcady) Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-07-12 23:42:41 +02:00
Antia Puentes	e15c663d8e	nir/linker: Add atomic counters to the program resource list Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-07-12 23:42:41 +02:00
Antia Puentes	e1464a1cf8	nir/linker: Add XFB resources to the program resource list Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-07-12 23:42:41 +02:00
Antia Puentes	53087a89ac	nir/linker: Add BUFFER_VARIABLEs to the prog resource list v2: use link_util_should_add_buffer_variable() (Arcady) Signed-off-by: Arcady Goldmints-Orlov <agoldmints@igalia.com> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-07-12 23:42:41 +02:00
Antia Puentes	ffdb44d3a0	nir/linker: Add inputs/outputs to the program resource list v2: added TODO comment hinting possible future refactoring of nir_build_program_resource_list and build_program_resource_list, to avoid code duplication (Alejandro, to explicitly reflect a valid concern from Timothy during the review). Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-07-12 23:42:41 +02:00
Alejandro Piñeiro	691cee751a	nir/linker: add ubo/ssbo to the program resource list v2: "nir/linker: Use the stageref when adding UBO/SSBO resources" squashed on this one (Timothy) Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-07-12 23:42:41 +02:00
Antia Puentes	a638971929	nir/linker: Fill the uniform's BLOCK_INDEX Binding comparison is used to determine the block the uniform is part of. Note that to do the binding comparison we need the information in UniformBlocks[] and ShaderStorageBlocks[] to be available, so we have to call gl_nir_link_uniform_blocks() before linking the uniforms. v2: add missing break (Timothy) Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-07-12 23:42:41 +02:00
Samuel Pitoiset	f239e22813	radv/gfx10: enable 1D textures Mirror RadeonSI. This also fixes crashes in addrlib. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-12 18:25:45 +02:00
Andres Gomez	f4d2be03b1	intel/compiler: remove abandoned comments `c8665005`: ("intel/compiler: Don't always require precise lowering of flrp") forgot to remove some comments that didn't apply any more after the change. Signed-off-by: Andres Gomez <agomez@igalia.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-07-12 16:15:20 +00:00
Andres Gomez	9aadd5d688	nir/compiler: keep same bit size when lowering with flrp This was probably not caught before because no supported test was exercising the flrp lowering with a bit size other than 32. With the arrival of VK_KHR_shader_float_controls we will have some of those and, unless we keep the bit size, we will end up with something like: ../src/compiler/nir/nir_builder.h:420: nir_builder_alu_instr_finish_and_insert: Assertion `src_bit_size == bit_size' failed. Fixes: `158370ed2a` ("nir/flrp: Add new lowering pass for flrp instructions") Fixes: `ae02622d8f` ("nir/flrp: Lower flrp(a, b, c) differently if another flrp(_, b, c) exists") Signed-off-by: Andres Gomez <agomez@igalia.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-07-12 16:15:20 +00:00
Jason Ekstrand	16842b2391	anv: Properly compute image usage in CreateImageView With separate stencil usage, we can't just grab the usage from the image directly and have to consider the per-aspect usage instead. Fixes: `1be38f9178` "anv:Use VK_EXT_separate_stencil_usage to avoid..." Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-07-12 16:13:48 +00:00
Samuel Pitoiset	b393b2ce95	radv/gfx10: emit DISABLE_CONSERVATIVE_ZPASS_COUNTS Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-12 17:47:12 +02:00
Samuel Pitoiset	8cc4e4a81e	radv/gfx10: init more registers in the graphics preamble Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-12 17:47:12 +02:00
Samuel Pitoiset	e68b55f5e3	radv/gfx10: set HS/GS/CS.WGP_MODE Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-12 17:47:12 +02:00
Samuel Pitoiset	5d5e26230a	radv/gfx10: emit GE_PC_ALLOC Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-12 17:47:11 +02:00
Samuel Pitoiset	df062afa03	radv/gfx10: enable vertex shaders without export parameters GFX10 allows this. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-12 17:47:11 +02:00
Samuel Pitoiset	3f76c0f47c	radv/gfx10: launch 2 compute waves per CU before going onto the next CU Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-12 17:47:11 +02:00
Samuel Pitoiset	e631d65fc6	radv: use ac_get_compute_resource_limits() No behaviour change. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-12 17:47:11 +02:00
Samuel Pitoiset	e510c5ee3b	ac: import ac_get_compute_resource_limits() from RadeonSI Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-12 17:47:11 +02:00
Alyssa Rosenzweig	5f4f8aec74	panfrost: Initialize shift/extra_flags Don't rely on them being preinitialized to zero; this can cause junk to appear on the wire. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-12 07:38:37 -07:00
Alyssa Rosenzweig	6d8490f900	panfrost: Fix build warnings A bunch of these are from asserts not being compiled in 32-bit mode (once Erik's ASSERTABLE stuff is merged, we'll want to switch). Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-12 07:38:37 -07:00
Samuel Pitoiset	37aefb2be1	radv/gfx10: invalidate everything in L2 when shaders read data This includes metadata as well. On GFX10, we have to invalidate the L2 metadata cache when shaders read DCC. Note that we still have to implement GFX10 coherency by introducing INV_L2_METADATA but for now just flush L2. This fixes a corruption with DCC and Talos. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-12 14:08:12 +02:00
Samuel Pitoiset	4e38322dd8	radv/gfx10: fix wrong emission of GE_CNTL Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-12 12:15:08 +02:00
Samuel Pitoiset	219d6939df	radv: add more assertions to make sure packets are correctly emitted Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-12 12:15:06 +02:00
Alejandro Piñeiro	85b78f96a6	v3d: use inc/dec tmu operation with image atomic sub/add of 1 This allows removing a mov of 1/-1, as it is implicit in the operation. As with atomic inc/dec/add, the usual shader-db set doesn't include any GLES shader using it. So using vk-gl-cts shaders as a workaround, we get this: total instructions in shared programs: 1217013 -> 1217006 (<.01%) instructions in affected programs: 53 -> 46 (-13.21%) helped: 2 HURT: 0 One of the helped shaders went from 40 to 34 instructions. Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-12 11:51:22 +02:00
Alejandro Piñeiro	2e22879115	v3d: refactor some code from v3d40_vir_emit_image_load_store Move it to a new auxiliary method, v3d40_image_load_store_tmu_op, equivalent to the nir_to_vir v3d_general_tmu_op, to clean up a little. Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-12 11:49:29 +02:00
Alejandro Piñeiro	934ce48db8	v3d: use inc/dec tmu operation with atomic sub/add of 1 Among other things, this avoids the need to load 1/-1 constants (so one less operation). The removed comment suggests the option of adding support in NIR for inc/dec. Intel just uses an auxiliary method to get which hw operation is needed, so no lowering is needed. And at the same time, being so small, it seems unreasonable to try to add a general one in NIR itself. It is easier to just adapt the method here (that is what the patch does right now). It is worth noting that we are not getting any change in shader-db stats because all those methods are used in the usual shader-db set with shaders needing GLSL > 4.2. In general there aren't too many GLSL ES 3.1 tests. As an alternative, we captured the GLES3/GLSL31/GLS32 used on vk-gl-cts, even if that is not a real life usage of shaders. With those we get the following: total instructions in shared programs: 1217022 -> 1217013 (<.01%) instructions in affected programs: 117 -> 108 (-7.69%) helped: 6 HURT: 0 helped stats (abs) min: 1 max: 2 x̄: 1.50 x̃: 1 helped stats (rel) min: 3.57% max: 10.00% x̄: 8.09% x̃: 9.09% 95% mean confidence interval for instructions value: -2.07 -0.93 95% mean confidence interval for instructions %-change: -10.54% -5.64% Instructions are helped. Note that the number of shaders helped is really low because most of the vk-gl-cts tests using AtomicInc/Dec/Add are compute shaders. Although right now there is a branch around with CS support, the usual practice is to do the stats against master. Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-12 11:48:40 +02:00
Alejandro Piñeiro	3912a32a79	v3d: remove redefinition of tmu operations on nir_to_vir They are already defined, although in a slightly different format, in the generated packet headers, so it was necessary to change how they are used in nir_to_vir. In addition to allowing us to remove some duplicated headers, it will allow defining just one get_op_for_atomic_add aux method later to support using inc/dec instead of add of 1/-1. Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-12 11:48:17 +02:00
Alejandro Piñeiro	c2ff38d2df	v3d: tweak initial comment on pack generator script The files it mentions as references have slightly different names. Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-12 11:48:09 +02:00
Yevhenii Kolesnikov	8c5692b696	glsl/link_varyings: Fix hash table leak Hash tables were not destroyed at return. v2: Use ralloc_context (Eric Anholt) Signed-off-by: Yevhenii Kolesnikov <yevhenii.kolesnikov@globallogic.com> Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-07-12 11:07:08 +03:00
Kenneth Graunke	712ac83033	iris: Simplify devinfo access in calculate_result_on_gpu() We have devinfo, no need for screen->devinfo.	2019-07-12 00:33:19 -07:00
Iago Toral Quiroga	10d50f2904	v3d: remove unused definitions Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-12 09:16:38 +02:00
Iago Toral Quiroga	8e50a9f6cf	v3d: move implementation of some intrinsics to separate helpers Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-12 09:16:38 +02:00
Iago Toral Quiroga	d69184204e	v3d: emit correct lowering for logic ops with RGB10A2 render targets Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-12 09:16:38 +02:00
Iago Toral Quiroga	7bf3676845	v3d: emit correct lowering for logic ops with integer render targets Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-12 09:16:38 +02:00
Iago Toral Quiroga	e540775f0c	v3d: add lowering for OpenGL logic operations This implements support for OpenGL logic operations by emitting code to read from the TLB if needed and blending the fragment output accordingly. It is similar to VC4's blend lowering pass, but exclusive to logic operations, since blending is otherwise supported in hardware. The pass doesn't handle MSAA targets yet. Fixes the following piglit tests: spec/!opengl 1.0/gl-1.0-logicop/* spec/!opengl 1.1/gl-1.1-xor spec/!opengl 1.1/gl-1.1-xor-copypixels It also fixes text cursor rendering in Libreoffice with the GTK+2 theme, which is rendered via glamor using the XOR logic operation. v2: fix checks for allowed variable location and maximum render target (Eric) Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-12 09:16:38 +02:00
Iago Toral Quiroga	7c1d708911	v3d: acquire scoreboard lock before first tlb read Until now we have always been emitting our scoreboard locks on the last thread switch to improve parallelism. We did this by emitting our last thread switch right before our tlb writes at the very end of the program, where we know that we are outside control flow. Unfortunately, this strategy is not valid when we have tlb color reads too, as these will happen before this point in the program and can happen inside control flow. To fix this we always emit a thread switch before the first tlb load and if we see additional thread switches after that point, we change the strategy to lock on the first thread switch. v2: change the solution so it is expected to work in more scenarios (Eric). Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-12 09:16:38 +02:00
Iago Toral Quiroga	47d7c80dc7	v3d: implement tile buffer color read intrinsic We will be emitting this intrinsic to signal TLB color loads when we implement OpenGL logic operations, where we need to blend the fragment shader color output with the existing color in the render target. Per-sample TLB reads are not supported yet. v2: fix the offset into the color_reads array (Eric). Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-12 09:16:38 +02:00
Iago Toral Quiroga	b0eec9e27d	nir: add a new v3d-specific intrinsic for tile buffer color reads This is intended to be used, for example, with OpenGL logic operations. It takes a render target as source and a sample index in the base index for MSAA color reads. v2: drop the CAN_ELIMINATE and CAN_REORDER flags (Eric). Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-12 09:16:38 +02:00
Iago Toral Quiroga	6af1bdefa9	v3d: fix size of color_reads and sample_colors arrays We need to scale the size of these arrays to consider up to V3D_MAX_DRAW_BUFFERS render targets and 4 components per color. v2: we want to store each color component separately, so scale by 4 too. Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-12 09:16:38 +02:00
Iago Toral Quiroga	0279ac6e51	v3d: add color formats and swizzles to the fragment shader key We are going to need these very soon to emit correct reads from the tlb to implement logic operations. Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-12 09:16:38 +02:00
Iago Toral Quiroga	d26b35ba44	v3d: add helpers to emit ldtlb and ldtlbu signals The ldtlbu version will read an implicit uniform with the TLB read specifier and should be used for the first read in a sequence of TLB reads (unless the default configuration is valid, in which case we can use ldtlb). The ldtlb version is used for any subsequent TLB read in the sequence. Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-12 09:16:38 +02:00
Iago Toral Quiroga	aff8885cf9	v3d: handle tlb read dependency tracking as if they were writes Tile buffer reads are emitted as ordered sequences and cannot be reordered. Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-12 09:16:38 +02:00
Iago Toral Quiroga	4793e2c888	v3d: instructions with the ldtlb and ldtlbu signals are tlb instructions Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-12 09:16:38 +02:00
Iago Toral Quiroga	83a66e10de	v3d: tlb loads cannot be removed Loads from the tile buffer are emitted in ordered sequences so we cannot eliminate or reorder any of them. Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-12 09:16:38 +02:00
Iago Toral Quiroga	08f4dc3adc	v3d: the ldtlbu signal reads an implicit uniform Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-12 09:16:38 +02:00
Iago Toral Quiroga	271bc8acfb	v3d: handle ldtlb and ldtlbu signals during disassembly We already have code to print these signals but the early return in the code that checks if any signals are present was missing the checks for them, so it would skip printing them unless they were paired with other signals. Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-12 09:16:38 +02:00
Samuel Pitoiset	958ee4c21a	radv: report shader stage name when dumping LLVM IR For debugging purposes. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-12 08:19:53 +02:00
Samuel Pitoiset	2b6a089813	radv: tidy up radv_get_shader_name() and add NGG stages Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-12 08:19:53 +02:00
Samuel Pitoiset	ffd6a979bf	radv/gfx10: update OVERWRITE_COMBINER_{MRT_SHARING,WATERMARK} DCC related, mirror RadeonSI. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Dave Airlie <airlied@redhat.com>	2019-07-12 08:19:53 +02:00
Samuel Pitoiset	c6fa4de15d	radv/gfx10: do not set alignment on the ngg_emit pointer This is invalid and this fixes a crash in LLVM. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-12 08:19:53 +02:00
Samuel Pitoiset	df0a23ad1e	radv/gfx10: fix exporting clip/cull distances for GS This fixes dEQP-VK.clipping.user_defined.clip_distance.geom. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-12 08:19:53 +02:00
Samuel Pitoiset	edcd2bc833	radv/gfx10: fix exporting the subpass view index for GS This fixes dEQP-VK.multiview.geometry. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-12 08:19:20 +02:00
Timothy Arceri	3043908ccb	mesa: save/restore SSO flag when using ARB_get_program_binary Without this the restored program will fail the pipeline validation checks when we attempt to use an SSO program. Fixes: `c20fd744fe` ("mesa: Add Mesa ARB_get_program_binary helper functions") Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111010	2019-07-12 09:26:53 +10:00
Alyssa Rosenzweig	fe783c5b0c	pan/midgard: Correct component count clamping PSIZ Kind of a funky corner case that does not (as far as I know) apply to organic shaders from GLES but does pop up in generated shaders from the fixed-function desktop pipeline. Fixes: `bb483a9166` ("panfrost: Clamp point size") Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-11 13:30:55 -07:00
Alyssa Rosenzweig	c4e6d759dd	panfrost: Remove unused display target field Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-11 12:48:25 -07:00
Alyssa Rosenzweig	6b9edd2451	panfrost/ci: Update expectations Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-11 12:48:25 -07:00
Samuel Pitoiset	a7b7e94085	radv: only enable the GS copy shader stage if GS is enabled Ooops. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-07-11 21:44:44 +02:00
Eric Anholt	e1fe98cc7d	freedreno: Add dependency on the xml build to the winsys. The screen header includes the common xml, and otherwise we might race to build before it's done. Fixes: `e03259974e` ("freedreno: Generate headers from xml files") Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>	2019-07-11 12:01:01 -07:00
Kenneth Graunke	5445c176e2	iris: Disable SIMD32 when using a 16x MSAA framebuffer. We weren't doing this documented workaround because it's sorta painful.	2019-07-11 11:34:21 -07:00
Ian Romanick	ef7b4fdf3f	nir/algebraic: Recognize open-coded flrp(a, b, a) No shader-db changes on Ice Lake, Iron Lake, or GM45 as these platforms lack a LRP instruction. v2: Remove flrp@64 cases. Since Gen11 removes flrp@32, it seems unlikely that we'll ever have a flrp@64. Should that occur, the cases can be added back. All Gen6-Gen9 platforms had similar results. (Skylake shown) total instructions in shared programs: 15041996 -> 15041184 (<.01%) instructions in affected programs: 71776 -> 70964 (-1.13%) helped: 312 HURT: 0 helped stats (abs) min: 2 max: 3 x̄: 2.60 x̃: 3 helped stats (rel) min: 0.36% max: 4.55% x̄: 1.75% x̃: 1.28% 95% mean confidence interval for instructions value: -2.66 -2.55 95% mean confidence interval for instructions %-change: -1.89% -1.61% Instructions are helped. total cycles in shared programs: 354303333 -> 354301807 (<.01%) cycles in affected programs: 433742 -> 432216 (-0.35%) helped: 206 HURT: 78 helped stats (abs) min: 2 max: 244 x̄: 21.02 x̃: 8 helped stats (rel) min: 0.06% max: 19.59% x̄: 1.72% x̃: 0.82% HURT stats (abs) min: 1 max: 220 x̄: 35.95 x̃: 10 HURT stats (rel) min: 0.07% max: 30.48% x̄: 2.53% x̃: 0.56% 95% mean confidence interval for cycles value: -10.68 -0.06 95% mean confidence interval for cycles %-change: -0.99% -0.12% Cycles are helped. Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-07-11 10:20:03 -07:00
Ian Romanick	0c2b3a7fc0	nir/algebraic: Rearrange 1-((1-a) * (1-b)) into flrp-friendly form No shader-db changes Ice Lake, Iron Lake, or GM45 as these platforms lack a LRP instruction. v2: Convert the pattern directly to flrp. There were negligible improvements on Gen4 and Gen5, and Gen11 was actually hurt. I believe the problem is this optimization conflicts with the (1-x)*y => ffma(-x, y, y) optimization on Gen11. Skylake total instructions in shared programs: 15046487 -> 15041996 (-0.03%) instructions in affected programs: 194681 -> 190190 (-2.31%) helped: 880 HURT: 20 helped stats (abs) min: 1 max: 19 x̄: 5.13 x̃: 4 helped stats (rel) min: 0.19% max: 36.36% x̄: 4.85% x̃: 3.33% HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 HURT stats (rel) min: 0.11% max: 1.06% x̄: 0.28% x̃: 0.17% 95% mean confidence interval for instructions value: -5.25 -4.73 95% mean confidence interval for instructions %-change: -5.11% -4.36% Instructions are helped. total cycles in shared programs: 354340839 -> 354303333 (-0.01%) cycles in affected programs: 1753622 -> 1716116 (-2.14%) helped: 786 HURT: 182 helped stats (abs) min: 1 max: 1842 x̄: 56.52 x̃: 22 helped stats (rel) min: 0.03% max: 43.17% x̄: 3.90% x̃: 2.84% HURT stats (abs) min: 1 max: 440 x̄: 37.99 x̃: 9 HURT stats (rel) min: 0.03% max: 29.37% x̄: 1.96% x̃: 0.32% 95% mean confidence interval for cycles value: -45.90 -31.59 95% mean confidence interval for cycles %-change: -3.09% -2.50% Cycles are helped. All Gen6-Gen8 platforms had similar results. 
(Broadwell shown) total instructions in shared programs: 15055907 -> 15051466 (-0.03%) instructions in affected programs: 196370 -> 191929 (-2.26%) helped: 871 HURT: 26 helped stats (abs) min: 1 max: 19 x̄: 5.13 x̃: 4 helped stats (rel) min: 0.19% max: 36.36% x̄: 4.76% x̃: 3.27% HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 HURT stats (rel) min: 0.11% max: 1.06% x̄: 0.24% x̃: 0.12% 95% mean confidence interval for instructions value: -5.21 -4.69 95% mean confidence interval for instructions %-change: -4.99% -4.24% Instructions are helped. total cycles in shared programs: 387729170 -> 387699745 (<.01%) cycles in affected programs: 1816409 -> 1786984 (-1.62%) helped: 788 HURT: 172 helped stats (abs) min: 1 max: 662 x̄: 47.29 x̃: 22 helped stats (rel) min: 0.03% max: 31.26% x̄: 3.55% x̃: 2.76% HURT stats (abs) min: 1 max: 404 x̄: 45.59 x̃: 14 HURT stats (rel) min: 0.03% max: 22.92% x̄: 1.53% x̃: 0.43% 95% mean confidence interval for cycles value: -35.69 -25.61 95% mean confidence interval for cycles %-change: -2.88% -2.40% Cycles are helped. total fills in shared programs: 34712 -> 34710 (<.01%) fills in affected programs: 7 -> 5 (-28.57%) helped: 1 HURT: 0 LOST: 0 GAINED: 2 Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-07-11 10:20:03 -07:00
Ian Romanick	09705747d7	nir/algebraic: Reassociate fadd into fmul in DPH-like pattern Moving the add to the other end of the sequence allows it to be fused into an FMA. Ice Lake total instructions in shared programs: 17173074 -> 16933147 (-1.40%) instructions in affected programs: 7938745 -> 7698818 (-3.02%) helped: 35583 HURT: 90 helped stats (abs) min: 1 max: 716 x̄: 6.75 x̃: 6 helped stats (rel) min: 0.10% max: 53.04% x̄: 5.29% x̃: 3.45% HURT stats (abs) min: 1 max: 41 x̄: 2.46 x̃: 1 HURT stats (rel) min: 0.32% max: 8.33% x̄: 1.41% x̃: 0.77% 95% mean confidence interval for instructions value: -6.80 -6.65 95% mean confidence interval for instructions %-change: -5.32% -5.22% Instructions are helped. total cycles in shared programs: 360881386 -> 359533568 (-0.37%) cycles in affected programs: 189489144 -> 188141326 (-0.71%) helped: 27250 HURT: 6707 helped stats (abs) min: 1 max: 21997 x̄: 62.15 x̃: 16 helped stats (rel) min: <.01% max: 70.69% x̄: 4.04% x̃: 2.35% HURT stats (abs) min: 1 max: 3507 x̄: 51.56 x̃: 14 HURT stats (rel) min: <.01% max: 77.26% x̄: 2.72% x̃: 1.27% 95% mean confidence interval for cycles value: -44.70 -34.68 95% mean confidence interval for cycles %-change: -2.75% -2.65% Cycles are helped. total spills in shared programs: 8943 -> 8829 (-1.27%) spills in affected programs: 625 -> 511 (-18.24%) helped: 6 HURT: 3 total fills in shared programs: 21815 -> 21719 (-0.44%) fills in affected programs: 1653 -> 1557 (-5.81%) helped: 7 HURT: 10 LOST: 11 GAINED: 3 Skylake and Broadwell had similar results. 
(Skylake shown) total instructions in shared programs: 15271996 -> 15040882 (-1.51%) instructions in affected programs: 7193699 -> 6962585 (-3.21%) helped: 33985 HURT: 30 helped stats (abs) min: 1 max: 260 x̄: 6.80 x̃: 6 helped stats (rel) min: 0.10% max: 30.00% x̄: 5.54% x̃: 3.85% HURT stats (abs) min: 1 max: 41 x̄: 4.00 x̃: 3 HURT stats (rel) min: 0.20% max: 2.16% x̄: 1.46% x̃: 1.72% 95% mean confidence interval for instructions value: -6.87 -6.72 95% mean confidence interval for instructions %-change: -5.59% -5.48% Instructions are helped. total cycles in shared programs: 355520785 -> 354253799 (-0.36%) cycles in affected programs: 185869148 -> 184602162 (-0.68%) helped: 25824 HURT: 6287 helped stats (abs) min: 1 max: 21997 x̄: 61.66 x̃: 16 helped stats (rel) min: <.01% max: 42.05% x̄: 4.18% x̃: 2.41% HURT stats (abs) min: 1 max: 3327 x̄: 51.76 x̃: 14 HURT stats (rel) min: <.01% max: 101.62% x̄: 2.80% x̃: 1.28% 95% mean confidence interval for cycles value: -44.70 -34.21 95% mean confidence interval for cycles %-change: -2.87% -2.76% Cycles are helped. total spills in shared programs: 8835 -> 8818 (-0.19%) spills in affected programs: 613 -> 596 (-2.77%) helped: 5 HURT: 2 total fills in shared programs: 21738 -> 21744 (0.03%) fills in affected programs: 1348 -> 1354 (0.45%) helped: 5 HURT: 11 LOST: 0 GAINED: 12 Haswell total instructions in shared programs: 13447102 -> 13381508 (-0.49%) instructions in affected programs: 3770735 -> 3705141 (-1.74%) helped: 11999 HURT: 29 helped stats (abs) min: 1 max: 409 x̄: 5.60 x̃: 3 helped stats (rel) min: 0.10% max: 20.00% x̄: 2.38% x̃: 1.87% HURT stats (abs) min: 3 max: 750 x̄: 54.90 x̃: 3 HURT stats (rel) min: 0.12% max: 125.30% x̄: 9.96% x̃: 1.82% 95% mean confidence interval for instructions value: -5.71 -5.19 95% mean confidence interval for instructions %-change: -2.39% -2.30% Instructions are helped. 
total cycles in shared programs: 376342236 -> 375690458 (-0.17%) cycles in affected programs: 155699021 -> 155047243 (-0.42%) helped: 8397 HURT: 2876 helped stats (abs) min: 1 max: 20248 x̄: 109.87 x̃: 18 helped stats (rel) min: <.01% max: 40.71% x̄: 2.23% x̃: 1.49% HURT stats (abs) min: 1 max: 15414 x̄: 94.15 x̃: 22 HURT stats (rel) min: <.01% max: 432.49% x̄: 3.15% x̃: 1.41% 95% mean confidence interval for cycles value: -67.64 -48.00 95% mean confidence interval for cycles %-change: -0.99% -0.74% Cycles are helped. total spills in shared programs: 23134 -> 23184 (0.22%) spills in affected programs: 1675 -> 1725 (2.99%) helped: 13 HURT: 11 total fills in shared programs: 34550 -> 34686 (0.39%) fills in affected programs: 1421 -> 1557 (9.57%) helped: 13 HURT: 11 LOST: 0 GAINED: 11 Ivy Bridge total instructions in shared programs: 12019642 -> 11987285 (-0.27%) instructions in affected programs: 1532236 -> 1499879 (-2.11%) helped: 5522 HURT: 110 helped stats (abs) min: 1 max: 312 x̄: 6.22 x̃: 3 helped stats (rel) min: 0.16% max: 20.00% x̄: 2.46% x̃: 1.88% HURT stats (abs) min: 1 max: 750 x̄: 18.07 x̃: 3 HURT stats (rel) min: 0.09% max: 125.30% x̄: 3.42% x̃: 1.15% 95% mean confidence interval for instructions value: -6.25 -5.24 95% mean confidence interval for instructions %-change: -2.43% -2.26% Instructions are helped. total cycles in shared programs: 180214667 -> 179761900 (-0.25%) cycles in affected programs: 31448723 -> 30995956 (-1.44%) helped: 7191 HURT: 2838 helped stats (abs) min: 1 max: 17680 x̄: 88.47 x̃: 17 helped stats (rel) min: <.01% max: 50.45% x̄: 2.16% x̃: 1.40% HURT stats (abs) min: 1 max: 15540 x̄: 64.63 x̃: 24 HURT stats (rel) min: 0.02% max: 435.17% x̄: 3.10% x̃: 1.51% 95% mean confidence interval for cycles value: -53.34 -36.95 95% mean confidence interval for cycles %-change: -0.81% -0.53% Cycles are helped. 
total spills in shared programs: 3599 -> 3642 (1.19%) spills in affected programs: 1180 -> 1223 (3.64%) helped: 12 HURT: 2 total fills in shared programs: 4031 -> 4162 (3.25%) fills in affected programs: 876 -> 1007 (14.95%) helped: 12 HURT: 2 LOST: 6 GAINED: 5 Sandy Bridge total instructions in shared programs: 10850686 -> 10822890 (-0.26%) instructions in affected programs: 1247986 -> 1220190 (-2.23%) helped: 4699 HURT: 102 helped stats (abs) min: 1 max: 104 x̄: 6.02 x̃: 3 helped stats (rel) min: 0.15% max: 17.65% x̄: 2.44% x̃: 1.88% HURT stats (abs) min: 1 max: 16 x̄: 4.70 x̃: 3 HURT stats (rel) min: 0.09% max: 3.85% x̄: 1.11% x̃: 1.10% 95% mean confidence interval for instructions value: -6.10 -5.47 95% mean confidence interval for instructions %-change: -2.42% -2.30% Instructions are helped. total cycles in shared programs: 154044149 -> 153920095 (-0.08%) cycles in affected programs: 26037392 -> 25913338 (-0.48%) helped: 5974 HURT: 2521 helped stats (abs) min: 1 max: 1802 x̄: 35.42 x̃: 16 helped stats (rel) min: <.01% max: 35.80% x̄: 1.43% x̃: 0.84% HURT stats (abs) min: 1 max: 862 x̄: 34.73 x̃: 20 HURT stats (rel) min: 0.01% max: 36.33% x̄: 1.67% x̃: 0.85% 95% mean confidence interval for cycles value: -16.31 -12.90 95% mean confidence interval for cycles %-change: -0.56% -0.45% Cycles are helped. total spills in shared programs: 2876 -> 2957 (2.82%) spills in affected programs: 592 -> 673 (13.68%) helped: 6 HURT: 35 total fills in shared programs: 3157 -> 3134 (-0.73%) fills in affected programs: 402 -> 379 (-5.72%) helped: 6 HURT: 0 LOST: 5 GAINED: 11 Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-07-11 10:20:03 -07:00
Ian Romanick	ff9f526de3	nir/algebraic: Recognize open-coded flrp(-1, 1, a) and flrp(1, -1, a) v2: Remove flrp@64 cases. Since Gen11 removes flrp@32, it seems unlikely that we'll ever have a flrp@64. Should that occur, the cases can be added back. v3: Add a couple more patterns that just move the negation around. No shader-db changes Ice Lake, Iron Lake, or GM45 as these platforms lack a LRP instruction. Skylake total instructions in shared programs: 15279687 -> 15256058 (-0.15%) instructions in affected programs: 4344440 -> 4320811 (-0.54%) helped: 23455 HURT: 18 helped stats (abs) min: 1 max: 21 x̄: 1.01 x̃: 1 helped stats (rel) min: 0.02% max: 13.33% x̄: 0.86% x̃: 0.65% HURT stats (abs) min: 1 max: 2 x̄: 1.06 x̃: 1 HURT stats (rel) min: 0.13% max: 1.16% x̄: 0.43% x̃: 0.34% 95% mean confidence interval for instructions value: -1.01 -1.00 95% mean confidence interval for instructions %-change: -0.87% -0.85% Instructions are helped. total cycles in shared programs: 355593755 -> 355339981 (-0.07%) cycles in affected programs: 162089552 -> 161835778 (-0.16%) helped: 20467 HURT: 7158 helped stats (abs) min: 1 max: 2074 x̄: 29.00 x̃: 6 helped stats (rel) min: <.01% max: 35.71% x̄: 1.71% x̃: 0.58% HURT stats (abs) min: 1 max: 4814 x̄: 47.46 x̃: 11 HURT stats (rel) min: <.01% max: 125.43% x̄: 2.88% x̃: 0.98% 95% mean confidence interval for cycles value: -10.39 -7.98 95% mean confidence interval for cycles %-change: -0.57% -0.47% Cycles are helped. 
total spills in shared programs: 8843 -> 8835 (-0.09%) spills in affected programs: 190 -> 182 (-4.21%) helped: 2 HURT: 0 total fills in shared programs: 21738 -> 21738 (0.00%) fills in affected programs: 372 -> 372 (0.00%) helped: 1 HURT: 1 LOST: 12 GAINED: 22 Broadwell total instructions in shared programs: 15290523 -> 15266818 (-0.16%) instructions in affected programs: 4314738 -> 4291033 (-0.55%) helped: 23391 HURT: 11 helped stats (abs) min: 1 max: 119 x̄: 1.02 x̃: 1 helped stats (rel) min: 0.02% max: 13.33% x̄: 0.86% x̃: 0.65% HURT stats (abs) min: 1 max: 189 x̄: 18.09 x̃: 1 HURT stats (rel) min: 0.11% max: 5.39% x̄: 0.98% x̃: 0.50% 95% mean confidence interval for instructions value: -1.04 -0.99 95% mean confidence interval for instructions %-change: -0.87% -0.85% Instructions are helped. total cycles in shared programs: 388911660 -> 388830827 (-0.02%) cycles in affected programs: 172903324 -> 172822491 (-0.05%) helped: 15601 HURT: 13269 helped stats (abs) min: 1 max: 1986 x̄: 29.18 x̃: 6 helped stats (rel) min: <.01% max: 36.60% x̄: 1.74% x̃: 0.55% HURT stats (abs) min: 1 max: 14904 x̄: 28.21 x̃: 6 HURT stats (rel) min: <.01% max: 102.58% x̄: 1.77% x̃: 0.60% 95% mean confidence interval for cycles value: -4.20 -1.40 95% mean confidence interval for cycles %-change: -0.17% -0.08% Cycles are helped. 
total spills in shared programs: 23110 -> 23069 (-0.18%) spills in affected programs: 656 -> 615 (-6.25%) helped: 3 HURT: 1 total fills in shared programs: 34399 -> 34398 (<.01%) fills in affected programs: 905 -> 904 (-0.11%) helped: 3 HURT: 1 LOST: 6 GAINED: 23 Haswell total instructions in shared programs: 13465303 -> 13441142 (-0.18%) instructions in affected programs: 3726999 -> 3702838 (-0.65%) helped: 22139 HURT: 347 helped stats (abs) min: 1 max: 43 x̄: 1.11 x̃: 1 helped stats (rel) min: 0.03% max: 10.00% x̄: 1.01% x̃: 0.75% HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 HURT stats (rel) min: 0.35% max: 11.11% x̄: 1.48% x̃: 1.12% 95% mean confidence interval for instructions value: -1.08 -1.07 95% mean confidence interval for instructions %-change: -0.99% -0.96% Instructions are helped. total cycles in shared programs: 376271308 -> 376273090 (<.01%) cycles in affected programs: 167496811 -> 167498593 (<.01%) helped: 13206 HURT: 13281 helped stats (abs) min: 1 max: 3864 x̄: 35.39 x̃: 8 helped stats (rel) min: <.01% max: 53.10% x̄: 2.31% x̃: 0.80% HURT stats (abs) min: 1 max: 3828 x̄: 35.32 x̃: 8 HURT stats (rel) min: <.01% max: 117.85% x̄: 2.88% x̃: 0.61% 95% mean confidence interval for cycles value: -1.33 1.47 95% mean confidence interval for cycles %-change: 0.22% 0.36% Inconclusive result (value mean confidence interval includes 0). 
total spills in shared programs: 23158 -> 23134 (-0.10%) spills in affected programs: 24 -> 0 helped: 3 HURT: 0 total fills in shared programs: 34580 -> 34550 (-0.09%) fills in affected programs: 30 -> 0 helped: 3 HURT: 0 LOST: 23 GAINED: 13 Ivy Bridge total instructions in shared programs: 12034154 -> 12014301 (-0.16%) instructions in affected programs: 3636209 -> 3616356 (-0.55%) helped: 18771 HURT: 459 helped stats (abs) min: 1 max: 43 x̄: 1.08 x̃: 1 helped stats (rel) min: 0.03% max: 10.00% x̄: 0.91% x̃: 0.68% HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 HURT stats (rel) min: 0.34% max: 8.33% x̄: 1.43% x̃: 1.11% 95% mean confidence interval for instructions value: -1.04 -1.02 95% mean confidence interval for instructions %-change: -0.86% -0.84% Instructions are helped. total cycles in shared programs: 180186960 -> 180175147 (<.01%) cycles in affected programs: 44652745 -> 44640932 (-0.03%) helped: 12979 HURT: 11033 helped stats (abs) min: 1 max: 5836 x̄: 32.88 x̃: 6 helped stats (rel) min: <.01% max: 53.10% x̄: 2.19% x̃: 0.74% HURT stats (abs) min: 1 max: 4811 x̄: 37.61 x̃: 9 HURT stats (rel) min: <.01% max: 115.18% x̄: 2.99% x̃: 0.69% 95% mean confidence interval for cycles value: -2.29 1.31 95% mean confidence interval for cycles %-change: 0.11% 0.26% Inconclusive result (value mean confidence interval includes 0). 
total spills in shared programs: 3623 -> 3599 (-0.66%) spills in affected programs: 24 -> 0 helped: 3 HURT: 0 total fills in shared programs: 4061 -> 4031 (-0.74%) fills in affected programs: 30 -> 0 helped: 3 HURT: 0 LOST: 17 GAINED: 18 Sandy Bridge total instructions in shared programs: 10853968 -> 10834932 (-0.18%) instructions in affected programs: 3769957 -> 3750921 (-0.50%) helped: 17944 HURT: 204 helped stats (abs) min: 1 max: 3 x̄: 1.07 x̃: 1 helped stats (rel) min: 0.02% max: 10.00% x̄: 0.83% x̃: 0.60% HURT stats (abs) min: 1 max: 2 x̄: 1.01 x̃: 1 HURT stats (rel) min: 0.31% max: 9.09% x̄: 1.83% x̃: 0.93% 95% mean confidence interval for instructions value: -1.05 -1.04 95% mean confidence interval for instructions %-change: -0.81% -0.78% Instructions are helped. total cycles in shared programs: 153894864 -> 153885988 (<.01%) cycles in affected programs: 50643925 -> 50635049 (-0.02%) helped: 9361 HURT: 10534 helped stats (abs) min: 1 max: 1966 x̄: 19.42 x̃: 4 helped stats (rel) min: <.01% max: 34.97% x̄: 0.90% x̃: 0.22% HURT stats (abs) min: 1 max: 1371 x̄: 16.42 x̃: 5 HURT stats (rel) min: <.01% max: 55.10% x̄: 0.81% x̃: 0.27% 95% mean confidence interval for cycles value: -1.27 0.38 95% mean confidence interval for cycles %-change: -0.03% 0.04% Inconclusive result (value mean confidence interval includes 0). LOST: 6 GAINED: 24 Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-07-11 10:20:03 -07:00
Ian Romanick	1259f6d802	nir: intel/vec4: Add flag to disable some algebraic optimizations A couple patches later in this series use the flag to avoid a few thousand shader-db regressions on all vec4 platforms. I'm not particularly enamored with the name of this flag. However, I suspect the Intel vec4 backend is the only backend that will benefit from it. Specifically, the cases where this helps are all cases where we want to prevent nir_opt_algebraic from rearranging instructions to create 3-source instructions, such as ffma and flrp, with additional immediate value or uniform sources. The earlier commit "intel/vec4: Try to emit a single load for multiple 3-src instruction operands" solves most of the problems caused by additional immediate values, but the restrictions on register strides that cause problems for uniforms and shader inputs persist. Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-07-11 10:20:03 -07:00
Ian Romanick	3a1fdca5ad	intel/vec4: Try to emit immediate sources for MOV Per the comment in vec4_visitor::nir_emit_load_const, further improvement is possible in this area. That case would be more complicated as I think we'd want to check that all users of the nir_load_const_instr result intended to use the value as float. No shader-db changes on any Gen8+ platform as these platforms do not use the vec4 backend. v2: Massive rebase on `eeebeb211f` ("intel/vec4: Try emitting non-scalar immediates"). This commit is about twice as helpful since `b04beaf41d` ("intel/vec4: Try both sources as candidates for being immediates"). Haswell and Ivy Bridge had similar results. (Haswell shown) total instructions in shared programs: 13478598 -> 13474068 (-0.03%) instructions in affected programs: 589452 -> 584922 (-0.77%) helped: 2773 HURT: 0 helped stats (abs) min: 1 max: 7 x̄: 1.63 x̃: 1 helped stats (rel) min: 0.16% max: 5.66% x̄: 0.96% x̃: 0.83% 95% mean confidence interval for instructions value: -1.67 -1.60 95% mean confidence interval for instructions %-change: -0.98% -0.94% Instructions are helped. total cycles in shared programs: 376386916 -> 376369392 (<.01%) cycles in affected programs: 16871628 -> 16854104 (-0.10%) helped: 2293 HURT: 523 helped stats (abs) min: 2 max: 812 x̄: 13.80 x̃: 2 helped stats (rel) min: <.01% max: 10.18% x̄: 1.02% x̃: 0.36% HURT stats (abs) min: 2 max: 316 x̄: 26.99 x̃: 14 HURT stats (rel) min: <.01% max: 19.34% x̄: 2.15% x̃: 1.43% 95% mean confidence interval for cycles value: -7.87 -4.58 95% mean confidence interval for cycles %-change: -0.52% -0.34% Cycles are helped. 
Sandy Bridge total instructions in shared programs: 10860328 -> 10857675 (-0.02%) instructions in affected programs: 335907 -> 333254 (-0.79%) helped: 1639 HURT: 0 helped stats (abs) min: 1 max: 5 x̄: 1.62 x̃: 1 helped stats (rel) min: 0.10% max: 5.26% x̄: 0.86% x̃: 0.70% 95% mean confidence interval for instructions value: -1.67 -1.57 95% mean confidence interval for instructions %-change: -0.89% -0.84% Instructions are helped. total cycles in shared programs: 153942720 -> 153934120 (<.01%) cycles in affected programs: 5604818 -> 5596218 (-0.15%) helped: 1494 HURT: 97 helped stats (abs) min: 2 max: 256 x̄: 7.84 x̃: 2 helped stats (rel) min: 0.01% max: 6.62% x̄: 0.35% x̃: 0.18% HURT stats (abs) min: 2 max: 160 x̄: 32.02 x̃: 20 HURT stats (rel) min: 0.02% max: 3.37% x̄: 0.88% x̃: 0.56% 95% mean confidence interval for cycles value: -6.45 -4.36 95% mean confidence interval for cycles %-change: -0.32% -0.23% Cycles are helped. Iron Lake and GM45 had similar results. (Iron Lake shown) total instructions in shared programs: 8139378 -> 8137267 (-0.03%) instructions in affected programs: 265616 -> 263505 (-0.79%) helped: 1148 HURT: 0 helped stats (abs) min: 1 max: 5 x̄: 1.84 x̃: 1 helped stats (rel) min: 0.22% max: 4.76% x̄: 0.87% x̃: 0.62% 95% mean confidence interval for instructions value: -1.90 -1.78 95% mean confidence interval for instructions %-change: -0.90% -0.83% Instructions are helped. total cycles in shared programs: 188541756 -> 188537540 (<.01%) cycles in affected programs: 9807004 -> 9802788 (-0.04%) helped: 1143 HURT: 4 helped stats (abs) min: 2 max: 10 x̄: 3.70 x̃: 2 helped stats (rel) min: <.01% max: 3.01% x̄: 0.13% x̃: 0.06% HURT stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2 HURT stats (rel) min: 0.18% max: 0.18% x̄: 0.18% x̃: 0.18% 95% mean confidence interval for cycles value: -3.80 -3.55 95% mean confidence interval for cycles %-change: -0.14% -0.12% Cycles are helped. Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-07-11 10:20:03 -07:00
Ian Romanick	acd7796a07	intel/vec4: Try to emit a VF source in try_immediate_source This commit is also a pre-requisite for the next commit. No shader-db changes on any Gen8+ platform as these platforms do not use the vec4 backend. v2: Massive rebase on `eeebeb211f` ("intel/vec4: Try emitting non-scalar immediates"). This change is a lot less helpful since that commit landed (previously helped 1934 shaders on HSW) because, apparently, a lot of the cases helped by that commit were things like vector loads of { 1.0, 1.0, 1.0 } that were also helped by this commit. Haswell total instructions in shared programs: 13480095 -> 13478598 (-0.01%) instructions in affected programs: 229534 -> 228037 (-0.65%) helped: 1006 HURT: 0 helped stats (abs) min: 1 max: 7 x̄: 1.49 x̃: 1 helped stats (rel) min: 0.04% max: 3.45% x̄: 1.11% x̃: 1.09% 95% mean confidence interval for instructions value: -1.54 -1.43 95% mean confidence interval for instructions %-change: -1.15% -1.07% Instructions are helped. total cycles in shared programs: 376385734 -> 376386916 (<.01%) cycles in affected programs: 14101380 -> 14102562 (<.01%) helped: 941 HURT: 56 helped stats (abs) min: 2 max: 322 x̄: 5.62 x̃: 2 helped stats (rel) min: <.01% max: 7.74% x̄: 0.51% x̃: 0.42% HURT stats (abs) min: 2 max: 618 x̄: 115.50 x̃: 32 HURT stats (rel) min: 0.03% max: 4.62% x̄: 0.83% x̃: 0.44% 95% mean confidence interval for cycles value: -2.06 4.43 95% mean confidence interval for cycles %-change: -0.47% -0.39% Inconclusive result (value mean confidence interval includes 0). Ivy Bridge total instructions in shared programs: 12048004 -> 12046589 (-0.01%) instructions in affected programs: 217072 -> 215657 (-0.65%) helped: 934 HURT: 0 helped stats (abs) min: 1 max: 7 x̄: 1.51 x̃: 1 helped stats (rel) min: 0.04% max: 3.45% x̄: 1.14% x̃: 1.11% 95% mean confidence interval for instructions value: -1.57 -1.46 95% mean confidence interval for instructions %-change: -1.18% -1.10% Instructions are helped. 
total cycles in shared programs: 180285854 -> 180287608 (<.01%) cycles in affected programs: 14103824 -> 14105578 (0.01%) helped: 871 HURT: 53 helped stats (abs) min: 2 max: 322 x̄: 5.51 x̃: 2 helped stats (rel) min: <.01% max: 7.67% x̄: 0.50% x̃: 0.42% HURT stats (abs) min: 2 max: 618 x̄: 123.66 x̃: 32 HURT stats (rel) min: 0.03% max: 4.47% x̄: 0.92% x̃: 0.46% 95% mean confidence interval for cycles value: -1.60 5.39 95% mean confidence interval for cycles %-change: -0.46% -0.37% Inconclusive result (value mean confidence interval includes 0). Sandy Bridge total instructions in shared programs: 10861227 -> 10860328 (<.01%) instructions in affected programs: 92969 -> 92070 (-0.97%) helped: 624 HURT: 0 helped stats (abs) min: 1 max: 7 x̄: 1.44 x̃: 1 helped stats (rel) min: 0.11% max: 3.45% x̄: 1.05% x̃: 0.95% 95% mean confidence interval for instructions value: -1.52 -1.36 95% mean confidence interval for instructions %-change: -1.09% -1.01% Instructions are helped. total cycles in shared programs: 153944316 -> 153942720 (<.01%) cycles in affected programs: 1640956 -> 1639360 (-0.10%) helped: 601 HURT: 15 helped stats (abs) min: 2 max: 120 x̄: 3.56 x̃: 2 helped stats (rel) min: 0.02% max: 6.33% x̄: 0.18% x̃: 0.08% HURT stats (abs) min: 2 max: 72 x̄: 36.13 x̃: 36 HURT stats (rel) min: 0.05% max: 3.84% x̄: 1.95% x̃: 2.00% 95% mean confidence interval for cycles value: -3.44 -1.74 95% mean confidence interval for cycles %-change: -0.18% -0.09% Cycles are helped. Iron Lake and GM45 had similar results. (Iron Lake shown) total instructions in shared programs: 8139924 -> 8139378 (<.01%) instructions in affected programs: 69776 -> 69230 (-0.78%) helped: 322 HURT: 0 helped stats (abs) min: 1 max: 8 x̄: 1.70 x̃: 1 helped stats (rel) min: 0.27% max: 3.23% x̄: 0.79% x̃: 0.54% 95% mean confidence interval for instructions value: -1.88 -1.51 95% mean confidence interval for instructions %-change: -0.85% -0.72% Instructions are helped. 
total cycles in shared programs: 188542864 -> 188541756 (<.01%) cycles in affected programs: 3031532 -> 3030424 (-0.04%) helped: 320 HURT: 0 helped stats (abs) min: 2 max: 20 x̄: 3.46 x̃: 2 helped stats (rel) min: <.01% max: 0.69% x̄: 0.06% x̃: 0.06% 95% mean confidence interval for cycles value: -3.85 -3.07 95% mean confidence interval for cycles %-change: -0.06% -0.05% Cycles are helped. Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-07-11 10:20:03 -07:00
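The shader-db summaries in this commit message can be read with two small calculations, sketched below in C. This is only an interpretation of the report text, not the code of the shader-db report script; the function names are hypothetical. The first function reproduces the percentage shown in parentheses after a "before -> after" pair, and the second reflects the "Inconclusive result (value mean confidence interval includes 0)" wording.

```c
#include <stdbool.h>

/* Relative change in percent, as in "229534 -> 228037 (-0.65%)".
 * Hypothetical helper; negative values mean the count went down. */
static double
example_percent_change(double before, double after)
{
   return (after - before) / before * 100.0;
}

/* Assumption: a result is called conclusive only when the 95% confidence
 * interval lies entirely on one side of zero. */
static bool
example_interval_is_conclusive(double low, double high)
{
   return !(low <= 0.0 && high >= 0.0);
}
```

With the Haswell numbers above, the instruction interval (-1.54, -1.43) is conclusive, while the cycle interval (-2.06, 4.43) straddles zero and is not.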
Ian Romanick	365b45d571	intel/vec4: Try to emit a single load for multiple 3-src instruction operands If a 3-source instruction uses immediate values 1.0 and -1.0, just load 1.0 into a register. Use the negation source modifier to get -1.0. This has trivial impact now, but it prevents a few thousand regressions on vec4 platforms with "nir/algebraic: Recognize open-coded flrp(-1, 1, a) and flrp(1, -1, a)" All Gen6 and Gen7 platforms had similar results. (Haswell shown) total instructions in shared programs: 13487412 -> 13487406 (<.01%) instructions in affected programs: 541 -> 535 (-1.11%) helped: 6 HURT: 0 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 0.36% max: 2.08% x̄: 1.65% x̃: 1.80% 95% mean confidence interval for instructions value: -1.00 -1.00 95% mean confidence interval for instructions %-change: -2.33% -0.97% Instructions are helped. total cycles in shared programs: 376402564 -> 376402500 (<.01%) cycles in affected programs: 10348 -> 10284 (-0.62%) helped: 10 HURT: 1 helped stats (abs) min: 2 max: 26 x̄: 7.00 x̃: 2 helped stats (rel) min: 0.13% max: 2.05% x̄: 0.89% x̃: 0.79% HURT stats (abs) min: 6 max: 6 x̄: 6.00 x̃: 6 HURT stats (rel) min: 0.29% max: 0.29% x̄: 0.29% x̃: 0.29% 95% mean confidence interval for cycles value: -11.72 0.08 95% mean confidence interval for cycles %-change: -1.20% -0.36% Inconclusive result (value mean confidence interval includes 0). No shader-db changes on any other Intel platform. Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-07-11 10:20:03 -07:00
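The idea of loading 1.0 once and reaching -1.0 through the negation source modifier can be sketched as follows. This is a simplified illustration in plain C, not the vec4 backend code; `example_operand` and `example_assign_regs` are hypothetical names, and registers are modelled as small integers.

```c
#include <stdbool.h>

/* Hypothetical model of an immediate operand of a 3-source instruction. */
struct example_operand {
   float imm;    /* immediate value the instruction wants to read */
   int reg;      /* register assigned to hold the value */
   bool negate;  /* read the register through a negation modifier */
};

/* Assigns a register to each operand and returns the number of loads.
 * An operand reuses an earlier register if that register holds the same
 * value, or the negated value (read back with the negate modifier). */
static int
example_assign_regs(struct example_operand ops[], int count)
{
   int loads = 0;

   for (int i = 0; i < count; i++) {
      ops[i].reg = -1;
      ops[i].negate = false;

      for (int j = 0; j < i; j++) {
         if (ops[j].negate)
            continue; /* only compare against operands that own a load */

         if (ops[j].imm == ops[i].imm) {
            ops[i].reg = ops[j].reg;
            break;
         }
         if (ops[j].imm == -ops[i].imm) {
            ops[i].reg = ops[j].reg;
            ops[i].negate = true;
            break;
         }
      }

      if (ops[i].reg < 0)
         ops[i].reg = loads++; /* no match: this operand needs its own load */
   }

   return loads;
}
```

For operands { 1.0, -1.0, 0.5 } the sketch issues two loads instead of three: the second operand reads the first register with the negate flag set.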
Ian Romanick	6f6bc842f6	intel/vec4: Refactor operand fixing for ffma and flrp Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-07-11 10:20:03 -07:00
Alyssa Rosenzweig	8305766e0e	panfrost: Wire up GLES2-class polygon offset Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-11 09:40:47 -07:00
Alyssa Rosenzweig	7a36c72f5d	pan/decode: Depth units/factor are identical to GL I'm not sure why I thought there was an off-by-one, other than maybe bad data collection. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-11 09:40:47 -07:00
Christian Gmeiner	a7153ebcd3	etnaviv: remove dead translate_ts_sampler_format(..) declaration Fixes: `66411521ea` ("etnaviv: combine translate_ts_sampler_format/translate_msaa_format") Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com> Reviewed-by: Jonathan Marek <jonathan@marek.ca>	2019-07-11 17:51:15 +02:00
Caio Marcelo de Oliveira Filho	b390ff3517	intel/fs: Add support for SLM fence in Gen11 Gen11 SLM is not on L3 anymore, so now the hardware has two separate fences. Add a way to control which fence types to use. At this time, we don't have enough information in NIR to control the visibility of the memory being fenced, so for now be conservative and assume that fences will need a stall. With more information later we'll be able to reduce those. Fixes Vulkan CTS tests in ICL: dEQP-VK.memory_model.message_passing.core11.u32.coherent.fence_fence.atomicwrite.device.payload_nonlocal.workgroup.guard_local.buffer.comp dEQP-VK.memory_model.message_passing.core11.u32.coherent.fence_fence.atomicwrite.device.payload_local.buffer.guard_nonlocal.workgroup.comp dEQP-VK.memory_model.message_passing.core11.u32.coherent.fence_fence.atomicwrite.device.payload_local.image.guard_nonlocal.workgroup.comp dEQP-VK.memory_model.message_passing.core11.u32.coherent.fence_fence.atomicwrite.workgroup.payload_local.buffer.guard_nonlocal.workgroup.comp dEQP-VK.memory_model.message_passing.core11.u32.coherent.fence_fence.atomicwrite.workgroup.payload_local.image.guard_nonlocal.workgroup.comp The whole set of supported tests in dEQP-VK.memory_model.* group should be passing in ICL now. v2: Pass BTI around instead of having an enum. (Jason) Emit two SHADER_OPCODE_MEMORY_FENCE instead of one that gets transformed into two. (Jason) List tests fixed. (Lionel) v3: For clarity, split the decision of which fences to emit from the emission code. (Jason) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-07-11 08:29:32 -07:00
Tomeu Vizoso	838374b6dd	Revert "panfrost/midgard: Use _safe iterator" This reverts commit `812ce2ce9e`. We massively regress with the reverted patch. So in the meantime, take it out. Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>	2019-07-11 16:53:42 +02:00
Alyssa Rosenzweig	507e297431	panfrost: Don't lie about Z/S formats Only Z24S8 is properly supported right now, so let's be careful. Fixes a number of issues relating to improper Z/S handling. The most obvious is depth buffers with incorrect strides, which manifests in truly bizarre ways and can happen commonly with FBOs. Fixes WebGL (Aquarium runs, etc). Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-11 14:27:25 +00:00
Samuel Pitoiset	cd403a931f	radv/gfx10: enable geometry shaders Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-11 15:46:02 +02:00
Bas Nieuwenhuizen	0a8ef756d3	radv/gfx10: Fix NGG GS output mask handlings for LDS indexing. In emit_vertex we optimize storage if the output mask does not have all bits set. Do the same in the epilogue so the indices actually match up. Fixes dEQP-VK.geometry.input.basic_primitive.points because it outputs PSIZE with an output mask of 1, which causes the generic attribute for the color to be loaded from the wrong indices. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-07-11 15:45:59 +02:00
Bas Nieuwenhuizen	f5982917ff	radv/gfx10: Simplify output mask handling for NGG GS. We only ever get in this function for a NGG GS proper. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-07-11 15:45:58 +02:00
Bas Nieuwenhuizen	7515f41c78	radv/gfx10: Do GS prologue outside of gs_threads if. Mirror radeonsi. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-07-11 15:45:56 +02:00
Samuel Pitoiset	5bbcb3f5bc	radv/gfx10: implement support for GS as NGG Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-11 15:45:53 +02:00
Bas Nieuwenhuizen	7286865f6d	radv/gfx10: Use correct ES shader for es_vgpr_comp_cnt for GS. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-07-11 15:45:51 +02:00
Bas Nieuwenhuizen	45b73b3aa9	radv/gfx10: Do not allocate a gs_copy_shader on gfx10. Will use ngg for any gs anyway. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-07-11 15:45:47 +02:00
Samuel Pitoiset	ef5efb40f4	radv/gfx10: fix VGT_SHADER_STAGES_EN for GS as NGG The driver shouldn't set the copy shader bit. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-11 15:45:43 +02:00
Samuel Pitoiset	8bc3ab6f0c	radv/gfx10: fix number of GS invocations for NGG Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-11 15:45:40 +02:00
Tomeu Vizoso	812ce2ce9e	panfrost/midgard: Use _safe iterator Fixes this assertion: ../mesa/src/panfrost/midgard/midgard_schedule.c:507:schedule_block: Assertion `ins == __next && "use _safe iterator"' failed. Trace/breakpoint trap Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-11 15:06:51 +02:00
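The assertion message asks for a "_safe" iterator, meaning one that stays valid when the current element is removed during the walk. A minimal sketch of that pattern on a hand-rolled singly linked list is shown below; it does not use Mesa's list macros, and `example_node` and `example_remove_even` are hypothetical names.

```c
#include <stddef.h>

/* Hypothetical list node, standing in for a scheduled instruction. */
struct example_node {
   int value;
   struct example_node *next;
};

/* Unlinks every node with an even value and returns how many were removed.
 * The successor is saved before the current node is modified, so unlinking
 * the current node cannot derail the iteration. */
static int
example_remove_even(struct example_node **head)
{
   int removed = 0;
   struct example_node **link = head; /* pointer that refers to the current node */

   for (struct example_node *ins = *head, *next; ins != NULL; ins = next) {
      next = ins->next; /* saved first: this is the "safe" part */

      if (ins->value % 2 == 0) {
         *link = next;     /* bypass the node being removed */
         ins->next = NULL;
         removed++;
      } else {
         link = &ins->next;
      }
   }

   return removed;
}
```

A plain iterator would advance through `ins->next` after the node was unlinked, which is the situation the assertion guards against.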
Tomeu Vizoso	82ee48e5ef	panfrost: Place the height value in the height field In the mali_single_framebuffer descriptor. Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> v2: Remove unwanted chunks	2019-07-11 15:06:47 +02:00
Samuel Pitoiset	022b1f4190	radv/gfx10: fix maximum number of mip levels for 3D images The dimensions also have to be adjusted if the number of supported mip levels is changed. This fixes dEQP-VK.api.info.image_format_properties.3d.*. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-11 14:44:47 +02:00
Samuel Pitoiset	f3dfdd4091	radv/gfx10: disable TC-compat HTILE for multisampled D32_SFLOAT format For some reason D32_SFLOAT is also affected on GFX10; it works fine on previous generations. This fixes some dEQP-VK.renderpass2.depth_stencil_resolve.*. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-11 13:43:21 +02:00
Kenneth Graunke	a01770b9c8	iris: Fix key->input_vertices for 8_PATCH TCS mode. We were failing to flag the program dirty when it changed. Also, we were unnecessarily setting key->input_vertices for SINGLE_PATCH mode, which would reduce program cache hits. Only set it if needed.	2019-07-11 01:18:24 -07:00
Kenneth Graunke	c58f52f0ef	iris: Only set key->flat_shade if COL0/COL1 are written. This was just laziness on my part, we already added similar checks in the VS key handling. Just need to do it here too. Should improve cache hits.	2019-07-11 00:12:50 -07:00
Kenneth Graunke	cb82d534a0	iris: Drop comment about var->data.binding not being set. I refactored the sampler lowering passes a long time ago to ensure that gl_nir_lower_samplers_as_deref is run and var->data.binding is set.	2019-07-11 00:12:00 -07:00
Kenneth Graunke	38f9954208	iris: Drop comments about missing NOS These stages don't need NOS. If they do, we can add it - the infrastructure is there if we need it someday.	2019-07-11 00:12:00 -07:00
Kenneth Graunke	2bd1234a77	iris: Drop a TODO comment This is literally implemented two lines above.	2019-07-11 00:12:00 -07:00
Neil Roberts	eae06b34ea	glsl/builtin types: Set the precision on the depth range params The members of gl_DepthRangeParameters are declared to be highp in GLSL ES specs. Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-11 08:04:54 +02:00
Neil Roberts	74d71dac20	glsl: Add a constructor for glsl_struct_field to specify the precision Adds a third constructor to glsl_struct_field which has an extra parameter to specify the precision. Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-11 08:04:54 +02:00
Neil Roberts	014be60398	glsl: Add a macro for the default values for glsl_struct_field There are two constructors for glsl_struct_field with different parameters. Instead of repeating them for both constructors, this patch adds a convenience macro. This will make it easier to add a third constructor in a later patch. Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-11 08:04:54 +02:00
Neil Roberts	ca6ee488e9	glsl/builtin_variables: Add a precision to the builtins All of the builtin variables mentioned in the GLSL ES spec and the extensions include a precision declaration which is different depending on what the variable is used for. This patch makes it set the corresponding precision when creating the variable. This will make a difference once we start using the precision information for optimisation. Previously all of the builtin variables ended up with a precision of NONE. v2: Made gl_PointSize and gl_FragCoord highp since GLSL ES 3.00. Fixed gl_MaxViewPorts to always be highp. (Eric Anholt) Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-11 08:04:54 +02:00
Kenneth Graunke	ce93bf1876	compiler: Save a single copy of the softfp64 shader in the context. We were recompiling the softfp64 library of functions from GLSL to NIR every time we compiled a shader that used fp64. Worse, we were ralloc stealing it to the GL context. This meant that we'd accumulate lots of copies for the lifetime of the context, which was a big space leak. Instead, we can simply stash a single copy in the GL context, and use it for subsequent compiles. Having a single copy should be fine from a memory context point of view: nir_inline_function_impl already clones the necessary nir_function_impl's as it inlines. KHR-GL45.enhanced_layouts.ssb_member_align_non_power_of_2 was previously OOM'ing a system with 16GB of RAM when using softfp64. Now it finishes much more quickly and uses only ~200MB of RAM. Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>	2019-07-10 22:14:36 -07:00
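The "stash a single copy in the context" approach amounts to lazy, one-time initialization. The sketch below illustrates the pattern only; `example_context`, `example_shader`, and both functions are hypothetical stand-ins, not the GL context or NIR types, and the compile step is reduced to an allocation plus a counter.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for the compiled library and the context. */
struct example_shader {
   int id;
};

struct example_context {
   struct example_shader *softfp64; /* cached copy, NULL until first use */
   int compile_count;               /* how often the expensive step ran */
};

/* Placeholder for the expensive GLSL-to-NIR compile. */
static struct example_shader *
example_compile_softfp64(struct example_context *ctx)
{
   struct example_shader *shader = malloc(sizeof(*shader));
   if (shader != NULL) {
      shader->id = 64;
      ctx->compile_count++;
   }
   return shader;
}

/* Returns the cached library, compiling it only on the first request. */
static struct example_shader *
example_get_softfp64(struct example_context *ctx)
{
   if (ctx->softfp64 == NULL)
      ctx->softfp64 = example_compile_softfp64(ctx);
   return ctx->softfp64;
}
```

Every later compile that needs fp64 gets the same pointer back, so the context holds one copy for its lifetime instead of one per compiled shader.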
Timothy Arceri	ae4ccb67be	radv: fix memory leak when restoring from cache Fixes: `726a31df70` ("radv: Add the concept of radv shader binaries.") Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-07-11 10:44:29 +10:00
Kristian H. Kristensen	e03259974e	freedreno: Generate headers from xml files Reviewed-by: Eric Engestrom <eric@engestrom.ch> Acked-by: Rob Clark <robdclark@gmail.com>	2019-07-10 22:05:02 +00:00
Samuel Pitoiset	51e2124a4b	radv: switch to the new VS exports path It will help for GS as NGG on GFX10. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-10 23:37:02 +02:00
Samuel Pitoiset	f616d80a7a	radv: set the slot_index correctly for VARYING_SLOT_CLIP_DIST1 For selecting a different SQ_EXP_POS target. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-10 23:36:59 +02:00
Samuel Pitoiset	c4ab33378a	radv: add a new function for exporting VS outputs Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-10 23:36:57 +02:00
Samuel Pitoiset	ac0edc369c	radv: implement new path for exporting generic varyings Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-10 23:36:55 +02:00
Samuel Pitoiset	0b368fc8c3	radv: use the generic export path for clip/cull distances When they are exported to the next stage. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-10 23:36:52 +02:00
Samuel Pitoiset	f653e5c1d6	radv: remove an extra memcpy when exporting clip/cull distances Cleanup. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-10 23:36:50 +02:00
Jason Ekstrand	14781e2122	intel/compiler: Add a "base class" for program keys Right now, all keys have two things in common: a program string ID and a sampler_prog_key_data. I'd like to add another thing or two and need a place to put it. This commit adds a new brw_base_prog_key struct which contains those two common bits. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-07-10 19:35:55 +00:00
Jason Ekstrand	3a4667e502	i965/program_cache: Cast the key to char * before adding key_size We're about to change the type of key to be brw_base_prog_key and that will mean blindly adding the key size without a cast will lead to the wrong calculation. It's safer to cast to char * first anyway. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-07-10 19:35:55 +00:00
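The reason for the cast is that C pointer arithmetic is scaled by the pointed-to type. A minimal sketch, using a hypothetical `example_base_key` struct rather than the real brw_base_prog_key:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 16-byte key type; the layout is an assumption. */
struct example_base_key {
   uint32_t program_string_id;
   uint32_t other[3];
};

/* Adding 1 to a typed pointer advances by sizeof(struct example_base_key)
 * bytes, not by one byte. This returns that step size in bytes. */
static size_t
example_typed_step(void)
{
   struct example_base_key pair[2];
   return (size_t) ((const char *) (pair + 1) - (const char *) pair);
}

/* Casting to char * first makes the addition count in bytes, which is
 * what a byte size such as key_size requires. */
static const void *
example_data_after_key(const struct example_base_key *key, size_t key_size)
{
   return (const char *) key + key_size;
}
```

Without the cast, `key + key_size` on a typed key pointer would skip `key_size` whole structs rather than `key_size` bytes.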
Jason Ekstrand	bb14abed18	anv: Make the workaround BO a whole page I'm not 100% sure how this ever worked because gem_create usually shoots you if the BO size isn't page-aligned. Reviewed-by: Chad Versace <chadversary@chromium.org> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-07-10 19:35:23 +00:00
Jason Ekstrand	6a2ff217b8	anv: Set Stateless Data Port Access MOCS This is the MOCS setting used for the A64 stateless messages which we sometimes use for SSBO operations. Fixes: `48ed2a7bb0` "anv: Implement VK_EXT_buffer_device_address" Fixes: `79fb0d27f3` "anv: Implement SSBOs bindings with GPU addr..." Reviewed-by: Chad Versace <chadversary@chromium.org> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-07-10 19:35:23 +00:00
Alyssa Rosenzweig	bb483a9166	panfrost: Clamp point size It's not clear the hardware really has a maximum which confuses dEQP; clamp to whatever we report as our maximum. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-10 11:30:00 -07:00
Alyssa Rosenzweig	7318b525a2	pan/decode: Auto style $ astyle .c .h --style=linux -s8 Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-10 10:43:23 -07:00
Alyssa Rosenzweig	ec2a59cd7a	panfrost: Move non-Gallium files outside of Gallium In preparation for a Panfrost-based non-Gallium driver (maybe Vulkan...?), hoist everything except for the Gallium driver into a shared src/panfrost. Practically, that means the compilers, the headers, and pandecode. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-10 10:43:23 -07:00
Alyssa Rosenzweig	a2d0ea92ba	panfrost: Style main Gallium driver $ astyle .c .h --style=linux -s8 Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-10 10:43:23 -07:00
Alyssa Rosenzweig	e4bd6fbe51	panfrost/midgard: Apply code styling $ astyle .c .h --style=linux -s8 Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-10 10:43:23 -07:00
Alyssa Rosenzweig	b4733b2b61	panfrost/nir: Apply NIR style $ astyle .c .h --style=linux -s3 Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-10 10:43:23 -07:00
Alyssa Rosenzweig	c2c8983cf4	panfrost: Move midgard/nir* to nir folder The reason for doing this is two-fold: 1. These passes are likely to be shared with the Bifrost compiler Therefore, we don't want to restrict them to Midgard 2. The coding style is different (NIR-style vs Panfrost-style) The NIR passes are candidates for moving upstream into compiler/nir, so don't block that off for stylistic reasons Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-10 10:43:23 -07:00
Alyssa Rosenzweig	ef2d577769	panfrost: Typofix Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-10 09:45:16 -07:00
Alyssa Rosenzweig	31fc52a4e7	panfrost: Identify shared tiler structure This is identical across SFBD/MFBD so pull it out to allow for better code sharing. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-10 09:45:16 -07:00
Alyssa Rosenzweig	6eb99c78e2	panfrost/midgard: Drop unnecessary assert Just use the #define instead. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Suggested-by: Erik Faye-Lund <erik.faye-lund@collabora.com>	2019-07-10 09:37:08 -07:00
Alyssa Rosenzweig	c1b109caec	panfrost: Don't expose OES_standard_derivatives This has not been implemented quite yet. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-10 09:36:03 -07:00
Erik Faye-Lund	39e7fbf24a	gallium: get rid of PIPE_CAP_SM3 PIPE_CAP_SM3 has always been an odd one out of all our caps. While most other caps are fine-grained and single-purpose, this cap encodes several features in one. And since OpenGL cares more about single features, it'd be nice to get rid of this one. As it turns out, this is now relatively simple. We only really care about three features using this cap, and those already got their own caps. So we can remove it, and make sure all current drivers just give the same response to all of them. The only place we really care about SM3 is in nine, and there we can instead just re-construct the information based on the finer-grained caps. This keeps DX9 semantics from needlessly leaking into all of the drivers, most of which don't care a whole lot about DX9 specifically. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-10 15:50:51 +02:00
Erik Faye-Lund	21de1bf24b	gallium: give vertex-shader saturate its own cap Shader Model 3.0 is a big promise to make to the state-tracker, and for instance mobile hardware might support vertex-shader saturate but not some of the other features of SM3. So let's give this its own cap for simplicity. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-07-10 15:49:57 +02:00
Erik Faye-Lund	681fa03e8d	gallium: give fragment-shader derivatives its own cap Shader Model 3.0 is a big promise to make to the state-tracker, and for instance mobile hardware might support fragment-shader derivatives but not some of the other features of SM3. So let's give this its own cap for simplicity. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-07-10 15:49:57 +02:00
Erik Faye-Lund	66ee6661e9	gallium: give fragment-shader texture-lod its own cap Shader Model 3.0 is a big promise to make to the state-tracker, and for instance mobile hardware might support texture lod but not some of the other features of SM3. So let's give this its own cap for simplicity. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-07-10 15:49:57 +02:00
Erik Faye-Lund	ffbd004686	mesa/st: drop needless has_shader_model3 boolean This boolean is only consulted once during init, so there's nothing much saved by storing this in the context. So let's just check directly when we need it instead. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-07-10 15:49:57 +02:00
Alyssa Rosenzweig	af2949e928	panfrost: Fix copyright identifier in a few places Oops. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>	2019-07-10 06:47:15 -07:00
Alyssa Rosenzweig	629c516b76	panfrost: Bikeshed pan_screen.c comment The asterisks were inherited from... softpipe, maybe? Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>	2019-07-10 06:47:13 -07:00
Alyssa Rosenzweig	2f7145a6de	panfrost: Check GPU version before loading Panfrost is known to only work on a select few CPU/GPU combinations at the moment (tested system-on-chips: RK3288, RK3399, and S912). Whitelist the combinations known to work and refuse to load on others where nothing works yet to avoid user confusion. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>	2019-07-10 06:47:11 -07:00
Alyssa Rosenzweig	b5de423ac1	panfrost: Be more honest about PIPE_CAPs A lot of the pan_screen.c code was cargoculted from other drivers. The upshot is that we return true for a lot of PIPE_CAPs that we don't actually support, resulting in us exposing way too many extensions that we don't actually support. Be more careful. Some CAPs we do need to fake to access higher dEQP versions (i.e. in order to debug the features we're hiding behind the CAP). For these, we hide the CAP behind a special PAN_MESA_DEBUG=deqp option to avoid apps randomly using these in-development features. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>	2019-07-10 06:47:01 -07:00
Alyssa Rosenzweig	b69d5d6e19	panfrost/midgard: Hit missed scheduling opportunity Don't try to schedule to vmul when that can't possibly work (forcing a bundle break). glmark: total bundles in shared programs: 2700 -> 2683 (-0.63%) bundles in affected programs: 695 -> 678 (-2.45%) helped: 14 HURT: 0 helped stats (abs) min: 1 max: 4 x̄: 1.21 x̃: 1 helped stats (rel) min: 1.27% max: 7.69% x̄: 4.30% x̃: 4.77% 95% mean confidence interval for bundles value: -1.68 -0.75 95% mean confidence interval for bundles %-change: -5.63% -2.97% Bundles are helped. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-10 06:45:20 -07:00
Alyssa Rosenzweig	2d739f6b59	panfrost/midgard: Include shader size for shader-db It's easy to forget about, but shader size does matter for things like i-cache, so let's include it in the analysis. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-10 06:45:20 -07:00
Alyssa Rosenzweig	7ad6516f3b	panfrost/midgard: Include loop count for shader-db We have to emit it anyway for the report to be happy (with respect to unrolling), so return an actual count rather than dummy numbers. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-10 06:45:20 -07:00
Alyssa Rosenzweig	138e40d471	panfrost/midgard: Dump shader-db stats All the kool kids are doing it. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-10 06:45:20 -07:00
Alyssa Rosenzweig	a2f1a06a5e	panfrost/midgard: Flush undefineds to zero Fixes a buggy dEQP test. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-10 06:45:20 -07:00
Alyssa Rosenzweig	318e9933b1	panfrost/midgard: Specify channel count for broadcasting ops bany/ball type ops read from all 4 channels even though they only write to 1; specify this in the opcode table like we do for dot products. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-10 06:45:20 -07:00
Alyssa Rosenzweig	a1a4dfa74b	panfrost/midgard: Don't try to "alias" texture registers It won't work. Just, stop it. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-10 06:45:20 -07:00
Samuel Pitoiset	4cadf4309c	radv: compute correct number of input vertices for NGG Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-10 15:17:08 +02:00
Samuel Pitoiset	3303bc8b74	radv: remove extra code for exporting LayerID to the next stage Now that the output usage mask is set to 0x1 the LayerID is correctly exported in the loop above. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-10 15:17:08 +02:00
Samuel Pitoiset	bd86ded027	radv: set the LayerId output usage mask if FS needs it When the stage preceding FS doesn't export it the fragment shader might read it, even if it's 0. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-10 15:17:08 +02:00
Alyssa Rosenzweig	53d64753e1	panfrost: Update supported formats Much of the format selection code was inherited from softpipe (!) of all places, and a lot of it is accordingly cruft. Later if-elses were added in random places to work around missing formats at various points in history. Clean up some of this. Theoretically, any format we can texture from we can also render to. In practice, there are a few corner cases that we need to disable explicitly. For one, we do have to restrict SCANOUT formats to work around buggy apps (in particular, dEQP which with --deqp-surface-type=window under Weston will end up with RGB10_A2 and complain about low alpha precision). Just be clearer about how/why. Also, RGB5_A1 support is still broken; let's not worry about that quite yet. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-10 06:12:09 -07:00
Alyssa Rosenzweig	ced132d203	panfrost/mfbd: Cleanup format code selection Rather than have random variables flying around and a long if-else chain, use a switch. They're literally designed for this. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-10 06:12:09 -07:00
Alyssa Rosenzweig	da5382c0d8	panfrost/midgard: Cleanup blend switch Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-10 06:12:08 -07:00
Alyssa Rosenzweig	c0c709a13a	panfrost/mfbd: Handle PIPE_FORMAT_B10G10R10A2_UNORM Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-10 06:12:08 -07:00
Alyssa Rosenzweig	c58c5268da	panfrost/midgard: Handle PIPE_FORMAT_B10G10R10A2_UNORM Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-10 06:12:08 -07:00
Alyssa Rosenzweig	c2ee937cf2	panfrost: Implement ES3-format writeout We add support for writing out (via a blend shader): - RGBA4 - RGB10_A2_UNORM - RGB10_A2_UINT - RGB5_A1_UNORM - R11G11B10_FLOAT Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-10 06:12:08 -07:00
Alyssa Rosenzweig	46396af1ec	panfrost: Refactor blend infrastructure We would like to permit keying blend shaders against the framebuffer format, which requires some new blending abstractions. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-10 06:12:08 -07:00
Alyssa Rosenzweig	c9af7701d1	panfrost/midgard: Use unsigned blend patch offset We would like the offset field to be unsigned, letting 0 represent "no offset" and positive represent an offset. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-10 06:12:08 -07:00
Alyssa Rosenzweig	6def428f10	panfrost/midgard: Handle pure int formats I'm not sure I'm totally comfortable with this, but conceptually neither float nor pure-int formats require any format conversion, except size conversion. Going from a shaderable format (fp32 or i16, for instance) into a blendable format (fp16) is a separate question, one we can defer momentarily while we're not interested in actually blending. As an aside, I'd be fascinated by an integer-based blending implementation. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-10 06:12:08 -07:00
Alyssa Rosenzweig	280c777fd7	panfrost/mfbd: Handle pure int formats We see that the render target itself turns out to be typeless (surprise!) Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-10 06:12:08 -07:00
Alyssa Rosenzweig	7647e56c1f	panfrost: Set rt_count_2 for bpp>4 formats Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-10 06:12:07 -07:00
Alyssa Rosenzweig	0c619210b2	panfrost/midgard: Implement preliminary float converters We'll need some careful handling, but for now, get some baseline code out for handling float formats in a blend shader. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-10 06:12:06 -07:00
Alyssa Rosenzweig	5849c85008	panfrost/midgard: Skip blend for REPLACE (shader) Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-10 06:12:06 -07:00
Alyssa Rosenzweig	5e825f5cad	panfrost: Handle "blend disabled" blend shaders Normally, disabled blend can definitely be fixed-function'd away, but if a blend shader is used merely for format conversion rather than blending, this code path can be nevertheless hit. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-10 06:12:05 -07:00
Alyssa Rosenzweig	27e0c8c15d	panfrost: Route format through fixed-function blending Not all framebuffer formats are supported by the fixed-function blender. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-10 06:12:05 -07:00
Alyssa Rosenzweig	e7551c1bff	panfrost: Pipe framebuffer format around Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-10 06:12:05 -07:00
Alyssa Rosenzweig	74fd914a89	panfrost/midgard: Use Gallium framebuffer formats Ideally, we would keep Galliumisms far away from the compiler; unfortunately, Mesa hasn't standardized on a system of format codes to be shared across APIs and across drivers, so using Gallium formats is our best bet in the short run. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-10 06:12:05 -07:00
Alyssa Rosenzweig	2157fe967a	panfrost/midgard: Use fp16 exclusively while blending We now have some preliminary fp16 support available. We're not able to expose this for GLSL quite yet, but for internal blend shaders, we're able to control bitness ourselves just fine. So let's fp16 that stuff! Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-10 06:12:05 -07:00
Alyssa Rosenzweig	0cfa54801e	panfrost/midgard: Remove opt_copy_prop_tex Eventually this should be replaced by proper tex RA / not emitting so many silly moves to begin with / better general copy prop. For now remove it since it breaks things. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-10 06:12:05 -07:00
Alyssa Rosenzweig	b113be7683	panfrost/midgard: Fix scalarification Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-10 06:12:05 -07:00
Alyssa Rosenzweig	e92caad744	panfrost/midgard: Handle fp16 in embedded_to_inline_constants Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-10 06:12:05 -07:00
Alyssa Rosenzweig	3dbedb26f5	panfrost/midgard: Eliminate redundant type convert Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-10 06:12:05 -07:00
Alyssa Rosenzweig	64df54d894	panfrost/midgard: Fix fp16 embedded constants Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-10 06:12:05 -07:00
Alyssa Rosenzweig	f8b18a4277	panfrost/midgard: Hoist mask field Share a single mask field in midgard_instruction with a unified format, rather than using separate masks for each instruction tag with hardware-specific formats. Eliminates quite a bit of duplicated code and will enable vec8/vec16 masks as well (which don't map as cleanly to the hardware as we might like). Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-10 06:12:04 -07:00
Alyssa Rosenzweig	e69cf1fed9	panfrost/midgard: Allow fp16 in scalar ALU The packing is a little different, so implement that. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-10 06:12:04 -07:00
Alyssa Rosenzweig	d8c084d2ca	panfrost/midgard: Implement f2u16 and friends Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-10 06:12:04 -07:00
Alyssa Rosenzweig	954c6afa3e	panfrost/midgard: Implement f2f16/f2f32 These conversions handle half-floats within the shader. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-10 06:12:04 -07:00
Alyssa Rosenzweig	0ed8cca008	panfrost/midgard: Verify src_bitsize == dst_bitsize We can handle differing, but we'd prefer not to because there are restrictions on sizing which aren't accounted for yet. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-10 06:12:04 -07:00
Alyssa Rosenzweig	1686ef8655	panfrost/midgard: Simplify blend read It's not clear where the extra indirection was from (older hardware or just older blobs?) Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-10 06:12:04 -07:00
Alyssa Rosenzweig	952993d3bb	panfrost/midgard: NIRify blend load scale/convert The scale and type-convert can now be expressed in NIR, rather than MIR, which is significantly more maintainable and demonstrates correctness of the type conversion patches. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-10 06:12:04 -07:00
Alyssa Rosenzweig	ae42991b83	panfrost/midgard: Fix blend constant scheduling bug Blend constant conflicts run in two directions. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-10 06:12:04 -07:00
Alyssa Rosenzweig	7f807ef1fa	panfrost/midgard: Implement upscaling type converts Rather than using a dest_override, we upscale integers by using a half field with a sign-extend bit. A variant of this trick should also work for floats, but one step at a time! Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-10 06:12:04 -07:00
Alyssa Rosenzweig	541b329bd1	panfrost/midgard: Move blend load/store into NIR We have dedicated intrinsics to access the raw contents of the tile buffer so we can use a dedicated NIR pass to lower appropriately for blend shaders, rather than introducing a bizarre hardcoded blend epilogue that only works for RGBA8_UNORM. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-10 06:12:04 -07:00
Alyssa Rosenzweig	f42e5be910	panfrost/midgard: Use nir_dest_num_components Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-10 06:12:03 -07:00
Alyssa Rosenzweig	4df80cab40	panfrost/midgard: Implement integer downsize ops Oh, dear. No turning back now. We begin implementing non-32-bit types, using downsizing integer type conversions as the initial instructions. We implement them naively as type-converting moves; substantially more efficient operation is possible by copypropping the type conversion modifier, but this optimization is not implemented here. Size converting modifiers on Midgard allow an instruction to write to a destination 1/2 the size, or to read from a source 1/2 the size. If we need an extreme conversion (32-bit to 8-bit, for instance), multiple type converting ops are chained together, which here is handled via an algebraic pass. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-10 06:12:03 -07:00
Alyssa Rosenzweig	dc69d3bf8f	panfrost/midgard: Move scale from MIR to NIR This begins the process of removing blend shader specific MIR into a more general NIR lowering pass for formats. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-10 06:12:03 -07:00
Alyssa Rosenzweig	d151319a3d	panfrost/midgard: Passthrough nir_lower_framebuffer Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-10 06:12:03 -07:00
Alyssa Rosenzweig	8e4e46794e	panfrost: Extend clear colour packing Eventually, this will allow packing clear colours for all formats, including floating-point framebuffers, pure integer buffers, and special formats. Currently, a few of these formats are supported, and many more are handled through a generic Gallium colour packing path (which is not a perfect fit for the hardware, but works for many formats and is a sane default for the moment.) Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-10 06:12:03 -07:00
Alyssa Rosenzweig	21c863a695	panfrost/mfbd: Include codes for float framebuffers We see the hardware doesn't actually support float framebuffers in the native sense -- rather, it just allows higher bpp framebuffers and lets a blend shader / additional clear_color fields sort out the formats. This will be.. interesting. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-10 06:12:03 -07:00
Alyssa Rosenzweig	36b3e7ea90	panfrost: Prepare some code for MRT Full MRT support is a while away, but in the mean time, we can remove code that explicitly assumes nr_cbufs <= 0, to minimize the obstacles we'll face later when we add the whole thing. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-10 06:12:03 -07:00
Alyssa Rosenzweig	7c82dfba8f	panfrost: Use standard ALIGN_POT/INFINITY macros We had vendored duplicates from pre-Mesa days; clean that up. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-10 06:12:03 -07:00
Eric Engestrom	c78d2d9840	egl: add glvnd symbols check According to the spec [1], `__egl_Main` is the only symbol that needs to be exported. We don't want applications directly linking against libEGL_mesa.so (apps should always go through libEGL.so, regardless of who is providing it), so we shouldn't export any other symbols either. [1] https://github.com/NVIDIA/libglvnd/blob/master/include/glvnd/libeglabi.h (this header is the closest there is to a spec) Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2019-07-10 11:27:51 +00:00
Eric Engestrom	ba18b968e8	egl: rewrite entrypoints check Part of the effort to replace shell scripts with portable python scripts. I could've used a trivial `assert lines == sorted(lines)`, but this way the caller is shown which entrypoint is out of order. Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2019-07-10 11:27:51 +00:00
Eric Engestrom	b619f89e23	mapi: add shared glapi symbols check Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2019-07-10 11:27:51 +00:00
Eric Engestrom	1abae9e54a	tu: add exported symbols check Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2019-07-10 11:27:51 +00:00
Eric Engestrom	0fd30c1011	vulkan: add symbols file According to the Vulkan ICD spec [1], these two symbols must be exposed: - vk_icdGetInstanceProcAddr - vk_icdNegotiateLoaderICDInterfaceVersion and this one is optional: - vk_icdGetPhysicalDeviceProcAddr [1] https://github.com/KhronosGroup/Vulkan-Loader/blob/master/loader/LoaderAndLayerInterface.md Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2019-07-10 11:27:51 +00:00
Eric Engestrom	915eab5e87	meson: remove unused env_test No longer used as of last commit :) Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2019-07-10 11:27:51 +00:00
Eric Engestrom	6f305d0c61	gles: use new symbols check script Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2019-07-10 11:27:51 +00:00
Eric Engestrom	111c34d2ae	gbm: sort symbols Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>	2019-07-10 11:27:51 +00:00
Eric Engestrom	aa6973e611	gbm: use new symbols check script Note: the list in gbm-symbols.txt is the same as the one that was in gbm-symbols-check, I just took the opportunity to sort it. Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2019-07-10 11:27:51 +00:00
Eric Engestrom	1172263c87	egl: use new symbols check script Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2019-07-10 11:27:51 +00:00
Eric Engestrom	176f350fcf	symbols-check: introduce new python script I've re-written this in bash a couple times over the years, and then I realised python is much more portable and already required by Mesa, so we might as well make use of it. I decided to still use the build system's NM instead of re-implementing symbols extraction, to offload the complexity of keeping it compatible with many systems (Linux, Unix, BSD, MacOS, etc.), especially when cross-building. This new script checks not only that nothing is exported when it shouldn't be, but also that everything that should be exported is. Sometimes, some symbols _can_ be exported but don't have to be, in which case they can be prefixed with `(optional)`. Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2019-07-10 11:27:51 +00:00
Karol Herbst	62362a4abb	nv50/ir/nir: implement load/store_global required by OpenCL v2: fix setting globalAccess Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Pierre Moreau <pierre.morrow@free.fr>	2019-07-10 13:23:00 +02:00
Karol Herbst	33a9b9fce5	nv50/ir/nir: handle kernel inputs required by OpenCL Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Pierre Moreau <pierre.morrow@free.fr>	2019-07-10 13:22:40 +02:00
Karol Herbst	2617c78fe2	nv50/ir/nir: don't assert on !main required for OpenCL Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Pierre Moreau <pierre.morrow@free.fr>	2019-07-10 13:22:21 +02:00
Karol Herbst	fa6bd3c639	nv50/ir/nir: parse system values first and stop for compute shaders required by OpenCL Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Pierre Moreau <pierre.morrow@free.fr>	2019-07-10 13:20:13 +02:00
Connor Abbott	133273aa22	nir/lower_io: Don't use variable to get deref mode Drivers only use lower_io for modes where pointers don't have a meaningful value, and dereferences can always be traced back to a variable. But there can be other modes, like global mode with VK_EXT_buffer_device_address, where pointers cannot be traced back to a variable, and lower_io would segfault on loads/stores of these since nir_deref_instr_get_variable() would return NULL. Just use the mode on the deref itself to filter out these modes before we try to get the variable. Fixes: `118a66df99` ("radv: Use NIR barycentric coordinates") Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-10 12:31:41 +02:00
Connor Abbott	f18b8a1174	radv: Don't optimize after lowering FS inputs Currently this is done rather late in radv, after lowering booleans, so it isn't safe to run additional optimizations that may add e.g. 1-bit booleans. We could move the lowering parts earlier, but since right now we only lower FS inputs and by this point all indirects have been lowered away, there's no reason we should need to optimize anything. One shader from Devil May Cry 5 was getting optimized, but only because the optimization loop was working on 32-bit booleans which revealed an opportunity that was hidden with 1-bit booleans, and we generated a 1-bit boolean which is invalid. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111092 Fixes: `118a66df99` Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-10 10:10:20 +02:00
Mauro Rossi	fe3898547a	android: amd/addrlib: add gfx10 support Fix the following building error: external/mesa/src/amd/addrlib/src/gfx10/gfx10addrlib.cpp:35:10: fatal error: 'gfx10_gb_reg.h' file not found ^~~~~~~~~~~~~~~~ 1 error generated. Fixes: `78cdf9a` ("amd/addrlib: add gfx10 support") Signed-off-by: Mauro Rossi <issor.oruam@gmail.com> Acked-by: Marek Olšák <marek.olsak@amd.com>	2019-07-10 09:03:55 +02:00
Mauro Rossi	b3d46cb539	android: amd/common/gfx10: add register JSON The necessary Android makefile building rules are added and the generation rules are simplified for readability Fixes the following building errors: external/mesa/src/amd/common/ac_llvm_build.c:1496:45: error: use of undeclared identifier 'V_008F0C_IMG_FORMAT_8_UINT' case V_008F0C_BUF_DATA_FORMAT_8: format = V_008F0C_IMG_FORMAT_8_UINT; break; ^ Fixes: `74a26af` ("amd/common/gfx10: add register JSON") Signed-off-by: Mauro Rossi <issor.oruam@gmail.com> Acked-by: Marek Olšák <marek.olsak@amd.com>	2019-07-10 09:03:51 +02:00
Mauro Rossi	2434fb3e8e	android: radeonsi/gfx10: generate gfx10_format_table.h (v2) Fix Android building rules for gfx10_format_table.h generated header (v2) Add LOCAL_C_INCLUDES += $(intermediates)/radeonsi to fix error: external/mesa/src/gallium/drivers/radeonsi/si_state.c:46:10: fatal error: 'gfx10_format_table.h' file not found ^~~~~~~~~~~~~~~~~~~~~~ 1 error generated. Fixes: `0ffa229` ("radeonsi/gfx10: generate gfx10_format_table.h") Signed-off-by: Mauro Rossi <issor.oruam@gmail.com> Acked-by: Marek Olšák <marek.olsak@amd.com>	2019-07-10 09:03:46 +02:00
Chih-Wei Huang	0d394f1734	android: virgl: remove unnecessary LOCAL_C_INCLUDES The path could be imported automatically. Signed-off-by: Chih-Wei Huang <cwhuang@linux.org.tw> Reviewed-by: Mauro Rossi <issor.oruam@gmail.com>	2019-07-10 08:56:47 +02:00
Chih-Wei Huang	4dc129e4f4	android: vulkan/util: fix generating vk_enum_to_str.* The gen_enum_to_str.py script generates vk_enum_to_str.c and its header at once. However, the makefiles incorrectly list both files in parallel with the same recipe. That means the two files may be generated simultaneously by two processes, and a file being generated may be truncated by the other process, as shown below: $ cd $OUT/obj/STATIC_LIBRARIES/libmesa_vulkan_util_intermediates/util $ ls -l -rw-rw-r-- 1 lh lh 193713 Jul 5 13:31 vk_enum_to_str.c -rw-rw-r-- 1 lh lh 4609 Jul 5 13:31 vk_enum_to_str.d -rw-rw-r-- 1 lh lh 0 Jul 5 16:21 vk_enum_to_str.h Let one file depend on the other with an empty recipe to avoid the issue. Signed-off-by: Chih-Wei Huang <cwhuang@linux.org.tw> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-07-10 08:56:37 +02:00
Chih-Wei Huang	a74285def2	android: radv: import include paths from used libraries It's unnecessary to manually add these include paths since they could be imported automatically. Signed-off-by: Chih-Wei Huang <cwhuang@linux.org.tw> Acked-by: Eric Engestrom <eric.engestrom@intel.com> Acked-by: Emil Velikov <emil.velikov@collabora.com>	2019-07-10 08:56:28 +02:00
Chih-Wei Huang	f982c6789c	android: anv: import include path of libmesa_nir Add libmesa_nir to a common LOCAL_STATIC_LIBRARIES defined by ANV_STATIC_LIBRARIES so that its include path can be imported automatically. Then ANV_INCLUDES is unnecessary and could be eliminated. Signed-off-by: Chih-Wei Huang <cwhuang@linux.org.tw> Acked-by: Eric Engestrom <eric.engestrom@intel.com> Acked-by: Emil Velikov <emil.velikov@collabora.com>	2019-07-10 08:56:23 +02:00
Chih-Wei Huang	5cb61f27d0	android: anv: eliminate libmesa_anv_entrypoints The dummy library libmesa_anv_entrypoints is totally unnecessary. The four VULKAN_GENERATED_FILES could be generated and built in libmesa_vulkan_common directly. The libraries using the generated headers should get it via the exported include path. Signed-off-by: Chih-Wei Huang <cwhuang@linux.org.tw> Acked-by: Eric Engestrom <eric.engestrom@intel.com> Acked-by: Emil Velikov <emil.velikov@collabora.com>	2019-07-10 08:56:16 +02:00
Chih-Wei Huang	4338e08bd6	android: vulkan/util: fix export path Export the correct include path so that the libraries that use it can get it automatically. Signed-off-by: Chih-Wei Huang <cwhuang@linux.org.tw> Acked-by: Eric Engestrom <eric.engestrom@intel.com> Acked-by: Emil Velikov <emil.velikov@collabora.com>	2019-07-10 08:56:10 +02:00
Chih-Wei Huang	e2ef281da1	android: radv: fix improper use of LOCAL_WHOLE_STATIC_LIBRARIES The libmesa_git_sha1 is a dummy library. There is no reason to put it into LOCAL_WHOLE_STATIC_LIBRARIES. Move libmesa_vulkan_util to the vulkan.radv which really needs it. Signed-off-by: Chih-Wei Huang <cwhuang@linux.org.tw> Acked-by: Eric Engestrom <eric.engestrom@intel.com> Acked-by: Emil Velikov <emil.velikov@collabora.com>	2019-07-10 08:56:04 +02:00
Chih-Wei Huang	8ff01f0342	android: anv: fix improper use of LOCAL_WHOLE_STATIC_LIBRARIES The libmesa_anv_entrypoints and libmesa_genxml are dummy libraries. There is no reason to put them into LOCAL_WHOLE_STATIC_LIBRARIES. Move libmesa_vulkan_util to the vulkan HAL which really needs it. Signed-off-by: Chih-Wei Huang <cwhuang@linux.org.tw> Acked-by: Eric Engestrom <eric.engestrom@intel.com> Acked-by: Emil Velikov <emil.velikov@collabora.com>	2019-07-10 08:55:59 +02:00
Chih-Wei Huang	352d91ce5b	android: radv: remove unused LOCAL_EXPORT_C_INCLUDE_DIRS The vulkan module is the final HAL. No need to export its headers since none will import it. Signed-off-by: Chih-Wei Huang <cwhuang@linux.org.tw> Acked-by: Eric Engestrom <eric.engestrom@intel.com> Acked-by: Emil Velikov <emil.velikov@collabora.com>	2019-07-10 08:55:50 +02:00
Chih-Wei Huang	4fb11c01c5	android: anv: remove unused LOCAL_EXPORT_C_INCLUDE_DIRS The vulkan module is the final HAL. No need to export its headers since none will import it. Signed-off-by: Chih-Wei Huang <cwhuang@linux.org.tw> Acked-by: Eric Engestrom <eric.engestrom@intel.com> Acked-by: Emil Velikov <emil.velikov@collabora.com>	2019-07-10 08:55:42 +02:00
Jason Ekstrand	7e0fcea727	nir/loop_analyze: Pass nir_const_values directly to helpers Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-07-10 00:20:59 +00:00
Jason Ekstrand	ff972c7a3a	nir/loop_analyze: Properly handle swizzles in loop conditions This commit re-plumbs all of nir_loop_analyze to use nir_ssa_scalar for all intermediate values so that we can properly handle swizzles. Even though if conditions are required to be scalars, they may still consume swizzles so you could have ((a.yzw < b.zzx).xz && c.xx).y == 0 as your loop termination condition. The old code would just bail the moment it saw its first non-zero swizzle but we can now properly chase the scalar from the if condition all the way to a, b, and c. Shader-db results on Kaby Lake: total loops in shared programs: 4388 -> 4364 (-0.55%) loops in affected programs: 29 -> 5 (-82.76%) helped: 29 HURT: 5 Shader-db results on Haswell: total loops in shared programs: 4370 -> 4373 (0.07%) loops in affected programs: 2 -> 5 (150.00%) helped: 2 HURT: 5 Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-07-10 00:20:59 +00:00
Jason Ekstrand	0333649e63	nir/loop_analyze: Refactor detection of limit vars This commit reworks both get_induction_and_limit_vars() and try_find_trip_count_vars_in_iand to return true on success and not modify their output parameters on failure. This makes their callers significantly simpler. Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-07-10 00:20:59 +00:00
Jason Ekstrand	8f7405ed9d	nir: Add some helpers for chasing SSA values properly There are various cases in which we want to chase SSA values through ALU ops ranging from hand-written optimizations to back-end translation code. In all these cases, it can be very tricky to do properly because of swizzles. This set of helpers lets you easily work with a single component of an SSA def and chase through ALU ops safely. Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-07-10 00:20:59 +00:00
Jason Ekstrand	9a3cb6f5fe	nir/loop_analyze: Bail if we encounter swizzles None of the current code knows what to do with swizzles. Take the safe option for now and bail if we see one. This does have a small shader-db impact but it is at least safe. Shader-db results on Kaby Lake: total loops in shared programs: 4364 -> 4388 (0.55%) loops in affected programs: 5 -> 29 (480.00%) helped: 5 HURT: 29 Shader-db results on Haswell: total loops in shared programs: 4373 -> 4370 (-0.07%) loops in affected programs: 5 -> 2 (-60.00%) helped: 5 HURT: 2 Fixes: `6772a17acc` "nir: Add a loop analysis pass" Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-07-10 00:20:59 +00:00
Jason Ekstrand	6455fa9710	nir/loop_analyze: Use new eval_const_* helpers in test_iterations Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-07-10 00:20:59 +00:00
Jason Ekstrand	268ad47c11	nir/loop_analyze: Handle bit sizes correctly in calculate_iterations The current code assumes everything is 32-bit which is very likely true but not guaranteed by any means. Instead, use nir_eval_const_opcode to do the calculations in a bit-size-agnostic way. We also use the new constant constructors to build the correct size constants. Fixes: `6772a17acc` "nir: Add a loop analysis pass" Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-07-10 00:20:59 +00:00
Jason Ekstrand	9f7ffe41dd	nir/loop_analyze: Fix phi-of-identical-alu detection One issue was that the original version didn't check that swizzles matched when comparing ALU instructions so it could end up matching very different instructions. Using the nir_instrs_equal function from nir_instr_set.c which we use for CSE should be much more reliable. Another was that the loop assumes it will only run two iterations which may not be true. If there's something which guarantees that this case only happens for phis after ifs, it wasn't documented. Fixes: `9e6b39e1d5` "nir: detect more induction variables" Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-07-10 00:20:59 +00:00
Jason Ekstrand	6e984bcb92	nir/instr_set: Expose nir_instrs_equal() Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-07-10 00:20:59 +00:00
Jason Ekstrand	64328f947e	nir/builder: Use nir_const_value_for_* for constructing immediates Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-07-10 00:20:59 +00:00
Jason Ekstrand	3acddc733f	nir: Refactor nir_src_as_* constant functions Now that we have the nir_const_value_as_* helpers, every one of these functions is effectively the same except for the suffix they use so we can easily define them with a repeated macro. This also means that they're inline and the fact that the nir_src is being passed by-value should no longer really hurt anything. Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-07-10 00:20:59 +00:00
Jason Ekstrand	ce5581e23e	nir: Add more helpers for working with const values Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-07-10 00:20:59 +00:00
Chia-I Wu	b44bb8bded	virgl: remove virgl_transfer_queue_lists COMPLETED_LIST is always empty. We only need one list. Signed-off-by: Chia-I Wu <olvaffe@gmail.com> Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>	2019-07-09 14:26:55 -07:00
Chia-I Wu	48aefcbd6b	virgl: simplify virgl_transfer_queue_extend We can reuse virgl_transfer_queue_find_pending. Signed-off-by: Chia-I Wu <olvaffe@gmail.com> Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>	2019-07-09 14:26:55 -07:00
Chia-I Wu	eae4527551	virgl: remove transfer after transfer_write Now that virgl_transfer_queue_is_queued does not search COMPLETED_LIST, we don't need to move transfers to that list. Signed-off-by: Chia-I Wu <olvaffe@gmail.com> Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>	2019-07-09 14:26:55 -07:00
Chia-I Wu	bec2a85c48	virgl: improve virgl_transfer_queue_is_queued Search only the pending list and return immediately on the first hit. When the transfer queue was introduced, the function was used to deal with a write transfer -> draw -> write transfer sequence. It was used to tell if the second transfer intersects with the first transfer. If yes, the transfer queue avoided reordering the second transfer to before the draw (by flushing) in case the draw uses the transferred data. With the recent changes to the transfer code, the function is used to deal with a write transfer -> readback transfer sequence. We want to avoid reordering the readback transfer to before the first transfer (also by flushing). In the old code, we needed to track the completed transfers as well to avoid reordering. But in the new code, a readback transfer is guaranteed to see the data from the completed transfers (in other words, it cannot be reordered to before the already completed transfers). We don't need to search the COMPLETED_LIST. Signed-off-by: Chia-I Wu <olvaffe@gmail.com> Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>	2019-07-09 14:26:55 -07:00
Chia-I Wu	5f6aab2ee2	virgl: fix transfers_intersect for mipmaps We never use transfers_intersect with textures, but fix it anyway to avoid confusion. Signed-off-by: Chia-I Wu <olvaffe@gmail.com> Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>	2019-07-09 14:26:55 -07:00
Chia-I Wu	6ca1bbabbe	virgl: fix some false positives in transfers_overlap Rewrite the function and check z/depth more carefully. We intentionally avoid u_box_test_intersection_2d because it returns true when two boxes touch but do not intersect and can be confusing. Signed-off-by: Chia-I Wu <olvaffe@gmail.com> Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>	2019-07-09 14:26:55 -07:00
Marek Olšák	2b2093961e	radeonsi/gfx10: enable primitive binning by default Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Acked-by: Dave Airlie <airlied@redhat.com>	2019-07-09 17:24:16 -04:00
Marek Olšák	9f68367d19	radeonsi/gfx10: implement primitive binning Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Acked-by: Dave Airlie <airlied@redhat.com>	2019-07-09 17:24:16 -04:00
Marek Olšák	4e56a2aaa8	radeonsi: simplify primitive binning enablement Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Acked-by: Dave Airlie <airlied@redhat.com>	2019-07-09 17:24:16 -04:00
Marek Olšák	3521297251	radeonsi: set primitive binning tunables for dGPUs Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Acked-by: Dave Airlie <airlied@redhat.com>	2019-07-09 17:24:16 -04:00
Marek Olšák	d7e80ba1e7	radeonsi: set FLUSH_ON_BINNING_TRANSITION when needed Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Acked-by: Dave Airlie <airlied@redhat.com>	2019-07-09 17:24:16 -04:00
Marek Olšák	9dbe63ceea	radeonsi/gfx10: use the new scan converter when binning is disabled Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Acked-by: Dave Airlie <airlied@redhat.com>	2019-07-09 17:24:16 -04:00
Marek Olšák	80b3f4b4bd	radeonsi/gfx9: fix an oversight in primitive binning code Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Acked-by: Dave Airlie <airlied@redhat.com>	2019-07-09 17:24:16 -04:00
Marek Olšák	1f53a3e766	radeonsi: use BREAK_BATCH instead of FLUSH_DFSM when CB_TARGET_MASK changes Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Acked-by: Dave Airlie <airlied@redhat.com>	2019-07-09 17:24:16 -04:00
Marek Olšák	605900d7dd	radeonsi/gfx10: don't expose unimplemented PIPE_CAP_QUERY_SO_OVERFLOW Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Acked-by: Dave Airlie <airlied@redhat.com>	2019-07-09 17:24:16 -04:00
Marek Olšák	270a8ab648	radeonsi/gfx10: launch 2 compute waves per CU before going onto the next CU Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Acked-by: Dave Airlie <airlied@redhat.com>	2019-07-09 17:24:16 -04:00
Marek Olšák	ab1f36a1d3	radeonsi/gfx10: set more registers and fields Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Acked-by: Dave Airlie <airlied@redhat.com>	2019-07-09 17:24:16 -04:00
Marek Olšák	9b65f6618c	radeonsi/gfx10: enable LATE_ALLOC_GS Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Acked-by: Dave Airlie <airlied@redhat.com>	2019-07-09 17:24:16 -04:00
Marek Olšák	4985c3ee22	radeonsi/gfx10: set HS/GS/CS.WGP_MODE Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Acked-by: Dave Airlie <airlied@redhat.com>	2019-07-09 17:24:16 -04:00
Marek Olšák	329406ec9c	radeonsi/gfx10: set GE_PC_ALLOC Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Acked-by: Dave Airlie <airlied@redhat.com>	2019-07-09 17:24:16 -04:00
Marek Olšák	9d1483de3b	radeonsi/gfx10: enable 1D textures Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Acked-by: Dave Airlie <airlied@redhat.com>	2019-07-09 17:24:16 -04:00
Marek Olšák	1d3bffaf9c	radeonsi/gfx10: enable image stores with DCC Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Acked-by: Dave Airlie <airlied@redhat.com>	2019-07-09 17:24:16 -04:00
Marek Olšák	5b50fb9b7f	radeonsi/gfx10: no need to invalidate L2 for framebuffer -> texture coherency Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Acked-by: Dave Airlie <airlied@redhat.com>	2019-07-09 17:24:16 -04:00
Marek Olšák	fbf781e401	radeonsi/gfx10: support pixel shaders without exports It only works if there are no color and no Z exports. Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Acked-by: Dave Airlie <airlied@redhat.com>	2019-07-09 17:24:16 -04:00
Marek Olšák	2adc8e2736	radeonsi/gfx10: enable vertex shaders without param space allocation Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Acked-by: Dave Airlie <airlied@redhat.com>	2019-07-09 17:24:16 -04:00
Marek Olšák	07fe51156d	radeonsi: update DCC settings from PAL Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Acked-by: Dave Airlie <airlied@redhat.com>	2019-07-09 17:24:16 -04:00
Marek Olšák	4002913f8d	radeonsi: reorder shader IO indices for better IO space usage for tess and GS The highest used index determines the stride for shader outputs in shaders that use LDS or memory for outputs. Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Acked-by: Dave Airlie <airlied@redhat.com>	2019-07-09 17:24:16 -04:00
Marek Olšák	1c99a13f89	radeonsi: decrease maximum supported GENERIC varying index from 42 to 31 This can decrease LDS and/or memory usage for shader outputs when geometry shaders or tessellation is used. Only PS inputs support higher indices and those aren't eliminated by kill_outputs. Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Acked-by: Dave Airlie <airlied@redhat.com>	2019-07-09 17:24:16 -04:00
Marek Olšák	6335cc6a58	radeonsi: cosmetic cleanup in si_shader_io_get_unique_index Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Acked-by: Dave Airlie <airlied@redhat.com>	2019-07-09 17:24:16 -04:00
Marek Olšák	3be4ed2fe1	radeonsi: fix and clean up shader_type passing - don't pass it via a parameter if it can be derived from other parameters - set shader_type for ac_rtld_open - use enum pipe_shader_type instead of unsigned Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Acked-by: Dave Airlie <airlied@redhat.com>	2019-07-09 17:24:16 -04:00
Marek Olšák	37b26671a7	radeonsi: enable RB+ for pixel shaders with no/non-contiguous color outputs Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Acked-by: Dave Airlie <airlied@redhat.com>	2019-07-09 17:24:16 -04:00
Marek Olšák	5058d62b05	radeonsi: don't set READ_ONLY for const_uploader to fix bindless texture hangs Bindless textures can update descriptors with WRITE_DATA. Cc: 19.1 <mesa-stable@lists.freedesktop.org> Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Acked-by: Dave Airlie <airlied@redhat.com>	2019-07-09 17:24:16 -04:00
Alyssa Rosenzweig	6074eae753	gallium: Add util_format_is_unorm8 check Useful for formats that would work with the same driver code path as RGBA8 UNORM but that don't meet the util_format_is_rgba8_variant criteria due to a smaller channel count. v2: Use simpler logic (suggested by Iago). v3: Fix spelling error. boolean->bool (thank you airlied). Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Reviewed-by: Dave Airlie <airlied@redhat.com>	2019-07-09 21:17:47 +00:00
Alyssa Rosenzweig	15000c79da	nir: Add Panfrost-specific blending intrinsic This gives more flexibility than the normal store_deref/store_output versions (particularly, it allows us to abuse the type system in awful ways, which is necessary for efficient format conversion in blend shaders.) Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Acked-by: Karol Herbst <kherbst@redhat.com>	2019-07-09 14:07:23 -07:00
Pratik Vishwakarma	177a3df7b0	radeonsi: Expose support for 10-bit VP9 decode Fix si_vid_is_format_supported to expose support for 10-bit VP9 decode using P016 format. Without this change, 10-bit decode will be exposed only for HEVC even though newer hardware supports 10-bit decode for VP9. Signed-off-by: Pratik Vishwakarma <Pratik.Vishwakarma@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com>	2019-07-09 15:26:54 -04:00
Alyssa Rosenzweig	4a4b48fb05	nir: Add nir_imm_vec4_16 We already have nir_imm_float16 and nir_imm_vec4; let's add the ability to easily make immediate fp16 vectors as well, now that fp16 support is maturing in NIR/GLSL. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-07-09 18:43:07 +00:00
Karol Herbst	a110a8090d	nvc0: remove nvc0_program.tp.input_patch_size right now that's dead code Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>	2019-07-09 12:41:54 +02:00
Bas Nieuwenhuizen	14291342ec	radv: Add a common member in the union to make things more clear. This clarifies that the struct can be used when the shader can be one of VS/TES. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-07-09 09:59:07 +00:00
Bas Nieuwenhuizen	f9070743a9	Revert "radv: keep track of whether NGG is used for GS on GFX10" This reverts commit `63e0675d98`. The GS is merged with the preceding shader and since the preceding shader will have as_ngg set the final binary will have is_ngg set. So we do not need the gs key here. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-07-09 09:59:07 +00:00
Juan A. Suarez Romero	d33e93d332	docs: update calendar, add news item and link release notes for 19.1.2 Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>	2019-07-09 11:22:13 +02:00
Juan A. Suarez Romero	3c90baf047	docs: add sha256 checksums for 19.1.2 Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com> (cherry picked from commit `e42399f4de`)	2019-07-09 09:19:25 +00:00
Juan A. Suarez Romero	0f51d69087	docs: add release notes for 19.1.2 Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com> (cherry picked from commit `fe1f7b538b`)	2019-07-09 09:19:24 +00:00
Connor Abbott	86968327df	nir/lower_io_to_temporaries: Fix hash table leak Fixes: `c45f5db527` ("nir/lower_io_to_temporaries: Handle interpolation intrinsics") Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-07-09 10:39:37 +02:00
Bas Nieuwenhuizen	64cd972ffb	radv/gfx10: Use correct gs_out for tess point_mode. Fixes: `204e4da9b4` "radv: Use correct gs_out with tessellation." Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-07-09 09:52:50 +02:00
Samuel Pitoiset	3f50007ad8	radv: set correct number of VGPRs for GS on GFX10 Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Dave Airlie <airlied@redhat.com>	2019-07-09 09:54:27 +02:00
Samuel Pitoiset	611ddf794e	radv: fix VGT_ESGS_RING_ITEMSIZE for GS as NGG on GFX10 Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Dave Airlie <airlied@redhat.com>	2019-07-09 09:54:24 +02:00
Samuel Pitoiset	eca8a478a5	radv: emit VGT_GS_MAX_VERT_OUT for legacy and NGG paths for GS Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Dave Airlie <airlied@redhat.com>	2019-07-09 09:54:22 +02:00
Samuel Pitoiset	f240147cf7	radv: emit the geometry shader as NGG if enabled on GFX10 Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Dave Airlie <airlied@redhat.com>	2019-07-09 09:54:21 +02:00
Samuel Pitoiset	63e0675d98	radv: keep track of whether NGG is used for GS on GFX10 Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Dave Airlie <airlied@redhat.com>	2019-07-09 09:54:19 +02:00
Samuel Pitoiset	c81b719812	radv: add radv_pipeline_generate_hw_gs() helper For legacy GS path. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Dave Airlie <airlied@redhat.com>	2019-07-09 09:54:17 +02:00
Samuel Pitoiset	54e2470047	radv: fix setting VGT_REUSE_OFF for TES on GFX10 Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Dave Airlie <airlied@redhat.com>	2019-07-09 09:54:16 +02:00
Samuel Pitoiset	d2a8b63a2c	radv: fix computing the number of ES VGPRS for TES on GFX10 Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Dave Airlie <airlied@redhat.com>	2019-07-09 09:54:14 +02:00
Samuel Pitoiset	2974df819e	radv: set max workgroup size to 128 for TES as NGG on GFX10 Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Dave Airlie <airlied@redhat.com>	2019-07-09 09:54:12 +02:00
Samuel Pitoiset	53c75f17ec	radv: fix allocating USER SGPRs on GFX10 Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Dave Airlie <airlied@redhat.com>	2019-07-09 09:54:11 +02:00
Alejandro Piñeiro	71446bf8e3	v3d: Early return with handle 0 when getting a bo on the simulator Until now we just looked up entries in the bo hash table and didn't worry if the handle was NULL, as we expected to get NULL in return. It seems that the hash table now asserts on some reserved pointers, including NULL. This commit just returns early when the handle is 0. This change fixes several crashes on vk-gl-cts GLES tests when using the v3d simulator, like: KHR-GLES3.core.internalformat.copy_tex_image.* Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-09 08:40:35 +02:00
Lionel Landwerlin	b031dd9010	vulkan/overlay: use a single macro to lookup objects Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-07-09 09:13:21 +03:00
Lionel Landwerlin	b3a96e69ac	vulkan/overlay: add queue present timing measurement Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-07-09 09:13:19 +03:00
Bas Nieuwenhuizen	f7f08b2d81	radv/gfx10: Enable tess. Reviewed-by: Dave Airlie <airlied@redhat.com>	2019-07-09 12:04:29 +10:00
Bas Nieuwenhuizen	795adbbadd	radv/gfx10: Add pipeline state support for tess. Reviewed-by: Dave Airlie <airlied@redhat.com>	2019-07-09 12:04:26 +10:00
Bas Nieuwenhuizen	23c6698ea2	radv/gfx10: Only set HW edge flags with gs & tess disabled. Reviewed-by: Dave Airlie <airlied@redhat.com>	2019-07-09 12:04:23 +10:00
Bas Nieuwenhuizen	9a8e4a07ad	radv/gfx10: Add tess eval ngg shader support. Reviewed-by: Dave Airlie <airlied@redhat.com>	2019-07-09 12:04:20 +10:00
Bas Nieuwenhuizen	204e4da9b4	radv: Use correct gs_out with tessellation. We should use the primitives output by the TES in that case. There is always a separate TES if there is no GS. Reviewed-by: Dave Airlie <airlied@redhat.com>	2019-07-09 12:04:16 +10:00
Bas Nieuwenhuizen	343a435c46	radv/gfx10: Use correct count of max_offchip_buffers. Reviewed-by: Dave Airlie <airlied@redhat.com>	2019-07-09 12:04:12 +10:00
Bas Nieuwenhuizen	5d0dbc2564	radv/gfx10: Load global pointers in correct userdata registers for hs/gs. Fixes: `cfaad5e3ca` "radv/gfx10: implement radv_emit_global_shader_pointers()" Reviewed-by: Dave Airlie <airlied@redhat.com>	2019-07-09 12:03:51 +10:00
Timothy Arceri	6b60cfd079	radeonsi: update function name in comment This was missed in `2361558eb7` Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-09 10:00:23 +10:00
Timothy Arceri	7c612c49b4	r600: remove query/apply_opaque_metadata callbacks These seem to have been radeonsi specific callbacks that are no longer needed now that these drivers no longer share this code path. These callbacks were removed from radeonsi in `c0d44fe0e9`. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-09 10:00:23 +10:00
Lionel Landwerlin	a72351cc76	vulkan/overlay: fix crash on freeing NULL command buffer It is legal to call vkFreeCommandBuffers() on NULL command buffers. This fix requires `eb41ce1b01` ("util/hash_table: Properly handle the NULL key in hash_table_u64"). Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `4438188f49` ("vulkan/overlay: record stats in command buffers and accumulate on exec/submit") Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-08 21:49:26 +00:00
Lionel Landwerlin	6271d16320	vulkan: bump headers & registry to 1.1.114 Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-07-09 00:09:36 +03:00
Dave Airlie	6422fa75b4	radv: only use specialised 3D meta paths on GFX9. GFX10 appears to act like GFX8 here. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-09 06:32:28 +10:00
Ian Romanick	0349bc3ce2	mesa: Set minimum possible GLSL version Set the absolute minimum possible GLSL version. API_OPENGL_CORE can mean an OpenGL 3.0 forward-compatible context, so that implies a minimum possible version of 1.30. Otherwise, the minimum possible version is 1.20. Since Mesa unconditionally advertises GL_ARB_shading_language_100 and GL_ARB_shader_objects, every driver has GLSL 1.20... even if they don't advertise any extensions to enable any shader stages (e.g., GL_ARB_vertex_shader). Converts about 2,500 piglit tests from crash to skip on NV18. Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109524 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110955 Cc: mesa-stable@lists.freedesktop.org	2019-07-08 12:34:09 -07:00
Caio Marcelo de Oliveira Filho	d577db293d	anv: Set maxComputeSharedMemorySize to 64k This value is supported since gen7. See also `8514c75a26` "i965: Set compute shader shared memory max to 64k". Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-07-08 11:35:42 -07:00
Ian Romanick	dd2dc7e707	intel/vec4: Delete vec4_visitor::emit_lrp Effectively unused since `dd7135d55d` ("intel/compiler: Use the flrp lowering pass for all stages on Gen4 and Gen5"). I had intended to remove this code as part of that series, but I forgot. Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-07-08 11:30:11 -07:00
Ian Romanick	5450fd7a36	nir: Allow nir_ssa_alu_instr_src_components to operate on non-SSA destinations Existing users only operate on instructions with SSA destinations. Some later patches add new direct calls and indirect calls (via existing NIR functions) on instructions after going out of SSA. At the very least, these calls are added by: intel/vec4: Try to emit a VF source in try_immediate_source intel/vec4: Try to emit a single load for multiple 3-src instruction operands The first commit adds direct calls, and the second adds calls via nir_alu_srcs_equal and nir_alu_srcs_negative_equal. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-07-08 11:30:11 -07:00
Ian Romanick	12217de08c	nir: Handle swizzle in nir_alu_srcs_negative_equal When I added this function, I was not sure if swizzles of immediate values were a thing that occurred in NIR. The only existing user of these functions is the partial redundancy elimination for compares. Since comparison instructions are inherently scalar, this does not occur. However, a couple later patches, "nir/algebraic: Recognize open-coded flrp(-1, 1, a) and flrp(1, -1, a)" combined with "intel/vec4: Try to emit a single load for multiple 3-src instruction operands", collaborate to create a few thousand instances. No shader-db changes on any Intel platform. v2: Handle the swizzle in nir_alu_srcs_negative_equal and leave nir_const_value_negative_equal unchanged. Suggested by Jason. v3: Correctly handle write masks. Add note (and assertion) that the caller is responsible for various compatibility checks. The single existing caller only calls this for combinations of scalar fadd and float comparison instructions, so all of the requirements are met. A later patch (intel/vec4: Try to emit a single load for multiple 3-src instruction operands) will call this for sources of the same instruction, so all of the requirements are met. v4: Add unit test for nir_opt_comparison_pre that is fixed by this commit. Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-07-08 11:30:11 -07:00
Ian Romanick	ad50e812a3	nir: nir_const_value_negative_equal compares one value at a time Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Suggested-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-07-08 11:30:10 -07:00
Ian Romanick	bcd22b740c	nir: Port some const_value_negative_equal tests to alu_src_negative_equal The next commit will make the existing tests irrelevant. Reviewed-by: Matt Turner <mattst88@gmail.com> Acked-by: Jason Ekstrand <jason@jlekstrand.net>	2019-07-08 11:30:10 -07:00
Ian Romanick	ec96c289ea	nir: Pass fully qualified type to nir_const_value_negative_equal Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Suggested-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-07-08 11:30:10 -07:00
Ian Romanick	0ac5ff9ecb	nir: Use nir_src_bit_size instead of alu1->dest.dest.ssa.bit_size This is important because, for example nir_op_fne has dest.dest.ssa.bit_size == 1, but the source operands can be 16-, 32-, or 64-bits. Fixing this helps partial redundancy elimination for compares in a few more shaders. v2: Add unit tests for nir_opt_comparison_pre that are fixed by this commit. All Intel platforms had similar results. total instructions in shared programs: 17179408 -> 17179081 (<.01%) instructions in affected programs: 43958 -> 43631 (-0.74%) helped: 118 HURT: 2 helped stats (abs) min: 1 max: 5 x̄: 2.87 x̃: 2 helped stats (rel) min: 0.06% max: 4.12% x̄: 1.19% x̃: 0.81% HURT stats (abs) min: 6 max: 6 x̄: 6.00 x̃: 6 HURT stats (rel) min: 5.83% max: 6.06% x̄: 5.94% x̃: 5.94% 95% mean confidence interval for instructions value: -3.08 -2.37 95% mean confidence interval for instructions %-change: -1.30% -0.85% Instructions are helped. total cycles in shared programs: 360959066 -> 360942386 (<.01%) cycles in affected programs: 774274 -> 757594 (-2.15%) helped: 111 HURT: 4 helped stats (abs) min: 1 max: 1591 x̄: 169.49 x̃: 36 helped stats (rel) min: <.01% max: 24.43% x̄: 8.86% x̃: 2.24% HURT stats (abs) min: 1 max: 2068 x̄: 533.25 x̃: 32 HURT stats (rel) min: 0.02% max: 5.10% x̄: 3.06% x̃: 3.56% 95% mean confidence interval for cycles value: -200.61 -89.47 95% mean confidence interval for cycles %-change: -10.32% -6.58% Cycles are helped. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> [v1] Suggested-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Matt Turner <mattst88@gmail.com> Fixes: `be1cc3552b` ("nir: Add nir_const_value_negative_equal")	2019-07-08 11:30:10 -07:00
Ian Romanick	47c2aa5b48	intel/vec4: Reswizzle VF immediates too Previously, an instruction like mul(8) vgrf29.xy:F, vgrf25.yxxx:F, [-1F, 1F, 0F, 0F] would get rewritten as mul(8) vgrf0.yz:F, vgrf25.yyxx:F, [-1F, 1F, 0F, 0F] The latter does not produce the correct result. The VF immediate in the second should be either [-1F, -1F, 1F, 1F] or [0F, -1F, 1F, 0F]. This commit produces the former. Fixes: `1ee1d8ab46` ("i965/vec4: Reswizzle sources when necessary.") Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-07-08 11:30:10 -07:00
Ian Romanick	b08d704051	nir: Add unit tests for nir_opt_comparison_pre Each test has a comment with the expected before and after NIR. The tests don't actually check this. The tests only check whether or not the optimization pass reported progress. I couldn't think of a robust, future-proof way to check the before and after code. Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-07-08 11:30:10 -07:00
Dongwon Kim	f734e2a042	anv: disable repacking for compression for applicable gen set bit15 (Disable Repacking for Compression) of CACHE_MODE_0 register if the gen attribute, 'disable_ccs_repack' is set. Signed-off-by: Dongwon Kim <dongwon.kim@intel.com> Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>	2019-07-08 10:54:38 -07:00
Dongwon Kim	6866765cb3	iris: disable repacking for compression for applicable gen set bit15 (Disable Repacking for Compression) of CACHE_MODE_0 register if the gen attribute, 'disable_ccs_repack' is set. Signed-off-by: Dongwon Kim <dongwon.kim@intel.com> Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>	2019-07-08 10:54:38 -07:00
Dongwon Kim	00df55cdc9	i965: disable repacking for compression for applicable gen set bit15 (Disable Repacking for Compression) of CACHE_MODE_0 register if the gen attribute, 'disable_ccs_repack' is set. Signed-off-by: Dongwon Kim <dongwon.kim@intel.com> Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>	2019-07-08 10:54:38 -07:00
Dongwon Kim	eb6d067e68	intel: add disable_ccs_repack to gen_device_info add a new attribute, 'disable_ccs_repack' to gen_device info, which indicates whether repacking of components in certain pixel formats before compression needs to be disabled to keep the compatibility with decompression capability of display controller (gen11+) Signed-off-by: Dongwon Kim <dongwon.kim@intel.com> Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>	2019-07-08 10:54:38 -07:00
Dongwon Kim	e6ac6d3224	intel/genxml: correct bit fields in CACHE_MODE_0 reg for gen11 correct bit fields information of CACHE_MODE_0 reg in current gen11.xml Signed-off-by: Dongwon Kim <dongwon.kim@intel.com> Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>	2019-07-08 10:54:37 -07:00
Caio Marcelo de Oliveira Filho	2614319259	nir: print ptr_stride for deref_casts Reviewed-by: Dave Airlie <airlied@redhat.com>	2019-07-08 10:05:56 -07:00
Caio Marcelo de Oliveira Filho	9c7adaeb5f	anv: Advertise VK_EXT_shader_demote_to_helper_invocation Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-07-08 08:57:25 -07:00
Caio Marcelo de Oliveira Filho	1a83c9a619	spirv: Implement SPV_EXT_demote_to_helper_invocation Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-07-08 08:57:25 -07:00
Caio Marcelo de Oliveira Filho	5a7c69399d	spirv: Update the headers from latest Khronos master This corresponds to 29c11140baaf9f7fdaa39a583672c556bf1795a1 in https://github.com/KhronosGroup/SPIRV-Headers. Acked-by: Jason Ekstrand <jason@jlekstrand.net>	2019-07-08 08:57:25 -07:00
Caio Marcelo de Oliveira Filho	45f5db5a84	intel/fs: Implement "demote to helper invocation" The "demote" intrinsic works like "discard" but doesn't change the control flow, allowing derivative operations to work. This is the semantics of D3D discard. The "is_helper_invocation" intrinsic will return true for helper invocations -- both the ones that started as helpers and the ones that were demoted. This is needed to avoid changing the behavior of gl_HelperInvocation which is an input (so not expected to change during shader execution). v2: Emit the discard jump and comment why it is safe. (Jason) Rework the is_helper_invocation() that was stomping f0.1. (Jason) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-07-08 08:57:25 -07:00
Caio Marcelo de Oliveira Filho	a42e8f0ed1	nir: Add demote and is_helper_invocation intrinsics From SPV_EXT_demote_to_helper_invocation. Demote will be implemented as a variant of discard, so mark uses_discard if it is used. v2: Add CAN_ELIMINATE flag to the new intrinsic. (Jason) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-07-08 08:57:25 -07:00
Samuel Pitoiset	9b116173b6	radv: do not emit VGT_FLUSH on GFX10 We don't need it. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-08 14:45:23 +02:00
Connor Abbott	0c114ae3be	ac/nir: Remove now-unused interp_deref handling Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-07-08 14:18:52 +02:00
Connor Abbott	b3a226691d	radeonsi/nir: Use NIR barycentric intrinsics This is simpler than radv, since the driver_location is already assigned for us. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-07-08 14:18:46 +02:00
Connor Abbott	d1c65939e2	radeonsi/nir: Delete unreachable code We always get gl_FragCoord as a system value, not a varying, so this is never hit. We already set PIXEL_CENTER_INTEGER elsewhere. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-07-08 14:18:41 +02:00
Connor Abbott	e5536aa584	compiler: Add color system value This is nice to have with radeonsi, where color varyings are handled specially to avoid recompiles. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-07-08 14:18:34 +02:00
Connor Abbott	118a66df99	radv: Use NIR barycentric intrinsics We have to add a few lowerings to deal with things that used to be dealt with inline when creating inputs. We also move the code that fills out the radv_shader_variant_info struct for linking purposes to radv_shader.c, as it's no longer tied to the NIR->LLVM lowering. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-08 14:18:25 +02:00
Connor Abbott	0cad0424e9	ac/nir: Implement barycentric intrinsics Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-07-08 14:18:25 +02:00
Connor Abbott	6b28808b22	intel/nir: Extract add_const_offset_to_base Pretty much every driver using nir_lower_io_to_temporaries followed by nir_lower_io is going to want this. In particular, radv and radeonsi in the next commits. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-08 14:14:53 +02:00
Connor Abbott	c45f5db527	nir/lower_io_to_temporaries: Handle interpolation intrinsics These weren't properly supported. This does pretty much the same thing that the radv code did. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-08 14:14:53 +02:00
Connor Abbott	3a2ea2af9d	nir: Avoid coalescing vars created by lower_io_to_temporaries Right now nir_copy_prop_vars is effectively undoing nir_lower_io_to_temporaries for inputs by propagating the original variable through the copy created in lower_io_to_temporaries. A theoretical variable coalescing pass would have the same issue with output variables, although that doesn't exist yet. To fix this, add a new bit to nir_variable, and disable copy propagation when it's set. This doesn't seem to affect any drivers now, probably since no one uses lower_io_to_temporaries for inputs as well as copy_prop_vars, but it will fix radv once we flip on lower_io_to_temporaries for fs inputs. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-08 14:14:53 +02:00
Connor Abbott	f3e2c65041	nir: Return correct size in nir_assign_io_var_locations() It was double-counting cases where multiple variables were assigned to the same slot, and not handling the case where the last variable is a compact variable. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-08 14:14:53 +02:00
Connor Abbott	dd81d8808d	nir: Handle compact variables when assigning i/o locations These are used in Vulkan for clip/cull distances, instead of the GLSL lowering when the clip/cull arrays are shared. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-08 14:14:53 +02:00
Connor Abbott	fd5ed6b9d6	nir: Move st_nir_assign_var_locations() to common code It isn't really doing anything Gallium-specific, and it's needed for handling component packing, overlapping, etc. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-07-08 14:15:06 +02:00
Connor Abbott	27f0c3c15e	radv: Make FragCoord a sysval load_fragcoord is already handled in common code for radeonsi, so we don't need to do anything to handle it. However, there were some passes creating NIR with the varying, so we switch them over to the sysval. In the case of nir_lower_input_attachments which is used by both radv and anv, we add handling for both until intel switches to using a sysval. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-08 14:14:53 +02:00
Connor Abbott	64f3fc5ea6	spirv: Add an option for making FragCoord a sysval On AMD, FragCoord should be a sysval because it is handled separately from all the other inputs. We were already doing this in radeonsi, but we weren't doing it with radv. It'll be much more annoying to handle VARYING_SLOT_POS in fragment shaders when we let NIR lower FS inputs for us, so here we add an option so that radv can get it as a system value. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-08 14:14:53 +02:00
Daniel Schürmann	e41e932e57	radv: Lower input attachments in NIR. v2 (Connor) - Fix warning in release mode using MAYBE_UNUSED Reviewed-by: Connor Abbott <cwabbott0@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-08 14:14:53 +02:00
Daniel Schürmann	c65e880a65	radv: Implement nir_intrinsic_load_layer_id(). Reviewed-by: Connor Abbott <cwabbott0@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-07-08 14:14:53 +02:00
Daniel Schürmann	c31f470066	anv,nir: Move lower_input_attachments pass from ANV to NIR. Reviewed-by: Connor Abbott <cwabbott0@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-08 14:02:50 +02:00
Dave Airlie	1d327689f9	radv/gfx10: don't emit PFP packets on ME. This was done for all previous GPUs. This fixes Talos Principle launch hangs. Fixes: `7e43022e8c` (radv/gfx10: add gfx10_cs_emit_cache_flush) Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-08 17:19:42 +10:00
Samuel Pitoiset	49e5136887	ac: select the GFX ring when halting waves with UMR on GFX10 GFX10 has two rings, so UMR wants to know which one to halt. Select the first one by default. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-08 09:10:57 +02:00
Bas Nieuwenhuizen	4d118ad44a	radv/gfx10: Move NGG output handling outside of giant if-statement. In merged shaders we put a big if around each shader, so both stages can have a different number of threads. However, the NGG output code still needs to run if the first shader is not executed. This can happen when there are more gs threads than vs/es threads, or when there are 0 es/vs threads (why? no clue). Fixes: `ee21bd7440` "radv/gfx10: implement NGG support (VS only)" Reviewed-by: Dave Airlie <airlied@redhat.com>	2019-07-08 01:49:54 +02:00
Bas Nieuwenhuizen	703efab7e4	radv: Actually use VK formats for the format table. No ETC2 or ASTC on navi so nothing to add. Fixes: `3dc5ec5d16` "radv/gfx10: generate gfx10_format_table.h" Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Dave Airlie <airlied@redhat.com>	2019-07-07 23:10:32 +02:00
Chia-I Wu	5824130389	anv: fix VkExternalBufferProperties for host allocation It was reported as unsupported previously. It should be importable and is compatible with itself. Signed-off-by: Chia-I Wu <olvaffe@gmail.com> Fixes: `69cc6272fb` ("anv: Implement VK_EXT_external_memory_host") Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-07-07 13:31:58 -07:00
Chia-I Wu	f3c7a02a62	anv: fix VkExternalBufferProperties for unsupported handles compatibleHandleTypes must include the queried handle type. Signed-off-by: Chia-I Wu <olvaffe@gmail.com> Cc: <mesa-stable@lists.freedesktop.org> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-07-07 13:31:58 -07:00
Bas Nieuwenhuizen	e46b41b3ae	radv: Handle cmask being disallowed by addrlib. alignment=0 does weird things with align64. CC: <mesa-stable@lists.freedesktop.org> Reviewed-by: Dave Airlie <airlied@redhat.com>	2019-07-07 21:29:52 +02:00
Samuel Pitoiset	5eaed7ecfc	radv/gfx10: enable support for NAVI10, NAVI12 and NAVI14 Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-07 17:51:32 +02:00
Bas Nieuwenhuizen	817bd0cc2e	radv/gfx10: Use GS rectlist when needed. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-07-07 17:51:32 +02:00
Samuel Pitoiset	ee21bd7440	radv/gfx10: implement NGG support (VS only) This needs to be cleaned up a bit, and it probably contains missing stuff and/or bugs. This doesn't fix the "half of the triangles" issue. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-07 17:51:32 +02:00
Bas Nieuwenhuizen	9e37609d0b	radv: Combine vs and tes output keys parts. That way the same deref is valid for both shader stages. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-07-07 17:51:32 +02:00
Bas Nieuwenhuizen	d0978427cb	radv/gfx10: Use new uconfig reg index packet for GFX10+. Otherwise the hardware/firmware seems to not set the registers. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-07-07 17:51:32 +02:00
Bas Nieuwenhuizen	aeb5b1a998	radv/gfx10: Set MEM_ORDERED flags on shaders. Scattered because depending on stage they are at offset 24/25/27/30. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-07-07 17:51:32 +02:00
Samuel Pitoiset	67b6888d8b	radv/gfx10: emit GE_CNTL instead of IA_MULTI_VGT_PARAM for legacy mode Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-07 17:51:32 +02:00
Samuel Pitoiset	74d69299d1	radv/gfx10: double the number of tessellation offchip buffers per SE Each gfx10 shader engine corresponds to two gfx9 shader engines, so scale the number of offchip buffers accordingly. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-07 17:51:32 +02:00
Samuel Pitoiset	bf1e1a29c3	radv/gfx10: require LLVM 9+ Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-07 17:51:32 +02:00
Samuel Pitoiset	0f769ed398	radv/gfx10: disable geometry and tessellation shaders Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-07 17:51:32 +02:00
Samuel Pitoiset	fe4419d3c7	radv/gfx10: disable binning Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-07 17:51:32 +02:00
Samuel Pitoiset	faf27ee9b3	radv/gfx10: disable CLEAR_STATE Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-07 17:51:32 +02:00
Samuel Pitoiset	698f9e6fd3	radv/gfx10: disable VK_EXT_transform_feedback It requires a bunch of work, so disable for now. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-07 17:51:32 +02:00
Samuel Pitoiset	2141e6fc73	radv/gfx10: set user data base registers Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-07 17:51:32 +02:00
Samuel Pitoiset	7e43022e8c	radv/gfx10: add gfx10_cs_emit_cache_flush The cache flush logic on GFX10 is quite different and it's implemented with a new function. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-07 17:51:32 +02:00
Samuel Pitoiset	b0b6e27bca	radv/gfx10: set the DCC constant encoding flag Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-07 17:51:32 +02:00
Samuel Pitoiset	ce3b5d4c17	radv/gfx10: do not declare streamout SGPRS Streamout is completely different on GFX10. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-07 17:51:32 +02:00
Samuel Pitoiset	352365c5e2	radv/gfx10: do not set stream output shader config Transform feedback is really different on GFX10. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-07 17:51:32 +02:00
Samuel Pitoiset	3f68329806	radv/gfx10: emit VGT_VERTEX_REUSE_BLOCK_CNTL during gfx initialization The value doesn't need to be updated for tess. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-07 17:51:32 +02:00
Samuel Pitoiset	2a83154b4a	radv/gfx10: update shader-related fields in si_emit_graphics() Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-07 17:51:32 +02:00
Samuel Pitoiset	5556f16609	radv/gfx10: implement si_emit_compute() Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-07 17:51:32 +02:00
Samuel Pitoiset	c90f46700d	radv/gfx10: mask DCC tile swizzle by alignment DCC alignment can be less than the alignment of the main surface. In that case, the DCC tile swizzle needs to be masked accordingly. Should have no impact on pre-gfx10. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-07 17:51:32 +02:00
Samuel Pitoiset	b1b60a92b1	radv/gfx10: initialize GE_{MAX,MIN}_VTX_INDX/INDX_OFFSET Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-07 17:51:32 +02:00
Samuel Pitoiset	12a42c2d9f	radv/gfx10: implement radv_flush_vertex_descriptors() change Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-07 17:51:32 +02:00
Samuel Pitoiset	0ca09a7fe3	radv/gfx10: implement fill_geom_tess_rings() Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-07 17:51:31 +02:00
Samuel Pitoiset	ebeb319f0e	radv/gfx10: implement radv_CmdBindDescriptorSets() Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-07 17:51:31 +02:00
Samuel Pitoiset	97891a0d10	radv/gfx10: implement write_buffer_descriptor() Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-07 17:51:31 +02:00
Samuel Pitoiset	bdd8acde02	radv/gfx10: use the correct register for image descriptor dumping Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-07 17:51:31 +02:00
Samuel Pitoiset	e5a8f21b0e	radv/gfx10: implement radv_pipeline_generate_hw_hs() Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-07 17:51:31 +02:00
Samuel Pitoiset	4c82094b7b	radv/gfx10: implement radv_fill_shader_variant() Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-07 17:03:39 +02:00
Samuel Pitoiset	b144a70ca8	radv/gfx10: implement radv_pipeline_generate_geometry_shader() Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-07 17:03:39 +02:00
Samuel Pitoiset	5551d6d6ea	radv/gfx10: implement radv_init_sampler() Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-07 17:03:39 +02:00
Samuel Pitoiset	4c31f3dcc0	radv/gfx10: fix PS exports for SPI_SHADER_32_AR Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-07 17:03:39 +02:00
Samuel Pitoiset	8574a84291	radv/gfx10: implement radv_get_device_name() Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-07 17:03:38 +02:00
Samuel Pitoiset	863727c4a3	radv/gfx10: set RADV_FORCE_FAMILY Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-07 17:03:38 +02:00
Samuel Pitoiset	34b185cc43	radv/gfx10: fix a possible hang with exp pos0 with done=0 and exec=0 Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-07 17:03:38 +02:00
Samuel Pitoiset	b3a53de5fa	radv/gfx10: set PA_SC_TILE_STEERING_OVERRIDE Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-07 17:03:38 +02:00
Samuel Pitoiset	96cd24588b	radv/gfx10: set cache control registers Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-07 17:03:38 +02:00
Samuel Pitoiset	9a01eded0c	radv/gfx10: set llvm_has_working_vgpr_indexing Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-07 17:03:38 +02:00
Samuel Pitoiset	6b9dbb28ef	radv/gfx10: update DB_DFSM_CONTROL register Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-07 17:03:38 +02:00
Samuel Pitoiset	2435b571de	radv/gfx10: update DB_Z_INFO register GFX10 uses the same register as GFX8. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-07 17:03:38 +02:00
Samuel Pitoiset	cfaad5e3ca	radv/gfx10: implement radv_emit_global_shader_pointers() Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-07 17:03:38 +02:00
Samuel Pitoiset	3f5ca22e9c	radv/gfx10: implement radv_emit_tess_factor_ring() Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-07 17:03:38 +02:00
Samuel Pitoiset	17048c1765	radv/gfx10: implement radv_emit_fb_ds_state() Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-07 17:03:38 +02:00
Samuel Pitoiset	2481ac81d3	radv/gfx10: implement radv_initialise_ds_surface() Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-07 17:03:38 +02:00
Samuel Pitoiset	c2a5d98148	radv/gfx10: implement radv_emit_fb_color_state() Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-07 17:03:38 +02:00
Samuel Pitoiset	e80f189de0	radv/gfx10: implement radv_initialise_color_surface() Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-07 17:03:38 +02:00
Samuel Pitoiset	ee8d6a2a6c	radv/gfx10: implement radv_init_dcc_control_reg() Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-07 17:03:38 +02:00
Samuel Pitoiset	ccce8f5915	radv/gfx10: implement radv_make_buffer_descriptor() Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-07 17:03:38 +02:00
Samuel Pitoiset	549d0aeee4	radv/gfx10: implement si_set_mutable_tex_desc_fields() Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-07 17:03:38 +02:00
Samuel Pitoiset	bf11f1c3a4	radv/gfx10: add gfx10_make_texture_descriptor Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-07 17:03:38 +02:00
Samuel Pitoiset	3dc5ec5d16	radv/gfx10: generate gfx10_format_table.h Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-07 17:03:38 +02:00
Samuel Pitoiset	9c1266048f	radv/gfx10: increase maximum number of layers to 8192 Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-07 17:03:38 +02:00
Samuel Pitoiset	0213fe09b8	radv/gfx10: increase maximum number of levels to 14 Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-07 17:03:38 +02:00
Samuel Pitoiset	1f82007a9e	radv/gfx10: set MAX_ALLOC_COUNT Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-07 17:03:38 +02:00
Samuel Pitoiset	c3459968cd	ac/nir: unpacked GS invocation ID on GFX10+ Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-07 17:03:38 +02:00
Samuel Pitoiset	4d7c420a94	ac: add missing formats to ac_get_tbuffer_format() for GFX10 Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-07 17:03:38 +02:00
Lionel Landwerlin	8f0f727fe4	vulkan/overlay: fix command buffer stats Begin/Reset of command buffer both reset the content of the command buffer. Don't forget to wipe them on Begin. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `4438188f49` ("vulkan/overlay: record stats in command buffers and accumulate on exec/submit") Acked-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-07 15:47:54 +03:00
Lionel Landwerlin	5493ec3c19	anv: manually add KHR_display to the list of platforms Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `38305e6c94` ("anv: replace hard-coded platform list with vk.xml parse") Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111078 Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-07-07 15:34:09 +03:00
Dave Airlie	002c8cae44	docs/features: add shader buffer and atomic support for llvmpipe	2019-07-07 16:24:21 +10:00
Dave Airlie	2f8cbdfc88	llvmpipe: enable ARB_shader_storage_buffer_object Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2019-07-07 16:24:17 +10:00
Dave Airlie	df46b3d196	llvmpipe: add support for shader buffer binding. This adds support for setting shader buffers and passing them to draw or binding them to the fragment shader jit. Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2019-07-07 16:24:12 +10:00
Dave Airlie	d8fb66a3e1	draw: add shader buffer interfaces. This adds the interface to add mapped shader buffers, and sets up the jit linkage for them. Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2019-07-07 16:24:09 +10:00
Dave Airlie	b5ac381d8f	gallivm: add buffer operations to the tgsi->llvm conversion. This adds load, store and atomic operations. These operations have to respect the exec_mask, and can't operate in lanes where the execute is off. This is needed to avoid side effects seen outside the shaders. There is also bounds checking on the ssbo accesses vs the size ptr. Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2019-07-07 16:24:05 +10:00
Dave Airlie	a845baff16	gallivm: move mask_vec function up higher so it can be reused. Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2019-07-07 16:24:01 +10:00
Dave Airlie	ab807859ea	tgsi: denote which load/store/atomic channels are unsigned llvmpipe will need this info. Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2019-07-07 16:23:54 +10:00
Dave Airlie	e21007f426	llvmpipe: add support for ssbo to the fragment shader jit. This just adds the ssbo ptrs to the jit fragment shader api. Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2019-07-07 16:23:51 +10:00
Dave Airlie	69ff738eb0	draw: add support for ssbo ptrs to jit tables. This adds ssbo/num_ssbo ptrs to the vs/gs jit tables. Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2019-07-07 16:23:46 +10:00
Dave Airlie	e84570ba70	gallivm: add some basic SSBO limits. (v2) v2: update ssbo size Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2019-07-07 16:23:44 +10:00
Dave Airlie	7c3807c1b3	util: add util_copy_shader_buffer. This just adds an inline to copy a pipe_shader_buffer. Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2019-07-07 16:23:40 +10:00
Dave Airlie	5ff697aa65	gallivm: add ssbo pointers to the soa build api. Need to pass ssbo + ssbo size pointers just like constants. Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2019-07-07 16:23:36 +10:00
Dave Airlie	2a55acbc1d	gallivm: add compare exchange wrapper This just pulls the wrapper from LLVM for older versions Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2019-07-07 16:23:32 +10:00
Dave Airlie	4f709c86a9	vertex shader: add exec masking (v2) As suggested by Roland this is just a compare of fetch_max vs the counter, much simpler than my original spaghetti code. We require the vertex shader to have an exec mask to get proper ssbo/image load/store/atomics semantics. Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2019-07-07 16:23:27 +10:00
Alexandros Frantzis	4271430dd7	virgl: Hide internal virgl_resource functions Since the transition to virgl_resource_transfer_map(), several previously public virgl_resource functions are not required to be public anymore. We also move the functions earlier in the file so they can be used without function declarations. Signed-off-by: Alexandros Frantzis <alexandros.frantzis@collabora.com> Reviewed-by: Chia-I Wu <olvaffe@gmail.com>	2019-07-06 19:30:38 -07:00
Alexandros Frantzis	e5b54d0018	virgl: Use virgl_resource_transfer_map for textures Replace custom texture map code (for maps which don't require resolve) with virgl_resource_transfer_map. Signed-off-by: Alexandros Frantzis <alexandros.frantzis@collabora.com> Reviewed-by: Chia-I Wu <olvaffe@gmail.com>	2019-07-06 19:30:37 -07:00
Alexandros Frantzis	f8975f8f2f	virgl: Use virgl_resource_transfer_map for buffers Replace custom buffer map code with virgl_resource_transfer_map. Signed-off-by: Alexandros Frantzis <alexandros.frantzis@collabora.com> Reviewed-by: Chia-I Wu <olvaffe@gmail.com>	2019-07-06 19:30:34 -07:00
Alexandros Frantzis	bb0a38d819	virgl: Introduce virgl_resource_transfer_map Normal mapping of buffers and textures uses almost identical logic. This commit extracts this logic in the form of the virgl_resource_transfer_map() helper function. Signed-off-by: Alexandros Frantzis <alexandros.frantzis@collabora.com> Reviewed-by: Chia-I Wu <olvaffe@gmail.com>	2019-07-06 19:30:22 -07:00
Jason Ekstrand	4633298fd6	iris: Use a uint16_t for key sizes sizeof(struct brw_vs_prog_key) == 324. Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-07-04 19:52:34 -05:00
Marek Olšák	aa5dab27f9	ac: destroy passes in ac_destroy_llvm_compiler Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>	2019-07-04 15:39:04 -04:00
Marek Olšák	ea64d66fde	ac: use an LLVM fence instead of s.waitcnt when possible Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>	2019-07-04 15:39:03 -04:00
Marek Olšák	14450c8c41	ac: remove unused AC_WAIT_EXP Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>	2019-07-04 15:39:01 -04:00
Marek Olšák	fe5dbe75b2	ac: only set ac_dlc in ac_llvm_build.c Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>	2019-07-04 15:39:00 -04:00
Marek Olšák	8a71f60194	ac: replace glc,slc with cache_policy for loads cosmetic change Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>	2019-07-04 15:38:56 -04:00
Marek Olšák	a29e781961	ac: replace glc,slc with cache_policy for stores cosmetic change Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>	2019-07-04 15:38:54 -04:00
Jonathan Marek	5feb8adb0f	etnaviv: implement buffer compression Vivante GPUs have lossless buffer compression using the tile-status bits, which can reduce memory access and thus improve performance. This patch only enables compression for "V4" compression GPUs, but the implementation is tested on GC2000(V1) and GC3000(V2). V1/V2 compression looks absolutely useless, so it is not enabled. I couldn't test if this patch breaks MSAA, because it looks like MSAA is already broken. Signed-off-by: Jonathan Marek <jonathan@marek.ca> Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>	2019-07-04 14:05:18 -04:00
Jonathan Marek	f6a0d17abe	etnaviv: detect v4 compression Signed-off-by: Jonathan Marek <jonathan@marek.ca> Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>	2019-07-04 14:05:18 -04:00
Jonathan Marek	e910acb3f2	etnaviv: rs: don't use etna_compatible_rs_format when possible This mirrors the change in blt. RS cares about this for msaa/compression. Signed-off-by: Jonathan Marek <jonathan@marek.ca> Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>	2019-07-04 14:05:18 -04:00
Jonathan Marek	66411521ea	etnaviv: combine translate_ts_sampler_format/translate_msaa_format Both translate the same thing, so just add the missing cases into one. Signed-off-by: Jonathan Marek <jonathan@marek.ca> Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>	2019-07-04 14:05:18 -04:00
Jonathan Marek	84c87f40fb	etnaviv: fix compression format not set correctly in TS_MEM_CONFIG VIVS_TS_MEM_CONFIG_COLOR_COMPRESSION_FORMAT() needs to be used. Signed-off-by: Jonathan Marek <jonathan@marek.ca> Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>	2019-07-04 14:05:18 -04:00
Jonathan Marek	53475c85fd	etnaviv: set correct ts_clear_value for BLT engine BLT engine uses all ones to clear TS, set ts_clear_value to match that. Note: ts_clear_value is never used with BLT engine. Signed-off-by: Jonathan Marek <jonathan@marek.ca> Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>	2019-07-04 14:05:18 -04:00
Jonathan Marek	7c7eaaed4a	etnaviv: remove initial CPU ts clear Since we have "ts_valid" to avoid using uncleared ts, this memset serves no purpose. Also it is broken because it doesn't use cpu_prep/cpu_fini. Signed-off-by: Jonathan Marek <jonathan@marek.ca> Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>	2019-07-04 14:05:18 -04:00
Jonathan Marek	95d937852e	etnaviv: implement TS_MODE for GC7000L GC7000L has a TS mode with larger tiles, which improves performance. Signed-off-by: Jonathan Marek <jonathan@marek.ca> Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>	2019-07-04 14:05:18 -04:00
Jonathan Marek	bc5ae6a330	etnaviv: fix ts size calculation The size of the TS is screen->specs.bits_per_tile bits per tile, with each tile being 64 bytes of the resource. This gives the same result for 32bpp formats, but reduces the size of TS for 16bpp formats by 2. Signed-off-by: Jonathan Marek <jonathan@marek.ca> Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>	2019-07-04 14:05:09 -04:00
Jonathan Marek	2f540745ad	etnaviv: update headers from rnndb Update to etna_viv commit 8a8b13a and use new names in the code. Signed-off-by: Jonathan Marek <jonathan@marek.ca> Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>	2019-07-04 14:04:47 -04:00
Eric Engestrom	c314ba2c26	scons: s/HAVE_NO_AUTOCONF/HAVE_SCONS/ Back when autotools and scons were the two build systems, it kinda made sense to call scons "not autoconf", but autoconf's been gone for a while now and other build systems have been added (android.mk and meson), so the name really doesn't make any sense anymore. Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-04 16:41:23 +01:00
Bas Nieuwenhuizen	bbbcb49f9b	radeonsi: Fix some warnings. ../mesa/src/gallium/drivers/radeonsi/si_compute_blit.c: In function ‘si_clear_buffer’: ../mesa/src/gallium/drivers/radeonsi/si_compute_blit.c:195:11: warning: unused variable ‘clear_alignment’ [-Wunused-variable] unsigned clear_alignment = MIN2(clear_value_size, 4); ^~~~~~~~~~~~~~~ [23/60] Compiling C object 'src/gallium/drivers/radeonsi/3cdc30e@@radeonsi@sta/si_compute_prim_discard.c.o'. ../mesa/src/gallium/drivers/radeonsi/si_compute_prim_discard.c: In function ‘si_prepare_prim_discard_or_split_draw’: ../mesa/src/gallium/drivers/radeonsi/si_compute_prim_discard.c:1106:7: warning: unused variable ‘compute_has_space’ [-Wunused-variable] bool compute_has_space = sctx->ws->cs_check_space(cs, need_compute_dw, false); Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-07-04 11:12:27 +00:00
Nicolai Hähnle	cb07f91489	amd/common: move ac_shader_{binary,reloc} into r600 and rename They are no longer used by radeonsi or radv. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-07-04 10:52:26 +00:00
Nicolai Hähnle	510e74ff48	amd/common: removed unused ac_shader_binary functions Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-07-04 10:52:26 +00:00
Nicolai Hähnle	b398230e6d	amd/common: remove unused ac_compile_module_to_binary Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-07-04 10:52:26 +00:00
Bas Nieuwenhuizen	6a220e67ce	radv: Switch to using rtld. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-07-04 10:52:26 +00:00
Bas Nieuwenhuizen	5ff651c0a7	radv: Move more stuff to variant create time. Due to them depending on the linker result. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-07-04 10:52:26 +00:00
Bas Nieuwenhuizen	726a31df70	radv: Add the concept of radv shader binaries. This simplifies a bunch of stuff by (1) Keeping all the things in a single allocation, making things easier for the cache. (2) creating a shader_variant creation helper. This is immediately put to use by creating rtld shader binaries. This is the main reason for the binaries, as we need to do the linking at upload time, i.e. post caching. We do not enable rtld yet. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-07-04 10:52:26 +00:00
Bas Nieuwenhuizen	43f2f01cc8	radv: Add export_prim_id to the shader variant info. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-07-04 10:52:26 +00:00
Bas Nieuwenhuizen	15046ef7c8	radv: use last nir shader to determine stage in postprocessing Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-07-04 10:52:26 +00:00
Bas Nieuwenhuizen	7469516244	radv: Merge rsrc1/rsrc2 fields with the config fields. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-07-04 10:52:26 +00:00
Andres Gomez	4000428ada	vulkan: Update headers to 1.1.113 Some headers were not dragged in the last update(s). Fixes: `465ec0b145` ("vulkan: Update the XML and headers to 1.1.113") Signed-off-by: Andres Gomez <agomez@igalia.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Acked-by: Jason Ekstrand <jason@jlekstrand.net>	2019-07-04 10:37:52 +00:00
Samuel Pitoiset	cce2645810	radv: do not crash when generating binning state for unknown chips These values are only useful if binning is disabled. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-04 12:22:46 +02:00
Samuel Pitoiset	8a425e057d	radv: fix potential crash in the compute resolve path If the destination attachment is UNUSED. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-04 12:22:43 +02:00
Tomeu Vizoso	0cc02c9ea6	panfrost: Take into account off-screen FBOs In that case, ctx->pipe_framebuffer.cbufs[0] can be NULL. Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com> Cc: Boris Brezillon <boris.brezillon@collabora.com> Fixes: `5375d009be` ("panfrost: Pass referenced BOs to the SUBMIT ioctls")	2019-07-04 10:48:09 +02:00
Christian Gmeiner	f39a7fd627	util/macros: rework DIV_ROUND_UP macro Simplify used math. Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com> Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>	2019-07-04 10:21:32 +02:00
Christian Gmeiner	e519d3c239	gitlab-ci: bump required libdrm version Fixes following build problem: Message: libdrm 2.4.99 needed because amdgpu has the highest requirement Dependency libdrm_intel found: NO found '2.4.97' but need: '>=2.4.99' Dependency libdrm_intel found: NO meson.build:1178:4: ERROR: Invalid version of dependency, need 'libdrm_intel' ['>=2.4.99'] found '2.4.97'. Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com> Reviewed-by: Eric Engestrom <eric@engestrom.ch>	2019-07-04 09:55:10 +02:00
Kenneth Graunke	9ea67f0a79	iris: Fix MOCS for grid surface Hardcoding 4 is bad; we have a function for this now.	2019-07-03 22:24:50 -07:00
Kenneth Graunke	10560f8506	iris: Minor tidying	2019-07-03 22:24:44 -07:00
Marek Olšák	6ab23805c3	Revert "mesa/st: Passthrough scissor when clearing by quad" This reverts commit `0a88aa3025`. It breaks a lot of piglit tests.	2019-07-04 01:08:02 -04:00
Marek Olšák	8dfdf5aae4	gallium/u_blitter: add return to fix the build	2019-07-03 23:44:14 -04:00
Alyssa Rosenzweig	0a88aa3025	mesa/st: Passthrough scissor when clearing by quad The scissor state -is- setup, but the scissor test is not enabled. This can prevent certain optimizations from occurring on tilers where unaffected tiles are thrown out entirely. v2: Only enable scissor test if the scissor test is actually set by the app, to avoid regressing quad-based clears used for other reasons (like a color mask). Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-07-03 14:33:46 -07:00
Nicolai Hähnle	8845a23698	amd: add NAVI10 PCI IDs Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-03 15:51:13 -04:00
Marek Olšák	92e34568b7	radeonsi/gfx10: fix legacy GS LLVM doesn't insert s_waitcnt_vscnt before GS_DONE. There was also the crash in legacy GS copy shader. Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-03 15:51:13 -04:00
Nicolai Hähnle	dfa8e758c2	radeonsi/gfx10: disable clear state Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-03 15:51:13 -04:00
Nicolai Hähnle	0dd57f0fc0	radeonsi/gfx10: disable DPBB Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-03 15:51:13 -04:00
Nicolai Hähnle	815fd77a47	radeonsi/gfx10: disable SDMA Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-03 15:51:13 -04:00
Marek Olšák	f66ee5af2f	radeonsi: determine the rasterization primitive type accurately (v2) v2: reworked version to fix bugs and make it more efficient Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-03 15:51:13 -04:00
Marek Olšák	a4b3eea325	radeonsi/gfx10: consolidate & improve input_prim determination for NGG Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-03 15:51:13 -04:00
Marek Olšák	969e5176c2	ac: rework ac_build_waitcnt for gfx10 Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-03 15:51:13 -04:00
Marek Olšák	214ddfb688	radeonsi/gfx10: implement si_shader_vs Only used with tessellation + GS instancing. Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-03 15:51:13 -04:00
Marek Olšák	6cf2fb1fc4	radeonsi/gfx10: unpack GS invocation ID Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-03 15:51:13 -04:00
Marek Olšák	32694456f7	radeonsi/gfx10: jump over the shader query atomic if the queries are disabled Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-03 15:51:13 -04:00
Marek Olšák	244a8e6798	radeonsi/gfx10: cosmetic changes Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-03 15:51:13 -04:00
Marek Olšák	09a905d930	radeonsi/gfx10: set cache control registers Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-03 15:51:13 -04:00
Marek Olšák	b680f723f8	radeonsi/gfx10: export correct PrimitiveID from NGG vertex shaders Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-03 15:51:13 -04:00
Marek Olšák	3203a74dcb	radeonsi/gfx10: set PA_SC_TILE_STEERING_OVERRIDE Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-03 15:51:13 -04:00
Marek Olšák	07aacdbfd5	radeonsi/gfx10: add a workaround for stencil HTILE with mipmapping Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-03 15:51:13 -04:00
Marek Olšák	51db950419	radeonsi/gfx10: disable DCC with MSAA It was only enabled for 2x MSAA anyway. Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-03 15:51:13 -04:00
Marek Olšák	6920f09f4b	radeonsi/gfx10: fix GL_LINE polygon mode for decomposed primitives We need to tell PA to accept edge flags generated by the input assembler, because decomposed primitives shouldn't draw inner edges. Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-03 15:51:13 -04:00
Marek Olšák	e39d4594da	radeonsi/gfx10: fix NGG GS color clamping Just need to pass the input from ES to GS. Everything else is done. Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-03 15:51:13 -04:00
Marek Olšák	40e7c65590	radeonsi/gfx10: fix vertex color clamping for TES Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-03 15:51:13 -04:00
Marek Olšák	cc7875150a	radeonsi/gfx10: unbind NGG shaders when destroyed This fixes glsl-max-varyings, which creates shaders, draws, and then destroys them. Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-03 15:51:13 -04:00
Marek Olšák	b90ddff477	radeonsi/gfx10: don't use the GS workaround for triangle strips w/ adjacency Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-03 15:51:13 -04:00
Marek Olšák	c3ac22a620	radeonsi/gfx10: don't do the query buffer atomic for blit shaders Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-03 15:51:13 -04:00
Marek Olšák	adbec817d3	radeonsi/gfx10: update spi_map if API VS (as NGG) changes and PS doesn't Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-03 15:51:13 -04:00
Marek Olšák	1e39c21c23	radeonsi/gfx10: fix a possible hang with exp pos0 with done=0 and exec=0 Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-03 15:51:13 -04:00
Marek Olšák	683cf11b81	radeonsi/gfx10: prefetch HW GS when NGG is used Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-03 15:51:13 -04:00
Nicolai Hähnle	76898a8062	amd/common/gfx10: set DLC for llvm.amdgcn.s.buffer.load Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-03 15:51:13 -04:00
Marek Olšák	7f71579064	radeonsi/gfx10: fix PS exports for SPI_SHADER_32_AR Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-03 15:51:13 -04:00
Marek Olšák	4bdf44724f	radeonsi/gfx10: set DLC for loads when GLC is set This fixes L1 shader array cache coherency. Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-03 15:51:13 -04:00
Marek Olšák	f81aa6b0c8	radeonsi/gfx10: fix shader images Don't promote 2D image instructions to 3D, and don't set z=BASE_ARRAY. Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-03 15:51:13 -04:00
Marek Olšák	7c805a7c67	radeonsi/gfx10: set the DCC constant encoding flag Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-03 15:51:13 -04:00
Marek Olšák	6eb219e963	radeonsi/gfx10: fix intensity formats move the ALPHA_IS_ON_MSB fixup into vi_alpha_is_on_msb Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-03 15:51:13 -04:00
Marek Olšák	6944f99176	radeonsi/gfx10: allocate GDS BOs for streamout Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-03 15:51:13 -04:00
Marek Olšák	395185912d	radeonsi/gfx10: make sure GDS is idle between IBs Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-03 15:51:13 -04:00
Nicolai Hähnle	5ff3aff0d6	radeonsi/gfx10: implement streamout Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-03 15:51:13 -04:00
Nicolai Hähnle	792a638b03	radeonsi/gfx10: implement streamout-related queries The NGG hardware pipeline doesn't track these statistics automatically, and in fact cannot track them automatically when API geometry shaders are involved, so we accumulate statistics in the shader using atomic adds. This implementation accumulates statistics via the memory system and the RW buffer descriptor setup. We could use GDS, but since these atomics aren't latency-sensitive, that basically just trades off L2$ bandwidth vs. export bus bandwidth. One single memory transaction per shader workgroup doesn't seem too bad. The result ring buffer in memory is needed either way to avoid pipeline stalls. The shader code contains the atomic unconditionally, though the GFX10_GS_QUERY_BUF is a null buffer when no queries are active. The atomic is simply discarded by the shader hardware in that case. Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-03 15:51:13 -04:00
Nicolai Hähnle	bcd2d2e194	radeonsi/gfx10: enable the workaround for unaligned vertex fetch Yes, really. Note that non-format buffer loads are unaffected and work just fine with unaligned pointers (as long as SH_MEM_CONFIG is set up correctly, which amdgpu ensures). Fixes e.g. KHR-GL45.vertex_attrib_64bit.vao Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-03 15:51:13 -04:00
Nicolai Hähnle	22b85bfc02	radeonsi/gfx10: re-order the initialization order in si_compile_tgsi_main It's useful to be able to access gs_ngg_scratch before creating the main wrapping branch. Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-03 15:51:13 -04:00
Nicolai Hähnle	3aa622aab1	radeonsi/gfx10: apply DCC MSAA blend workaround Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-03 15:51:13 -04:00
Nicolai Hähnle	bc25ccfe22	radeonsi/gfx10: implement si_emit_global_shader_pointers Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-03 15:51:13 -04:00
Nicolai Hähnle	6bcc273de8	radeonsi/gfx10: implement si_init_tess_factor_ring Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-03 15:51:13 -04:00
Nicolai Hähnle	2492cfde66	radeonsi/gfx10: initialize EXEC for TES-as-NGG (without geometry shader) Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-03 15:51:13 -04:00
Nicolai Hähnle	591537c7fa	radeonsi/gfx10: use correct VGPR for instance ID in LS shader Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-03 15:51:13 -04:00
Nicolai Hähnle	f3b9a37278	radeonsi/gfx10: implement si_shader_hs Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-03 15:51:13 -04:00
Nicolai Hähnle	e4d6b4daae	radeonsi/gfx10: implement si_create_sampler_state Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-03 15:51:13 -04:00
Nicolai Hähnle	0bf3e6fae7	radeonsi/gfx10: double the number of tessellation offchip buffers per SE Each gfx10 shader engine corresponds to two gfx9 shader engines, so scale the number of offchip buffers accordingly. Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-03 15:51:13 -04:00
Nicolai Hähnle	2afd3c421d	radeonsi/gfx10: implement get_tess_ring_descriptor Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-03 15:51:13 -04:00
Nicolai Hähnle	d028440f57	radeonsi/gfx10: mask DCC tile swizzle by alignment DCC alignment can be less than the alignment of the main surface. In that case, the DCC tile swizzle needs to be masked accordingly. Should have no impact on pre-gfx10. Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-03 15:51:13 -04:00
Nicolai Hähnle	1666ee183e	radeonsi/gfx10: implement hardware MSAA resolve MSAA is only supported for 64KB_{R,Z}_X modes, so the micro tile optimization that we use on gfx9 and earlier does not work. Be very explicit about how the swizzle mode of the temporary surface is selected. Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-03 15:51:13 -04:00
Nicolai Hähnle	69c41fb8ff	radeonsi/gfx10: fix binding on si_update_scratch_relocs Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-03 15:51:12 -04:00
Nicolai Hähnle	fd8758366b	radeonsi/gfx10: set llvm_has_working_vgpr_indexing Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-03 15:51:12 -04:00
Nicolai Hähnle	48810ad02d	radeonsi/gfx10: implement load_const_buffer_desc_fast_path Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-03 15:51:12 -04:00
Nicolai Hähnle	1b11fb148c	radeonsi/gfx10: take PRIMID from the correct output when exported by GS Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-03 15:51:12 -04:00
Nicolai Hähnle	8060339278	radeonsi/gfx10: change location of instance ID shader input Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-03 15:51:12 -04:00
Nicolai Hähnle	ccdf792910	radeonsi/gfx10: set USER_DATA_ADDR offset for geometry shaders Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-03 15:51:12 -04:00
Nicolai Hähnle	00707922d4	radeonsi/gfx10: implement si_emit_derived_tess_state Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-03 15:51:12 -04:00
Nicolai Hähnle	e0c2a4d58c	radeonsi/gfx10: implement si_shader_gs This is only used in the legacy, non-NGG path. Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-03 15:51:12 -04:00
Nicolai Hähnle	2864d53deb	radeonsi/gfx10: implement preload_ring_buffers Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-03 15:51:12 -04:00
Nicolai Hähnle	56cab3e996	radeonsi/gfx10: implement si_set_ring_buffer Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-03 15:51:12 -04:00
Nicolai Hähnle	3c1aeb834f	radeonsi/gfx10: allow rectangle outputs from NGG primitive shader Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-03 15:51:12 -04:00
Nicolai Hähnle	77e715541c	radeonsi/gfx10: emit VGT_GS_OUT_PRIM_TYPE from draw and add it to VS_STATE With NGG, the VGT_GS_OUT_PRIM_TYPE can change without a shader change. The VS_STATE is required for both streamout and culling from a vertex shader without pre-compiling outprim-specific variants. We could consider compiling specialized variants in the future. We could also consider compiling the NGG logic as an epilog. Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-03 15:51:12 -04:00
Nicolai Hähnle	4ecc39e1aa	radeonsi/gfx10: NGG geometry shader PM4 and upload Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-03 15:51:12 -04:00
Nicolai Hähnle	a04aa4be2b	radeonsi/gfx10: generate geometry shaders for NGG Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-03 15:51:12 -04:00
Nicolai Hähnle	efe1cd4859	radeonsi/gfx10: use the correct register for image descriptor dumping Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-03 15:51:12 -04:00
Nicolai Hähnle	1ce52c1e37	radeonsi/gfx10: emit GE_CNTL instead of IA_MULTI_VGT_PARAM for legacy mode Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-03 15:51:12 -04:00
Nicolai Hähnle	77c0f9e7ba	radeonsi/gfx10: initialize GE_{MAX,MIN}_VTX_INDX/INDX_OFFSET Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-03 15:51:12 -04:00
Nicolai Hähnle	47c9505a92	radeonsi/gfx10: setup registers for OpenGL compute Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-03 15:51:12 -04:00
Nicolai Hähnle	b8d3fd46d6	radeonsi/gfx10: set user data base registers Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-03 15:51:12 -04:00
Nicolai Hähnle	016a465d7d	radeonsi/gfx10: implement gfx10_shader_ngg For pipelines without API GS. We will later expand this to cover NGG geometry shaders as well. Note that the vtx offset passed into the GS part is just the vertex index multiplied by VGT_ESGS_RING_ITEMSIZE. Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-03 15:51:12 -04:00
Nicolai Hähnle	d0c204a1e0	radeonsi/gfx10: add NGG registers to si_init_config Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-03 15:51:12 -04:00
Nicolai Hähnle	ae00cae0b7	radeonsi/gfx10: update shader-related fields in si_init_config Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-03 15:51:12 -04:00
Nicolai Hähnle	1dee01ee13	radeonsi/gfx10: implement si_shader_ps Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-03 15:51:12 -04:00
Nicolai Hähnle	612489bd5d	radeonsi/gfx10: generate VS and TES as NGG merged ESGS shaders This does not support geometry shading yet. Also missing are streamout and NGG-specific optimizations. Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-03 15:51:12 -04:00
Nicolai Hähnle	e86256c512	radeonsi/gfx10: distinguish between merged shaders and multi-part shaders Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-03 15:51:12 -04:00
Nicolai Hähnle	4063ea95e9	radeonsi/gfx10: update si_get_shader_name Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-03 15:51:12 -04:00
Nicolai Hähnle	8ec60d3031	radeonsi/gfx10: add as_ngg shader key bit Also add the shader main part NGG variant, so that in principle we can switch between legacy and NGG modes. Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-03 15:51:12 -04:00
Nicolai Hähnle	40b12c0f5a	radeonsi/gfx10: implement si_update_shaders Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-03 15:51:12 -04:00
Nicolai Hähnle	5726ec0d24	radeonsi/gfx10: implement si_build_vgt_shader_config Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-03 15:51:12 -04:00
Nicolai Hähnle	b45c3debe8	radeonsi/gfx10: keep track of whether NGG is used We always use NGG by default, except when tessellation is enabled with extreme geometry shader amplification. Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-03 15:51:12 -04:00
Nicolai Hähnle	226f650d92	radeonsi/gfx10: document NGG shader stages Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-03 15:51:12 -04:00
Nicolai Hähnle	7bb9bb0540	radeonsi/gfx10: implement gfx10_emit_cache_flush Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-03 15:51:12 -04:00
Nicolai Hähnle	0c6c6810bd	radeonsi/gfx10: add si_context::emit_cache_flush The introduction of GCR_CNTL makes cache flush handling on gfx10 sufficiently different that it makes sense to just use a separate function. Since emit_cache_flush is called quite early during context init, we initialize the pointer explicitly in si_create_context. Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-03 15:51:12 -04:00
Nicolai Hähnle	08e2a62b07	radeonsi/gfx10: implement DB registers Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-03 15:51:12 -04:00
Nicolai Hähnle	372652bccc	radeonsi/gfx10: set CB registers Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-03 15:51:12 -04:00
Nicolai Hähnle	44adae42ae	radeonsi/gfx10: always set up sample locations Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-03 15:51:12 -04:00
Nicolai Hähnle	79b1eaf2fd	radeonsi/gfx10: use Z32_FLOAT_CLAMP for upgraded depth textures Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-03 15:51:12 -04:00
Nicolai Hähnle	c049a6f895	radeonsi/gfx10: implement vertex format changes Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-03 15:51:12 -04:00
Nicolai Hähnle	62f73d8214	radeonsi/gfx10: implement si_set_{constant,shader}_buffer Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-03 15:51:12 -04:00
Nicolai Hähnle	21ac1da0d1	radeonsi/gfx10: implement si_make_buffer_descriptor Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-03 15:51:12 -04:00
Nicolai Hähnle	7bc818aef1	radeonsi/gfx10: implement si_set_mutable_tex_desc_fields Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-03 15:51:12 -04:00
Nicolai Hähnle	8598a999ea	radeonsi/gfx10: gfx10 can render up to 8192 layers Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-03 15:51:12 -04:00
Nicolai Hähnle	3f2b2b52d0	radeonsi/gfx10: add gfx10_make_texture_descriptor Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-03 15:51:12 -04:00
Nicolai Hähnle	595a7f7c47	radeonsi/gfx10: add pipe_screen::make_texture_descriptor Texture descriptors in gfx10 are very different. Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-03 15:51:12 -04:00
Nicolai Hähnle	4afce5efdd	radeonsi/gfx10: determine view->is_integer based on the pipe_format It was convenient, but NUM_FORMAT no longer exists in gfx10. Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-03 15:51:12 -04:00
Nicolai Hähnle	3163db3ba4	radeonsi/gfx10: implement si_is_format_supported Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-03 15:51:12 -04:00
Nicolai Hähnle	0ffa2292b3	radeonsi/gfx10: generate gfx10_format_table.h Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-03 15:51:12 -04:00
Nicolai Hähnle	af29ad7cc6	radeonsi/gfx10: set MAX_ALLOC_COUNT The number for Vega was copied from PAL and has no effect because of MIN2. Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-03 15:51:12 -04:00
Nicolai Hähnle	594010e366	radeonsi/gfx10: require LLVM 9 Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-03 15:51:12 -04:00
Boyuan Zhang	de99e0a563	radeon/vcn: update for new vcn enc interface Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com> Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-03 15:51:12 -04:00
Boyuan Zhang	9ab1e427bb	radeonsi: enable jpeg decode for navi10 Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com> Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-03 15:51:12 -04:00
Boyuan Zhang	6480c7b577	radeon/vcn: implement vcn 2.0 jpeg decode Use direct register to implement vcn 2.0 jpeg decode Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com> Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-03 15:51:12 -04:00
Boyuan Zhang	0cd7953ece	radeon/vcn: add direct register bool VCN 2.0 uses direct register space where VCN 1.0 uses some indirect registers Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com> Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-03 15:51:12 -04:00
Boyuan Zhang	7a5c22d32a	radeon/vcn: add defines for vcn 2.0 jpeg Add necessary register defines for vcn 2.0 jpeg decode Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com> Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-03 15:51:12 -04:00
Boyuan Zhang	0c27971157	radeon/vcn: use variable to assign ib cmd Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com> Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-03 15:51:12 -04:00
Boyuan Zhang	587b9c5dae	radeon/vcn: implement vcn 2.0 encode Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com> Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-03 15:51:12 -04:00
Boyuan Zhang	40e1bed389	radeon/vcn: add vcn2.0 encode skeleton Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com> (v2: build fix -- Nicolai) Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-03 15:51:12 -04:00
Boyuan Zhang	8f6272d494	radeon/vcn: move vcn1.0 specific defines to c Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com> Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-03 15:51:12 -04:00
Boyuan Zhang	b5287a9fa6	radeon/vcn: assign function pointer with ib functions Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com> Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-03 15:51:12 -04:00
Boyuan Zhang	9940a6e066	radeon/vcn: add function pointer for ib functions Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com> Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-03 15:51:12 -04:00
Boyuan Zhang	c6b5188505	radeon/vcn: move header related algorithm to vcn_enc Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com> Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-03 15:51:12 -04:00
Boyuan Zhang	dd46740bc2	radeon/vcn: move add buf func to common file Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com> Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-03 15:51:12 -04:00
Boyuan Zhang	e6ca4d1bd8	radeon/vcn: move cs defines to enc header file Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com> Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-03 15:51:12 -04:00
Leo Liu	874881b26b	radeon/vcn: add VP9 support for Navi10 It requires bigger DPB and context buffers Signed-off-by: Leo Liu <leo.liu@amd.com> Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-03 15:51:12 -04:00
Leo Liu	9bbb546c4f	radeonsi: enable encode support for newer HW Previously only Raven was allowed to do so Signed-off-by: Leo Liu <leo.liu@amd.com> Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-03 15:51:12 -04:00
Leo Liu	d6acd29c9a	radeon/vcn: add VCN2 set of internal registers for IB From VCN2.0, the RBC have different views on the registers Signed-off-by: Leo Liu <leo.liu@amd.com> (v2: rebase -- Nicolai) Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-03 15:51:12 -04:00
Leo Liu	a38268ea5b	radeonsi/uvd: allow newer HW to create HW decoder Previously only Raven was allowed to do so Signed-off-by: Leo Liu <leo.liu@amd.com> Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-03 15:51:12 -04:00
Nicolai Hähnle	84e7ee421f	ac/surface/gfx10: allow "rotated" micro mode Standard mode does not support DCC. The R is retconned to "render target" on gfx10. Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-03 15:51:12 -04:00
Nicolai Hähnle	a66be784c3	ac/surface/gfx10: DCC is only supported with SW_64KB_{Z,R}_X modes Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-03 15:51:12 -04:00
Nicolai Hähnle	97ddcfff7c	amd/addrlib/gfx10: forbid DCC for swizzle modes which the hardware does not support Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-03 15:51:12 -04:00
Nicolai Hähnle	9eb4a79345	amd/addrlib/gfx10: fix assertion in Addr2IsValidDisplaySwizzleMode Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-03 15:51:12 -04:00
Nicolai Hähnle	6d416ac7e1	amd/common/gfx10: print gfx10 registers in debug dumps Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-03 15:51:12 -04:00
Nicolai Hähnle	70fd27d1e3	amd/common/gfx10: CMASK is only used for FMASK All regular color compression is done via DCC. Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-03 15:51:12 -04:00
Nicolai Hähnle	b52bf8f12a	amd/common/gfx10: support new tbuffer encoding Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-03 15:51:12 -04:00
Nicolai Hähnle	c067aaa580	amd/common/gfx10: pad shader buffers for instruction prefetch Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-03 15:51:12 -04:00
Nicolai Hähnle	227c29a80d	amd/common/gfx10: implement scan & reduce operations Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-03 15:51:12 -04:00
Nicolai Hähnle	7ba80c1d19	amd/common/gfx10: add GS_ALLOC_REQ message define Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-03 15:51:12 -04:00
Nicolai Hähnle	4c364c89e2	amd/common/gfx10: print out GCR_CNTL as part of {ACQUIRE,RELEASE}_MEM Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-03 15:51:12 -04:00
Nicolai Hähnle	74a26af913	amd/common/gfx10: add register JSON A small number of fields now need new disambiguation. Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-03 15:51:12 -04:00
Nicolai Hähnle	536782b0b7	amd/common: add GFX10 chips Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-03 15:51:12 -04:00
Marek Olšák	677bb80c98	meson: require libdrm_amdgpu 2.4.99 for Navi Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-03 15:51:12 -04:00
Nicolai Hähnle	db7e7a6cb5	radv: gfx10 is not supported Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-03 15:51:12 -04:00
Marek Olšák	78cdf9a99f	amd/addrlib: add gfx10 support Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-03 15:51:12 -04:00
Nicolai Hähnle	112bf7f900	radeonsi: make emit_streamout_output externally accessible Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-03 15:51:12 -04:00
Nicolai Hähnle	e241b405ca	radeonsi: pass the context to query destroy functions We'll need this in the future. Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-03 15:51:12 -04:00
Nicolai Hähnle	064f195ef0	radeonsi: make si_restore_qbo_state externally available Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-03 15:51:12 -04:00
Nicolai Hähnle	04e27ec136	radeonsi: make get_primitive_id externally visible Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-03 15:51:12 -04:00
Nicolai Hähnle	5059a4df8a	radeonsi: make si_llvm_export_vs externally available Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-03 15:51:12 -04:00
Nicolai Hähnle	4a774ba893	radeonsi: various si_translate_*format functions only apply to pre-gfx10 Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-03 15:51:12 -04:00
Marek Olšák	c53e6ea05d	radeonsi: use a fragment shader blit instead of DB->CB copy for ZS CPU mappings This mainly removes and simplifies code that is no longer needed. There were some issues with the DB->CB stencil copy on gfx10, so let's just use a fragment shader blit for all ZS mappings. It's more reliable. Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>	2019-07-03 15:51:12 -04:00
Marek Olšák	6686d8a130	gallium/u_blitter: implement copying from ZS to color and vice versa This is for drivers that can't map depth and stencil and need to blit them to a color texture for CPU access. This is also useful for drivers using separate depth and stencil. Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>	2019-07-03 15:51:12 -04:00
Marek Olšák	13a5e9d685	gallium/util: rewrite depth-stencil blit shaders - merge all 3 functions (Z, S, ZS) - don't write the color output - read the value from texel.x, then write it to position.z or stencil.y (don't use the value from texel.y or texel.z) Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>	2019-07-03 15:51:12 -04:00
Marek Olšák	131d40cfc9	st/mesa: accelerate glCopyPixels(STENCIL) Tested-by: Dieter Nützel	2019-07-03 15:50:04 -04:00
Yevhenii Kolesnikov	65dc4db08e	glsl/standalone: meson test for --dump-builder Added meson test for standalone compiler with --dump-builder option on builtin texture* functions. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107767 Signed-off-by: Yevhenii Kolesnikov <yevhenii.kolesnikov@globallogic.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2019-07-03 12:13:37 -07:00
Sergii Romantsov	9f85b4940c	glsl/standalone: exit on unsupported texture functions glsl/standalone with --dump-builder will exit when unsupported texture functions are encountered. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107767 Signed-off-by: Sergii Romantsov <sergii.romantsov@globallogic.com> Signed-off-by: Yevhenii Kolesnikov <yevhenii.kolesnikov@globallogic.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2019-07-03 12:13:37 -07:00
Pierre-Eric Pelloux-Prayer	ea5b7de138	radeonsi: make gl_SampleMaskIn = 0x1 when MSAA is disabled gl_SampleMaskIn is 1 when R_028BE0_PA_SC_AA_CONFIG is 0, so this commit reworks the conditions controlling this register. Before it was set if the sctx->framebuffer had a sample count > 1. Now we still require this condition, but we also need either: - GL_MULTISAMPLE to be enabled - to be executing an operation that doesn't depend on GL state using u_blitter. This fixes the arb_sample_shading/sample_mask piglit tests on radeonsi. Signed-off-by: Marek Olšák <marek.olsak@amd.com>	2019-07-03 14:59:21 -04:00
Brian Paul	7bb3d6acec	gallium/u_blitter: enable MSAA when blitting to MSAA surfaces If we're doing a Z -> Z MSAA blit (for example) we need to enable msaa rasterization when drawing the quads so that we can properly write the per-sample values. This fixes a number of Piglit ext_framebuffer_multisample blit tests such as ext_framebuffer_multisample/no-color 2 depth combined with the VMware driver. Signed-off-by: Marek Olšák <marek.olsak@amd.com>	2019-07-03 14:59:15 -04:00
Alexandros Frantzis	e5be4351c2	virgl: Clear the valid buffer range when possible If we are discarding the whole resource, we don't care about previous contents, and the resource storage is now unused, either because we have created new resource storage, or because we have waited for the existing resource storage to become unused, or because the transfer is unsynchronized. In the last two cases this commit marks the storage as uninitialized, but only if the resource is not host writable (in which case we can't clear the valid range, since that would result in missed readbacks in future transfers). In the first case, when the whole resource discard involves a reallocation, the reallocation and subsequent rebinding already update the valid buffer range appropriately. Signed-off-by: Alexandros Frantzis <alexandros.frantzis@collabora.com> Reviewed-by: Chia-I Wu <olvaffe@gmail.com>	2019-07-03 09:59:55 -07:00
Jan Zielinski	243db4980c	swr/swr: Enable ARB_viewport_array The rasterizer core supported ARB_viewport_array, but the swr layer connecting core to Gallium state tracker only allowed one viewport. We add support for multiple viewports to swr layer. Reviewed-by: Alok Hota <alok.hota@intel.com>	2019-07-03 14:43:28 +02:00
Bas Nieuwenhuizen	c6cb9b197d	radv: Support VK_EXT_queue_family_foreign. Basically same as external for now. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Only case we might need to handle differently in the near future is Raven's case of displayable DCC which is not renderable. But we don't support that yet.	2019-07-03 10:56:21 +00:00
Bas Nieuwenhuizen	8a053254b8	radv: Fix interactions between variable descriptor count and inline uniform blocks. Fixes: `d7e6541cc7` "radv: Only allocate supplied number of descriptors when variable." Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-07-03 10:43:35 +00:00
Michel Dänzer	11a3679e3a	winsys/amdgpu: Make KMS handles valid for original DRM file descriptor Getting a DMA-buf fd and converting that to a handle using our duplicate of that file descriptor (getting at which requires passing a radeon_winsys pointer to the buffer_get_handle hook) makes sure of this, since duplicated file descriptors reference the same file description and therefore the same GEM handle namespace. This is necessary because libdrm_amdgpu may use a different DRM file descriptor with a separate handle namespace internally, e.g. because it always reuses any existing amdgpu_device_handle for the same device. amdgpu_bo_export returns a handle which is valid for that internal file descriptor. Bugzilla: https://bugs.freedesktop.org/110903 Reviewed-by: Marek Olšák <marek.olsak@amd.com> Tested-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>	2019-07-03 09:19:07 +00:00
Michel Dänzer	cb446dc0fa	winsys/amdgpu: Add amdgpu_screen_winsys It extends pipe_screen / radeon_winsys and references amdgpu_winsys. Multiple amdgpu_screen_winsys instances may reference the same amdgpu_winsys instance, which corresponds to an amdgpu_device_handle. The purpose of amdgpu_screen_winsys is to keep a duplicate of the DRM file descriptor passed to amdgpu_winsys_create, which will be needed in the next change. v2: * Add comment in amdgpu_winsys_unref explaining why it always returns true (Marek Olšák) Reviewed-by: Marek Olšák <marek.olsak@amd.com> Tested-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>	2019-07-03 09:19:07 +00:00
Michel Dänzer	6fce296400	winsys/amdgpu: Use amdgpu_winsys helper instead of open-coded casts Cleanup to prevent breakage with the next change, no functional change intended in this one. Reviewed-by: Marek Olšák <marek.olsak@amd.com> Tested-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>	2019-07-03 09:19:07 +00:00
Juan A. Suarez Romero	e06bc0b166	intel: fix wrong format usage Do not use the view format when filling the surface state. Fixes dEQP-VK.image.texel_view_compatible.compute.extended.texture.* Fixes: `fb1350c76f` ("intel: Add and use helpers for level0 extent") Reviewed-by: Nanley Chery <nanley.g.chery@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-07-03 10:14:54 +02:00
Samuel Pitoiset	a7b6a869a7	radv: only allocate a 32-bit value for the TC-compat range metadata Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-03 08:52:01 +02:00
Samuel Pitoiset	6baa453dd5	radv: remove unused code in radv_update_tc_compat_zrange_metadata() Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-03 08:51:58 +02:00
Samuel Pitoiset	a21f23c811	radv: add radv_get_depth_pipeline() helper Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-03 08:51:42 +02:00
Mike Blumenkrantz	e005470466	iris: assert isl_surf_init success in resource_from_handle this can fail unexpectedly due to bugs, so it's good to provide feedback when this occurs Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2019-07-02 15:39:44 -07:00
Jason Ekstrand	e708261cb7	anv: Advertise a more accurate minTexelBufferOffsetAlignment Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-07-02 22:28:44 +00:00
Jason Ekstrand	0bc657f2db	anv: Implement VK_EXT_texel_buffer_alignment Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-07-02 22:28:44 +00:00
Jason Ekstrand	465ec0b145	vulkan: Update the XML and headers to 1.1.113 Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-07-02 22:28:44 +00:00
Caio Marcelo de Oliveira Filho	050eb6389a	spirv: Ignore ArrayStride in OpPtrAccessChain for Workgroup From OpPtrAccessChain description in the SPIR-V spec (1.4 rev 1): For objects in the Uniform, StorageBuffer, or PushConstant storage classes, the element’s address or location is calculated using a stride, which will be the Base-type’s Array Stride when the Base type is decorated with ArrayStride. For all other objects, the implementation will calculate the element’s address or location. For non-CL shaders the driver should layout the Workgroup storage class, so override any explicitly set ArrayStride in the shader. This currently fixes only the lower_workgroup_access_to_offsets case, which is used by anv. Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>	2019-07-02 12:15:01 -07:00
Karol Herbst	95a7fd0f10	nouveau: handle new CAPS Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>	2019-07-02 20:09:44 +02:00
Jason Ekstrand	fa869f45c8	intel/fs: Use nir_lower_interpolation on gen11+ On gen11, they removed the PLN instruction so we have to emit a pile of MAD to emulate it. We may as well do that in NIR so we can optimize and later schedule it. Shader-db results on Ice Lake: total instructions in shared programs: 17145644 -> 16556440 (-3.44%) instructions in affected programs: 11507454 -> 10918250 (-5.12%) helped: 35763 HURT: 42085 helped stats (abs) min: 1 max: 140 x̄: 19.09 x̃: 18 helped stats (rel) min: 0.04% max: 37.93% x̄: 15.40% x̃: 14.49% HURT stats (abs) min: 1 max: 248 x̄: 2.22 x̃: 2 HURT stats (rel) min: 0.05% max: 50.00% x̄: 5.00% x̃: 2.47% 95% mean confidence interval for instructions value: -7.67 -7.47 95% mean confidence interval for instructions %-change: -4.46% -4.29% Instructions are helped. total loops in shared programs: 4370 -> 4370 (0.00%) loops in affected programs: 0 -> 0 helped: 0 HURT: 0 total cycles in shared programs: 360624645 -> 368220857 (2.11%) cycles in affected programs: 269631244 -> 277227456 (2.82%) helped: 15583 HURT: 65874 helped stats (abs) min: 1 max: 28561 x̄: 78.45 x̃: 32 helped stats (rel) min: <.01% max: 67.81% x̄: 5.38% x̃: 2.44% HURT stats (abs) min: 1 max: 238638 x̄: 133.87 x̃: 20 HURT stats (rel) min: <.01% max: 306.25% x̄: 5.81% x̃: 3.97% 95% mean confidence interval for cycles value: 67.42 119.09 95% mean confidence interval for cycles %-change: 3.61% 3.73% Cycles are HURT. total spills in shared programs: 8943 -> 8981 (0.42%) spills in affected programs: 1925 -> 1963 (1.97%) helped: 44 HURT: 14 total fills in shared programs: 21815 -> 21925 (0.50%) fills in affected programs: 3511 -> 3621 (3.13%) helped: 41 HURT: 18 LOST: 70 GAINED: 14 Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-07-02 16:15:25 +00:00
Jason Ekstrand	2b79a9e5a5	intel/fs: Implement nir_intrinsic_load_fs_input_interp_deltas Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-07-02 16:15:25 +00:00
Jason Ekstrand	8e7d066682	intel/fs: Actually implement the load_barycentric intrinsics If they never get used, dead code should clean them up. Also, we rework the at_offset and at_sample intrinsics so they return a proper vec2 instead of returning things in PLN layout. Fortunately, copy-prop is pretty good at cleaning this up and it doesn't result in any actual extra MOVs. Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-07-02 16:15:25 +00:00
Rob Clark	5787a2dfe3	nir: add pass to lower load_interpolated_input Signed-off-by: Rob Clark <robdclark@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-07-02 16:15:25 +00:00
Boris Brezillon	5375d009be	panfrost: Pass referenced BOs to the SUBMIT ioctls Instead of manually adding the BOs from the various SLAB pools plus the one backing the color FB, we insert them in the BO set attached to the job and let panfrost_drm_submit_job() pass all BOs from this set to the SUBMIT ioctl. This means we are now passing all referenced BOs and let the scheduler wait on referenced BO fences if needed. Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>	2019-07-02 15:00:21 +02:00
Boris Brezillon	3557746e0d	panfrost: Make SLAB pool creation rely on BO helpers There's no point duplicating the code, and it will help us simplify the bo_handles[] filling logic in panfrost_drm_submit_job(). Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>	2019-07-02 14:59:28 +02:00
Boris Brezillon	c684a79669	panfrost: Add the panfrost_drm_{create,release}_bo() helpers To avoid the panfrost_memory <-> panfrost_bo dance done in panfrost_resource_create_bo() and panfrost_bo_unreference(). Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>	2019-07-02 14:58:51 +02:00
Boris Brezillon	948fddfc42	panfrost: Move the mmap BO logic out of panfrost_drm_import_bo() So we can re-use it for the panfrost_drm_create_bo() function we are about to introduce. Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>	2019-07-02 14:58:51 +02:00
Boris Brezillon	8d4afcdacc	panfrost: Avoid passing winsys handles to import/export BO funcs Let's keep a clear split between ioctl wrappers and the rest of the driver. All the import BO function needs is a dmabuf FD and the screen object, and the export one should only take care of generating a dmabuf FD out of a BO object. Winsys handle manipulation should stay in the resource.c file. Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>	2019-07-02 14:58:51 +02:00
Boris Brezillon	aa5bc35f31	panfrost: Move BO meta-data out of panfrost_bo That's what most (all?) implementations seem to do, and my understanding is that a BO is just a bunch of memory that can be used for anything GPU related, not only texture/FB resources. Let's move that metadata into panfrost_resource so we can use panfrost_bo for all kinds of memory allocation and make BO allocation more consistent. Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>	2019-07-02 14:58:51 +02:00
Boris Brezillon	c4f4193ad4	panfrost: Stop exposing internal panfrost_drm_*() functions panfrost_drm_submit_job() and panfrost_fence_create() are not used outside of pan_drm.c. Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>	2019-07-02 14:58:51 +02:00
Boris Brezillon	6ba61324f0	panfrost: Get rid of the "free imported BO" logic bo->imported was never set to true which means this path was never taken. Moreover, panfrost_drm_free_imported_bo() is missing the munmap() call, which seems wrong because the import BO function calls mmap(). Let's just kill this function along with the ->imported field. Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>	2019-07-02 14:57:35 +02:00
Boris Brezillon	079aaa9c6d	panfrost: Get rid of the panfrost_driver abstraction leftovers Commit `5f81669d88` ("panfrost: Remove the panfrost_driver abstraction") left a few things behind, remove them now. Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>	2019-07-02 14:57:35 +02:00
Boris Brezillon	6608642d21	panfrost: Move scanout res creation out of panfrost_resource_create() Which improves readability and helps us avoid a memory leak. Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>	2019-07-02 14:57:35 +02:00
Boris Brezillon	873b7b93e8	panfrost: Add the sampled texture BO to the job Otherwise we get random use-after-{free,unmap} errors. Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com> --- Changes in v2: - Move the panfrost_job_add_bo() call out of the loop	2019-07-02 14:57:35 +02:00
Samuel Pitoiset	6cc213b3c1	radv: enable DCC for layers on GFX8 It's currently only enabled if dcc_slice_size is equal to dcc_slice_fast_clear_size because the driver assumes that portions of multiple layers are contiguous but it's not always true. Still not supported on GFX9. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-02 09:38:02 +02:00
Samuel Pitoiset	233224c7f7	radv: do not enable DCC for mipmapped arrays because performance is worse Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-02 09:38:00 +02:00
Samuel Pitoiset	e41e575e24	radv: implement clearing DCC layers on GFX8 Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-02 09:37:56 +02:00
Samuel Pitoiset	e47c68b7b0	radv: merge radv_dcc_clear_level() into radv_clear_dcc() This will help for clearing DCC arrays because we need to know the subresource range. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-02 09:37:51 +02:00
Samuel Pitoiset	f772fe6a11	radv: add support for decompressing DCC layers with compute Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-02 09:37:49 +02:00
Samuel Pitoiset	83297baf2d	ac: compute the DCC fast clear size per slice on GFX8 Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-02 09:37:44 +02:00
Samuel Pitoiset	6517d226ac	ac: compute the size of one DCC slice on GFX8 Addrlib doesn't provide this info. Because DCC is linear, at least on GFX8, it's easy to compute the size of one slice. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-02 09:37:41 +02:00
Kenneth Graunke	457a55716e	iris: Defer closing and freeing VMA until buffers are idle. There will unfortunately be circumstances where we cannot re-use a virtual memory address until it's no longer active on the GPU. To facilitate this, we instead move BOs to a "dead" list, and defer closing them and returning their VMA until they are idle. We periodically sweep these away in cleanup_bo_cache, which triggers every time a new object's refcount hits zero. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Tested-by: Jordan Justen <jordan.l.justen@intel.com>	2019-07-02 07:23:55 +00:00
Kenneth Graunke	07f3455664	iris: Add an explicit alignment parameter to iris_bo_alloc_tiled(). In the future, some images will need to be aligned to a larger value than 4096. Most buffers, however, don't have any such requirement, so for now we only add the parameter to iris_bo_alloc_tiled() and leave the others with the simpler interface. v2: Fix missing alignment in vma_alloc, caught by Caio! Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Tested-by: Jordan Justen <jordan.l.justen@intel.com>	2019-07-02 07:23:55 +00:00
Iago Toral Quiroga	042aeffd5b	v3d: do not flush jobs that are synced with 'Wait for transform feedback' Generally, we achieve this by skipping the flush on calls to v3d_flush_jobs_writing_resource() when we detect that the resource is written in the current job from a transform feedback write. The exception to this is the case where the caller is about to map the resource, in which case we need to flush immediately since we can only emit 'Wait for transform feedback' commands on rendering jobs. We add a parameter to the function so the caller can identify that scenario. Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-02 08:57:20 +02:00
Iago Toral Quiroga	88cbc4f7f6	v3d: emit 'Wait for transform feedback' commands when needed The hardware can flush transform feedback writes before reads in the same job by inserting this command. This patch detects when the rendering state for the current draw call reads resources that had been previously written by transform feedback in the same job and inserts the 'Wait for transform feedback' command before emitting the new draw. v2 (Eric): - this was intended to look at job->tf_write_prscs for TF jobs. - clear job->tf_write_prscs after we emit the TF flush. - can skip flushes for fragment shader reads from TF. v3 (Eric): - all resources in job->tf_write_prscs are resources written by TF so we don't need to check if they are bound to PIPE_BIND_STREAM_OUTPUT. - documented optimization opportunity for geometry stages. Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-02 08:57:20 +02:00
Iago Toral Quiroga	c7dff0e614	v3d: keep track of resources written by transform feedback The hardware provides a feature to sync reads from previous transform feedback writes in the same job so if we use this mechanism we no longer have to flush the job. In order to identify this scenario we need a mechanism to identify resources that are written by transform feedback. v2: use _mesa_pointer_set_create (Eric) Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-02 08:57:20 +02:00
Mike Blumenkrantz	c8dcc308cc	st/dri: fix typo in format table for GR1616 format the dri image format here should match the fourcc format Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-07-01 15:17:10 -07:00
Mike Blumenkrantz	08fc14a979	st/dri: pass dri2_format_mapping directly to dri2_create_image_from_winsys this makes the entire struct available for use here Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-07-01 15:16:56 -07:00
Mike Blumenkrantz	2cc85670a7	mesa/st: simplify format usage in st_bind_egl_image the formats handled in the switch statement will always return an unknown mesa format, so process them directly and leave the default case for other/unknown formats no functional changes Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-07-01 15:16:43 -07:00
Kenneth Graunke	9b1b971491	iris: Use MI_COPY_MEM_MEM for tiny resource_copy_region calls. If our resource_copy_region size is a small number of DWords, then instead of firing up BLORP, we can simply use MI_COPY_MEM_MEM (after a CS stall). We also try and select the optimal batch. Improves performance in Shadow of Mordor on Low settings at 1920x1080 on Skylake GT4e by 0.689096% +/- 0.473968% (n=4). It tries to copy 4 bytes of data to a buffer which was most recently used as a writable compute shader SSBO. Previously we were switching from compute to the render pipeline, then firing up all of blorp_buffer_copy...for 4 bytes. I arbitrarily decided to support 4/8/12/16 bytes. Jason thinks this is about the right threshold where it's cheaper to use MI_COPY_MEM_MEM.	2019-07-01 13:59:49 -07:00
Bas Nieuwenhuizen	d7e6541cc7	radv: Only allocate supplied number of descriptors when variable. Fixes: `b5e04e9217` "radv: Support allocating variable size descriptor sets." Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111019 Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-07-01 20:53:33 +02:00
Eric Engestrom	177c35bf13	egl: simplify loop Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>	2019-07-01 19:35:22 +01:00
Eric Anholt	67ffb853f0	sparc: Reuse m_vector_asm.h. Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-07-01 11:14:29 -07:00
Eric Anholt	20294dceeb	mesa: Enable asm unconditionally, now that gen_matypes is gone. Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-07-01 11:14:10 -07:00
Eric Anholt	52a39a332f	mesa: Replace gen_matypes with a simple header for V4F/mat layout. We can greatly simplify our builds by just hardcoding GLvector4f and GLmatrix's layouts. Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-07-01 11:12:15 -07:00
Eric Anholt	1738b38ce8	matypes: Drop some unused defines. Most of these haven't been used since the conversion from checked-in matypes to generation. By cutting down the generated contents, this should clarify why the file is generated: we need architecture-specific offsets to the V4F fields in the asm that uses it. v2: Keep matrix offsets to prevent x86 build breakage. Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-07-01 11:09:26 -07:00
Eric Engestrom	1835f30097	meson: drop duplicate source & inc_dir These two are already pulled from `idep_vulkan_util_headers`. Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2019-07-01 18:53:57 +01:00
Eric Engestrom	04e0ac59b1	swrast: simplify function pointer calls Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>	2019-07-01 18:51:49 +01:00
Eric Engestrom	fbf7c38da3	egl/wayland: use bitset.h for `formats` bit set Currently only 7 formats are supported, but we don't want the 16 limit (it's an `unsigned`) to hit us by surprise :] Let's use bitset.h's BITSET magic to allow us to have any number of formats, with a static assert to make sure we don't forget to update it. Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-07-01 18:35:54 +01:00
Sagar Ghuge	d5f63990b4	intel/tools: Add assembler unit tests for ROL/ROR instructions Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-07-01 10:14:22 -07:00
Sagar Ghuge	e9c35dd7cc	intel/tools: Add ROL/ROR support in assembler Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-07-01 10:14:22 -07:00
Sagar Ghuge	456557a837	nir: Add lower_rotate flag and set to true in all drivers Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Suggested-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-07-01 10:14:22 -07:00
Sagar Ghuge	1e92e83856	intel/compiler: Emit ROR and ROL instruction v2: Reorder patch (Matt Turner) Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-07-01 10:14:22 -07:00
Sagar Ghuge	80117117bd	nir: Add optimization to use ROR/ROL instructions v2: 1) Add more optimization rules for ROL/ROR (Matt Turner) 2) Add lowering rules for ROL/ROR (Matt Turner) Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-07-01 10:14:22 -07:00
Sagar Ghuge	81d342e2a1	nir: Add urol and uror opcodes Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-07-01 10:14:22 -07:00
Sagar Ghuge	83fdec0f0d	intel/compiler: Enable the emission of ROR/ROL instructions v2: 1) Drop changes for vec4 backend as on Gen11+ we don't support align16 mode (Matt Turner) Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-07-01 10:14:22 -07:00
Alyssa Rosenzweig	8d74749f81	panfrost: Implement instanced rendering We implement GLES3.0 instanced rendering with full support for instanced arrays (via instance divisors). To do so, we use the new invocation helpers to invoke a triplet of (1, vertex_count, instance_count), rather than simply (1, vertex_count, 1). We rewrite the attribute handling code into a new pan_instancing.c file which handles both the simple LINEAR case for non-instanced as well as each of the new instancing cases: MODULO (for per-vertex attributes), POT and NPOT divisors. As a side effect, we rework how vertex buffers are handled, duplicating them to be 1:1 with vertex descriptors to simplify instancing code paths dramatically. This might be a performance regression, but this remains to be seen; if so, we can always deduplicate later with some added logic in pan_instancing.c Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-01 07:50:57 -07:00
Alyssa Rosenzweig	e9e22546ff	panfrost/decode: Compute padded_num_vertices for MODULO Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-01 07:49:18 -07:00
Alyssa Rosenzweig	9b97ed1250	panfrost/midgard: Emit type appropriate ld_vary Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-01 07:42:56 -07:00
Alyssa Rosenzweig	aa333ac6ad	panfrost/midgard: Add unsigned ld/st ops Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-01 07:42:55 -07:00
Alyssa Rosenzweig	bbc050b82e	panfrost/midgard: Use the appropriate ld_attr type Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-01 07:42:55 -07:00
Alyssa Rosenzweig	c9b164f9b5	panfrost: Implement dispatch helpers Rather than open-coding workgroups_shift_* type fields, we include a general routine for packing the vertex/tiler/compute descriptor based on the provided dispatch parameters. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-01 07:42:55 -07:00
Alyssa Rosenzweig	8fd748de3d	panfrost: Remove ancient comment Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-01 07:42:55 -07:00
Alyssa Rosenzweig	9fe4fd8a9c	panfrost: Extend software tiling to larger bpp Should not affect lima. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-07-01 07:40:19 -07:00
Alyssa Rosenzweig	f2801f7775	panfrost: Rewrite u-interleaving code Rather than using a magic lookup table with no explanations, let's add liberal comments to the code to explain what this tiling scheme is and how to encode/decode it efficiently. It's not so mysterious after all -- just reordering bits with some XORs thrown in. v2: Correct copyright identifier. Fix spelling error. Switch space_4 to a LUT. Fix comment typo. Use LUT instead of space_x tricks. Fallback on generic rather than split up unaligned writes. v3: Correct stride order (fixes crash loading). Correct coordinate system mishap. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com> Tested-by: Andreas Baierl <ichgeh@imkreisrum.de>	2019-07-01 07:39:51 -07:00
Rob Clark	02893fe73a	freedreno: update generated registers Corrects the a3xx texconst state for TILE_MODE. Signed-off-by: Rob Clark <robdclark@chromium.org>	2019-07-01 06:15:52 -07:00
Samuel Pitoiset	d8b079e4c7	radv: rework how the number of VGPRs is computed Just a cleanup, it shouldn't change anything. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-01 14:59:27 +02:00
Samuel Pitoiset	e3baa54195	radv: gather if a vertex shader needs the instance ID Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-01 14:59:24 +02:00
Samuel Pitoiset	17cb7ea6fc	radv: fix decompressing DCC levels with compute Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-01 14:59:22 +02:00
Samuel Pitoiset	f4d2c47cf6	radv: the number of VGPR_COMP_CNT for GS is expected to be 0 on GFX8 Just move around the switch case. GFX9+ is handled below. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-01 14:59:19 +02:00
Samuel Pitoiset	b4477fa4d4	radv: reduce number of VGPRs for TESS_EVAL if primitive ID is not used We only need 2. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-01 14:58:56 +02:00
Samuel Pitoiset	cc50c85e13	radv: make sure to mark the image as compressed when clearing DCC levels Found while working on DCC for arrays. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-07-01 14:58:56 +02:00
Michel Dänzer	3fd21a6b77	targets/opencl: Add clangASTMatchers library as dependency Fixes link failure since clang r364424 "[clang/DIVar] Emit the flag for params that have unmodified value", clangCodeGen depends on clangASTMatchers now. Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2019-07-01 12:54:40 +02:00
Caio Marcelo de Oliveira Filho	5ad283550b	glsl/nir: Lower buffers using Binding instead of Names When using ARB_gl_spirv, the block names are optional and the uniform blocks are referred using Bindings instead. Teach gl_nir_lower_buffers to handle those. Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-06-30 16:58:27 -05:00
Alejandro Piñeiro	2af2235a32	glspirv: Enable the new deref-base UBO/SSBO path on gl_spirv Among other things, it supports arrays of arrays of UBO/SSBO (default codepath doesn't). Acked-by: Timothy Arceri <tarceri@itsqueeze.com> v2: nir_address_format_vk_index_offset got renamed to nir_address_format_32bit_index_offset (after rebase against master) v3: the ptr_type fields in spirv_to_nir_options got changed to be of type nir_address_format. v4: remove phys_ssbo_addr_format and push_const_addr_format as they are not used by glspirv Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-06-30 16:58:27 -05:00
Alejandro Piñeiro	cae501b394	i965: call to gl_nir_link_uniform_blocks When using a SPIR-V shader. Note that this needs to be done before linking uniforms, so that when creating the uniform storage entries, block_index can be filled properly (among other things). Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-06-30 16:58:27 -05:00
Alejandro Piñeiro	678140e195	i965: use GLboolean for all brw_link_shader returns The function had a mix of true/GL_TRUE and false/GL_FALSE returns. Using GL_TRUE/GL_FALSE as the function returns a GLboolean. Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-06-30 16:58:27 -05:00
Alejandro Piñeiro	a69a48d65a	nir/linker: update already processed uniforms search for UBOs/SSBOs Until now, we were using the uniform explicit location to check if the current nir variable was already processed while adding entries on the uniform storage. But for UBOs/SSBOs, entries are added too but we lack an explicit location. For those we need to rely on the UBO/SSBO binding and the uniform storage block_index. In that case several uniforms would need to be updated at once. v2: (from Timothy review) * Improve wording and fix typos of some long comments. * Rename update_uniform_storage for mark_stage_as_active v3: (from cmarcelo review) * Fixed some comment typos Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-06-30 16:58:27 -05:00
Alejandro Piñeiro	de05a6ccf5	nir/linker: fill up uniform_storage with explicit data Specifically, offset, stride (coming from arrays or matrices) and row_major. On GLSL, most of that info is computed using the layout qualifier, but on ARB_gl_spirv they are explicit, and for Mesa, included on the glsl_type. From ARB_gl_spirv spec: "Mapping of layouts std140/std430 -> explicit Offset, ArrayStride, and MatrixStride Decoration on struct members"" "7.6.2.spv SPIR-V Uniform Offsets and Strides The SPIR-V decorations GLSLShared or GLSLPacked must not be used. A variable in the Uniform Storage Class decorated as a Block must be explicitly laid out using the Offset, ArrayStride, and MatrixStride decorations" For offset we needed to include the parent and index_in_parent while processing the type, as the offset is maintained on glsl_struct_field of the parent type, not on the type itself. v2: Fix the default values for MATRIX_STRIDE, ARRAY_STRIDE and ROW_MAJOR when the variable is not backed by a buffer object (Antia Puentes). v3: Update after Jason series "SPIR-V: Use NIR deref instructions for UBO/SSBO access" that included just one explicit stride, instead of a previous patch we wrote that had matrix_stride and array_stride (Alejandro) Signed-off-by: Antia Puentes <apuentes@igalia.com> Signed-off-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-06-30 16:58:27 -05:00
Alejandro Piñeiro	eb50d1d2a6	nir/linker: use only the array element type for array of ssbo/ubo For these interfaces, the inner members are added only once as uniforms or resources, as opposed to other cases, like a uniform array of structs. For those wondering why an issue (16) from ARB_program_interface_query was used, instead of a quote of the core spec: The core spec is not really clear about how members of arrays of blocks should be enumerated. On GLSL this was also problematic, especially when we were trying to pass the 4.5 CTS tests. See commit "glsl: Fix program interface queries relating to interface blocks" (`4c4d9e4f03`), as a reference. That one also needed to rely on issue (16) to justify the change, pointing out that the core spec needs to be clarified. Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-06-30 16:58:26 -05:00
Alejandro Piñeiro	eec1d5f801	nir/linker: fill is_shader_storage for uniforms Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-06-30 16:58:26 -05:00
Alejandro Piñeiro	5723919282	nir/linker: add gl_nir_link_uniform_blocks.c Adding the ability to link uniform blocks and shader storage blocks using NIR, intended for ARB_gl_spirv support. Among other things, this linking needs to take into account that everything should work without names, as they might not be present, while the GLSL IR uniform block linking was written with the names at its core. The other major difference compared with the GLSL IR linker is that we don't deal with layouts. There are no references to std140, std430, etc. Layouts are expressed through explicit offset, array stride and matrix stride. That simplifies how the buffer sizes are computed. But it also means that we couldn't use the existing methods at glsl_types, so we needed to implement new methods. It is worth noting that this linking does an iteration over the glsl_types, similarly to what the uniform linking does. A possible future improvement would be to refactor both cases to try to share more code than they share right now. On GLSL IR there is a class visitor, specialized on each case, for that sharing. As adding a class visitor in C would be more complicated, for now we are just iterating on both. Signed-off-by: Alejandro Piñeiro <apinheiro@igalia.com> Signed-off-by: Neil Roberts <nroberts@igalia.com> Signed-off-by: Antia Puentes <apuentes@igalia.com> v2: (from Timothy review) * Fix variable name convention * Stop using the _function_name convention * Don't use // for comments * "nir/linker: Keep track of the stages referencing an UBO/SSBO" squashed with this patch v3: (from Caio review) * Don't delete the linked shader on failure * Use rzalloc_array to avoid some explicit initializations Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-06-30 16:58:26 -05:00
Alejandro Piñeiro	39f4ef57d6	nir_types: add glsl_type_is_leaf helper Helper used to know when a glsl_type is a leaf when iterating through a complex type. Note that GLSL IR linking also uses the concept of leaf while doing the same iteration, although in that case it uses a visitor. See link_uniform_blocks, process_array_leaf and others as reference. Signed-off-by: Alejandro Piñeiro <apinheiro@igalia.com> Signed-off-by: Antia Puentes <apuentes@igalia.com> v2: * Moved from gl_nir_linker to nir_types, so it could be used on nir xfb gathering (Timothy Arceri) * Minor update after Timothy's series about record to struct renaming landed in master. Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-06-30 16:58:26 -05:00
Alejandro Piñeiro	0019d61527	glsl/nir: add glsl_types::explicit_size plus nir C wrapper While using SPIR-V shaders (ARB_gl_spirv), layout data is not implicit to a specific value (std140, std430, etc) but explicitly included on the type (explicit values for offset, stride and row_major). So this method is equivalent to the existing std140_size and std430_size, but using such explicit values. Note that the value returned by this method is only valid if such data is set, so when dealing with SPIR-V shaders. v2: (all changes suggested by Jason Ekstrand) * Iterate through all struct members, instead of assuming that fields are ordered by offset * Use else if * Take into account the case that explicit_stride > elem_size, to fine-tune the final size on arrays and matrices * Handle different bit-sizes in general, not just 32 and 64. v3: (change suggested by Caio Marcelo de Oliveira Filho) * fix up explicit_size() to consider interface types Signed-off-by: Alejandro Piñeiro <apinheiro@igalia.com> Signed-off-by: Antia Puentes <apuentes@igalia.com> Signed-off-by: Neil Roberts <nroberts@igalia.com> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-06-30 16:58:26 -05:00
Alejandro Piñeiro	c23522add2	glsl_types: add type::bit_size and glsl_base_type_bit_size helpers Note that the nir_types glsl_get_bit_size is not a wrapper of this one, because for bools at the nir level, we want to return size 1, but at the glsl_types we want to return 32. v2: reuse the new method in order to simplify is_16bit and is_32bit helpers (Timothy) v3: add a comment clarifying the difference between glsl_base_type_bit_size and glsl_get_bit_size. Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-06-30 16:58:26 -05:00
Alejandro Piñeiro	12355c7e91	nir: add is_in_ubo/ssbo/block helpers Equivalent to the already existing ir_variable is_in_buffer_block and is_in_shader_storage_block, adding the uniform buffer object one. I'm using the short forms (ssbo, ubo) to avoid having method names too long. Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-06-30 16:58:26 -05:00
Alejandro Piñeiro	15f134412f	spirv/nir: fill up nir variable info for ubos and ssbo The data for some nir variables is only filled up for some specific modes. We now need it for UBO/SSBO too, as such info would be used when linking for OpenGL (ARB_gl_spirv). There is an existing comment just before that code (starts with XXX) that points out that binding still needs to be filled up for uniform variables at that point, and that should be fixed, although it doesn't specify why that's a problem or what the alternative would be. For now doing the same for UBO/SSBO, and hoping that the future fix is done for all of them. Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-06-30 16:58:26 -05:00
Alejandro Piñeiro	7d7ab34d5f	spirv/nir: create nir variable for UBO/SSBO Providing nir variables for UBO/SSBO is not required for Vulkan, but it is needed for OpenGL (ARB_gl_spirv), for example to gather info from the UBO/SSBO while linking. Unlike most cases where nir variables are created, here the type assigned is the full type (not just the bare type). This is needed because while linking using the nir shader we need the explicit layout info (explicit stride, explicit offset, row_major, etc). Also, we need to assign an interface type, used also on the OpenGL linker if it is a UBO/SSBO. See ir_variable::is_in_buffer_block as example. v2: assign interface_type to be the variable type, not need to be arrayness (Timothy) Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-06-30 16:58:26 -05:00
Gert Wollny	75d8b4e795	vl: Use CS composite shader only if TEX_LZ and DIV are supported Enable the compute shader compositor only when TEX_LZ is supported by the driver. v2: Also check whether DIV is supported. https://bugs.freedesktop.org/show_bug.cgi?id=110783 Fixes: `9364d66cb7` gallium/auxiliary/vl: Add video compositor compute shader render Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-06-30 18:41:38 +02:00
Gert Wollny	843723e2f7	gallium: Add CAP for opcode DIV Not all drivers support TGSI_OPCODE_DIV, so we should have a cap to be able to check this. Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-06-30 18:41:35 +02:00
Gert Wollny	187c308b96	vl: replace DIV-ADD with MAD using inverse size Optimize the shader a bit by emitting MAD with the inverse size values instead of DIV+ADD. Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-06-30 18:41:26 +02:00
Jonathan Marek	89381191a9	etnaviv: blt: blit with the original format when possible This fixes BGR565 blit: currently BGRA444 is used for the blit, but with swizzles from the original BGR565 format, so the 4 alpha bits are set to 1. We can't just use the swizzle from the 'compatible' format, since there are cases where BGR<->RGB swap needs to happen. We can avoid all this trouble by using the original formats and only falling back to the 'compatible' format when we need to. Signed-off-by: Jonathan Marek <jonathan@marek.ca> Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>	2019-06-29 21:49:50 -04:00
Jonathan Marek	a99a265b14	etnaviv: clear all bits for 24bpp depth without stencil For fast clear to happen, all bits must be cleared. This allows using fast clear for 24bpp depth without stencil. Signed-off-by: Jonathan Marek <jonathan@marek.ca> Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>	2019-06-29 21:49:50 -04:00
Eric Engestrom	74f064ae90	mesa: use binary search for MESA_EXTENSION_OVERRIDE Not a hot path obviously, but the table still has 425 extensions, which you can go through in just 9 steps with a binary search. The table is already sorted, as required by other parts of the code and enforced by mesa's `main-test`. Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2019-06-30 01:45:36 +01:00
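The "9 steps for 425 extensions" figure in the commit above can be checked with a small sketch. This is not the Mesa code: the helper `find_extension` and the generated extension names are hypothetical, and the only assumption carried over from the commit message is that the table is sorted.

```python
def find_extension(sorted_names, wanted):
    """Binary search over a sorted list; returns (index, steps), index -1 if absent."""
    lo, hi = 0, len(sorted_names)
    steps = 0
    while lo < hi:
        mid = (lo + hi) // 2
        steps += 1  # one probe of the table per iteration
        if sorted_names[mid] == wanted:
            return mid, steps
        if sorted_names[mid] < wanted:
            lo = mid + 1
        else:
            hi = mid
    return -1, steps


# Hypothetical stand-in for the real extension table: 425 sorted names.
names = sorted(f"GL_EXAMPLE_extension_{i:03d}" for i in range(425))
worst_case = max(find_extension(names, name)[1] for name in names)
```

Each probe at least halves the remaining range, so a table of 425 entries (fewer than 2^9 = 512) never needs more than 9 probes, whether or not the name is present.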
Eric Engestrom	b738d4494c	gitlab-ci: test meson installation Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-06-29 21:46:37 +00:00
Eric Engestrom	5f9764bc0b	anv: fix indentation Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-06-29 22:41:06 +01:00
Eric Engestrom	42eb85a9d8	anv: fix typo Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-06-29 22:41:06 +01:00
Eric Engestrom	38305e6c94	anv: replace hard-coded platform list with vk.xml parse Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-06-29 22:38:54 +01:00
Chih-Wei Huang	bb75c73e96	android: fix typo LOCAL_EXPORT_C_INCLUDES Should be LOCAL_EXPORT_C_INCLUDE_DIRS. Signed-off-by: Chih-Wei Huang <cwhuang@linux.org.tw> Tested-by: Mauro Rossi <issor.oruam@gmail.com>	2019-06-29 17:17:49 +02:00
Mauro Rossi	c237654dca	android: virgl: fix generated virgl_driinfo.h building rules Changelog in Android makefile: - Add LOCAL_MODULE_CLASS, intermediates and LOCAL_GENERATED_SOURCES - Use LOCAL_EXPORT_C_INCLUDE_DIRS to export $(intermediates) path - Move generated header rules before 'include $(BUILD_STATIC_LIBRARY)' Fixes the following building error: In file included from external/mesa/src/gallium/targets/dri/target.c:1: external/mesa/src/gallium/auxiliary/target-helpers/drm_helper.h:257:16: fatal error: 'virgl/virgl_driinfo.h' file not found #include "virgl/virgl_driinfo.h" ^~~~~~~~~~~~~~~~~~~~~~~ 1 error generated. Fixes: `cf800998a` ("virgl: Add driinfo file and tie it into the build") Signed-off-by: Mauro Rossi <issor.oruam@gmail.com> Review-by: Chih-Wei Huang <cwhuang@linux.org.tw>	2019-06-29 16:25:01 +02:00
Lionel Landwerlin	5847de6e9a	intel/compiler: don't use byte operands for src1 on ICL The simulator complains about using byte operands, we also have documentation telling us. Note that add operations on bytes seems to work fine on HW (like ADD). Using dwords operands with CMP & SEL fixes the following tests : dEQP-VK.spirv_assembly.type.vec.i8. v2: Drop the GLK changes (Matt) Add validator tests (Matt) v3: Drop GLK ref (Matt) Don't mix float/integer in MAD (Matt) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com> (v1) Reviewed-by: Matt Turner <mattst88@gmail.com> BSpec: 3017 Cc: <mesa-stable@lists.freedesktop.org>	2019-06-29 12:56:09 +00:00
renchenglei	500b45a98a	egl: Enable eglGetPlatformDisplay on Android Platform This helps to add eglGetPlatformDisplay support on Android Platform. Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-06-29 12:20:17 +01:00
Ian Romanick	02c6cd8481	nir/search: Increase maximum commutative expressions from 4 to 8 No shader-db change on any Intel platform. No shader-db run-time difference on a certain 36-core / 72-thread system at 95% confidence (n=20). Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2019-06-28 18:56:19 -07:00
Ian Romanick	1a43cf9a40	nir/algebraic: Don't mark expression with duplicate sources as commutative There is no reason to mark the fmul in the expression ('fmul', ('fadd', a, b), ('fadd', a, b)) as commutative. If a source of an instruction doesn't match one of the ('fadd', a, b) patterns, it won't match the other either. This change is enough to make this pattern work: ('~fadd@32', ('fmul', ('fadd', 1.0, ('fneg', a)), ('fadd', 1.0, ('fneg', a))), ('fmul', ('flrp', a, 1.0, a), b)) This pattern has 5 commutative expressions (versus a limit of 4), but the first fmul does not need to be commutative. No shader-db change on any Intel platform. No shader-db run-time difference on a certain 36-core / 72-thread system at 95% confidence (n=20). There are more subpatterns that could be marked as non-commutative, but detecting these is more challenging. For example, this fadd: ('fadd', ('fmul', a, b), ('fmul', a, c)) The first fadd: ('fmul', ('fadd', a, b), ('fadd', a, b)) And this fadd: ('flt', ('fadd', a, b), 0.0) This last case may be easier to detect. If all sources are variables and they are the only instances of those variables, then the pattern can be marked as non-commutative. It's probably not worth the effort now, but if we end up with some patterns that bump up on the limit again, it may be worth revisiting. v2: Update the comment about the explicit "len(self.sources)" check to be more clear about why it is necessary. Requested by Connor. Many Python style / idiom fixes suggested by Dylan. Add missing (!!!) opcode check in Expression::__eq__ method. This bug is the reason the expected number of commutative expressions in the bitfield_reverse pattern changed from 61 to 45 in the first version of this patch. v3: Use all() in Expression::__eq__ method. Suggested by Connor. Revert away from using __eq__ overloads. 
The "equality" implementation of Constant and Variable needed for commutativity pruning is weaker than the one needed for propagating and validating bit sizes. Using actual equality caused the pruning to fail for my ('fmul', ('fadd', 1, a), ('fadd', 1, a)) case. I changed the name to "equivalent" rather than the previous "same_as" to further differentiate it from __eq__. Reviewed-by: Connor Abbott <cwabbott0@gmail.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2019-06-28 18:56:19 -07:00
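The pruning rule described in the commit above can be illustrated with a minimal sketch. It is not the nir_algebraic.py implementation: the `COMMUTATIVE` set is an illustrative subset, the helpers `opcode` and `count_commutative` are hypothetical, and plain tuple equality stands in for the weaker "equivalent" check the commit message mentions.

```python
COMMUTATIVE = {"fadd", "fmul"}  # illustrative subset, not the real opcode table


def opcode(expr):
    """Strip the inexact marker ('~') and bit-size suffix ('@32') from an opcode name."""
    return expr[0].lstrip("~").split("@")[0]


def count_commutative(expr, prune_duplicates):
    """Count commutative expressions in a nested tuple pattern."""
    if not isinstance(expr, tuple):
        return 0  # variables and constants are leaves
    sources = expr[1:]
    total = sum(count_commutative(src, prune_duplicates) for src in sources)
    if opcode(expr) in COMMUTATIVE and len(sources) == 2:
        # Swapping two identical sources can never produce a new match,
        # so such an expression need not count against the limit.
        if not (prune_duplicates and sources[0] == sources[1]):
            total += 1
    return total


# The pattern quoted in the commit message, with 'a' and 'b' as variables.
pattern = ("~fadd@32",
           ("fmul", ("fadd", 1.0, ("fneg", "a")), ("fadd", 1.0, ("fneg", "a"))),
           ("fmul", ("flrp", "a", 1.0, "a"), "b"))

before = count_commutative(pattern, prune_duplicates=False)
after = count_commutative(pattern, prune_duplicates=True)
```

Under these assumptions the pattern counts 5 commutative expressions without pruning and 4 with it, which matches the numbers in the commit message.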
Ian Romanick	cae1af4339	nir/search: Log Boolean constants instead of asserting Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2019-06-28 18:56:19 -07:00
Ian Romanick	8d6b35fffd	nir/algebraic: Fail build when too many commutative expressions are used Search patterns that are expected to have too many (e.g., the giant bitfield_reverse pattern) can be added to a white list. This would have saved me a few hours debugging. :( v2: Implement the expected-failure annotation as a property of the search-replace pattern instead of as a property of the whole list of patterns. Suggested by Connor. Reviewed-by: Connor Abbott <cwabbott0@gmail.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2019-06-28 18:56:19 -07:00
Ian Romanick	57704b8d22	nir/algebraic: Fix whitespace error Trivial Reviewed-by: Connor Abbott <cwabbott0@gmail.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2019-06-28 18:56:19 -07:00
Alyssa Rosenzweig	f8fca4fe61	panfrost: Allow R11G11B10 rendering Doesn't fully work yet, but better than crashing. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-28 18:48:13 -07:00
Alyssa Rosenzweig	7692ad19fb	panfrost: Default to util_pack_color for clears This might help as we bring up more render-target formats. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-28 18:48:13 -07:00
Ian Romanick	b04beaf41d	intel/vec4: Try both sources as candidates for being immediates For some reason, when I first wrote try_immediate_source, I thought the sources had already been ordered so that the immediate value was the second source. That's rubbish. The generator assumes neither source is immediate, and it relies on later copy/constant propagation passes to do the reordering. For this reason, the changes to try_immediate_source have to go to some efforts to reorder the operands and tell the caller when it reordered them. The generator for comparison instructions uses this to determine when the comparison needs to change (e.g., from GT to LT). No changes on any Gen8 or later platform because those platforms do not use the vec4 backend. Haswell total instructions in shared programs: 13484431 -> 13480500 (-0.03%) instructions in affected programs: 441138 -> 437207 (-0.89%) helped: 1883 HURT: 0 helped stats (abs) min: 1 max: 49 x̄: 2.09 x̃: 1 helped stats (rel) min: 0.07% max: 8.91% x̄: 1.10% x̃: 0.90% 95% mean confidence interval for instructions value: -2.19 -1.98 95% mean confidence interval for instructions %-change: -1.14% -1.06% Instructions are helped. total cycles in shared programs: 376420286 -> 376406400 (<.01%) cycles in affected programs: 15995668 -> 15981782 (-0.09%) helped: 1692 HURT: 219 helped stats (abs) min: 2 max: 764 x̄: 13.78 x̃: 4 helped stats (rel) min: <.01% max: 9.69% x̄: 0.69% x̃: 0.35% HURT stats (abs) min: 2 max: 516 x̄: 43.09 x̃: 22 HURT stats (rel) min: 0.02% max: 12.09% x̄: 2.30% x̃: 1.13% 95% mean confidence interval for cycles value: -9.70 -4.83 95% mean confidence interval for cycles %-change: -0.42% -0.28% Cycles are helped. 
total spills in shared programs: 23166 -> 23158 (-0.03%) spills in affected programs: 66 -> 58 (-12.12%) helped: 2 HURT: 0 total fills in shared programs: 34592 -> 34580 (-0.03%) fills in affected programs: 75 -> 63 (-16.00%) helped: 2 HURT: 0 Ivy Bridge total instructions in shared programs: 12051590 -> 12048513 (-0.03%) instructions in affected programs: 355911 -> 352834 (-0.86%) helped: 1481 HURT: 0 helped stats (abs) min: 1 max: 12 x̄: 2.08 x̃: 1 helped stats (rel) min: 0.07% max: 4.92% x̄: 1.08% x̃: 0.90% 95% mean confidence interval for instructions value: -2.17 -1.98 95% mean confidence interval for instructions %-change: -1.12% -1.04% Instructions are helped. total cycles in shared programs: 180319624 -> 180307642 (<.01%) cycles in affected programs: 15591028 -> 15579046 (-0.08%) helped: 1340 HURT: 174 helped stats (abs) min: 2 max: 764 x̄: 14.19 x̃: 2 helped stats (rel) min: <.01% max: 8.68% x̄: 0.64% x̃: 0.32% HURT stats (abs) min: 2 max: 518 x̄: 40.41 x̃: 14 HURT stats (rel) min: 0.02% max: 8.37% x̄: 1.59% x̃: 0.67% 95% mean confidence interval for cycles value: -10.85 -4.97 95% mean confidence interval for cycles %-change: -0.45% -0.31% Cycles are helped. All Gen6 and earlier platforms had similar results. (Sandy Bridge shown) total instructions in shared programs: 10863159 -> 10861462 (-0.02%) instructions in affected programs: 157839 -> 156142 (-1.08%) helped: 715 HURT: 0 helped stats (abs) min: 1 max: 12 x̄: 2.37 x̃: 2 helped stats (rel) min: 0.23% max: 4.33% x̄: 1.07% x̃: 0.85% 95% mean confidence interval for instructions value: -2.53 -2.21 95% mean confidence interval for instructions %-change: -1.13% -1.02% Instructions are helped. 
total cycles in shared programs: 153957782 -> 153948778 (<.01%) cycles in affected programs: 3171648 -> 3162644 (-0.28%) helped: 696 HURT: 62 helped stats (abs) min: 2 max: 390 x̄: 15.72 x̃: 4 helped stats (rel) min: 0.02% max: 10.57% x̄: 0.57% x̃: 0.12% HURT stats (abs) min: 2 max: 300 x̄: 31.29 x̃: 2 HURT stats (rel) min: 0.11% max: 7.23% x̄: 0.83% x̃: 0.34% 95% mean confidence interval for cycles value: -15.65 -8.11 95% mean confidence interval for cycles %-change: -0.56% -0.36% Cycles are helped. Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-06-28 18:13:18 -07:00
Ian Romanick	379cf3bb87	intel/vec4: Try immediate sources for dot products too No changes on any Gen8 or later platform because those platforms do not use the vec4 backend. All Haswell and earlier platforms had similar results. (Haswell shown) total instructions in shared programs: 13484467 -> 13484431 (<.01%) instructions in affected programs: 8540 -> 8504 (-0.42%) helped: 33 HURT: 0 helped stats (abs) min: 1 max: 2 x̄: 1.09 x̃: 1 helped stats (rel) min: 0.31% max: 1.53% x̄: 0.49% x̃: 0.35% 95% mean confidence interval for instructions value: -1.19 -0.99 95% mean confidence interval for instructions %-change: -0.60% -0.38% Instructions are helped. total cycles in shared programs: 376420572 -> 376420286 (<.01%) cycles in affected programs: 56260 -> 55974 (-0.51%) helped: 26 HURT: 5 helped stats (abs) min: 2 max: 204 x̄: 11.85 x̃: 2 helped stats (rel) min: 0.11% max: 3.08% x̄: 0.39% x̃: 0.13% HURT stats (abs) min: 2 max: 6 x̄: 4.40 x̃: 6 HURT stats (rel) min: 0.03% max: 0.35% x̄: 0.24% x̃: 0.35% 95% mean confidence interval for cycles value: -22.91 4.45 95% mean confidence interval for cycles %-change: -0.56% -0.02% Inconclusive result (value mean confidence interval includes 0). Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-06-28 17:16:16 -07:00
Ian Romanick	eeebeb211f	intel/vec4: Try emitting non-scalar immediates Sometimes an instruction has a vector as a source, but all of the components have the same value. For example, vec3 32 ssa_16 = load_const (1.0, 1.0, 1.0) ... vec3 32 ssa_82 = fadd ssa_16, -ssa_81.xyz No changes on any Gen8 or later platform because those platforms do not use the vec4 backend. Haswell total instructions in shared programs: 13487811 -> 13484467 (-0.02%) instructions in affected programs: 421981 -> 418637 (-0.79%) helped: 1859 HURT: 0 helped stats (abs) min: 1 max: 15 x̄: 1.80 x̃: 1 helped stats (rel) min: 0.04% max: 9.80% x̄: 1.04% x̃: 0.84% 95% mean confidence interval for instructions value: -1.85 -1.74 95% mean confidence interval for instructions %-change: -1.07% -1.00% Instructions are helped. total cycles in shared programs: 376423252 -> 376420572 (<.01%) cycles in affected programs: 14800970 -> 14798290 (-0.02%) helped: 1519 HURT: 329 helped stats (abs) min: 2 max: 462 x̄: 10.59 x̃: 4 helped stats (rel) min: 0.03% max: 16.73% x̄: 0.79% x̃: 0.36% HURT stats (abs) min: 2 max: 598 x̄: 40.74 x̃: 16 HURT stats (rel) min: <.01% max: 10.32% x̄: 2.56% x̃: 0.98% 95% mean confidence interval for cycles value: -3.53 0.63 95% mean confidence interval for cycles %-change: -0.30% -0.09% Inconclusive result (value mean confidence interval includes 0). total fills in shared programs: 34601 -> 34592 (-0.03%) fills in affected programs: 91 -> 82 (-9.89%) helped: 9 HURT: 0 Ivy Bridge total instructions in shared programs: 12053565 -> 12051626 (-0.02%) instructions in affected programs: 298103 -> 296164 (-0.65%) helped: 1228 HURT: 0 helped stats (abs) min: 1 max: 8 x̄: 1.58 x̃: 1 helped stats (rel) min: 0.04% max: 3.57% x̄: 0.91% x̃: 0.81% 95% mean confidence interval for instructions value: -1.63 -1.53 95% mean confidence interval for instructions %-change: -0.95% -0.88% Instructions are helped. 
total cycles in shared programs: 180322270 -> 180319922 (<.01%) cycles in affected programs: 14123840 -> 14121492 (-0.02%) helped: 1036 HURT: 195 helped stats (abs) min: 2 max: 462 x̄: 11.93 x̃: 2 helped stats (rel) min: 0.03% max: 14.05% x̄: 0.82% x̃: 0.35% HURT stats (abs) min: 2 max: 598 x̄: 51.33 x̃: 16 HURT stats (rel) min: <.01% max: 9.68% x̄: 3.02% x̃: 0.72% 95% mean confidence interval for cycles value: -4.92 1.10 95% mean confidence interval for cycles %-change: -0.35% -0.07% Inconclusive result (value mean confidence interval includes 0). Sandy Bridge total instructions in shared programs: 10864286 -> 10863189 (-0.01%) instructions in affected programs: 159722 -> 158625 (-0.69%) helped: 724 HURT: 0 helped stats (abs) min: 1 max: 4 x̄: 1.52 x̃: 1 helped stats (rel) min: 0.10% max: 2.91% x̄: 0.79% x̃: 0.62% 95% mean confidence interval for instructions value: -1.58 -1.46 95% mean confidence interval for instructions %-change: -0.82% -0.75% Instructions are helped. total cycles in shared programs: 153967938 -> 153957926 (<.01%) cycles in affected programs: 1923186 -> 1913174 (-0.52%) helped: 654 HURT: 56 helped stats (abs) min: 2 max: 170 x̄: 20.00 x̃: 4 helped stats (rel) min: 0.03% max: 11.82% x̄: 0.89% x̃: 0.18% HURT stats (abs) min: 2 max: 390 x̄: 54.75 x̃: 32 HURT stats (rel) min: 0.05% max: 6.92% x̄: 3.09% x̃: 2.92% 95% mean confidence interval for cycles value: -17.42 -10.78 95% mean confidence interval for cycles %-change: -0.76% -0.40% Cycles are helped. Iron Lake and GM45 had similar results. (Iron Lake shown) total instructions in shared programs: 8142677 -> 8141721 (-0.01%) instructions in affected programs: 139511 -> 138555 (-0.69%) helped: 588 HURT: 0 helped stats (abs) min: 1 max: 8 x̄: 1.63 x̃: 1 helped stats (rel) min: 0.21% max: 4.39% x̄: 0.84% x̃: 0.46% 95% mean confidence interval for instructions value: -1.70 -1.55 95% mean confidence interval for instructions %-change: -0.89% -0.78% Instructions are helped. 
total cycles in shared programs: 188549394 -> 188547676 (<.01%) cycles in affected programs: 3171960 -> 3170242 (-0.05%) helped: 527 HURT: 0 helped stats (abs) min: 2 max: 18 x̄: 3.26 x̃: 2 helped stats (rel) min: <.01% max: 0.80% x̄: 0.08% x̃: 0.06% 95% mean confidence interval for cycles value: -3.49 -3.03 95% mean confidence interval for cycles %-change: -0.09% -0.07% Cycles are helped. Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-06-28 17:16:06 -07:00
Eric Anholt	8fd8964302	nir: Fix lowering of bitfield_insert to shifts. The bfi/bfm behavior change replaced the bfi/bfm usage in lower_bitfield_insert_to_shifts with actual shifts like the name says, but it failed to handle the offset=0, bits==32 case in the new lowering. v2: Use 31 < bits instead of bits == 32, to get the 31 < (iand bits, 31) -> false optimization. Fixes regressions in dEQP-GLES31.bitfield_insert on freedreno. Fixes: `165b7f3a44` ("nir: define behavior of nir_op_bfm and nir_op_u/ibfe according to SM5 spec.") Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>	2019-06-28 16:38:23 -07:00
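The offset=0, bits==32 corner case mentioned in the commit above can be reproduced with a short sketch. This is not the NIR lowering itself: the helpers `shl32`, `naive_mask`, `fixed_mask` and `bitfield_insert` are hypothetical, and the sketch assumes that a 32-bit shift only uses the low 5 bits of its shift count.

```python
MASK32 = 0xFFFFFFFF


def shl32(value, count):
    """32-bit left shift; assumes only the low 5 bits of the count are used."""
    return (value << (count & 31)) & MASK32


def naive_mask(offset, bits):
    """((1 << bits) - 1) << offset, which collapses to 0 when bits == 32."""
    return shl32((shl32(1, bits) - 1) & MASK32, offset)


def fixed_mask(offset, bits):
    """Same mask, but with the full-width case handled explicitly."""
    if 31 < bits:
        return MASK32  # bits == 32 implies offset == 0: replace the whole word
    return naive_mask(offset, bits)


def bitfield_insert(base, insert, offset, bits, mask_fn):
    """Insert the low `bits` bits of `insert` into `base` at `offset`."""
    mask = mask_fn(offset, bits)
    return (shl32(insert, offset) & mask) | (base & ~mask & MASK32)


broken = bitfield_insert(0x12345678, 0xDEADBEEF, 0, 32, naive_mask)
fixed = bitfield_insert(0x12345678, 0xDEADBEEF, 0, 32, fixed_mask)
```

With the naive mask, `1 << 32` wraps to `1 << 0`, the mask becomes 0, and `base` is returned unchanged. The explicit `31 < bits` test restores the expected result, and is written that way in the commit so that a comparison against an already masked bit count can fold to false.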
Dylan Baker	97c2c4546c	Revert "meson: Add support for using cmake for finding LLVM" This reverts commit `5157a42765`. There is a meson bug that causes llvm to always be statically linked, which is obviously not what we want. I haven't had time to look into it yet, but for now let's just revert it.	2019-06-28 16:36:38 -07:00
Dylan Baker	69f9fbab8a	Revert "meson: try to use cmake as a finder for clang" This reverts commit `0ba0c0c15c`.	2019-06-28 16:36:27 -07:00
Eric Engestrom	78aa4a3c0a	mesa: stop trying new filenames if the filename existing is not the issue Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-06-28 23:37:49 +01:00
Eric Engestrom	d02d2b626b	mesa: use os_file_create_unique() Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-06-28 23:37:49 +01:00
Eric Engestrom	1b259f1ae7	util: add os_file_create_unique() Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-06-28 23:37:49 +01:00
Alyssa Rosenzweig	9de4325b27	panfrost: Disable DXT-style texture compression Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-28 15:10:05 -07:00
Alyssa Rosenzweig	e8ae998c1b	panfrost: Dump unknown formats before aborting Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-28 15:10:05 -07:00
Alyssa Rosenzweig	68a5b58fb9	panfrost/midgard: Fix 3D texture regression Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-28 15:10:05 -07:00
Alyssa Rosenzweig	601d4d3157	panfrost: Add some special formats Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-28 15:10:05 -07:00
Alyssa Rosenzweig	e32af4b5c3	panfrost/midgard: Implement integer sampler Turns out one of the magic bits in the texture instruction meant 'float'. Different magic bits mean int and uint then :) Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-28 15:09:07 -07:00
Alyssa Rosenzweig	7d30000628	panfrost: Remove dubious assert We already can support texture formats with bpp > 4, so.. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-28 15:09:07 -07:00
Alyssa Rosenzweig	7f5481258c	panfrost: Implement primitive restart For GLES3, just pass the flag through. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-28 15:09:07 -07:00
Anuj Phogat	804d1bd111	i965/icl: Apply WA_1606682166 to compute workloads We missed the workaround for compute workloads in earlier patches. Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-06-28 14:02:13 -07:00
Anuj Phogat	d96cba7754	Revert "iris/icl: Add WA_2204188704 to disable pixel shader panic dispatch" SLICE_COMMON_CHICKEN3 is a privileged register not accessible from userspace. This patch silences a simulator warning about it. We don't need to add this workaround in linux kernel as the WA description says it's fixed on latest stepping. This reverts commit `9c421d6b47`. Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-06-28 14:02:13 -07:00
Anuj Phogat	387e43b52f	Revert "anv/icl: Add WA_2204188704 to disable pixel shader panic dispatch" SLICE_COMMON_CHICKEN3 is a privileged register not accessible from userspace. This patch silences a simulator warning about it. We don't need to add this workaround in linux kernel as the WA description says it's fixed on latest stepping. This reverts commit `2be60e0c73`. Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-06-28 14:02:13 -07:00
Anuj Phogat	7746d4edef	Revert "i965/icl: Add WA_2204188704 to disable pixel shader panic dispatch" SLICE_COMMON_CHICKEN3 is a privileged register not accessible from userspace. This patch silences a simulator warning about it. We don't need to add this workaround in linux kernel as the WA description says it's fixed on latest stepping. This reverts commit `85ecd14ef6`. Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-06-28 14:02:13 -07:00
Anuj Phogat	db093d028c	i965/icl: Fix WA_1606682166 An earlier change was setting the SamplerCount = 0 for Gen 11 under #if GEN_GEN < 7. This commit fixes the problem. This WA has also been added to the linux kernel. Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-06-28 14:02:13 -07:00
Rob Clark	9753d7381c	freedreno/ir3: small cleanup `target` cannot be NULL here. Signed-off-by: Rob Clark <robdclark@chromium.org>	2019-06-28 13:02:59 -07:00
Rob Clark	016a9ab2f9	freedreno/ir3: fix missing (ss) in dummy bary.f case In case we need to insert a dummy bary.f for the (ei) flag, it also needs (ss) so we don't release varying storage to the next VS wave before the ldlv completed. Fixes random failures in: dEQP-GLES3.functional.transform_feedback.random.interleaved.lines.* Signed-off-by: Rob Clark <robdclark@chromium.org> Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>	2019-06-28 13:02:59 -07:00
Rob Clark	21beddd3bc	freedreno/a6xx: wire up dither state Fixes: dEQP-GLES2.functional.fbo.render.recreate_colorbuffer.rebind_rbo_rgba4 dEQP-GLES2.functional.fbo.render.recreate_colorbuffer.no_rebind_rbo_rgba4 dEQP-GLES2.functional.fbo.render.recreate_colorbuffer.no_rebind_rbo_rgba4_stencil_index8 dEQP-GLES2.functional.fbo.render.recreate_depthbuffer.rebind_rbo_rgba4_depth_component16 dEQP-GLES2.functional.fbo.render.recreate_depthbuffer.no_rebind_rbo_rgba4_depth_component16 dEQP-GLES2.functional.fbo.render.recreate_stencilbuffer.rebind_rbo_rgba4_stencil_index8 dEQP-GLES2.functional.fbo.render.recreate_stencilbuffer.no_rebind_rbo_rgba4_stencil_index8 Signed-off-by: Rob Clark <robdclark@chromium.org> Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>	2019-06-28 13:02:59 -07:00
Arfrever Frehtes Taifersar Arahesis	b120a02b21	meson: Improve detection of Python when using Meson >=0.50. Previously, on systems where multiple versions of Python 3 (e.g. 3.6 and 3.7) are installed, the wrong version of Python 3 could have been used. The proper fix requires availability of path() method in Meson's python module, which has been added in Meson 0.50: https://github.com/mesonbuild/meson/pull/4616 Distro Bug: https://bugs.gentoo.org/671308 Signed-off-by: Arfrever Frehtes Taifersar Arahesis <Arfrever@Apache.Org> Reviewed-by: Dylan Baker <dylan@pnwbakers.com> v2: - Add missing `endif` keyword (Dylan)	2019-06-28 12:51:21 -07:00
Pierre-Eric Pelloux-Prayer	c81c784a4a	radeon/uvd: fix calc_ctx_size_h265_main10 Left shift was applied twice. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110702 Reviewed-by: Leo Liu <leo.liu@amd.com> Tested-by: <irherder@gmail.com> Signed-off-by: Marek Olšák <marek.olsak@amd.com> Cc: <mesa-stable@lists.freedesktop.org>	2019-06-28 15:44:48 -04:00
Pierre-Eric Pelloux-Prayer	1f7d8f9786	mesa: add display list support for gl(Compressed)TextureSubImage2DEXT Reviewed-by: Marek Olšák <marek.olsak@amd.com> Signed-off-by: Marek Olšák <marek.olsak@amd.com>	2019-06-28 15:41:35 -04:00
Pierre-Eric Pelloux-Prayer	360ef82765	mesa: add glTextureParameteri/iv/f/fvEXT Reviewed-by: Marek Olšák <marek.olsak@amd.com> Signed-off-by: Marek Olšák <marek.olsak@amd.com>	2019-06-28 15:41:34 -04:00
Pierre-Eric Pelloux-Prayer	29194648a6	mesa: extend _mesa_lookup_or_create_texture to support EXT_dsa Adds a boolean to implement EXT_dsa specifics. Reviewed-by: Marek Olšák <marek.olsak@amd.com> Signed-off-by: Marek Olšák <marek.olsak@amd.com>	2019-06-28 15:41:32 -04:00
Pierre-Eric Pelloux-Prayer	274104ec38	mesa: refactor bind_texture Splits texture lookup and binding actions. The new _mesa_lookup_or_create_texture will be useful to implement the EXT_direct_state_access extension. Reviewed-by: Marek Olšák <marek.olsak@amd.com> Signed-off-by: Marek Olšák <marek.olsak@amd.com>	2019-06-28 15:41:30 -04:00
Pierre-Eric Pelloux-Prayer	6535964fdf	mesa: extract helper function for glTexParameter* Reviewed-by: Marek Olšák <marek.olsak@amd.com> Signed-off-by: Marek Olšák <marek.olsak@amd.com>	2019-06-28 15:41:29 -04:00
Pierre-Eric Pelloux-Prayer	32eefb7451	mesa: add buffer != 0 checks to glNamedBufferEXT functions The EXT_direct_state_access spec says: INVALID_OPERATION is generated by GetNamedBufferParameterivEXT, GetNamedBufferPointervEXT, GetNamedBufferSubDataEXT, MapNamedBufferEXT, NamedBufferDataEXT, NamedBufferSubDataEXT, and UnmapNamedBufferEXT if the buffer parameter is zero. This commit adds buffer != 0 validation to the implemented functions. glNamedBufferStorageEXT isn't included in this list and the EXT_buffer_storage doesn't say that buffer = 0 is an error either so I didn't add the same validation for this function. Reviewed-by: Marek Olšák <marek.olsak@amd.com> Signed-off-by: Marek Olšák <marek.olsak@amd.com>	2019-06-28 15:41:26 -04:00
Marek Olšák	0de2754aa7	mesa: fix a typo in map_named_buffer_range Reviewed-by: Marek Olšák <marek.olsak@amd.com> Signed-off-by: Marek Olšák <marek.olsak@amd.com>	2019-06-28 15:41:25 -04:00
Timothy Arceri	9c53a2ecb7	mesa: add support for glMapNamedBufferEXT() Reviewed-by: Marek Olšák <marek.olsak@amd.com> Signed-off-by: Marek Olšák <marek.olsak@amd.com>	2019-06-28 15:41:24 -04:00
Timothy Arceri	76e25edf6a	mesa: add support for glUnmapNamedBufferEXT() Since the ARB DSA function glUnmapNamedBuffer() is only exposed for 3.1 or above we make glUnmapNamedBuffer() an alias of glUnmapNamedBufferEXT() rather than the other way around. Reviewed-by: Marek Olšák <marek.olsak@amd.com> Signed-off-by: Marek Olšák <marek.olsak@amd.com>	2019-06-28 15:41:21 -04:00
Timothy Arceri	b5f930ea05	mesa: add support for glCompressedTextureSubImage2DEXT() Reviewed-by: Marek Olšák <marek.olsak@amd.com> Signed-off-by: Marek Olšák <marek.olsak@amd.com>	2019-06-28 15:41:20 -04:00
Timothy Arceri	b82b3d28d3	mesa: add support for glTextureSubImage2DEXT() Reviewed-by: Marek Olšák <marek.olsak@amd.com> Signed-off-by: Marek Olšák <marek.olsak@amd.com>	2019-06-28 15:41:19 -04:00
Timothy Arceri	cb0f25a926	mesa: add support for glMapNamedBufferRangeEXT() Reviewed-by: Marek Olšák <marek.olsak@amd.com> Signed-off-by: Marek Olšák <marek.olsak@amd.com>	2019-06-28 15:41:16 -04:00
Timothy Arceri	eec5c01b5e	mesa: add support for glNamedBufferStorageEXT This is available in ARB_buffer_storage when EXT_direct_state_access is present. Reviewed-by: Marek Olšák <marek.olsak@amd.com> Signed-off-by: Marek Olšák <marek.olsak@amd.com>	2019-06-28 15:41:14 -04:00
Timothy Arceri	83ed9485b7	mesa: add support for glNamedBuffer*DataEXT() Reviewed-by: Marek Olšák <marek.olsak@amd.com> Signed-off-by: Marek Olšák <marek.olsak@amd.com>	2019-06-28 15:41:12 -04:00
Timothy Arceri	0972b0b059	mesa: add support for glBindMultiTextureEXT Reviewed-by: Marek Olšák <marek.olsak@amd.com> Signed-off-by: Marek Olšák <marek.olsak@amd.com>	2019-06-28 15:40:54 -04:00
Pierre-Eric Pelloux-Prayer	c37f03d464	mesa: delete framebuffer texture attachment sampler views When a context is destroyed the destroy_tex_sampler_cb makes sure that all the sampler views created by that context are destroyed. This is done by walking the ctx->Shared->TexObjects hash table. In a multiple context environment the texture can be deleted by a different context, so it will be removed from the TexObjects table and will prevent the above mechanism from working. This can result in an assertion in st_save_zombie_sampler_view because the sampler_view owns a reference to a destroyed context. This issue occurs in blender 2.80. This commit fixes this by explicitly releasing the sampler_view created by the destroyed context for all texture attachments. Fixes: `593e36f956` (st/mesa: implement "zombie" sampler views (v2)) Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110944 Signed-off-by: Marek Olšák <marek.olsak@amd.com>	2019-06-28 15:29:08 -04:00
James Clarke	7389bf9761	meson: GNU/kFreeBSD has DRM/KMS and requires -D_GNU_SOURCE This is a regression from the old autotools build system. Acked-by: Eric Engestrom <eric.engestrom@intel.com> Acked-by: Dylan Baker <dylan@pnwbakers.com>	2019-06-28 19:06:46 +00:00
Kenneth Graunke	65e0c4b64f	gallium/u_transfer_helper: Don't leak a reference to the resource. We pipe_resource_reference when handling transfers in map, we need to do a corresponding unreference in unmap. Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2019-06-28 11:25:56 -07:00
Eric Engestrom	6227e6faee	meson: only add empty lines between active summary sections Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2019-06-28 19:15:18 +01:00
Eric Engestrom	5819bc0e5c	meson: bump required libdrm version to 2.4.81 `dbb4457d98` started using drmDevicesEqual(), which was introduced in libdrm 2.4.81. We could either copy the function locally, or bump the required version. Since the function is non-trivial and 2.4.81 is old enough already, I suggest the latter. Fixes: `dbb4457d98` ("egl: add EGL_EXT_device_drm support") Cc: Emil Velikov <emil.velikov@collabora.com> Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2019-06-28 19:03:04 +01:00
Emil Velikov	4ec32413f3	ac: change ac_query_gpu_info() signature Currently libdrm_amdgpu provides a typedef of the various handles. While the goal was to make those opaque, it effectively became part of the API. To the best of my knowledge there are two ways to have opaque handles: - "typedef void foo;" - rather messy IMHO - "struct foo;" and use "struct foo " through the API In our case amdgpu_device_handle is used only internally, plus respective code is not used or applicable for r300 and r600. Hence we copied the typedef. Seemingly this will be a problem since libdrm_amdgpu wants to change the API, while not updating the code(?). Either way, we can safely s/amdgpu_device_handle/void */ and carry on. Cc: Michel Dänzer <michel@daenzer.net> Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Marek Olšák <marek.olsak at amd.com>	2019-06-28 17:49:32 +01:00
Tomeu Vizoso	7c745f6148	panfrost: Only tag AFBC addresses when sampling Rendering to AFBC was broken, as the HW will complain loudly if we pass a tagged pointer in bifrost_render_target. Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com> Fixes: `3609b50a64` ("panfrost: Merge AFBC slab with BO backing") Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-28 15:40:52 +02:00
Jose Fonseca	3573412981	gallivm: Improve lp_build_rcp_refine. Use the alternative more accurate expression from https://en.wikipedia.org/wiki/Division_algorithm#Newton%E2%80%93Raphson_division v2: Use lp_build_fmuladd as suggested by Roland Tested by enabling this code path, and running lp_test_arit. Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2019-06-28 11:48:12 +01:00
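The refinement step named in `3573412981` can be illustrated in plain C. This is a minimal sketch, assuming the "more accurate expression" is the Newton-Raphson form x' = x + x * (1 - a * x) from the linked article; `rcp_refine` is a hypothetical scalar stand-in, not the gallivm function, and it uses ordinary arithmetic where the commit uses a fused multiply-add builder.

```c
#include <assert.h>

/* One Newton-Raphson refinement of an estimate x of 1/a.
 * Hypothetical helper for illustration only. The commit builds the two
 * steps with lp_build_fmuladd; here they are ordinary float operations. */
static float rcp_refine(float a, float x)
{
    assert(a != 0.0f);
    float e = 1.0f - a * x;   /* residual: zero when x is exactly 1/a */
    return x + x * e;         /* corrected estimate */
}
```

Each step roughly squares the relative error, so a coarse hardware estimate such as 0.3 for 1/3 becomes 0.33 after one step and about 0.3333 after two.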
Tomeu Vizoso	0ec8a292fb	panfrost/ci: Don't error out on RK3288 At the moment we don't have enough people to ensure that RK3288 is regression-free, so don't fail the CI in that case. For now we'll focus on not regressing on RK3399 and we can expand to other SoCs as more people join the effort. Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com> Suggested-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-28 11:13:04 +02:00
Tomeu Vizoso	6a26d6f4d9	panfrost/ci: Don't print every kernel file As there's lots of them and Gitlab struggles rendering logs with so many lines. Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-28 11:13:04 +02:00
Tomeu Vizoso	61b793dde4	panfrost/ci: Fix the image name These changes will make sure we get the right image from the container registry. Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-28 11:13:04 +02:00
Tomeu Vizoso	0315350d9e	panfrost/ci: Remove batching Panfrost has grown and doesn't leak as much as before. Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-28 11:13:04 +02:00
Kenneth Graunke	847ef8ee4f	iris: Don't leak resources in iris_create_surface for incomplete FBOs We were failing to pipe_resource_unreference on the failure path due to a non-renderable format. Instead of fixing this, just move the checks earlier, before we even bother with refcounting or calloc.	2019-06-28 01:13:11 -07:00
Samuel Pitoiset	ef1787dbc9	radv: only enable VK_AMD_gpu_shader_{half_float,int16} on GFX9+ These two extensions are supported on GFX8 but the throughput of 16-bit floats/integers is same as 32-bit. Also, shaderInt16 is only enabled on GFX9+ for the same reason, be more consistent. This fixes a crash with Wolfenstein II because it expects shaderInt16 to be enabled when VK_AMD_gpu_shader_half_float is exposed. Note that AMDVLK only enables these extensions on GFX9+. Cc: 19.1 <mesa-stable@lists.freedesktop.org> Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-06-28 08:40:44 +02:00
Samuel Pitoiset	5d6d29ed5d	radv: add si_emit_ia_multi_vgt_param() helper Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-06-28 08:40:42 +02:00
Alexandros Frantzis	7da90a7cc9	virgl: Don't allow creating staging pipe_resources Staging buffers are now created directly by the virgl_staging_mgr. We don't need to support creating staging pipe_resources. Signed-off-by: Alexandros Frantzis <alexandros.frantzis@collabora.com> Reviewed-by: Chia-I Wu <olvaffe@gmail.com>	2019-06-28 04:30:02 +00:00
Alexandros Frantzis	5388be039b	virgl: Use virgl_staging_mgr Use an instance of virgl_staging_mgr instead of u_upload_mgr to handle the staging buffer. This removes the need to track the availability of the staging manager, since virgl_staging_mgr can handle concurrent active allocations. Signed-off-by: Alexandros Frantzis <alexandros.frantzis@collabora.com> Reviewed-by: Chia-I Wu <olvaffe@gmail.com>	2019-06-28 04:30:02 +00:00
Alexandros Frantzis	790d1a0b17	virgl: Add tests for virgl_staging_mgr Signed-off-by: Alexandros Frantzis <alexandros.frantzis@collabora.com> Reviewed-by: Chia-I Wu <olvaffe@gmail.com>	2019-06-28 04:30:02 +00:00
Alexandros Frantzis	55a58dfcfb	virgl: Introduce virgl_staging_mgr Add a manager for the staging buffer used in virgl. The staging manager is heavily inspired by u_upload_mgr, but is simpler and is a better fit for virgl's purposes. In particular, the staging manager: * Allows concurrent staging allocations. * Calls the virgl winsys directly to create and map resources, avoiding unnecessarily going through gallium resources and transfers. olv: make virgl_staging_alloc_buffer return a bool Signed-off-by: Alexandros Frantzis <alexandros.frantzis@collabora.com> Reviewed-by: Chia-I Wu <olvaffe@gmail.com>	2019-06-28 04:30:02 +00:00
Alexandros Frantzis	6a03f25522	virgl: Store the virgl_hw_res for copy transfers Store the virgl_hw_res instead of the pipe_resource for copy transfer sources. This prepares the codebase for a change to provide only the virgl_hw_res for the staging buffers in upcoming commits. Signed-off-by: Alexandros Frantzis <alexandros.frantzis@collabora.com> Reviewed-by: Chia-I Wu <olvaffe@gmail.com>	2019-06-28 04:30:02 +00:00
Kenneth Graunke	bed305fb7a	iris: Fix major resource leak in iris_set_shader_images We were failing to unreference the old image resource. Instead of open coding this and doing it badly, just use the copier function which does the right thing.	2019-06-27 19:08:46 -07:00
Kenneth Graunke	255c71ec07	gallium: Make util_copy_image_view handle shader_access A while back, we added a new field, but failed to update the copier. I believe iris is the only current user of the new field, and it hasn't used the copier, so no one noticed. Fixes: `8b626a22b2` st/mesa: Record shader access qualifiers for images Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-06-27 19:06:19 -07:00
Kenneth Graunke	0d6fc6f07e	gallium: Teach GALLIUM_REFCNT_LOG about array textures Otherwise they are classified as pipe_martian_resource, and don't contain any helpful information about the texture. Reviewed-by: Eric Anholt <eric@anholt.net>	2019-06-27 16:56:15 -07:00
Nanley Chery	02f6995d76	isl: Don't align phys_level0_sa by block dimension Aligning phys_level0_sa by the compression block dimension prior to mipmap layout causes the layout of compressed surfaces to differ from the sampler's expectations in certain cases. The hardware docs agree: From the BDW PRM, Vol. 5, Compressed Mipmap Layout, The compressed mipmaps are stored in a similar fashion to uncompressed mipmaps [...] The following exceptions apply to the layout of compressed (vs. uncompressed) mipmaps: * [...] * The dimensions of the mip maps are first determined by applying the sizing algorithm presented in Non-Power-of-Two Mipmaps above. Then, if necessary, they are padded out to compression block boundaries. The last bullet indicates that alignment should not be done for calculating a miplevel's dimensions, but rather for determining miplevel placement/padding. Comply with this text by removing the extra alignment. Fixes some fbo-generatemipmap-formats piglit failures on all tested platforms (SNB-KBL). v2: - Note fixed platforms. - Update some consumers via a helper function. Cc: <mesa-stable@lists.freedesktop.org> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-06-27 23:38:38 +00:00
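The ordering issue described in `02f6995d76` can be shown with a one-dimensional sketch. The helper names below are hypothetical and this is not isl code; it only assumes the rule quoted from the PRM, namely that a miplevel is sized first and padded to the compression block boundary afterwards.

```c
#include <assert.h>
#include <stdint.h>

/* Size of a miplevel along one axis: halve per level, never below 1. */
static uint32_t minify(uint32_t size, uint32_t level)
{
    uint32_t s = size >> level;
    return s ? s : 1;
}

/* Round v up to a multiple of a (a need not be a power of two). */
static uint32_t align_up(uint32_t v, uint32_t a)
{
    assert(a != 0);
    return ((v + a - 1) / a) * a;
}

/* Blocks per row when the level is sized first and padded afterwards. */
static uint32_t blocks_pad_after(uint32_t w0, uint32_t level, uint32_t bw)
{
    return align_up(minify(w0, level), bw) / bw;
}

/* Blocks per row when level 0 is aligned before minification (old behaviour
 * as described in the commit message, simplified to one axis). */
static uint32_t blocks_align_first(uint32_t w0, uint32_t level, uint32_t bw)
{
    return align_up(minify(align_up(w0, bw), level), bw) / bw;
}
```

For a 9-texel-wide surface with 4-texel blocks, level 1 is 4 texels (one block) when padded afterwards, but 6 texels (two blocks) when level 0 is first aligned to 12, which is the kind of mismatch with the sampler that the commit removes.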
Nanley Chery	fb1350c76f	intel: Add and use helpers for level0 extent Prepare for a bug fix by adding and using helpers which convert isl_surf::logical_level0_px and isl_surf::phys_level0_sa to units of surface elements. v2: - Update iris (Ken). - Update anv. Cc: <mesa-stable@lists.freedesktop.org> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-06-27 23:38:37 +00:00
Dylan Baker	0ba0c0c15c	meson: try to use cmake as a finder for clang Clang (like LLVM), very annoyingly refuses to provide pkg-config, and only provides cmake (unlike LLVM which at least provides llvm-config, even if llvm-config is terrible). Meson has gained the ability to use cmake to find dependencies, and can successfully find Clang. This change attempts to use cmake to find clang instead of a bunch of library searches, when paired with -Dcmake_prefix_path we can much more reliably use cmake to control which clang we're getting. This is only enabled for meson >= 0.51, which adds the required options. Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-06-27 22:12:02 +00:00
Dylan Baker	5157a42765	meson: Add support for using cmake for finding LLVM Meson has support for using cmake as a finder for some dependencies, including LLVM. Using cmake has a lot of advantages: it needs less meson maintenance to keep working (even for llvm updates); it works more sanely for cross compiles (as llvm-config is a compiled binary not a shell script). Meson 0.51.0 also has a new generic variable getter that can be used to get information from either cmake, pkg-config, or config-tools dependencies, which is needed for cmake. We continue to support using llvm-config if you don't have cmake installed, or if cmake cannot find a suitable version. Fixes: `0d59459432` ("meson: Force the use of config-tool for llvm") Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-06-27 22:12:02 +00:00
Kenneth Graunke	3d3685d354	iris: Fix memory leak of SO targets We need to pitch these on context destroy.	2019-06-27 14:59:39 -07:00
Kenneth Graunke	d65819f054	iris: Fix memory leak for draw parameter resources Need to pitch these on context destroy.	2019-06-27 14:59:39 -07:00
Kenneth Graunke	50eb1c1396	iris: Drop u_upload_unmap We use persistent maps so this does nothing.	2019-06-27 14:59:39 -07:00
Lionel Landwerlin	836225840c	intel/compiler: fix derivative on y axis implementation This rewrites the ddy in EXECUTE_4 mode with a loop to make it more obvious what is going on and also sets the group each of the 4 threads in the groups are supposed to execute. Fixes the following CTS tests : dEQP-VK.glsl.derivate.dfdyfine.dynamic_* Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Co-Authored-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Matt Turner <mattst88@gmail.com> Fixes: `2134ea3800` ("intel/compiler/fs: Implement ddy without using align16 for Gen11+")	2019-06-27 18:14:58 +00:00
Eric Engestrom	53f17c4efd	meson: set up a proper internal dependency for xmlconfig Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2019-06-27 17:42:25 +00:00
Eric Engestrom	ad0ee5bfa5	xmlconfig: add missing #include Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2019-06-27 17:42:25 +00:00
Eric Engestrom	069e6d587e	xmlpool: fix typo in comment s/otions/options/, and while here let's give the full path to xmlpool.h since `../` won't be true in the generated file. Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2019-06-27 17:42:25 +00:00
Kenneth Graunke	d6683e118f	iris: Also properly restore INTERFACE_DESCRIPTOR_DATA buffer object We were at least cleaning up this reference, but we were failing to pin it in iris_restore_compute_saved_bos.	2019-06-27 08:12:22 -07:00
Kenneth Graunke	340df53d6a	iris: Fix resource tracking for CS thread ID buffer Today, we stream the compute shader thread IDs simply because they're (annoyingly) relative to dynamic state base address. We could upload them once at compile time, but we'd need a separate non-streaming uploader for IRIS_MEMZONE_DYNAMIC, and I'm not sure it's worth it. stream_state pins the buffer for use in the current batch, but also returns a reference to the pipe_resource. We dropped this reference on the floor, leaking a reference basically every time we dispatched a compute shader after switching to a new one. The reason it returns a reference is so that we can hold on to it and re-pin it in iris_restore_compute_saved_bos, which we were also failing to do. So if we actually filled up a batch with repeated dispatches to the same compute shader, and flushed, then continued dispatching, we would fail to pin it and likely GPU hang.	2019-06-27 08:12:22 -07:00
Kenneth Graunke	16d334951e	iris: Only bother with thread ID upload if doing MEDIA_CURBE_LOAD We were unconditionally uploading the new data, but then conditionally using it with MEDIA_CURBE_LOAD. If we're not going to emit the command, there's no point in uploading the data.	2019-06-27 08:12:22 -07:00
Kenneth Graunke	8f51f1ba6e	iris: Do MEDIA_CURBE_LOAD when IRIS_DIRTY_CS is set, not constants We only push the compute shader thread IDs, not any actual constant buffer data. So we should track the compute shader variant changing, not constbuf changes.	2019-06-27 08:12:22 -07:00
Kenneth Graunke	85c72da1b1	iris: Drop UBO range stuff from iris_restore_compute_saved_bos Compute doesn't use UBO ranges (annoyingly), so this is dead code.	2019-06-27 08:12:22 -07:00
Kenneth Graunke	f94ebf0c9d	iris: Properly align interface descriptor data addresses MEDIA_INTERFACE_DESCRIPTOR's Interface Descriptor Data Start Address field's docs say: "This bit specifies the 64-byte aligned address..." And we were doing 32. Superfluous thread ID uploading was apparently saving us from GPU hangs in most cases.	2019-06-27 08:12:22 -07:00
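The alignment requirement in `f94ebf0c9d` comes down to rounding an offset up to a power-of-two boundary. A minimal sketch, with `align_u64` as a hypothetical helper rather than the driver's own macro:

```c
#include <assert.h>
#include <stdint.h>

/* Round value up to the next multiple of a power-of-two alignment. */
static uint64_t align_u64(uint64_t value, uint64_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (value + alignment - 1) & ~(alignment - 1);
}
```

An offset of 96 already satisfies 32-byte alignment, so a 32-byte allocator may return it unchanged, yet it is not a multiple of 64; aligning to 64 moves it to 128.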
Andrii Simiklit	62c6059584	mesa: use a correct function return type v2: standard 'bool' can be used ( Eric Engestrom <eric.engestrom@intel.com> ) Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> Signed-off-by: Andrii Simiklit <andrii.simiklit@globallogic.com>	2019-06-27 07:53:41 +00:00
Tomeu Vizoso	9bef1f1ff1	panfrost/decode: Mention the address of a few descriptors When the fault_pointer field in the header is set, we can get some idea of which descriptor the HW isn't happy with if we know their addresses. Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-27 09:13:48 +02:00
Tomeu Vizoso	de02fb19ed	panfrost/decode: Wait for a job to finish before dumping Then we can get some information back about any exception that might have happened. Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-27 09:13:42 +02:00
Tomeu Vizoso	fa36c194fd	panfrost/decode: Decode exception status Arm's kernel driver mentions how to decode this field, which makes a bit clearer what had happened. Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-27 09:13:35 +02:00
Tomeu Vizoso	b26c2b4840	panfrost/decode: Print AFBC struct when appropriate Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-27 09:12:56 +02:00
Samuel Pitoiset	d5004f60be	radv: only export clip/cull distances if PS reads them The only exception is the GS copy shader which emits them unconditionally. Totals from affected shaders: SGPRS: 71320 -> 71008 (-0.44 %) VGPRS: 54372 -> 54240 (-0.24 %) Code Size: 2952628 -> 2941368 (-0.38 %) bytes Max Waves: 9689 -> 9723 (0.35 %) This helps Dota2, Doom, GTAV and Hitman 2. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-06-27 08:56:37 +02:00
Samuel Pitoiset	1e9ccc5429	radv: fix FMASK expand if layerCount is VK_REMAINING_ARRAY_LAYERS This doesn't fix anything known, but it's likely going to break if layerCount is ~0U. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-06-27 08:56:34 +02:00
Kenneth Graunke	8551dc17a7	iris: Disable loop unrolling in GLSL IR. Leave it to NIR instead, like i965 does. Thanks to Tim Arceri for noticing that I'd left this enabled by accident. shader-db results on Skylake: total instructions in shared programs: 15522628 -> 15521642 (<.01%) instructions in affected programs: 94008 -> 93022 (-1.05%) helped: 34 HURT: 33 helped stats (abs) min: 12 max: 48 x̄: 33.82 x̃: 42 helped stats (rel) min: 0.06% max: 22.14% x̄: 9.86% x̃: 10.89% HURT stats (abs) min: 1 max: 16 x̄: 4.97 x̃: 3t HURT stats (rel) min: 0.82% max: 3.77% x̄: 1.73% x̃: 1.53% 95% mean confidence interval for instructions value: -20.08 -9.35 95% mean confidence interval for instructions %-change: -5.95% -2.36% Instructions are helped. total cycles in shared programs: 367105221 -> 367074230 (<.01%) cycles in affected programs: 10017660 -> 9986669 (-0.31%) helped: 266 HURT: 184 helped stats (abs) min: 1 max: 9556 x̄: 151.35 x̃: 12 helped stats (rel) min: 0.08% max: 59.91% x̄: 4.66% x̃: 1.67% HURT stats (abs) min: 1 max: 1716 x̄: 50.37 x̃: 6 HURT stats (rel) min: <.01% max: 24.40% x̄: 2.42% x̃: 0.85% 95% mean confidence interval for cycles value: -133.90 -3.84 95% mean confidence interval for cycles %-change: -2.44% -1.10% Cycles are helped. Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-06-26 22:55:03 -07:00
Kenneth Graunke	acadeaff6a	st/mesa: Set EmitNoIndirectSampler if GLSLVersion < 400. This patch changes the code which sets EmitNoIndirectSampler to check the core profile GLSL version, rather than the ARB_gpu_shader5 extension enable. st/mesa exposes ARB_gpu_shader5 if GLSLVersion (in core profiles) or GLSLVersionCompat (in compat profiles) >= 400. The Intel drivers do not currently expose ARB_gpu_shader5 in compat profiles. But the backend can absolutely handle indirect samplers. Looking at the core profile version number should be a good indication of what the driver supports. Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-06-26 22:54:52 -07:00
Kenneth Graunke	116144d65e	iris: Delete dead ice->state.streamout_strides field. Nothing uses this, it must be a remnant from an earlier approach.	2019-06-26 20:17:22 -07:00
Caio Marcelo de Oliveira Filho	085c0f1f13	nir/algebraic: Add helpers and a rule involving wrapping The helpers are needed so we can use the syntax `instr(cond)` in the algebraic rules. Add simple rule for dropping a pair of mul-div of the same value when wrapping is guaranteed to not happen. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-06-26 14:13:02 -07:00
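The mul-div rule in `085c0f1f13` is only valid when the multiply cannot wrap, which a short unsigned example makes concrete. This is an illustrative sketch in C, not NIR; both helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True when a * b does not fit in 32 bits, i.e. the multiply wraps. */
static bool umul_wraps(uint32_t a, uint32_t b)
{
    return b != 0 && a > UINT32_MAX / b;
}

/* (a * b) / b evaluated with wrapping 32-bit arithmetic. */
static uint32_t mul_then_div(uint32_t a, uint32_t b)
{
    assert(b != 0);
    return (a * b) / b;
}
```

With a = 7 and b = 3 the expression returns 7, so replacing it with `a` is safe. With a = 0x80000000 and b = 2 the product wraps to 0 and the expression returns 0, so the simplification would change the result unless the no-wrap flag guarantees this case cannot occur.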
Caio Marcelo de Oliveira Filho	5a143965b8	spirv: Implement NoSignedWrap and NoUnsignedWrap decorations When handling the specified ALU operations, check for the decorations and set nir_alu_instr no_signed_wrap and no_unsigned_wrap flags accordingly. v2: Add a glsl_base_type_is_unsigned_integer() helper. (Karol) v3: Rename helper to glsl_base_type_is_uint(). v4: Use two flags, so we don't need the helper anymore. (Connor) v5: Pass alu directly to handle function. (Jason) Reviewed-by: Karol Herbst <kherbst@redhat.com> [v3] Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-06-26 14:13:02 -07:00
Caio Marcelo de Oliveira Filho	ae37237713	nir: Add a no wrapping bits to nir_alu_instr They indicate the operation does not cause overflow or underflow. This is motivated by SPIR-V decorations NoSignedWrap and NoUnsignedWrap. Change the storage of `exact` to be a single bit, so they pack together. v2: Handle no_wrap in nir_instr_set. (Karol) v3: Use two separate flags, since the NIR SSA values and certain instructions are typeless, so just no_wrap would be insufficient to know which one was referred to. (Connor) v4: Don't use nir_instr_set to propagate the flags, unlike `exact`, consider the instructions different if the flags have different values. Fix hashing/comparing. (Jason) Reviewed-by: Karol Herbst <kherbst@redhat.com> [v1] Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-06-26 14:13:02 -07:00
Dylan Baker	f97dcb7a55	docs: add news item and link release notes for 19.0.8 This is an emergency release due to a critical bug.	2019-06-26 13:48:06 -07:00
Dylan Baker	290495a431	docs: Add mesa 19.0.8 sha256 sums	2019-06-26 13:46:30 -07:00
Dylan Baker	10a24925a0	docs: Add docs for 19.0.8	2019-06-26 13:46:29 -07:00
Jonathan Marek	a70ff70158	nir: remove fnot/fxor/fand/for opcodes There doesn't seem to be any reason to keep these opcodes around: * fnot/fxor are not used at all. * fand/for are only used in lower_alu_to_scalar, but easily replaced Signed-off-by: Jonathan Marek <jonathan@marek.ca> Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2019-06-26 15:26:10 -04:00
Jonathan Marek	0b5a483baa	nir: opt_vectorize: combine different constant sources We can vectorize instructions with different constant sources by creating a new load_const and using that. Signed-off-by: Jonathan Marek <jonathan@marek.ca> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-26 14:56:28 -04:00
Alyssa Rosenzweig	10688257bd	panfrost/midgard: Merge embedded constants In Midgard, a bundle consists of a few ALU instructions. Within the bundle, there is room for an optional 128-bit constant; this constant is shared across all instructions in the bundle. Unfortunately, many instructions want a 128-bit constant all to themselves (how selfish!). If we run out of space for constants in a bundle, the bundle has to be broken up, incurring a performance and space penalty. As an optimization, the scheduler now analyzes the constants coming in per-instruction and attempts to merge shared components, adjusting the swizzle accessing the bundle's constants appropriately. Concretely, given the GLSL: (a * vec4(1.5, 0.5, 0.5, 1.0)) + vec4(1.0, 2.3, 2.3, 0.5) instead of compiling to the naive two bundles: vmul.fmul [temp], [a], r26 fconstants 1.5, 0.5, 0.5, 1.0 vadd.fadd [out], [temp], r26 fconstants 1.0, 2.3, 2.3, 0.5 The scheduler can now fuse into a single (pipelined!) bundle: vmul.fmul [temp], [a], r26.xyyz vadd.fadd [out], [temp], r26.zwwy fconstants 1.5, 0.5, 1.0, 2.3 Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-26 10:01:36 -07:00
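The merging described in `10688257bd` can be sketched as a small slot-allocation routine. This is an assumption-laden illustration, not the Midgard scheduler: `bundle_constants` and `merge_constants` are hypothetical names, a bundle is modelled as four 32-bit slots, and constants are matched by exact bit pattern.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define BUNDLE_SLOTS 4  /* 128 bits of constants = four 32-bit components */

struct bundle_constants {
    float value[BUNDLE_SLOTS];
    unsigned count;          /* number of slots in use */
};

/* Try to place the four constants of one instruction into the bundle,
 * reusing any slot that already holds the same bit pattern. On success,
 * swizzle[i] is the slot (0 = x .. 3 = w) that holds in[i]. On failure the
 * bundle is left unchanged (swizzle contents are then unspecified) and the
 * caller would have to start a new bundle. */
static bool merge_constants(struct bundle_constants *b, const float in[4],
                            unsigned swizzle[4])
{
    struct bundle_constants tmp = *b;

    for (unsigned i = 0; i < 4; i++) {
        unsigned slot;
        for (slot = 0; slot < tmp.count; slot++) {
            if (memcmp(&tmp.value[slot], &in[i], sizeof(float)) == 0)
                break;
        }
        if (slot == tmp.count) {
            if (tmp.count == BUNDLE_SLOTS)
                return false;            /* no room left for a new value */
            tmp.value[tmp.count++] = in[i];
        }
        swizzle[i] = slot;
    }

    *b = tmp;
    return true;
}
```

Feeding it the two vectors from the commit message, (1.5, 0.5, 0.5, 1.0) and then (1.0, 2.3, 2.3, 0.5), yields the swizzles xyyz and zwwy over the shared constants 1.5, 0.5, 1.0, 2.3, matching the fused bundle shown above.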
Alyssa Rosenzweig	a0a34946d8	panfrost/midgard: Share swizzle compose Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-26 10:01:36 -07:00
Alyssa Rosenzweig	f6fde45d5c	panfrost/midgard: Share swizzle/mask code Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-26 10:01:36 -07:00
Alyssa Rosenzweig	0979ea9de8	panfrost: Fix checksumming typo Fixes: `3e6c6bb0` ("panfrost: Merge checksum buffer with main BO") Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-26 09:58:30 -07:00
Kenneth Graunke	ab009b7d6e	iris: Fix overzealous query object batch flushing. In the past, each query object had its own BO. Checking if the batch referenced that BO was an easy way to check if commands were still queued to compute the query value. If so, we needed to flush. More recently (`c24a574e6c`), we started using a u_upload_mgr for query objects, placing multiple queries in the same BO. One side-effect is that iris_batch_references is no longer a reasonable way to check if commands are still queued for our query. Ours might be done, but a later query that happens to be in the same BO might be queued. We don't want to flush in that case. Instead, check if the current batch's signalling syncpt is the one we referenced when ending the query. We know the syncpt can't have been reused because our query is holding a reference, so a simple pointer comparison should suffice. Removes all batch flushing caused by query objects in Shadow of Mordor. Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>	2019-06-26 09:49:01 -07:00
Kenneth Graunke	db878a728c	iris: Make an iris_batch_get_signal_syncpt() helper. This returns a pointer to the signalling syncpt, without incrementing the reference count. This can be useful for comparisons. Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>	2019-06-26 09:49:01 -07:00
Boris Brezillon	443e530194	panfrost: Remove unneeded check in panfrost_scissor_culls_everything() The ss local var is guaranteed to be != NULL. Get rid of this useless check. Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-26 09:35:25 -07:00
Alyssa Rosenzweig	d4575c3071	panfrost: Update copyright identifiers "Collabora, Ltd." should be listed in lieu of simply "Collabora" Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Suggested-by: Daniel Stone <daniels@collabora.com>	2019-06-26 09:10:51 -07:00
Alyssa Rosenzweig	b0e8941df1	panfrost/midgard: Reorder to permit constant bias Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-26 09:08:37 -07:00
Alyssa Rosenzweig	213b62810d	panfrost/midgard: Add helper to encode constant bias Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-26 09:08:37 -07:00
Alyssa Rosenzweig	b51727ea28	panfrost/midgard: Handle negative immediate bias Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-26 09:08:37 -07:00
Rob Clark	1833827eac	freedreno: correct batch_depends_on() logic Signed-off-by: Rob Clark <robdclark@chromium.org>	2019-06-26 08:43:02 -07:00
Rob Clark	2b10bb6e5e	freedreno: drop unused arg from fd_batch_flush() The `force` arg has been unused for a while.. but apparently I forgot to garbage collect it. Signed-off-by: Rob Clark <robdclark@chromium.org>	2019-06-26 08:43:02 -07:00
Timothy Arceri	5f809e2707	st/glsl: fix silly regression finding gs/tes variants Fixes: `d19fe5e67a` ("st/glsl: support clamping color outputs in compat for gs/tes") Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>	2019-06-26 23:13:02 +10:00
Timothy Arceri	d19fe5e67a	st/glsl: support clamping color outputs in compat for gs/tes This support requires the driver to be a NIR driver as we use the NIR lowering pass to do the clamping. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-06-26 00:36:48 +00:00
Timothy Arceri	f5f31612d3	nir: add tess support to nir_lower_clamp_color_outputs() This will be used to add compat profile support for higher GL versions. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-06-26 00:36:48 +00:00
Sagar Ghuge	06807e1948	glsl: Fix round64 conversion function Fix round64 function to handle round to nearest even cases specially with positive and negative numbers with fraction part 0.5. v2: 1) Simplify unused bits (Elie Tournier) Fixes: KHR-GL45.gpu_shader_fp64.builtin.round_dvec2 KHR-GL45.gpu_shader_fp64.builtin.round_dvec3 KHR-GL45.gpu_shader_fp64.builtin.round_dvec4 KHR-GL45.gpu_shader_fp64.builtin.roundeven_double KHR-GL45.gpu_shader_fp64.builtin.roundeven_dvec2 KHR-GL45.gpu_shader_fp64.builtin.roundeven_dvec3 KHR-GL45.gpu_shader_fp64.builtin.roundeven_dvec4 Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Elie Tournier <elie.tournier@collabora.com> Acked-by: Anuj Phogat <anuj.phogat@gmail.com>	2019-06-25 15:19:10 -07:00
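The round-to-nearest-even behaviour that `06807e1948` fixes for fraction 0.5 can be stated compactly on native doubles. This is only a reference sketch of the expected results, assuming ordinary IEEE doubles; the commit itself works on the emulated fp64 representation, `round_even` is a hypothetical name, and this version does not preserve the sign of zero.

```c
#include <assert.h>

/* Round to the nearest integer, with ties going to the even neighbour. */
static double round_even(double x)
{
    const double two52 = 4503599627370496.0;   /* 2^52 */

    /* NaN, infinities and |x| >= 2^52 have no fractional part to round. */
    if (x != x || x >= two52 || x <= -two52)
        return x;

    double t = (double)(long long)x;           /* truncate toward zero */
    double frac = x - t;                       /* same sign as x, |frac| < 1 */
    double mag = frac < 0.0 ? -frac : frac;

    /* Step away from zero when past the midpoint, or exactly at the
     * midpoint with an odd truncated value. */
    if (mag > 0.5 || (mag == 0.5 && ((long long)t & 1)))
        t += (x < 0.0) ? -1.0 : 1.0;

    return t;
}
```

The tie cases are the interesting ones: 0.5 and 2.5 round down to 0 and 2, 1.5 rounds up to 2, and the negative values mirror this (-1.5 and -2.5 both give -2).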
Alyssa Rosenzweig	e8f4c9f56c	panfrost/ci: Add RK3288 flipflops I don't want to deal with right now Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-25 13:42:58 -07:00
Alyssa Rosenzweig	70a87a915d	panfrost/ci: Update failures list A ton of tests were fixed by this series. A few were incorrectly passing before (QualityError, for instance) and now are explicitly failing. A few legitimate regressions but overwhelmingly positive. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-25 13:39:18 -07:00
Alyssa Rosenzweig	ddf5f04edf	panfrost/ci: Set MESA_GLES_VERSION_OVERRIDE=3.0 Fixes cube map tests due to disagreements between Mesa, dEQP, and the spec... Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Cc: Tomeu Vizoso <tomeu.vizoso@collabora.com>	2019-06-25 13:39:18 -07:00
Alyssa Rosenzweig	33f3cac1c2	panfrost/ci: Run full set of mipmap tests Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-25 13:39:18 -07:00
Alyssa Rosenzweig	f34635c699	panfrost: Advertise support for other 8-bit UNORM formats Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-25 13:39:18 -07:00
Alyssa Rosenzweig	310ca6ba40	panfrost: Use pipe_surface->format directly in blitter Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-25 13:39:18 -07:00
Alyssa Rosenzweig	5cfb4248c6	panfrost: Invert swizzle for rendering Fixes rendering to e.g. alpha textures. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-25 13:39:18 -07:00
Alyssa Rosenzweig	b96f119d85	panfrost: Honour first_layer...last_layer when sampling Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-25 13:39:18 -07:00
Alyssa Rosenzweig	0ad17f56ae	panfrost: Use the sampler_view target (not the textures) u_blitter gets "special treatment" and uses this mechanism to cast cube maps to 2D textures in order to texelFetch them. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-25 13:39:18 -07:00
Alyssa Rosenzweig	faf8ad4875	panfrost/midgard: Assert guard texelFetch against cubemaps Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-25 13:39:18 -07:00
Alyssa Rosenzweig	124f6b541b	panfrost: Zero pixels in any axis is zero pixels total Multiplication, not addition, so switch the logic operator. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-25 13:39:17 -07:00
Alyssa Rosenzweig	06211f45a7	panfrost: Respect mip level when wallpapering Fixes DATA_INVALID_FAULT raised when wallpapering while rendering to a mipmap. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-25 13:39:17 -07:00
Alyssa Rosenzweig	6729912a4b	panfrost/midgard: Fixup NIR texture op In a vertex shader, a tex op should map to txl, as there must be a LOD given to the hardware (implicitly or explicitly). Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-25 13:39:17 -07:00
Alyssa Rosenzweig	17adcfc008	panfrost: Support (non-)seamless cube maps Identify the seamless cubemap bit and passthrough the Gallium state rather than setting unconditionally. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-25 13:39:17 -07:00
Alyssa Rosenzweig	3e6c6bb0af	panfrost: Merge checksum buffer with main BO This is similar to the AFBC merge; now all (non-imported) buffers use a common backing buffer. Reenables checksumming, eliminating a performance regression. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-25 13:39:17 -07:00
Alyssa Rosenzweig	a9fc1c8399	panfrost/decode: Limit MRT blend count I thought I already fixed this. Maybe that was a dream...? Then again, I might be dreaming now. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-25 13:39:17 -07:00
Alyssa Rosenzweig	65e9d9b625	panfrost: Clamp tile coordinates before job submission Fixes TILE_RANGE_FAULT raised on some tests in dEQP-GLES3.functional.fbo.blit.* Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-25 13:39:17 -07:00
Alyssa Rosenzweig	7005c0d83b	panfrost: Use dedicated u_blitter context for wallpapers The main ctx->blitter instance should be reserved for blits originated from Gallium (like mipmap generation). Since wallpapering is conceptually different -- wallpaper blits can be triggered by Gallium blits -- the blitter pipes must be separate to avoid potential u_blitter recursion. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-25 13:39:17 -07:00
Alyssa Rosenzweig	64b7bd3f90	panfrost: Sanity check layer It doesn't make sense to try to render to multiple array elements at once. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-25 13:39:17 -07:00
Alyssa Rosenzweig	eb3c09716b	panfrost: Divide array_size by 6 for cubemaps Addresses the disparity between Mali and Gallium definitions of array_size. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-25 13:39:17 -07:00
Alyssa Rosenzweig	65bc56b568	panfrost: Use get_texture_address for framebuffer computations Allows for sharing some code as well as theoretically allowing cubemap rendering. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-25 13:39:17 -07:00
Alyssa Rosenzweig	3609b50a64	panfrost: Merge AFBC slab with BO backing Rather than tracking AFBC memory "specially", just use the same codepath as linear and tiled. Less things to mess up, I figure. This allows us to use the standard setup_slices() call with AFBC resources, allowing mipmapped AFBC resources. Unfortunately, we do have to disable AFBC (and checksumming) in the meantime to avoid functional regressions, as we don't know _a priori_ if we'll need to access a resource from software (which is not yet hooked up with AFBC) and we don't yet have routines to switch the layout of a BO at runtime. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-25 13:39:17 -07:00
Alyssa Rosenzweig	aea3f0ac1d	panfrost: Z/S can't be tiled As far as we know, Utgard-style tiling only works for color render targets, not depth/stencil, so ensure we don't try to tile it (rather than compress or plain old linear) and drive ourselves into a corner. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-25 13:39:17 -07:00
Alyssa Rosenzweig	ad56dd4e97	panfrost: Enable mipmapping Now the autogeneration of mipmaps is working (via u_blitter), we can finally enable mipmaps! Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-25 13:39:17 -07:00
Alyssa Rosenzweig	5aeffa9517	panfrost: Enable blitting Now that all the prerequisites breaking u_blitter are fixed, we can finally hook up panfrost_blit. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-25 13:39:17 -07:00
Alyssa Rosenzweig	06d192c742	panfrost: Allow texelFetch for wallpaper blits We just implemented the routine; we may as well use it. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-25 13:39:17 -07:00
Alyssa Rosenzweig	f4bb7f096c	panfrost/midgard: Implement texelFetch (2D only) txf instructions can result from blits, so handle them rather than crash. Only works for 2D textures (not even 2D array texture) due to a register allocation constraint that may not be sorted for a while. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-25 13:39:17 -07:00
Alyssa Rosenzweig	4ac42f2b38	panfrost: Skip flushes only for wallpapers, not any blit We need the flush from u_blitter for a normal blit (e.g. for mipmaps); it's only wallpaper-related blits that are special-cased. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-25 13:39:17 -07:00
Alyssa Rosenzweig	ffcc4d1c4e	panfrost: Handle generate_mipmap ourselves To avoid interference with the wallpaper code, we need to do some state tracking when generating mipmaps. In particular, we need to mark the generated layers as invalid before generating the mipmap, so we don't try to backblit them if they already had content. Likewise, we need to flush both before and after generating a mipmap since our usual set_framebuffer_state flushing isn't quite there yet. Ideally better optimizations would save the flush but I digress. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-25 13:39:17 -07:00
Alyssa Rosenzweig	f57dfe4cdd	panfrost: Disable mipmapping if necessary If a mipfilter is not set, it's legal to have an incomplete mipmap; we should handle this accordingly. An "easy way out" is to rig the LOD clamps. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-25 13:39:17 -07:00
Kenneth Graunke	748e5dac72	intel/blorp: Disable sampler state prefetching on Gen11 Sampler state prefetching is broken on Gen11, and WA_160668216 says to disable it. Apparently sampler state prefetching also has basically zero impact on performance, so we don't need to worry there. i965, anv, and iris already handle this correctly, but we missed BLORP. Ideally the kernel should globally disable this by writing SARCHKMD, at which point we wouldn't have to worry about it. But let's be defensive and handle it ourselves too. v2: separate out from BTP workaround in case we change that eventually Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com> [v1]	2019-06-25 13:29:31 -07:00
Jason Ekstrand	0a364a4a74	anv/descriptor_set: Only write texture swizzles if we have an image view When immutable samplers are set we call write_image_view with a NULL image view. This causes issues on IVB where we have to fake texture swizzling. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110999 Fixes: `d2aa65eb18` "anv: Emulate texture swizzle in the shader when..."	2019-06-25 19:43:25 +00:00
Chia-I Wu	74786b3aa3	virgl: add VIRGL_DEBUG_XFER When set, do as requested and skip any transfer optimization. Signed-off-by: Chia-I Wu <olvaffe@gmail.com> Reviewed-By: Gert Wollny <gert.wollny@collabora.com> Reviewed-By: Alexandros Frantzis <alexandros.frantzis@collabora.com>	2019-06-25 12:01:45 -07:00
Chia-I Wu	e93d918b65	virgl: add VIRGL_DEBUG_SYNC When set, wait after each flush. Signed-off-by: Chia-I Wu <olvaffe@gmail.com> Reviewed-By: Gert Wollny <gert.wollny@collabora.com> Reviewed-By: Alexandros Frantzis <alexandros.frantzis@collabora.com>	2019-06-25 12:01:43 -07:00
Chia-I Wu	119b5701e1	virgl: fix the value of VIRGL_DEBUG_BGRA_DEST_SWIZZLE VIRGL_DEBUG_BGRA_DEST_SWIZZLE should use bit 3. Make some cosmetic changes as well. Fixes: `a478e56fbd` virgl: Add debug flag to bypass driconf to enable the BGRA tweaks Signed-off-by: Chia-I Wu <olvaffe@gmail.com> Reviewed-By: Gert Wollny <gert.wollny@collabora.com> Reviewed-By: Alexandros Frantzis <alexandros.frantzis@collabora.com>	2019-06-25 12:01:14 -07:00
Samuel Pitoiset	8ea7ee1536	radv: rename and re-document cache flush flags SMEM and VMEM caches are L0 on gfx10. Ported from RadeonSI. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-06-25 18:38:37 +02:00
Samuel Pitoiset	5411f47056	radv: set DISABLE_CONSTANT_ENCODE_REG to 1 for Raven2 Ported from RadeonSI, will be emitted for GFX10 too. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-06-25 16:45:15 +02:00
Samuel Pitoiset	34bef8a0d7	radv: clear CMASK layers instead of the whole buffer on GFX8 This reduces the size of fill operations needed to clear CMASK for layered color textures. GFX9 unsupported for now. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-06-25 16:36:28 +02:00
Samuel Pitoiset	476b907a3b	radv: clear FMASK layers instead of the whole buffer on GFX8 This reduces the size of fill operations needed to clear FMASK for layered color textures. GFX9 unsupported for now. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-06-25 16:36:25 +02:00
Samuel Pitoiset	a5ba386b3f	radv: always initialize levels without DCC as fully expanded This fixes a rendering issue with RoTR/DXVK. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-06-25 16:36:23 +02:00
Sergii Romantsov	1931c97a1d	i965: leaking of upload-BO with push constants If any of the VS members uses_firstvertex, uses_baseinstance, uses_drawid, or uses_is_indexed_draw is enabled, a leak may happen. The call to gen6_upload_push_constants allocates stage_stat->push_const_bo. It then takes a pointer from push_const_bo to draw_params_bo (in the call to brw_prepare_shader_draw_parameters by brw_upload_data) and takes a reference that is never unreferenced. Fixes leak: 136 bytes in 1 blocks are definitely lost in loss record 6 of 13 at 0x4C31B25: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) by 0xC2B64B7: bo_alloc_internal (brw_bufmgr.c:596) by 0xC2B6748: brw_bo_alloc (brw_bufmgr.c:672) by 0xC314BB3: brw_upload_space (intel_upload.c:88) by 0xC2EBBC5: gen6_upload_push_constants (gen6_constant_state.c:155) by 0xC9E4FA6: gen9_upload_vs_push_constants (genX_state_upload.c:3300) by 0xC2E0EDA: check_and_emit_atom (brw_state_upload.c:540) by 0xC2E0EDA: brw_upload_pipeline_state (brw_state_upload.c:659) by 0xC2E0FF1: brw_upload_render_state (brw_state_upload.c:681) by 0xC2C5D2D: brw_draw_single_prim (brw_draw.c:1052) by 0xC2C62CB: brw_draw_prims (brw_draw.c:1175) by 0xC488AD1: vbo_exec_vtx_flush (vbo_exec_draw.c:386) by 0xC485270: vbo_exec_FlushVertices_internal (vbo_exec_api.c:652) Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reported-by: Yevhenii Kolesnikov <yevhenii.kolesnikov@globallogic.com> Signed-off-by: Sergii Romantsov <sergii.romantsov@globallogic.com>	2019-06-25 12:26:25 +00:00
Juan A. Suarez Romero	81d28c69ea	docs: update calendar, add news item and link release notes for X.Y.Z Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>	2019-06-25 13:02:37 +02:00
Juan A. Suarez Romero	2c06071521	docs: fix some typos in 19.0.7 release notes Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>	2019-06-25 13:01:56 +02:00
Juan A. Suarez Romero	4a2b502a6b	docs: add sha256 checksums for 19.1.1 Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com> (cherry picked from commit `d54dc24d6d`)	2019-06-25 12:56:49 +02:00
Juan A. Suarez Romero	5f7c66676f	docs: add release notes for 19.1.1 Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com> (cherry picked from commit `22eddd8b9d`)	2019-06-25 12:56:46 +02:00
Tapani Pälli	7a6e5a4bc3	intel/compiler: silence a warning of using different enum type Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-06-25 10:09:22 +03:00
Eric Engestrom	e9286eb60b	egl: replace dead vfunc with an error st/egl used to support eglCreatePbufferFromClientBuffer, but now that it's gone, any call to it would segfault. Let's return a nice error instead. Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2019-06-25 07:47:19 +01:00
Eric Engestrom	eeacd66324	egl: delete unused vfuncs Nobody ever uses these, so let's just hard code them instead. If an EGL driver ever comes around that needs them they're trivial to re-add. Suggested-by: Emil Velikov <emil.velikov@collabora.com> Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2019-06-25 07:47:19 +01:00
Eric Engestrom	83f01f5261	egl: drop empty eglfallbacks.c Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2019-06-25 06:36:54 +00:00
Eric Engestrom	757d2fb48d	egl: move eglGetSyncAttrib() fallback to eglapi.c Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2019-06-25 06:36:54 +00:00
Eric Engestrom	26d5ca44ba	egl: move eglSwapInterval() fallback to eglapi.c Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2019-06-25 06:36:54 +00:00
Eric Engestrom	9dc00c8433	egl: move eglSurfaceAttrib() fallback to eglapi.c Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2019-06-25 06:36:54 +00:00
Eric Engestrom	58be9d50a7	egl: move eglQuerySurface() fallback to eglapi.c Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2019-06-25 06:36:54 +00:00
Eric Engestrom	b792b3ebd7	egl: move eglQueryContext() fallback to eglapi.c Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2019-06-25 06:36:54 +00:00
Eric Engestrom	7f848f9713	egl: move eglGetConfigAttrib() fallback to eglapi.c Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2019-06-25 06:36:54 +00:00
Eric Engestrom	1b76cca40f	egl: move eglChooseConfig() fallback to eglapi.c Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2019-06-25 06:36:53 +00:00
Eric Engestrom	b883d7f567	egl: move eglGetConfigs() fallback to eglapi.c Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2019-06-25 06:36:53 +00:00
Rob Clark	927fb50727	freedreno/a5xx: fix batch leak in fd5 blitter path Fixes: `3d198926a4` freedreno: use fd_bc_alloc_batch instead of fd_batch_create. Signed-off-by: Rob Clark <robdclark@chromium.org>	2019-06-24 18:43:20 -07:00
Marek Olšák	4a1421aa26	radeonsi: don't set spi_ps_input_* for monolithic shaders The driver doesn't use these values and ac_rtld has assertions expecting the value of 0. Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-06-24 21:04:10 -04:00
Marek Olšák	1d6e358c36	radeonsi: rename and re-document cache flush flags SMEM and VMEM caches are L0 on gfx10. Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-06-24 21:04:10 -04:00
Marek Olšák	aa8d6e0507	radeonsi: fix AMD_DEBUG=nofmask Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-06-24 21:04:10 -04:00
Marek Olšák	f46efacd01	radeonsi: flatten the switch for DPBB tunables Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>	2019-06-24 21:04:10 -04:00
Marek Olšák	ac4b1e2f0a	radeonsi: set the calling convention for inlined function calls otherwise the behavior is undefined Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>	2019-06-24 21:04:10 -04:00
Nicolai Hähnle	610e1a81f7	radeonsi: refactor si_update_vgt_shader_config We'll have to extend this at some point, and using a bitfield union in this way makes it easier to get the right index without excessive branching. Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-06-24 21:04:10 -04:00
Nicolai Hähnle	bd3a3fd25a	amd/rtld: update the ELF representation of LDS symbols The initial prototype used a processor-specific symbol type, but feedback suggests that an approach using processor-specific section name that encodes the alignment analogous to SHN_COMMON symbols is preferred. This patch keeps both variants around for now to reduce problems with LLVM compatibility as we switch branches around. This also cleans up the error reporting in this function. Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-06-24 21:04:10 -04:00
Marek Olšák	0032f6b8a0	ac/surface: remove addrlib_family_rev_id Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-06-24 21:04:10 -04:00
Dylan Baker	032fe7d602	docs: update calendar, add news item and link release notes for 19.0.7	2019-06-24 16:24:05 -07:00
Dylan Baker	7badae431a	docs: Add SHA256 sums for 19.0.7	2019-06-24 16:22:21 -07:00
Dylan Baker	8c0e5c4cfc	Docs add 19.0.7 release notes	2019-06-24 16:22:20 -07:00
Ian Romanick	ee1c69fadd	glsl: Don't increase the iteration count when there are no terminators Incrementing the iteration count was intended to fix an off-by-one error when the first terminator was superseded by a later terminator. If there is no first terminator or later terminator, there is no off-by-one error. Incrementing the loop count creates one. This can be seen in loops like: do { if (something) { // No breaks or continues here. } } while (false); Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> Tested-by: Abel Briggs <abelbriggs1@hotmail.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110953 Fixes: `646621c66d` ("glsl: make loop unrolling more like the nir unrolling path")	2019-06-24 14:32:33 -07:00
Eric Anholt	5c4289dd4b	freedreno: Only upload the used part of UBO0 to the constant buffer. We were pessimistically uploading all of it in case of indirection, but we can just bump that when we encounter indirection. total constlen in shared programs: 2529623 -> 2485933 (-1.73%) Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com> Reviewed-by: Rob Clark <robdclark@gmail.com>	2019-06-24 14:23:07 -07:00
Eric Anholt	852704976a	freedreno: Stop treating UBO 0 specially in UBO uploading. ir3_nir_analyze_ubo_ranges() has already told us how much of cb0 we need to upload (all of it, since it will lower indirect UBO 0 accesses from load_ubo back to indirection on the constant buffer). Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com> Reviewed-by: Rob Clark <robdclark@gmail.com>	2019-06-24 14:23:07 -07:00
Rob Clark	572c76fd88	freedreno: Clamp UBO uploads to the constlen decided by the shader. If the NIR-level analysis decided to move UBO loads to the constant file, but the backend decided not to load those constants, we could upload past the end of constlen. This is particularly relevant for pre-a6xx, where we emit a different constlen between bin and render variants. (Fix by Rob, commit message by anholt) Reviewed-by: Eric Anholt <eric@anholt.net>	2019-06-24 14:23:07 -07:00
Alyssa Rosenzweig	c1ca138475	panfrost: Allow up to 16 UBOs This is the hardware max, as far as I can tell. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-24 12:57:40 -07:00
Alyssa Rosenzweig	b670becb1e	panfrost: DRY between shader stage setup Just a little spring cleanup, extending UBOs to vertex shaders in the process. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-24 12:57:40 -07:00
Alyssa Rosenzweig	5e2c3d40bd	panfrost/midgard: Implement UBO reads UBOs and uniforms now use a common code path with an explicit `index` argument passed, enabling UBO reads. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-24 12:57:40 -07:00
Alyssa Rosenzweig	f28e9e868b	panfrost: Handle disabled/empty UBOs Prevents an assert(0) later in this (not so edge) case. We still have to have a dummy there. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-24 12:57:40 -07:00
Alyssa Rosenzweig	bd2fc60a8a	panfrost: Identify "uniform buffer count" bits We've known about this for a while, but it was never formally in the machine header files / decoder, so let's add them in. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-24 12:57:40 -07:00
Alyssa Rosenzweig	856e03902b	panfrost: Upload UBOs Now that all the counting is sorted, it's a matter of passing along a GPU address and going. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-24 12:57:40 -07:00
Alyssa Rosenzweig	4c6d751274	panfrost: Allow for dynamic UBO count We already uploaded UBOs, but only a fixed number (1) for uniforms; let's upload as many as we compute we need. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-24 12:57:40 -07:00
Alyssa Rosenzweig	5d60be4e24	panfrost: Report UBO count We look at the highest set bit in the UBO enable mask to work out the maximum indexable UBO, i.e. the UBO count as we need to report to the hardware. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-24 12:57:40 -07:00
Alyssa Rosenzweig	ca2caf01df	panfrost: Constant buffer refactor We refactor panfrost_constant_buffer to mirror v3d's constant buffer handling, to enable UBOs as well as a single set of uniforms. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-24 12:57:40 -07:00
Alyssa Rosenzweig	f35f373850	panfrost: Replace varyings for point sprites This doesn't handle Y-flipping, but it's good enough to render the stars in Neverball. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-24 12:56:22 -07:00
Alyssa Rosenzweig	be03060066	panfrost: Track point sprites in fragment shader key In preparation for lowering point sprites, track them like we track alpha testing state. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-24 12:56:16 -07:00
Caio Marcelo de Oliveira Filho	7fc907118e	i965: Move resources lowering after NIR linking Those either depend on information filled by the NIR linking steps OR are restricted by those: - gl_nir_lower_samplers: depends on UniformStorage being set by the linker. - brw_nir_lower_image_load_store: After `6981069fc8` "i965: Ignore uniform storage for samplers or images, use binding info" we want this pass to happen after gl_nir_lower_samplers. - gl_nir_lower_buffers: depends on UniformBlocks and SharedStorageBlocks being set by the linker. For the regular GLSL code path, those datastructures are filled earlier. For NIR linking code path we need to generate the nir_shader first then process it -- and currently the processing works with all shaders together. So move the passes out of brw_create_nir into its own function, called by the brwProgramStringNotify and brw_link_shader(). This patch prepares ground for ARB_gl_spirv, that will make use of NIR linker. Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-06-24 11:44:03 -07:00
Caio Marcelo de Oliveira Filho	6e2ff10886	glsl/nir: Fix copying 64-bit values in uniform storage The iterator `i` already walks the right amount now that is incremented by `dmul`, so no need to `* 2`. Fixes invalid memory access in upcoming ARB_gl_spirv tests. Failure bisected by Arcady Goldmints-Orlov. Fixes: `b019fe8a5b` "glsl/nir: Fix handling of 64-bit values in uniform storage" Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-06-24 11:32:14 -07:00
Caio Marcelo de Oliveira Filho	390ff8ac54	glsl/nir: Fix copying vector constant values For n_columns == 1, we have a vector which is handled by the else case. Fixes invalid memory access in upcoming ARB_gl_spirv tests. Failure bisected by Arcady Goldmints-Orlov. Fixes: `81e51b412e` "nir: Make nir_constant a vector rather than a matrix" Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-06-24 11:32:14 -07:00
Daniel Schürmann	0daeb1d127	amd/common: lower bitfield_extract to ubfe/ibfe. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2019-06-24 18:42:20 +02:00
Daniel Schürmann	48a75e7af0	amd/common: lower bitfield_insert to bfm & bitfield_select Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2019-06-24 18:42:20 +02:00
Daniel Schürmann	a8b0b6e52b	nir: introduce lowering of bitfield_insert to bfm and a new opcode bitfield_select. bitfield_select is defined as: bitfield_select(mask, base, insert) = (mask & base) | (~mask & insert) matching the behavior of AMD's BFI instruction. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2019-06-24 18:42:20 +02:00
Daniel Schürmann	1403c3a7bf	nir/algebraic: Use unsigned comparison when lowering bitfield insert/extract This lets us use the optimization pattern (('ult', 31, ('iand', b, 31)), False) to remove the bcsel instruction for code originating in D3D shaders. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2019-06-24 18:42:20 +02:00
Daniel Schürmann	4eeb49ea71	nir/algebraic: Remove unnecessary iand of [iu]bfe and bfm sources The [iu]bfe and bfm instructions are defined to only use the five least significant bits. This optimizes a common pattern from D3D -> SPIR-V translation. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2019-06-24 18:42:20 +02:00
Daniel Schürmann	165b7f3a44	nir: define behavior of nir_op_bfm and nir_op_u/ibfe according to SM5 spec. That is: the five least significant bits provide the values of 'bits' and 'offset' which is the case for all hardware currently supported by NIR and using the bfm/bfe instructions. This patch also changes the lowering of bitfield_insert/extract using shifts to not use bfm and removes the flag 'lower_bfm'. Tested-by: Eric Anholt <eric@anholt.net> Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2019-06-24 18:42:20 +02:00
Daniel Schürmann	a74f256c58	nir/algebraic: add optimization pattern for ('ult', a, ('and', b, a)) and friends. These optimizations are based on the fact that 'and(a,b) <= umin(a,b)'. For AMD, this series moves the optimization from LLVM to NIR, so currently no vkpipeline-db changes here. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2019-06-24 18:42:20 +02:00
Andreas Baierl	fa6ea16a8d	lima/ppir: Add fsat op Signed-off-by: Andreas Baierl <ichgeh@imkreisrum.de> Reviewed-by: Qiang Yu <yuq825@gmail.com>	2019-06-24 16:41:33 +02:00
Andreas Baierl	f1d89bbc2f	lima/ppir: Add fneg op Signed-off-by: Andreas Baierl <ichgeh@imkreisrum.de> Reviewed-by: Qiang Yu <yuq825@gmail.com>	2019-06-24 16:41:33 +02:00
Andreas Baierl	512397058d	lima/ppir: Add fabs op Signed-off-by: Andreas Baierl <ichgeh@imkreisrum.de> Reviewed-by: Qiang Yu <yuq825@gmail.com>	2019-06-24 16:41:33 +02:00
Eric Engestrom	2d2e824fae	util: support "y" and "n" in env_var_as_boolean() Suggested-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>	2019-06-24 12:49:13 +00:00
Andreas Baierl	0cb9ce12fd	lima/ppir: lower ffma in ppir Since we cannot handle ffma in ppir, lower it on nir level already. Signed-off-by: Andreas Baierl <ichgeh@imkreisrum.de> Reviewed-by: Qiang Yu <yuq825@gmail.com>	2019-06-24 11:57:57 +00:00
Samuel Pitoiset	946193ae00	radv: add support for VK_AMD_buffer_marker This simple extension might be useful for debugging purposes. GAPID has support for it. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-06-24 10:50:54 +02:00
Tapani Pälli	ff77b0415b	meson: error out if platforms contains empty string Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110939 Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Eric Engestrom <eric@engestrom.ch>	2019-06-24 08:40:18 +03:00
Nataraj Deshpande	d94fca5420	anv: Add HAL_PIXEL_FORMAT_IMPLEMENTATION_DEFINED in vk_format When HAL_PIXEL_FORMAT_IMPLEMENTATION_DEFINED is used, then the platform gralloc module will select a format based on the usage flags provided by the camera device and the other endpoint of the stream. The patch fixes crash in vulkan when the test is run with camera stream set to HAL_PIXEL_FORMAT_IMPLEMENTATION_DEFINED. Test: android.graphics.cts.CameraVulkanGpuTest#testCameraImportAndRendering on chromebook with camera HAL3. v2: use AHARDWAREBUFFER_FORMAT_IMPLEMENTATION_DEFINED and take AHARDWAREBUFFER_USAGE_CAMERA_MASK in to account (Gurchetan) Fixes: `f1654fa7e3` "anv/android: support creating images from external format" Signed-off-by: Nataraj Deshpande <nataraj.deshpande@intel.com> Signed-off-by: Gurchetan Singh <gurchetansingh@chromium.org> Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org> Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Acked-by: Jason Ekstrand <jason@jlekstrand.net>	2019-06-24 08:28:18 +03:00
Timur Kristóf	3b6d787e40	iris: move sysvals to their own constant buffer This commit moves the sysvals to a separate, new constant buffer at the end (before the shader constants). It also allows us to remove the special handling we had for cbuf0, and enables all constant buffers to support user-specified resources and user buffers. v2: (by Kenneth Graunke) - Rebase on the previous patch to fix system value uploading. - Fix disk cache num_cbufs calculation - Fix passthrough TCS to report num_cbufs = 1 so upload actually occurs - Change upload_sysvals to assert that num_cbufs > 0 when num_system_values > 0. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-06-23 18:33:23 +02:00
Kenneth Graunke	ebc8c20b3e	iris: Mark cbuf0 as not needing uploading every single time I neglected to mark cbuf0_needs_upload = false after uploading it. The obvious fix regressed user clip plane tests, because of a second bug: we also forgot to mark that they may need re-uploading when changing shader programs (which may have more or less system values). Thanks to Timur Kristóf for catching the original issue. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>	2019-06-23 18:32:11 +02:00
Eric Engestrom	188dbb1679	Revert "egl: drop empty eglfallbacks.c" and "egl: move fallback calls to eglapi.c" This reverts commits `cc4b68a801` and `b27fb3eaca`. These caused a bunch of EGLSync tests to crash when they were previously failing. I have a hunch the tests are doing something wrong, like using extensions without checking for their support, but until the issue is investigated I'm just reverting these commits. Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>	2019-06-22 21:59:06 +01:00
Eric Engestrom	cc4b68a801	egl: drop empty eglfallbacks.c Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2019-06-22 15:17:42 +00:00
Eric Engestrom	b27fb3eaca	egl: move fallback calls to eglapi.c Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2019-06-22 15:17:42 +00:00
Eric Engestrom	262b767023	egl: drop `_eglReturnFalse()` fallbacks v2: drop them altogether, they should never get called in the first place (Emil) Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2019-06-22 15:17:42 +00:00
Eric Engestrom	82487ede62	egl: remove unnecessary eglGetProcAddress() fallback No need to add a function that returns `false` only to be cast into a pointer, we can just use the existing `return NULL` :) Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2019-06-22 15:17:42 +00:00
Eric Engestrom	30ecd86947	egl: remove NULL assignments after calloc() Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2019-06-22 15:17:42 +00:00
Eric Engestrom	64c7c05b71	egl: move bad_param check further up This way other functions added in these entrypoints don't need to check anything. Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2019-06-22 15:17:42 +00:00
Kenneth Graunke	262787b9bc	iris: Drop bo != NULL check from blorp 48b invalidate function. There is always a BO.	2019-06-21 20:50:42 -05:00
Kenneth Graunke	5da37a826b	Revert "iris: Don't check VF address high bits when there is no buffer." This reverts commit `db8f57a5cb`. This is bonkers. There will always be a BO.	2019-06-21 20:50:42 -05:00
Eric Anholt	4449572c47	freedreno: Only upload UBO pointers for UBOs that haven't been lowered. total constlen in shared programs: 2485933 -> 2462236 (-0.95%) Reviewed-by: Rob Clark <robdclark@gmail.com> Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>	2019-06-21 17:14:43 -07:00
Eric Anholt	01d0bad9ef	freedreno: Remove silly return from ir3_optimize_nir(). We only ever return the shader we were passed in (but internally modified). Reviewed-by: Rob Clark <robdclark@gmail.com> Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>	2019-06-21 17:14:43 -07:00
Eric Anholt	56842d33d5	freedreno: Fix up end range of unaligned UBO loads. We need the constants uploaded to cover the NIR offset plus the size, not the aligned-down start of our upload range plus the size. Fixes mistaken UBO analysis with mat3 loads. Fixes: `893425a607` ("freedreno/ir3: Push UBOs to constant file") Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com> Reviewed-by: Rob Clark <robdclark@gmail.com>	2019-06-21 17:14:43 -07:00
Eric Anholt	5e7c96b95d	freedreno: Fix UBO load range detection on booleans. NIR 1-bit bool dests will have a bit size of 1, and thus a calculated "bytes" of 0. load_ubo is always loading from dwords in the source. Fixes: `893425a607` ("freedreno/ir3: Push UBOs to constant file") Reviewed-by: Rob Clark <robdclark@gmail.com>	2019-06-21 17:14:43 -07:00
Eric Anholt	23a7feda63	freedreno: Stop reporting max_const in shader-db. We end up uploading constlen regardless, so max_const would only get you slightly improved granularity in const usage in comparison. Reviewed-by: Rob Clark <robdclark@gmail.com>	2019-06-21 17:14:43 -07:00
Eric Anholt	ee2e1e85d4	freedreno: Include binning shaders in shader-db. We want to see if we've improved our binning VS output, as well as the render VS. Reviewed-by: Rob Clark <robdclark@gmail.com>	2019-06-21 17:14:43 -07:00
Marek Olšák	8ab9f3a857	include: update GL headers from the registry Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>	2019-06-21 19:00:52 -04:00
Alyssa Rosenzweig	a6bef350ed	panfrost: Fix unused variable warning Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-21 13:06:49 -07:00
Boris Brezillon	5f81669d88	panfrost: Remove the panfrost_driver abstraction The non-DRM backend is gone. Let's get rid of the panfrost_driver abstraction and call the panfrost_drm_xxx() functions directly. Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-21 13:01:49 -07:00
Boris Brezillon	e8257f3de8	panfrost: Remove the perf counters interface The DRM backend has a dummy implementation and the non-DRM backend is gone, so let's remove this perf counter interface. Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-21 13:01:12 -07:00
Tomeu Vizoso	0bcbccf887	panfrost: ci: Fix parsing of crashed tests Without this fix, LAVA isn't parsing crashes as failed tests, because the shell logging is interspersed within the fake deqp output. Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com> Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-21 09:35:35 -07:00
Alyssa Rosenzweig	d38ac21297	panfrost: Conditionally submit fragment job If there are no tiling jobs and no clears, there is no need to submit a fragment job (relevant for transform feedback). Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-21 09:35:35 -07:00
Alyssa Rosenzweig	cd5d618b5c	panfrost: Implement rasterizer discard D'aww, look how cute that is now that scoreboarding is set up. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-21 09:35:31 -07:00
Alyssa Rosenzweig	26c5a145a7	panfrost: Track buffer initialization We want to know if a given slice of a buffer is initialized at a particular point in the execution of the program. This is accomplished easily enough -- start out uninitialized and upon an operation writing to the buffer, mark it initialized. The motivation is to optimize away expensive operations (like wallpaper blits) when reading from an uninitialized buffer; since it's uninitialized, the results of these operations are undefined, and it's legal to take the fast path ^_^ Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-21 09:35:09 -07:00
Alyssa Rosenzweig	f0854745fd	panfrost: Implement command stream scoreboarding This is a rather complex change, adding a lot of code but ideally cleaning up quite a bit as we go. Within a batch (single frame), there are multiple distinct Mali job types: SET_VALUE, VERTEX, TILER, FRAGMENT for the few that we emit right now (eventually more for compute and geometry shaders). Each hardware job has a mali_job_descriptor_header, which contains three fields of interest: job index, a dependencies list, and a next job pointer. The next job pointer in each job is used to form a linked list of submitted jobs. Easy enough. The job index and dependencies list, however, are used to form a dependency graph (a DAG, where each hardware job is a node and each dependency is a directed edge). Internally, this sets up a scoreboarding data structure for the hardware to dispatch jobs in parallel, enabling (for example) vertex shaders from different draws to execute in parallel while there are strict dependencies between tiling the geometry of a draw and running that vertex shader. For a while, we got by with an incredible series of total hacks, manually coding indices, lists, and dependencies. That worked for a moment, but combinatorial kaboom kicked in and it became an unmaintainable mess of spaghetti code. We can do better. This commit explicitly handles the scoreboarding by providing high-level manipulation for jobs. Rather than a command like "set dependency #2 to index 17", we can express quite naturally "add a dependency from job T on job V". Instead of some open-coded logic to copy a draw pointer into a delicate context array, we now have an elegant exposed API to simply "queue a job of type XYZ". The design is influenced by both our current requirements (standard ES2 draws and u_blitter) as well as the need for more complex scheduling in the future. 
For instance, blits can be optimized to use only a tiler job, without a vertex job first (since the screen-space vertices are known ahead-of-time) -- causing tiler-only jobs. Likewise, when using transform feedback with rasterizer discard enabled, vertex jobs are created (to run vertex shaders) with no corresponding tiler job. Both of these cases break the original model and could not be expressed with the open-coded logic. More generally, this will make it easier to add support for compute shaders, geometry shaders, and fused jobs (an optimization available on Bifrost). Incidentally, this moves quite a bit of state from the driver context to the batch, which helps with Rohan's refactor to eventually permit pipelining across framebuffers (one important outstanding optimization for FBO-heavy workloads). v2: Add comment explaining the meaning of "primary batch" as suggested by Tomeu (trivial - not reviewed). Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com> Reviewed-by: Rohan Garg <rohan.garg@collabora.com>	2019-06-21 09:35:02 -07:00
Anuj Phogat	e334a595e4	intel/icl: Add new ICL PCI-IDs Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-06-21 08:38:08 -07:00
Jason Ekstrand	1a9e5b9094	anv: Implement "pop-free" clipping This is the preferred clipping mode since it doesn't mean your points disappear the moment part of the point crosses over the edge of the viewport and that lines have weird endpoints at viewport edges. We've just never bothered to hook it up until now. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-06-21 14:18:59 +00:00
Jason Ekstrand	4a757d6c31	anv: Enable the guardband clip test In workloads where there is a lot of geometry drawn that crosses over the edge of the viewport, this should substantially improve clipper performance. Not really sure why it's taken 3 years to turn it on but we never got around to it. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-06-21 14:18:59 +00:00
Jason Ekstrand	13f0c278c5	i965,iris: Move guardband calculations to a common location Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-06-21 14:18:59 +00:00
Mauro Rossi	60c581b57d	android: virgl: fix libmesa_winsys_virgil_common build and dependencies Fixes the following building errors and resolves Bug 110922 Fixes gallium_dri target missing symbols at linking. external/mesa/src/gallium/winsys/virgl/drm/Android.mk: error: libmesa_winsys_virgl (STATIC_LIBRARIES android-x86_64) missing libmesa_winsys_virgl_common (STATIC_LIBRARIES android-x86_64) ... external/mesa/src/gallium/winsys/virgl/vtest/Android.mk: error: libmesa_winsys_virgl_vtest (STATIC_LIBRARIES android-x86_64) missing libmesa_winsys_virgl_common (STATIC_LIBRARIES android-x86_64) ... build/core/main.mk:728: error: exiting from previous errors. In file included from external/mesa/src/gallium/winsys/virgl/vtest/virgl_vtest_socket.c:34: external/mesa/src/gallium/winsys/virgl/vtest/virgl_vtest_winsys.h:35:10: fatal error: 'virgl_resource_cache.h' file not found ^~~~~~~~~~~~~~~~~~~~~~~~ 1 error generated. In file included from external/mesa/src/gallium/winsys/virgl/vtest/virgl_vtest_winsys.c:32: external/mesa/src/gallium/winsys/virgl/vtest/virgl_vtest_winsys.h:35:10: fatal error: 'virgl_resource_cache.h' file not found #include "virgl_resource_cache.h" ^~~~~~~~~~~~~~~~~~~~~~~~ 1 error generated. Fixes: `b18f09a` ("virgl: Introduce virgl_resource_cache") Signed-off-by: Mauro Rossi <issor.oruam@gmail.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Tested-by: Clayton Craft <clayton.a.craft@intel.com>	2019-06-21 15:53:29 +02:00
Mauro Rossi	cf389ba895	android: winsys/amdgpu,radv: fix generated amdgfxregs.h header dependencies Fix android building errors in winsys/amdgpu and radv due to 'amdgfxregs.h' not found. Changelog: amd/common - generated $(intermediated)/common path is added to exports winsys/amdgpu - libmesa_amd_common static dependency is added radv - correct generated $(intermediated)/common path is added to includes Fixes: `f480b8a` ("amd/common: use generated register header") Signed-off-by: Mauro Rossi <issor.oruam@gmail.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2019-06-21 15:53:23 +02:00
Samuel Pitoiset	9bf47fefe0	radv: add support for VK_KHR_depth_stencil_resolve Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-06-21 14:50:38 +02:00
Samuel Pitoiset	e67fc11c26	radv: pass sample locations for transitions before depth/stencil resolves HTILE decompressions need the user sample locations if specified in the current subpass. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-06-21 14:50:35 +02:00
Samuel Pitoiset	396da5c029	radv: clear the depth/stencil resolve attachment if necessary The driver might need to clear one aspect of the depth/stencil resolve attachment before performing the resolve itself. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-06-21 14:50:33 +02:00
Samuel Pitoiset	c7872237bf	radv: decompress HTILE if the resolve src image is compressed It's required to decompress HTILE before resolving with the compute path. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-06-21 14:50:27 +02:00
Samuel Pitoiset	29c4d44cee	radv: select the depth/stencil resolve method based on some conditions Only fall back to the compute path for layers. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-06-21 14:50:24 +02:00
Samuel Pitoiset	5cf350f565	radv: implement all depth/stencil resolve modes using compute This path supports layers but it requires decompressing HTILE before resolving. The driver also needs to fix up HTILE after the resolve. This path is probably slower than the graphics one. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-06-21 14:50:19 +02:00
Samuel Pitoiset	cdc6efddf9	radv: implement all depth/stencil resolve modes using graphics When using graphics, the driver doesn't need to decompress HTILE before resolving. This path currently doesn't support layers so we have to fall back to the compute path. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-06-21 14:50:15 +02:00
Samuel Pitoiset	e52ad9f845	radv: record if a render pass has depth/stencil resolve attachments Only supported with vkCreateRenderPass2(). Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-06-21 14:50:12 +02:00
Samuel Pitoiset	ac6369a2d0	radv: rename has_resolve to has_color_resolve Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-06-21 14:50:10 +02:00
Samuel Pitoiset	203f60ebf2	radv: emit framebuffer state from primary if secondary doesn't inherit it Otherwise fast color/depth clears can't work because they depend on the framebuffer. This fixes the following CTS (when the small hint is disabled): - dEQP-VK.geometry.layered.1d_array.secondary_cmd_buffer - dEQP-VK.geometry.layered.2d_array.secondary_cmd_buffer - dEQP-VK.geometry.layered.cube.secondary_cmd_buffer - dEQP-VK.geometry.layered.cube_array.secondary_cmd_buffer Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110810 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107986 Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-06-21 13:49:35 +02:00
Eric Engestrom	6a9dd62882	drisw: move build logic to build systems Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-06-21 11:35:39 +00:00
Tomeu Vizoso	1cbe2ad394	panfrost: ci: Exclude two more flip-flop from results These three tests pass on RK3399, but fail on RK3288: dEQP-GLES2.functional.shaders.matrix.div.const_lowp_mat2_mat2_vertex dEQP-GLES2.functional.shaders.operator.unary_operator.pre_increment_effect.highp_ivec4_vertex dEQP-GLES2.functional.shaders.texture_functions.vertex.texture2dprojlod_vec3 They reliably pass when run individually, but reliably fail when run in a full CI run. Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>	2019-06-21 10:45:12 +02:00
Gert Wollny	ef4429d9c5	gallium/st: Add Gallium hud to swrast drivers Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Reviewed-by: Dave Airlie <airlied@redhat.com>	2019-06-21 08:54:57 +02:00
Iago Toral Quiroga	4d8f82946b	v3d: flush jobs writing to vertex buffers used in the current draw call This can happen when any of our vertex buffers was written by a previous transform feedback draw. Fixes the following piglit tests: spec/ext_transform_feedback/position-render-bufferbase spec/ext_transform_feedback/position-render-bufferbase-discard spec/ext_transform_feedback/position-render-bufferoffset spec/ext_transform_feedback/position-render-bufferoffset-discard spec/ext_transform_feedback/position-render-bufferrange spec/ext_transform_feedback/position-render-bufferrange-discard Reviewed-by: Eric Anholt <eric@anholt.net>	2019-06-21 08:06:13 +02:00
Iago Toral Quiroga	eb44dcc219	v3d: flush jobs reading from transform feedback output buffers If we are about to write to a transform feedback buffer, we should make sure that we flush any prior work that intended to read from any of these buffers. Fixes piglit test: spec/ext_transform_feedback/immediate-reuse Reviewed-by: Eric Anholt <eric@anholt.net>	2019-06-21 08:06:13 +02:00
Iago Toral Quiroga	42572f2f7d	v3d: add a helper to check if transform feedback is enabled v2: We should be safe assuming that bind_vs != NULL (Eric) Reviewed-by: Eric Anholt <eric@anholt.net>	2019-06-21 08:06:13 +02:00
Dave Airlie	00a56acc23	llvmpipe: make remove_shader_variant static. this isn't used outside this file. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2019-06-21 10:27:57 +10:00
Eric Engestrom	955c63d364	util/os_file: resize buffer to what was actually needed Fixes: `316964709e` "util: add os_read_file() helper" Reported-by: Jason Ekstrand <jason@jlekstrand.net> Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-06-20 21:49:30 +00:00
Tomeu Vizoso	2743e34f20	panfrost: ci: Update expectations These tests have been fixed recently. Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>	2019-06-20 20:57:41 +02:00
Alyssa Rosenzweig	195e297a92	panfrost/midgard: Broadcast swizzle Fixes regression in shaders using ball/etc by explicitly passing through the number of channels in the NIR op and broadcasting the last components of the channel appropriately, as the Midgard ops are all vec4 implicitly but NIR can be vec2/3. v2: Don't also regress every other swizzle in Equestria. v3: Don't regress the swizzles at Canterlot High either. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Acked-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>	2019-06-20 20:52:04 +02:00
Kenneth Graunke	31de802e7e	iris: Use stream uploader for shader draw parameters. Most vertex data lives in user VBOs in IRIS_MEMZONE_OTHER, which typically have high bits set to 0xffff. The shader draw parameters were being uploaded in IRIS_MEMZONE_DYNAMIC, which have high bits set to 0x2. This was causing a lot of ping-ponging of high bits, leading to unnecessary VF cache flushing. Cuts 7.2% of the flushes in the Civilization VI demo on Kabylake GT2.	2019-06-20 13:32:16 -05:00
Kenneth Graunke	db8f57a5cb	iris: Don't check VF address high bits when there is no buffer. If there is no buffer, then it doesn't matter. Leave the old stale high bits in place (for next time) and don't bother invalidating. Cuts 5.6% of the flushes in the Civilization VI demo on Kabylake GT2.	2019-06-20 13:32:16 -05:00
Kenneth Graunke	ecc500398f	iris: Drop RT flushes from depth stencil clearing flushes. These write depth and stencil, not color, so there's no need to flush the render target.	2019-06-20 13:32:16 -05:00
Kenneth Graunke	1d63af0f2c	iris: Don't bother with PIPE_CONTROLs for CPU writes and no history If a buffer has no usage history, we don't have any read only cache invalidates to do. If we've written it with the CPU, we don't need to flush the render cache. The only bit remaining is the CS stall from iris_flush_bits_for_history. We can just skip the PIPE_CONTROL in this case. This is pretty common - an app creates a buffer, fills it with data, and then binds it for some purpose. Cuts 36% of the flushes in Manhattan 3.0 on Kabylake GT2.	2019-06-20 13:32:16 -05:00
Kenneth Graunke	dfff6e10b4	iris: Only do an RT flush for transfer maps if using copy_region. If we wrote the data via the CPU, there's no point in doing a render target flush. If using BLORP, we do want a render target flush so the data lands.	2019-06-20 13:32:15 -05:00
Kenneth Graunke	c4c17ab3ec	iris: Use iris_flush_bits_for_history in iris_transfer_flush_region Instead of using the combined iris_flush_and_dirty_for_history, use iris_flush_bits_for_history directly - we were already using the split out iris_dirty_for_history. There's no need to dirty twice, and we can avoid the looping altogether for non-buffers.	2019-06-20 13:32:15 -05:00
Kenneth Graunke	6890340c31	iris: Avoid double flushing in iris_transfer_flush_region when copying. My intention was to have iris_copy_region not do flushing, and leave that up to the callers. iris_resource_copy_region needs to do this, but iris_transfer_flush_region was already doing it. The net result was that we were doing it twice for transfers. So, move the flushing from iris_copy_region to iris_resource_copy_region so that it only happens in the callers as I intended.	2019-06-20 13:32:15 -05:00
Kenneth Graunke	64fb20ed32	iris: Fix iris_flush_and_dirty_history to actually dirty history. When I split iris_flush_and_dirty_history into two helper functions, I accidentally made it stop dirtying. Which was...sort of the point. Fixes: `21688a306b` iris: Split iris_flush_and_dirty_for_history into two helpers.	2019-06-20 13:32:15 -05:00
Kenneth Graunke	5e501ffeb2	iris: Add maybe_flush calls to texture_barrier and memory_barrier Otherwise, tests which loop on glMemoryBarrier may run us out of batch space with piles of flushing. (Ideally, we'd elide those bonus PIPE_CONTROLs, but presumably this isn't that common of a case...) Piglit's arb_pipeline_statistics_query-comp would hit this case after some of the next patches remove other PIPE_CONTROLs with maybe_flushes.	2019-06-20 13:32:15 -05:00
Kenneth Graunke	d4a4384b31	iris: Implement INTEL_DEBUG=pc for pipe control logging. This prints a log of every PIPE_CONTROL flush we emit, noting which bits were set, and also the reason for the flush. That way we can see which are caused by hardware workarounds, render-to-texture, buffer updates, and so on. It should make it easier to determine whether we're doing too many flushes and why.	2019-06-20 13:32:15 -05:00
Alyssa Rosenzweig	c378829a0d	panfrost: Skip shading unaffected tiles Looking at the scissor, we can discard some tiles. We specifically don't care about the scissor on the wallpaper, since that's a no-op if the entire tile is culled. v2: Clarify clear comment (not reviewed but trivial). Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>	2019-06-20 09:30:38 -07:00
Eric Engestrom	65b016b146	glx: fix glvnd pointer types Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110709 Fixes: `22a9e00aab` ("glx: Implement the libglvnd interface.") Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-06-20 17:21:37 +01:00
Eric Engestrom	e0ee790ba7	glx: drop misleading comment about the file being "generated" This `gen_scrn_dispatch.pl` has never existed, in the sense that NVIDIA never published it. There have been a number (6) of commits to fix various things in there over the years, and never anything from NVIDIA. For all intents and purposes this file is hand-written and hand-maintained, and we're on our own. Let's make this clear by removing this misleading comment. Suggested-by: Eric Anholt <eric@anholt.net> Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Eric Anholt <eric@anholt.net> Acked-by: Emil Velikov <emil.velikov@collabora.com>	2019-06-20 16:19:58 +00:00
Boris Brezillon	56434450f6	nir/lower_tex: Add an assert() in nir_lower_txs_lod() We don't expect the output of a TXS instruction to be wider than a vec3. Add an assert() to make sure this never happens. Suggested-by: Jason Ekstrand <jason@jlekstrand.net> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-20 09:15:53 -07:00
Tomeu Vizoso	babc3ad291	panfrost: Set job requirements during draw Right now we are doing it at a moment when we don't have all the information we need. Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com> Suggested-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Acked-by: Rohan Garg <rohan.garg@collabora.com> Cc: Rohan Garg <rohan.garg@collabora.com> Fixes: `bfca21b622` ("panfrost: Figure out job requirements in pan_job.c")	2019-06-20 18:07:04 +02:00
Alyssa Rosenzweig	dc668203db	panfrost/meson: Link with libpanfrost_shared Fixes: `035a07c0` ("panfrost: Switch to lima tiling") Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-20 08:56:38 -07:00
Hyunjun Ko	f7f8fb1b55	freedreno/ir3: fix typo Fixes: `a9b556d3a0` ("freedreno/ir3: check the type of regs of absneg opcode in is_same_type_mov") Reviewed-by: Rob Clark <robdclark@gmail.com>	2019-06-20 08:34:09 -07:00
Alyssa Rosenzweig	546236e27f	panfrost: Load from tiled images Now that we have lima tiling code available, use it to load from a tiled source. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-20 08:22:38 -07:00
Alyssa Rosenzweig	035a07c0ae	panfrost: Switch to lima tiling Lima and Panfrost both have implementations of software tiling (the Lima one was forked off the Panfrost one which was forked off the original Lima one...). Switch to the most recent Lima code, since it's more complete than ours at this point. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-20 08:22:38 -07:00
Alyssa Rosenzweig	7b46f09f26	panfrost: Fix tiled NPOT textures with bpp<4 Panfrost's tiling routines (incorrectly) ignored the source stride, masking this bug; lima's routines respect this stride, causing issues when tiling NPOT textures whose stride is not a multiple of 64 (for instance, NPOT textures with bpp=1). Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-20 08:22:38 -07:00
Alyssa Rosenzweig	413242277a	lima,panfrost: Move lima_tiling.c/h to /src/panfrost This will allow both drivers to share this code. Both drivers build-tested with meson. Android build not tested. v2: Change naming from tiling->shared, in case Lima and Panfrost can share more in the future. Fix Android build system. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Reviewed-and-tested-by: Qiang Yu <yuq825@gmail.com>	2019-06-20 08:06:35 -07:00
Kenneth Graunke	c57b4c86c0	iris: Use render_batch/compute_batch locals in memory_barrier We have them, may as well use them.	2019-06-20 10:04:38 -05:00
Lionel Landwerlin	4a61be24fe	anv: only resort to sync fds internally with no syncobj support We can rely on only one kind of synchronization object (drm-syncobj) when it is available. This reduces the number of file descriptors we use in our implementation. This will be required later for timeline semaphores implementation, at this point we won't ever want to use anything else but syncobjs. v2: Only use has_syncobj for semaphores (Jason) v3: Only has_syncobj in assert on semaphores in QueueSubmit (Jason) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-06-20 14:59:51 +00:00
Alyssa Rosenzweig	1d7e53a854	panfrost: Remove other commented pointers Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-20 07:48:05 -07:00
Alyssa Rosenzweig	2608da14b9	panfrost/decode: Elide more zero fields Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-20 07:48:05 -07:00
Alyssa Rosenzweig	cfc2218a8c	panfrost/decode: Remove memory comments These do more harm than good at this point. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-20 07:48:04 -07:00
Alyssa Rosenzweig	8643b89c48	panfrost: Add missing 0x in invocation_count Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-20 07:48:04 -07:00
Alyssa Rosenzweig	b6d46d09c2	panfrost/decode: Skip decode of fragment backend in non-fragment This is all zero for anything but fragment shaders. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-20 07:48:04 -07:00
Alyssa Rosenzweig	ae2bfab7b7	panfrost/decode: Clip mali_compute_fbd at 64-bytes Looking at internal evidence (later fields including a literal other compute job inception-style, seeming memory corruption, no clear function, and the field after this being a pointer to itself), it looks like this is really a much smaller descriptor. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-20 07:48:04 -07:00
Alyssa Rosenzweig	3faf33488a	panfrost/decode: Print COMPUTE uniforms as pointers In OpenGL, uniforms generally represent fp32 vec4s (at least in highp mode). In OpenCL, they represent vec2s of 64-bit pointers. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-20 07:48:04 -07:00
Alyssa Rosenzweig	0021fae7f8	panfrost/decode: Show int uniforms Float is ambiguous. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-20 07:48:04 -07:00
Alyssa Rosenzweig	1f7dfee1b4	panfrost/decode: Expand pointers in compute descriptor Just as an aid. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-20 07:48:04 -07:00
Alyssa Rosenzweig	0aa5d89acb	panfrost/decode: Identify "compute FBD" There is fundamentally not a framebuffer associated with a compute job. Allocate a new structure for it so we don't mess up graphics when decoding. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-20 07:48:04 -07:00
Tomeu Vizoso	4f881237c3	panfrost: Allocate panfrost_job in panfrost_context Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-20 15:48:35 +02:00
Tomeu Vizoso	b5db7cce60	panfrost: Release transient pools Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-20 15:48:35 +02:00
Tomeu Vizoso	6cec937e22	panfrost: ci: Exclude flip-flops from results These tests are failing at times, blacklist for now: dEQP-GLES2.functional.fbo.render.shared_colorbuffer_clear.tex2d_rgba dEQP-GLES2.functional.fbo.render.shared_colorbuffer_clear.tex2d_rgb dEQP-GLES2.functional.shaders.matrix.mul.dynamic_highp_mat4_vec4_vertex Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com> Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-20 15:48:15 +02:00
Alejandro Piñeiro	6a159bca9d	util: add empty line before virgl options Reviewed-by: Eric Engestrom <eric@engestrom.ch>	2019-06-20 15:21:39 +02:00
Alejandro Piñeiro	790c3dbac8	util: add missing DRI_CONF_OPT_END When DRI_CONF_GLES_EMULATE_BGRA was added for the virgl driver, it missed a DRI_CONF_OPT_END. This makes some drivers, like vc4/v3d, crash with the following error: Fatal error in __driConfigOptions line 99, column 2: mismatched tag. Not sure why it doesn't fail with virgl. Fixes: `b793663449` Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-06-20 14:11:30 +02:00
Eric Engestrom	a9e09d56a9	isl: tag unreachable path as such GCC should be able to figure out that all the possible enum values are exhausted in the switch() and all the branches return from the function, but apparently it doesn't, so let's tell the compiler explicitly. This gets rid of the following warnings in GCC 9: [1/24] Compiling C object 'src/intel/isl/60d23f8@@isl@sta/isl.c.o'. ../src/intel/isl/isl.c: In function ‘isl_surf_init_s’: ../src/intel/isl/isl.c:1569:10: warning: ‘array_pitch_el_rows’ may be used uninitialized in this function [-Wmaybe-uninitialized] 1569 \| surf = (struct isl_surf) { \| ~~~~~~^~~~~~~~~~~~~~~~~~~~~ 1570 \| .dim = info->dim, \| ~~~~~~~~~~~~~~~~~ 1571 \| .dim_layout = dim_layout, \| ~~~~~~~~~~~~~~~~~~~~~~~~~ 1572 \| .msaa_layout = msaa_layout, \| ~~~~~~~~~~~~~~~~~~~~~~~~~~~ 1573 \| .tiling = tiling, \| ~~~~~~~~~~~~~~~~~ 1574 \| .format = info->format, \| ~~~~~~~~~~~~~~~~~~~~~~~ 1575 \| \| 1576 \| .levels = info->levels, \| ~~~~~~~~~~~~~~~~~~~~~~~ 1577 \| .samples = info->samples, \| ~~~~~~~~~~~~~~~~~~~~~~~~~ 1578 \| \| 1579 \| .image_alignment_el = image_align_el, \| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 1580 \| .logical_level0_px = logical_level0_px, \| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 1581 \| .phys_level0_sa = phys_level0_sa, \| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 1582 \| \| 1583 \| .size_B = size_B, \| ~~~~~~~~~~~~~~~~~ 1584 \| .alignment_B = base_alignment_B, \| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 1585 \| .row_pitch_B = row_pitch_B, \| ~~~~~~~~~~~~~~~~~~~~~~~~~~~ 1586 \| .array_pitch_el_rows = array_pitch_el_rows, \| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 1587 \| .array_pitch_span = array_pitch_span, \| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 1588 \| \| 1589 \| .usage = info->usage, \| ~~~~~~~~~~~~~~~~~~~~~ 1590 \| }; \| ~ ../src/intel/isl/isl.c:1488:24: warning: ‘((void )&phys_total_el+4)’ may be used uninitialized in this function [-Wmaybe-uninitialized] 1488 \| struct isl_extent2d phys_total_el; \| ^~~~~~~~~~~~~ 
../src/intel/isl/isl.c:1335:38: warning: ‘phys_total_el’ may be used uninitialized in this function [-Wmaybe-uninitialized] 1335 \| isl_align_div(phys_total_el->w * tile_el_scale, \| ~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~ ../src/intel/isl/isl.c:1488:24: note: ‘phys_total_el’ was declared here 1488 \| struct isl_extent2d phys_total_el; \| ^~~~~~~~~~~~~ Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-06-20 12:05:14 +00:00
Samuel Pitoiset	f179febde0	radv: enable DCC for mipmapped color textures on GFX8 It's tricky on GFX9, so only GFX8 for now. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-06-20 11:04:02 +02:00
Samuel Pitoiset	17f94e1984	radv: do not fast clear if one level can't be fast cleared And fall back to slow color clears. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-06-20 11:03:58 +02:00
Samuel Pitoiset	450bce522a	radv: add fast clears support for mipmapped color images with DCC Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-06-20 11:03:57 +02:00
Samuel Pitoiset	fa903ba799	radv: add radv_dcc_clear_level() helper For clearing only one level. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-06-20 11:03:53 +02:00
Samuel Pitoiset	b92d87f7f0	radv: re-initialize DCC metadata after decompressing using compute Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-06-20 11:03:52 +02:00
Samuel Pitoiset	dc6e3053a7	radv: initialize levels without DCC during layout transitions Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-06-20 11:03:49 +02:00
Thomas Hellstrom	71b43490dd	svga: Support ARB_buffer_storage This basically boils down to supporting persistent and coherent buffer storage. We chose to use coherent buffer storage for all persistent buffers even if it's not explicitly specified, since using glMemoryBarrier to obtain coherency would be particularly expensive in our driver stack, and require a lot of additional bookkeeping. Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com> Reviewed-by: Brian Paul <brianp@vmware.com>	2019-06-20 09:30:22 +02:00
Thomas Hellstrom	8c01e5ed5f	gallium/util: Make it possible to disable persistent maps in the upload manager For svga, the use of persistent / coherent maps is typically slightly slower than without them. It's probably a bit case-dependent and possible to tune, but for now, make sure we can disable those. Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com> Reviewed-by: Brian Paul <brianp@vmware.com>	2019-06-20 09:30:22 +02:00
Thomas Hellstrom	3b828c4e68	svga: Map vertex, index, and constant buffers asynchronously when reading With SWTNL and index translation we're mapping buffers for reading. These buffers are commonly upload_mgr buffers that might already be referenced by another submitted or unsubmitted GPU command. A synchronous map will then trigger a flush and sync, at least on Linux, which doesn't distinguish between read and write referencing. So map these buffers async. If they for some obscure reason happen to be dirty (stream-output, buffer-copy), the resource_buffer code will read back and sync anyway. For persistent / coherent buffers a corresponding read-back and sync will happen in the kernel fault handler. Testing: Piglit quick. No regressions. Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com> Reviewed-by: Brian Paul <brianp@vmware.com>	2019-06-20 09:30:22 +02:00
Thomas Hellstrom	f51915ba62	svga: Fix index buffer uploads In the case of SWTNL and index translation we were uploading index buffers and then reading out from them using the CPU. Furthermore, when translating indices we often cached the results with an upload_mgr buffer, causing the cached indexes to be immediately discarded on the next write to that upload_mgr buffer. Fix this by only uploading when we know the index buffer is going to be used by hardware. If translating, only cache translated indices if the original buffer was not a user buffer. In the latter case when we're not caching, use an upload_mgr buffer for the hardware indices. This means we can also remove the SWTNL hand-crafted index buffer upload mechanism in favour of the upload_mgr. Finally avoid using util_upload_index_buffer(). It wastes index buffer space by trying to make sure that the offset of the indices in the upload_mgr buffer is larger or equal to the position of the indices in the source buffer. From what I can tell, the SVGA device does not require that. Testing done: Piglit quick. No regressions. Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com> Reviewed-by: Brian Paul <brianp@vmware.com>	2019-06-20 09:30:22 +02:00
Thomas Hellstrom	4f59d51d82	winsys/svga: Make it possible to specify coherent resources Add a flag in the surface cache key and a winsys usage flag to specify coherent memory. Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com> Reviewed-by: Brian Paul <brianp@vmware.com>	2019-06-20 09:30:22 +02:00
Thomas Hellstrom	4412be40dd	gallium/util: Make u_debug_flush support persistent maps Previously, unsynchronized maps were assumed to also be persistent. Now distinguish between persistent and unsynchronized maps and also support PIPE_TRANSFER_PERSISTENT from ARB_buffer_storage. Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com> Reviewed-by: Brian Paul <brianp@vmware.com>	2019-06-20 09:30:22 +02:00
Gert Wollny	a478e56fbd	virgl: Add debug flag to bypass driconf to enable the BGRA tweaks This is useful for testing, also because with vtest the dri configuration is not read. Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>	2019-06-20 08:50:38 +02:00
Gert Wollny	5dbecf7863	virgl: Add a tweak to set the value for emulated queries of GL_SAMPLES_PASSED On GLES hosts GL_SAMPLES_PASSED is emulated by GL_ANY_SAMPLES_PASSED, which returns a boolean. With this tweak the value that is returned if any sample passed can be set. This may be of interest when an application decides whether some geometry is rendered based on an amount of visibility and not just a binary decision. virglrenderer sets a default of 1024 on the host. v2: Remove reference from virgl and correct description (Emil) v3: Send the tweak binary encoded instead of using strings (Gurchetan) Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>	2019-06-20 08:50:38 +02:00
Gert Wollny	59757dbad6	virgl: Add tweak to apply a swizzle when drawing/blitting to an emulated BGRA texture With Qemu this final swizzle is not needed, but with vtest it is, i.e. it depends on how a program using virglrenderer uses the surface that is rendered to, hence a tweak is added. v2: Update description and fix spelling (Emil) v3: Send tweak as binary value instead of using strings (Gurchetan) Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>	2019-06-20 08:50:38 +02:00
Gert Wollny	b793663449	virgl: Add driconf tweak for emulating BGRA surfaces on GLES These tweaks are used to fix rendering issues with Valve games and at least also "The Raven Remastered" when run on a GLES host. v2: Fix type in define and remove virgl from driconf option (Emil) v3: Encode tweak binary instead of using strings (Gurchetan) Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>	2019-06-20 08:50:38 +02:00
Gert Wollny	13d4a34c44	virgl: Add override for BGRA format to use swizzled SRGB format Tie in the check whether the host supports tweaks and whether this tweak is enabled. v2: Add comment about the emulated formats not being used directly in the guest (Gurchetan) Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>	2019-06-20 08:50:38 +02:00
Gert Wollny	22edafb239	virgl: Add code to accept BGRx_SRGB as RGBx_SRGB This will be enabled in later patches by the emulation tweak. Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>	2019-06-20 08:50:38 +02:00
Gert Wollny	d8967b7951	virgl: Add skeleton to evaluate cap and send tweaks Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>	2019-06-20 08:50:38 +02:00
Gert Wollny	28dc096e15	virgl: factor out format host bits check This will make it a single location when we want to replace a format. Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>	2019-06-20 08:50:38 +02:00
Gert Wollny	30eb1fdc51	gallium/virgl: Add code path for virgl to read driconf This works only for the drm variant of virgl and not for the vtest variant. v2: Rebase, replace the configuration query function by a pointer to the configuration data. Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com> (v1) Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>	2019-06-20 08:50:38 +02:00
Gert Wollny	cf800998af	virgl: Add driinfo file and tie it into the build Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>	2019-06-20 08:50:37 +02:00
Caio Marcelo de Oliveira Filho	9b0720c436	glspirv: Call pass to lower frexp instructions These were previously handled by spirv_to_nir, but that changed to be an explicit pass in `23d30f4099` "spirv,nir: lower frexp_exp/frexp_sig inside a new NIR pass" Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-06-19 22:07:57 -07:00
Caio Marcelo de Oliveira Filho	12131096fa	spirv: Restrict use of descriptor intrinsics to Vulkan In ARB_gl_spirv we'll be able to use variables for uniform buffers, so don't use the descriptor intrinsics to lower the block access. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-06-19 22:07:51 -07:00
Nicolai Hähnle	21dd881416	ac/rtld: report better error messages for LDS overallocation Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>	2019-06-19 20:30:32 -04:00
Marek Olšák	b64bd5887e	ac/rtld: check correct LDS max size Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>	2019-06-19 20:30:32 -04:00
Nicolai Hähnle	1ee0f0d315	radeonsi: add s_sethalt to shaders for debugging Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>	2019-06-19 20:30:32 -04:00
Nicolai Hähnle	87182200c7	ac/rtld: fix sorting of LDS symbols by alignment Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>	2019-06-19 20:30:32 -04:00
Bas Nieuwenhuizen	d1c04835ab	meson: Allow building radeonsi with just the android platform. Just as was allowed by autotools. Fixes: `108d257a16` "meson: build libEGL" Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-06-19 23:42:49 +00:00
Bas Nieuwenhuizen	755c633b8d	anv: Fix vulkan build in meson. Apparently the android part was never ported to meson. CC: <mesa-stable@lists.freedesktop.org> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-06-19 23:27:46 +00:00
Bas Nieuwenhuizen	4c300bd328	radv: Fix vulkan build in meson. Apparently the android part was never ported to meson. CC: <mesa-stable@lists.freedesktop.org> Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-06-19 23:27:46 +00:00
Jason Ekstrand	ef323d02d8	anv/image: Set different usage flags for shadow surfaces For the block BLOCK_TEXEL_VIEW_COMPATIBLE case, this didn't matter because the flags were already more-or-less what we wanted. However, for gen7 stencil shadow images, it still had ISL_SURF_USAGE_STENCIL_BIT so we were getting W-tiled which isn't what we want for the shadow. By passing just ISL_SURF_USAGE_TEXTURE_BIT (and CUBE if we care), we now get something that's actually texturable. Fixes: `f3ea0cf828` "anv: Add stencil texturing support for gen7"	2019-06-19 22:21:46 +00:00
Jason Ekstrand	215f9f83f5	anv: Flush caches in anv_image_copy_to_shadow Copies to a shadow image happen during a VkCmdPipelineBarrier or at subpass transitions. We could potentially be a bit more conservative but these transitions shouldn't happen often and it's better to have our bases covered. Fixes: `f3ea0cf828` "anv: Add stencil texturing support for gen7"	2019-06-19 22:21:46 +00:00
Jason Ekstrand	81e51b412e	nir: Make nir_constant a vector rather than a matrix Most places in NIR, we treat matrices like arrays. The one annoying exception to this has been nir_constant where a matrix is a first-class thing. This commit changes that so a matrix nir_constant is the same as an array nir_constant. This makes matrix nir_constants a tiny bit more expensive but shrinks all others by 96B. Reviewed-by: Karol Herbst <kherbst@redhat.com>	2019-06-19 21:05:54 +00:00
Jason Ekstrand	b019fe8a5b	glsl/nir: Fix handling of 64-bit values in uniform storage Reviewed-by: Karol Herbst <kherbst@redhat.com>	2019-06-19 21:05:54 +00:00
Jason Ekstrand	a54e397152	spirv: Only copy needed components for OpSpecConstantOp Reviewed-by: Karol Herbst <kherbst@redhat.com>	2019-06-19 21:05:54 +00:00
Jason Ekstrand	96bb9c9277	spirv: Use a single path for OpSpecConstantOp of OpVectorShuffle Now that nir_const_value is a scalar, there's no reason why we need multiple paths here and it's just extra paths to keep working. While we're here, we also add a vtn_fail_if check that component indices are in-bounds. Reviewed-by: Karol Herbst <kherbst@redhat.com>	2019-06-19 21:05:54 +00:00
Jason Ekstrand	280e5442e5	spirv: Use vtn_constan_uint() for array lengths and gather components Reviewed-by: Karol Herbst <kherbst@redhat.com>	2019-06-19 21:05:54 +00:00
Jason Ekstrand	aa11c2e75e	spirv: Add a vtn_constant_int helper Reviewed-by: Karol Herbst <kherbst@redhat.com>	2019-06-19 21:05:54 +00:00
Jason Ekstrand	93f4aa9889	glsl/types: Add a real is_integer helper Reviewed-by: Karol Herbst <kherbst@redhat.com>	2019-06-19 20:28:52 +00:00
Jason Ekstrand	f0920e266c	glsl/types: Rename is_integer to is_integer_32 It only accepts 32-bit integers so it should have a more descriptive name. This patch should not be a functional change. Reviewed-by: Karol Herbst <kherbst@redhat.com>	2019-06-19 20:28:52 +00:00
Jason Ekstrand	21a7e6d569	glsl/types: Ignore bit sizes in contains_integer() All of the callers for this function are looking at interpolation qualifiers and want to make sure they're declared flat. Any 64-bit integer inputs need to be flat. It also makes the function make more sense since "integer" is fairly generic. Reviewed-by: Karol Herbst <kherbst@redhat.com>	2019-06-19 20:28:52 +00:00
Jason Ekstrand	0d1fb380b1	glsl/types: Handle all bit sizes in glsl_type_is_integer All of the callers of this function really just want to know if the type is an integer and don't care about bit size. Reviewed-by: Karol Herbst <kherbst@redhat.com>	2019-06-19 20:28:52 +00:00
Caio Marcelo de Oliveira Filho	feb0cdcb52	glsl/nir_opt_access: Update uniforms correctly when only vars change Even if only variables access flags are changed, the existing NIR infrastructure expects metadata to be explicitly preserved, so do that. Don't care about avoiding preserve to be called twice since the cost is negligible. This scenario can be triggered by dead variables, and also by other intrinsics that read the variables -- but not cause progress to be made when processing the intrinsics. Fixes: `f2d0e48ddc` "glsl/nir: Add optimization pass for access flags" Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-06-19 12:50:41 -07:00
Caio Marcelo de Oliveira Filho	d7ea433a5f	glsl/nir: Fix getting the sampler dim when arrays are involved Unwrap any array in the variable type so we can get the sampler dim. This fixes piglit test spec/arb_arrays_of_arrays/execution/image_store/basic-imageStore-const-uniform-index.shader_test. Fixes: `f2d0e48ddc` "glsl/nir: Add optimization pass for access flags" Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-06-19 12:50:39 -07:00
Jory Pratt	10e8d46601	meson: Search for execinfo.h Rather than checking __GLIBC__/__UCLIBC__ macros as a proxy for execinfo.h presence, just check directly. This allows the build to work on musl. Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-06-19 12:16:18 -07:00
Jory Pratt	fd7b7f14d8	util: Heap-allocate 256K zlib buffer The disk cache code tries to allocate a 256 Kbyte buffer on the stack. Since musl only gives 80 Kbyte of stack space per thread, this causes a trap. See https://wiki.musl-libc.org/functional-differences-from-glibc.html#Thread-stack-size (In musl-1.1.21 the default stack size has increased to 128K) [mattst88]: Original author unknown, but I think this is small enough that it is not copyrightable. Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-06-19 12:16:18 -07:00
Kenneth Graunke	9c19d07b1c	anv: Fix wrong printf formatter %lu is for unsigned long, %zu is for size_t. Just cast the data.	2019-06-19 11:57:01 -05:00
Kenneth Graunke	bbbf7a538c	iris: Bail on queries for INTEL_NO_HW=1. We don't execute any of the commands to record snapshots, so we can't actually produce a real result. We do however need to avoid waiting on a syncpt which will never be signalled. So, just return 0.	2019-06-19 11:55:43 -05:00
David Riley	11e74daae5	virgl: Support VIRGL_BIND_SHARED Support a new virgl bind type for shared buffers. Signed-off-by: David Riley <davidriley@chormium.org> Reviewed-By: Gert Wollny <gert.wollny@collabora.com>	2019-06-19 07:28:47 -07:00
Lionel Landwerlin	bc62673dce	anv: write spirv-nir logs back to the application Using the existing VK_EXT_debug_report extension. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-06-19 15:45:52 +03:00
Connor Abbott	53a7649e5d	ac/nir: Set speculatable for buffer loads where allowed This brings the nir path in line with the TGSI path. Totals from affected shaders: SGPRS: 2984 -> 2984 (0.00 %) VGPRS: 2792 -> 2652 (-5.01 %) Spilled SGPRs: 0 -> 0 (0.00 %) Spilled VGPRs: 0 -> 0 (0.00 %) Private memory VGPRs: 0 -> 0 (0.00 %) Scratch size: 0 -> 0 (0.00 %) dwords per thread Code Size: 247380 -> 248072 (0.28 %) bytes LDS: 0 -> 0 (0.00 %) blocks Max Waves: 121 -> 132 (9.09 %) Wait states: 0 -> 0 (0.00 %) Most of the change came from DiRT: Showdown, and came from sinking SSBO loads. Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-06-19 14:08:28 +02:00
Connor Abbott	77be5b2f88	nir: Use reorderable access flag No changes with radeonsi shader-db. Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-06-19 14:08:28 +02:00
Connor Abbott	a1c737927c	nir: Add a helper to determine if an intrinsic can be reordered This is simple now, but we're going to be adding a few more conditions to this later. Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-06-19 14:08:28 +02:00
Connor Abbott	6fc83c253f	st/nir: Use gl_nir_opt_access Nothing uses its results yet, that will come with the following commits. Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-06-19 14:08:28 +02:00
Connor Abbott	f2d0e48ddc	glsl/nir: Add optimization pass for access flags Right now, this just deduces when we can arbitrarily reorder SSBO and image loads, matching the existing logic in radeonsi's TGSI->LLVM pass. This approach can't handle some things that nir_opt_copy_prop_vars can, but it can handle images, and with GCM it lets us hoist reads outside of loops. We can also pass this information to LLVM which lets it do its own optimizations on it. This is GLSL only as I haven't tested it on Vulkan yet, and it would probably need a few changes to work there. Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-06-19 14:08:28 +02:00
Connor Abbott	c813c5776d	nir: Add reorderable memory access enum Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-06-19 14:08:28 +02:00
Connor Abbott	75063fbac5	nir/copy_prop_vars: Ignore volatile accesses The spec explicitly says that volatile writes can't be removed and volatile reads do not guarantee that the same value will still be around after the read, as if there were a barrier after each read/write. Just ignore them. Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-06-19 14:08:28 +02:00
Connor Abbott	364996d70d	glsl/nir: Propagate access qualifiers We were completely ignoring these before, except for putting them on variables. While we're here, don't set access qualifiers when converting to bindless since glsl_to_nir will already have set a more accurate qualifier that includes any qualifiers on struct members that are dereferenced. Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-06-19 14:08:27 +02:00
Connor Abbott	6f20643b47	nir: Allow qualifiers on copy_deref and image instructions In the next commit, we'll properly handle access qualifiers on struct members by propagating them to load/store instructions, but these instructions had no way to specify the qualifier. Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-06-19 14:08:27 +02:00
Connor Abbott	3bf8981c51	ac,radeonsi: Always mark buffer stores as inaccessiblememonly inaccessiblememonly means that it doesn't modify memory accessible via normal LLVM pointers. This lets LLVM's dead store elimination, memcpy forwarding, etc. ignore functions with this attribute. We don't represent descriptors as pointers, so this property is always true of buffer and image stores. There are plans to represent descriptors via pointers, but this just means that now nothing is inaccessiblememonly, as LLVM will then understand loads/stores via its usual alias analysis. Radeonsi was mistakenly only setting it if the driver could prove that there were no reads, and then it was cargo-culted into ac_llvm_build and ac_llvm_to_nir. Rip it out of everything. statistics with nir enabled: Totals from affected shaders: SGPRS: 152 -> 152 (0.00 %) VGPRS: 128 -> 132 (3.12 %) Spilled SGPRs: 0 -> 0 (0.00 %) Spilled VGPRs: 0 -> 0 (0.00 %) Private memory VGPRs: 0 -> 0 (0.00 %) Scratch size: 0 -> 0 (0.00 %) dwords per thread Code Size: 9324 -> 9244 (-0.86 %) bytes LDS: 2 -> 2 (0.00 %) blocks Max Waves: 17 -> 17 (0.00 %) Wait states: 0 -> 0 (0.00 %) The only difference was a manhattan31 shader. Acked-by: Timothy Arceri <tarceri@itsqueeze.com> Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-06-19 14:08:27 +02:00
Eric Engestrom	4db2c1e2fe	egl: add missing #include close() is in <unistd.h> Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2019-06-19 12:05:58 +00:00
Samuel Pitoiset	0a313cc285	radv: disable viewport clamping even if FS doesn't write Z This fixes new CTS dEQP-VK.pipeline.depth_range_unrestricted.*. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-06-19 11:18:50 +02:00
Samuel Pitoiset	e91c1ea06c	radv: implement compressed FMASK texture reads with RADV_PERFTEST=tccompatcmask This allows us to disable the FMASK decompress pass when transitioning from CB writes to shader reads. This will likely be improved and enabled by default in the future. No CTS regressions on GFX8 but a small number of multisample CTS failures on GFX9 (they look related to the small hint). Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-06-19 10:06:39 +02:00
Samuel Pitoiset	a7f75377ab	radv: fix FMASK expand with SRGB formats Found while working on DCC for MSAA. Fixes: `6b976024a8` ("radv: add support for FMASK expand") Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-06-19 07:53:53 +02:00
Tomeu Vizoso	0fcf73bc2d	panfrost: Move to use ralloc for some allocations We have some serious leaks, so plug some and also move to ralloc to limit the lifetime of some objects to that of their parent. Lots more such work to do. For some reason, this fixes: dEQP-GLES2.functional.lifetime.attach.deleted_output.texture_framebuffer Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-19 07:34:15 +02:00
Mathias Fröhlich	5743a36b2b	egl: Don't add hardware device if there is no render node v2. Do not offer a hardware drm backed egl device if no render node is available. The current implementation will fail on this egl device. On top of that, it issues a warning that is actually misleading. There are more error paths that can fail on the way to a hardware backed egl device. Fixing all of them would kind of require opening the drm device and seeing if there is a usable driver associated with the device. The taken approach avoids a full probe and fixes at least this kind of problem on the kvm virtualization hosts I observe here. Fixes: `dbb4457d98` ("egl: add EGL_EXT_device_drm support") Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>	2019-06-19 07:17:23 +02:00
Christian Gmeiner	8dd26fa2f0	etnaviv: support GL_ARB_seamless_cubemap_per_texture Passes spec@amd_seamless_cubemap_per_texture@amd_seamless_cubemap_per_texture Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com> Reviewed-By: Guido Günther <agx@sigxcpu.org>	2019-06-19 00:39:50 +02:00
Christian Gmeiner	a13efb3cdb	etnaviv: update headers from rnndb Update to etna_viv commit `a3bf0da`. Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>	2019-06-19 00:39:50 +02:00
Dave Airlie	378ea92bf6	radeonsi: fix undefined shift in macro definition Pointed out by coverity Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-06-19 08:32:36 +10:00
Dave Airlie	93ba356544	nouveau: fix frees in unsupported IR error paths. This is pointless in that we won't ever hit those paths in real life, but coverity complains. Fixes: `f014ae3c7c` ("nouveau: add support for nir") Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>	2019-06-19 08:32:19 +10:00
Rohan Garg	ad284f794c	panfrost: Move clearing logic into pan_job Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-18 12:32:43 -07:00
Chia-I Wu	98eda99ab8	virgl: fix sync issue regarding discard/unsync transfers GL_MAP_INVALIDATE_BUFFER_BIT cannot be treated as GL_MAP_INVALIDATE_RANGE_BIT naively. When we run into ptr = glMapBufferRange(buf, 0, size, GL_WRITE_BIT\|GL_MAP_INVALIDATE_BUFFER_BIT); memcpy(ptr, data1, size); glUnmapBuffer(buf); ptr = glMapBufferRange(buf, size, size, GL_WRITE_BIT\|GL_MAP_UNSYNCHRONIZED_BIT); memcpy(ptr, data2, size); glUnmapBuffer(buf); we never want data1 to be copy_transfer'ed. Because that would mean that data2 might overwrite valid data. Signed-off-by: Chia-I Wu <olvaffe@gmail.com> Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com> Fixes: `a22c5df079` ("virgl: Use buffer copy transfers to avoid waiting when mapping") Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2019-06-18 10:38:21 -07:00
Alyssa Rosenzweig	2a717f300b	panfrost: Enable sRGB Now that sRGB formats are supported for both rendering and sampling, advertise support. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-18 09:59:29 -07:00
Alyssa Rosenzweig	5aa51ba97f	panfrost: Disable AFBC on sRGB buffers The performance impact is slightly mitigated by tiling the render target, but it's undeniably still slow compared to AFBC. Unfortunately, it doesn't look like AFBC and sRGB play nice... Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-18 09:59:29 -07:00
Alyssa Rosenzweig	6585bb9f52	panfrost: Enable sRGB fixed-function blending For fixed-function, we have hardware to handle sRGB so we just set a flag. For blend shaders, it's rather more involved; this is currently unimplemented. Assert it out for now; we don't need it quite yet. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-18 09:59:29 -07:00
Alyssa Rosenzweig	4b137da409	panfrost: Specify sRGB in the render target Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-18 09:59:29 -07:00
Alyssa Rosenzweig	58c34e4a6c	panfrost: Implement sRGB texturing Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-18 09:59:29 -07:00
Alyssa Rosenzweig	31a4ef847c	panfrost: Add sRGB render target flag Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-18 09:59:29 -07:00
Alyssa Rosenzweig	01e1eecb95	panfrost: Implement tiled rendering We already can sample from Mali's linear/tiled encoding (the one from Utgard -- AFBC is mostly unrelated); let's be able to render to it as well. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-18 09:59:29 -07:00
Alyssa Rosenzweig	d50795109b	panfrost: Decode rendering block type A mode for rendering tiled/uncompressed was noticed, so we reshuffle the MFBD render target definitions to explicitly include block type. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-18 09:59:28 -07:00
Alyssa Rosenzweig	83c02a5ea9	panfrost: Refactor texture targets This combines the two cmdstream bits "is_3d" and "is_not_cubemap" into a single 2-bit texture target selection, noticing it's the same as the 2-bit selection in Midgard and Bifrost texturing ops. Accordingly, we share this definition and add the missing entry for 1D/buffer textures. This requires a nontrivial (but functionally similar) refactor of all parts of the driver to use the new definitions appropriately. Theoretically, this should add support for buffer textures, but that's obviously not tested and probably wouldn't work. While doing so, we notice the sRGB enable bit, which we document and decode as well here so we don't forget about it. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-18 09:59:28 -07:00
Rohan Garg	bfca21b622	panfrost: Figure out job requirements in pan_job.c Requirements for a job should be figured out in pan_job.c v2: [Alyssa] Fix early return Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-18 09:52:20 -07:00
Rohan Garg	debb85d1ec	panfrost: Reset job counters once the job is submitted Move the reset out of frame invalidation into job submission Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-18 09:52:20 -07:00
Rohan Garg	0f43a2ae8a	panfrost: Initial implementation of panfrost_job_submit Start fleshing out panfrost_job v2: [Alyssa: Remove unused variable, warning introduced] Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-18 09:52:01 -07:00
Gurchetan Singh	2daf3d8215	virgl_hw: add YUV support Add corresponding entries from p_format.h Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2019-06-18 09:18:58 -07:00
Gurchetan Singh	2480ce802a	virgl: sync to virglrenderer virgl_hw.h It's nice to keep these two files in sync, as they define guest userspace <---> host userspace communication. Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2019-06-18 09:18:48 -07:00
Jason Ekstrand	58cb865313	anv: Make border colors the right size and alignment on HSW Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-06-18 16:07:08 +00:00
Lionel Landwerlin	51076eb87c	imgui: bump imgui memory editor copy Getting rid of a compiler warning: In file included from ../src/intel/tools/aubinator_viewer.cpp:225: ../src/imgui/imgui_memory_editor.h: In member function ‘void MemoryEditor::DisplayPreviewData(size_t, const u8*, size_t, MemoryEditor::DataType, MemoryEditor::DataFormat, char*, size_t) const’: ../src/imgui/imgui_memory_editor.h:637:16: warning: enumeration value ‘DataType_COUNT’ not handled in switch [-Wswitch] switch (data_type) ^ Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-06-18 15:34:13 +00:00
Alyssa Rosenzweig	9402970751	panfrost/midgard: Enable autovectorization Enable nir_opt_vectorize. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-18 06:44:13 -07:00
Connor Abbott	47e7c6961a	nir: add a vectorization pass This effectively does the opposite of nir_lower_alus_to_scalar, trying to combine per-component ALU operations with the same sources but different swizzles into one larger ALU operation. It uses a similar model as CSE, where we do a depth-first approach and keep around a hash set of instructions to be combined, but there are a few major differences: 1. For now, we only support entirely per-component ALU operations. 2. Since it's not always guaranteed that we'll be able to combine equivalent instructions, we keep a stack of equivalent instructions around, trying to combine new instructions with instructions on the stack. The pass isn't comprehensive by far; it can't handle operations where some of the sources are per-component and others aren't, and it can't handle phi nodes. But it should handle the more common cases, and it should be reasonably efficient. [Alyssa: Rebase on latest master, updating with respect to typeless moves] Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Acked-by: Jason Ekstrand <jason@jlekstrand.net>	2019-06-18 06:43:30 -07:00
Boris Brezillon	c3558868da	panfrost: Add support for TXS instructions This patch adds support for nir_texop_txs instructions which are needed to support the OpenGL textureSize() function. This is also needed to support RECT texture sampling which is currently lowered to 2D sampling + a TXS() instruction by the nir_lower_tex() helper. Changes in v2: * Split options for the 1st and 2nd tex lowering passes Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-18 06:36:07 -07:00
Boris Brezillon	5c17f84ae2	panfrost: Prepare things to support non-native texture ops We are about to add support for the TXS (texture size) op which is not implemented using a midgard texture instruction. Let's rename emit_tex() into emit_texop_native() and repurpose emit_tex() as a dispatcher. Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-18 06:36:07 -07:00
Boris Brezillon	c57f7d0f15	panfrost: Move sysval upload logic out of panfrost_emit_for_draw() We're about to add more sysval types, and panfrost_emit_for_draw() is big enough, so let's move the sysval upload logic into a separate function. We also add one sub-function per sysval type to keep panfrost_upload_sysvals() small/readable. Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-18 06:36:07 -07:00
Boris Brezillon	bd49c8f0eb	panfrost: Make the sysval logic more generic We are about to add support for nir_texop_txs which requires adding a sysval/uniform containing the texture size. Let's change the emit_sysval_read() prototype to take a nir_instr object instead of a nir_intrinsic_instr one so we can re-use this function when emitting a sysval for a txs instruction. Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-18 06:36:07 -07:00
Boris Brezillon	296c5fd25d	nir/lower_tex: Add a way to lower TXS(non-0-LOD) instructions The V3D driver has an open-coded solution for this, and we need the same thing for Panfrost, so let's add a generic way to lower TXS(LOD) into max(TXS(0) >> LOD, 1). Changes in v2: * Use == 0 instead of ! * Rework the minification logic as suggested by Jason * Assign cursor pos at the beginning of the function * Patch the LOD just after retrieving the old value Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-18 06:36:07 -07:00
Boris Brezillon	0e489fd360	nir/lower_tex: Update ->sampler_dim value before calling get_texture_size() get_texture_size() will create a txs instruction with ->sampler_dim set to the original tex->sampler_dim. The condition to call lower_rect() only checks the value of ->sampler_dim and whether lower_rect is requested or not. This leads to an infinite loop when calling nir_lower_tex() with the same options until it returns false. In order to avoid that, let's move the tex->sampler_dim patching before get_texture_size() is called. This way the txs instruction will have ->sampler_dim set to GLSL_SAMPLER_DIM_2D and nir_lower_tex() won't try to lower it on the subsequent passes. Changes in v2: * Add Jason R-b * Add a comment explaining why we patch ->sampler_dim at the beginning of the lower_rect() func Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-18 06:36:07 -07:00
Boris Brezillon	352b1d9c31	nir/lower_tex: Actually report when projector lowering happened The code considers that projector lowering was done even if it's not really the case. Change the project_src() prototype to return a bool encoding whether projector lowering happened or not and update the progress var accordingly in nir_lower_tex_block(). --- Changes in v2: * Add Jason R-b * Drop the part suggesting that nir_lower_rect() could be called in a do-while(progress) loop. Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-18 06:36:07 -07:00
Tomeu Vizoso	6f60fec48f	panfrost: Adapt to constant name change in UABI We hadn't updated the kernel header after the driver got into mainline. Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-18 15:26:08 +02:00
Tomeu Vizoso	5ad5777f89	panfrost: ci: Update results Alyssa fixed some failing tests last night. Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com> Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-18 15:25:01 +02:00
Samuel Pitoiset	c16bf48bfc	radv: adjust the DCC base VA for mipmapped color attachments Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-06-18 12:24:26 +02:00
Samuel Pitoiset	6ee40efd02	radv: fix color decompressions for FMASK/CMASK Only skip levels without DCC when it's a DCC decompression. Whoops. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-06-18 12:09:04 +02:00
Samuel Pitoiset	42a41a9e4a	radv: do not decompress levels without DCC with the graphics path Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-06-18 11:24:50 +02:00
Samuel Pitoiset	e8917dcadb	radv: do not decompress levels without DCC with the compute path Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-06-18 11:24:41 +02:00
Samuel Pitoiset	864ddda8a3	radv: check if DCC is enabled per mip not for the whole image In other words, make use of radv_dcc_enabled() instead of radv_image_has_dcc() all over the place. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-06-18 11:24:36 +02:00
Iago Toral Quiroga	79a30543ee	v3d: implement simultaneous peripheral access exceptions for V3D 4.1+ Shader-db results: total instructions in shared programs: 9117550 -> 9102719 (-0.16%) instructions in affected programs: 1752873 -> 1738042 (-0.85%) helped: 7076 HURT: 478 helped stats (abs) min: 1 max: 22 x̄: 2.19 x̃: 2 helped stats (rel) min: 0.07% max: 13.89% x̄: 1.70% x̃: 1.07% HURT stats (abs) min: 1 max: 7 x̄: 1.41 x̃: 1 HURT stats (rel) min: 0.09% max: 10.17% x̄: 0.86% x̃: 0.54% 95% mean confidence interval for instructions value: -2.00 -1.92 95% mean confidence interval for instructions %-change: -1.58% -1.50% Instructions are helped. total max-temps in shared programs: 1327774 -> 1327728 (<.01%) max-temps in affected programs: 1025 -> 979 (-4.49%) helped: 47 HURT: 2 helped stats (abs) min: 1 max: 2 x̄: 1.02 x̃: 1 helped stats (rel) min: 2.63% max: 20.00% x̄: 7.67% x̃: 5.26% HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 HURT stats (rel) min: 4.17% max: 4.17% x̄: 4.17% x̃: 4.17% 95% mean confidence interval for max-temps value: -1.06 -0.82 95% mean confidence interval for max-temps %-change: -8.89% -5.49% Max-temps are helped. Reviewed-by: Eric Anholt <eric@anholt.net>	2019-06-18 08:09:03 +02:00
Iago Toral Quiroga	6d97c8fac1	v3d: only flush jobs accessing the query BO when reading query results Reviewed-by: Eric Anholt <eric@anholt.net>	2019-06-18 08:09:03 +02:00
Iago Toral Quiroga	5491883a9a	v3d: add a helper function to flush jobs using a BO v2: use _mesa_set_search() (Eric) Reviewed-by: Eric Anholt <eric@anholt.net>	2019-06-18 08:09:03 +02:00
Kenneth Graunke	e8cd7a30d5	iris: Support more RGBX pipe formats. Without them, the state tracker falls back to an RGBA format, but it doesn't always manage to override the swizzle for us. So we lose the information that the API expects an X channel, where alpha is garbage and reads back as 1. We have no equivalent ISL RGBX format for these, so we just use RGBA directly and override the swizzle in all cases.	2019-06-17 21:52:38 -05:00
Kenneth Graunke	3c10a2726b	glsl: Fix out of bounds read in shader_cache_read_program_metadata The VaryingNames array has NumVaryings entries. But BufferStride is a small array of MAX_FEEDBACK_BUFFERS (4) entries. Programs with more than 4 varyings would read out of bounds. Also, BufferStride is set based on the shader itself, which means that it's inherently already included in the hash, and doesn't need to be included again. At the point when shader_cache_read_program_metadata is called, the linker hasn't even set those fields yet. So, just drop it entirely. Fixes valgrind errors in KHR-GL45.transform_feedback.linking_errors_test. Fixes: `6d830940f7` glsl/shader_cache: Allow shader cache usage with transform feedback Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-06-17 21:22:19 -05:00
Jason Ekstrand	9672b7044c	anv: Set STATE_BASE_ADDRESS upper bounds on gen7 This should fix floating-point border color on all gen7 HW. Integer is still thoroughly busted on gen7 because it doesn't exist on IVB and it's crazy on HSW. Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-06-17 18:53:07 -05:00
Bas Nieuwenhuizen	925c04b4c7	radv: Disable linear tiled compressed textures. Support got removed in the new addrlib update. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-06-18 01:00:49 +02:00
Jason Ekstrand	1be38f9178	anv: Use VK_EXT_separate_stencil_usage to avoid stencil shadows on gen7 Whenever stencil texturing is not required (most of the time), we can use VK_EXT_separate_stencil_usage to only create the shadow image when VK_IMAGE_USAGE_SAMPLED_BIT is required for stencil. Of course, this depends on applications to use the extension but hopefully DXVK and similar translators are doing so and that covers most of the apps. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-06-17 22:32:26 +00:00
Jason Ekstrand	f3ea0cf828	anv: Add stencil texturing support for gen7 Intel hardware didn't get support for sampling from W-tiled (required for stencil) images until Broadwell so we can't directly sample from stencil. Instead, if we want to support stencil texturing on gen7 hardware, we have to keep a texture-capable shadow copy around and use BLORP to update when stencil changes. The one thing this commit does not implement is self-dependencies with stencil input attachments. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99493 Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-06-17 22:32:26 +00:00
Jason Ekstrand	4faa3145b1	anv/blorp: Update shadow images when clearing or uploading Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-06-17 22:32:26 +00:00
Jason Ekstrand	2b736d9e6c	anv/cmd_buffer: Add a stencil transition helper Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-06-17 22:32:26 +00:00
Jason Ekstrand	86fc268142	anv/blorp: Take an aspect in anv_image_copy_to_shadow Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-06-17 22:32:26 +00:00
Jason Ekstrand	fcbefe013a	anv/formats: Re-arrange the way we set some flag bits Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-06-17 22:32:26 +00:00
Kenneth Graunke	659d4f613e	iris: Make resource_copy_region handle packed depth-stencil resources. Also copy along the separate stencil buffer if needed. Fixes Piglit's arb_copy_image-formats.	2019-06-17 17:29:09 -05:00
Kenneth Graunke	a36f1542ae	iris: Order CS stall and TC invalidate for format reinterpretation hacks This should ensure the TC invalidate happens after the stall. Fixes KHR-GL43.copy_image.functional which does a CopyImage (blorp_copy) from a buffer (using R8G8B8A8_UINT), then GetTexImage to read back the original image (using R10G10B10A2_UNORM).	2019-06-17 16:38:08 -05:00
Kenneth Graunke	94b9f50e63	iris: Be more aggressive at post-format-reinterpret TC invalidate hack When copying/blitting with format reinterpretation, we invalidate the texture cache before/after. Before is so the source of the copy works, and after is to get rid of our new data in the "wrong" format to protect future attempts to sample. When I ported these hacks to iris, I tried to be cautious by only bothering with the hacks if the batch referenced the BO. This makes some sense for the before case. If it isn't referenced, the texture cache can't really have any data for the BO (since it's also invalidated between batches). But we still need to do the after case regardless, as we've just polluted the cache with hazardous entries.	2019-06-17 16:38:08 -05:00
Gert Wollny	2b87753a84	virgl: Assume sRGB write control for older guest kernels or virglrenderer hosts When the host virglrenderer is an older version that doesn't check the sRGB write control feature, or when the guest kernel doesn't support CAPS v2, then the guest will only report support for GL 2.1 on a GL 3.3 host, even though it was supporting 3.3 with earlier guest mesa versions. By also checking the host feature check version this regression can be avoided. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110921 Fixes: `2845939d6a` virgl: Set sRGB write control CAP based on host capabilities Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Reviewed-by: Chia-I Wu <olvaffe@gmail.com>	2019-06-17 21:16:11 +00:00
Rob Clark	21c795ab07	freedreno/a6xx: disallow UBWC for x24s8 Fixes: dEQP-GLES31.functional.stencil_texturing.format.depth24_stencil8_2d dEQP-GLES31.functional.stencil_texturing.format.stencil_index8_2d dEQP-GLES31.functional.stencil_texturing.misc.compare_mode_effect Signed-off-by: Rob Clark <robdclark@chromium.org>	2019-06-17 20:29:13 +00:00
Rob Clark	4e72abcd97	freedreno/a6xx: un-swap X24S8_UINT The stencil is actually in the .w component, but we used to use SWAP to remap the channels. This doesn't work when tiled/ubwc. Fixes: dEQP-GLES31.functional.stencil_texturing.format.depth24_stencil8_2d_array dEQP-GLES31.functional.stencil_texturing.format.depth24_stencil8_cube dEQP-GLES31.functional.stencil_texturing.format.stencil_index8_2d_array dEQP-GLES31.functional.stencil_texturing.format.stencil_index8_cube dEQP-GLES31.functional.stencil_texturing.misc.base_level dEQP-GLES31.functional.texture.border_clamp.formats.stencil_index8.nearest_size_pot dEQP-GLES31.functional.texture.border_clamp.formats.stencil_index8.nearest_size_npot dEQP-GLES31.functional.texture.border_clamp.formats.depth24_stencil8_sample_stencil.nearest_size_pot dEQP-GLES31.functional.texture.border_clamp.formats.depth24_stencil8_sample_stencil.nearest_size_npot dEQP-GLES31.functional.texture.border_clamp.sampler.uint_stencil Signed-off-by: Rob Clark <robdclark@chromium.org>	2019-06-17 20:29:13 +00:00
Samuel Pitoiset	6e3aee4630	radv: add mipmaps support for DCC decompression on compute Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-06-17 22:20:53 +02:00
Samuel Pitoiset	ebb1db96d5	radv: add mipmaps support for color decompressions (DCC/FMASK/CMASK) And some cleanups. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-06-17 22:20:53 +02:00
Samuel Pitoiset	00f0e5c6fd	radv: set the DCC/FCE predicates from the base level Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-06-17 22:20:53 +02:00
Samuel Pitoiset	7832e75ea8	radv: load the fast color clear values from the base level Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-06-17 22:20:53 +02:00
Samuel Pitoiset	7971697efe	radv: store the DCC predicate for each mip Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-06-17 22:20:53 +02:00
Samuel Pitoiset	38aa386e96	radv: store the FCE predicate for each mip Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-06-17 22:20:53 +02:00
Samuel Pitoiset	7295512037	radv: store the fast color clear values for each mip Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-06-17 22:20:53 +02:00
Samuel Pitoiset	58506fec63	radv: allocate DCC metadata for each mip Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-06-17 22:20:53 +02:00
Caio Marcelo de Oliveira Filho	4b0bc664a5	gallium: Remove unused util_ringbuffer Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2019-06-17 13:02:44 -07:00
Caio Marcelo de Oliveira Filho	397d1a18ef	llvmpipe: Don't use u_ringbuffer for lp_scene_queue Inline the ring buffer and signal logic into lp_scene_queue instead of using a u_ringbuffer. The code ends up simpler since there's no need to handle serializing data from / to packets. This fixes a crash when compiling Mesa with LTO, that happened because util_ringbuffer_dequeue() was writing data after the "header packet", as shown below struct scene_packet { struct util_packet header; struct lp_scene *scene; }; /* Snippet of old lp_scene_deque(). */ packet.scene = NULL; ret = util_ringbuffer_dequeue(queue->ring, &packet.header, sizeof packet / 4, return packet.scene; but due to the way aliasing analysis works the compiler didn't consider the "&packet->header" to alias with "packet->scene". With the aggressive inlining done by LTO, this would end up always returning NULL instead of the content read by util_ringbuffer_dequeue(). Issue found by Marco Simental and Thiago Macieira. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110884 Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2019-06-17 13:02:44 -07:00
Alyssa Rosenzweig	390126e70a	panfrost/midgard: Simplify 2D array logic It shouldn't matter if we stick a z in for non-arrays, anyway. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-17 12:52:51 -07:00
Alyssa Rosenzweig	a3ae3cb8e9	panfrost/midgard: Handle non-zero component in store Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-17 12:52:51 -07:00
Alyssa Rosenzweig	2c9e124f81	panfrost/midgard: Apply writemask to LUTs Fixes LUT instructions with NIR registers. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-17 12:52:50 -07:00
Marek Olšák	eba932ea43	amd: update addrlib Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-06-17 15:14:55 -04:00
Nicolai Hähnle	d15cc1f55a	radeonsi: reduce MAX_GEOMETRY_OUTPUT_VERTICES This fixes piglit spec@glsl-1.50@gs-max-output on gfx9. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-06-17 15:14:51 -04:00
Alyssa Rosenzweig	aef01dd2e5	panfrost: Cleanup default blend mode Just encode the Mali magic number for `replace` rather than awkwardly forcing Gallium structures through. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-17 10:45:52 -07:00
Alyssa Rosenzweig	fbbb29aa5b	panfrost: Don't accidentally include blend shader Some residual dirty state can leak through across frames; zero this out. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-17 10:45:52 -07:00
Alyssa Rosenzweig	565c446dab	panfrost/midgard: Use typeless moves internally We switch all fmov to (i)mov, following the NIR switch. This simplifies some code surrounding blend shaders and should have no functional changes elsewhere. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-17 10:45:52 -07:00
Chia-I Wu	1fece5fa5f	virgl: better support for PIPE_TRANSFER_DISCARD_WHOLE_RESOURCE When the resource to be mapped is busy and the backing storage can be discarded, reallocate the backing storage to avoid waiting. In this new path, we allocate a new buffer, emit a state change, write, and add the transfer to the queue. In the PIPE_TRANSFER_DISCARD_RANGE path, we suballocate a staging buffer, write, and emit a copy_transfer (which may allocate, memcpy, and blit internally). The win might not always be clear. But another win comes from the fact that the new path clears res->valid_buffer_range and does not clear res->clean_mask. This makes it much more preferable in scenarios such as access = enough_space ? GL_MAP_UNSYNCHRONIZED_BIT : GL_MAP_INVALIDATE_BUFFER_BIT; glMapBufferRange(..., GL_MAP_WRITE_BIT | access); memcpy(...); // append new data glUnmapBuffer(...); Signed-off-by: Chia-I Wu <olvaffe@gmail.com> Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>	2019-06-17 09:36:31 -07:00
Chia-I Wu	9975a0a84c	virgl: add virgl_rebind_resource We are going to support reallocating the HW resource for a virgl_resource. When that happens, the virgl_resource needs to be rebound to the context. Signed-off-by: Chia-I Wu <olvaffe@gmail.com> Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>	2019-06-17 09:36:31 -07:00
Chia-I Wu	7e0508d9aa	virgl: save virgl_hw_res in virgl_transfer When PIPE_TRANSFER_DISCARD_WHOLE_RESOURCE is properly supported, virgl_transfer might refer to a different virgl_hw_res than virgl_resource does. We need to save the virgl_hw_res and use the saved one. Signed-off-by: Chia-I Wu <olvaffe@gmail.com> Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>	2019-06-17 09:36:31 -07:00
Chia-I Wu	ad1ef35dc1	virgl: add resource_reference to virgl_winsys It works similarly to pipe_resource_reference but is for virgl_hw_res. It can also replace resource_unref. Signed-off-by: Chia-I Wu <olvaffe@gmail.com> Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>	2019-06-17 09:36:31 -07:00
Alyssa Rosenzweig	73bf669e3f	panfrost/midgard: Add rounding mode specific opcodes This adds a set of opcodes for performing moves and type conversions with respect to particular rounding modes, required for OpenCL. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-17 09:32:31 -07:00
Alyssa Rosenzweig	9865b79a88	panfrost: Drop draws with complete scissor The hardware support for scissoring requires minimally 1 pixel to be drawn. If the scissor culls everything, we need to drop the draw entirely early on. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-17 09:29:09 -07:00
Alyssa Rosenzweig	3a9b7692f1	panfrost: Disable pipelining temporarily Pipelined rendering is important for performance but is not working right these days. Disable it for correctness until the panfrost_job refactor is enabled and we can do it right. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-17 09:25:52 -07:00
Alyssa Rosenzweig	d4aed00214	panfrost/mfbd: Handle rendering to linear mipmap In anticipation of more general mipmapping support, we implemented support for rendering to linear mipmaps (a very simple case). Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-17 08:42:54 -07:00
Alyssa Rosenzweig	531715431f	panfrost: Implement sampling from non-zero initial levels In preparation for more complex mipmap operations. glGenerateMipmap() in particular, as implemented by u_blitter, requires reading from non-zero initial mip levels. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-17 08:42:54 -07:00
Alyssa Rosenzweig	a5f5b0640c	panfrost: Resource management for linear 2D texture arrays Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-17 08:36:15 -07:00
Alyssa Rosenzweig	dabfc71d36	panfrost/midgard: Adjust swizzles for 2D arrays Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-17 08:36:14 -07:00
Alyssa Rosenzweig	67a34acd00	panfrost: Set array_size to permit array textures Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-17 08:36:14 -07:00
Alyssa Rosenzweig	bdf169abb3	panfrost: Decode array textures Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-17 08:36:14 -07:00
Alyssa Rosenzweig	0ae6bbe8a9	panfrost: Implement 3D texture resource management Passes dEQP-GLES3.functional.texture.format.unsized.3d Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-17 08:36:14 -07:00
Alyssa Rosenzweig	36a7b2b018	panfrost: Specify 3D in texture descriptor Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-17 08:28:13 -07:00
Alyssa Rosenzweig	8429beef5e	panfrost/midgard: Fix 3D texture masks/swizzles Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-17 08:28:13 -07:00
Alyssa Rosenzweig	56f9b47efd	panfrost/midgard: Add swizzle_of/mask_of helpers These make manipulating vectors in the Midgard compiler easier. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-17 08:28:13 -07:00
Alyssa Rosenzweig	8d1adc091b	panfrost: Enable helper invocations when texturing It turns out we have explicit control over helper invocations; if a particular bit in the fragment shader descriptor is set, helper invocations are launched; if it is clear, they are not. Helper invocations are required whenever computing derivatives, whether explicitly (dFdx/dFdy) or implicitly (any texturing). Accordingly, we set this bit when texturing to fix edge case behaviour (literally, haha). Thank you to Jason Ekstrand and Ilia Mirkin for pointing out the representative dEQP test failed along triangle edges and for suggesting helper invocations / derivatives as a list of suspect pieces (which led to discovering the helper invocations enable bit in the first place). Ideally we would use the new NIR analysis pass for this, but that hasn't landed quite yet. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-17 08:22:37 -07:00
Alyssa Rosenzweig	0219b99500	panfrost: Handle missing texture case In some cases, Gallium can give us bad info about the texture count, counting some NULL textures. We pass Gallium's info to the hardware blindly, which can confuse the hardware in edge cases. This patch adjusts accordingly.	2019-06-17 07:59:14 -07:00
Alyssa Rosenzweig	443f9ae0ad	panfrost: Remove forced flush on clears This worked around a bug in oooold versions of Panfrost. Nowadays, its presence is, at best, creating bugs. Let's whack it. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-17 07:59:14 -07:00
Alyssa Rosenzweig	6460442049	panfrost: Flush scanout too In a poorly coded app, the framebuffer can be partially drawn, an FBO switched, switch back to the framebuffer and keep drawing, etc. Reordering would fix this, but for now we need to just be careful about flushing scanout too. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-17 07:59:14 -07:00
Alyssa Rosenzweig	fc3f57bd7f	panfrost: Improve viewport (clipping) robustness On more complex apps (possibly using desktop GL specific extensions?), our viewport code was getting wacky results for unclear reasons. Let's be a little less wacky. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-17 07:59:14 -07:00
Alyssa Rosenzweig	f9ecca2ff0	panfrost: Disable the tiler for clear-only jobs To do so, we route some basic information through to the FBD creation routines (currently just a binary toggle of "has draws?"). Eventually, more refactoring will enable dynamic hierarchy mask selection, but right now we do the most basic. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-17 07:59:14 -07:00
Alyssa Rosenzweig	ac68946d9d	panfrost: Identify and decode mfbd_flags Previously known as the unk3 field. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-17 07:59:14 -07:00
Alyssa Rosenzweig	12d4289bf9	panfrost: Stub out hierarchy mask selection Quite a bit of refactoring in the main driver will be necessary to make use of this effectively, so the implementation is incomplete. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-17 07:59:14 -07:00
Alyssa Rosenzweig	6434f5c494	panfrost: Rename misc_0 -> tiler_polygon_list Just for readability. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-17 07:59:14 -07:00
Alyssa Rosenzweig	e2c2ccd5b8	panfrost: Sanity check tiler polygon list size Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-17 07:59:14 -07:00
Alyssa Rosenzweig	953cc4b540	panfrost: Compute and use polygon list body size This is a bit of a hack, but it gets the point across. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-17 07:59:14 -07:00
Alyssa Rosenzweig	b660953733	panfrost: Use polygon list header size computation Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-17 07:59:14 -07:00
Alyssa Rosenzweig	edfba9bee2	panfrost: Calculate polygon list header size As per the notes at the beginning of pan_tiler.c, we implement a routine to calculate the size of the polygon list header given the framebuffer dimensions and the provided hierarchy mask. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-17 07:59:14 -07:00
Alyssa Rosenzweig	e88ff9ad85	panfrost: Add pan_tiler.h header Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-17 07:47:49 -07:00
Alyssa Rosenzweig	21eb411d2f	panfrost: Document tile size heuristic I'm not sure how the blob does it, but this seems to be a dead simple test and roughly corresponds to what I've noticed from the blob, so maybe it's good enough. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-17 07:47:49 -07:00
Alyssa Rosenzweig	7f26bb3553	panfrost: Rename tiler fields per tiler research Following the research into Midgard's hierarchical tiling infrastructure, we now understand (in broad strokes) the purpose of each tiler field in the MFBD. Additionally, we understand more of the tiling fields in the SFBD and in Bifrost's structures, although this knowledge is still incomplete. Update the names, decoder, and comments to reflect this new understanding. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-17 07:47:49 -07:00
Alyssa Rosenzweig	8d6fb66e3a	panfrost: Add notes about the tiler allocations This explains how the polygon list is allocated, updating the headers appropriately to sync the terminology. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-17 07:47:49 -07:00
Alyssa Rosenzweig	85e745f2b4	panfrost: Integrate kernel names for tiler FBD These names are from the replay workaround in kbase; they begin to shine some light on the meaning of these fields. In particular, we now understand why the "tiler_meta" field has the effect it does on performance in certain scenes (controlling tile granularity). Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-17 07:47:49 -07:00
Bas Nieuwenhuizen	1a7caac9e9	radv: Add asserts that buffer descriptors are created with valid buffer formats. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-06-17 10:56:50 +00:00
Bas Nieuwenhuizen	4107590911	radv: Decompress DCC when the image format is not allowed for buffers. Otherwise the buffer loads/stores in the bufimage meta operations fail. If we decompress DCC then we can use the "canonical" format compatible with the not-supported format. CC: <mesa-stable@lists.freedesktop.org> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-06-17 10:56:50 +00:00
Samuel Pitoiset	e9875fc0b6	radv: make sure to init the DCC decompress compute path state This fixes a segfault when forcing DCC decompressions on compute because internal meta objects are not created since the on-demand stuff. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-06-17 11:30:49 +02:00
Samuel Pitoiset	4c7ef1b02e	ac: make ac_compute_cmask() a static function Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-06-17 11:30:47 +02:00
Samuel Pitoiset	cf77d3abf1	radv: rely on ac_compute_cmask() for CMASK info Instead of re-computing in the driver. The 3d and cube flags are correctly set, so the same values should be returned by ac_compute_surface(). Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-06-17 11:30:44 +02:00
Samuel Pitoiset	6880b42cfc	radv: silence a compiler warning in radv_CmdPushDescriptorSetKHR() Trivial. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-06-17 09:53:26 +02:00
Tomeu Vizoso	e655d63644	panfrost: ci: Speed things up a bit by skipping a git clone Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>	2019-06-17 09:17:53 +02:00
Tomeu Vizoso	f1efb0f254	panfrost: ci: Exclude all blend tests from results As they randomly fail on T760. Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>	2019-06-17 09:17:53 +02:00
Samuel Pitoiset	b5012a0518	ac: update llvm.amdgcn.icmp intrinsic name for LLVM 9+ LLVM r363339 changed llvm.amdgcn.icmp.i* to llvm.amdgcn.icmp.i64.i*. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-06-17 08:58:33 +02:00
Erico Nunes	d72bbb2c89	lima: lower fmod in ppir and gpir Since commit `4f3c82c72c` fmod is no longer being lowered in nir, and ends up crashing lima programs with "unsupported nir_op: fmod" in both ppir and gpir. There seems to be no mod operation in hardware in utgard and there is an optimization in nir to lower fmod to instructions that lima already implements, so let's use that. Signed-off-by: Erico Nunes <nunes.erico@gmail.com> Reviewed-by: Qiang Yu <yuq825@gmail.com>	2019-06-16 10:11:59 +00:00
Rob Clark	a417c323ad	freedreno/a6xx: re-enable UBWC for depth/stencil Now that we can blit depth/stencil in a way that plays nicely with UBWC, re-enable it. Signed-off-by: Rob Clark <robdclark@chromium.org> Reviewed-by: Kristian H. Kristensen <hoegsberg@gmail.com>	2019-06-15 07:33:04 -07:00
Rob Clark	363a9ed614	freedreno/a6xx: handle z24s8/z24x8 blits with u_blitter Now that it can turn these blits into rendering to RB6_Z24_UNORM_S8_UINT it can properly handle cases where only one of depth+stencil is being blit. And this avoids lying about the format, which completely doesn't work when UBWC is used. Signed-off-by: Rob Clark <robdclark@chromium.org> Reviewed-by: Kristian H. Kristensen <hoegsberg@gmail.com>	2019-06-15 07:33:04 -07:00
Rob Clark	a96ae18de6	freedreno/a6xx: handle fallback for rewritten blits ourselves For re-written z/s blits, we want to use the re-written `pipe_blit_info` even if we have to fall back to the 3d pipe (`u_blitter`). So handle that fallback ourselves. Signed-off-by: Rob Clark <robdclark@chromium.org> Reviewed-by: Kristian H. Kristensen <hoegsberg@gmail.com>	2019-06-15 07:33:04 -07:00
Rob Clark	94c36a8554	freedreno/a6xx: rename variable The name 'separate' doesn't make a whole lot of sense, as in only one of the cases is the blit actually split. But split out from previous patch in an attempt to reduce the noise. Signed-off-by: Rob Clark <robdclark@chromium.org> Reviewed-by: Kristian H. Kristensen <hoegsberg@gmail.com>	2019-06-15 07:33:04 -07:00
Rob Clark	5fe7b627eb	freedreno/a6xx: consolidate z/s blit handling This will get even simpler with the next patch Signed-off-by: Rob Clark <robdclark@chromium.org> Reviewed-by: Kristian H. Kristensen <hoegsberg@gmail.com>	2019-06-15 07:33:04 -07:00
Rob Clark	4c75d62ce8	gallium: add z24s8_as_r8g8b8a8 format This maps to a special format that recent generations of adreno have, for blitting z24s8. Conceptually it is similar to doing Z and/or S blits by pretending it is r8g8b8a8 (with appropriate writemask). But it differs when bandwidth compression is used, as z24 is a different type from r8g8b8. Signed-off-by: Rob Clark <robdclark@chromium.org> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Kristian H. Kristensen <hoegsberg@gmail.com>	2019-06-15 07:33:04 -07:00
Kenneth Graunke	1d75f52589	st/mesa: Respect GL_TEXTURE_SRGB_DECODE_EXT in GenerateMipmaps() Apparently, we're supposed to look at the texture object's built-in sampler object's sRGB decode setting in order to decide whether to decode/downsample/re-encode, or simply downsample as-is. Previously, we had just respected the pipe_resource's format. Fixes SKQP's Skia_Unit_Tests.SRGBMipMaps test. (This ports commit `337a808062` from i965 to st/mesa for Gallium drivers.) Reviewed-by: Eric Anholt <eric@anholt.net>	2019-06-14 20:13:46 +00:00
Erico Nunes	3ddea5e8c5	lima: fix dynarray usage in lima_submit_add_bo Commit `de8a919702` refactored dynarray usage and changed the size of the allocation in lima_submit_add_bo. That causes a segfault in programs running with lima. This commit restores the allocation size back to the previous size. Signed-off-by: Erico Nunes <nunes.erico@gmail.com> Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>	2019-06-14 20:47:35 +02:00
Alyssa Rosenzweig	9ab8d31f32	panfrost: Fix variant selection Fixes 1acffb ("panfrost: Unify...") Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-14 10:35:07 -07:00
Marek Olšák	abe9a51d27	ac: add radeon_info::is_amdgpu instead of checking drm_major == 3 and clean up Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-06-14 13:31:18 -04:00
Mauro Rossi	bbbbea243a	android: amd/common: fix missing include path Fixes the following building error in Android: In file included from external/mesa/src/amd/common/ac_llvm_helper.cpp:34: In file included from external/mesa/src/amd/common/ac_llvm_build.h:30: In file included from external/mesa/src/compiler/nir/nir.h:40: In file included from external/mesa/src/compiler/nir_types.h:36: external/mesa/src/compiler/glsl_types.h:37:10: fatal error: 'main/config.h' file not found ^~~~~~~~~~~~~~~ 1 error generated. Fixes: `bd4c661` ("ac,ac/nir: use a better sync scope for shared atomics") Signed-off-by: Mauro Rossi <issor.oruam@gmail.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2019-06-14 18:36:10 +02:00
Mauro Rossi	51e24af8fd	android: radv: fix necessary dependencies Fixes building errors due to libmesa_util and libexpat dependencies: In file included from external/mesa/src/amd/vulkan/radv_device.c:52: external/mesa/src/util/xmlpool.h:115:10: fatal error: 'xmlpool/options.h' file not found ^~~~~~~~~~~~~~~~~~~ 1 error generated. FAILED: out/target/product/x86_64/obj_x86/SHARED_LIBRARIES/vulkan.radv_intermediates/LINKED/vulkan.radv.so ... external/mesa/src/util/xmlconfig.c:670: error: undefined reference to 'XML_ParserCreate' ... clang.real: error: linker command failed with exit code 1 (use -v to see invocation) Fixes: `3c2e826` ("radv: Add support for driconf.") Signed-off-by: Mauro Rossi <issor.oruam@gmail.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2019-06-14 18:35:10 +02:00
Alejandro Piñeiro	d317944c24	docs: document three NIR_ envvars Initially I was only interested on documenting NIR_PRINT, as today I needed to check the code to find this envvar, that at the moment I vaguely remembered that existed. As we are here, though, let's just document all of them (assuming that makes sense). Reviewed-by: Eric Anholt <eric@anholt.net>	2019-06-14 16:18:43 +02:00
Alexandros Frantzis	83829abe03	virgl: Return immediately when finding a compatible resource in the cache When searching for resources in the cache, we previously released all expired resources even after having found a compatible resource. This commit changes this behavior to return immediately when finding a compatible resource, so that the operation finishes more quickly. This moves more of the burden of releasing expired resources to cache addition, which, since it happens at resource destruction time, is less time critical. Signed-off-by: Alexandros Frantzis <alexandros.frantzis@collabora.com> Reviewed-by: Chia-I Wu <olvaffe@gmail.com>	2019-06-14 12:59:51 +03:00
Alexandros Frantzis	801753d4b3	virgl: Use virgl_resource_cache in the vtest winsys Replace the cache implementation in the vtest winsys with virgl_resource_cache. Signed-off-by: Alexandros Frantzis <alexandros.frantzis@collabora.com> Reviewed-by: Chia-I Wu <olvaffe@gmail.com>	2019-06-14 12:59:49 +03:00
Alexandros Frantzis	13f70d3668	virgl: Use virgl_resource_cache in the drm winsys Replace the cache implementation in the drm winsys with virgl_resource_cache. Signed-off-by: Alexandros Frantzis <alexandros.frantzis@collabora.com> Reviewed-by: Chia-I Wu <olvaffe@gmail.com>	2019-06-14 12:59:43 +03:00
Alexandros Frantzis	b18f09a509	virgl: Introduce virgl_resource_cache Introduce a resource cache implementation that can be used by any virgl winsys backend. Signed-off-by: Alexandros Frantzis <alexandros.frantzis@collabora.com> Reviewed-by: Chia-I Wu <olvaffe@gmail.com>	2019-06-14 12:58:51 +03:00
Haihao Xiang	8ead5bebdb	i965: support UYVY for external import only It is similar to YUYV. Fixes: `165e704719` ("i965/i915: Add UYVY as the supported format") Signed-off-by: Haihao Xiang <haihao.xiang@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-06-14 15:45:56 +08:00
Neil Roberts	34d4b3e367	glsl: Set default precision on record members Record types have their own slot to store the precision for each member in glsl_struct_field. Previously if the member didn’t have an explicit precision qualifier this was being left as GLSL_PRECISION_NONE. This patch makes it take into account the type’s default precision qualifier like it does for regular variables in apply_type_qualifier_to_variable. This has the additional benefit of correctly reporting an error when a float type is used in a struct without declaring the default type. Reviewed-by: Eric Anholt <eric@anholt.net>	2019-06-14 09:29:53 +02:00
Neil Roberts	235425771c	glsl/linker: Make precision matching optional in intrastage_match This function is confusingly also used to match interstage interfaces as well as intrastage. In the interstage case it needs to avoid comparing the precisions. This patch adds a parameter to specify whether to take the precision into account or not so that it can be used for both cases. Reviewed-by: Eric Anholt <eric@anholt.net>	2019-06-14 09:29:53 +02:00
Neil Roberts	19b27a8569	glsl/linker: Don’t check precision for shader interface On GLES, the interface between vertex and fragment shaders doesn’t need to have matching precision. Section 4.3.10 of the GLSL ES 3.00 spec: “The type of vertex outputs and fragment inputs with the same name must match, otherwise the link command will fail. The precision does not need to match.” Reviewed-by: Eric Anholt <eric@anholt.net>	2019-06-14 09:29:53 +02:00
Neil Roberts	230d1e8d86	compiler/types: Make comparing record precision optional On GLES, the interface between vertex and fragment shaders doesn’t need to have matching precision. This adds an extra argument to glsl_types::record_compare to disable the precision comparison. This will later be used for the shader interface check. In order to make this work this patch also adds a helper function to recursively compare types while ignoring the precision. v2: Call record_compare from within compare_no_precision to avoid duplicating code (Eric Anholt). Reviewed-by: Eric Anholt <eric@anholt.net>	2019-06-14 09:29:53 +02:00
Lucas Stach	ab74699190	etnaviv: fix some pm query issues The offsets to read the query results were off-by-one, which causes the counters to report bogus increasing values. Also the counter result is u32, so we need to initialize the query type to reflect that. Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>	2019-06-14 09:06:28 +02:00
Iago Toral Quiroga	360b832c58	v3d: do not setup execute flags for else block in uniform control flow Either all channels executed the 'then' block, in which case all channels will directly jump to the 'endif' block at the end of the 'then' block, or all channels execute the 'else' block (so no execution masking is necessary). Shader-db results: total instructions in shared programs: 9119238 -> 9117550 (-0.02%) instructions in affected programs: 401252 -> 399564 (-0.42%) helped: 855 HURT: 77 total uniforms in shared programs: 3022622 -> 3022605 (<.01%) uniforms in affected programs: 3566 -> 3549 (-0.48%) helped: 17 HURT: 0 total max-temps in shared programs: 1327762 -> 1327774 (<.01%) max-temps in affected programs: 619 -> 631 (1.94%) helped: 2 HURT: 15 Reviewed-by: Eric Anholt <eric@anholt.net>	2019-06-14 08:00:52 +02:00
Iago Toral Quiroga	2a2501247b	nir: detect more dynamically uniform expressions Shader-db results for v3d: total instructions in shared programs: 9132728 -> 9119238 (-0.15%) instructions in affected programs: 596886 -> 583396 (-2.26%) helped: 1118 HURT: 224 total threads in shared programs: 234298 -> 234308 (<.01%) threads in affected programs: 10 -> 20 (100.00%) helped: 5 HURT: 0 total uniforms in shared programs: 3022949 -> 3022622 (-0.01%) uniforms in affected programs: 29163 -> 28836 (-1.12%) helped: 108 HURT: 37 total max-temps in shared programs: 1328030 -> 1327762 (-0.02%) max-temps in affected programs: 10097 -> 9829 (-2.65%) helped: 263 HURT: 15 total spills in shared programs: 3793 -> 3777 (-0.42%) spills in affected programs: 432 -> 416 (-3.70%) helped: 16 HURT: 0 total fills in shared programs: 4380 -> 4266 (-2.60%) fills in affected programs: 828 -> 714 (-13.77%) helped: 16 HURT: 0 Reviewed-by: Eric Anholt <eric@anholt.net>	2019-06-14 08:00:52 +02:00
Tapani Pälli	287b58f827	ir3: initialize progress false before ir3_nir_lower_imul Removes a compiler warning about uninitialized variable. Fixes: `c02ffd2700` "ir3: Use the new NIR lowering pass for integer multiplication" Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Rob Clark <robclark@gmail.com> Reviewed-by: Eduardo Lima <elima@igalia.com>	2019-06-14 08:21:42 +03:00
Boris Brezillon	749c544b84	panfrost: Fix general purpose varying handling When both the fragment and vertex shaders point to the same varying location they expect to share the same varying slot. Make sure vertex and fragment varyings pointing to the same loc have ->src_offset set to the same value. [Alyssa: In addition to a patch implementing txs, this fixes GALLIUM_HUD on Panfrost] Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-13 10:54:18 -07:00
Marek Olšák	7566a9a58a	ac/registers: use better names for disambiguated definitions Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-06-13 13:52:06 -04:00
Marek Olšák	08ab9b70ce	ac/registers: remove deprecated/inapplicable definitions Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-06-13 13:52:06 -04:00
Caio Marcelo de Oliveira Filho	5bd48ff252	iris: Enable INTEL_shader_atomic_float_minmax Supported only for gen >= 9. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2019-06-13 09:03:58 -07:00
Caio Marcelo de Oliveira Filho	81835f87a4	gallium: Add PIPE_CAP_ATOMIC_FLOAT_MINMAX Used to enable INTEL_shader_atomic_float_minmax. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2019-06-13 09:03:58 -07:00
Rob Clark	9f10e40cde	freedreno/a6xx: fix MAX_INDICES Signed-off-by: Rob Clark <robdclark@chromium.org>	2019-06-13 08:56:27 -07:00
Rob Clark	ce12ac8c2b	freedreno/blitter: remove dead code The src/dst format is overridden from the pipe_blit_info, so this logic just serves to confuse the reader. Signed-off-by: Rob Clark <robdclark@chromium.org>	2019-06-13 08:56:27 -07:00
Rob Clark	a8be53211d	freedreno: turn staging cube into 2d-array Since we could only need a subset of the layers, and otherwise we trigger an assert in util_max_layer() Signed-off-by: Rob Clark <robdclark@chromium.org>	2019-06-13 08:56:27 -07:00
Tomeu Vizoso	3adf9b0757	panfrost: ci: Exclude some tests from results These are tests that regressed in RK3288 but still pass on RK3399. So we still have a CI we can rely on, add them to the flip-flop list for now. Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com> Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-06-13 17:45:27 +02:00
Tomeu Vizoso	50901a27f6	panfrost: ci: Update test expectations Some tests got fixed since the last update, but also some regressions crept in. To keep the CI green, add the regressions to the expected failures. Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com> Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-06-13 17:45:22 +02:00
Connor Abbott	37b92b0ae6	nir: Don't manually index intrinsic index enum This fixes a rebase fail in `ea51275e07`, and prevents it from happening again. There's no reason to do this manually. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-06-13 17:10:41 +02:00
Erik Faye-Lund	901795238b	docs: work around broken altsoftware.com link altsoftware.com seems to no longer be around, and is currently being held by a domain squatter. Let's link to waybackmachine instead. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-06-13 14:14:05 +00:00
Erik Faye-Lund	795b5d923f	docs: work around broken dsbox.com link dsbox.com now forwards to haystax.com, which is technically unrelated to this link. Let's link to waybackmachine instead. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-06-13 14:14:05 +00:00
Erik Faye-Lund	b16e77e051	docs: work around broken sgi.com links sgi.com now forwards to hpe.com, which is technically unrelated to these links. Let's link to waybackmachine instead. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-06-13 14:14:05 +00:00
Erik Faye-Lund	a9956ed87a	docs: update link to OpenGL FAQ Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-06-13 14:14:05 +00:00
Erik Faye-Lund	12f4cd6a09	docs: update link to the Linux OpenGL ABI Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-06-13 14:14:05 +00:00
Erik Faye-Lund	e19448c102	docs: update link to glw GLW is currently living in gitlab, the cgit-page is just a mirror. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-06-13 14:14:05 +00:00
Erik Faye-Lund	04fc0bc3f3	docs: fixup link-target Just a couple of lines above, we have this exact same link, but this time with a leading "www.". Let's match that. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-06-13 14:14:05 +00:00
Erik Faye-Lund	372f9f6947	docs: eliminate another stale autoconf-reference Meson is what should tell you about these issues, not the configure script. We no longer have that. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-06-13 14:14:05 +00:00
Erik Faye-Lund	c9d396710b	docs: replace autoconf with meson We no longer have an autoconf build-system to maintain, but we do have a meson build-system. So let's mention that instead. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-06-13 14:14:05 +00:00
Erik Faye-Lund	f4f78a59b0	docs: update required packages Automake and libtool are no longer required to build, instead we need meson and ninja-build. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-06-13 14:14:05 +00:00
Erik Faye-Lund	26287b91ac	docs: remove pointless haiku-comment The only build system that doesn't support Haiku is `Android.mk`, which also doesn't support most other platforms either, so there is no need to single it out. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-06-13 14:14:05 +00:00
Erik Faye-Lund	c339c0175a	docs: fixup typo Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-06-13 14:14:05 +00:00
Daniel Schürmann	c58dff753c	radv: enable AMD_shader_ballot with RADV_PERFTEST_SHADER_BALLOT ('shader_ballot') Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2019-06-13 12:44:23 +00:00
Daniel Schürmann	deedc0b31d	amd/common: add support for AMD_shader_ballot functions Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2019-06-13 12:44:23 +00:00
Daniel Schürmann	7a858f274c	spirv/nir: add support for AMD_shader_ballot and Groups capability This commit also renames existing AMD capabilities: - gcn_shader -> amd_gcn_shader - trinary_minmax -> amd_trinary_minmax Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2019-06-13 12:44:23 +00:00
Daniel Schürmann	ea51275e07	nir: add intrinsics for AMD_shader_ballot Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2019-06-13 12:44:23 +00:00
Daniel Schürmann	f2277c327a	radv: enable shader_subgroup_vote & shader_subgroup_ballot extensions Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2019-06-13 12:44:23 +00:00
Daniel Schürmann	1b89ebeede	nir/spirv: add support for the SubgroupBallotKHR SPIR-V capability This capability is required for the VK_EXT_shader_subgroup_ballot extension. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2019-06-13 12:44:23 +00:00
Daniel Schürmann	de56ebadce	nir/spirv: add support for the SubgroupVoteKHR SPIR-V capability This capability is required for the VK_EXT_shader_subgroup_vote extension. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2019-06-13 12:44:23 +00:00
Alejandro Piñeiro	17c2c9cd67	v3d: fix checking twice auf flag Seems a C&P error, and should check for auf/muf. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110902 Fixes: `8f065596d2` "v3d: Add an optimization pass for redundant flags updates." Reviewed-by: Eric Anholt <eric@anholt.net>	2019-06-13 11:45:18 +02:00
Samuel Pitoiset	ca6bf9a6cd	radv: flush and invalidate CB before resetting query pools on GFX9 We have to emit a CACHE_FLUSH_AND_INV_TS_EVENT to be sure all prior GPU work is done. While we are at it, also flush and invalidate DB. This fixes the following CTS (when the small hint is disabled): dEQP-VK.query_pool.statistics_query.reset_before_copy.* Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-06-13 11:23:48 +02:00
Bas Nieuwenhuizen	cb728f28ac	vl: Always enable drm winsys. The dri2 winsys also uses libdrm (and you can only enable dri3 if you enable dri2), and the drm winsys only requires libdrm. So if any winsys is enabled you can also enable the drm winsys, and since we always want at least one winsys we can always enable it. I removed the check for the drm platform for VA and OMX since they do not care anymore. Since we still check for one of r600g, nouveau or radeonsi, we are guaranteed to still only enable it by default in a configuration that requires libdrm anyway. So for people using va=auto, we don't suddenly start requiring libdrm where we did not before. This supersedes "vl: Enable DRM by default.", which I pushed, but rolled back because it used dep_libdrm before its definition. Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2019-06-13 08:25:48 +00:00
Bas Nieuwenhuizen	b4c7ce360b	radv: Always disable DCC on shareable images. Do not want it for perf reasons. Always have to disable DCC when transferring to external queue. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-06-13 08:15:45 +00:00
Bas Nieuwenhuizen	0667c1f14b	radv: Skip transitions coming from external queue. Transitions to external queue should do the transition & make sure it works on all queues. Fixes: `8ebc7dcb59` "radv: Allow fast clears with concurrent queue mask for some layouts." Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-06-13 08:15:45 +00:00
Mateusz Krzak	60009aefdb	lima/ppir: change offset type to int Offset doesn't need to be 64-bit. This fixes compilation error with 64-bit off_t. Fixes: `af0de6b9` lima/ppir: implement discard and discard_if Suggested-by: Qiang Yu <yuq825@gmail.com> Signed-off-by: Mateusz Krzak <kszaquitto@gmail.com> Reviewed-by: Qiang Yu <yuq825@gmail.com> Tested-by: Andreas Baierl <ichgeh@imkreisrum.de>	2019-06-13 07:43:24 +02:00
Chia-I Wu	900a80f9e4	virgl: virgl_transfer should own its virgl_resource We should avoid having potentially dangling pointers to pipe_resources in general. Signed-off-by: Chia-I Wu <olvaffe@gmail.com> Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>	2019-06-12 18:20:30 -07:00
Chia-I Wu	74051efbea	virgl: pass virgl_context to transfer create/destroy A pipe_transfer is a context object. It is fine for the constructor/destructor to have access to the context. Signed-off-by: Chia-I Wu <olvaffe@gmail.com> Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>	2019-06-12 18:20:30 -07:00
Chia-I Wu	514e12b1b8	virgl: init transfer queue from virgl_context A pipe_transfer is a context object. It is fine for virgl_transfer_queue to have access to the context. Signed-off-by: Chia-I Wu <olvaffe@gmail.com> Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>	2019-06-12 18:20:30 -07:00
Chia-I Wu	308ba2c0f9	virgl: clean up virgl_transfer_queue.h Add header guard and forward declare structs. Move virgl_resource.h inclusion to the C file. Signed-off-by: Chia-I Wu <olvaffe@gmail.com> Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>	2019-06-12 18:20:30 -07:00
Nicolai Hähnle	2d114e6267	radeonsi: add radeonsi_debug_disassembly option This dumps disassembly to the pipe_debug_callback together with shader stats. Can be used together with shader-db to get full disassembly of all shaders in the database. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-06-12 20:28:23 -04:00
Nicolai Hähnle	3bde69e789	radeonsi: fix line splitting in si_shader_dump_assembly Compute the count since the start of the current line instead of the count since the start of the disassembly. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-06-12 20:28:23 -04:00
Nicolai Hähnle	aa737e8580	radeonsi: raise the alignment of LDS memory for compute shaders This implies that the memory will always be at address 0, which allows LLVM to generate slightly better code. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-06-12 20:28:23 -04:00
Nicolai Hähnle	33be5ad8a3	radeonsi: use an explicit symbol for the LSHS LDS memory Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-06-12 20:28:23 -04:00
Nicolai Hähnle	174fad7075	radeonsi: rename lds_{load,store} to lshs_lds_{load,store} These functions are now only used in LS/HS shaders (both separate and merged). Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-06-12 20:28:23 -04:00
Nicolai Hähnle	b519ddc35c	radeonsi/gfx9: declare LDS ESGS ring as an explicit symbol on LLVM >= 9 This will make it easier to use LDS for other purposes in geometry shaders in the future. The lifetime of the esgs_ring variable is as follows: - declared as [0 x i32] while compiling shader parts or monolithic shaders - just before uploading, gfx9_get_gs_info computes (among other things) the final ESGS ring size (this depends on both the ES and the GS shader) - during upload, the "esgs_ring" symbol is given to ac_rtld as a shared LDS symbol, which will lead to correctly laying out the LDS including other LDS objects that may be defined in the future - si_shader_gs uses shader->config.lds_size as the LDS size This change depends on the LLVM changes for emitting LDS symbols into the ELF file. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-06-12 20:28:23 -04:00
Nicolai Hähnle	f8315ae04b	amd/rtld: layout and relocate LDS symbols Upcoming changes to LLVM will emit LDS objects as symbols in the ELF symbol table, with relocations that will be resolved with this change. Callers will also be able to define LDS symbols that are shared between shader parts. This will be used by radeonsi for the ESGS ring in gfx9+ merged shaders. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-06-12 20:28:23 -04:00
Nicolai Hähnle	dc99a8cd9b	radeonsi: cleanup some #includes Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-06-12 20:28:23 -04:00
Nicolai Hähnle	1ff2440eee	amd/common: use ARRAY_SIZE for the LLVM command line options This is more convenient for changing it around during debug. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-06-12 20:28:23 -04:00
Nicolai Hähnle	ca21ba2a08	radeonsi: inline si_shader_binary_read_config into its only caller Since it can only be used for reading the config of an individual, non-combined shader, it is not very reusable anyway. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-06-12 20:28:23 -04:00
Nicolai Hähnle	bf8a1ca902	radeonsi: use the new run-time linker for shaders v2: - fix a memory leak Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-06-12 20:28:23 -04:00
Nicolai Hähnle	16bee0e5f6	radeonsi: don't declare pointers to static strings The compiler should be able to optimize them away, but still. There's no point in declaring those as pointers, and if the compiler doesn't optimize them away, they add unnecessary load-time relocations. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-06-12 20:28:23 -04:00
Nicolai Hähnle	3c958d924a	amd/common: add ac_compile_module_to_elf A new variant of ac_compile_module_to_binary that allows us to keep the entire ELF around. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-06-12 20:28:23 -04:00
Nicolai Hähnle	66da60f4da	radeonsi: dump shader binary buffer contents Help identify bugs related to corruption of shaders in memory, or errors in shader upload / rtld. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-06-12 20:28:23 -04:00
Nicolai Hähnle	bf11c594dd	radeonsi: return bool from si_shader_binary_upload We didn't really use error codes anyway. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-06-12 20:28:23 -04:00
Nicolai Hähnle	8b1343ca79	radeonsi: let si_shader_create return a boolean We didn't really use error codes anyway. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-06-12 20:28:23 -04:00
Nicolai Hähnle	77b05cc42d	radeonsi: use ac_shader_config Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-06-12 20:28:23 -04:00
Nicolai Hähnle	b3be346c68	amd/common: add a more powerful runtime linker Using an explicit linker instead of just concatenating .text sections will allow us to start using .rodata sections and explicit descriptions of data on LDS that is shared between stages. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-06-12 20:28:23 -04:00
Caio Marcelo de Oliveira Filho	608257cf82	i965: Fix INTEL_DEBUG=bat Use hash_table_u64 instead of hash_table directly, since the former will also handle the special keys (deleted and freed) and allow use of the whole u64 space. Fixes crash in INTEL_DEBUG=bat when using a key with value 0 -- the current value for a freed key. Fixes: `b38dab101c` "util/hash_table: Assert that keys are not reserved pointers" Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-06-12 15:57:16 -07:00
Caio Marcelo de Oliveira Filho	eb41ce1b01	util/hash_table: Properly handle the NULL key in hash_table_u64 The hash_table_u64 should support any uint64_t as input. It does special handling for the "deleted" key, storing the data in the table itself; do the same for the "freed" key. Fixes: `b38dab101c` "util/hash_table: Assert that keys are not reserved pointers" Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-06-12 15:57:16 -07:00
Nicolai Hähnle	c129cb3861	amd/common: clarify ac_shader_binary::lds_size Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-06-12 18:33:21 -04:00
Nicolai Hähnle	2e96c01073	amd/common: extract ac_parse_shader_binary_config Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-06-12 18:33:08 -04:00
Nicolai Hähnle	de8a919702	u_dynarray: turn util_dynarray_{grow, resize} into element-oriented macros The main motivation for this change is API ergonomics: most operations on dynarrays are really on elements, not on bytes, so it's weird to have grow and resize as the odd operations out. The secondary motivation is memory safety. Users of the old byte-oriented functions would often multiply a number of elements with the element size, which could overflow, and checking for overflow is tedious. With this change, we only need to implement the overflow checks once. The checks are cheap: since eltsize is a compile-time constant and the functions should be inlined, they only add a single comparison and an unlikely branch. v2: - ensure operations are no-op when allocation fails - in util_dynarray_clone, call resize_bytes with a compile-time constant element size v3: - fix iris, lima, panfrost Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-06-12 18:30:25 -04:00
Nicolai Hähnle	71b45bae14	u_dynarray: return 0 on realloc failure and ensure no-op We're not very good at handling out-of-memory conditions in general, but this change at least gives the caller the option of handling it gracefully and without memory leaks. This happens to fix an error in out-of-memory handling in i965, which has the following code in brw_bufmgr.c: node = util_dynarray_grow(vma_list, sizeof(struct vma_bucket_node)); if (unlikely(!node)) return 0ull; Previously, allocation failure for util_dynarray_grow wouldn't actually return NULL when the dynarray was previously non-empty. v2: - make util_dynarray_ensure_cap a no-op on failure, add MUST_CHECK attribute - simplify the new capacity calculation: aside from avoiding a useless loop when newcap is very large, this also avoids an infinite loop when newcap is larger than 1 << 31 Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-06-12 18:30:25 -04:00
Nicolai Hähnle	dc75362511	freedreno: use util_dynarray_clear instead of util_dynarray_resize(_, 0) This is more expressive and simplifies a subsequent change. v2: - fix one more call-site after rebase Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-06-12 18:30:25 -04:00
Alyssa Rosenzweig	1ee2366693	panfrost/midgard: Differentiate vertex/fragment texture tags Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-12 14:32:12 -07:00
Alyssa Rosenzweig	5062b612be	panfrost/midgard: Assert on unknown texture source Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-12 14:32:07 -07:00
Alyssa Rosenzweig	4ea512844c	panfrost/midgard: Set minimal swizzle on texture input Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-12 14:32:01 -07:00
Alyssa Rosenzweig	6ae4f9c523	panfrost/midgard: Lower texture projectors We do have native support for perspective division on the load/store unit, but this is for the future, something ideally we would select generally, not just for textures. Meanwhile, flipping on projector lowering works now. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-12 14:31:53 -07:00
Alyssa Rosenzweig	4012e06788	panfrost/midgard: Implement txl This follows the txb implementation, but requires an adjustment to how the cont/last flags are set. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-12 14:31:45 -07:00
Alyssa Rosenzweig	a19ca344ab	panfrost/midgard: Implement txb op We refactor the main tex handling to fit a bias argument in as well. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-12 14:31:37 -07:00
Alyssa Rosenzweig	1acffb5671	panfrost: Unify bind_vs/fs_state This replaces bind_vs/fs_state calls to a unified bind_shader_state call, removing a great deal of duplicated logic related to variant selection. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-12 14:29:05 -07:00
Alyssa Rosenzweig	f8a4090f80	panfrost: Add panfrost_job_type_for_pipe helper This logic is repeated in a bunch of places and will only grow worse as we support more job types; collect it. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-12 14:27:47 -07:00
Alyssa Rosenzweig	15fae1e38c	panfrost/midgard: Extract emit_varying_read Paralleling emit_uniform_read, this allows varying reads to be emitted independent of an honest-to-goodness load vary instruction in the NIR. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-12 14:25:17 -07:00
Alyssa Rosenzweig	8c88bd0253	panfrost: Remove "vertex/tiler render target" silliness I don't think these are actual structures, just figments over cargoculting dumped memory without making any sense of it. Nothing seems to break if the region is zeroed out, anyway. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-12 14:21:56 -07:00
Alyssa Rosenzweig	b96df80069	panfrost/decode: Print line number of bad memory access Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-12 14:14:53 -07:00
Alyssa Rosenzweig	fc7bcee865	panfrost: Replace pantrace with direct decoding History lesson! In the early days of Panfrost, we had a library independent of the driver called `panwrap` which would be LD_PRELOAD'ed into a driver to decode its cmdstream in real-time. When upstreaming Panfrost, we realized that we would much rather have this decode functionality maintained in-tree to avoid divergence, but that we could not upstream panwrap because of its use with the legacy API. So we instead dumped GPU memory to the filesystem with an out-of-tree panwrap, and decoded that with the in-tree pandecode module. When we migrated to the new kernel, we just added support for doing this memory dump directly from the driver (via a module "pantrace"). This works, but dumping memory every frame is sloooooooooooooow and error-prone. I figured if we have pandecode in-tree, we might as well link to it directly in the driver, allowing us to decode Panfrost's command streams without dumping memory to the filesystem first. This cleans up the code substantially and improves dumping performance by a HUGE margin. I'm talking "several seconds per frame" to "dumping in real-time" kind of jump. Note to users: this removes the environmental option "PANTRACE_BASE". Instead, for equivalent functionality set "PAN_MESA_DEBUG=trace" and redirect stdout to the file of your choosing. This should make debugging Panfrost much more pleasant. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-12 14:07:09 -07:00
Kevin Strasser	845ec8576a	st/mesa: Add rgbx handling for fp formats Add missing cases for fp32 and fp16 formats. Fixes: `c68334ffc0` "st/mesa: add floating point formats in st_new_renderbuffer_fb()" Signed-off-by: Kevin Strasser <kevin.strasser@intel.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-06-12 19:03:47 +00:00
Kevin Strasser	ec0a68e50d	gallium/winsys/kms: Fix dumb buffer bpp The bpp in the dumb buffer creation request is hardcoded to 32, which is an incorrect assumption as the caller is free to pick any pipe format. Use the bpp supplied to us through util_format_get_blocksizebits(). Fixes: `3b176c441b` "gallium: Add a dumb drm/kms winsys backed swrast provider" Signed-off-by: Kevin Strasser <kevin.strasser@intel.com> Reviewed-by: Adam Jackson <ajax@redhat.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-06-12 11:44:10 -07:00
Eric Engestrom	9996ddbb27	util/futex: fix dangling pointer use Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110901 Fixes: `7dc2f47882` "util: emulate futex on FreeBSD using umtx" Cc: Greg V <greg@unrelenting.technology> Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>	2019-06-12 17:27:44 +01:00
Samuel Pitoiset	d378151246	radv: fix VK_EXT_memory_budget if one heap isn't available When the visible VRAM size is equal to the VRAM size only two heaps are exposed. This fixes dEQP-VK.api.info.device.memory_budget. Cc: 19.0 19.1 <mesa-stable@lists.freedesktop.org> Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-06-12 15:52:48 +02:00
Samuel Pitoiset	2ef9d2738c	radv: fix occlusion queries on VegaM The number of render backends is 16 but the enabled mask is 0xaaaa. As noticed by Bas, allowing disabled render backends might break the OCCLUSION_QUERY packet. We don't use it yet but keep this in mind. This fixes dEQP-VK.query_pool.* and dEQP-VK.multiview.*. Cc: 19.0 19.1 <mesa-stable@lists.freedesktop.org> Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-06-12 15:51:12 +02:00
Lionel Landwerlin	93b93e5a9d	anv: do not parse genxml data without INTEL_DEBUG=bat This significantly slows down the CTS runs. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `32ffd90002` ("anv: add support for INTEL_DEBUG=bat") Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>	2019-06-12 12:53:35 +03:00
Lionel Landwerlin	f80679c8e8	intel/dump: fix segfault when the app hasn't accessed the device Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-06-12 09:49:55 +03:00
Caio Marcelo de Oliveira Filho	48d7e7a9b8	iris: Only upload surface state for grid info when needed Special care is needed to ensure that when we have two consecutive calls with the same grid size, we only bail in the second one if it either doesn't need the surface state or the surface state was already uploaded. v2: Instead of having a new bool in ice->state to know whether we had a surface, check whether we have state->ref. (Ken) Clean up the logic a little bit by adding 'grid_updated' local. (Ken) Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> [v1] Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-06-11 17:57:37 -07:00
Caio Marcelo de Oliveira Filho	f346b277d1	iris: Create binding table slot for num_work_groups only when needed Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-06-11 17:57:37 -07:00
Rui Salvaterra	7b43362f29	r300g: implement GLSL disk shader caching This implements GLSL disk shader caching for the R300-R500 series of AMD GPUs. Signed-off-by: Rui Salvaterra <rsalvaterra@gmail.com> Signed-off-by: Marek Olšák <marek.olsak@amd.com>	2019-06-11 20:49:34 -04:00
Richard Thier	ffd2f948fe	r300g: restore performance after RADEON_FLAG_NO_INTERPROCESS_SHARING was added v1: Fix skipped slab allocators and the buffer cache. v2: Use only 1 domain for texture allocation v3: Added flag for the create_fence call too Based on Marek v1 and v2 proposed fixes. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=1107812.patch Cc: 19.1 <mesa-stable@lists.freedesktop.org> Signed-off-by: Marek Olšák <marek.olsak@amd.com>	2019-06-11 20:45:27 -04:00
Marek Olšák	ec0956a194	radeonsi: don't test SDMA perf if SDMA is disabled/unsupported	2019-06-11 20:05:21 -04:00
Marek Olšák	993bf52977	radeonsi: always interpolate PrimID as flat	2019-06-11 20:05:21 -04:00
Marek Olšák	7f7ffa0883	radeonsi: move color clamping to si_llvm_export_vs to unify the code	2019-06-11 20:05:21 -04:00
Marek Olšák	4773f5a293	radeonsi: use the ac helper for index buffer stores in the culling shader	2019-06-11 20:05:21 -04:00
Marek Olšák	579003e7bd	radeonsi: use the ac helper for image stores	2019-06-11 20:05:21 -04:00
Marek Olšák	deef3833f8	radeonsi: use the ac helper for SSBO stores	2019-06-11 20:05:21 -04:00
Marek Olšák	e5fe38484a	radeonsi: fixes for vec3 buffer stores in LLVM 9	2019-06-11 20:05:21 -04:00
Caio Marcelo de Oliveira Filho	9c81db8adb	iris: Enable PIPE_CAP_CS_DERIVED_SYSTEM_VALUES_SUPPORTED This avoids lowering of CS system values by GLSL (configured by state tracker). In i965 we don't use that lowering, and we also shouldn't need that in Iris. Using it causes some unnecessary round trips between values, e.g.: shader uses gl_LocalInvocationIndex, GLSL rewrites it in terms of gl_LocalInvocationID, then driver rewrites those in terms of gl_LocalInvocationIndex again. Copy propagation can make some of those go away, but not all as seen below. Intel SKL shader-db results: total instructions in shared programs: 15595189 -> 15594556 (<.01%) instructions in affected programs: 74880 -> 74247 (-0.85%) helped: 81 HURT: 4 helped stats (abs) min: 2 max: 172 x̄: 7.88 x̃: 4 helped stats (rel) min: 0.19% max: 5.66% x̄: 1.71% x̃: 1.23% HURT stats (abs) min: 1 max: 2 x̄: 1.25 x̃: 1 HURT stats (rel) min: 0.45% max: 1.65% x̄: 0.76% x̃: 0.46% 95% mean confidence interval for instructions value: -11.56 -3.34 95% mean confidence interval for instructions %-change: -1.91% -1.28% Instructions are helped. total loops in shared programs: 4831 -> 4831 (0.00%) loops in affected programs: 0 -> 0 helped: 0 HURT: 0 total cycles in shared programs: 372136618 -> 372145628 (<.01%) cycles in affected programs: 9218230 -> 9227240 (0.10%) helped: 131 HURT: 86 helped stats (abs) min: 1 max: 798 x̄: 39.79 x̃: 12 helped stats (rel) min: <.01% max: 6.75% x̄: 0.42% x̃: 0.13% HURT stats (abs) min: 2 max: 2442 x̄: 165.38 x̃: 6 HURT stats (rel) min: <.01% max: 20.83% x̄: 0.74% x̃: 0.12% 95% mean confidence interval for cycles value: -2.07 85.11 95% mean confidence interval for cycles %-change: -0.22% 0.30% Inconclusive result (value mean confidence interval includes 0). 
total spills in shared programs: 11956 -> 11950 (-0.05%) spills in affected programs: 77 -> 71 (-7.79%) helped: 3 HURT: 0 total fills in shared programs: 25619 -> 25549 (-0.27%) fills in affected programs: 593 -> 523 (-11.80%) helped: 4 HURT: 0 LOST: 0 GAINED: 0 Total CPU time (seconds): 1695.69 -> 1706.03 (0.61%) Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-06-11 15:12:17 -07:00
Caio Marcelo de Oliveira Filho	46de3beab1	gallium: Add PIPE_CAP_CS_DERIVED_SYSTEM_VALUES_SUPPORTED Tells whether or not the driver can handle gl_LocalInvocationIndex and gl_GlobalInvocationID. If not supported (the default), state tracker will lower those on behalf of the driver. v2: Add case to u_screen.c. (Anholt) Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-06-11 15:12:17 -07:00
Caio Marcelo de Oliveira Filho	f03b21ae69	st/glsl: Perform some var optimizations Perform those before some derefs are gone when we lower the buffers after the st_nir_opts() call. Intel SKL shader-db results: total instructions in shared programs: 15593685 -> 15590708 (-0.02%) instructions in affected programs: 378078 -> 375101 (-0.79%) helped: 777 HURT: 44 helped stats (abs) min: 1 max: 68 x̄: 4.07 x̃: 4 helped stats (rel) min: 0.04% max: 31.58% x̄: 2.88% x̃: 1.37% HURT stats (abs) min: 1 max: 24 x̄: 4.20 x̃: 2 HURT stats (rel) min: 0.17% max: 8.00% x̄: 1.60% x̃: 1.27% 95% mean confidence interval for instructions value: -4.02 -3.23 95% mean confidence interval for instructions %-change: -2.93% -2.35% Instructions are helped. total loops in shared programs: 4815 -> 4815 (0.00%) loops in affected programs: 0 -> 0 helped: 0 HURT: 0 total cycles in shared programs: 371965528 -> 371788566 (-0.05%) cycles in affected programs: 184190307 -> 184013345 (-0.10%) helped: 3650 HURT: 2855 helped stats (abs) min: 1 max: 59400 x̄: 99.45 x̃: 15 helped stats (rel) min: <.01% max: 43.18% x̄: 2.60% x̃: 1.02% HURT stats (abs) min: 1 max: 16362 x̄: 65.16 x̃: 10 HURT stats (rel) min: <.01% max: 66.22% x̄: 2.78% x̃: 0.81% 95% mean confidence interval for cycles value: -53.73 -0.68 95% mean confidence interval for cycles %-change: -0.39% -0.08% Cycles are helped. 
total spills in shared programs: 11936 -> 11956 (0.17%) spills in affected programs: 443 -> 463 (4.51%) helped: 0 HURT: 8 total fills in shared programs: 25644 -> 25619 (-0.10%) fills in affected programs: 2306 -> 2281 (-1.08%) helped: 24 HURT: 2 LOST: 7 GAINED: 16 Total CPU time (seconds): 1679.04 -> 1695.69 (0.99%) shader-db results radeonsi (VEGA64): Totals from affected shaders: SGPRS: 180160 -> 179552 (-0.34 %) VGPRS: 115368 -> 114544 (-0.71 %) Spilled SGPRs: 5627 -> 5603 (-0.43 %) Spilled VGPRs: 0 -> 0 (0.00 %) Private memory VGPRs: 0 -> 0 (0.00 %) Scratch size: 0 -> 0 (0.00 %) dwords per thread Code Size: 7808364 -> 7803268 (-0.07 %) bytes LDS: 192 -> 192 (0.00 %) blocks Max Waves: 19202 -> 19340 (0.72 %) Wait states: 0 -> 0 (0.00 %) Radeonsi results provided by Timothy. Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-06-11 14:53:54 -07:00
Ville Syrjälä	6230bfeb65	anv/cmd_buffer: Reuse gen8 Cmd{Set, Reset}Event on gen7 Modern DXVK requires event support [1], but looks like it only uses vkCmdSetEvent() + vkGetEventStatus(). So we can just borrow the relevant code from gen8, leaving CmdWaitEvents still unimplemented. [1] `8c3900c533` v2: Also move CmdWaitEvents into genX_cmd_buffer.c (Jason) Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-06-11 16:25:07 -05:00
Ian Romanick	39f4dc23a5	intel/fs: Mark source 0 of bcsel as needing Boolean resolve The other sources of the bcsel behave like the sources of an and or other logical operation. However, source zero behaves differently. It is evaluated as a Boolean, so it needs to be resolved. No shader-db changes, but the tests mentioned in the bug get a couple instructions added back. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110857 Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-06-11 12:12:07 -07:00
Rob Clark	f9f89df8bc	freedreno/a5xx: enable a540 Tested-by: Jeffrey Hugo <jeffrey.l.hugo@gmail.com> Signed-off-by: Rob Clark <robdclark@chromium.org>	2019-06-11 12:03:10 -07:00
Rob Clark	832010f6ac	freedreno/a6xx: enable UBWC by default Flip the FD_MESA_DEBUG flag to a disable rather than enable, drop the obsolete comment (and bonus, drop unused softpin debug flag) Signed-off-by: Rob Clark <robdclark@chromium.org> Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>	2019-06-11 10:55:27 -07:00
Rob Clark	81cc555e9a	freedreno/a6xx: disallow UBWC for z24s8 This is slightly annoying because it mostly works.. but we have some issues to sort out about how to blit z24s8/x24s8/z24x8 with UBWC before we can enable UBWC by default. For now it is a step forward to at least enable it for non-z/s while we figure out how to blit z24s8+UBWC. (The basic issue is that pretending z24s8 is an equivalently sized rgba format for the purpose of blitting falls apart when UBWC is in the picture.) Signed-off-by: Rob Clark <robdclark@chromium.org> Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>	2019-06-11 10:55:27 -07:00
Rob Clark	4f1319a17d	freedreno/a6xx: use correct UBWC reg builders No functional change, the registers have the same layout as MRT flags pitch reg. Signed-off-by: Rob Clark <robdclark@chromium.org> Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>	2019-06-11 10:55:27 -07:00
Rob Clark	d42ce659ed	freedreno: update generated headers Signed-off-by: Rob Clark <robdclark@chromium.org> Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>	2019-06-11 10:55:27 -07:00
Rob Clark	490baa6974	freedreno/a6xx: disable UBWC for some formats An older blob claims to support UBWC w/ r32ui and r32i, but not r32f. Results from deqp indicate that it doesn't work with r32ui and r32i. This could also just mean that use as "IBO" (image) is more limited than as texture, although blob also doesn't seem to bother to try to use UBWC with images at all, so hard to know for sure. Signed-off-by: Rob Clark <robdclark@chromium.org> Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>	2019-06-11 10:55:27 -07:00
Rob Clark	8ddffa75c0	freedreno/a6xx: handle non-UBWC-compatible image views Signed-off-by: Rob Clark <robdclark@chromium.org> Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>	2019-06-11 10:55:27 -07:00
Rob Clark	dac3bc9862	freedreno/a6xx: handle non-UBWC-compatible texture views Signed-off-by: Rob Clark <robdclark@chromium.org> Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>	2019-06-11 10:55:27 -07:00
Rob Clark	fe5c7b2b75	freedreno: add helper to uncompress UBWC resource We'll need this for a few edge cases, like image/sampler view that uses a format that UBWC does not support with a resource originally created in a format that UBWC does support. NOTE we could in some cases do an in-place uncompress. But that has a couple potential sharp edges: 1) the uncompressed buffer could have different layout, ie. a5xx with meta and pixel data of layers/levels interleaved. 2) if it comes mid-batch, it would force flush, or somehow fixing up cmdstream for draws already emitted. But with the resource shadowing approach we can rely on batch re-ordering to avoid splitting things.. older draws see the older compressed version, newer draws see the new uncompressed version of the rsc. Signed-off-by: Rob Clark <robdclark@chromium.org> Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>	2019-06-11 10:55:27 -07:00
Rob Clark	846b8a76bd	freedreno: handle images in rebind_resource() Signed-off-by: Rob Clark <robdclark@chromium.org> Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>	2019-06-11 10:55:27 -07:00
Rob Clark	c6ae354299	freedreno: allow null discard box in shadow path When uncompressing a UBWC buffer, we don't want to discard anything. Signed-off-by: Rob Clark <robdclark@chromium.org> Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>	2019-06-11 10:55:27 -07:00
Rob Clark	12201d7a8b	freedreno: swap UBWC state in shadow path It doesn't come up yet, as so far we only hit this path with linear buffers. But it will when we start re-using the shadow path for uncompressing UBWC buffers. Signed-off-by: Rob Clark <robdclark@chromium.org> Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>	2019-06-11 10:55:27 -07:00
Rob Clark	3c9a31eb50	freedreno: add modifier param to fd_try_shadow_resource() To uncompress UBWC, I want to re-use the shadow path, but we'll need a way to request that the new buffer is not compressed. Signed-off-by: Rob Clark <robdclark@chromium.org> Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>	2019-06-11 10:55:27 -07:00
Rob Clark	3b05a120a3	freedreno: correct modifier for UBWC buffers Signed-off-by: Rob Clark <robdclark@chromium.org> Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>	2019-06-11 10:55:27 -07:00
Chia-I Wu	15323c14fd	virgl: consider newly created resources idle A newly created resource can be regarded as idle. We don't care if the RESOURCE_CREATE command has been retired, unless it is used for fencing. Signed-off-by: Chia-I Wu <olvaffe@gmail.com> Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>	2019-06-11 10:03:54 -07:00
Chia-I Wu	9e4452cfd9	virgl: make resource_wait/resource_is_busy cheaper The round trip to the kernel is expensive. Add a local cache to avoid it when possible. There is a race condition when two contexts access the same resource at the same time (e.g., ctx1 submits a cmdbuf that accesses a resource while ctx2 maps the resource). But that is probably an app bug in the first place. Signed-off-by: Chia-I Wu <olvaffe@gmail.com> Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>	2019-06-11 10:03:54 -07:00
Chia-I Wu	ddc90be907	virgl: add virgl_drm_{alloc,free,clear}_res_list Helpers to work with resource list. virgl_drm_release_all_res is removed. Signed-off-by: Chia-I Wu <olvaffe@gmail.com> Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>	2019-06-11 10:03:54 -07:00
Chia-I Wu	71465fe569	virgl: do not cache external resources We should not reuse a resource for other purposes when it can still be accessed by another process or device. Signed-off-by: Chia-I Wu <olvaffe@gmail.com> Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>	2019-06-11 10:03:54 -07:00
Alyssa Rosenzweig	7d43999e63	panfrost: Enable AFBC on depth/stencil This seems to be a performance win, but more rigorous testing is necessary to figure out the exact circumstances when this is good/bad. Incidentally, this fixes non-aligned ZS. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-11 08:46:43 -07:00
Alyssa Rosenzweig	15f62b8e7c	panfrost: Linear depth/stencil should be aligned We might render to it. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-11 08:46:43 -07:00
Alyssa Rosenzweig	d7ad29ce25	panfrost/midgard: Decode LOD/bias registers For constant LODs/biases, we can use an immediate embedded in the texture (already decoded); for non-constant, we have to use a register squeezed into the usual immediate field, which is decoded here. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-11 08:44:19 -07:00
Alyssa Rosenzweig	b4a3296e77	panfrost/midgard: Decode texture offset register swizzle Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-11 08:44:19 -07:00
Alyssa Rosenzweig	4e9e42cc56	panfrost/midgard/disasm: include textureGather() Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-11 08:44:19 -07:00
Alyssa Rosenzweig	6c18ae33bc	panfrost/midgard: Support negative immediate offsets It's not at all clear why this works for texelFetch but not texture. Maybe the top bits are dual-purpose on other texturing ops...? Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-11 08:44:19 -07:00
Alyssa Rosenzweig	4d8157f12d	panfrost/midgard: Fix redunant mask redundancy Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-11 08:44:19 -07:00
Alyssa Rosenzweig	3dee556c4e	panfrost/midgard/disasm: Print LOD for texelFetch Its encoding differs slightly from the LOD used in normal texture calls. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-11 08:44:19 -07:00
Alyssa Rosenzweig	cda9f32909	panfrost/midgard: Identify the in_reg_full field This is clear for texelFetch, hence the confusion with Bifrost's filter field, but it's much more general in reality. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-11 08:44:19 -07:00
Alyssa Rosenzweig	445a7b523f	panfrost/midgard/disasm: Correctly dump bias/LOD Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-11 08:44:19 -07:00
Alyssa Rosenzweig	873a3ed342	panfrost/midgard/disasm: Cleanup texture op code Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-11 08:44:19 -07:00
Alyssa Rosenzweig	289405392d	panfrost/midgard/disasm: Add missing space Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-11 08:44:19 -07:00
Alyssa Rosenzweig	f4ee8d055c	panfrost/midgard/disasm: LOD immediate/register select Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-11 08:44:19 -07:00
Alyssa Rosenzweig	59fa7c95c8	panfrost/midgard/disasm: Use texture op name bare This allows us to show a call to textureLod in a reasonable way. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-11 08:44:18 -07:00
Alyssa Rosenzweig	109460f03a	panfrost/midgard/disasm: Varying perspective divides With an extra flag, we're able to do a perspective division "for free" while loading a varying. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-11 08:44:18 -07:00
Alyssa Rosenzweig	fc472007e7	panfrost/midgard: Add perspective division opcodes ...on the load/store unit, not the ALUs. Looks goofy but hey. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-11 08:44:18 -07:00
Alyssa Rosenzweig	b0396d6dda	panfrost/midgard: Print texture offsets This patch identifies the two modes of offsets in a texture instruction (immediate and register, disambiguated by the bit-once-known-as "has_offset") and implements disassembly for both. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-11 08:44:18 -07:00
Alyssa Rosenzweig	ed1c48e91d	panfrost/midgard: Expand texture to 4-channel swizzle This eliminates some unknowns, clarifies 3D textures, and will maybe help with array/shadow textures? Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-11 08:44:18 -07:00
Juan A. Suarez Romero	b586ed51f3	docs: update calendar, add news item and link release notes for 19.1.0 Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>	2019-06-11 17:38:22 +02:00
Juan A. Suarez Romero	cc7fc7e319	docs: Add SHA256 sums for 19.1.0 Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com> (cherry picked from commit `2a5b4e2b9f`)	2019-06-11 15:26:42 +00:00
Juan A. Suarez Romero	7e8e49475c	docs: Add release notes for 19.1.0 Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com> (cherry picked from commit `1517811f4f`)	2019-06-11 15:26:38 +00:00
Samuel Iglesias Gonsálvez	32e1d85cb6	radv: assert on inline uniform blocks in radv_CmdPushDescriptorSetKHR() According to the Vulkan spec, inline uniform blocks are not allowed to be updated through vkCmdPushDescriptorSetKHR(). These are the spec quotes from "13.2.1. Descriptor Set Layout" that are relevant for this case: "VK_DESCRIPTOR_SET_LAYOUT_CREATE_PUSH_DESCRIPTOR_BIT_KHR specifies that descriptor sets must not be allocated using this layout, and descriptors are instead pushed by vkCmdPushDescriptorSetKHR." "If flags contains VK_DESCRIPTOR_SET_LAYOUT_CREATE_PUSH_DESCRIPTOR_BIT_KHR, then all elements of pBindings must not have a descriptorType of VK_DESCRIPTOR_TYPE_INLINE_UNIFORM_BLOCK_EXT". There is no explicit mention in vkCmdPushDescriptorSetKHR() to forbid this case but it is implied in the creation of the descriptor set layout as aforementioned. Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-06-11 16:32:27 +02:00
Samuel Iglesias Gonsálvez	d0c52ff610	anv: ignore inline uniform blocks in anv_CmdPushDescriptorSetKHR() According to the Vulkan spec, inline uniform blocks are not allowed to be updated through vkCmdPushDescriptorSetKHR(). These are the spec quotes from "13.2.1. Descriptor Set Layout" that are relevant for this case: "VK_DESCRIPTOR_SET_LAYOUT_CREATE_PUSH_DESCRIPTOR_BIT_KHR specifies that descriptor sets must not be allocated using this layout, and descriptors are instead pushed by vkCmdPushDescriptorSetKHR." "If flags contains VK_DESCRIPTOR_SET_LAYOUT_CREATE_PUSH_DESCRIPTOR_BIT_KHR, then all elements of pBindings must not have a descriptorType of VK_DESCRIPTOR_TYPE_INLINE_UNIFORM_BLOCK_EXT". There is no explicit mention in vkCmdPushDescriptorSetKHR() to forbid this case but it is implied in the creation of the descriptor set layout as aforementioned. Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-06-11 16:25:53 +02:00
Eric Engestrom	773ff93bc4	egl: compare the whole list of attributes `memcmp()` compares a given number of bytes, but `EGLAttrib` is larger than a byte. Fixes: `8e991ce539` "egl: handle the full attrib list in display::options" Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2019-06-11 12:18:09 +00:00
Eduardo Lima Mitev	3fb7b1fd35	freedreno/a5xx: Fix indirect draw max_indices calculation The number of elements to draw should not be affected by the offset. A similar fix was submitted for a6xx at `79180a05`. Fixes these dEQP tests on a5xx: dEQP-GLES31.functional.draw_indirect.compute_interop.large.drawelements_separate_grid_500x500_drawcount_8 dEQP-GLES31.functional.draw_indirect.compute_interop.large.drawelements_separate_grid_500x500_drawcount_2500 dEQP-GLES31.functional.draw_indirect.compute_interop.large.drawarrays_separate_grid_500x500_drawcount_2500 dEQP-GLES31.functional.draw_indirect.compute_interop.large.drawarrays_combined_grid_500x500_drawcount_2500 dEQP-GLES31.functional.draw_indirect.compute_interop.large.drawelements_combined_grid_500x500_drawcount_8 dEQP-GLES31.functional.draw_indirect.compute_interop.large.drawelements_combined_grid_500x500_drawcount_2500 Reviewed-by: Rob Clark <robdclark@gmail.com>	2019-06-11 08:28:45 +02:00
Samuel Pitoiset	40699f74b8	radv: remove extra assignment in radv_decompress_resolve_subpass_src() baseArrayLayer is defined twice, trivial. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-06-11 08:17:22 +02:00
Samuel Pitoiset	c39a1611ab	radv: add radv_get_resolve_pipeline() helper in the graphics path Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-06-11 08:06:42 +02:00
Samuel Pitoiset	b06d1f029d	radv: do not decompress all image layers before resolving inside a subpass When decompressing resolve source images, we should rely on the framebuffer layer count instead of resolving all images layers. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-06-11 08:06:39 +02:00
Samuel Pitoiset	4efbd963ec	radv: initialize the aspect mask when decompressing resolve source images Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-06-11 08:06:35 +02:00
Samuel Pitoiset	c31a07fa85	radv: perform proper layout transitions before resolving Use an explicit pipeline barrier for doing layout transitions instead of duplicating some code. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-06-11 08:06:32 +02:00
Samuel Pitoiset	92fa6264cb	radv: do not resolve all image layers with compute inside a subpass When resolving inside a subpass, we should rely on the framebuffer layer count instead of resolving all images layers. This should improve performance of layered resolves a bit. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-06-11 08:06:28 +02:00
Kenneth Graunke	a8588f512b	iris: Bypass half-float pack/unpack lowering. This skips GLSL IR lowering of pack/unpackHalf operations, allowing the NIR optimizer to see them Improves performance in Synmark2's OglCSDof by about 2x, by cutting about 90% of the cycles from one of the compute shaders. shader-db statistics on Skylake: 4 compute shaders went from SIMD8 to SIMD16. total instructions in shared programs: 15598871 -> 15542568 (-0.36%) instructions in affected programs: 143016 -> 86713 (-39.37%) helped: 144 HURT: 0 helped stats (abs) min: 17 max: 4669 x̄: 390.99 x̃: 164 helped stats (rel) min: 7.48% max: 85.28% x̄: 30.17% x̃: 24.22% 95% mean confidence interval for instructions value: -510.50 -271.49 95% mean confidence interval for instructions %-change: -32.70% -27.65% Instructions are helped. total cycles in shared programs: 371973958 -> 368902103 (-0.83%) cycles in affected programs: 5557722 -> 2485867 (-55.27%) helped: 144 HURT: 0 helped stats (abs) min: 106 max: 1026600 x̄: 21332.33 x̃: 1697 helped stats (rel) min: 0.53% max: 88.98% x̄: 36.12% x̃: 34.67% 95% mean confidence interval for cycles value: -41570.02 -1094.64 95% mean confidence interval for cycles %-change: -38.44% -33.80% Cycles are helped. total spills in shared programs: 11936 -> 11903 (-0.28%) spills in affected programs: 110 -> 77 (-30.00%) helped: 3 HURT: 2 total fills in shared programs: 25644 -> 25178 (-1.82%) fills in affected programs: 677 -> 211 (-68.83%) helped: 5 HURT: 0 total loops in shared programs: 4830 -> 4829 (-0.02%) loops in affected programs: 1 -> 0 helped: 1 HURT: 0	2019-06-10 16:01:36 -07:00
Bas Nieuwenhuizen	e0d12f79c5	radv: Handle UNDEFINED format in image format list. Was watching a presentation on YT where this was used and it turns out it is not invalid. The only case it is actually valid as format in the creation of an image or image view is with Android Hardware Buffers which have their format specified externally. So we can just ignore all entries with VK_FORMAT_UNDEFINED. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-06-10 22:21:16 +00:00
Bas Nieuwenhuizen	39c71e0025	radv: Prevent out of bound shift on 32-bit builds. uintptr_t is 32-bits then and shifting it by 32 bits results in undefined behavior IIRC. Fixes: `b3c8de1c55` "radv: save all descriptor pointers into the trace BO" Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-06-10 22:18:51 +00:00
Caio Marcelo de Oliveira Filho	2cb5907508	glsl: Check order and uniqueness of interlock functions With this commit all remaining compilation tests in Piglit for ARB_fragment_shader_interlock will pass. Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>	2019-06-10 14:29:32 -07:00
Caio Marcelo de Oliveira Filho	b7c9fc72fd	glsl: Make interlock builtins follow same compiler rules as barriers Generalize the barrier code to provide correct error messages for other builtins. Fixes most of piglit compilation tests for ARB_fragment_shader_interlock. Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>	2019-06-10 14:29:26 -07:00
Eduardo Lima Mitev	fb2169040a	nir/opt_algebraic: Fix rules for imadsh_mix16 The rules added in patch `3addd7c` are inverted: It should be: (al * bh) << 16 + c instead of: (ah * bl) << 16 + c Fixes a number of regressions under dEQP-GLES31.functional.draw_indirect.compute_interop.large.* on Freedreno. Reviewed-by: Rob Clark <robdclark@gmail.com>	2019-06-10 22:27:46 +02:00
Alyssa Rosenzweig	e9703fb416	panfrost: Ignore discards in dead branch analysis Fixes regressions in dEQP-GLES2.functional.shaders.discard.dynamic_loop_* Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-10 08:23:08 -07:00
Samuel Pitoiset	e9316fdfd4	radv: fix setting CB_SHADER_MASK for dual source blending CB_SHADER_MASK was computed without the second color buffer format which looks totally wrong to me. While we are at it, copy a comment from RadeonSI. Cc: 19.0 19.1 <mesa-stable@lists.freedesktop.org> Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-06-10 17:21:56 +02:00
Alyssa Rosenzweig	50ffaaff3b	panfrost/midgard: Disambiguate register mode We postfix instructions by their size if a destination override is in place (a la AT&T assembly), disambiguating instruction sizes. Previously, "16-bit instruction, 16-bit dest, 16-bit sources" disassembled identically to "32-bit instruction, 16-bit dest, 16-bit sources", which is semantically distinct due to the lessened opportunity for parallelism but (potentially) greater precision. Adding a postfix removes the ambiguity and relieves mental gymnastics reading weird disassemblies even in some cases that are not ambiguous. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-10 06:50:12 -07:00
Alyssa Rosenzweig	8027cc9975	panfrost/midgard: Expose vec8/vec16 modes Midgard ALUs can operate in one of four modes: vec2 64-bit, vec4 32-bit, vec8 16-bit, or vec16 8-bit. Our compiler (and indeed, any OpenGL ES shader) only uses 32-bit (and eventually vec4 16-bit) modes in normal circumstances. Nevertheless, the other modes do exist and are easily accessible through OpenCL; they also come up in cases like blend shaders. While we have had minimal support for decoding 8-bit/64-bit modes, we did so pretending they were vec4 in each case; 16-bit registers had a synthetically duplicated register file to separate lo/hi halves, etc. This works for GL, but it doesn't map to what the hardware is -actually- doing, which can cause some headscratchingly bizarre disassemblies from OpenCL. So, we dive in the deep end and support these other modes natively in the disassembler, using absurdly long masks/swizzles, since the hardware is considerably more flexible than what was exposed before. Outside of some fixed routines for blending, none of the above is supported in the compiler yet. But it's better to have it in the ISA definitions and disassembler than not, for future use if nothing else. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-10 06:50:11 -07:00
Alyssa Rosenzweig	2d0bda0885	panfrost/midgard: Add shifting int modifiers As a source modifier, shift allows shifting a value left by the bit size, useful in conjunction with a greater register mode, for instance to implement `upsample`. As a concrete example, the following OpenCL: ushort hr0 = /* ... */, uint r1 = /* ... */; uint r2 = (convert_uint(hr0) << 16) ^ b; compiles to the following Midgard assembly: ixor r, (hr0) << 16, b In reverse, the ".hi" output modifier shifts the value right by the bit size, leaving just the carry/overflow at the bottom. To implement _hi functions in OpenCL (for <64-bit), we do arithmetic in the 2x higher mode with the .hi modifier. (For 64-bit, things are hairier, since there is not an 128-bit int mode). Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-10 06:50:11 -07:00
Alyssa Rosenzweig	6780481a3f	panfrost/midgard: Add integer outmods For floats, output modifiers determine clamping behaviour. For integers, they determine wrapping/saturation behaviour (or shifting -- see next commit). These are very different; they are conceptually two unrelated enums union'ed together; the distinction is responsible for many-a-bug. While clamping behaviour for floats was clear from GL, the int behaviour is only known From OpenCL contortion with convert_*_sat() functions. With the underlying functions known, clean up the codebase, likely fixing outmod type related bugs in the process. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-10 06:50:11 -07:00
Alyssa Rosenzweig	215b8844ee	panfrost/midgard: Note floating compares type convert OP_TYPE_CONVERTS denotes an opcode that returns a different type than is source (going from int-domain to float-domain or vice versa), named after the f2i/i2f family of opcodes it covers. We care because source mods are determined by the source type (i/f) but output modifiers are determined by the output type (equals the source type, unless the op type converts, in which case it's the opposite). The upshot is that floating-point compares (feq/fne/etc) actually do type-convert. That is, that take in floating-points and output in integer space (a boolean), so we mark them off this way to ensure the correct output modifiers are used. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-10 06:50:11 -07:00
Alyssa Rosenzweig	d48d991ce2	panfrost: Align linear renderable resources It's just -easier- to render to aligned framebuffers. For winsys targets, we already align, but even for an internal linear FBO we ought to align everything nicely. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-10 06:48:07 -07:00
Alyssa Rosenzweig	d89e0716a1	panfrost: Fix stride check when mipmapping Now that we support custom strides on mipmapped textures (theoretically, at least), extend the stride check to support mipmaps. Fixes incorrect strides of linear windows in Weston. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>	2019-06-10 06:47:18 -07:00
Alyssa Rosenzweig	416fc3b5ef	panfrost: Refactor texture/sampler upload We move some coding packing the texture/sampler descriptors into dedicated functions (out of the terrifyingly long emit_for_draw monolith), cleaning them up as we go. The discovery triggering the cleanup is the format for including manual strides in the presence of mipmaps/cubemaps. Rather than placed at the end like previously assumed, they are interleaved after each address. This difference is relevant when handling NPOT linear mipmaps. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-10 06:45:33 -07:00
Alyssa Rosenzweig	a35069a7b5	panfrost: Refactor blitting code We refactor the wallpaper rendering code to separate the wallpaper-specific bits from the general blitting capabilities. In the (hopefully near) future, we'll turn this on to implement real Gallium blits, e.g. for automatic mipmap generation. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-10 06:45:25 -07:00
Alyssa Rosenzweig	d878753efa	panfrost: Refactor AFBC code This patch does a substantial cleanup of the code for handling AFBC, moving various disparate misplaced functions into a new central pan_afbc.c file. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-10 06:45:14 -07:00
Alyssa Rosenzweig	b4763984ac	panfrost: Move pan_screen() to pan_screen.h Trivial. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-10 06:45:05 -07:00
Alyssa Rosenzweig	a38583e352	panfrost: Always align strides to cache line (64) (Performance tweak.) Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-10 06:44:56 -07:00
Emil Velikov	0534fcf57d	docs: fixup 19.0.5 <> 19.0.6 confusion The title of the release notes says 19.0.5 while the rest of the file (correctly) says 19.0.6 Fixes: `fe79d75ccf` ("docs: Add relnotes for 19.0.6") Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Dylan Baker <dylan at pnwbakers.com>	2019-06-10 14:04:39 +01:00
Emil Velikov	a379b1c0ee	mapi: correctly handle the full offset table Earlier commit converted ES1 and ES2 to a new, much simpler, dispatch generator. At the same time, GL/glapi and the driver side are still using the old code. There is a hidden ABI between GL*.so and glapi.so, former referencing entry-points by offset in the _glapi_table. Hence earlier commit added the full table of entry-points, alongside a marker for other cases like indirect GL(X) and driver-size remapping. Yet the patches did not handle things fully, thus it was possible to get different interpretations of the dispatch table after the marker. This commit fixes that adding an indicative error message to catch future bugs. While here correct the marker (MAX_OFFSETS) comment. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110302 Fixes: `cf317bf093` ("mapi: add all _glapi_table entrypoints to static_data.py") Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Dave Airlie <airlied@redhat.com>	2019-06-10 14:04:30 +01:00
Emil Velikov	497de977bd	mapi: add static_date offset to EXT_dsa As elaborated in the next patch, there is some hidden ABI that effectively require most entrypoints to be listed in the file. Cc: Marek Olšák <marek.olsak@amd.com> Fixes: `d2906293c4` ("mesa: EXT_dsa add selectorless matrix stack functions") Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Dave Airlie <airlied@redhat.com>	2019-06-10 14:04:25 +01:00
Emil Velikov	61960547df	mapi: add static_date offset to MaxShaderCompilerThreadsKHR As elaborated in the next patch, there is some hidden ABI that effectively require most entrypoints to be listed in the file. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110302 Cc: Marek Olšák <maraeo@gmail.com> Fixes: `c5c38e831e` ("mesa: implement ARB/KHR_parallel_shader_compile") Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Dave Airlie <airlied@redhat.com>	2019-06-10 14:04:18 +01:00
Mathias Fröhlich	a7ecf78b90	egl: Let the caller of dri2_create_drawable decide about loaderPrivate. In the call arguments to dri2_create_drawable decouple loaderPrivate from dri2_surf. For all callers of dri2_create_drawable the two pointers are the same with the exception of the gbm backed platform. Let the calling code of dri2_create_drawable decide what loaderPrivate shall be. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>	2019-06-10 11:06:48 +02:00
Samuel Pitoiset	91aa25f462	radv: fix alpha-to-coverage when there is unused color attachments When alphaToCoverage is enabled, we should always write the alpha channel of MRT0 if it's unused. This now matches RadeonSI. This fixes the new CTS: dEQP-VK.pipeline.multisample.alpha_to_coverage_unused_attachment.samples_*.alpha_invisible Cc: 19.0 19.1 <mesa-stable@lists.freedesktop.org> Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-06-10 09:23:41 +02:00
Tomeu Vizoso	2fe7f9f2ae	panfrost: ci: Switch from direct Docker use to buildah Use the infrastructure in wayland/ci-templates to build the container images. This prevents from getting into some situations in which the images wouldn't be rebuilt, and allows us to share some infrastructure with other projects in freedesktop.org. Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com> Suggested-by: Michel Dänzer <michel@daenzer.net> Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-10 08:09:23 +02:00
Kenneth Graunke	81582e9366	gallium/u_transfer_helper: Free the staging buffer on unmap. u_transfer_helper sometimes mallocs a staging buffer, and leaked it. Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-06-09 15:16:10 -07:00
Lionel Landwerlin	17898a9b7e	intel/gpu_dump: fix argument passing We were dropping "/' around arguments grouped together. This was triggering failures with : $ ./framemetrics -g "Memory Writes Distribution Gen9" -o /tmp/output.csv -f ./my.trace 10 11 Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Eric Engestrom <eric@engestrom.ch>	2019-06-09 19:45:13 +00:00
Eric Engestrom	93349d7118	util/os_file: suppress sign comparison warning Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-06-09 13:14:13 +00:00
Eric Engestrom	fd5c18de88	util/os_file: fix error being sign-cast back and forth Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-06-09 13:14:13 +00:00
Eric Engestrom	341ba406fd	util/os_file: avoid shadowing read() with a local variable Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-06-09 13:14:13 +00:00
Eric Engestrom	7e35f20d44	util/os_file: actually return the error read() gave us Fixes: `316964709e` "util: add os_read_file() helper" Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-06-09 13:14:13 +00:00
Alexandros Frantzis	f8f222ea36	virgl: Work around possible memory exhaustion Since we don't normally flush before performing copy transfers, it's possible in some scenarios to use too much memory for staging resources and start failing. This can happen either because we exhaust the total available memory (including system memory virtio-gpu swaps out to), or, more commonly, because the total size of resources in a command buffer doesn't fit in virtio-gpu video memory. To reduce the chances of this happening, force a flush before a copy transfer if the total size of queued staging resources exceeds a certain limit. Since after a flush any queued staging resources will be eventually released, this ensures both that each command buffer doesn't require too much video memory, and that we don't end up consuming too much memory for staging resources in total. Fixes kernel errors reported when running texture_upload tests in glbench. Signed-off-by: Alexandros Frantzis <alexandros.frantzis@collabora.com> Reviewed-by: Chia-I Wu <olvaffe@gmail.com>	2019-06-07 21:45:45 -07:00
Alexandros Frantzis	e34f79c918	virgl: Remove incorrect resource wait condition Now that we have copy transfers in place, we can remove the incorrect resource wait condition. Copy transfers and other optimizations minimize the performance impact of this removal, while providing the correct behavior. Signed-off-by: Alexandros Frantzis <alexandros.frantzis@collabora.com> Reviewed-by: Chia-I Wu <olvaffe@gmail.com>	2019-06-07 21:45:43 -07:00
Alexandros Frantzis	236c55f650	virgl: Use copy transfers for textures Extend copy transfers to also be used for busy textures. Performance results: Unigine Valley, qemu before: 22.7 FPS after: 23.1 FPS Signed-off-by: Alexandros Frantzis <alexandros.frantzis@collabora.com> Reviewed-by: Chia-I Wu <olvaffe@gmail.com>	2019-06-07 21:45:42 -07:00
Alexandros Frantzis	a22c5df079	virgl: Use buffer copy transfers to avoid waiting when mapping We typically need to wait for a buffer to become ready before mapping, so that we don't write new contents while the host is still using the old contents. However, if we are allowed to discard the contents of the mapped buffer range, then we can avoid waiting by using a staging buffer range which we guarantee to never be busy, copying from the staging buffer range to the target buffer in the host. This commit implements this optimization by utilizing a dedicated u_upload_mgr for the staging buffer. Performance results: Twilight Struggle (Steam/Proton), qemu before: 7 FPS after: 25 FPS glmark2 ubo, qemu before: 38 FPS after: 331 FPS Signed-off-by: Alexandros Frantzis <alexandros.frantzis@collabora.com> Suggested-by: Gurchetan Singh <gurchetansingh@chromium.org> Reviewed-by: Chia-I Wu <olvaffe@gmail.com>	2019-06-07 21:45:39 -07:00
Alexandros Frantzis	6e7726e50c	virgl: Support copy transfers Support transfers that use a different resource as the source of data to transfer. This will be used in upcoming commits to send data to host buffers through a transfer upload buffer, in order to avoid waiting when the buffer resource is busy. Note that we don't support queueing copy transfers in the transfer queue. Copy transfers should be emitted directly in the command queue, allowing us to avoid flushes before them and leads to better performance. Signed-off-by: Alexandros Frantzis <alexandros.frantzis@collabora.com> Reviewed-by: Chia-I Wu <olvaffe@gmail.com>	2019-06-07 21:45:36 -07:00
Alexandros Frantzis	199d95f29e	virgl: Add copy_transfer3d definitions Introduce definitions for the copy_transfer3d protocol command and virgl capability. This command transfers data to the host by copying through another resource, and will be used in upcoming commits to avoid waiting when transferring data for busy resources. Signed-off-by: Alexandros Frantzis <alexandros.frantzis@collabora.com> Reviewed-by: Chia-I Wu <olvaffe@gmail.com>	2019-06-07 21:45:34 -07:00
Alexandros Frantzis	ccec1555c1	virgl: Make VIRGL_BIND_STAGING resources cacheable This could help performance when trying to recreate such resources for copy transfers. Signed-off-by: Alexandros Frantzis <alexandros.frantzis@collabora.com> Reviewed-by: Chia-I Wu <olvaffe@gmail.com>	2019-06-07 21:45:33 -07:00
Alexandros Frantzis	636345f496	virgl: Support VIRGL_BIND_STAGING Support a new virgl bind type for staging buffers which don't require dedicated host-side storage. These will be used to implement copy transfers. Signed-off-by: Alexandros Frantzis <alexandros.frantzis@collabora.com> Reviewed-by: Chia-I Wu <olvaffe@gmail.com>	2019-06-07 21:45:31 -07:00
Alexandros Frantzis	f38cdaebac	virgl: Avoid unfinished transfer_get with PIPE_TRANSFER_DONTBLOCK If we are not allowed to block, and we know that we will have to wait, either because the resource is busy, or because it will become busy due to a readback, return early to avoid performing an incomplete transfer_get. Such an incomplete transfer_get may finish at any time, during which another unsynchronized map could write to the resource contents, leaving the contents in an undefined state. Signed-off-by: Alexandros Frantzis <alexandros.frantzis@collabora.com> Suggested-by: Chia-I Wu <olvaffe@gmail.com> Reviewed-by: Chia-I Wu <olvaffe@gmail.com>	2019-06-07 21:45:22 -07:00
Alexandros Frantzis	8eb8222c10	virgl: Deduplicate checks for resource caching Also fixes a missed check for VIRGL_BIND_CUSTOM in one of the duplicate code snippets. Note that legacy fences also use VIRGL_BIND_CUSTOM, but we ensured they don't go through the cache in the previous commit. Signed-off-by: Alexandros Frantzis <alexandros.frantzis@collabora.com> Reviewed-by: Chia-I Wu <olvaffe@gmail.com>	2019-06-07 21:45:20 -07:00
Alexandros Frantzis	e0ffcdf16a	virgl: Don't try to use cached resources for legacy fences Resources for fences should not be from the cache, since we are basing the fence status on the resource creation busy status. Signed-off-by: Alexandros Frantzis <alexandros.frantzis@collabora.com> Reviewed-by: Chia-I Wu <olvaffe@gmail.com>	2019-06-07 21:45:16 -07:00
Alexandros Frantzis	8089d3658a	virgl: More info about chosen alignment value Add more info about why the value of VIRGL_MAP_BUFFER_ALIGNMENT. Signed-off-by: Alexandros Frantzis <alexandros.frantzis@collabora.com> Reviewed-by: Chia-I Wu <olvaffe@gmail.com>	2019-06-07 21:44:53 -07:00
Chia-I Wu	371743157e	virgl: store all info about atomic buffers We will need the full info. This also speeds up virgl_attach_res_atomic_buffers and fixes resource leaks when the context is destroyed. Signed-off-by: Chia-I Wu <olvaffe@gmail.com> Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>	2019-06-07 22:47:07 +00:00
Chia-I Wu	98fd742d7e	virgl: add shader images to virgl_shader_binding_state It replaces virgl_context::images. Signed-off-by: Chia-I Wu <olvaffe@gmail.com> Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>	2019-06-07 22:47:07 +00:00
Chia-I Wu	f965efb3c8	virgl: add SSBOs to virgl_shader_binding_state It replaces virgl_context::ssbos. Signed-off-by: Chia-I Wu <olvaffe@gmail.com> Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>	2019-06-07 22:47:07 +00:00
Chia-I Wu	920c4143f0	virgl: add UBOs to virgl_shader_binding_state It replaces virgl_context::ubos. Signed-off-by: Chia-I Wu <olvaffe@gmail.com> Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>	2019-06-07 22:47:07 +00:00
Chia-I Wu	2e21d66d7a	virgl: add virgl_shader_binding_state virgl_shader_binding_state will be used to manage all per-stage shader bindings. For now, it manages only sampler views. This replaces virgl_textures_info and fixes some issues - start_slot is now honored - views outside of [start_slot, slart_slot+count) are unmodified - views are released when the context is destroyed Signed-off-by: Chia-I Wu <olvaffe@gmail.com> Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>	2019-06-07 22:47:07 +00:00
Kenneth Graunke	30314270d4	iris: Zero shs->cbuf0 when binding a passthrough TCS Fixes valgrind errors when running two CTS tests back to back: - KHR-GL45.shader_image_load_store.basic-allTargets-loadStoreT* (The first test has an actual TCS, the second uses passthrough.)	2019-06-07 15:13:42 -07:00
Jason Ekstrand	1e6b32d08c	intel/blorp: Only double the fast-clear rect alignment on HSW This restriction was accidentally added to the BSpec/PRM as an unrestricted restriction starting with the HSW docs and it was never removed. However, it only ever applied to HSW and actually potentially causes problems on BDW and above where we have mipmapped fast-clears. Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>	2019-06-07 22:00:55 +00:00
Rob Clark	3c456cf583	freedreno/a6xx: re-arrange program stageobj/group Split out a separate program config state group to run early before the other groups. This seems to help w/ intermittent "missed tiles" (although I had assumed that was a mem2gmem issue), or at least I can't reproduce that issue with this patch, but can without. It has the benefit of HLSQ_VS_CNTL.CONSTLEN matching for VS and BS. Signed-off-by: Rob Clark <robdclark@chromium.org> Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>	2019-06-07 12:07:29 -07:00
Rob Clark	958f6ffb60	freedreno/a6xx: fix hangs with newer sqe fw With the newer (v1.76) fw, we were getting hangs (compared to older v1.66 fw). Re-work the GMEM code to structure things a bit closer to the blob. This moves some PKT7 packets from IB2 to IB1, which I think is what was confusing SQE and causing it to get stuck in an infinite loop. But in general structuring things at least closer to the same way blob does makes it easier to compare cmdstream. Note: this is a bit on the large side for what I'd normally consider for stable.. but right now it is looking like it is the newer fw that is headed for linux-firmware. This should defn have some soak time on master, but probably a good idea for this patch to end up in distro mesa builds by the time a630_sqe.fw hits linux-firmware. Cc: mesa-stable@lists.freedesktop.org Signed-off-by: Rob Clark <robdclark@chromium.org> Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>	2019-06-07 12:07:29 -07:00
Rob Clark	1d002cfade	freedreno/a6xx: WFI before RB_CCU_CNTL writes This seems to be in a block of non buffered/context regs. Blob always WFIs before write, so probably a good idea. Annoyingly, compared to earlier gens, it is a bit harder to tell from the register offset whether it is a buffered reg, it isn't as simple as everything below 0x2000, it seems. Signed-off-by: Rob Clark <robdclark@chromium.org> Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>	2019-06-07 12:07:29 -07:00
Rob Clark	8a02ca807d	freedreno/a6xx: don't pre-dispatch texture fetch on accident Signed-off-by: Rob Clark <robdclark@chromium.org> Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>	2019-06-07 12:07:29 -07:00
Rob Clark	b820c09fa8	freedreno/a6xx: fix issues with gallium HUD In some cases the draw for the text wasn't working. This seems to be fixed by resyncing some of the "golden registers" from blob (initial values were based on somewhat older blob version). Perhaps good to have a bit of soak time on master, but would be good to eventually land in 19.x stable branches. Cc: mesa-stable@lists.freedesktop.org Signed-off-by: Rob Clark <robdclark@chromium.org> Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>	2019-06-07 12:07:29 -07:00
Nanley Chery	b4198e792c	anv/cmd_buffer: Initialize the clear color struct for CNL+ On CNL+, the clear color struct is composed of RGBA channel values and fields which are either reserved by the HW or used to control fast-clears. Currently anv initializes the channel values to zero and allows the other fields to be undefined. Satisfy the MBZ field requirements by removing an optimization that doesn't hold true for CNL+ and pulling in the number of dwords to initialize from ISL. Cc: <mesa-stable@lists.freedesktop.org> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-06-07 18:43:06 +00:00
Jon Turney	87173ded6e	glx/windows: Fix compilation with -Werror-format Fix compilation where the DWORD type is used with a format, after -Werror-format added by `c9c1e261`. Some Win32 API types are different fundamental types in the 32-bit and 64-bit versions. This problem is then further compounded by the fact that whilst both 32-bit Cygwin and 32-bit MinGW use the ILP32 data model, 64-bit MinGW uses the LLP64 data model, but 64-bit Cygwin uses the LP64 data model. This makes it near impossible to write printf format specifiers which are correct for all those targets. In the Win32 API, DWORD is an unsigned, 32-bit type. So, it is defined in terms of an unsigned long, except in the LP64 data model used by 64-bit Cygwin, where it is an unsigned int. It should always be safe to cast it to unsigned int and use %u or %x. Reviewed-by: Eric Anholt <eric@anholt.net>	2019-06-07 11:28:48 -07:00
Kenneth Graunke	cd796120c9	iris: Rename bind_state to bind_shader_state. bind_state is possibly the worst name ever. For create, we used create_shader_state, which is more descriptive. Put shader in the name.	2019-06-07 11:26:20 -07:00
Kenneth Graunke	d5d2fb5c4c	isl: Mark enum isl_channel_select packed so it becomes 1 byte. I recently discovered that the following code led to valgrind errors: struct isl_swizzle swizzle = ISL_SWIZZLE_IDENTITY; VALGRIND_CHECK_MEM_IS_DEFINED(&swizzle, sizeof(swizzle)); which is surprising, because struct isl_swizzle is simply: struct isl_swizzle { enum isl_channel_select r:4; enum isl_channel_select g:4; enum isl_channel_select b:4; enum isl_channel_select a:4; }; and the above code initializes all of them with a C99 initializer. Iván Briano reminded me that C99 initializers don't necessarily zero padding. A quick inspection revealed that sizeof(struct isl_swizzle) was 4 (rather than the expected 2). Ian Romanick suggested changing it to uint16_t, since this is essentially dicing up an unsigned, and that worked. This patch marks enum isl_channel_select packed, changing its size from 4 bytes to 1 byte. This then makes struct isl_swizzle 2 bytes, with no bogus padding fields. This eliminates valgrind undefined memory warnings. These isl_swizzle values become part of our BLORP blit program keys, which are then hashed. This undefined padding was being included in the hashing, possibly leading to issues. I originally saw this error when running KHR-GL45.texture_size_promotion.functional in iris under valgrind. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-06-07 11:09:44 -07:00
Alyssa Rosenzweig	e1c14b2820	panfrost/ci: Texture wrap tests are legitimately fixed These depended on the wallpaper reload. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-07 09:05:29 -07:00
Alyssa Rosenzweig	8442dde169	panfrost/midgard: Lower inot to inor with 0 We were previously lowering to inand, but the second arg was not duplicated so inot would always return ~0. Oops. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-07 09:05:29 -07:00
Alyssa Rosenzweig	d415748955	panfrost/midgard: Cleanup tag fetch in disassembler Trivial. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-07 09:05:29 -07:00
Alyssa Rosenzweig	d3ad8d6b48	panfrost/midgard: Use fancy iterator Trivial cleanup. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-07 09:05:29 -07:00
Alyssa Rosenzweig	ae20bee75e	panfrost/midgard: Cull dead branches This fixes bugs with complex control flow. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-07 09:05:28 -07:00
Alyssa Rosenzweig	c62f2ff852	panfrost/midgard: Add mir_print_bundle helper This helps with debugging scheduling/emission. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-07 09:05:28 -07:00
Alyssa Rosenzweig	fd6d6c1b15	panfrost/midgard/disasm: Pretty-print branch tags Just makes it a little more obvious what's going on. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-07 09:05:28 -07:00
Alyssa Rosenzweig	2ebf22c399	panfrost/ci: Note some since-fixed tests Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-07 09:05:28 -07:00
Alyssa Rosenzweig	de8d49acdc	panfrost/midgard: Vectorize I/O This uses the new mesa/st functionality for NIR I/O vectorization, which eliminates a number of corner cases (resulting in assorted dEQP failures and regressions) and should improve performance substantially due to lessened pressure on the load/store pipe. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-07 09:05:28 -07:00
Alyssa Rosenzweig	4aced18031	panfrost/midgard: Remove varyings delay pass This pass interfered with the more delicate path required for non-vectorized I/O. It's also ugly and duplicating the job of an actual honest-to-goodness scheduler. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-07 09:05:28 -07:00
Alyssa Rosenzweig	43568f2675	panfrost/midgard: Apply component to load_input Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-07 09:05:28 -07:00
Eric Engestrom	440fe0eb43	nir: fix s/&&/\|\|/ typo Fixes: `cd73b6174b` "nir/lower_to_source_mods: Stop turning add, sat, and neg into mov" Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-06-07 16:06:25 +01:00
Kristian H. Kristensen	b9bbac6234	freedreno/a6xx: Drop struct stage array This now boils down to just picking between binning or vertex shader and dummy_fs or real fs, which we can do in a couple of lines of code instead. The constlen logic isn't doing what it thinks it's doing, both constlens at this point MAX2(s[VS].constlen, align(state->bs->constlen, 4)); are binning shader constlens. We'll have to revisit the constlen logic, but this commit doesn't change how it works. Reviewed-by: Rob Clark <robdclark@gmail.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-06-07 07:33:12 -07:00
Kristian H. Kristensen	9382a3c11d	freedreno/a6xx: Drop support for SS6_DIRECT shader upload a6xx only supports indirect shaders. Reviewed-by: Rob Clark <robdclark@gmail.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-06-07 07:33:10 -07:00
Kristian H. Kristensen	0ef00ceb2e	freedreno/a6xx: Share shader_t_to_opcode We have a similar function in fd6_program.c. Move to fd6_emit.h and share. Reviewed-by: Rob Clark <robdclark@gmail.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-06-07 07:33:03 -07:00
Kristian H. Kristensen	4552162e2d	freedreno/a6xx: Consolidate more of dword 0 building in fd6_draw_vbo There's already a bit of duplicated logic here and tessellation will add more. Build up dword 0 in fd6_draw_vbo() and drop the a4xx in the process. Reviewed-by: Rob Clark <robdclark@gmail.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-06-07 07:32:59 -07:00
Kristian H. Kristensen	cae6b4d741	freedreno: Move fd4_size2indextype() helper to freedreno_util.h In preparation for refactoring fd6_draw.c a bit. Reviewed-by: Rob Clark <robdclark@gmail.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-06-07 07:32:34 -07:00
Samuel Pitoiset	0905189a25	radv: enable VK_EXT_sample_locations Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-06-07 13:11:17 +02:00
Samuel Pitoiset	05f5fa661f	radv: enable HTILE for images that might need variable sample locations This is now supported. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-06-07 13:11:14 +02:00
Samuel Pitoiset	e7677a697b	radv: handle sample locations during automatic layout transitions From the Vulkan spec 1.1.109: "Some implementations may need to evaluate depth image values while performing image layout transitions. To accommodate this, instances of the VkSampleLocationsInfoEXT structure can be specified for each situation where an explicit or automatic layout transition has to take place. [...] and VkRenderPassSampleLocationsBeginInfoEXT can be chained from VkRenderPassBeginInfo to provide sample locations for layout transitions performed implicitly by a render pass instance." Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-06-07 13:11:11 +02:00
Samuel Pitoiset	d0d41e58c3	radv: determine the first subpass id for every attachment Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-06-07 13:11:08 +02:00
Samuel Pitoiset	f58e9f6d69	radv: handle sample locations during explicit depth/stencil transitions From the Vulkan spec 1.1.109, "Some implementations may need to evaluate depth image values while performing image layout transitions. To accommodate this, instances of the VkSampleLocationsInfoEXT structure can be specified for each situation where an explicit or automatic layout transition has to take place. VkSampleLocationsInfoEXT can be chained from VkImageMemoryBarrier structures to provide sample locations for layout transitions performed by vkCmdWaitEvents and vkCmdPipelineBarrier calls." This handles explicit depth/stencil layout transitions performed with CmdWaitEvents() or CmdPipelineBarrier(). Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-06-07 13:11:01 +02:00
Samuel Pitoiset	a20925f2a9	radv: allow the depth decompress pass to emit dynamic sample locations Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-06-07 13:11:00 +02:00
Samuel Pitoiset	2dd8dfd913	radv: allow to set dynamic sample locations to the depth decompress pass If VK_EXT_sample_locations is used, the driver might need to emit the sample locations specified during layout transitions. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-06-07 13:10:55 +02:00
Samuel Pitoiset	d78990c174	radv: allow to save/restore sample locations during meta operations This will be used for the depth decompress pass that might need to emit variable sample locations during layout transitions. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-06-07 13:10:50 +02:00
Kenneth Graunke	22025595f3	iris: Sweep the NIR in iris_create_uncompiled_shader(). We run a ton of backend specific passes here (mostly brw_preprocess_nir) and ought to sweep up any unused memory at this point, since we're going to hang on to this NIR for as long as the linked program lives.	2019-06-07 01:29:38 -07:00
Eduardo Lima Mitev	c02ffd2700	ir3: Use the new NIR lowering pass for integer multiplication Shader-db stats courtesy of Eric Anholt: total instructions in shared programs: 6480215 -> 6475457 (-0.07%) instructions in affected programs: 662105 -> 657347 (-0.72%) helped: 1209 HURT: 13 total constlen in shared programs: 1432704 -> 1427769 (-0.34%) constlen in affected programs: 100063 -> 95128 (-4.93%) helped: 512 HURT: 0 total max_sun in shared programs: 875561 -> 873387 (-0.25%) max_sun in affected programs: 46179 -> 44005 (-4.71%) helped: 1087 HURT: 0 Reviewed-by: Eric Anholt <eric@anholt.net>	2019-06-07 08:45:05 +02:00
Eduardo Lima Mitev	340277ad71	ir3/nir: Add new NIR AlgebraicPass for lowering imul Currently, ir3 backend compiler is lowering integer multiplication from: dst = a * b to: dst = (al * bl) + (ah * bl << 16) + (al * bh << 16) by emitting this code: mull.u tmp0, a, b ; mul low, i.e. al * bl madsh.m16 tmp1, a, b, tmp0 ; mul-add shift high mix, i.e. ah * bl << 16 madsh.m16 dst, b, a, tmp1 ; i.e. al * bh << 16 which at that point has very low chances of being optimized. This patch adds a new nir_algebraic.AlgebraicPass to perform this lowering during NIR algebraic optimization passes, giving it a better chance for optimizing the resulting code. Reviewed-by: Eric Anholt <eric@anholt.net>	2019-06-07 08:45:05 +02:00
Eduardo Lima Mitev	3addd7c8d9	nir_algebraic: Add basic optimizations for umul_low and imadsh_mix16 For umul_low (al * bl), zero is returned if the low 16-bit word of either source is zero. For imadsh_mix16 (ah * bl << 16 + c), c is returned if either 'ah' or 'bl' is zero. A couple of nir_search_helpers are added: is_upper_half_zero() returns true if the highest word of all components of an integer NIR alu src are zero. is_lower_half_zero() returns true if the lowest word of all components of an integer NIR alu src are zero. Reviewed-by: Eric Anholt <eric@anholt.net>	2019-06-07 08:45:05 +02:00
Eduardo Lima Mitev	e45de3a6c3	ir3/compiler: Handle new alu opcodes 'umul_low' and 'imadsh_mix16' They directly emit ir3_MULL_U and ir3_MADSH_M16 respectively. Reviewed-by: Eric Anholt <eric@anholt.net>	2019-06-07 08:45:05 +02:00
Eduardo Lima Mitev	c27b3758fa	nir/opcodes: Add new 'umul_low' and 'imadsh_mix16' opcodes 'umul_low' is the low 32-bits of unsigned integer multiply. It maps directly to ir3's MULL_U. 'imadsh_mix16' is multiply add with shift and mix, an ir3 specific instruction that maps directly to ir3's IMADSH_M16. Both are necessary for the lowering of integer multiplication on Freedreno, which will be introduced later in this series. Reviewed-by: Eric Anholt <eric@anholt.net>	2019-06-07 08:45:05 +02:00
Iago Toral Quiroga	9b96ae69bc	v3d: don't emit point coordinates varyings if the FS doesn't read them We still need to emit them in V3D 3.x since there is no mechanism to disable them. Reviewed-by: Eric Anholt <eric@anholt.net>	2019-06-07 08:29:42 +02:00
Iago Toral Quiroga	5e26e55e72	v3d: add a helper to track variables that need point coordinates Reviewed-by: Eric Anholt <eric@anholt.net>	2019-06-07 08:26:52 +02:00
Kenneth Graunke	4e3297f7d4	egl/x11: calloc dri2_surf so it's properly zeroed Commit `2282ec0a` refactored drawable creation across various platforms into a new dri2_create_drawable helper function. The GBM code in platform_drm.c code passed in dri2_surf->gbm_surf as the loaderPrivate, while most other backends passed in dri2_surf directly. To try and handle this, the patch checked if dri2_surf->gbm_surf was non-NULL, and if so, presumed that the caller is the DRM platform and we should use the dri2_surf->gbm_surf pointer. This worked for most platforms, which calloc their dri2_surf structure, zeroing the data. Unfortunately, platform_x11.c used malloc, leaving most of the dri2_surf as garbage. In particular, dri2_surf->gbm_surf was often non-NULL, causing dri2_create_drawable to try and use it, passing a garbage pointer to the createNewDrawable hook, usually leading to a SIGBUS or SIGSEGV when trying to dereference that bad pointer. Since most callers calloc the data, make platform_x11.c follow suit. Fixes crashes with i915_dri.so when running dEQP-GLES2. Reviewed-by: Mathias Fröhlich <Mathias.Froehlich@web.de> Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2019-06-06 22:45:27 -07:00
Mark Janes	04dac69752	tests/graw: use C99 print conversion specifier for 32 bit builds Fixes formatting errors for 32 bit compilations, eg: error: format specifies type 'unsigned long' but the argument has type 'uint64_t' (aka 'unsigned long long') [-Werror,-Wformat] printf("result1 = %lu result2 = %lu\n", res1.u64, res2.u64); Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-06-06 14:39:41 -07:00
Alyssa Rosenzweig	30adeb7a53	panfrost/midgard: Fix crash with unused SSA values Crash introduced in "b38dab101ca7e0896255dccbd85fd510c47d84d1" but not adding a Fixes tag since it's our bug anyway. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-06 13:44:27 -07:00
Boris Brezillon	3d661a4ef9	panfrost: Report sRGB colorspace as not supported The driver does not support sRGB yet, so let's report it as unsupported. Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-06 13:41:54 -07:00
Erik Faye-Lund	c0dfe8c6df	docs: do not use div for line-breaking HTML has the <p>-tag for this purpose. It adds some margins, but that just makes this read better, IMO. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-06-06 17:51:45 +00:00
Erik Faye-Lund	f3235cfa70	docs: fixup code-tag positioning This reads better if we include the asterisk in the code-block, as it's part of the function-reference, even though it's not technically speaking code. But as the <code>-tag isn't purely for code, this should be fine. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-06-06 17:51:45 +00:00
Erik Faye-Lund	205f960e08	docs: add missing code-tags Looks like I missed a few cases when I recently added more code-tags here. So let's add these cases as well. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-06-06 17:51:45 +00:00
Erik Faye-Lund	54b7a1f175	docs: add accidentally dropped "at" When rewriting `20c56e18c2` after review, I accidentally dropped the "at" here. Sorry for that, and let's fix it up! Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Fixes: `20c56e18c2` ("docs: use proper links instead of code-tags") Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-06-06 17:51:45 +00:00
Gurchetan Singh	110f139f98	anv: allow NV12 <--> AHARDWAREBUFFER_FORMAT_Y8Cb8Cr8_420 inter-op AHARDWAREBUFFER_FORMAT_Y8Cb8Cr8_420 is an implementation defined flexible YUV format. Most of the time, it's NV12 or YV12. On Intel, NV12 is preferred since it can be used by the display engine. This API adds a dependency between gralloc and buffer consumers, unfortunately. Right now, the code seems to work for i915 gralloc, but not cros_gralloc. Add a preprocessor flag to fix this. TEST=android.graphics.cts.MediaVulkanGpuTest#testMediaImportAndRendering Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2019-06-06 09:20:03 -07:00
Connor Abbott	9d93d2a404	ac/nir: Remove stale TODO While we're here, copy the comment explaining this from radeonsi. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-06-06 17:14:28 +02:00
Connor Abbott	1d55b0da59	radeonsi: Don't force dcc disable for loads When `e9d935ed0e` added force_dcc_off(), we forced it off for any preloaded image descriptor which had stores associated with them, since the same preloaded descriptors were used for loads and stores. However, when the preloading was removed in `16be87c904`, the existing logic was kept despite it not being necessary anymore. The comment above force_dcc_off() only mentions stores, so only force DCC off for stores. Cc: Nicolai Hähnle <nicolai.haehnle@amd.com> Cc: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-06-06 17:14:28 +02:00
Gert Wollny	10895c39c3	mesa/main: Expose EXT_clip_control and related enums and the function Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2019-06-06 12:25:17 +02:00
Gert Wollny	f1f6228a38	mapi/glapi/registry: Update gl.xml to latest upstream version The old copy didn't include EXT_clip_control, so update it. Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Acked-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2019-06-06 12:25:12 +02:00
Gert Wollny	8657257a6e	virgl: Enable CAP_CLIP_HALFZ if host supports it On according hosts this enables the piglits as "pass": arb_clip_control-* v2: sync flag with host Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Reviewed-by: Chia-I Wu <olvaffe@gmail.com> (v1) Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2019-06-06 12:24:53 +02:00
Charmaine Lee	f29b8fde91	svga: Remove unnecessary check for the pre flush bit for setting vertex buffers This fixes the missing rebind when the can_pre_flush bit is not set and the vertex buffers are the same as what have been sent. Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Neha Bhende <bhenden@vmware.com> Signed-off-by: Charmaine Lee <charmainel@vmware.com> Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>	2019-06-06 10:27:10 +02:00
Deepak Rawat	72fc886826	winsys/svga/drm: Fix 32-bit RPCI send message Depending on whether compiled with frame-pointer or not, the temporary memory location used for the bp parameter in these macros are referenced relative to the stack pointer or the frame pointer. Hence we can never reference that parameter when we've modified either the stack pointer or the frame pointer, because then the compiler would generate an incorrect stack reference. Fix this by pushing the temporary memory parameter on a known location on the stack before modifying the stack- and frame pointers. Also, in case of failure the RPCI channel is not closed, which leads to vmx running out of channels. Cc: mesa-stable@lists.freedesktop.org Signed-off-by: Deepak Rawat <drawat@vmware.com> Reviewed-by: Sinclair Yeh <syeh@vmware.com> Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com> Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>	2019-06-06 10:27:10 +02:00
Samuel Pitoiset	b9d3a6b656	radv: set the subpass before any initial subpass transitions This might fix initial subpass transitions when multiview is used. Noticed while implementing sample locations during layout transitions. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-06-06 10:00:29 +02:00
Nataraj Deshpande	d6724471a5	anv: Fix check for isl_fmt in assert Checking isl_fmt returned value in assert seems appropriate instead of format variable. Fixes: `f1654fa7e3` "anv/android: support creating images from external format" Signed-off-by: Nataraj Deshpande <nataraj.deshpande@intel.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>	2019-06-06 09:24:08 +03:00
Iago Toral Quiroga	09d230c6cf	v3d: fix scheduling dependency tracking for ALU with small immediates We were not accounting for small immediates in the B mux so the scheduler was interpreting these as regular register file accesses, which could lead to additional (incorrect) write-read dependencies. Shader-db changes: total instructions in shared programs: 9163664 -> 9137263 (-0.29%) instructions in affected programs: 3931035 -> 3904634 (-0.67%) helped: 12457 HURT: 2563 total max-temps in shared programs: 1325787 -> 1325597 (-0.01%) max-temps in affected programs: 5746 -> 5556 (-3.31%) helped: 186 HURT: 16 helped stats (abs) min: 1 max: 4 x̄: 1.12 x̃: 1 helped stats (rel) min: 1.45% max: 22.22% x̄: 4.42% x̃: 3.28% HURT stats (abs) min: 1 max: 3 x̄: 1.12 x̃: 1 HURT stats (rel) min: 2.86% max: 10.00% x̄: 5.76% x̃: 5.88% 95% mean confidence interval for max-temps value: -1.04 -0.84 95% mean confidence interval for max-temps %-change: -4.16% -3.07% Max-temps are helped. Reviewed-by: Eric Anholt <eric@anholt.net>	2019-06-06 08:16:43 +02:00
Vasily Khoruzhick	b412e05751	lima/ppir: add missing handling of min/max ops for vec4 add slot Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com> Reviewed-by: Qiang Yu <yuq825@gmail.com>	2019-06-06 04:30:36 +00:00
Vasily Khoruzhick	5980565a37	lima/ppir: fix crash when program uses no registers at all Program may need no regalloc at all, e.g. in case when program consists of single discard op. Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com> Reviewed-by: Qiang Yu <yuq825@gmail.com>	2019-06-06 04:30:36 +00:00
Jason Ekstrand	b38dab101c	util/hash_table: Assert that keys are not reserved pointers If we insert a NULL key, it will appear to succeed but will mess up entry counting. Similar errors can occur if someone accidentally inserts the deleted key. The latter is highly unlikely but technically possible so we should guard against it too. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-06-06 00:27:53 +00:00
Jason Ekstrand	8306dabc03	util/set: Assert that keys are not reserved pointers If we insert a NULL key, it will appear to succeed but will mess up entry counting. Similar errors can occur if someone accidentally inserts the deleted key. The latter is highly unlikely but technically possible so we should guard against it too. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-06-06 00:27:53 +00:00
Jason Ekstrand	7a18ce0b91	glsl/loop_analysis: Don't search for NULL variables in the hash table Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-06-06 00:27:53 +00:00
Jason Ekstrand	d96878a66a	nir/propagate_invariant: Don't add NULL vars to the hash table Fixes: `8410cf66d` "nir/propagate_invariant: Skip unknown vars" Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-06-06 00:27:53 +00:00
Ian Romanick	1c30d26d89	intel/compiler: Treat b32csel as potentially producing a Boolean result for resolve analysis If the 2nd and 3rd source are both Boolean values, we can potentially avoid a resolve by only resolving the result of the b32csel. No changes on any Gen6+ Intel platform. v2: Use ?: instead of cast from bool to unsigned. Suggested by Caio. Iron Lake total instructions in shared programs: 8142729 -> 8142677 (<.01%) instructions in affected programs: 12890 -> 12838 (-0.40%) helped: 26 HURT: 0 helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2 helped stats (rel) min: 0.25% max: 0.74% x̄: 0.45% x̃: 0.38% 95% mean confidence interval for instructions value: -2.00 -2.00 95% mean confidence interval for instructions %-change: -0.52% -0.39% Instructions are helped. total cycles in shared programs: 188549632 -> 188549394 (<.01%) cycles in affected programs: 60754 -> 60516 (-0.39%) helped: 25 HURT: 1 helped stats (abs) min: 2 max: 26 x̄: 9.92 x̃: 8 helped stats (rel) min: 0.07% max: 2.23% x̄: 0.59% x̃: 0.27% HURT stats (abs) min: 10 max: 10 x̄: 10.00 x̃: 10 HURT stats (rel) min: 0.70% max: 0.70% x̄: 0.70% x̃: 0.70% 95% mean confidence interval for cycles value: -12.91 -5.40 95% mean confidence interval for cycles %-change: -0.84% -0.23% Cycles are helped. GM45 total instructions in shared programs: 5013119 -> 5013093 (<.01%) instructions in affected programs: 6764 -> 6738 (-0.38%) helped: 13 HURT: 0 helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2 helped stats (rel) min: 0.24% max: 0.68% x̄: 0.43% x̃: 0.36% 95% mean confidence interval for instructions value: -2.00 -2.00 95% mean confidence interval for instructions %-change: -0.52% -0.34% Instructions are helped. 
total cycles in shared programs: 128977804 -> 128977700 (<.01%) cycles in affected programs: 37738 -> 37634 (-0.28%) helped: 13 HURT: 0 helped stats (abs) min: 8 max: 8 x̄: 8.00 x̃: 8 helped stats (rel) min: 0.18% max: 0.46% x̄: 0.30% x̃: 0.26% 95% mean confidence interval for cycles value: -8.00 -8.00 95% mean confidence interval for cycles %-change: -0.36% -0.24% Cycles are helped. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-06-05 17:04:17 -07:00
Ian Romanick	0ba9497e66	intel/fs: Improve discard_if code generation Previously we would blindly emit a sequence like: mov(1) f0.1<1>UW g1.14<0,1,0>UW ... cmp.l.f0(16) g7<1>F g5<8,8,1>F 0x41700000F /* 15F */ (+f0.1) cmp.z.f0.1(16) null<1>D g7<8,8,1>D 0D The first move sets the flags based on the initial execution mask. Later discard sequences contain a predicated compare that can only remove more SIMD channels. Often times the only user of the result from the first compare is the second compare. Instead, generate a sequence like mov(1) f0.1<1>UW g1.14<0,1,0>UW ... cmp.l.f0(16) g7<1>F g5<8,8,1>F 0x41700000F /* 15F */ (+f0.1) cmp.ge.f0.1(8) null<1>F g5<8,8,1>F 0x41700000F /* 15F */ If the results stored in g7 and f0.0 are not used, the comparison will be eliminated. This removes an instruction and potentially reduces register pressure. v2: Major re-write of the commit message (including fixing the assembly code). Suggested by Matt. All Gen8+ platforms had similar results. (Ice Lake shown) total instructions in shared programs: 17224434 -> 17198659 (-0.15%) instructions in affected programs: 2908125 -> 2882350 (-0.89%) helped: 18891 HURT: 5 helped stats (abs) min: 1 max: 12 x̄: 1.38 x̃: 1 helped stats (rel) min: 0.03% max: 25.00% x̄: 1.76% x̃: 1.02% HURT stats (abs) min: 9 max: 105 x̄: 51.40 x̃: 35 HURT stats (rel) min: 0.43% max: 4.92% x̄: 2.34% x̃: 1.56% 95% mean confidence interval for instructions value: -1.39 -1.34 95% mean confidence interval for instructions %-change: -1.79% -1.73% Instructions are helped. 
total cycles in shared programs: 361468458 -> 361170679 (-0.08%) cycles in affected programs: 38470116 -> 38172337 (-0.77%) helped: 16202 HURT: 1456 helped stats (abs) min: 1 max: 4473 x̄: 26.24 x̃: 18 helped stats (rel) min: <.01% max: 28.44% x̄: 2.90% x̃: 2.18% HURT stats (abs) min: 1 max: 5982 x̄: 87.51 x̃: 28 HURT stats (rel) min: <.01% max: 51.29% x̄: 5.48% x̃: 1.64% 95% mean confidence interval for cycles value: -18.24 -15.49 95% mean confidence interval for cycles %-change: -2.26% -2.14% Cycles are helped. total spills in shared programs: 12147 -> 12176 (0.24%) spills in affected programs: 175 -> 204 (16.57%) helped: 8 HURT: 5 total fills in shared programs: 25262 -> 25292 (0.12%) fills in affected programs: 269 -> 299 (11.15%) helped: 8 HURT: 5 Haswell total instructions in shared programs: 13530316 -> 13502647 (-0.20%) instructions in affected programs: 2507824 -> 2480155 (-1.10%) helped: 18859 HURT: 10 helped stats (abs) min: 1 max: 12 x̄: 1.48 x̃: 1 helped stats (rel) min: 0.03% max: 27.78% x̄: 2.38% x̃: 1.41% HURT stats (abs) min: 5 max: 39 x̄: 25.70 x̃: 31 HURT stats (rel) min: 0.22% max: 1.66% x̄: 1.09% x̃: 1.31% 95% mean confidence interval for instructions value: -1.49 -1.44 95% mean confidence interval for instructions %-change: -2.42% -2.34% Instructions are helped. total cycles in shared programs: 377865412 -> 377639034 (-0.06%) cycles in affected programs: 40169572 -> 39943194 (-0.56%) helped: 15550 HURT: 1938 helped stats (abs) min: 1 max: 2482 x̄: 25.67 x̃: 18 helped stats (rel) min: <.01% max: 37.77% x̄: 3.00% x̃: 2.25% HURT stats (abs) min: 1 max: 4862 x̄: 89.17 x̃: 35 HURT stats (rel) min: <.01% max: 67.67% x̄: 6.16% x̃: 2.75% 95% mean confidence interval for cycles value: -14.42 -11.47 95% mean confidence interval for cycles %-change: -2.05% -1.91% Cycles are helped. 
total spills in shared programs: 26769 -> 26814 (0.17%) spills in affected programs: 826 -> 871 (5.45%) helped: 9 HURT: 10 total fills in shared programs: 38383 -> 38425 (0.11%) fills in affected programs: 834 -> 876 (5.04%) helped: 9 HURT: 10 LOST: 5 GAINED: 10 Ivy Bridge total instructions in shared programs: 12079250 -> 12044139 (-0.29%) instructions in affected programs: 2409680 -> 2374569 (-1.46%) helped: 16135 HURT: 0 helped stats (abs) min: 1 max: 23 x̄: 2.18 x̃: 2 helped stats (rel) min: 0.07% max: 37.50% x̄: 2.72% x̃: 1.68% 95% mean confidence interval for instructions value: -2.21 -2.14 95% mean confidence interval for instructions %-change: -2.76% -2.67% Instructions are helped. total cycles in shared programs: 180116747 -> 179900405 (-0.12%) cycles in affected programs: 25439823 -> 25223481 (-0.85%) helped: 13817 HURT: 1499 helped stats (abs) min: 1 max: 1886 x̄: 26.40 x̃: 18 helped stats (rel) min: <.01% max: 38.84% x̄: 2.57% x̃: 1.97% HURT stats (abs) min: 1 max: 3684 x̄: 98.99 x̃: 52 HURT stats (rel) min: <.01% max: 97.01% x̄: 6.37% x̃: 3.42% 95% mean confidence interval for cycles value: -15.68 -12.57 95% mean confidence interval for cycles %-change: -1.77% -1.63% Cycles are helped. LOST: 8 GAINED: 10 Sandy Bridge total instructions in shared programs: 10878990 -> 10863659 (-0.14%) instructions in affected programs: 1806702 -> 1791371 (-0.85%) helped: 13023 HURT: 0 helped stats (abs) min: 1 max: 5 x̄: 1.18 x̃: 1 helped stats (rel) min: 0.07% max: 13.79% x̄: 1.65% x̃: 1.10% 95% mean confidence interval for instructions value: -1.18 -1.17 95% mean confidence interval for instructions %-change: -1.68% -1.62% Instructions are helped. 
total cycles in shared programs: 154082878 -> 153862810 (-0.14%) cycles in affected programs: 20199374 -> 19979306 (-1.09%) helped: 12048 HURT: 510 helped stats (abs) min: 1 max: 323 x̄: 20.57 x̃: 18 helped stats (rel) min: 0.03% max: 17.78% x̄: 2.05% x̃: 1.52% HURT stats (abs) min: 1 max: 448 x̄: 54.39 x̃: 16 HURT stats (rel) min: 0.02% max: 37.98% x̄: 4.13% x̃: 1.17% 95% mean confidence interval for cycles value: -17.97 -17.08 95% mean confidence interval for cycles %-change: -1.84% -1.75% Cycles are helped. LOST: 1 GAINED: 0 Iron Lake total instructions in shared programs: 8155075 -> 8142729 (-0.15%) instructions in affected programs: 949495 -> 937149 (-1.30%) helped: 5810 HURT: 0 helped stats (abs) min: 1 max: 8 x̄: 2.12 x̃: 2 helped stats (rel) min: 0.10% max: 16.67% x̄: 2.53% x̃: 1.85% 95% mean confidence interval for instructions value: -2.14 -2.11 95% mean confidence interval for instructions %-change: -2.59% -2.48% Instructions are helped. total cycles in shared programs: 188584610 -> 188549632 (-0.02%) cycles in affected programs: 17274446 -> 17239468 (-0.20%) helped: 3881 HURT: 90 helped stats (abs) min: 2 max: 168 x̄: 9.08 x̃: 6 helped stats (rel) min: <.01% max: 23.53% x̄: 0.83% x̃: 0.30% HURT stats (abs) min: 2 max: 10 x̄: 2.80 x̃: 2 HURT stats (rel) min: <.01% max: 0.60% x̄: 0.10% x̃: 0.07% 95% mean confidence interval for cycles value: -9.35 -8.27 95% mean confidence interval for cycles %-change: -0.85% -0.77% Cycles are helped. GM45 total instructions in shared programs: 5019308 -> 5013119 (-0.12%) instructions in affected programs: 489028 -> 482839 (-1.27%) helped: 2912 HURT: 0 helped stats (abs) min: 1 max: 8 x̄: 2.13 x̃: 2 helped stats (rel) min: 0.10% max: 16.67% x̄: 2.46% x̃: 1.81% 95% mean confidence interval for instructions value: -2.14 -2.11 95% mean confidence interval for instructions %-change: -2.54% -2.39% Instructions are helped. 
total cycles in shared programs: 129002592 -> 128977804 (-0.02%) cycles in affected programs: 12669152 -> 12644364 (-0.20%) helped: 2759 HURT: 37 helped stats (abs) min: 2 max: 168 x̄: 9.03 x̃: 4 helped stats (rel) min: <.01% max: 21.43% x̄: 0.75% x̃: 0.31% HURT stats (abs) min: 2 max: 10 x̄: 3.62 x̃: 4 HURT stats (rel) min: <.01% max: 0.41% x̄: 0.10% x̃: 0.04% 95% mean confidence interval for cycles value: -9.53 -8.20 95% mean confidence interval for cycles %-change: -0.79% -0.70% Cycles are helped. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-06-05 17:04:13 -07:00
Ian Romanick	a288708506	intel/fs: Add need_dest parameter to fs_visitor::nir_emit_alu This is the same as the need_dest parameter to prepare_alu_destination_and_sources. This allows us to not change the register that is expected to hold a result if an instruction is re-emitted. This is particularly a problem if the re-emitted instruction is a partial write. A later patch will use this feature. No shader-db changes on any Intel platform. v2: Don't do the Boolean resolve when there is no destination. If the ALU instruction didn't write a register, there's nothing to resolve. This replaces an earlier patch "intel/fs: Allocate dummy destination register when need_dest is false". Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-06-05 17:04:08 -07:00
Ian Romanick	e13a5c7d67	intel/fs: Allow cmod propagation across reads and writes of different flags This also helps a later patch (intel/fs: Improve discard_if code generation) on about 200 shaders. v2: Document that other instruction sequences are also valid in subtract_merge_with_compare_intervening_mismatch_flag_write. Suggested by Caio. All Intel platforms had similar results. (Ice Lake shown) total instructions in shared programs: 17224438 -> 17224434 (<.01%) instructions in affected programs: 296 -> 292 (-1.35%) helped: 4 HURT: 0 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 0.99% max: 1.92% x̄: 1.43% x̃: 1.40% 95% mean confidence interval for instructions value: -1.00 -1.00 95% mean confidence interval for instructions %-change: -2.04% -0.81% Instructions are helped. total cycles in shared programs: 361468455 -> 361468458 (<.01%) cycles in affected programs: 2862 -> 2865 (0.10%) helped: 2 HURT: 2 helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2 helped stats (rel) min: 0.24% max: 0.39% x̄: 0.31% x̃: 0.31% HURT stats (abs) min: 3 max: 4 x̄: 3.50 x̃: 3 HURT stats (rel) min: 0.32% max: 0.70% x̄: 0.51% x̃: 0.51% 95% mean confidence interval for cycles value: -4.34 5.84 95% mean confidence interval for cycles %-change: -0.70% 0.90% Inconclusive result (value mean confidence interval includes 0). Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-06-05 17:03:45 -07:00
Ian Romanick	8030cb75c1	intel/fs: Fix flag_subreg handling in cmod propagation There were two errors. First, the pass could propagate conditional modifiers from an instruction that writes one flag register to an instruction that writes a different flag register. For example, cmp.nz.f0.0(16) null:F, vgrf6:F, vgrf5:F cmp.nz.f0.1(16) null:F, vgrf6:F, vgrf5:F could become cmp.nz.f0.0(16) null:F, vgrf6:F, vgrf5:F Second, if an instruction that writes f0.1 has its condition propagated, the modified instruction will incorrectly write flag f0.0. For example, linterp(16) vgrf6:F, g2:F, attr0:F cmp.z.f0.1(16) null:F, vgrf6:F, vgrf5:F (-f0.1) discard_jump(16) (null):UD could become linterp.z.f0.0(16) vgrf6:F, g2:F, attr0:F (-f0.1) discard_jump(16) (null):UD None of these cases will occur currently. The only time we use f0.1 is for generating discard intrinsics. In all those cases, we generate a sequence like: cmp.nz.f0.0(16) vgrf7:F, vgrf6:F, vgrf5:F (+f0.1) cmp.z(16) null:D, vgrf7:D, 0d (-f0.1) discard_jump(16) (null):UD Due to the mixed types and incompatible conditions, this sequence would never see any cmod propagation. The next patch will change this. No shader-db changes on any Intel platform. v2: Fix typo in comment in test case subtract_delete_compare_other_flag. Noticed by Caio. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-06-05 17:03:40 -07:00
Ian Romanick	2dd6013933	intel/fs: Add missing tests for cmod_propagate_not Tests like this should have been added in `4467040cb6` ("i965/fs: Propagate conditional modifiers from not instructions"). Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-06-05 17:03:31 -07:00
Kenneth Graunke	6a7d387394	i965: Allow signed/unsigned integer conversions in miptree up/download BLORP now handles this so there's no reason to fall back. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-06-05 16:58:07 -07:00
Kenneth Graunke	f06c86358c	intel/blorp: Handle SINT/UINT clamping on blits. This patch makes blorp_blit handle SINT<->UINT blit value clamping. After reading the source's integer data (which is expanded to 32-bit), we either IMAX with 0 (for SINT -> UINT, to clamp negative numbers) or UMIN with (1 << 31) - 1 (for UINT -> SINT, to clamp positive numbers outside of the representable range). Such blits are not allowed by the OpenGL or Vulkan APIs directly: The Vulkan 1.1 spec for vkCmdBlitImage says: "Integer formats can only be converted to other integer formats with the same signedness." The GL 4.5 spec for glBlitFramebuffer says: "An INVALID_OPERATION error is generated if format conversions are not supported, which occurs under any of the following conditions: [...] * The read buffer contains unsigned integer values and any draw buffer does not contain unsigned integer values. * The read buffer contains signed integer values and any draw buffer does not contain signed integer values." However, they are useful for other operations, such as texture upload and download, which typically are implemented via blorp_blit(). i965 has code to fall back in this case (which the next commit will delete), and Gallium expects blit() to handle this case for texture upload. Fixes the following tests on iris: - GTF-GL46.gtf32.GL3Tests.packed_pixels.packed_pixels - GTF-GL46.gtf32.GL3Tests.packed_pixels.packed_pixels_pbo - GTF-GL46.gtf32.GL3Tests.packed_pixels.packed_pixels_pixelstore Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-06-05 16:58:07 -07:00
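The clamping described in the blorp commit above can be illustrated with a minimal C sketch. This is not the blorp implementation; the function names `clamp_sint_to_uint`, `clamp_uint_to_sint` and `clamp_self_check` are hypothetical and only mirror the two cases named in the message, operating on values already expanded to 32 bits.

```c
#include <assert.h>
#include <stdint.h>

/* SINT -> UINT: signed max with 0, so negative values clamp to 0.
 * Hypothetical helper for illustration; not a blorp function. */
uint32_t clamp_sint_to_uint(int32_t v)
{
    int32_t clamped = v > 0 ? v : 0;
    return (uint32_t)clamped;
}

/* UINT -> SINT: unsigned min with (1 << 31) - 1, so values above the
 * largest representable signed value clamp to INT32_MAX.
 * Hypothetical helper for illustration; not a blorp function. */
int32_t clamp_uint_to_sint(uint32_t v)
{
    const uint32_t max_sint = (1u << 31) - 1;
    uint32_t clamped = v < max_sint ? v : max_sint;
    return (int32_t)clamped;
}

/* Small self-check covering both directions. */
void clamp_self_check(void)
{
    assert(clamp_sint_to_uint(-5) == 0u);
    assert(clamp_sint_to_uint(7) == 7u);
    assert(clamp_uint_to_sint(0xFFFFFFFFu) == INT32_MAX);
    assert(clamp_uint_to_sint(42u) == 42);
}
```

Values that are already representable in the destination signedness pass through unchanged, which is why only one bound needs clamping in each direction.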
Caio Marcelo de Oliveira Filho	1aea4cd0d9	anv/pipeline: Move lowering of nir_var_mem_global later This lets deref optimizations apply to globals before lowering them. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-06-05 16:57:09 -07:00
Kenneth Graunke	4f3c82c72c	st/nir: Don't use GLSL IR's MOD_TO_FLOOR lowering when using NIR. Both GLSL IR and NIR perform the same mod -> floor lowering for 32-bit types. But nir_lower_double_ops is slightly more defensive against lowered drcp precision loss, and handles mod(x, x) = 0 directly. This works well...assuming nir_lower_double_ops actually gets an fmod op to lower in the first place. The previous patches enabled NIR-based lowering for the remaining drivers, so we can stop using the GLSL IR lowering when using NIR. Fixes KHR-GL45.gpu_shader_fp64.builtin.mod_dvec[234] on iris. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-06-05 16:45:12 -07:00
Kenneth Graunke	f4d4c42608	radeonsi: Enable NIR's lower_fmod option. Currently, st/mesa is always calling the GLSL IR lower_instructions() pass with MOD_TO_FLOOR set, so mod operations will be lowered before ever reaching NIR. This enables the same lowering at the NIR level, which will let me shut off the GLSL IR path for NIR-based drivers. The AMD NIR backend also has code to handle fmod, so we could potentially skip this and still be fine. I don't have an opinion on that. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-06-05 16:45:12 -07:00
Kenneth Graunke	e0641e0728	vc4: Enable NIR's lower_fmod option. Currently, st/mesa is always calling the GLSL IR lower_instructions() pass with MOD_TO_FLOOR set, so mod operations will be lowered before ever reaching NIR. This enables the same lowering at the NIR level, which will let me shut off the GLSL IR path for NIR-based drivers. Reviewed-by: Marek Olšák <marek.olsak@amd.com> Acked-by: Eric Anholt <eric@anholt.net>	2019-06-05 16:45:12 -07:00
Kenneth Graunke	b0e3bd79dc	v3d: Enable NIR's lower_fmod option. Currently, st/mesa is always calling the GLSL IR lower_instructions() pass with MOD_TO_FLOOR set, so mod operations will be lowered before ever reaching NIR. This enables the same lowering at the NIR level, which will let me shut off the GLSL IR path for NIR-based drivers. Reviewed-by: Marek Olšák <marek.olsak@amd.com> Acked-by: Eric Anholt <eric@anholt.net>	2019-06-05 16:45:12 -07:00
Kenneth Graunke	c7d1b52a2c	nir: Combine lower_fmod16/32 back into a single lower_fmod. We originally had a single lower_fmod option. In commit `2ab2d2e5`, Sam split 32 and 64-bit lowering into separate flags, with the rationale that some drivers might want different options there. This left 16-bit unhandled, so Iago added a lower_fmod16 option in commit `ca31df6f`. Now that lower_fmod64 is gone (in favor of nir_lower_doubles and nir_lower_dmod), we re-combine lower_fmod16 and lower_fmod32 into a single lower_fmod flag again. I'm not aware of any hardware which needs lowering for one bitsize and not the other. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-06-05 16:45:12 -07:00
Kenneth Graunke	edd45af9ba	nir: Drop lower_fmod64 option. nir_lower_doubles offers a wide variety of fp64 lowering, including lowering fmod@64. The version there also better handles imprecisions due to lowered frcp@64. Let's consolidate on one version. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-06-05 16:45:12 -07:00
Kenneth Graunke	dfb18f0a28	panfrost: Switch to nir_lower_doubles instead of lower_fmod64. I don't think panfrost actually does doubles yet, but it at least claims to support PIPE_CAP_DOUBLES, so at least pretend to switch to the new lowering. Reviewed-by: Marek Olšák <marek.olsak@amd.com> Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-05 16:45:12 -07:00
Kenneth Graunke	d13059f4d5	nouveau: Use nir_lower_doubles instead of lower_fmod64 on nvc0. We currently have two duplicate mechanisms for lowering fmod@64. One is a nir_opt_algebraic rule keyed off of options->lower_fmod64, and the other is nir_lower_doubles, which offers a full gamut of fp64 lowering. The latter works slightly better in some corner cases, so I'm trying to eliminate lower_fmod64 and drop the redundancy. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-06-05 16:45:12 -07:00
Kenneth Graunke	fa56a3795f	gallium: Drop lower_fmod64 from drivers that don't support doubles. Neither freedreno nor nv50 expose PIPE_CAP_DOUBLES, so there's no fmod64 to be lowered. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-06-05 16:45:12 -07:00
Dylan Baker	e8a60e5d50	docs: update calendar, and news item and link release notes for 19.0.6	2019-06-05 16:42:36 -07:00
Dylan Baker	fccd44940d	docs: Add SHA256 sums for 19.0.6	2019-06-05 16:40:57 -07:00
Dylan Baker	fe79d75ccf	docs: Add relnotes for 19.0.6	2019-06-05 16:40:55 -07:00
Erik Faye-Lund	ff41ac7292	docs: add day of month to all news-entries This makes it easier to batch-convert them to other structured markup-formats. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-06-05 23:48:46 +02:00
Erik Faye-Lund	382776d241	docs: add MD5 checksums for 9.2.2 files These checksums were obtained by downloading the releases from ftp://ftp.freedesktop.org/pub/mesa/older-versions/9.x/9.2.2/ and running md5sum on them. Hopefully the server wasn't compromised since release. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Acked-by: Eric Engestrom <eric.engestrom@intel.com>	2019-06-05 23:48:46 +02:00
Erik Faye-Lund	1941e642bc	docs: use pre-block for showing commit-note Having a single-item list for this seems odd. Let's just use a pre-block instead. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-06-05 23:48:46 +02:00
Erik Faye-Lund	b16e593f79	docs: switch to definition list and code-tags A definition list is a better semantic match for what this list is supposed to convey, so let's use that instead. And while we're at it, let's add some code-tags around filenames, as they stand out a bit more that way. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-06-05 23:48:46 +02:00
Erik Faye-Lund	3b0d48e219	docs: combine headings This is more in line with how we mark-up other definition lists, and avoids portability issues with other markup-formats. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-06-05 23:48:46 +02:00
Erik Faye-Lund	942c4daac9	docs: more code-tags in llvmpipe.html This makes the article a bit easier to read. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-06-05 23:48:46 +02:00
Erik Faye-Lund	52667f990e	docs: use more code-tags in envvars.html This wraps code, identifiers, values and paths in code-tags, which makes them appear in a monospace-font for readability. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-06-05 23:48:46 +02:00
Erik Faye-Lund	d311d8f424	docs: use code-tags for envvars and options This makes it a bit easier to tell what's what. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-06-05 23:48:45 +02:00
Erik Faye-Lund	5639f0d5ee	docs: use dl instead of ul An HTML definition-list is more semantically strong than just some unordered list, and renders a bit cleaner by default. So let's use that instead. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Acked-by: Eric Engestrom <eric.engestrom@intel.com>	2019-06-05 23:48:45 +02:00
Erik Faye-Lund	0bca0f1aa2	docs: remove pointlessly repeated list The examples listed above are exactly the same ones as we're about to list, so let's just keep the list that defines what they do. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-06-05 23:48:45 +02:00
Erik Faye-Lund	aed4ac6da8	docs: remove stray whitespace There's some stray whitespace in these files that doesn't do anything useful. Let's get rid of it. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-06-05 23:48:45 +02:00
Erik Faye-Lund	20c56e18c2	docs: use proper links instead of code-tags These links are a bit odd in that the URLs are simply placed in code-tags. This makes them harder to work with. Let's use proper links instead. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-06-05 23:48:45 +02:00
Erik Faye-Lund	c59c793ae5	docs: update doxygen-links One of these URLs is dead these days, and the other one forwards to the current one, doxygen.nl. Let's get these links up to date. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-06-05 23:48:45 +02:00
Erik Faye-Lund	7c4a4fb09a	docs: remove some noisy spacing in pre-blocks These newlines caused the blocks to have trailing newlines in them, which renders a bit noisily. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-06-05 23:48:45 +02:00
Erik Faye-Lund	412046f74e	docs: improve quoting slightly Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-06-05 23:48:45 +02:00
Erik Faye-Lund	8620f53212	docs: do not use br-tag for non-significant breaks According to the W3C, we shouldn't use the br-tag unless the line-break is part of the content: https://www.w3.org/TR/2011/WD-html5-author-20110809/the-br-element.html All of these instances are for non-content usage, and are as such technically out-of-spec. So let's either remove them, or split paragraphs, based on how related the content is. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-06-05 23:48:45 +02:00
Erik Faye-Lund	d5e273aad2	docs: remove pointless line-break A line-break at the end of a paragraph doesn't do anything useful, so let's just get rid of it. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-06-05 23:48:45 +02:00
Erik Faye-Lund	db8211a883	docs: remove pointless trailing hard-breaks Line-break at the end of an article is quite pointless, and doesn't do much to increase the readability. Let's get rid of them. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-06-05 23:48:45 +02:00
Erik Faye-Lund	74a6a68196	docs: rewrite paragraph to be free-form These half-way structured sections are needlessly problematic to translate cleanly to other markup-languages, so let's just make this into a free-form paragraph instead. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-06-05 23:48:45 +02:00
Erik Faye-Lund	9e5bc2c868	docs: use h4 instead of free-standing paragraphs and br-tags This makes this document a bit more structured, which is generally considered a good thing for HTML. It will also translate a bit better into other markup-formats. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-06-05 23:48:45 +02:00
Erik Faye-Lund	38652a29ae	docs: slightly reword paragraph and tweak markup This makes this paragraph a bit easier to digest. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-06-05 23:48:44 +02:00
Erik Faye-Lund	b2ac7582d9	docs: remove stray space in code-block Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-06-05 23:48:44 +02:00
Erik Faye-Lund	0114d15ed6	docs: remove some pointless spacing The different headers and header-sizes already convey the hierarchical structure of this document, the unusual spacing arguably just looks a bit inconsistent with the rest of the site. Let's remove it; it looks fine without it, and will translate better to other markup languages. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-06-05 23:48:44 +02:00
Erik Faye-Lund	392c083377	docs: add more code-tags It's easier to read function-names, file-names and other "machine"-related strings if they are formatted in a monospace font. So let's mark these up with code-tags. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-06-05 23:48:44 +02:00
Erik Faye-Lund	0ee366960c	docs: use code instead of tt-tag The tt-tag has been removed from HTML5, so let's normalize this to code-tags instead. This just makes things a bit more consistent, as we've mixed these left and right so far anyway. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-06-05 23:48:44 +02:00
Erik Faye-Lund	d60dc5d16f	docs: use paragraph instead of double newlines This is a bit more semantically clean in HTML, and makes us keep content and presentation a bit more separated. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-06-05 23:48:44 +02:00
Erik Faye-Lund	9a65de343e	docs: use verbatim .plan quote This quote is now verbatim, as archived here: https://github.com/ESWAT/john-carmack-plan-archive/blob/master/by_year/johnc_plan_1999.txt This makes it look a bit more consistent with the following news-entry, and makes things IMO a bit more clear. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-06-05 23:48:44 +02:00
Alyssa Rosenzweig	905d914cb6	panfrost/midgard: Verify SSA claims when pipelining The pipeline register creation algorithm is only valid for SSA indices; NIR registers and such cannot be pipelined without more complex analysis. However, there is the occasional class of "liars" -- indices that claim to be SSA but are not. This occurs in the blend shader prologue, for example. Detect this and just bail quietly for now. Eventually we need to rewrite the blend shader prologue to occur in NIR anyway (which would mitigate the issue), but that's more involved and depends on a better understanding of pixel formats in blend shaders (for non-RGBA8888/UNORM cases). Fixes some blend shader regressions. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-05 14:40:08 -07:00
Alyssa Rosenzweig	dcd12aad46	panfrost/midgard: Don't assign var locations ourselves This piece of code was cargo-culted from the ir3 standalone compiler and made sense when we were a standalone compiler ourselves. Unfortunately, for the online compiler, mesa/st already handles this for us and if we duplicate it here, we're duplicating it incorrectly. So just delete these lines and fix a heck of a lot of tests. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-05 14:40:08 -07:00
Tomeu Vizoso	de5c882973	panfrost: Reload framebuffer contents if there's no clear If by flush time the client hasn't submitted a clear, add jobs for reloading the framebuffer contents as the first draw in the frame. This is required by programs such as Weston which don't do clears and rely on the previous contents of the framebuffer being there. Reloading the whole framebuffer on every frame without regards to what is needed or what is going to be covered is very inefficient, but future work will introduce support for damage regions and partial updates so we know what needs to be actually reloaded. Fixes quite a few tests in dEQP-EGL.functional.buffer_age.*. [Alyssa: The context is that tilers do an implicit glClear() on every frame, whether you asked them to or not. If you want a clear, this is very efficient. But if you don't, you have to explicitly blit the backbuffer back into tile memory, accomplished by a dummy texturing draw. This patch generates that draw via u_blitter, although we could do a bit better ourselves by eliding the vertex job. This fixes "black rectangles in Weston/sway" as well as "video not displaying when UI visible in mpv"] Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-05 14:35:48 -07:00
Alyssa Rosenzweig	2adf35e4f5	panfrost: Don't flip scanout The mesa/st flips the viewport, so we respect that rather than trying to flip the framebuffer itself and ignoring the viewport and using a messy heuristic. However, this brings an underlying disagreement about the interpretation of winding order to light. The blob uses a different strategy than Mesa for handling viewport Y flipping, so the meanings of the winding order bit are flipped for it. To keep things clean on our end, we rename to explicitly use Gallium (rather than flipped OpenGL) conventions. Fixes upside-down Xwayland/egl windows. v2: Adjust lowering configuration to correctly flip gl_PointCoord.y and gl_FragCoord.y. v1 was R-b'd by Tomeu, but then retracted due to these regressions which are not fixed. Suggested-by: Rob Clark <robdclark@chromium.org> Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Sort-of-reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>	2019-06-05 14:35:48 -07:00
Timur Kristóf	c94b70a178	st/nine: Use tgsi_to_nir when preferred IR is NIR. This patch allows nine to read the preferred IR from pipe caps and use NIR when that is preferred by the driver, by calling tgsi_to_nir. Also adds some debug options that allow overriding it. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Axel Davy <davyaxel0@gmail.com>	2019-06-05 23:32:13 +02:00
Lionel Landwerlin	c162127440	intel/perf: improve dynamic loading config detection We're currently trying to detect dynamic loading config support by trying to remove the test config (hard coded in the i915 driver) and checking we get ENOENT. This can fail if the test config was updated in Mesa but not yet in i915. A better way to do this is to pick an invalid ID and check for ENOENT. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: <mesa-stable@lists.freedesktop.org> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-06-05 20:16:23 +00:00
Jason Ekstrand	811c05dfe6	intel/nir: Take nir_shaders in brw_nir_link_shaders Since NIR_PASS no longer swaps out the NIR pointer when NIR_TEST_* is enabled, we can just take a single pointer and not a pointer to pointer. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-06-05 20:07:28 +00:00
Jason Ekstrand	bb67a99a2d	intel/nir: Stop returning the shader from helpers Now that NIR_TEST_* doesn't swap the shader out from under us, it's sufficient to just modify the shader rather than having to return in case we're testing serialization or cloning. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-06-05 20:07:28 +00:00
Jason Ekstrand	fe2fc30cb5	nir: Don't replace the nir_shader when NIR_TEST_SERIALIZE=1 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108957 Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Rob Clark <robdclark@chromium.org>	2019-06-05 20:07:28 +00:00
Jason Ekstrand	9eba6d9a88	nir: Don't replace the nir_shader when NIR_TEST_CLONE=1 Instead, we add a new helper which stomps one nir_shader and replaces it with another. The new helper effectively just changes which pointer gets used for the base nir_shader. It should be 99% as good at testing cloning but without requiring that everything handle having the shader swapped out from under it constantly. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108957 Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Rob Clark <robdclark@chromium.org>	2019-06-05 20:07:28 +00:00
Caio Marcelo de Oliveira Filho	747926ddfb	iris: Only recompile CS when needed Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>	2019-06-05 12:57:54 -07:00
Lionel Landwerlin	0430c6d18a	intel/perf: fix EuThreadsCount value in performance equations EuThreadsCount is supposed to be the number of threads per EU, not the total number of threads in the whole device. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `1fc7b95127` ("i965: Add Gen8+ INTEL_performance_query support") Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-06-05 22:41:01 +03:00
Mark Janes	36d8a922de	intel/tools: use C99 print conversion specifier for 32 bit builds Fixes formatting errors for 32 bit compilations, eg: error: format ‘%lx’ expects argument of type ‘long unsigned int’, but argument 5 has type ‘uint64_t’ {aka ‘long long unsigned int’} [-Werror=format=] Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-06-05 19:25:15 +00:00
Samuel Pitoiset	8a31eaa4e2	radv: use only one descriptor in the fmask expand pass This removes one useless SMEM load operation which pointed to the same descriptor anyway. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-06-05 20:50:58 +02:00
Samuel Pitoiset	7664eb8f2b	radv: set ACCESS_NON_READABLE on the fmask expand pass output image The driver will emit GLC=1. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-06-05 20:50:56 +02:00
Samuel Pitoiset	8206390546	radv: remove one useless image type in the fmask expand shader Both input and output images use the same type. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-06-05 20:50:53 +02:00
Kristian H. Kristensen	1e6c873f1f	freedreno/ir3: Extend debug helpers to support TCS/TES/GS Reviewed-by: Rob Clark <robdclark@gmail.com>	2019-06-05 11:15:04 -07:00
Kristian H. Kristensen	3da9a24f35	freedreno/a6xx: Use VALIDREG in next_regid() helper Reviewed-by: Rob Clark <robdclark@gmail.com>	2019-06-05 11:15:04 -07:00
Kristian H. Kristensen	6fffc091e2	freedreno/a6xx: Remove dead code from a5xx Reviewed-by: Rob Clark <robdclark@gmail.com>	2019-06-05 11:15:04 -07:00
Kristian H. Kristensen	cea39af2fb	freedreno/ir3: Generalize ir3_shader_disasm() Use a helper function to get the sysval/attribute/varying/output name and make the disasm debug output independent of shader stage. Reviewed-by: Rob Clark <robdclark@gmail.com>	2019-06-05 11:15:04 -07:00
Alyssa Rosenzweig	1ea987576d	panfrost/midgard: Always break up fragment writeout In a fragment shader, r0 is written out with a special branch sequence. r0 is not a real register here, but essentially a pipeline register -- as such, it needs to be written out in full and on time, with hanging dependencies in the bundle. Otherwise, we break up the bundle, which costs an extra ALU cycle and adds a move. When the scheduler ran last thing, we could do this analysis within the scheduler. Now that RA can run after scheduling, that's no longer valid, so we remove the analysis and always break it up (at a performance penalty). Future work can add a post-RA/post-schedule pass to merge writeout blocks if possible. It's a bit of a low-priority next to fixing conformance regressions, of course. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-05 18:06:49 +00:00
Alyssa Rosenzweig	3d11b075f0	panfrost/midgard: Fix cubemap regression Fixes: `2d9802233` ("panfrost/midgard: Extend RA to non-vec4 sources") Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-05 18:06:48 +00:00
Deepak Rawat	828e1b0b4c	winsys/drm: Fix out of scope variable usage In this particular instance, a struct member was used outside of the block where it was defined. Fix this by moving the definition outside of the block. Signed-off-by: Deepak Rawat <drawat@vmware.com> Fixes: `569f838987` ("winsys/svga: Add support for new surface ioctl, multisample pattern") Reviewed-by: Brian Paul <brianp@vmware.com>	2019-06-02 22:31:07 -07:00
Alyssa Rosenzweig	c51312bc94	panfrost/midgard: Lower integer division We use the shared nir_lower_idiv pass to lower integer division, fixing 144 dEQP tests. This pass was not applied in the past due to breakage from iabs fixed earlier in the series. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Reviewed-By: Ryan Houdek <Sonicadvance1@gmail.com>	2019-06-05 17:59:27 +00:00
Alyssa Rosenzweig	88c59798fe	panfrost/midgard: Fix 1-arg ALU memory corruption Certain ops that only take one argument have an imaginary "zero" constant for their second argument. For instance, conversions: i2f [dest], [source], #0 Memory corruption meant that #0 was instead random noise. For some ops, that doesn't matter (manifested as abnormally large code size and poor scheduling due to extra constants in random places). But for others, where a 1-op is emulated by a 2-op with an implicit 0 second argument, that broke things. Fixes iabs (emulated by iabsdiff). Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Reviewed-By: Ryan Houdek <Sonicadvance1@gmail.com>	2019-06-05 17:59:24 +00:00
Alyssa Rosenzweig	9f14e20fa1	panfrost/midgard: Add a bunch of new ALU ops These ops are used to accelerate various functions exposed in OpenCL. This commit only includes the routine additions to the table. They are not wired through the compiler; rather, they are just here to keep a reference for the disassembler. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Reviewed-By: Ryan Houdek <Sonicadvance1@gmail.com>	2019-06-05 17:58:14 +00:00
Emil Velikov	d6edccee8d	egl: add EGL_platform_device support This new 'platform' is added by default with no guards. It is effectively a copy of the surfaceless one, with updated function names and brand new probe function. Due to the reuse, some of the ifdef HAVE_SURFACELESS_PLATFORM guards have been dropped. A worthy mention are the changes in _eglFindDisplay, since the original and dup'd fd are required, we make use of the plat_opt argument. Note that no hacks for eglGetDisplay are added - the API works only with the eglGetPlatformDisplay* API. v2: - s/_eglCompareDeviceDisplay/_eglSameDeviceDisplay/ (Eric) - let ^^ return bool (Eric) - fixup meson build, move files() further up (Eric) - copy from plat. surfaceless w/o the visual cleanups - close and free when destroying the dpy - sprinkle a few _eglDeviceSupports - split fd handling into separate function - use directly the render node if no FD is given (Mathias) v3: - s/dpy/disp/g - drop swap_buffers* callbacks - drop loader_set_logger() - drop local define - re-introduce _eglGetDRMDeviceRenderNode() - EGL_WARN on ForceSoftware with HW device - continue using the HW device - bail out for "EGL_MESA_device_software" until it's fixed - wire-up the Android build v4: - use new style _eglFindDisplay() - split hw vs sw code paths - don't close the internal fd (already handled in FiniDisplay()) - make swrast work (bit hacky but will do for now) - Android for real, drop autotools - Correct HW + LIBGL_ALWAYS_SOFTWARE check - use the dri2_create_drawable() helper v5: - enhance comment around fd checks (Mathias) - rebase for dri2_init_surface() changes Cc: Mathias Fröhlich <Mathias.Froehlich@gmx.net> Acked-by: Marek Olšák <marek.olsak@amd.com> (v4) Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Signed-off-by: Marek Olšák <marek.olsak@amd.com>	2019-06-05 13:35:21 -04:00
Emil Velikov	2f11957532	egl: keep the software device at the end of the list By default, the user is likely to pick the first device so it should not be the least performant (aka software) one. v2: Drop odd comment (Marek) Suggested-by: Marek Olšák <maraeo@gmail.com> Reviewed-by: Mathias Fröhlich <Mathias.Froehlich@web.de> (v1) Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v1) Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Signed-off-by: Marek Olšák <marek.olsak@amd.com>	2019-06-05 13:35:21 -04:00
Emil Velikov	2282ec0ad6	egl/dri: flesh out and use dri2_create_drawable() Wrap the loader->createNewDrawable() dance into a helper and use it throughout the codebase. This addresses cases like surfaceless (SL) on swrast (SL on kms_swrast is fine) where we'd attempt using the wrong driver and crash out. v2: fixup quirky GBM (Mathias) v3: fixup GBM for real (Marek) Cc: mesa-stable@lists.freedesktop.org Cc: Mathias Fröhlich <Mathias.Froehlich@web.de> Reviewed-by: Mathias Fröhlich <Mathias.Froehlich@web.de> (v1) Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v1) Signed-off-by: Emil Velikov <emil.velikov@collabora.com> (v2) Signed-off-by: Marek Olšák <marek.olsak@amd.com> (v2) Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2019-06-05 13:35:21 -04:00
Emil Velikov	5e0f527d60	egl: fold X11 attrib handling like other platforms Since we no longer need special handling for X11, refactor the code to follow the style used by all other platforms. Reviewed-by: Mathias Fröhlich <Mathias.Froehlich@web.de> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Signed-off-by: Marek Olšák <marek.olsak@amd.com>	2019-06-05 13:35:21 -04:00
Adam Jackson	2b29cf2468	egl: remove Options::Platform handling The full set of attributes is already handled with previous patches. Thus all this is now dead code. v2 (Emil) - split from a larger patch. Reviewed-by: Mathias Fröhlich <Mathias.Froehlich@web.de> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Signed-off-by: Marek Olšák <marek.olsak@amd.com>	2019-06-05 13:35:21 -04:00
Adam Jackson	4aebd86f9a	egl/x11: pick the user requested screen At the moment the user will pass the screen number via attribs, yet we would throw that away. Reason being that the int *screen passed to xcb_connect() is output only. v2 (Emil): - split from a larger patch - use xcb_connect() returned screen, as fallback - use helper function only as needed Reviewed-by: Mathias Fröhlich <Mathias.Froehlich@web.de> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Signed-off-by: Marek Olšák <marek.olsak@amd.com>	2019-06-05 13:35:21 -04:00
Adam Jackson	8e991ce539	egl: handle the full attrib list in display::options Earlier spec is vague, although EGL 1.5 makes it clear: Multiple calls made to eglGetPlatformDisplay with the same parameters will return the same EGLDisplay handle. With this commit we store and compare the full attrib list. v2 (Emil): - Split into separate patches - Use EGLBoolean over int masked as such - Don't return free'd pointer on calloc failure Reviewed-by: Mathias Fröhlich <Mathias.Froehlich@web.de> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Signed-off-by: Marek Olšák <marek.olsak@amd.com>	2019-06-05 13:35:21 -04:00
Emil Velikov	72b9aa973b	egl: flesh out a _eglNumAttribs() helper Reviewed-by: Mathias Fröhlich <Mathias.Froehlich@web.de> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Signed-off-by: Marek Olšák <marek.olsak@amd.com>	2019-06-05 13:35:21 -04:00
Krzysztof Raszkowski	4ff02b3edd	swr: fix support for GL_ARB_copy_image extension This commit fixes support and adjusts the capabilities returned by the SWR driver and the documentation to correctly report the GL_ARB_copy_image extension. Reviewed-by: Alok Hota <alok.hota@intel.com>	2019-06-05 15:26:47 +00:00
Guido Günther	755fdd6f9d	etnaviv: etnaviv_bo_cache_test: Use /dev/dri/renderD128 by default Signed-off-by: Guido Günther <guido.gunther@puri.sm> Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>	2019-06-05 08:58:05 +00:00
Guido Günther	90cc0de102	build: Build etnaviv drm tests Signed-off-by: Guido Günther <guido.gunther@puri.sm> Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>	2019-06-05 08:58:05 +00:00
Guido Günther	01a8ba79fe	etnaviv: drm tests: Use mesa header locations Signed-off-by: Guido Günther <guido.gunther@puri.sm> Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>	2019-06-05 08:58:05 +00:00
Guido Günther	2b377547c3	etnaviv: Add libdrm tests as of 922d92994267743266024ecceb734ce0ebbca808 Signed-off-by: Guido Günther <guido.gunther@puri.sm> Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>	2019-06-05 08:58:05 +00:00
Guido Günther	b921df352d	build: Build etnaviv drm Signed-off-by: Guido Günther <guido.gunther@puri.sm> Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>	2019-06-05 08:58:05 +00:00
Guido Günther	3696235f82	etnaviv: gallium: Use internal etnaviv_drmif.h Signed-off-by: Guido Günther <guido.gunther@puri.sm> Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>	2019-06-05 08:58:05 +00:00
Guido Günther	95d8b4ac0b	etnaviv: drm: s/bo_del/_etna_bo_del/ This avoids a conflict with freedreno's bo_del(). Signed-off-by: Guido Günther <guido.gunther@puri.sm> Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>	2019-06-05 08:58:05 +00:00
Guido Günther	17d7282cca	etnaviv: drm: s/table_lock/etna_table_lock/ This avoids a conflict with freedreno's table_lock Signed-off-by: Guido Günther <guido.gunther@puri.sm> Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>	2019-06-05 08:58:05 +00:00
Guido Günther	7a5b19346a	etnaviv: drm: Move uapi header Signed-off-by: Guido Günther <guido.gunther@puri.sm> Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>	2019-06-05 08:58:05 +00:00
Guido Günther	3925d38870	etnaviv: drm: Drop excessive debugging in perfmon Signed-off-by: Guido Günther <guido.gunther@puri.sm> Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>	2019-06-05 08:58:05 +00:00
Guido Günther	22fa1c95ff	etnaviv: drm: Don't use drmMsg() Signed-off-by: Guido Günther <guido.gunther@puri.sm> Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>	2019-06-05 08:58:05 +00:00
Guido Günther	c93007a618	etnaviv: drm: Use _mesa_hash_table instead of drmHash Signed-off-by: Guido Günther <guido.gunther@puri.sm> Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>	2019-06-05 08:58:05 +00:00
Guido Günther	92fc14321f	etnaviv: drm: Use mesa's ARRAY_SIZE Signed-off-by: Guido Günther <guido.gunther@puri.sm> Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>	2019-06-05 08:58:05 +00:00
Guido Günther	e87d128b52	etnaviv: drm: Use mesa's os_m{un,}map Signed-off-by: Guido Günther <guido.gunther@puri.sm> Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>	2019-06-05 08:58:05 +00:00
Guido Günther	2ebd444c10	etnaviv: drm: Use mesa's atomic definitions Signed-off-by: Guido Günther <guido.gunther@puri.sm> Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>	2019-06-05 08:58:05 +00:00
Guido Günther	6ab83b8474	etnaviv: drm: Drop drm_{public,private} Signed-off-by: Guido Günther <guido.gunther@puri.sm> Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>	2019-06-05 08:58:05 +00:00
Guido Günther	66eb554d46	etnaviv: drm: Drop inexistent headers Signed-off-by: Guido Günther <guido.gunther@puri.sm> Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>	2019-06-05 08:58:05 +00:00
Guido Günther	58eec3808e	etnaviv: Add libdrm code as of 922d92994267743266024ecceb734ce0ebbca808 Signed-off-by: Guido Günther <guido.gunther@puri.sm> Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>	2019-06-05 08:58:05 +00:00
Guido Günther	3835e21369	etnaviv: untabify Two driver files had tabs mixed with spaces. Remove the tabs. Signed-off-by: Guido Günther <guido.gunther@puri.sm> Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>	2019-06-05 08:58:05 +00:00
Tomeu Vizoso	c7a6e07454	panfrost: bifrost: Fix format string in disassembler The compiler configuration was hardened to fail on format warnings and things stopped building. Fixes: `c9c1e26106` ("mesa: prevent common string formatting security issues") Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com> Reviewed-By: Ryan Houdek <Sonicadvance1@gmail.com>	2019-06-05 10:40:19 +02:00
Kenneth Graunke	8d4f68ee20	iris: Free the buffer when reading from the disk cache.	2019-06-04 23:53:57 -07:00
Alyssa Rosenzweig	bfa9f56a2a	panfrost/midgard: Don't promote non-SSA to pipeline registers Fixes: `33800f4612` ("panfrost/midgard: Implement "pipeline register" prepass") Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2019-06-05 00:12:36 +00:00
Eric Anholt	36cb209787	freedreno: Drop invalid scissor optimization. We do support TF now, so it's no longer valid. Besides, if we want this optimization, we should probably have mesa/st doing it right for everyone. Reviewed-by: Rob Clark <robdclark@gmail.com>	2019-06-04 16:44:37 -07:00
Eric Anholt	8843b90cac	freedreno: Reuse glsl_get_sampler_coordinate_components(). We have the GLSL type, so we can just ask it how many coordinates there are. The GLSL function already has Vulkan cases that we'd probably want eventually. Reviewed-by: Rob Clark <robdclark@gmail.com>	2019-06-04 16:44:24 -07:00
Eric Anholt	fb872748ec	freedreno: Improve the pi approximations in trig lowering. When comparing our sin/cos behavior to the closed source driver, I noticed that we were off by a bit (or, in the case of 1/2pi, 3 bits). Fixes: dEQP-GLES3.functional.shaders.random.trigonometric.vertex.52 dEQP-GLES3.functional.shaders.random.all_features.vertex.0 Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com> Reviewed-by: Rob Clark <robdclark@gmail.com>	2019-06-04 23:35:38 +00:00
Marek Olšák	ff63b99531	ac: rename LLVM <= 7 helpers for readability Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-06-04 18:53:46 -04:00
Marek Olšák	c9b64b58de	ac: fix a typo in ac_build_wg_scan_bottom Cc: 19.1 <mesa-stable@lists.freedesktop.org> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-06-04 18:53:46 -04:00
Caio Marcelo de Oliveira Filho	04e8ff8595	glx: Fix error message when no driverName is available Just provide a "(null)" literal in case driverName is NULL. In file included from ../src/glx/dri3_glx.c:76: ../src/glx/dri3_glx.c: In function ‘dri3_create_screen’: ../src/glx/dri_common.h:70:36: error: ‘%s’ directive argument is null [-Werror=format-overflow=] 70 | #define CriticalErrorMessageF(...) dri_message(_LOADER_FATAL, __VA_ARGS__) | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ../src/glx/dri3_glx.c:1002:4: note: in expansion of macro ‘CriticalErrorMessageF’ 1002 | CriticalErrorMessageF("failed to load driver: %s\n", driverName); | ^~~~~~~~~~~~~~~~~~~~~ ../src/glx/dri3_glx.c:1002:50: note: format string is defined here 1002 | CriticalErrorMessageF("failed to load driver: %s\n", driverName); | ^~ cc1: some warnings being treated as errors Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-06-04 15:28:12 -07:00
Chia-I Wu	65439291a0	virgl: resolve to correct level during texture read When PIPE_TRANSFER_READ requires a resolve, we blit from the host storage to a temporary storage, and do a format conversion from the temporary storage to the guest storage. This change makes sure we convert to the correct level of the guest storage. Signed-off-by: Chia-I Wu <olvaffe@gmail.com> Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>	2019-06-04 21:37:03 +00:00
Chia-I Wu	067018d4e7	virgl: fix texture resolving with compressed formats util_format_translate_3d expects the source box to be aligned to the block size. When resolving, make sure the size of the staging buffer is aligned to the block size. Signed-off-by: Chia-I Wu <olvaffe@gmail.com> Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>	2019-06-04 21:37:03 +00:00
Bas Nieuwenhuizen	a6a5a6f67f	freedreno: Add printf pattern string. Some new flag setting disallows it due to being a security risk. Fixes: `c9c1e26106` "mesa: prevent common string formatting security issues" Reviewed-by: Rob Clark <robdclark@gmail.com>	2019-06-04 23:20:50 +02:00
Bas Nieuwenhuizen	6256925b11	Revert "vl: Enable DRM by default." Reason: meson.build:586:7: ERROR: Unknown variable "dep_libdrm". if building without x11 platform. This reverts commit `392c60928a`.	2019-06-04 23:14:56 +02:00
Alyssa Rosenzweig	4a03d37827	panfrost/midgard: .pos propagation A previous optimization converts fmax(x, 0.0) instructions to fmov.pos. This pass then propagates the .pos from the move up to the source instruction (when possible). From there, copy propagation will eliminate the move. In the future, we might prefer to do this in common NIR code like we do for saturate, as Bifrost can also benefit. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Reviewed-by: Ryan Houdek <Sonicadvance1@gmail.com>	2019-06-04 20:14:50 +00:00
Alyssa Rosenzweig	5da0a33fab	panfrost/midgard: Cleanup copy propagation Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Reviewed-by: Ryan Houdek <Sonicadvance1@gmail.com>	2019-06-04 20:14:50 +00:00
Alyssa Rosenzweig	33800f4612	panfrost/midgard: Implement "pipeline register" prepass This prepass, run after scheduling but before RA, specializes to pipeline registers where possible. It walks the IR, checking whether sources are ever used outside of the immediate bundle in which they are written. If they are not, they are rewritten to a pipeline register (r24 or r25), valid only within the bundle itself. This has theoretical benefits for power consumption and register pressure (and performance by extension). While this is tested to work, it's not clear how much of a win it really is, especially without an out-of-order scheduler (yet!). Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Reviewed-by: Ryan Houdek <Sonicadvance1@gmail.com>	2019-06-04 20:14:50 +00:00
Alyssa Rosenzweig	2a79afc5f0	panfrost/midgard: Helpers for pipeline Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Reviewed-by: Ryan Houdek <Sonicadvance1@gmail.com>	2019-06-04 20:14:50 +00:00
Alyssa Rosenzweig	3c7abbfbe8	panfrost/midgard: Refactor schedule/emit pipeline First, this moves the scheduler and emitter out of midgard_compile.c into their own dedicated files. More interestingly, this slims down midgard_bundle to be essentially an array of _pointers_ to midgard_instructions (plus some bundling metadata), rather than the instructions and packing themselves. The difference is critical, as it means that (within reason, i.e. as long as it doesn't affect the schedule) midgard_instructions can now be modified _after_ scheduling while having changes updated in the final binary. On a more philosophical level, this removes an IR. Previously, the IR before scheduling (MIR) was separate from the IR after scheduling (post-schedule MIR), requiring a separate set of utilities to traverse, using different idioms. There was no good reason for this, and it restricts our flexibility with the RA. So unify all the things! Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Reviewed-by: Ryan Houdek <Sonicadvance1@gmail.com>	2019-06-04 20:14:50 +00:00
Alyssa Rosenzweig	0524ab9c37	panfrost/midgard: Cleanup RA (stylistic changes) Trivial. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Reviewed-by: Ryan Houdek <Sonicadvance1@gmail.com>	2019-06-04 20:14:50 +00:00
Alyssa Rosenzweig	debc29b9ad	panfrost/midgard: Share MIR utilities These are more generally useful than the files they were constrained to. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Reviewed-by: Ryan Houdek <Sonicadvance1@gmail.com>	2019-06-04 20:14:50 +00:00
Alyssa Rosenzweig	1bfa0d6ccc	panfrost/midgard: Misc. cleanup for readability Mostly, this fixes a number of instances of lines >> 80 chars, refactoring them into something legible. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Reviewed-by: Ryan Houdek <Sonicadvance1@gmail.com>	2019-06-04 20:14:50 +00:00
Alyssa Rosenzweig	2d98022330	panfrost/midgard: Extend RA to non-vec4 sources This represents a major break with the former RA design. We now use conflicting register classes to represent the subdivision of Midgard's 128-bit registers into varying sizes and arrangement. We determine class based on the number of components in the instructions' masks. To support this, we include a number of helpers in the RA to allow composing swizzles and masks, such that MIR written implicitly assuming .xyzw sources can be transformed to use actual (non-aligned) sources. The net result is a marked decrease in register pressure on non-vec4-exclusive shaders. We could still be doing much better. Not implemented yet are: - Register spilling - Per-component liveness Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Reviewed-by: Ryan Houdek <Sonicadvance1@gmail.com>	2019-06-04 20:14:50 +00:00
Alyssa Rosenzweig	c1715b558a	panfrost/midgard: Set masks on ld_vary These masks distinguish scalar/vec2/vec3 loads from the default vec4, which helps with assembly readability (since it's immediately obvious how many components are _actually_ affected, rather than doing mysterious things to an unknown number of unused components). Later in the series, this will enable smarter register allocation, as the unused components will not be interpreted abnormally. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Reviewed-by: Ryan Houdek <Sonicadvance1@gmail.com>	2019-06-04 20:14:50 +00:00
Alyssa Rosenzweig	550be763fa	panfrost/midgard: Fix liveness analysis bugs This fixes liveness analysis with respect to inline constants and branching. In practice, the symptom is abnormally high register pressure. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Reviewed-by: Ryan Houdek <Sonicadvance1@gmail.com>	2019-06-04 20:14:50 +00:00
Alyssa Rosenzweig	c54f3f42eb	panfrost/midgard: Set int outmod for "pasted" code These snippets of integer assembly are injected for various purposes. Eventually, we'll want to implement these in NIR directly. Regardless, the "default" output modifier is different between floats and ints, so let's set the right one. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Reviewed-by: Ryan Houdek <Sonicadvance1@gmail.com>	2019-06-04 20:14:50 +00:00
Alyssa Rosenzweig	51196c3591	panfrost/midgard: Hoist some utility functions These were static to midgard_compile.c but are more generally useful across the compiler. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Reviewed-by: Ryan Houdek <Sonicadvance1@gmail.com>	2019-06-04 20:14:50 +00:00
Alyssa Rosenzweig	005d9b1ada	panfrost/midgard: Remove pinning This mechanism is only used by blend shaders, so just use a move here. Ideally, it'll be copy-propped and DCE'd away; this removes a source of considerable indirection and will simplify RA logic. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Reviewed-by: Ryan Houdek <Sonicadvance1@gmail.com>	2019-06-04 20:14:50 +00:00
Alyssa Rosenzweig	d2d3cc66cf	nir/algebraic: Simplify max(abs(a), 0.0) -> abs(a) This pattern was noticed in glmark's jellyfish scene. v2: Add inexact qualifier due to NaN behaviour. Minimal shader-db changes (slightly helped). Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Reviewed-by: Elie Tournier <tournier.elie@gmail.com>	2019-06-04 19:57:19 +00:00
Mark Janes	c9c1e26106	mesa: prevent common string formatting security issues Adds a compile-time error for obvious security issues like: printf(string_var); The proposed flag is more tolerant than -Wformat-nonliteral. Specifically, it tolerates common mesa formatting like: static const char *shader_template = "really long string %d"; printf(shader_template, uniform_number); Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110833 Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Eric Engestrom <eric@engestrom.ch>	2019-06-04 12:49:38 -07:00
Jason Ekstrand	f4ef34f207	intel/fs: Add an UNDEF instruction to avoid excess live ranges With 8 and 16-bit types and anything where we have to use non-trivial register strides to deal with restrictions, we end up with things that look like partial writes even though we don't care about any values in the register except those written by that instruction. This is particularly important when dealing with loops because liveness sees is_partial_write and the fact that an old version from a previous loop iteration may be valid at that point and extends all purely partially written values to the entire loop. This commit adds a new UNDEF instruction which does nothing (the generator doesn't emit anything) but which does a fake write to the register. This informs liveness that we don't care about any values before that point so it won't consider those registers to be falsely live. We can safely emit UNDEF instructions for all SSA values that come in from NIR and nearly all temporaries generated by various stages of the compiler. In particular, we need to insert UNDEF instructions when we handle region restrictions because the newly allocated registers are almost guaranteed to be partially written. No shader-db changes. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110432 Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-06-04 14:27:30 -05:00
Caio Marcelo de Oliveira Filho	d482a8f680	spirv: Update the OpenCL.std.h header This corresponds to commit 8b911bd2ba37677037b38c9bd286c7c05701bcda on GitHub. We previously tweaked OpenCL.std.h from upstream to be included in C code. Now upstream header can be included, however the symbol names are slightly different (include an OpenCLstd_ prefix), so this patch also fixes vtn_opencl.c to use those. Reviewed-by: Karol Herbst <kherbst@redhat.com>	2019-06-04 12:12:51 -07:00
Bas Nieuwenhuizen	9701cb1034	radv: Use bo metadata for imported image tiling on Android. This way we handle linear images etc. correctly. Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-06-04 18:32:45 +00:00
Bas Nieuwenhuizen	392c60928a	vl: Enable DRM by default. If libdrm is found the pipe loader enables drm anyway, and that is pretty much the only extra dependency this code has. This enables creating libva display using a drm fd without having to enable the DRM (GBM really) backend of EGL, which is completely unrelated. Leaving the X11 platforms alone as they would still result in the additional inclusion of extra deps. Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2019-06-04 20:01:34 +02:00
Jason Ekstrand	c2a0335bb0	anv: Advertise support for VK_EXT_fragment_shader_interlock Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-06-04 17:30:51 +00:00
Jason Ekstrand	5176805471	spirv: Implement SPV_EXT_fragment_shader_interlock Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-06-04 17:30:51 +00:00
Jason Ekstrand	b5aa76b1df	spirv: Update the headers from latest Khronos master This corresponds to 8b911bd2ba37677037b38c9bd286c7c05701bcda in https://github.com/KhronosGroup/SPIRV-Headers. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-06-04 17:30:51 +00:00
Jason Ekstrand	8339e3f010	vulkan: Update the XML and headers to 1.1.110 Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-06-04 17:30:51 +00:00
Rhys Perry	73dda85512	ac/nir: mark some texture intrinsics as convergent Otherwise LLVM can sink them and their texture coordinate calculations into divergent branches. v2: simplify the conditions on which the intrinsic is marked as convergent v3: only mark as convergent in FS and CS with derivative groups Cc: <mesa-stable@lists.freedesktop.org> Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-06-04 17:30:53 +01:00
Rhys Perry	d4a2f8b33b	radv: fix some compiler warnings Fixes -Woverflow warnings with GCC 9.1.1 v2: use a cast instead of a bitwise and Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-06-04 17:30:53 +01:00
Jason Ekstrand	a84de3fb7c	intel/fs: Skip registers faster when setting spill costs This might be slightly faster since we're doing one read rather than two before we decide to skip. The more important reason, however, is because no_spill prevents us from re-spilling spill registers. In the new world in which we don't re-calculate liveness every spill, we may not have valid liveness for spill registers so we shouldn't even look their live ranges up. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110825 Fixes: `e99081e76d` "intel/fs/ra: Spill without destroying the..." Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Tested-by: Tapani Pälli <tapani.palli@intel.com>	2019-06-04 14:37:56 +00:00
Connor Abbott	d68218dbca	radeonsi/nir: Fix type in bindless address computation Bindless handles in GL are 64-bit. This fixes an assert failure in LLVM. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-06-04 15:15:46 +02:00
Christian Gmeiner	a6e879984c	etnaviv: implement set_active_query_state(..) for hw queries Clear w/ quad uses a normal draw which adds up to OQ. st/meta uses set_active_query_state(..) to tell the driver to pause queries in such cases. Fixes spec@arb_occlusion_query@occlusion_query_meta_save piglit. Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>	2019-06-04 14:58:02 +02:00
Samuel Pitoiset	8a35eb0602	radv: do not use gfx fast depth clears for layered depth/stencil images The driver should only do fast depth clears with the graphics path when the view covers all image layers, otherwise this might corrupt layers when HTILE is enabled. Cc: 19.0 19.1 mesa-stable@lists.freedesktop.org Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-06-04 08:55:32 +02:00
Samuel Pitoiset	33f4e04d5a	ac,radv: do not emit vec3 for raw load/store on SI It's unsupported, only load/store format with vec3 are supported. Fixes: `6970a9a6ca` ("ac,radv: remove the vec3 restriction with LLVM 9+") Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-06-04 08:47:26 +02:00
Sagar Ghuge	3016756398	intel/compiler: Fix assertions in brw_alu3 v2: Fix assertion for src1 (Ian Romanick) Fixes: `3b967e17` (intel/compiler: Avoid false positive assertions) Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Suggested-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-06-03 23:14:34 -07:00
Kenneth Graunke	34d3103dee	iris: Fix SO stride units for DrawTransformFeedback Mesa measures in DWords. The hardware also claims to measure in DWords. Except the SO_WRITE_OFFSET field is actually bits 31:2, with 1:0 MBZ. Which means that it really measures in bytes. So, convert to bytes. Without this, our offset / stride denominator was 1/4th the size it should be, leading to 4x the vertex count that we should have had. Fixes GTF-GL46.gtf40.GL3Tests.transform_feedback2.transform_feedback2_two_buffers	2019-06-03 22:51:18 -07:00
Timothy Arceri	fea36a8f43	st/glsl: make sure to propagate initialisers to driver storage This essentially reverts `20234cfe3a`. Fixes piglit test: tests/spec/arb_get_program_binary/execution/uniform-after-restore.shader_test Fixes: `20234cfe3a` "st/mesa: don't propagate uniforms when restoring from cache" Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110784	2019-06-04 11:36:45 +10:00
Caio Marcelo de Oliveira Filho	61de825e11	spirv: Like Uniform, do nothing for UniformId Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-06-03 17:20:54 -07:00
Caio Marcelo de Oliveira Filho	b4eff83180	spirv: Implement SpvOpCopyLogical This is the same as SpvOpCopyObject but without the type checking, which is how vtn_composite_copy works, so we just need to hook the operation. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-06-03 17:20:54 -07:00
Caio Marcelo de Oliveira Filho	81586e9f53	spirv: Generalize OpSelect SPIR-V 1.4 supports OpSelect over any composite type, and also allows scalar boolean condition for vector types -- a case which we already handled to support old GLSLang. Added a helper function to recursively perform nir_bcsel, which makes it easier to support structs. v2: Replace asserts() with vtn_fail_if(). (Jason) v3: Simplify Condition and Result types verifications. (Jason) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-06-03 17:20:54 -07:00
Caio Marcelo de Oliveira Filho	17630291e5	spirv: Move OpSelect handling to a function This will make a later change easier to review. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-06-03 17:20:54 -07:00
Caio Marcelo de Oliveira Filho	ea0e89859c	nir/vars_to_ssa: Handle UNDEF_NODE in more places Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110832 Fixes: `911ea2c66f` "nir/vars_to_ssa: Use a non-null UNDEF_NODE pointer" Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-06-03 17:09:22 -07:00
Marek Olšák	b2bbd1a27b	ac/registers: don't use the si, cik, vi names, use gfxN trivial	2019-06-03 20:06:41 -04:00
Nicolai Hähnle	f480b8aaa4	amd/common: use generated register header	2019-06-03 20:05:20 -04:00
Nicolai Hähnle	853ef5ccba	amd/common: use SH{0,1}_CU_EN definitions only of COMPUTE_STATIC_THREAD_MGMT_SE0 The automatic header generation unifies identical registers in a series and only emits definitions for the first one. This is mostly to avoid emitting excessive definitions for CB registers, but special-casing an exception for this family of registers doesn't seem worth it.	2019-06-03 20:05:20 -04:00
Nicolai Hähnle	cf51009ad2	amd/common: unify PITCH_GFX6 and PITCH_GFX9 The definition of the fields differs, but PITCH_GFX9 is a mere extension of PITCH_GFX6 that does not conflict with any other fields. This aligns the definitions with what will be generated from the register JSON. The information about how large the fields really are is preserved in the register database.	2019-06-03 20:05:20 -04:00
Nicolai Hähnle	e04215815e	amd/common: rename R_3F2_CONTROL to IB_CONTROL for disambiguation This "register" name collides with R_370_CONTROL. This aligns the definitions with what will be generated from the register JSON.	2019-06-03 20:05:20 -04:00
Nicolai Hähnle	cd247cf456	amd/common: cleanup DATA_FORMAT/NUM_FORMAT field names The field layout wasn't actually changed in gfx9, so having the suffix isn't very useful. The field contents were changed, but this is reflected in the V_xxx_xxx definitions and is taken into account by the ac_debug logic based on the register JSON. This aligns the definitions with what will be generated from the register JSON.	2019-06-03 20:05:20 -04:00
Nicolai Hähnle	ef6ef098af	amd/common: derive ac_debug tables from register JSON	2019-06-03 20:05:20 -04:00
Nicolai Hähnle	d02286c753	amd/registers: add JSON description of packet3 fields	2019-06-03 20:05:20 -04:00
Nicolai Hähnle	67702e3319	amd/registers: add JSON descriptions of registers The descriptions are mostly derived from parsing the existing register headers.	2019-06-03 20:05:20 -04:00
Nicolai Hähnle	e6184b0892	amd/registers: scripts for processing register descriptions in JSON We will derive both the debugging tables and (the majority of) the register headers from descriptions in JSON, instead of deriving the debugging tables from an awkward parsing of the register headers. Some of the scripts are useful for maintaining the register database itself. The scripts are designed to output reasonably readable JSON by default.	2019-06-03 20:05:20 -04:00
Vinson Lee	d4e70be739	freedreno: Fix GCC build error. ../src/freedreno/vulkan/tu_device.c:900:4: error: initializer element is not constant .minImageTransferGranularity = (VkExtent3D) { 1, 1, 1 }, ^ Suggested-by: Kristian Høgsberg <krh@bitplanet.net> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110698 Signed-off-by: Vinson Lee <vlee@freedesktop.org> Reviewed-by: Rob Clark <robdclark@gmail.com>	2019-06-03 16:46:54 -07:00
Mark Janes	774a088f64	mesa: Use string literals for format strings Android build settings require format strings to be string literals. Fixes: `d2906293c4` "mesa: EXT_dsa add selectorless matrix stack functions" Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110833 Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-06-03 16:17:23 -07:00
Caio Marcelo de Oliveira Filho	045aeccf0e	iris: Always reserve binding table space for NIR constants Don't have a separate mechanism for NIR constants to be removed from the table. If unused, we will compact it away. The use_null_surface is needed when INTEL_DISABLE_COMPACT_BINDING_TABLE is set. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-06-03 14:14:45 -07:00
Caio Marcelo de Oliveira Filho	5611444809	iris: Print binding tables when INTEL_DEBUG=bt Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-06-03 14:14:45 -07:00
Caio Marcelo de Oliveira Filho	97cd865be2	iris: Compact binding tables Change the iris_binding_table to keep track of what surfaces are actually going to be used, then assign binding table indices just for those. Reducing unused bytes in those is valuable because we use a reduced space for those tables in Iris. The rest of the driver can go from "group indices" (i.e. UBO #2) to BTI and vice-versa using helper functions. The value IRIS_SURFACE_NOT_USED is returned to indicate a certain group index is not used or a certain BTI is not valid. The environment variable INTEL_DISABLE_COMPACT_BINDING_TABLE can be set to skip compacting the binding table. v2: (all from Ken) Use BITFIELD64_MASK helper. Improve comments. Assert all groups are marked as used when we have indirects. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-06-03 14:14:45 -07:00
Caio Marcelo de Oliveira Filho	79f1529ae0	iris: Create an enum for the surface groups This will make it convenient to handle compacting and printing the binding table. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-06-03 14:14:45 -07:00
Caio Marcelo de Oliveira Filho	1c8ea8b300	iris: Handle binding table in the driver Stop using brw_compiler to lower the final binding table indices for surface access. This is done by simply not setting the 'prog_data->binding_table.*_start' fields. Then make the driver perform this lowering. This is a better place to perform the binding table assignments, since the driver has more information and will also later consume those assignments to upload resources. This also prepares us for two changes: use ibc without having to implement binding table logic there; and remove unused entries from the binding table. Since the `block` field in brw_ubo_range now refers to the final binding table index, we need to adjust it before using it to index shs->constbuf. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-06-03 14:14:45 -07:00
Caio Marcelo de Oliveira Filho	518f83236b	iris: Pull brw_nir_analyze_ubo_ranges() call out of setup_uniforms We'll change iris to perform lowering of the binding table indices earlier (before the backend kicks in), but the backend compiler uses the result of the analysis to identify load_ubo intrinsics, so we do the analysis after the lowering to have the right indices. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-06-03 14:14:45 -07:00
Caio Marcelo de Oliveira Filho	1f8546ba2f	spirv: Implement OpPtrEqual, OpPtrNotEqual and OpPtrDiff Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-06-03 13:45:09 -07:00
Caio Marcelo de Oliveira Filho	ca164ab495	nir: Add functions to subtract and compare addresses v2: Fix comparing addresses from formats that have more than one component by using nir_ball_iequal(). (Jason) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-06-03 13:45:09 -07:00
Caio Marcelo de Oliveira Filho	09cc3389b9	nir: Add nir_ball_iequal() helper Similar to nir_bany_inequal(). Suggested by Jason. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-06-03 13:45:09 -07:00
Sergii Romantsov	88340372ee	mesa: ARB program parser should clean parameters Program parser allocates parameter list. In case of parsing error some variables will not be freed. Patch adds freeing of it. Signed-off-by: Sergii Romantsov <sergii.romantsov@globallogic.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Signed-off-by: Marek Olšák <marek.olsak@amd.com>	2019-06-03 16:41:26 -04:00
Hyunjun Ko	382e3553af	freedreno/ir3: fix counting and printing for half registers. v2: defining 0x100 and use this for setting the FS_OUTPUT_REG.HALF_PRECISION Signed-off-by: Rob Clark <robdclark@chromium.org>	2019-06-03 13:31:51 -07:00
Neil Roberts	fb53b326c2	freedreno/ir3: Fix up the half reg source even when src instr==NULL Previously the loop for assigning registers was bailing out early if the register had a null source. I think the intention is that in this case it isn’t necessary to assign a register. However it was also missing out the part to fix up the types. This can happen if the instruction is copy propagated to be a move from a constant half-float input register. In that case it still needs to fix up the types. Fixes assert in dEQP-GLES3.functional.shaders.invariance.highp.subexpression_precision_mediump when lowering the precision of the variables. Signed-off-by: Rob Clark <robdclark@chromium.org>	2019-06-03 13:31:51 -07:00
Neil Roberts	3222216a58	freedreno/ir3: Add a 16-bit implementation of nir_op_imul Signed-off-by: Rob Clark <robdclark@chromium.org>	2019-06-03 13:31:51 -07:00
Hyunjun Ko	daee6bc1a1	freedreno/ir3: set dst type of alu instructions correctly. Though it should be fixed in RA pass, it needs to be set correctly from the beginning according to the bitsize of NIR dest. v2: Would be better for mad,fddx,fddy to fixup later in RA pass. [small cleanup of fallout from imov/fmov removal] Signed-off-by: Rob Clark <robdclark@chromium.org>	2019-06-03 13:31:26 -07:00
Hyunjun Ko	43d80a3e20	freedreno/ir3: adjust the bitsize of regs when an array loading. Signed-off-by: Rob Clark <robdclark@chromium.org>	2019-06-03 12:44:03 -07:00
Hyunjun Ko	cbd1f47433	freedreno/ir3: convert back to 32-bit values for half constant registers. It seems to handle only 32-bit values for half constant registers within floating point opcodes according to the blob driver. So we need to convert back to 32-bit values from 16-bit values, when a lower precision pass is in effect. Signed-off-by: Rob Clark <robdclark@chromium.org>	2019-06-03 12:44:03 -07:00
Hyunjun Ko	a9b556d3a0	freedreno/ir3: check the type of regs of absneg opcode in is_same_type_mov. If the type of dest reg and src reg of absneg opcode are different, it shouldn't be considered as same type mov. This patch becomes meaningful when we start to use mediump information for doing precision lowering to 16bit. Signed-off-by: Rob Clark <robdclark@chromium.org>	2019-06-03 12:44:03 -07:00
Hyunjun Ko	6fb8ef3da6	freedreno/ir3: set proper dst type for uniform according to the type of nir dest. eg. uniform mediump vec4 f; This patch means nothing since there's no mediump lowering pass for now, but will be meaningful when the pass lands in the near future. Signed-off-by: Rob Clark <robdclark@chromium.org>	2019-06-03 12:44:03 -07:00
Neil Roberts	689c3c7d40	freedreno/ir3: Use output type size to set OUTPUT_REG_HALF_PRECISION Previously the A5XX_SP_FS_OUTPUT_REG_HALF_PRECISION was set depending on whether half_precision was set in the shader key. With support for mediump precision, it is possible to have different outputs use different precisions. That means we can’t have a global shader state to specify it. Instead it now tries to copy the half-float-ness from the nir_variable for the output into the ir3_shader_variant. This is then used to decide whether to set half-precision for each output. The a6xx version is copied from the a5xx code but it has not been tested. v2. [Hyunjun Ko (zzoon@igalia.com)] There's the half flag recently added, which represents precision based on IR3_REG_HALF. Now use this flag to avoid duplication. Signed-off-by: Rob Clark <robdclark@chromium.org>	2019-06-03 12:44:03 -07:00
Neil Roberts	8cd1b76b7d	freedreno/ir3: Fix loading half-float immediate vectors Previously the code to load from a constant instruction was always using the u32 pointer. If the constant is actually a 16-bit source this would end up with the wrong values because the pointer would be offset by the wrong size. This fixes it to use the u16 pointer. Signed-off-by: Rob Clark <robdclark@chromium.org>	2019-06-03 12:44:03 -07:00
Rob Clark	7bbf21e898	freedreno/ir3: immediately schedule meta instructions They aren't real instructions, and don't change # of live values, so no point in them competing with real instructions. Signed-off-by: Rob Clark <robdclark@chromium.org> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-06-03 12:44:03 -07:00
Rob Clark	771d04c82d	freedreno/ir3: scheduler improvements For instructions that increase the # of live values, apply a threshold to avoid scheduling them too early. And factor the net change of # of live values that would result from scheduling an instruction, to prioritize instructions that reduce number of live values as the number of live values increases. For manhattan: total instructions in shared programs: 27869 -> 28413 (1.95%) instructions in affected programs: 26756 -> 27300 (2.03%) helped: 102 HURT: 87 total full in shared programs: 1903 -> 1719 (-9.67%) full in affected programs: 1390 -> 1206 (-13.24%) helped: 124 HURT: 9 The reduction in register usage nets ~20% gain in manhattan. (So getting mediump support should be a huge win for gles gfxbench.) Also significantly helps some of the more complex shadertoy shaders, like IQ's Piano (32 to 18 regs, doubles fps). The effect is less pronounced on smaller shaders. Signed-off-by: Rob Clark <robdclark@chromium.org> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-06-03 12:44:03 -07:00
Rob Clark	bb3aa44ade	freedreno/ir3: sched should mark outputs used Account for shader outputs and values live in any direct/indirect successor block. Signed-off-by: Rob Clark <robdclark@chromium.org> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-06-03 12:44:03 -07:00
Pierre-Eric Pelloux-Prayer	d2906293c4	mesa: EXT_dsa add selectorless matrix stack functions Allows the legacy matrix stacks to be manipulated without disturbing the matrix mode selector. Adapted from a patch from Chris Forbes. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-06-03 15:28:51 -04:00
Pierre-Eric Pelloux-Prayer	28ce704bb0	mesa: factor out enum -> matrix stack lookup Split this out from glMatrixMode since we're about to need it independently for EXT_DSA. Adapted from Chris Forbes commit. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-06-03 15:28:49 -04:00
Timothy Arceri	b69584ad69	mesa: add new EXT_direct_state_access tokens Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-06-03 15:28:47 -04:00
Chris Forbes	028682f7f4	glapi: add EXT_direct_state_access Signed-off-by: Chris Forbes <chrisf@ijw.co.nz> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-06-03 15:28:45 -04:00
Timothy Arceri	9c5d86af38	mesa: add a list of EXT_direct_state_access to dispatch sanity This extension is huge and this gives us a TODO list of functions to implement. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-06-03 15:28:33 -04:00
Pierre-Eric Pelloux-Prayer	4583f09caa	radeonsi: init sctx->dma_copy before using it Commit `a1378639ab` reordered context functions initializations but broke sctx->b.resource_copy_region init when using AMD_DEBUG=forcedma. In this case sctx->dma_copy was assigned a value after being used in: sctx->b.resource_copy_region = sctx->dma_copy; This commit moves the FORCE_DMA special case after sctx->dma_copy initialization. See https://bugs.freedesktop.org/show_bug.cgi?id=110422 Signed-off-by: Marek Olšák <marek.olsak@amd.com>	2019-06-03 15:05:30 -04:00
Axel Davy	5820ac6756	d3dadapter9: Revert to old throttling limit value Recently PIPE_CAP_MAX_FRAMES_IN_FLIGHT was changed from 2 to 1: `20909284f2` No driver seems to overwrite the default value. One user reports severe regressions for some games. For now, revert to the value 2 for nine. Cc: "19.1" mesa-stable@lists.freedesktop.org Signed-off-by: Axel Davy <davyaxel0@gmail.com>	2019-06-03 20:37:13 +02:00
Marek Olšák	486bc1e17e	ac: use amdgpu-flat-work-group-size Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-06-03 14:32:47 -04:00
Marek Olšák	4b11ed443b	u_blitter: don't fail mipmap generation for depth formats containing stencil Bugzilla: https://bugzilla.freedesktop.org/show_bug.cgi?id=109754 Cc: 19.0 19.1 <mesa-stable@lists.freedesktop.org> Tested-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>	2019-06-03 14:32:47 -04:00
Christian Gmeiner	3135ca4172	etnaviv: drop a bunch of duplicated gallium PIPE_CAP default code Now that we have the util function for the default values, we can get rid of the boilerplate. Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>	2019-06-03 16:29:59 +02:00
Samuel Pitoiset	445098916a	radv: flush pending query reset caches before copying results From the Vulkan spec 1.1.108: "vkCmdCopyQueryPoolResults is guaranteed to see the effect of previous uses of vkCmdResetQueryPool in the same queue, without any additional synchronization." Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-06-03 16:05:46 +02:00
Jonathan Marek	91672becc3	nir: copy intrinsic type when lowering load input/uniform and store output Fixes: `c1275052` "nir: add type information to load uniform/input and store output intrinsics" Signed-off-by: Jonathan Marek <jonathan@marek.ca> Reviewed-by: Erico Nunes <nunes.erico@gmail.com> Tested-by: Erico Nunes <nunes.erico@gmail.com> Tested-by: Andreas Baierl <ichgeh@imkreisrum.de>	2019-06-03 12:46:14 +00:00
Samuel Pitoiset	6970a9a6ca	ac,radv: remove the vec3 restriction with LLVM 9+ This change requires LLVM r356755. 32706 shaders in 16744 tests Totals: SGPRS: 1448848 -> 1455984 (0.49 %) VGPRS: 1016684 -> 1016220 (-0.05 %) Spilled SGPRs: 25871 -> 25815 (-0.22 %) Spilled VGPRs: 122 -> 122 (0.00 %) Scratch size: 11964 -> 11956 (-0.07 %) dwords per thread Code Size: 55324500 -> 55301152 (-0.04 %) bytes Max Waves: 235660 -> 235586 (-0.03 %) Totals from affected shaders: SGPRS: 293704 -> 300840 (2.43 %) VGPRS: 246716 -> 246252 (-0.19 %) Spilled SGPRs: 159 -> 103 (-35.22 %) Scratch size: 188 -> 180 (-4.26 %) dwords per thread Code Size: 8653664 -> 8630316 (-0.27 %) bytes Max Waves: 60811 -> 60737 (-0.12 %) Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-06-03 11:30:08 +02:00
Caio Marcelo de Oliveira Filho	75590604a9	nir: Return nir_type_invalid for non-numeric base types Now that the type gathering function looks at instructions that might have other types, return an invalid type instead of crashing. That invalid type will be properly ignored later. Fixes: `c12750527b` "nir: add type information to load uniform/input and store output intrinsics" Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-05-31 16:27:03 -07:00
Caio Marcelo de Oliveira Filho	27497c5c02	iris: Drop unused locals from iris_clear.c to avoid warning Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>	2019-05-31 15:55:05 -07:00
Jonathan Marek	f387c2b238	nir: remove bool lowering from lower_int_to_float Removes the bool_to_float logic from the int_to_float pass, so that both can be used separately. By having separate passes we have better validation and it makes it possible to use with the lower_ftrunc option (int lowering generates ftrunc, but lower_ftrunc generates bools, ftrunc lowering should probably be reworked). For now we always expect lower_bool to come after lower_int. Also fixes f2i32 to become ftrunc and adds u2f/f2u cases. Signed-off-by: Jonathan Marek <jonathan@marek.ca> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-05-31 21:35:26 +00:00
Jonathan Marek	f6579ee204	nir: fix lower_{int,bool}_to_float for new mov opcode It is treated like the vecN instructions which also have no type. Signed-off-by: Jonathan Marek <jonathan@marek.ca> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-05-31 21:35:26 +00:00
Jonathan Marek	f889180ee1	nir: add lower_bitshift option Add a "lower_bitshift" option, which disables optimizations introducing bitshifts and lowers ishl by constant to a multiply, so that we don't have to deal with bitshifts in int_to_float lowering. Signed-off-by: Jonathan Marek <jonathan@marek.ca> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-05-31 21:35:26 +00:00
Jonathan Marek	887c2a6092	nir: fix gather_ssa_types Consts and undefs can be used as different types (common with "0" constant) so don't copy types from consts/undefs, only to them. It doesn't entirely solve the problem that the type given to the const could be wrong, but now the only realistic case is with "0" which is the same when cast to float, so it doesn't matter for lower_int_to_float. The other change is to get type information for load input/uniform and store output, and use that to get correct results. Signed-off-by: Jonathan Marek <jonathan@marek.ca> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-05-31 21:35:26 +00:00
Jonathan Marek	c12750527b	nir: add type information to load uniform/input and store output intrinsics This type information will be used by gather_ssa_types to get usable results Signed-off-by: Jonathan Marek <jonathan@marek.ca> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-05-31 21:35:26 +00:00
Jonathan Marek	6016df211f	nir: improvements to native_integers removal Improvements related to the patch that removed native_integers: * In glsl_to_nir, special cases for i2f,u2f,etc are no longer needed * In prog_to_nir, use sge/slt and let lower_scmp lower it if needed Signed-off-by: Jonathan Marek <jonathan@marek.ca> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-05-31 21:35:26 +00:00
Rob Clark	32131a9568	freedreno/a6xx: add 'type' to shader state key We could have identical texture state for both VS and FS.. which would result in VS state getting created first, and FS state mapping to the identical cmdstream. Resulting in VS state getting emitted twice and no FS state emitted. Fixes: dEQP-GLES2.functional.uniform_api.value.assigned.by_pointer.render.basic_array.sampler2D_both dEQP-GLES2.functional.uniform_api.value.assigned.by_pointer.render.struct_in_array.sampler2D_samplerCube_both dEQP-GLES2.functional.uniform_api.value.assigned.by_pointer.render.array_in_struct.sampler2D_samplerCube_both dEQP-GLES2.functional.uniform_api.value.assigned.by_pointer.render.nested_structs_arrays.sampler2D_samplerCube_both dEQP-GLES2.functional.uniform_api.value.assigned.by_value.render.nested_structs_arrays.sampler2D_samplerCube_both dEQP-GLES31.functional.program_uniform.by_pointer.render.array_in_struct.sampler2D_samplerCube_both dEQP-GLES31.functional.program_uniform.by_pointer.render.nested_structs_arrays.sampler2D_samplerCube_both dEQP-GLES31.functional.program_uniform.by_value.render.nested_structs_arrays.sampler2D_samplerCube_both Signed-off-by: Rob Clark <robdclark@chromium.org> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-05-31 12:58:47 -07:00
Rob Clark	8b7bf5e07a	freedreno/ir3: fix constlen versus indirect UBO If we access the address of the UBO indirectly, and there is no higher const emitted w/ direct access (like an immediate lowered to uniform) the assembler won't figure out the correct constlen. Fixes: dEQP-GLES31.functional.shaders.opaque_type_indexing.ubo.uniform_vertex dEQP-GLES31.functional.shaders.opaque_type_indexing.ubo.uniform_fragment dEQP-GLES31.functional.shaders.opaque_type_indexing.ubo.dynamically_uniform_vertex dEQP-GLES31.functional.shaders.opaque_type_indexing.ubo.dynamically_uniform_fragment Signed-off-by: Rob Clark <robdclark@chromium.org>	2019-05-31 12:58:33 -07:00
Rob Clark	8eaa2d5021	freedreno/a6xx: fix GPU crash on small render targets Fixes dEQP-GLES2.functional.multisampled_render_to_texture.readpixels Signed-off-by: Rob Clark <robdclark@chromium.org> Acked-by: Eric Anholt <eric@anholt.net>	2019-05-31 12:58:33 -07:00
Rob Clark	f9fa456e1d	freedreno/ir3: set more barrier bits Blob is also setting the .l bit, and it seems to solve some intermittent failures with a couple of deqp's: dEQP-GLES31.functional.image_load_store.2d.qualifiers.coherent_r32i dEQP-GLES31.functional.image_load_store.2d.qualifiers.volatile_r32f Signed-off-by: Rob Clark <robdclark@chromium.org> Acked-by: Eric Anholt <eric@anholt.net>	2019-05-31 12:58:33 -07:00
Rob Clark	5d43b806ba	freedreno/ir3: set (ss) on last_input if ldlv It seems like (ei) handling doesn't sync on (ss), so we could end up in a situation where we release varying storage before an ldlv for flat shaded varyings completes. Keep track if we've done an (ss) since the last ldlv, and if not add (ss) flag to last_input which gets (ei). Noticed with dEQP-GLES3.functional.fragment_out.random.24 and dEQP-GLES3.functional.fragment_out.random.27, which previously passed by luck because ir3_sched ordered instructions in a way that resulted in a lucky (ss). Signed-off-by: Rob Clark <robdclark@chromium.org> Acked-by: Eric Anholt <eric@anholt.net>	2019-05-31 12:58:33 -07:00
Rob Clark	73fb02c5d6	freedreno/ir3: add assert The special handling for last_input assumes that all the varying loads are in the first block. Add an assert to catch if anyone breaks that assumption. Signed-off-by: Rob Clark <robdclark@chromium.org> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-05-31 12:58:33 -07:00
Connor Abbott	8c74772edc	util/hash_table: Use fast modulo computation While we're here, copy the size table from set.c to get rid of hard tabs in the hash_table.c version. Reviewed-by: Eric Anholt <eric@anholt.net> Acked-by: Jason Ekstrand <jason@jlekstrand.net>	2019-05-31 19:14:35 +02:00
Connor Abbott	83667f7a61	util/set: Use fast modulo computation Compilation times with my shader-db database: Difference at 95.0% confidence -1.22312 +/- 0.726033 -0.283979% +/- 0.168254% (Student's t, pooled s = 1.02177) Reviewed-by: Eric Anholt <eric@anholt.net> Acked-by: Jason Ekstrand <jason@jlekstrand.net>	2019-05-31 19:14:30 +02:00
Connor Abbott	b87817871b	util: Add a helper for faster remainders This should be at least as fast as using fast_idiv_by_const, and has the advantage that the precomputation is simple enough to be evaluated at Mesa-compile time for hash tables and sets which have a fixed table of possible divisors. Acked-by: Eric Anholt <eric@anholt.net> Acked-by: Jason Ekstrand <jason@jlekstrand.net>	2019-05-31 19:14:27 +02:00
Connor Abbott	983b001c77	util/hash_table: Add specialized resizing add function To keep it in sync with the set implementation. Reviewed-by: Eric Anholt <eric@anholt.net> Acked-by: Jason Ekstrand <jason@jlekstrand.net>	2019-05-31 19:14:22 +02:00
Connor Abbott	6f9beb28bb	util/set: Add specialized resizing add function A significant portion of the time spent in nir_opt_cse for the Dolphin ubershaders was in resizing the set. When resizing a hash table, we know in advance that each new element to be inserted will be different from every other element, so we don't have to compare them, and there will be no tombstone elements, so we don't have to worry about caching the first-seen tombstone. We add a specialized add function which skips these steps entirely, speeding up resizing. Compile-time results from my shader-db database: Difference at 95.0% confidence -2.29143 +/- 0.845534 -0.529475% +/- 0.194767% (Student's t, pooled s = 1.08807) Reviewed-by: Eric Anholt <eric@anholt.net> Acked-by: Jason Ekstrand <jason@jlekstrand.net>	2019-05-31 19:14:16 +02:00
Connor Abbott	451211741c	util/hash_table: Pull out loop-invariant computations To keep the set and hash table in sync. Note that some of this had already been done for hash tables, in particular pulling out the hash % ht->size computation. Reviewed-by: Eric Anholt <eric@anholt.net> Acked-by: Jason Ekstrand <jason@jlekstrand.net>	2019-05-31 19:14:09 +02:00
Connor Abbott	f7ff685649	util/set: Pull out loop-invariant computations Unfortunately GCC can't do this for us, probably because we call the key comparison function which GCC can't prove won't modify arbitrary memory. This is a pretty hot function, so do the optimization manually to be sure the compiler will get it right. While we're here, make the computation of the new probe address use a single conditional subtract instead of a modulo, since we know that it won't ever get as big as 2 * ht->size before the modulo. Modulos tend to be pretty expensive operations. shader-db compile time results for my database: Difference at 95.0% confidence -2.24934 +/- 0.69897 -0.516296% +/- 0.159993% (Student's t, pooled s = 0.983684) Reviewed-by: Eric Anholt <eric@anholt.net> Acked-by: Jason Ekstrand <jason@jlekstrand.net>	2019-05-31 19:14:04 +02:00
Connor Abbott	3bd0733011	nir/instr_set: Use _mesa_set_search_or_add() Before this change, we were searching for each instruction twice, once when checking if it exists and once when figuring out where to insert it. By using the new function, we can do everything we need to do in one operation. Compilation time numbers for my shader-db database: Difference at 95.0% confidence -4.04706 +/- 0.669508 -0.922142% +/- 0.151948% (Student's t, pooled s = 0.95824) Reviewed-by: Eric Anholt <eric@anholt.net> Acked-by: Jason Ekstrand <jason@jlekstrand.net>	2019-05-31 19:13:59 +02:00
Connor Abbott	8a838e172f	util/set: Add a _mesa_set_search_or_add() function Unlike _mesa_set_search_and_add(), it doesn't replace an entry if it's found, returning it instead. This is useful for nir_instr_set, where we have to know both the original instruction and its equivalent. Reviewed-by: Eric Anholt <eric@anholt.net> Acked-by: Jason Ekstrand <jason@jlekstrand.net>	2019-05-31 19:13:45 +02:00
Jonathan Marek	1db86d8b62	freedreno/ir3: fix input ncomp for vertex shaders ncomp is never set for vertex shaders, but a3xx and a4xx still use it. Fixes: `831f1a05c0` freedreno/ir3: rework varying packing Signed-off-by: Jonathan Marek <jonathan@marek.ca> Reviewed-by: Rob Clark <robdclark@chromium.org>	2019-05-31 12:21:23 -04:00
Ian Romanick	65df6122da	intel/compiler: Use compare rematerialization pass Almost all of the spill / fill benefit is in Deus Ex. Haswell and all Gen8+ platforms had similar results. (Ice Lake shown) total instructions in shared programs: 17224438 -> 17196395 (-0.16%) instructions in affected programs: 1518658 -> 1490615 (-1.85%) helped: 1550 HURT: 3 helped stats (abs) min: 1 max: 170 x̄: 18.11 x̃: 2 helped stats (rel) min: 0.04% max: 8.35% x̄: 1.12% x̃: 0.45% HURT stats (abs) min: 5 max: 10 x̄: 6.67 x̃: 5 HURT stats (rel) min: 0.32% max: 0.41% x̄: 0.35% x̃: 0.32% 95% mean confidence interval for instructions value: -19.86 -16.26 95% mean confidence interval for instructions %-change: -1.19% -1.04% Instructions are helped. total cycles in shared programs: 361468455 -> 361288721 (-0.05%) cycles in affected programs: 197367688 -> 197187954 (-0.09%) helped: 990 HURT: 683 helped stats (abs) min: 1 max: 119045 x̄: 806.00 x̃: 16 helped stats (rel) min: <.01% max: 38.56% x̄: 1.06% x̃: 0.26% HURT stats (abs) min: 1 max: 12190 x̄: 905.14 x̃: 22 HURT stats (rel) min: <.01% max: 25.18% x̄: 1.16% x̃: 0.47% 95% mean confidence interval for cycles value: -315.45 100.58 95% mean confidence interval for cycles %-change: -0.31% <.01% Inconclusive result (value mean confidence interval includes 0). 
total spills in shared programs: 12147 -> 8948 (-26.34%) spills in affected programs: 5433 -> 2234 (-58.88%) helped: 343 HURT: 0 total fills in shared programs: 25262 -> 21814 (-13.65%) fills in affected programs: 7771 -> 4323 (-44.37%) helped: 343 HURT: 3 LOST: 0 GAINED: 17 Ivy Bridge total instructions in shared programs: 12083517 -> 12081427 (-0.02%) instructions in affected programs: 540744 -> 538654 (-0.39%) helped: 786 HURT: 29 helped stats (abs) min: 1 max: 42 x̄: 2.70 x̃: 2 helped stats (rel) min: 0.06% max: 5.44% x̄: 0.55% x̃: 0.36% HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 HURT stats (rel) min: 0.16% max: 0.95% x̄: 0.38% x̃: 0.31% 95% mean confidence interval for instructions value: -2.83 -2.30 95% mean confidence interval for instructions %-change: -0.57% -0.47% Instructions are helped. total cycles in shared programs: 180153463 -> 180124798 (-0.02%) cycles in affected programs: 72597920 -> 72569255 (-0.04%) helped: 572 HURT: 249 helped stats (abs) min: 1 max: 14830 x̄: 109.48 x̃: 13 helped stats (rel) min: <.01% max: 8.92% x̄: 0.71% x̃: 0.26% HURT stats (abs) min: 1 max: 11060 x̄: 136.37 x̃: 10 HURT stats (rel) min: <.01% max: 10.85% x̄: 0.54% x̃: 0.32% 95% mean confidence interval for cycles value: -96.22 26.39 95% mean confidence interval for cycles %-change: -0.43% -0.23% Inconclusive result (value mean confidence interval includes 0). 
total spills in shared programs: 3625 -> 3623 (-0.06%) spills in affected programs: 46 -> 44 (-4.35%) helped: 1 HURT: 0 total fills in shared programs: 4065 -> 4061 (-0.10%) fills in affected programs: 104 -> 100 (-3.85%) helped: 1 HURT: 0 LOST: 0 GAINED: 8 Sandy Bridge total instructions in shared programs: 10879656 -> 10878699 (<.01%) instructions in affected programs: 275167 -> 274210 (-0.35%) helped: 544 HURT: 0 helped stats (abs) min: 1 max: 20 x̄: 1.76 x̃: 1 helped stats (rel) min: 0.06% max: 3.11% x̄: 0.39% x̃: 0.25% 95% mean confidence interval for instructions value: -1.97 -1.55 95% mean confidence interval for instructions %-change: -0.43% -0.36% Instructions are helped. total cycles in shared programs: 154089096 -> 154081132 (<.01%) cycles in affected programs: 4422722 -> 4414758 (-0.18%) helped: 459 HURT: 214 helped stats (abs) min: 1 max: 258 x̄: 26.67 x̃: 8 helped stats (rel) min: <.01% max: 5.45% x̄: 0.51% x̃: 0.14% HURT stats (abs) min: 1 max: 226 x̄: 19.99 x̃: 4 HURT stats (rel) min: <.01% max: 3.15% x̄: 0.34% x̃: 0.09% 95% mean confidence interval for cycles value: -15.51 -8.15 95% mean confidence interval for cycles %-change: -0.31% -0.17% Cycles are helped. total spills in shared programs: 2880 -> 2876 (-0.14%) spills in affected programs: 636 -> 632 (-0.63%) helped: 2 HURT: 0 total fills in shared programs: 3161 -> 3157 (-0.13%) fills in affected programs: 1519 -> 1515 (-0.26%) helped: 2 HURT: 0 LOST: 0 GAINED: 2 Iron Lake and GM45 had similar results. (Iron Lake shown) total instructions in shared programs: 8157361 -> 8155067 (-0.03%) instructions in affected programs: 382491 -> 380197 (-0.60%) helped: 677 HURT: 0 helped stats (abs) min: 1 max: 43 x̄: 3.39 x̃: 2 helped stats (rel) min: 0.09% max: 5.19% x̄: 0.66% x̃: 0.42% 95% mean confidence interval for instructions value: -3.76 -3.01 95% mean confidence interval for instructions %-change: -0.72% -0.59% Instructions are helped. 
total cycles in shared programs: 188588292 -> 188583040 (<.01%) cycles in affected programs: 3155064 -> 3149812 (-0.17%) helped: 377 HURT: 13 helped stats (abs) min: 2 max: 180 x̄: 14.13 x̃: 6 helped stats (rel) min: <.01% max: 3.96% x̄: 0.39% x̃: 0.12% HURT stats (abs) min: 2 max: 8 x̄: 5.85 x̃: 6 HURT stats (rel) min: <.01% max: 0.22% x̄: 0.06% x̃: 0.04% 95% mean confidence interval for cycles value: -15.67 -11.27 95% mean confidence interval for cycles %-change: -0.45% -0.30% Cycles are helped. Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-05-31 08:47:03 -07:00
Ian Romanick	3ee2e84c60	nir: Rematerialize compare instructions On some architectures, Boolean values used to control conditional branches or conditional selection must be propagated into a flag. This generally means that a stored Boolean value must be compared with zero. Rather than force the generation of extra compares with zero, re-emit the original comparison instruction. This can save register pressure by not needing to store the Boolean value. There are several possible areas for future improvement to this pass: 1. Be more conservative. If both sources to the comparison instruction are non-constants, it may be better for register pressure to emit the extra compare. The current shader-db results on Intel GPUs (next commit) lead me to believe that this is not currently a problem. 2. Be less conservative. Currently the pass requires that all users of the comparison match the pattern. The idea is that after the pass is complete, no instruction will use the resulting Boolean value. The only uses will be of the flag value. It may be beneficial to relax this requirement in some cases. 3. Be less conservative. Also try to rematerialize comparisons used for discard_if intrinsics. After changing the way the Intel compiler generates code for discard_if (see MR!935), I tried implementing this already. The changes were pretty small. Instructions were helped in 19 shaders, but, overall, cycles were hurt. A commit "nir: Rematerialize comparisons for nir_intrinsic_discard_if too" is on my fd.o cgit. 4. Copy the preceding ALU instruction. If the comparison is a comparison with zero, and it is the only user of a particular ALU instruction (e.g., (a+b) != 0.0), it may be a further improvement to also copy the preceding ALU instruction. On Intel GPUs, this may enable cmod propagation to make additional progress. v2: Use much simpler method to get the prev_block for an if-statement. Suggested by Tim. Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-05-31 08:47:03 -07:00
Ian Romanick	336eab0630	nir: Add a shallow clone function for nir_alu_instr Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Suggested-by: Jason Ekstrand <jason@jlekstrand.net> Suggested-by: Matt Turner <mattst88@gmail.com>	2019-05-31 08:47:03 -07:00
Tomeu Vizoso	0e1c5cc78f	panfrost: Remove link stage for jobs And instead, link them as they are added. Makes things a bit clearer and prepares future work such as FB reload jobs. Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com> Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-05-31 14:37:10 +02:00
Tomeu Vizoso	da9f7ab6d4	panfrost: ci: Switch to kernel 5.2-rc2 Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com> Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-05-31 13:51:51 +02:00
Tomeu Vizoso	77f5663cf3	panfrost: ci: Update expectations A bunch of tests have been fixed, but some regressions have appeared on T760. Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>	2019-05-31 13:51:43 +02:00
Connor Abbott	78f33620e8	radeonsi/nir: Remove hack for builtins We now bounds check properly in the uniform loading fast path, so there's no need to disable it by pretending there are other UBO bindings in use. The way this looks at the variable name was causing problems when two piglit shaders, one with a name that triggered the hack and one that didn't, got hashed to the same thing after stripping out the names. Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-05-31 11:03:05 +02:00
Connor Abbott	fca1a35163	radeonsi/nir: Use correct location for uniform access bound location is the API-level location, but driver_location is the actual location the uniform gets passed to the driver. This apparently only caused failures with builtins, where the location is 0 because it's represented via the state tokens instead. Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-05-31 11:02:57 +02:00
Connor Abbott	6571032af1	radeonsi/nir: Correctly handle double TCS/TES varyings ac expands the store to 32-bit components for us, but we still have to deal with storing up to 8 components, and when a varying is split across two vec4 slots we have to calculate the address again for the second slot, since they aren't adjacent in memory. I didn't do this on the ac level because we should generate better indexing arithmetic for the lds store, where slots are contiguous. Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-05-31 11:02:11 +02:00
Christian Gmeiner	ca19f7639a	etnaviv: blt: s/TRUE/true && s/FALSE/false Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>	2019-05-31 10:04:49 +02:00
Christian Gmeiner	9e6463e62a	etnaviv: rs: s/TRUE/true && s/FALSE/false Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>	2019-05-31 10:04:49 +02:00
Bas Nieuwenhuizen	e24a7840f6	nir: Actually propagate progress in nir_opt_move_load_ubo. Found with Jason's new metadata rework (https://gitlab.freedesktop.org/mesa/mesa/merge_requests/950). Fixes: `af355aaa07` "nir: add nir_opt_move_load_ubo() optimization pass" Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-05-31 07:45:43 +00:00
Samuel Pitoiset	9178076a46	radv: use RADV_CMD_DIRTY_DYNAMIC_* when restoring viewport/scissor Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-05-31 08:50:16 +02:00
Samuel Pitoiset	0e7b029d00	radv: use CmdPushConstants when restoring constants after meta operations Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-05-31 08:50:13 +02:00
Jason Ekstrand	f1cb3348f1	nir/split_vars: Properly bail in the presence of complex derefs Reviewed-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-05-31 01:08:03 +00:00
Jason Ekstrand	cc59503b16	nir/vars_to_ssa: Properly ignore variables with complex derefs Because the core principle of the vars_to_ssa pass is that it globally (within a function) looks at all of the uses of a never-indirected path and does a full into-SSA on that path, it can't handle a path which has any chance of having aliasing. If a function_temp variable has a cast or anything else which may cause aliasing, we have to assume that all paths to that variable may alias and ignore the entire variable. Reviewed-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-05-31 01:08:03 +00:00
Jason Ekstrand	911ea2c66f	nir/vars_to_ssa: Use a non-null UNDEF_NODE pointer We're about to change the meaning of get_deref_node returning NULL so we need a non-NULL value to mean properly undefined. Reviewed-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-05-31 01:08:03 +00:00
Jason Ekstrand	e84194686d	nir/deref: Add a has_complex_use helper This lets passes easily detect derefs which have uses that fall outside the standard load/store/copy pattern so they can bail appropriately. Reviewed-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-05-31 01:08:03 +00:00
Jason Ekstrand	8948048c6f	nir/dead_cf: Call instructions aren't dead When we inlined cf_node_has_side_effects into node_is_dead, all the conditions flipped and we forgot to flip one. Fortunately, it doesn't matter right now because no one uses this pass on shaders with more than one function. Fixes: `b50465d197` "nir/dead_cf: Inline cf_node_has_side_effects" Reviewed-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-05-31 01:08:03 +00:00
Dave Airlie	5441d56243	vtn: create cast with type stride. When creating function parameters, we create pointers from ssa values. This creates nir casts with stride 0; however, we have nowhere else to get this value from. Later passes to lower explicit io need this stride value to do the right thing. Reviewed-by: Karol Herbst <kherbst@redhat.com>	2019-05-31 09:57:45 +10:00
Rob Clark	372e83b95f	list: add some iterator debug Debugging use of unsafe iterators when you should have used the _safe version sucks. Add some DEBUG build support to catch and assert if someone does that. I didn't update the UPPERCASE versions of the iterators. They should probably be deprecated/removed. Signed-off-by: Rob Clark <robdclark@chromium.org> Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>	2019-05-30 22:11:26 +00:00
Caio Marcelo de Oliveira Filho	03ce12c5ed	nir: Accept nir_var_mem_global in derefs used by phis This mode is used by PhysicalStorageBufferEXT storage class. Fixes: `8bdf5a008b` "nir: Allow derefs to be used as phi sources" Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-05-30 14:07:29 -07:00
Jason Ekstrand	5e43a75950	intel/blorp: Use the hardware op for CCS ambiguate on gen10+ Cannonlake hardware adds a new resolve type in 3DSTATE_PS called FAST_CLEAR_0 which does an ambiguate. Now that the hardware can do it directly, we should use that instead of binding the CCS as a render target and doing it manually. This was tested with a full Vulkan CTS run on Cannonlake. Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>	2019-05-30 13:49:48 -07:00
Jan Zielinski	b31a31bba5	swr/rast: Enable ARB_GL_texture_buffer_range No significant changes in the code needed to enable the extension. Just updating SWR capabilities and the documentation Reviewed-by: Alok Hota <alok.hota@intel.com>	2019-05-30 15:42:15 +00:00
Jan Zielinski	cf673747ce	swr/rast: fix 32-bit compilation on Linux Removing unused but problematic code from simdlib header to fix compilation problem on 32-bit Linux. Reviewed-by: Alok Hota <alok.hota@intel.com>	2019-05-30 15:31:15 +00:00
Jason Ekstrand	9e403dc56e	intel/fs: Do a stalling MFENCE in endInvocationInterlock() Fixes: `939312702e` "i965: Add ARB_fragment_shader_interlock support" Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-05-30 14:00:26 +00:00
Jason Ekstrand	859de4a748	intel/fs,vec4: Use g0 as the header for MFENCE We set header_present but then pass it some random garbage. Give it g0 instead. I'm not actually sure this does anything but g0 is the usual header data and this is what the windows driver does so it seems like a good idea. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-05-30 14:00:26 +00:00
Samuel Pitoiset	43cc3dc9c0	radv: enable transformFeedbackStreamsLinesTriangles The driver should already support this without any changes. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-05-30 15:42:36 +02:00
Samuel Pitoiset	da26013eb7	radv: implement VK_EXT_sample_locations and disable it Basically, this extension allows applications to use custom sample locations. It doesn't support variable sample locations during subpass. Note that we don't have to upload the user sample locations because the spec doesn't allow this. The extension is currently disabled because the driver needs to support variable sample locations during layout transitions. The depth decompress needs to know them and that's a bit invasive. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-05-30 09:52:16 +02:00
Kenneth Graunke	e917bb7ad4	iris: Avoid holding the lock while allocating pages. We only need the lock for: 1. Rummaging through the cache 2. Allocating VMA We don't need it for alloc_fresh_bo(), which does GEM_CREATE, and also SET_DOMAIN to allocate the underlying pages. The idea behind calling SET_DOMAIN was to avoid a lock in the kernel while allocating pages, now we avoid our own global lock as well. We do have to re-lock around VMA. Hopefully this shouldn't happen too much in practice because we'll find a cached BO in the right memzone and not have to reallocate it. Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>	2019-05-30 00:46:37 -07:00
Kenneth Graunke	0cb380a6b3	iris: Move SET_DOMAIN to alloc_fresh_bo() Chris pointed out that the order between SET_DOMAIN and SET_TILING doesn't matter, so we can just do the page allocation when creating a new BO. Simplifies the flow a bit. Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>	2019-05-30 00:15:26 -07:00
Kenneth Graunke	53878f7a89	iris: Be lazy about cleaning up purged BOs in the cache. Mathias Fröhlich reported that commit `6244da8e23` crashes. list_for_each_entry_safe is safe against removing the current entry, but iris_bo_cache_purge_bucket was potentially removing next entries too, which broke our saved next pointer. To fix this, don't bother with the iris_bo_cache_purge_bucket step. We just detected a single entry where the kernel has purged the BO's memory, and so it isn't a usable entry for our cache. We're about to continue the search with the next BO. If that one's purged, we'll clean it up too. And so on. We may miss cleaning up purged BOs that are further down the list after non-purged BOs...but that's probably fine. We still have the time-based cleaner (cleanup_bo_cache) which will take care of them eventually, and the kernel's already freed their memory, so it's not that harmful to have a few kicking around a little longer. Fixes: `6244da8e23` iris: Dig through the cache to find a BO in the right memzone Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>	2019-05-29 23:38:01 -07:00
Kenneth Graunke	6244da8e23	iris: Dig through the cache to find a BO in the right memzone This saves some util_vma thrash when the first entry in the cache happens to be in a different memory zone, but one just a tiny bit ahead is already there and instantly reusable. Hopefully the cost of a little extra searching won't break the bank - if it does, we can consider having separate list heads or keeping a separate VMA cache. Improves OglDrvRes performance by 22%, restoring a regression from deleting the bucket allocators in `694d1a08d3`. Thanks to Clayton Craft for alerting me to the regression. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-05-29 20:03:45 -07:00
Kenneth Graunke	4c2d9729df	iris: Tidy BO sizing code and comments Buckets haven't been power of two sized in over a decade. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-05-29 19:42:15 -07:00
Kenneth Graunke	7acc88a47c	iris: Move some field setting after we drop the lock. It's not much, but we may as well hold the lock for a bit less time. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-05-29 19:42:04 -07:00
Kenneth Graunke	76c5a19668	iris: Move cached BO allocation into a helper function. There's enough going on here to warrant a helper. This also simplifies the control flow and eliminates the last non-error-case goto. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-05-29 19:41:52 -07:00
Kenneth Graunke	cea6671395	iris: Fall back to fresh allocations if mapping for zero-memset fails. It is unlikely that we would fail to map a cached BO in order to zero its contents. When we did, we would free the first BO in the cache and try again with the second. It's possible that this next BO already had a map setup, in which case we'd succeed. But if it didn't, we'd likely fail again in the same manner. There's not much point in optimizing this case (and frankly, if we're out of CPU-side VMA we should probably dump the cache entirely)...so instead, just fall back to allocating a fresh BO from the kernel which will already be zeroed so we don't have to try and map it. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-05-29 19:41:50 -07:00
Kenneth Graunke	042f8514e6	iris: Move fresh BO allocation into a helper function. There's enough going on here to warrant a helper. More cleaning coming. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-05-29 19:41:22 -07:00
Kenneth Graunke	06421e5be7	iris: Do SET_TILING at a single point rather than in two places. Both the from-cache and fresh-from-GEM cases were calling SET_TILING. In the cached case, we would retry the allocation on failure, pitching one BO from the cache each time. This is silly, because the only time it should fail is if the tiling or stride parameters are unacceptable, which has nothing to do with the particular BO in question. So there's no point in retrying - we should simply fail the allocation. This patch moves both calls to bo_set_tiling_internal() below the cache/fresh split, so we have it at a single point in time instead of two. To preserve the ordering between SET_TILING and SET_DOMAIN, we move that below as well. (I am unsure if the order matters.) Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-05-29 19:41:08 -07:00
Kenneth Graunke	43d835cb0f	iris: Use the BO cache even for coherent buffers on non-LLC. We mark snooped BOs as non-reusable, so we never return them to the cache. This means that we'd need to call I915_GEM_SET_CACHING to make any BO we find in the cache snooped. But then again, any BO we freshly allocate from the kernel will also be non-snooped, so it has the same issue. There's really no reason to skip the cache - we may as well use it to avoid the I915_GEM_CREATE overhead. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-05-29 19:40:18 -07:00
Kenneth Graunke	78003014d0	iris: Fix locking around vma_alloc in iris_bo_create_userptr util_vma needs to be protected by a lock. All other callers of vma_alloc and vma_free appear to be holding a lock already. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-05-29 19:40:16 -07:00
Kenneth Graunke	5fc11fd988	iris: Fix lock/unlock mismatch for non-LLC coherent BO allocation. The goto jumped over the mtx_lock, but proceeded to hit the mtx_unlock. We can simply set the bucket to NULL and it will skip the cache without goto, and without messing up locking. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-05-29 19:40:15 -07:00
Marek Olšák	2285b93032	radeonsi: fix timestamp queries for compute-only contexts Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Tested-by: Jan Vesely <jan.vesely@rutgers.edu>	2019-05-29 21:13:35 -04:00
Marek Olšák	b5697c311b	Change a few frequented uses of DEBUG to !NDEBUG debugoptimized builds don't define NDEBUG, but they also don't define DEBUG. We want to enable cheap debug code for these builds. I only chose those occurrences that I care about. Reviewed-by: Mathias Fröhlich <Mathias.Froehlich@web.de>	2019-05-29 21:13:35 -04:00
Kenneth Graunke	0f1b68ebee	iris: Re-emit Surface State Base Address when context is lost. When we hit a GPU hang, we failed to reset Surface State Base Address right away, and would keep hanging until we filled up the binder. Then we'd finally get it right after a lot of repeated stumbles. Update it right away so we hopefully hang fewer times before succeeding.	2019-05-29 16:35:02 -07:00
Jason Ekstrand	e459d6d6df	iris: Enable nir_opt_large_constants Shader-db results on Kaby Lake: total instructions in shared programs: 15306230 -> 15304726 (<.01%) instructions in affected programs: 4570 -> 3066 (-32.91%) helped: 16 HURT: 0 total cycles in shared programs: 361703436 -> 361680041 (<.01%) cycles in affected programs: 129388 -> 105993 (-18.08%) helped: 16 HURT: 0 LOST: 0 GAINED: 2 The helped programs were in XCom 2, Deus Ex: Mankind Divided, and Kerbal Space Program Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-05-29 21:09:16 +00:00
Jason Ekstrand	9dc57eebd5	iris: Don't assume UBO indices are constant It will be true for the constant/system value buffer because they use a constant zero but it's not true in general. If we ever got here when the source wasn't constant, nir_src_as_uint would assert. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Cc: mesa-stable@lists.freedesktop.org	2019-05-29 21:09:16 +00:00
Jason Ekstrand	744f93f5c1	iris: Move upload_ubo_ssbo_surf_state to iris_program.c Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-05-29 21:09:16 +00:00
Brian Paul	e584fd894e	nir: silence three compiler warnings seen with MinGW Silence two unused var warnings. And init elem_size, elem_align to zero to silence "maybe uninitialized" warnings. Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>	2019-05-29 13:59:24 -06:00
Brian Paul	c71ca65405	svga: clamp max_const_buffers to SVGA_MAX_CONST_BUFS In case the device reports 15 (or more) buffers. Reviewed-by: Charmaine Lee <charmainel@vmware.com>	2019-05-29 13:59:23 -06:00
Kenneth Graunke	6892d2b94a	iris: Clone before calling nir_strip and serializing This is non-destructive and leaves the debugging information in place. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-05-29 18:16:32 +00:00
Kenneth Graunke	e1409aead5	iris: Only store the SHA1 of the NIR in iris_uncompiled_shader Jason pointed out that we don't need to keep an entire copy of the serialized NIR around, we just need the SHA1. This does change our disk cache key to be taking a SHA1 of a SHA1, which is a bit odd, but should work out and be faster and use less memory. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-05-29 18:16:32 +00:00
Caio Marcelo de Oliveira Filho	e45bf01940	spirv: Change spirv_to_nir() to return a nir_shader spirv_to_nir() returned the nir_function corresponding to the entrypoint, as a way to identify it. There's now a bool is_entrypoint in nir_function and also a helper function to get the entry_point from a nir_shader. The return type reflects better what the function name suggests. It also helps drivers avoid the mistake of reusing internal shader references after running NIR_PASS on it. When using NIR_TEST_CLONE or NIR_TEST_SERIALIZE, those would be invalidated right in the first pass executed. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-05-29 10:34:35 -07:00
Caio Marcelo de Oliveira Filho	a3bfdacb6c	radv: Don't re-use entry_point pointer from spirv_to_nir Replace its uses with checking for is_entrypoint and calling nir_shader_get_entrypoint(). This is a preparation to change spirv_to_nir() return type. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-05-29 10:34:35 -07:00
Caio Marcelo de Oliveira Filho	ee59bac9f4	glspirv: Don't re-use entry_point pointer from spirv_to_nir Replace its use with checking for is_entrypoint. This is a preparation to change spirv_to_nir() return type. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-05-29 10:34:30 -07:00
Caio Marcelo de Oliveira Filho	c92d002982	turnip: Don't re-use entry_point pointer from spirv_to_nir Replace its uses with nir_shader_get_entrypoint(), and change the helper function to return nir_shader *. This is a preparation to change spirv_to_nir() return type. Reviewed-by: Rob Clark <robdclark@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-05-29 10:26:22 -07:00
Chia-I Wu	0a0be7aee0	virgl: fix readback with pending transfers When readback is true, and there are pending writes in the transfer queue, we should flush to avoid reading back outdated data. This fixes piglit arb_copy_buffer/dlist and a subtest of arb_copy_buffer/data-sync. Signed-off-by: Chia-I Wu <olvaffe@gmail.com> Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>	2019-05-29 16:47:04 +00:00
Caio Marcelo de Oliveira Filho	8bdf5a008b	nir: Allow derefs to be used as phi sources It is possible and valid for a pointer to be selected based on a conditional before used, and depending on the mode, those cases will result in a phi with derefs as sources. To achieve this, we don't rematerialize derefs that are used by phis. As a consequence, when converting from SSA to regs, we may have phis that come from different blocks and are used by phis. We now convert those to regs too. Validation was added to ensure only derefs of certain modes can be used as phi sources. No extra validation is needed for the presence of cast, any instruction that uses derefs will validate the deref-chain is complete (ending in a cast or a var). Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-05-29 08:19:15 -07:00
Connor Abbott	ee2a92bcde	radeonsi: Fix editorconfig At least on vim, indenting doesn't work without this. Copied from src/amd/vulkan. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-05-29 15:55:40 +02:00
Erik Faye-Lund	551b61528f	mesa/main: clean up extension-check for GL_SAMPLE_MASK Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2019-05-29 10:54:09 +02:00
Erik Faye-Lund	426e896515	mesa/main: clean up extension-check for GL_SAMPLE_SHADING Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2019-05-29 10:54:09 +02:00
Erik Faye-Lund	b9e9d701dc	mesa/main: correct extension-checks for GL_PRIMITIVE_RESTART_FIXED_INDEX This shouldn't be allowed in GLES 1/2. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2019-05-29 10:54:09 +02:00
Erik Faye-Lund	34ade0dc7c	mesa/main: correct extension-checks for GL_BLEND_ADVANCED_COHERENT_KHR KHR_blend_equation_advanced_coherent isn't exposed on OpenGL ES 1.x, so we shouldn't allow its enums there either. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2019-05-29 10:54:09 +02:00
Erik Faye-Lund	c0dabc6192	mesa/main: correct extension-checks for GL_FRAMEBUFFER_SRGB This enum shouldn't be allowed on OpenGL ES 1.x, so let's instead use the extension-helpers, and check for desktop and gles extensions separately. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2019-05-29 10:54:09 +02:00
Erik Faye-Lund	a33ff7876f	mesa/main: correct extension-checks for MESA_tile_raster_order This extension isn't enabled for GLES 1.x, so we shouldn't allow the state there. Let's use the extension-helpers instead of CHECK_EXTENSION for this. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2019-05-29 10:54:09 +02:00
Erik Faye-Lund	bf91d6ae4a	mesa/main: make the CONSERVATIVE_RASTERIZATION_NV checks consistent This just makes the logic of the checks for this enum the same for gl{Enable,Disable} and for glIsEnabled. They are already functionally the same, so this is just a minor code-cleanup. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2019-05-29 10:54:09 +02:00
Erik Faye-Lund	00c683bc8e	mesa/main: make the PRIMITIVE_RESTART_NV checks consistent {En,Dis}ableClientState(PRIMITIVE_RESTART_NV) should only work on compatibility contexts. While we're at it, modernize the code a bit, by using the extension helpers instead of open-coding. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2019-05-29 10:54:09 +02:00
Samuel Pitoiset	d3771ccaa3	radv: use view format when selecting the resolve path for subpasses Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-05-29 08:53:48 +02:00
Samuel Pitoiset	017170a785	radv: always use view format when performing subpass resolves It makes sense to use the image view formats when resolving inside subpasses, while we have to use the image formats for normal resolves. Original patch by Philip Rebohle. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110348 Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-05-29 08:53:46 +02:00
Samuel Pitoiset	eaeaad25f7	radv: sync before resetting a pool if there are active pending queries Make sure to sync all previous work if the given command buffer has pending active queries. Otherwise the GPU might write query data after the reset operation. This fixes a bunch of new dEQP-VK.query_pool.* CTS failures. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-05-29 08:47:54 +02:00
Kenneth Graunke	bc273dece2	intel/decoder: Use get_state_size() over guessed counts in more cases This makes the following packets use actual driver provided sizes rather than guessing an arbitrary number: - CC_VIEWPORT - SF_CLIP_VIEWPORT - BLEND_STATE - COLOR_CALC_STATE - SCISSOR_RECT Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>	2019-05-28 13:44:16 -07:00
Mike Lothian	29ea92e6a1	meson: Link Gallium drivers with ld_args_build_id Link all Gallium drivers with ld_args_build_id to prevent failures in Iris that uses GNU_BUILD_ID Bugs: https://bugs.freedesktop.org/show_bug.cgi?id=110757 Fixes: `4756864cdc` "iris: Start wiring up on-disk shader cache" Signed-off-by: Mike Lothian <mike@fireburn.co.uk> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-05-28 13:37:36 -07:00
Lionel Landwerlin	366811bedb	nir/lower_non_uniform: safely iterate over blocks This fixes a problem where the same instruction gets replaced twice. This was happening when the replaced instruction would be at the end of a block. Replacement of : if ssa_8 { .... intrinsic bindless_image_store (ssa_44, ssa_16, ssa_0, ssa_15) (5, 0, 34836, 32) /* image_dim=Buf */ /* image_array=false */ /* format=34836 */ /* access=32 */ } Would be : if ssa_8 { loop { vec1 32 ssa_47 = intrinsic read_first_invocation (ssa_44) () vec1 1 ssa_48 = ieq ssa_47, ssa_44 if ssa_48 { loop { vec1 32 ssa_49 = intrinsic read_first_invocation (ssa_44) () vec1 1 ssa_50 = ieq ssa_49, ssa_44 if ssa_50 { intrinsic bindless_image_store (ssa_44, ssa_16, ssa_0, ssa_15) (5, 0, 34836, 32) /* image_dim=Buf */ /* image_array=false */ /* format=34836 */ /* access=32 */ break } else { .... } Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `3bd5457641` ("nir: Add a lowering pass for non-uniform resource access") Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-05-28 20:23:16 +01:00
Samuel Pitoiset	47a10edefb	radv: allocate more space in the CS when emitting events If the driver waits for CP DMA to be idle and emits an EOP event, we need more space. This fixes a crash with Quake Champions. Cc: <mesa-stable@lists.freedesktop.org> Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-05-28 16:56:17 +02:00
Kenneth Graunke	6a9e39d44b	iris: Ask st to vectorize our IO. (Technically this is common code, but it doesn't affect i965 or anv.) Improves performance of GFXBench5/gl_tess_off on Skylake GT4e at 1080p by 9.3933% +/- 0.0305157% by eliminating all spilling in the GS. Improves performance of GFXBench5/gl_4_off (Car Chase) on Skylake GT4e at 1080p by 0.325208% +/- 0.0842233% (n=18). Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-05-28 01:06:48 -07:00
Kenneth Graunke	c31b4420e7	st/nir: Re-vectorize shader IO We scalarize IO to enable further optimizations, such as propagating constant components across shaders, eliminating dead components, and so on. This patch attempts to re-vectorize those operations after the varying optimizations are done. Intel GPUs are a scalar architecture, but IO operations work on whole vec4's at a time, so we'd prefer to have a single IO load per vector rather than 4 scalar IO loads. This re-vectorization can help a lot. Broadcom GPUs, however, really do want scalar IO. radeonsi may want this, or may want to leave it to LLVM. So, we make a new flag in the NIR compiler options struct, and key it off of that, allowing drivers to pick. (It's a bit awkward because we have per-stage settings, but this is about IO between two stages...but I expect drivers to globally prefer one way or the other. We can adjust later if needed.) Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-05-28 01:06:48 -07:00
Mathias Fröhlich	1d0a8cf40d	mesa: Prevent classic swrast crash on a surfaceless context v2. This fixes the egl_mesa_platform_surfaceless piglit test as well as the new egl_ext_device_base piglit test on classic swrast. v2: Fix swrast surfaceless contexts on the driver side. Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>	2019-05-28 08:27:16 +02:00
Samuel Pitoiset	15cb19ed6f	radv: add radv_get_resolve_pipeline() in the compute path Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-05-28 08:17:26 +02:00
Samuel Pitoiset	469258c3b1	radv: cleanup the compute resolve path for subpass This makes use of radv_meta_resolve_compute_image() by filling a VkImageResolve region instead of duplicating code. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-05-28 08:17:23 +02:00
Timothy Arceri	d2b0246741	radeonsi: add drirc workaround for American Truck Simulator Reviewed-by: Marek Olšák <marek.olsak@amd.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110711	2019-05-28 08:47:44 +10:00
Timothy Arceri	11e16ca7ce	Revert "st/mesa: expose 0 shader binary formats for compat profiles for Qt" This reverts commit `55376cb31e`. It's been over a year and both QT 5.9.5 and 5.11.0 contained a fix for the original issue. It seems i965 only ever applied this workaround to the 18.0 branch. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-05-28 08:46:50 +10:00
Lionel Landwerlin	2042f22e28	anv: fix apply_pipeline_layout pass for arrays of YCbCr descriptors When using the binding tables to access arrays of YCbCr descriptors we did not consider the offset of the accessed element. We can't do a simple multiplication because the binding table entries are tightly packed. For example element 0 of the array could use 2 entries/planes and element 1 could use 2 entries/planes. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `3bb8768b9d` ("anv: toggle on support for VK_EXT_ycbcr_image_arrays") Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2019-05-27 22:47:53 +01:00
Marek Olšák	fccced57cf	radeonsi: clean up winsys creation - unify the code - choose radeon or amdgpu based on the DRM version, not based on which one succeeds first	2019-05-27 15:26:06 -04:00
Marek Olšák	bb5d82bd06	radeonsi: allow query functions for compute-only contexts	2019-05-27 15:26:06 -04:00
Marek Olšák	b257956021	ac: treat Mullins as Kabini, remove the enum it's the same design	2019-05-27 15:10:51 -04:00
Christian Gmeiner	37af75f88c	etnaviv: rs: choose clear format based on block size Fixes following piglit and does not introduce any regressions. spec@ext_packed_depth_stencil@fbo-depth-gl_depth24_stencil8-blit Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com> Reviewed-by: Lucas Stach <l.stach@pengutronix.de>	2019-05-27 20:55:11 +02:00
Vasily Khoruzhick	af0de6b91c	lima/ppir: implement discard and discard_if This commit also adds codegen for branch since we need it for discard_if. Reviewed-by: Qiang Yu <yuq825@gmail.com> Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>	2019-05-27 07:39:03 -07:00
Samuel Pitoiset	7a7be61398	radv: ignore the loadOp if the first use of an attachment is a resolve Based on ANV. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-05-27 13:52:39 +02:00
Samuel Pitoiset	ff27eb509a	radv: always dirty the framebuffer when restoring a subpass The old code was not wrong because the transitions performed after the resolves should re-emit the framebuffer if needed. This change is mostly a no-op but it improves consistency regarding other meta operations that need to save/restore subpasses. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-05-27 13:52:36 +02:00
Samuel Pitoiset	9af15986b0	radv: add radv_clear_htile() helper This helper will be useful for clearing HTILE after some depth/stencil resolves. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-05-27 13:52:34 +02:00
Chenglei Ren	13b38ca1e4	anv/android: fix missing dependencies issue during parallel build The libmesa_anv_gen* modules require anv_extensions.h, patch makes sure it gets generated as a dependency before building them. Signed-off-by: Chenglei Ren <chenglei.ren@intel.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Cc: <mesa-stable@lists.freedesktop.org>	2019-05-27 10:13:17 +03:00
Samuel Pitoiset	2d2e7954c3	radv: tidy up GetQueryPoolResults for occlusion queries Just move the block that checks the availability bit into the switch like other query types. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-05-27 08:50:55 +02:00
Kenneth Graunke	b5fa3abfc2	iris: Don't flag IRIS_DIRTY_URB after BLORP operations unless it changed We already flag IRIS_DIRTY_URB when we change it, but we were additionally flagging it on every BLORP operation, even if we didn't.	2019-05-26 17:45:18 -07:00
Dave Airlie	7fe5a8e874	Revert "mesa: unreference current winsys buffers when unbinding winsys buffers" This reverts commit `12bf7cfecf`. This commit caused lots of problems: https://bugs.freedesktop.org/show_bug.cgi?id=110721 https://bugs.freedesktop.org/show_bug.cgi?id=110761 Fixes: `12bf7cfecf` ("mesa: unreference current winsys buffers when unbinding winsys buffers") Pushing without review as we need to get it into next stable.	2019-05-27 09:36:28 +10:00
Alyssa Rosenzweig	659aa3dd65	panfrost/midgard: Implement fneg/fabs/fsat Fix a regression I inadvertently caused by acking typeless movs before implementing/pushing this *whistles* Nothing to see here, move along folks. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-05-26 03:16:37 +00:00
Qiang Yu	1dc593e9b9	lima: fix lima_blit with non-zero level source resource lima_blit blits between resources with different levels. When blitting from a level != 0 source, it samples from that level of the resource as a texture. The current texture setup doesn't respect the level when no mipmap filter is used. Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com> Signed-off-by: Qiang Yu <yuq825@gmail.com>	2019-05-25 12:41:44 +08:00
Qiang Yu	54490b0b36	lima: fix render to non-zero level texture The current implementation doesn't respect the level of the surface being rendered to. Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com> Signed-off-by: Qiang Yu <yuq825@gmail.com>	2019-05-25 12:41:31 +08:00
Dylan Baker	9838185056	editorconfig: Fix meson style The syntax was wrong, resulting in it not working at all. Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-05-24 18:44:18 +00:00
Chia-I Wu	ea1e0acfd0	virgl: remove an incorrect check in virgl_res_needs_flush Imagine this resource_copy_region(ctx, dst, ..., src, ...); transfer_map(ctx, src, 0, PIPE_TRANSFER_WRITE, ...); at the beginning of a cmdbuf. We need to flush in transfer_map so that the transfer is not reordered before the resource copy. The check for "vctx->num_draws == 0 && vctx->num_compute == 0" is not enough. Removing the optimization entirely. Because of the more precise resource tracking in the previous commit, I hope the performance impact is minimized. We will have to go with perfect resource tracking, or attempt a more limited optimization, if there are specific cases we really need to optimize for. Signed-off-by: Chia-I Wu <olvaffe@gmail.com> Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>	2019-05-24 17:37:40 +00:00
Chia-I Wu	56f9b60e50	virgl: reemit resources on first draw/clear/compute This gives us more precise resource tracking. It can be beneficial because glFlush is often followed by state changes. We don't want to reemit resources that are going to be unbound. Signed-off-by: Chia-I Wu <olvaffe@gmail.com> Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>	2019-05-24 17:37:40 +00:00
Chia-I Wu	424ec2356b	virgl: add missing emit_res for SO targets Signed-off-by: Chia-I Wu <olvaffe@gmail.com> Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>	2019-05-24 17:37:40 +00:00
Roland Scheidegger	d4e8a44bf6	gallivm: fix default cbuf info. The default null_output really needs to be static, otherwise the values we'll eventually get later are doubly random (they are not initialized, and even if they were it's a pointer to a local stack variable). VMware bug 2349556. Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2019-05-24 19:22:50 +02:00
Roland Scheidegger	84f3f1cf00	scons: fix build with llvm 9. The x86asmprinter component is gone, and things seem to work by just removing it. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110707 Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2019-05-24 18:28:28 +02:00
Tomeu Vizoso	9fe1a925e2	panfrost: Dereference sampled texture We are currently leaking resources if they were sampled from. Once we are done with a sampler, we should dereference that resource. Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com> Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-05-24 16:50:09 +02:00
Tomeu Vizoso	3c81010213	panfrost: ci: Avoid pulling Docker image on every run Jump over the container stage if we haven't changed any of the files that are involved in building the container images. This saves 1-2 minutes in each run and helps conserve resources. Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com> Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-05-24 16:50:09 +02:00
Jason Ekstrand	f2dc0f2872	nir: Drop imov/fmov in favor of one mov instruction The difference between imov and fmov has been a constant source of confusion in NIR for years. No one really knows why we have two or when to use one vs. the other. The real reason is that they do different things in the presence of source and destination modifiers. However, without modifiers (which many back-ends don't have), they are identical. Now that we've reworked nir_lower_to_source_mods to leave one abs/neg instruction in place rather than replacing them with imov or fmov instructions, we don't need two different instructions at all anymore. Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com> Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com> Acked-by: Rob Clark <robdclark@chromium.org>	2019-05-24 08:38:11 -05:00
Jason Ekstrand	22421ca7be	nir/builder: Merge nir_[if]mov_alu into one nir_mov_alu helper Unless source modifiers are present, fmov and imov are the same. There's no good reason for having two helpers. Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com> Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-05-24 08:38:11 -05:00
Jason Ekstrand	cd73b6174b	nir/lower_to_source_mods: Stop turning add, sat, and neg into mov Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com> Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-05-24 08:38:11 -05:00
Jason Ekstrand	2a39788d03	nir/source_mods: Add helpers for setting source modifiers It's potentially a tiny bit less efficient but the helpers make it much easier to sort out the rules for updating source modifiers. Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com> Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-05-24 08:38:11 -05:00
Jason Ekstrand	8ffbb54405	intel: Implement abs, neg, and sat in the back-end Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com> Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-05-24 08:38:11 -05:00
Jason Ekstrand	4fde459563	intel/nir: Call alu_to_scalar one last time before out-of-ssa A few of our very late passes can end up generating vectors accidentally so we need to get rid of them. The only known case of this is the ffma peephole which generates fneg and fabs as vectors. Currently, they're not a problem because they get turned into fmov which the back-end compiler knows how to handle as a vector. That's about to change. Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com> Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-05-24 08:38:11 -05:00
Jason Ekstrand	ddd08e1888	nir/builder: Remove the use_fmov parameter from nir_swizzle This flag has caused more confusion than good in most cases. You can validly use imov for floats or fmov for integers because, without source modifiers, neither modify their input in any way. Using imov for floats is more reliable so we go that direction. Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com> Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-05-24 08:38:11 -05:00
Jason Ekstrand	6c2ca2a5d3	ptn,ttn: Use nir_channel for selecting channels Both of these passes predate the nir_channel helper. We should just use it instead of hand-rolling it in both passes. Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com> Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-05-24 08:38:11 -05:00
Michel Zou	88eb2a1f7e	scons: For MinGW use -posix flag. Signed-off-by: Jose Fonseca <jfonseca@vmware.com>	2019-05-24 12:18:40 +01:00
Christian Gmeiner	78fb5594be	etnaviv: use the correct uniform dirty bits Found during code inspection. Cc: mesa-stable@lists.freedesktop.org Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>	2019-05-24 12:41:43 +02:00
Danylo Piliaiev	c82dcf89ae	anv: Do not emulate texture swizzle for INPUT_ATTACHMENT, STORAGE_IMAGE If descriptorType is VK_DESCRIPTOR_TYPE_STORAGE_IMAGE or VK_DESCRIPTOR_TYPE_INPUT_ATTACHMENT, the imageView member of each element of pImageInfo must have been created with the identity swizzle. Fixes: `d2aa65eb` Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-05-24 09:20:38 +00:00
Tapani Pälli	397fe0cc50	st/dri: enable EGL_ANDROID_blob_cache on gallium drivers Verified to work properly with Iris driver on Android Celadon. Cache files get generated as 'com.android.opengl.shaders_cache' for each application. v2: check that cache was returned (Eric Engestrom) Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-05-24 09:17:04 +03:00
Alyssa Rosenzweig	ea6b581444	panfrost: Remove the standalone compiler Now that the online compiler and pandecode are reliable and upstreamed, nobody is using this. If somebody does need it, it should be easy enough to bring back, I suppose. At the moment, it's just a maintenance hazard, since meson is silly and does double builds for compiler updates (triple for disassembler changes). If people need the standalone _disassembler_, that can be added trivially into pandecode (pandecode already includes the disassembler). Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Ryan Houdek <Sonicadvance1@gmail.com>	2019-05-24 03:10:43 +00:00
Eric Engestrom	8d386e6eef	vk/util: suppress warning about out-of-enum android value src/vulkan/util/vk_enum_to_str.c: In function ‘vk_structure_type_size’: src/vulkan/util/vk_enum_to_str.c:3335:9: warning: case value ‘1000010000’ not in enumerated type ‘VkStructureType’ {aka ‘const enum VkStructureType’} [-Wswitch] case VK_STRUCTURE_TYPE_NATIVE_BUFFER_ANDROID: return sizeof(VkNativeBufferANDROID); ^~~~ Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-05-23 15:28:43 +00:00
Kenneth Graunke	25afbb04c2	iris: Advertise coherent framebuffer fetches This lets us advertise GL_EXT_shader_framebuffer_fetch and GL_KHR_blend_equation_advanced_coherent support.	2019-05-23 08:13:10 -07:00
Kenneth Graunke	cca8af0c7d	gallium: Add PIPE_CAP_FBFETCH_COHERENT and expose extensions st/mesa now exposes KHR_blend_equation_advanced_coherent and EXT_shader_framebuffer_fetch if the new capability is supported. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-05-23 08:13:09 -07:00
Kenneth Graunke	87f4286137	st/mesa: Advertise GL_EXT_shader_framebuffer_fetch_non_coherent This extension requires the ability to read from all render targets, so we only enable it if PIPE_CAP_FBFETCH >= PIPE_CAP_MAX_RENDER_TARGETS. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-05-23 08:13:09 -07:00
Kenneth Graunke	a2d7834457	gallium: Change PIPE_CAP_TGSI_FS_FBFETCH bool to PIPE_CAP_FBFETCH count TGSI's FBFETCH instruction currently only supports reading from a single render target, but NIR intrinsics can support multiple render targets. radeonsi can only support fetching from RT 0, but other drivers may be able to support fetching from any render target. To express this, this patch renames PIPE_CAP_TGSI_FS_FBFETCH to simply PIPE_CAP_FBFETCH, and converts it from a boolean "is FBFETCH supported?" to an integer number of render targets which can be fetched. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-05-23 08:13:07 -07:00
Kenneth Graunke	7d2b54e393	iris: Record state sizes for INTEL_DEBUG=bat decoding. Felix noticed a crash when using INTEL_DEBUG=bat decoding. It turned out that we were sometimes placing variable length data near the end of a buffer, and with the decoder guessing random lengths rather than having an actual count, it was walking off the end and crashing. So this does more than improve the decoder output. Unfortunately, this is a bit more complicated than i965's handling, because we don't have a single state buffer. Various places upload data via u_upload_mgr, and so there isn't a central place to record the size. We don't need to catch every single place, however, since it's only important to record variable length packets (like viewports and binding tables). State data also lives arbitrarily long, rather than being discarded on every batch like i965, so we don't know when to clear out old entries either. (We also don't have a callback when an upload buffer is released.) So, this tracking may space leak over time. That's probably okay though, as this is only a debugging feature and it's a slow leak. We may also get lucky and overwrite existing entries as we reuse BOs, though I find this unlikely to happen. The fact that the decoder works in terms of offsets from a state base address is also not ideal, as dynamic state base address and surface state base address differ for iris. However, because dynamic state addresses start from the top of a 4GB region, and binding tables start from addresses [0, 64K), it's highly unlikely that we'll get overlap. We can always improve this, but for now it's better than what we had.	2019-05-23 08:07:08 -07:00
Eric Engestrom	00cfeacf31	vk/util: drop no-op compiler warning workaround `-Wswitch` applies to `switch()`, not `case:`, and is bypassed by the presence of a `default:` anyway, so let's drop the `default:` and move the warning suppression to where it can make a difference, and then it turns out that we don't need to keep a list of special cases anymore :) Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-05-23 15:06:11 +00:00
Erik Faye-Lund	90e7ce5bde	mesa/main: make the CONSERVATIVE_RASTERIZATION_INTEL checks consistent INTEL_conservative_rasterization isn't exposed on compatibility contexts, nor for GLES 3.0 and below. We already do this correctly for gl{Enable,Disable}, but we should do the same for glIsEnabled as well. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2019-05-23 11:43:18 +02:00
Erik Faye-Lund	0dff3eecda	mesa/main: make the FRAGMENT_PROGRAM checks consistent IsEnabled(FRAGMENT_PROGRAM) isn't supposed to be allowed, but our check allowed this anyway. Let's make these checks consistent, and while we're at it, modernize them a bit. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2019-05-23 11:35:55 +02:00
Erik Faye-Lund	147751a856	mesa/main: make the TEXTURE_CUBE_MAP checks consistent IsEnabled(TEXTURE_CUBE_MAP) isn't supposed to be allowed, but our check allowed this anyway. Let's make these checks consistent, and while we're at it, modernize them a bit. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2019-05-23 11:35:55 +02:00
Erik Faye-Lund	182d75d2a5	mesa/main: remove duplicate macros These are already defined exactly the same, so let's get rid of the duplicate definitions. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2019-05-23 11:35:55 +02:00
Erik Faye-Lund	e002763c99	mesa/main: remove unused argument The 'CAP' argument has been unused in both of these macros since 2010, so let's get rid of it from both. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2019-05-23 11:35:55 +02:00
Erik Faye-Lund	619b2c9a7d	mesa/main: remove unused macro The first version of this macro is unused, so let's get rid of it. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2019-05-23 11:35:55 +02:00
Timothy Arceri	a482cf6ab2	glsl: simplify resource list building code This greatly simplifies the code to calculate if we should add a buffer to the resource list. This uses the spec rules and simple math to decide if we should add the buffer rather than complex string processing. This patch refines a patch present in the ARB_gl_spirv merge request for the NIR linker and applies it to the GLSL IR linker. This is why we also move the function to the shared linker code, because we will want to reuse the code for the NIR linker also. Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2019-05-23 15:06:20 +10:00
Chia-I Wu	96c2851586	virgl: track valid buffer range for transfer sync virgl_transfer_queue_is_queued was used to avoid flushing. That fails when the resource is being accessed by previous cmdbufs but not the current one. The new approach of tracking the valid buffer range does not apply to textures however. But hopefully it is fine because the goal is to avoid waiting for this scenario glBufferSubData(..., offset, size, data1); glDrawArrays(...); // append new vertex data glBufferSubData(..., offset+size, size, data2); glDrawArrays(...); If glTex(Sub)Image* turns out to be an issue, we will need to track valid level/layer ranges as well. v2: update virgl_buffer_transfer_extend as well v3: do not remove virgl_transfer_queue_is_queued Signed-off-by: Chia-I Wu <olvaffe@gmail.com> Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com> (v1) Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org> (v2)	2019-05-22 09:28:19 -07:00
Chia-I Wu	440982cdd6	virgl: remove support for buffer surfaces st/mesa does not need it and virglrenderer does not really support it. Remove the support so that we are sure pipe_surface never refers to a buffer resource. Signed-off-by: Chia-I Wu <olvaffe@gmail.com> Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com> Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>	2019-05-22 09:28:19 -07:00
Chia-I Wu	fa9afb9de0	virgl: handle NULL shader resource explicitly When shader images/buffers are set, do not rely on virgl_encoder_write_res and virgl_resource_dirty to do the implicit NULL check. Signed-off-by: Chia-I Wu <olvaffe@gmail.com> Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com> Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>	2019-05-22 09:28:19 -07:00
Lionel Landwerlin	cb7c9b2a93	vulkan: fix build dependency issue with generated files On machines with many cores, you can run into that issue : ../mesa-9999/src/vulkan/overlay-layer/overlay.cpp:42:10: fatal error: vk_enum_to_str.h: No such file or directory v2: Move declare_dependency around (Eric) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reported-by: Jan Ziak Cc: <mesa-stable@lists.freedesktop.org> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-05-22 14:07:14 +00:00
Greg V	506ebf55c0	gallium: enable dmabuf on BSD as well The DRM_CONF_SHARE_FD code did not check for Linux, so the commit that introduced PIPE_CAP_DMABUF broke Wayland-EGL clients on FreeBSD. Fixes: `8ae50e60` (gallium: replace DRM_CONF_SHARE_FD with PIPE_CAP_DMABUF) Reviewed-by: Dylan Baker <dylan@pnwbakers.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-05-22 13:20:31 +00:00
Tapani Pälli	ed563b79df	iris: fix android build Fixes: `4756864cdc` "iris: Start wiring up on-disk shader cache" Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-05-22 14:01:41 +03:00
Philipp Zabel	1ccb8a071b	etnaviv: fill missing offset in etna_resource_get_handle Without this gbm_bo_get_offset() can return 0 where it shouldn't. Reviewed-by: Lucas Stach <l.stach@pengutronix.de> Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Cc: <mesa-stable@lists.freedesktop.org>	2019-05-22 12:57:40 +02:00
Samuel Pitoiset	32a0bc915a	radv: do not reset query pool during creation From the Vulkan spec 1.1.108: "After query pool creation, each query must be reset before it is used." So, the driver doesn't need to do this at creation time. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-05-22 08:36:41 +02:00
Samuel Pitoiset	e9bfd88183	radv: fix the sample max distance value for 8x It should be 7, not 8. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-05-22 08:36:39 +02:00
Samuel Pitoiset	bc4548ca3d	radv: emit correct centroid priority based on the number of samples Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-05-22 08:36:37 +02:00
Samuel Pitoiset	a7763ddcf2	radv: clean up the sample locations codebase Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-05-22 08:36:35 +02:00
Samuel Pitoiset	135dff8dcf	radv: remove remaining code related to 16 samples The driver only supports up to 8 samples. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-05-22 08:36:33 +02:00
Kenneth Graunke	6dc1c2d8bd	iris: Fix ALT mode regressions from shader cache We were checking this based on nir->info.name, but with the shader cache enabled, nir_strip throws out the name, causing us to use IEEE mode for ARB programs. gl-1.0-spot-light regressed because it wants ALT mode for 0^0 behavior. Fixes: `dc5dc727d5` iris: Serialize the NIR to a blob we can use for shader cache purposes.	2019-05-21 16:58:54 -07:00
Marek Olšák	d6053bf2a1	radeonsi: fix a regression in si_rebind_buffer Don't update non-buffer images. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110701 Fixes: `78e35df52a` "radeonsi: update buffer descriptors in all contexts after buffer invalidation" Cc: 19.1 <mesa-stable@lists.freedesktop.org> Tested-By: Gert Wollny <gert.wollny@collabora.com>	2019-05-21 18:58:03 -04:00
Kenneth Graunke	fb1d08dcfd	iris: Expose the disk cache to the state tracker as well. This lets st/nir cache the NIR for shaders, based on the shader source string hash, allowing us to skip initial compiles altogether, and also letting us start from there should we need to recompile for NOS. Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2019-05-21 15:05:38 -07:00
Dylan Baker	601c9bc135	iris: Cache assembly shaders in the on-disk shader cache This implements storing and retrieving iris_compiled_shader objects from the on-disk shader cache. (by Dylan Baker and Kenneth Graunke)	2019-05-21 15:05:38 -07:00
Kenneth Graunke	dc5dc727d5	iris: Serialize the NIR to a blob we can use for shader cache purposes. We will use a hash of the serialized NIR together with brw_prog_*_key (for NOS) as the disk cache key, where the disk cache contains actual assembly shaders. Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2019-05-21 15:05:38 -07:00
Dylan Baker	4756864cdc	iris: Start wiring up on-disk shader cache This creates the on-disk shader cache data structure, and handles the build-id keying aspects. The next commits will fill it out so it's actually used. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-05-21 15:05:38 -07:00
Kenneth Graunke	6ae2caf201	iris: Move iris_uncompiled_shader definition to iris_context.h It had been internal to iris_program.c, but with the upcoming disk cache code, the "program module" is going to be spread across a couple source files. Into a header it goes! Now it lives alongside iris_compiled_shader, which makes sense. Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2019-05-21 15:05:38 -07:00
Kenneth Graunke	419d9b21e1	intel: Move brw_prog_key_set_id from i965 to the compiler. I want to use it in iris. Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2019-05-21 15:05:38 -07:00
Dylan Baker	b589c2547d	docs: update calendar, and news item and link release notes for 19.0.5	2019-05-21 14:25:36 -07:00
Dylan Baker	e2987f83ad	docs: Add Sha256 sums for 19.0.5	2019-05-21 14:23:16 -07:00
Dylan Baker	74e8dfecc8	docs: Add release notes for 19.0.5	2019-05-21 14:23:14 -07:00
Caio Marcelo de Oliveira Filho	9b9f7030c6	spirv: Drop GOOGLE suffix from names incorporated to SPIR-V SPV_GOOGLE_decorate_string and SPV_GOOGLE_hlsl_functionality1 were incorporated to SPIR-V. Let's pick the names used by SPIR-V core. Reviewed-by: Karol Herbst <kherbst@redhat.com>	2019-05-21 11:52:41 -07:00
Caio Marcelo de Oliveira Filho	02d140ce9a	spirv: Pick the right bitsize when doing SpvUConvert Reviewed-by: Karol Herbst <kherbst@redhat.com>	2019-05-21 11:52:29 -07:00
Caio Marcelo de Oliveira Filho	fd94a45823	spirv: Trivially handle new 1.4 loop controls Reviewed-by: Karol Herbst <kherbst@redhat.com>	2019-05-21 11:52:12 -07:00
Caio Marcelo de Oliveira Filho	e21dee6c21	spirv: Update JSON and Headers to 1.4 This refers to commit c4f8f65792d4bf2657ca751904c511bbcf2ac77b from GitHub. Reviewed-by: Karol Herbst <kherbst@redhat.com>	2019-05-21 11:50:58 -07:00
Caio Marcelo de Oliveira Filho	4b474e2e8a	spirv: Handle instruction aliases in spirv_info_c.py Choose the first we see in the grammar file as the main one. This is needed to parse SPIR-V 1.4 because it introduced opcode aliases. Reviewed-by: Karol Herbst <kherbst@redhat.com>	2019-05-21 11:50:47 -07:00
Erik Faye-Lund	810b95e02c	Revert "glsl: do not use deprecated bison-keyword" This reverts commit `eb85124a9f`.	2019-05-21 17:53:54 +02:00
Eric Engestrom	93d900ece3	imgui: delete demo file Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-05-21 14:40:22 +01:00
Lionel Landwerlin	fd80f1e8d1	vulkan/overlay: update remaining manual error checks Through a series of rebases, I forgot to switch a bunch of error checks to use a macro that will show where the problem is, rather than printing out a dumb "ERROR!". Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2019-05-21 14:08:35 +01:00
Lionel Landwerlin	213d6527d4	vulkan/overlay: fix timestamp query emission with no pipeline stats The if (!pipe && timestamp) logic was broken. It should have been : if (!pipe && !timestamp) Let's just drop this condition as the following code does the right thing for all cases. An error was appearing with the following variables : VK_INSTANCE_LAYERS=VK_LAYER_MESA_overlay VK_LAYER_MESA_OVERLAY_CONFIG=gpu_timing Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `ea7a6fa980` ("vulkan/overlay: add pipeline statistic & timestamps support") Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2019-05-21 14:08:35 +01:00
Erik Faye-Lund	eb85124a9f	glsl: do not use deprecated bison-keyword %error-verbose has been deprecated since Bison 3.0, which was released in 2013. In Bison 3.3.1 which was recently released, this has started causing warnings. Let's update the code to do this in the modern way instead, to avoid cluttering the output needlessly. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-05-21 11:31:43 +00:00
Karol Herbst	67f9496893	glsl: handle 8 and 16 bit ints in glsl_base_type_is_integer Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-05-21 08:47:16 +00:00
Dave Airlie	4785e50e75	nir/test: add split vars tests (v2) This just adds some split var splitting tests, it verifies by counting derefs and local vars. a basic load from inputs, store to array, same as before but with a shader temp struct { float } [4] don't split test a basic load from inputs, with some out of band loads. a load/store of only half the array two level array, load from inputs store to all levels a basic load from inputs with an indirect store to array. two level array, indirect store to lvl 0 two level array, indirect store to lvl 1 load from inputs, store to array twice load from input, store to array, load from array, store to another array. load from input and copy deref to array create wildcard derefs, and do a copy v2: use array_imm helpers, move derefs out of loops, rename toplevel/secondlevel, use ints, fix lvl1 don't split test, rename globals to shader_temp, add comment, check the derefs type Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-05-21 13:43:28 +10:00
Caio Marcelo de Oliveira Filho	cf05ffbfd6	anv: Don't re-use entry_point pointer from spirv_to_nir When running with NIR_TEST_CLONE=1, the pointer will not be valid, as the whole shader is going to be recreated every pass. Prefer using is_entrypoint (to query when looping) and nir_shader_get_entrypoint() instead. Fixes the Vulkan Piglit tests - vulkan/glsl450/frexp-double - vulkan/glsl450/isinf-double - vulkan/shaders/fs-multiple-large-local-array Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108957 Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-05-20 16:47:39 -07:00
Caio Marcelo de Oliveira Filho	005cc9ae37	nir: Fix clone of nir_variable state slots When num_state_slots is 0, don't create the array. This was triggering the following assert when running vkcube with NIR_TEST_CLONE=1 vkcube: ../src/compiler/nir/nir_split_per_member_structs.c:66: split_variable: Assertion `var->state_slots == NULL' failed. Fixes: `9fbd390dd4` "nir: Add support for cloning shaders" Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-05-20 16:47:28 -07:00
Charmaine Lee	12bf7cfecf	mesa: unreference current winsys buffers when unbinding winsys buffers This fixes surface leak when no winsys buffers are bound. Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Brian Paul <brianp@vmware.com>	2019-05-20 13:09:32 -07:00
Charmaine Lee	b480adfa5e	st/mesa: purge framebuffers with current context after unbinding winsys buffers With commit `c89e8470e5`, framebuffers are purged after unbinding context, but this change also introduces a heap corruption when running Rhino application on VMware svga device. Instead of purging the framebuffers after the context is unbound, this patch first unbinds the winsys buffers, then purges the framebuffers with the current context, and then finally unbinds the context. This fixes heap corruption. Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Brian Paul <brianp@vmware.com>	2019-05-20 13:09:32 -07:00
Caio Marcelo de Oliveira Filho	7e5723d6d7	spirv: Generate proper NULL pointer values Use the storage class address format information to pick the right constant values for a NULL pointer. v2: Don't add a deref_cast to the values. (Jason) v3: Update to use vtn_storage_class_to_mode() and vtn_mode_to_address_format() explicitly. (Jason) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-05-20 10:53:38 -07:00
Caio Marcelo de Oliveira Filho	83550b7dc4	spirv: Reuse helpers in vtn_handle_type() And change vtn_storage_class_to_mode() to accept NULL as interface_type. In this case, if we have a SpvStorageClassUniform, we assume it uses an ubo_addr_format, like the code being replaced by the helper. That assumption is a problem, but no different than the previous code. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-05-20 10:53:38 -07:00
Caio Marcelo de Oliveira Filho	48ea3bbff6	spirv: Add vtn_variable_mode_image Corresponding to SpvStorageClassImage. We see pointers for that storage class in tests, but don't use the storage class any further. Adding this so that we can call vtn_mode_to_address_format() for all supported pointers. v2: Fail when trying to create a SpvStorageClassImage variable. (Jason) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-05-20 10:53:38 -07:00
Caio Marcelo de Oliveira Filho	672a3f42d9	spirv: Add vtn_mode_to_address_format() Handles all the modes and we can use it in combination with nir_address_format_to_glsl_type() to replace the vtn_ptr_type_for_mode() helper. Since the new helper is more generic, moved the assertions from the old one to the call sites. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-05-20 10:53:38 -07:00
Caio Marcelo de Oliveira Filho	192daf68a4	spirv: Add vtn_mode_uses_ssa_offset() Just the mode is needed to decide whether SSA offsets are needed, so make a function that takes that and reuse it for vtn_pointer_uses_ssa_offset(). This will be used for constant null pointers, that won't have a vtn_pointer handy. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-05-20 10:53:38 -07:00
Caio Marcelo de Oliveira Filho	f9336751bc	spirv: Add and use vtn_type_without_array() helper v2: Renamed from vtn_interface_type. (Jason) Accept any type not only pointers. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-05-20 10:53:38 -07:00
Caio Marcelo de Oliveira Filho	8af9de0a38	spirv: Change vtn_null_constant() to use vtn_type This is a preparation to handle OpConstantNull for pointers, we'll use the vtn_type to get to the address format and then the appropriate representation of NULL pointer. v2: Move rest of body to use vtn_type. (Jason) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-05-20 10:53:38 -07:00
Caio Marcelo de Oliveira Filho	bdf2361b87	spirv: Export vtn_storage_class_to_mode() So we can reuse in spirv_to_nir.c. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-05-20 10:53:38 -07:00
Caio Marcelo de Oliveira Filho	f051fa6ad7	nir: Add nir_address_format_null_value() Returns the nir_const_value * with the representation of the NULL pointer for each address format. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-05-20 10:53:38 -07:00
Caio Marcelo de Oliveira Filho	31a7476335	spirv, radv, anv: Replace ptr_type with addr_format Instead of setting the glsl types of the pointers for each resource, set the nir_address_format, from which we can derive the glsl_type, and in the future the bit pattern representing a NULL pointer. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-05-20 10:53:38 -07:00
Caio Marcelo de Oliveira Filho	6bc9cdb1b7	nir: Add nir_address_format_32bit_offset This is a simple 32-bit address which is not a global address. Gives us a format that doesn't use 0 as its null pointer value. We will need this in anv to represent nir_var_mem_shared addresses. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-05-20 10:53:38 -07:00
Caio Marcelo de Oliveira Filho	bdaf41107a	nir: Add nir_address_format_logical An address format representing a purely logical addressing model. In this model, all deref chains must be complete from the dereference operation to the variable. Cast derefs are not allowed. These addresses will be 32-bit scalars but the format is immaterial because you can always chase the chain. E.g. push constants in anv. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-05-20 10:53:38 -07:00
Rob Clark	9f61aa3f75	freedreno/a6xx: WFI in program stateobj too This "fixes" hangs seen w/ various android games. I think this is a similar issue to the one with constant state: we need to avoid CP_LOAD_STATE until previous draw completes. It isn't entirely clear why blob doesn't need to do this, but it might have a different way to accomplish the same thing. Signed-off-by: Rob Clark <robdclark@chromium.org>	2019-05-20 09:10:12 -07:00
Rob Clark	abfb31acdb	freedreno/a6xx: make sure binning pass constlen is large enough Since we use same constant state for both binning pass program state and draw pass state, and it is possible for binning pass shader to use fewer consts, we need to make sure we program a large enough constlen. Signed-off-by: Rob Clark <robdclark@chromium.org>	2019-05-20 09:10:12 -07:00
Rob Clark	d200d58e65	freedreno/a6xx: limit IBO state to draw pass Currently we are only supporting images in FS (and CS) so limit this stateobj to draw pass. Signed-off-by: Rob Clark <robdclark@chromium.org>	2019-05-20 09:10:12 -07:00
Rob Clark	54d94f5780	freedreno/a6xx: don't evaluate FS tex state in binning pass It is unneeded since FS doesn't run in binning pass. Signed-off-by: Rob Clark <robdclark@chromium.org>	2019-05-20 09:10:12 -07:00
Samuel Pitoiset	daa85a882e	radv: decompress FMASK before performing a MSAA decompress using FMASK This fixes some CTS failures related to VK_EXT_sample_locations. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-05-20 12:41:47 +02:00
Dave Airlie	6b2b150a66	nir/validate: fix crash if entry is null. we validate assert entry just before this, but since that doesn't stop execution, we need to check entry before the next validation assert. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-05-20 16:26:48 +10:00
Qiang Yu	a1d419603f	lima/gpir: switch to use nir_lower_viewport_transform Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com> Reviewed-by: Erico Nunes <nunes.erico@gmail.com> Signed-off-by: Qiang Yu <yuq825@gmail.com>	2019-05-20 10:57:11 +08:00
Qiang Yu	a7688b2713	lima/gpir: support vector ssa load Some vector sysval can't be lowered to scalar, so need to break it to scalar in nir to gpir conversion. Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com> Reviewed-by: Erico Nunes <nunes.erico@gmail.com> Signed-off-by: Qiang Yu <yuq825@gmail.com>	2019-05-20 10:57:11 +08:00
Qiang Yu	4a74e28130	lima/gpir: add helper function for emit load node Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com> Reviewed-by: Erico Nunes <nunes.erico@gmail.com> Signed-off-by: Qiang Yu <yuq825@gmail.com>	2019-05-20 10:57:11 +08:00
Timothy Arceri	ac779ff2b7	util: add missing include to build_id.h Required to use uint8_t Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2019-05-20 10:24:23 +10:00
Alyssa Rosenzweig	1155446c19	panfrost/midgard: Split up midgard_compile.c (RA) This commit moves the register allocator out of midgard_compile.c and into its own midgard_ra.c file. In doing so, a number of dependencies are identified and moved into their own files in turn. midgard_compile.c is still fairly monolithic, but this should help. Code churn, but no functional changes should be introduced by this commit. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-05-19 23:37:45 +00:00
Alyssa Rosenzweig	9cd8cd26de	panfrost: Improve fixed-function blending This fixes a few miscellaneous issues with the fixed-function blending programming, though it is far from complete. For cases known to be buggy, we force a fallback to blend shaders. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-05-19 17:56:35 +00:00
Alyssa Rosenzweig	d1a9b760ea	panfrost: Wire up nir_lower_blend This implements blend shaders via nir_lower_blend, by creating dummy fragment shaders simply passing through the source color and using the new lowering pass to inject blendability. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-05-19 17:56:34 +00:00
Alyssa Rosenzweig	39104221e1	panfrost/midgard: Route new blending intrinsics To prepare for the new nir_lower_blend pass, we wire up the intrinsics for tilebuffer reads and constant colour loading. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-05-19 17:56:14 +00:00
Alyssa Rosenzweig	a1885b2a35	panfrost/nir: Add nir_lower_blend pass This new lowering pass implements the OpenGL ES blend pipeline in shaders, applicable to hardware lacking full-featured blending hardware (including Midgard/Bifrost and vc4). This pass is run on a fragment shader, rewriting the store to a blended version, loading in the framebuffer destination color and constant color via intrinsics as necessary. This pass is sufficient for OpenGL ES 2.0 and is verified to pass dEQP's blend tests. MIN/MAX modes are included and tested as well. That said, at present it has the following limitations: - MRT is not supported (ES3). - sRGB support is missing (ES3). - Extended blending is not yet ported from GLSL IR lowering (ES3.2) - Dual-source blending is not supported. (N/A) - Logic ops are not supported. (N/A) v2: Fix code conventions (per Ian Romanick's feedback). Implement color masks. This pass should be in common nir/ space, but due to non-technical reasons, for now it's in Panfrost space. In the future, depending if other drivers need some of the functionality, we can move this back to src/compiler/nir space. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-05-19 17:54:56 +00:00
Alyssa Rosenzweig	6b2457e75c	panfrost: Fix Bifrost-specific padding Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Ryan Houdek <Sonicadvance1@gmail.com>	2019-05-19 17:41:28 +00:00
Alyssa Rosenzweig	7b5217ad70	panfrost: Cleanup panfrost_job comments Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Ryan Houdek <Sonicadvance1@gmail.com>	2019-05-19 17:41:26 +00:00
Alyssa Rosenzweig	ae705387a9	panfrost/decode: Decode blend constant This adds a forgotten decode line on Midgard and adds the field of a blend constant on Bifrost. The Bifrost encoding is fairly weird; whereas Midgard is just a regular 32-bit float, Bifrost uses a fancy fixed-point-esque encoding. The decode logic here is experimentally correct. The encode logic is a sort of "guesstimate", assuming that the high byte is just int(f / 255.0) and then solving algebraically for the low byte. This might be slightly off in some cases. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Ryan Houdek <Sonicadvance1@gmail.com>	2019-05-19 17:41:23 +00:00
Alyssa Rosenzweig	3645c781ab	panfrost: Hoist blend constant into Midgard-specific struct This eliminates one major source of #ifdef parity between Midgard and Bifrost, better representing how the struct acts on Midgard and allowing proper decodes on Bifrost. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Ryan Houdek <Sonicadvance1@gmail.com>	2019-05-19 17:41:21 +00:00
Alyssa Rosenzweig	50382df728	panfrost/decode: Disassemble Bifrost shaders We already have the Bifrost disassembler in-tree, so now that panwrap is able to dump Bifrost command streams, hook up the disassembler to pandecode. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Ryan Houdek <Sonicadvance1@gmail.com>	2019-05-19 17:41:08 +00:00
Bas Nieuwenhuizen	4689e98fe8	vulkan/wsi: Set X11 minImageCount to 3. For IMMEDIATE and FIFO, most games work in a pipelined manner where they can produce frames at a rate of 1/MAX(CPU duration, GPU duration), but the render latency is CPU duration + GPU duration. This means that with scanout from pageflipping we need 3 frames to run full speed: 1) CPU rendering work 2) GPU rendering work 3) scanout Once we have a nonblocking acquire that returns a semaphore we can merge 1 and 3. Hence the ideal implementation needs only 2 images, but games cannot tell we currently do not have an ideal implementation and that hence they need to allocate 3 images. So let us do it for them. This is a tradeoff as it uses more memory than needed for non-fullscreen and non-performance intensive applications. Since this is pretty much a TODO that can use the context I added this as a comment. Acked-by: Jason Ekstrand <jason@jlekstrand.net> Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-05-19 00:38:03 +00:00
Eric Engestrom	ccb8ea7acf	meson: expose glapi through osmesa Suggested-by: Pierre Guillou <pierre.guillou@lip6.fr> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109659 Fixes: `f121a669c7` "meson: build gallium based osmesa" Fixes: `cbbd5bb889` "meson: build classic osmesa" Cc: Brian Paul <brianp@vmware.com> Cc: Dylan Baker <dylan@pnwbakers.com> Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Tested-by: Chuck Atkins <chuck.atkins@kitware.com>	2019-05-18 11:15:04 +01:00
Kenneth Graunke	28c2ce7105	egl: Allow EGL_CONTEXT_OPENGL_RESET_NOTIFICATION_STRATEGY in ES and GL EGL annoyingly defines a few variants of this token: EGL_CONTEXT_OPENGL_RESET_NOTIFICATION_STRATEGY_EXT - 0x3138 EGL_CONTEXT_OPENGL_RESET_NOTIFICATION_STRATEGY_KHR - 0x31BD EGL_CONTEXT_OPENGL_RESET_NOTIFICATION_STRATEGY - 0x31BD The EGL_EXT_create_context_robustness extension specifies that the EXT token is only valid for ES contexts, not GL. The EGL_KHR_create_context extension defines the KHR version, and says it is only allowed for GL contexts, and specifically calls out that it's an error for ES contexts. But EGL 1.5 includes the new suffixless token, which has the same value as the KHR version, and specifically calls out that it's now valid to use with both GL and ES contexts. So we should allow this. Fixes KHR-NoContext.es32.robustness.no_reset_notification and KHR-NoContext.es32.robustness.lose_context_on_reset on iris, which apparently is exposing EGL 1.5. Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2019-05-17 15:13:15 -07:00
Jason Ekstrand	1c92358bd8	anv: Only consider minSampleShading when sampleShadingEnable is set From the Vulkan 1.1.107 spec: Sample shading is enabled for a graphics pipeline: - If the interface of the fragment shader entry point of the graphics pipeline includes an input variable decorated with SampleId or SamplePosition. In this case minSampleShadingFactor takes the value 1.0. - Else if the sampleShadingEnable member of the VkPipelineMultisampleStateCreateInfo structure specified when creating the graphics pipeline is set to VK_TRUE. In this case minSampleShadingFactor takes the value of VkPipelineMultisampleStateCreateInfo::minSampleShading. Otherwise, sample shading is considered disabled. In other words, if sampleShadingEnable is set to VK_FALSE, we should ignore minSampleShading. Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-05-17 20:33:57 +00:00
Jason Ekstrand	8413fd136c	anv: Stop forcing bindless for images This was an unintended artifact of my testing of bindless images. We should be choosing bindless or not dynamically. Fixes: `c0d9926df7` "anv: Use bindless handles for images" Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-05-17 19:58:51 +00:00
Neha Bhende	926a6a35cf	draw: fix memory leak introduced by `7720ce32a` We need to free memory allocation PrimitiveOffsets in draw_gs_destroy(). This fixes memory leak found while running piglit on windows. Fixes: `7720ce32a` ("draw: add support to tgsi paths for geometry streams. (v2)") Tested with piglit Reviewed-by: Brian Paul <brianp@vmware.com> Reviewed-by: Charmaine Lee <charmainel@vmware.com> Reviewed-by: Dave Airlie <airlied@redhat.com>	2019-05-17 12:26:48 -06:00
Jason Ekstrand	d2aa65eb18	anv: Emulate texture swizzle in the shader when needed Now that we have the descriptor buffer mechanism, emulated texture swizzle can be implemented in a very non-invasive way. Previous attempts all tried to extend the push constant based image param mechanism which was gross. This could, in theory, be done much faster with a magic back-end instruction which does indirect MOVs but Vulkan on IVB is already so slow this isn't going to matter much. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104355 Cc: "19.1" <mesa-stable@lists.freedesktop.org> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-05-17 12:25:58 -05:00
Alyssa Rosenzweig	ea479fdc1d	panfrost/midgard: Typofix Reported-by: Ryan Houdek <Sonicadvance1@gmail.com> Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-05-17 14:59:52 +00:00
Eric Engestrom	6a1f609a4c	gitlab-ci: build-test the tools as well Suggested-by: Rob Clark <robclark@freedesktop.org> Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Michel Dänzer <michel.daenzer@amd.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-05-17 11:21:48 +01:00
Samuel Pitoiset	d7501834cd	radv: add a workaround for Monster Hunter World and LLVM 7&8 The load/store optimizer pass doesn't handle WaW hazards correctly and this is the root cause of the reflection issue with Monster Hunter World. AFAIK, it's the only game that is affected by this issue. This is fixed with LLVM r361008, but we need a workaround for older LLVM versions unfortunately. Cc: "19.0" "19.1" <mesa-stable@lists.freedesktop.org> Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-05-17 11:41:19 +02:00
Thomas Hellstrom	47afc5eed7	svga: Add an environment variable to force coherent surface memory The vmwgfx driver supports emulated coherent surface memory as of version 2.16. Add an environment variable to enable this functionality for texture- and buffer maps: SVGA_FORCE_COHERENT. This environment variable should be used for testing only. Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com> Reviewed-by: Brian Paul <brianp@vmware.com>	2019-05-17 08:44:31 +02:00
Thomas Hellstrom	1a66ead1c7	pipebuffer, winsys/svga: Add functionality to update pb_validate_entry flags In order to be able to add access modes to a pb_validate_entry, update the pb_validate_add_buffer function to take a pointer hash table and also to return whether the buffer was already on the validate list. Update the svga winsys accordingly. Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com> Reviewed-by: Charmaine Lee <charmainel@vmware.com>	2019-05-17 08:44:31 +02:00
Thomas Hellstrom	a119da3bc9	svga: Set the rendered-to flag for dma transfers to surfaces The rendered-to flag indicates that the HW surface content is more recent than the content of the mob. That's the case after a SurfaceDMA transfer to the surface. Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com> Reviewed-by: Brian Paul <brianp@vmware.com> Reviewed-by: Charmaine Lee <charmainel@vmware.com>	2019-05-17 08:44:31 +02:00
Thomas Hellstrom	fb6d09764d	winsys/svga: Fix RELOC_INTERNAL mob GPU access SVGA_RELOC_INTERNAL indicates a transfer between surface and backing mob. This means that if the GPU for example reads from the surface it writes to the backing mob. But since the buffer mapping code allows for simultaneous gpu- and cpu read access, a read from the surface to the mob will not synchronize a subsequent map to the readback. Fix this by inverting the mob access mode in a surface relocation with SVGA_RELOC_INTERNAL set. Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com> Reviewed-by: Brian Paul <brianp@vmware.com> Reviewed-by: Charmaine Lee <charmainel@vmware.com>	2019-05-17 08:44:31 +02:00
Thomas Hellstrom	eed24156ec	svga: Remove the surface_invalidate winsys function Instead unconditionally call SVGA3D_InvalidateGBSurface() since it's needed also for Linux for dirty buffers and operation without SurfaceDMA. For non-guest-backed operation, remove the surface cache surface invalidation altogether. Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com> Reviewed-by: Brian Paul <brianp@vmware.com> Reviewed-by: Charmaine Lee <charmainel@vmware.com>	2019-05-17 08:44:31 +02:00
Gert Wollny	0f598ed7b3	Revert "softpipe/buffer: load only as many components as the the buffer resource type provides" This reverts commit `865b9ddae4`. The buffer always reports format PIPE_FORMAT_R8_UNORM so with this patch only one component would be supported. The original issue is still relevant, but the fix should be different. Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Reviewed-by: Dave Airlie <airlied@redhat.com>	2019-05-17 08:27:55 +02:00
Dave Airlie	b6e2a9eca7	glsl/nir: init non-static class member. glsl_to_nir.cpp:276: uninit_member: Non-static class member "sig" is not initialized in this constructor nor in any functions that it calls. Reported by coverity Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>	2019-05-17 12:33:09 +10:00
Dave Airlie	ebdddb36a0	imgui: fix undefined behaviour bitshift. imgui_draw.cpp:1781: error[shiftTooManyBitsSigned]: Shifting signed 32-bit value by 31 bits is undefined behaviour Reported by coverity Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>	2019-05-17 12:33:09 +10:00
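The shift warning above comes down to the operand type. A minimal sketch, assuming a 32-bit `int` and a hypothetical helper `bit_mask()` that is not taken from imgui: in C, `1 << 31` moves a bit into the sign position of a signed `int`, which is undefined, while the same shift on an unsigned operand is well defined.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not from imgui): returns a 32-bit mask with only
 * bit n set. The unsigned constant keeps the shift well defined for every
 * n in [0, 31], including n == 31. */
static uint32_t bit_mask(unsigned n)
{
    assert(n < 32);            /* shifting by >= the type width is undefined */
    return UINT32_C(1) << n;   /* unsigned operand: no signed overflow */
}
```

Calling `bit_mask(31)` yields the top-bit mask without relying on signed overflow behaviour.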
Dave Airlie	2bfe5b8556	glsl: init non-static class member in link uniforms. (v2) link_uniforms.cpp:477: uninit_member: Non-static class member "shader_storage_blocks_write_access" is not initialized in this constructor nor in any functions that it calls. Reported by coverity. v2: fix 9->0 typo (Ilia) Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>	2019-05-17 12:33:09 +10:00
Dave Airlie	b2d4d08a5c	glsl: init packed in more constructors. src/compiler/glsl_types.cpp:577: uninit_member: Non-static class member "packed" is not initialized in this constructor nor in any functions that it calls. from Coverity. Fixes: `659f333b3a` (glsl: add packed for struct types) Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>	2019-05-17 12:33:09 +10:00
Alyssa Rosenzweig	81d3262fa5	panfrost: Cleanup leak todos Many of these are now patched; one of them we patch here. Regardless, this is one less thing to worry about in the code, I suppose. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-05-17 00:14:49 +00:00
Alyssa Rosenzweig	c65271c929	panfrost: assert(0) -> unreachable for some switch Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-05-16 23:42:33 +00:00
Nanley Chery	629806b55b	anv: Fix some depth buffer sampling cases on ICL+ Don't attempt sampling with HiZ if the sampler lacks support for it. On ICL, the HW docs state that sampling with HiZ is not supported and that instances of AUX_HIZ in the RENDER_SURFACE_STATE object will be interpreted as AUX_NONE. Cc: <mesa-stable@lists.freedesktop.org> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>	2019-05-16 20:54:53 +00:00
Caio Marcelo de Oliveira Filho	ded2c202d5	nir: Only convert SSA values to regs when needed If the SSA def produced by this instruction is only in the block in which it is defined and is not used by ifs or phis, then we don't have a reason to convert it to a register in nir_lower_ssa_defs_to_regs_block(). The special case for derefs is covered by the general case, so can be removed: at this point all derefs in the block are materialized (i.e. the whole deref chain is in the block) and derefs are not used in phis. v2: Fix wrong check for if_uses. If there's such an use, the def is not "local_to_block". (Jason) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-05-16 12:23:47 -07:00
Kenneth Graunke	4b5e8eb3c8	st/mesa: Record samplers for extra planes in info->textures_used. Normally gl_nir_lower_samplers_as_deref records info->textures_used for us, but this pass runs after that, attempting to assign samplers in the same order as st_atom_texture's external_samplers_used loop so the stars align and we get the same locations. Since we're adding textures late, we need to amend info->textures_used. iris uses info->textures_used to set up texture bindings; this fixes Piglit's ext_image_dma_buf_import-sample-{nv12,yuv420,yvu420} there. Reviewed-by: Rob Clark <robdclark@gmail.com>	2019-05-16 11:54:07 -07:00
Caio Marcelo de Oliveira Filho	8a995f2b5e	nir: Fix nir_opt_idiv_const when negatives are involved First, allow the case for negative powers of two. Then ensure that we use the absolute value of the non-constant value to calculate the quotient -- this was hinted in the code by the name 'uq'. This fixes an issue when 'd' is positive and 'n' is negative. The ishr will propagate the negative sign and we'll use nir_ineg() again, incorrectly. v2: First version used only ishr, but that isn't sufficient, since it never can produce a zero as a result. (Jason) Allow negative powers of two. (Caio) Fixes: `74492ebad9` "nir: Add a pass for lowering integer division by constants" Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-05-16 10:55:03 -07:00
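The sign handling described above can be illustrated outside NIR. The following is only a sketch in plain C, not the pass itself, and `div_by_pow2_const()` is a hypothetical name. It divides by a constant that is plus or minus a power of two by shifting the magnitude of `n` (the "uq" value) and negating once when the signs differ. Shifting the magnitude is what lets the result reach zero, for example for -1 / 2.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: n / d where d is +/- a power of two, truncating
 * toward zero like C's '/' operator. INT32_MIN / -1 is not representable
 * and is outside the scope of this sketch. */
static int32_t div_by_pow2_const(int32_t n, int32_t d)
{
    /* Magnitudes computed in unsigned arithmetic so negation cannot overflow. */
    uint32_t abs_n = n < 0 ? 0u - (uint32_t)n : (uint32_t)n;
    uint32_t abs_d = d < 0 ? 0u - (uint32_t)d : (uint32_t)d;

    /* The divisor magnitude must be a nonzero power of two. */
    assert(abs_d != 0 && (abs_d & (abs_d - 1)) == 0);

    /* log2 of the divisor magnitude. */
    unsigned shift = 0;
    while ((abs_d >> shift) != 1u)
        shift++;

    /* Unsigned quotient of the magnitudes. */
    uint32_t uq = abs_n >> shift;

    /* Negate exactly once, and only when the operand signs differ. */
    int negative = (n < 0) != (d < 0);
    return negative ? (int32_t)(0u - uq) : (int32_t)uq;
}
```

An arithmetic right shift applied directly to a negative `n` would round toward negative infinity instead, so it could never produce zero for small negative numerators.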
Eric Anholt	ef88e23d03	freedreno: Log the number of loops in the shader for shader-db. shader-db's report.py will use this to see when we've changed loop unrolling behavior on a shader and skip including other stats like instruction count from being considered for that shader, since they won't be useful as a proxy for real world performance in that case. Reviewed-by: Rob Clark <robdclark@gmail.com> Tested-by: Eduardo Lima Mitev <elima@igalia.com>	2019-05-16 10:25:22 -07:00
Eric Anholt	c2e68bebb4	freedreno: Output the same shader-db format as v3d and intel. This lets us reuse their report.py, at the expense of fd-report.py no longer working. Reviewed-by: Rob Clark <robdclark@gmail.com> Tested-by: Eduardo Lima Mitev <elima@igalia.com>	2019-05-16 10:25:20 -07:00
Eric Anholt	6d9b45171d	freedreno: Remove the ir3_tgsi_to_nir() helper function. It was more of a hindrance, as it pretended that we could compile in the driver with a missing screen. Reviewed-by: Rob Clark <robdclark@gmail.com> Tested-by: Eduardo Lima Mitev <elima@igalia.com>	2019-05-16 10:25:18 -07:00
Eric Anholt	a0d4d7febf	freedreno: Fix assertion failures in context setup in shader-db mode. The TTN path needs access to the screen to make the right decisions about lowering, but we didn't have pctx->screen set up at fdN_prog_init time. Reviewed-by: Rob Clark <robdclark@gmail.com> Tested-by: Eduardo Lima Mitev <elima@igalia.com>	2019-05-16 10:25:06 -07:00
Marek Olšák	9d1485554c	ac: match radeonsi code in ac_shader_binary_read_config	2019-05-16 13:15:36 -04:00
Marek Olšák	894e017c9c	r600+radeonsi: use ctx_query_reset_status on radeon This allows a nice cleanup, because the winsys always handles it.	2019-05-16 13:15:36 -04:00
Marek Olšák	4549c36788	winsys/radeon: implement ctx_query_reset_status by copying radeonsi To make it behave like amdgpu. I'm just trying to move this out of radeonsi. The radeonsi code will be removed in the next commit.	2019-05-16 13:15:36 -04:00
Marek Olšák	6b3343e5d8	winsys/amdgpu: report a CS rejection as a reset only if there's no GPU reset	2019-05-16 13:15:36 -04:00
Marek Olšák	78e35df52a	radeonsi: update buffer descriptors in all contexts after buffer invalidation Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108824 Cc: 19.1 <mesa-stable@lists.freedesktop.org>	2019-05-16 13:15:36 -04:00
Marek Olšák	0f1b070bad	radeonsi: remove old_va parameter from si_rebind_buffer by remembering offsets This is a prerequisite for the next commit. Cc: 19.1 <mesa-stable@lists.freedesktop.org>	2019-05-16 13:14:55 -04:00
Marek Olšák	f3ae455eb0	radeonsi: compute culling - flush CS to remove write references to buffers Only read-only buffers can use compute culling. Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2019-05-16 13:13:36 -04:00
Marek Olšák	04122532e3	radeonsi: invalidate caches at the beginning of the prim discard compute IB Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2019-05-16 13:13:36 -04:00
Marek Olšák	9f505ce21d	radeonsi: disable primitive restart for triangles for DiRT Rally It may decrease performance and it prevents compute-based primitive culling. Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2019-05-16 13:13:36 -04:00
Marek Olšák	0252fb92b8	radeonsi: add primitive culling stats to the HUD Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2019-05-16 13:13:36 -04:00
Marek Olšák	c9b7a37b8f	radeonsi: cull primitives with async compute for large draw calls Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2019-05-16 13:13:34 -04:00
Marek Olšák	187f1c999f	winsys/amdgpu: add REWIND emulation via INDIRECT_BUFFER into cs_check_space Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2019-05-16 13:10:07 -04:00
Marek Olšák	4eb377d1c3	radeonsi: add si_vs_prolog_bits::unpack_instance_id_from_vertex_id:1 The prim discard compute shader bakes InstanceID into the output index buffer. Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2019-05-16 13:10:07 -04:00
Marek Olšák	b206f007de	radeonsi: make some functions non-static Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2019-05-16 13:10:07 -04:00
Marek Olšák	301344008f	radeonsi: allow si_shader_select_with_key to return an optimized shader or fail If a prim discard compute shader hasn't finished compilation, we don't want to use any shader. Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2019-05-16 13:10:07 -04:00
Marek Olšák	ca9edd7cd0	radeonsi: use pipe_draw_info::instance_count indirectly It will be modified by compute shader culling. Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2019-05-16 13:10:07 -04:00
Marek Olšák	d380fabdbb	radeonsi: use pipe_draw_info::prim and primitive_restart indirectly so that the fields can be changed by the driver. Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2019-05-16 13:10:07 -04:00
Marek Olšák	43aa2f4f7c	radeonsi: make functions for creating LLVM functions non-static Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2019-05-16 13:10:07 -04:00
Marek Olšák	b19884e08e	winsys/amdgpu: add a parallel compute IB coupled with a gfx IB Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2019-05-16 13:07:00 -04:00
Marek Olšák	eda281e977	ac: add LLVM code for triangle culling Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2019-05-16 13:06:58 -04:00
Marek Olšák	07c83d25fd	radeonsi: add a cs parameter into si_cp_copy_data Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2019-05-16 13:06:57 -04:00
Marek Olšák	ce264d19a0	radeonsi: add a cs parameter into si_cp_release_mem Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2019-05-16 13:06:56 -04:00
Marek Olšák	9624855f13	radeonsi: add threadgroups_per_cu param into si_get_compute_resource_limits Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2019-05-16 13:06:54 -04:00
Marek Olšák	6e38af0631	radeonsi: move si_*_descriptors_idx functions into si_state.h Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2019-05-16 13:06:53 -04:00
Marek Olšák	49a016ec5d	radeonsi: make si_initialize_compute reusable Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2019-05-16 13:06:51 -04:00
Marek Olšák	c44c6951d4	radeonsi: extract COMPUTE_RESOURCE_LIMITS code into a helper Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2019-05-16 13:06:49 -04:00
Marek Olšák	c7ceeea093	radeonsi: return the last part's return value from @wrapper The primitive discard compute shader will get the position output this way. Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2019-05-16 13:06:40 -04:00
Marek Olšák	d569b7cb31	winsys/amdgpu: always set NO_CPU_ACCESS and NO_SUBALLOC on GDS resources Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2019-05-16 13:06:18 -04:00
Jan Zielinski	d65b160e6a	swr: clean up supported OGL4.0/4.1 extensions list This commit adjusts the capabilities returned by the SWR driver and the documentation to correctly report the following extensions: GL_ARB_texture_query_lod, GL_ARB_texture_cube_map_array, GL_ARB_gpu_shader_fp64, GL_ARB_texture_gather, GL_ARB_vertex_attrib_64bit. Reviewed-by: Alok Hota <alok.hota@intel.com>	2019-05-16 17:41:14 +02:00
Leo Liu	aa040d3b3c	vl/dri3: set back buffer from output to NULL with front buffer case Since the using output optimization is only for back buffer case Signed-off-by: Leo Liu <leo.liu@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com>	2019-05-16 10:28:38 -04:00
Alejandro Piñeiro	16a1ef7860	docs: advice to resolve discussion on gitlab MR doc For newcomers to gitlab, it is not evident that it is better to press the "Resolve Discussion" button when you update your branch handling feedback. v2: * Fix several grammar nits, reorder, use new corrected text (Connor Abbott) * Use "reviewers", instead of reviewer (Eric Engestrom) Reviewed-by: Connor Abbott <cwabbott0@gmail.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-05-16 16:16:32 +02:00
Roland Scheidegger	4171a26193	auxiliary/draw: fix crash with zero-stride draw auto transform feedback draws get the number of vertices from the transform feedback object. In draw, we'll figure this out with the number of bytes written divided by the stride. However, it is apparently possible we end up with a stride of 0 there (not entirely sure it could happen with GL). Probably when nothing was actually ever written (so we don't actually have a stride set). Just avoid the division by zero by setting the count to 0. Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2019-05-16 14:01:33 +02:00
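The guard described in the commit above amounts to a single check before the division. A minimal sketch, assuming hypothetical names (`vertex_count_from_bytes`, `bytes_written`, `stride`) that are not taken from the draw module:

```c
#include <stddef.h>

/* Hypothetical helper: derive the vertex count of a "draw auto" call from
 * the number of bytes written to the transform feedback buffer. A stride
 * of 0 (nothing was ever written) yields 0 vertices instead of dividing
 * by zero. */
static unsigned vertex_count_from_bytes(size_t bytes_written, unsigned stride)
{
    if (stride == 0)
        return 0;
    return (unsigned)(bytes_written / stride);
}
```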
Eric Engestrom	22c1657d05	util/os_file: always use the 'grow' mechanism Use fstat() only to pre-allocate a big enough buffer. This fixes a race where if the file grows between fstat() and read() we would be missing the end of the file, and if the file slims down read() would just fail. Fixes: `316964709e` "util: add os_read_file() helper" Reported-by: Jason Ekstrand <jason@jlekstrand.net> Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-05-16 12:56:25 +01:00
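The "grow" approach can be sketched with plain POSIX calls. This is an illustration under assumptions, not the actual `os_read_file()` code: the name `read_file_grow()` and the 4096-byte fallback size are hypothetical. `fstat()` only supplies an initial capacity hint, and the loop keeps reading and doubling the buffer until `read()` reports end of file, so a file that grows or shrinks after the `fstat()` call is still read completely.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: returns a malloc'ed, NUL-terminated copy of the file
 * contents (caller frees), or NULL on error. *size_out receives the number
 * of bytes read, excluding the terminator. */
static char *read_file_grow(const char *path, size_t *size_out)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;

    /* fstat() is only a hint for the initial allocation. */
    size_t cap = 4096;
    struct stat st;
    if (fstat(fd, &st) == 0 && st.st_size > 0)
        cap = (size_t)st.st_size + 1;

    char *buf = malloc(cap);
    if (!buf) {
        close(fd);
        return NULL;
    }

    size_t len = 0;
    for (;;) {
        /* Keep one byte free for the terminator; grow when full. */
        if (len + 1 >= cap) {
            size_t new_cap = cap * 2;
            char *tmp = realloc(buf, new_cap);
            if (!tmp) {
                free(buf);
                close(fd);
                return NULL;
            }
            buf = tmp;
            cap = new_cap;
        }

        ssize_t n = read(fd, buf + len, cap - len - 1);
        if (n < 0) {
            if (errno == EINTR)
                continue;      /* interrupted, retry */
            free(buf);
            close(fd);
            return NULL;
        }
        if (n == 0)
            break;             /* end of file: the only exit on success */
        len += (size_t)n;
    }

    close(fd);
    buf[len] = '\0';
    if (size_out)
        *size_out = len;
    return buf;
}
```

Because the loop ends only when `read()` returns 0, the size reported by `fstat()` never limits how much data is returned.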
Lionel Landwerlin	e04cf0b612	nir: lower_non_uniform_access: iterate over instructions safely This pass moves instructions around and adds control-flow in the middle of blocks. We need to use nir_foreach_instr_safe to ensure that we iterate over instructions correctly anyway. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `3bd5457641` ("nir: Add a lowering pass for non-uniform resource access") Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-05-16 10:22:01 +01:00
Kenneth Graunke	752367b766	iris: Dodge more GLSL IR lowering This avoids some lower_instructions bits in st.	2019-05-15 19:44:21 -07:00
Jason Ekstrand	fce0214e94	intel/fs/live_variables: Do compute_start_end in BITSET_WORD chunks For a block with a contiguous chunk of 32 vars that don't need updating, this lets us skip 32 vars at a time. Also, by using bitscan, we only iterate for each set bit rather than testing them all one at a time. Looking at perf (with -O0 which is unfortunately necessary to get reasonable back-traces), this seems to cut about 50-60% of the time spent in compute_start_end() which is itself about 4-6% of the run-time. In the real world, with a release driver build, this cuts 1.34% off a full shader-db run. (I ran shader-db 5 times in each configuration). Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-05-16 02:14:40 +00:00
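The word-at-a-time idea can be sketched in isolation. The code below is an assumption-laden illustration rather than the compiler's code: `word_t`, `lowest_set_bit()` and `collect_set_bits()` are hypothetical names, and the portable loop in `lowest_set_bit()` stands in for a hardware bit-scan instruction. A zero word is skipped with one comparison, and inside a non-zero word only the set bits are visited.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t word_t;   /* stand-in for one 32-bit bitset word */

/* Index of the lowest set bit; w must be non-zero. Real code would use a
 * bit-scan instruction here; this loop is a portable substitute. */
static unsigned lowest_set_bit(word_t w)
{
    unsigned i = 0;
    while ((w & 1u) == 0) {
        w >>= 1;
        i++;
    }
    return i;
}

/* Writes the index of every set bit into out (which must have room for
 * num_words * 32 entries) and returns how many were written. */
static size_t collect_set_bits(const word_t *words, size_t num_words,
                               unsigned *out)
{
    size_t count = 0;
    for (size_t w = 0; w < num_words; w++) {
        word_t bits = words[w];
        /* A zero word skips 32 variables with a single test. */
        while (bits != 0) {
            out[count++] = (unsigned)(w * 32) + lowest_set_bit(bits);
            bits &= bits - 1;   /* clear the lowest set bit */
        }
    }
    return count;
}
```

The inner loop runs once per set bit, so sparse bitsets cost far fewer iterations than testing all 32 positions of every word.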
Jason Ekstrand	b2d274c677	intel/fs/ra: Choose a spill reg before throwing away the graph Otherwise, we get an effectively random spill reg because we no longer have the information from RA to guide us. Also, a completely clean graph has undefined data in in_stack which is used for choosing the spill reg so it really is non-deterministic. Fixes: `e99081e76d` "intel/fs/ra: Spill without destroying the..." Tested-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-05-16 02:13:09 +00:00
Jason Ekstrand	c19acf321c	intel/fs/ra: Add spill costs to the graph on-demand Tested-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-05-16 02:13:09 +00:00
Jason Ekstrand	2c14e2b5bf	intel/fs/ra: Add a helper for discarding the interference graph Tested-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-05-16 02:13:09 +00:00
Alyssa Rosenzweig	46494c3dc1	nir/algebraic: Remove problematic "optimization" This line is no longer relevant now that booleans are 1-bit, and in fact causes issues (infinite progress loop between algebraic optimizations and copy prop) with constant vector masks. No shader-db changes on Intel platforms (Jason). Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>	2019-05-16 02:08:37 +00:00
Alyssa Rosenzweig	74ab80b92d	panfrost/midgard: Add load/store opcodes This commit adds a bunch of new load/store opcodes, largely related to OpenCL, as well as adjusting the name of existing opcodes to be more uniform. The immediate effect is compute shaders are substantially easier to interpret now. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-05-16 01:25:25 +00:00
Alyssa Rosenzweig	f73c0b73ec	panfrost/midgard: Enable integer constant inlining Midgard ALU features two types of constants: embedded constants (128-bit chunk, zero/one per schedule bundle) and inline constants (16-bit splattered into the op, second source if present). Inline constants are much more efficient from a space and scheduling freedom standpoint, so it's desirable to inline when possible. Now that integer ops are well understood and in use, we enable inlining of integer constants in addition to floats (which have been inlined since forever). Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-05-16 01:20:41 +00:00
Alyssa Rosenzweig	8214aaa3c8	panfrost/midgard: Remove imov workaround The previous commit fixes the issue this patched around. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-05-16 01:20:41 +00:00
Alyssa Rosenzweig	0a13babdd8	panfrost/midgard: Set int outmod for ops writing integers By default, the "normal" output modifier is set on ALU ops. This is the correct default for float outputs -- for floats, it preserves the semantic value. Unfortunately, when used with integers, it does not preserve the bitstream encoding, causing misbehaviour. (It's an open question what happens when `normal` is used with integers -- does it apply some other transformation? or does it do floating point normalization/etc on the ints as if they were floats?). Instead, we default to the "clamp to integer" output modifier for ops writing integers. Semantically, this makes sense (clamping an integer to the nearest integer is the identity function). In the hardware with an integer opcode, this is the actual "normal". This fixes numerous sporadic and sometimes bizarre bugs relating to integers, especially integer moves. With this in place, we no longer care about the types involved; it's just bits on the wire again. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-05-16 01:20:30 +00:00
Alyssa Rosenzweig	81b1053d9b	panfrost: Set custom stride for textures when necessary From Gallium (and our) perspective, the stride of a BO is arbitrary. For internal buffers, we can make it something nice, but for imported linear buffers (e.g. EGL clients), we don't always have that luxury. To cope, we calculate the expected stride of a texture, compare it to the BO's actual reported stride, and if they differ, set the latter as a custom stride. Fixes rendering of windows not on tile boundaries (noticeable in Weston with es2gears_wayland, for instance). Also, this should fix stride issues with buffer reloading. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-05-16 01:16:36 +00:00
Alyssa Rosenzweig	cea9352059	panfrost/decode: Stride decoding With a special flag, texture descriptors can include custom stride(s). We haven't seen a case of this used for mipmaps/cubemaps, so it's not clear how that will be encoded, but this dumps correctly for single one-level 2D textures. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-05-16 01:15:37 +00:00
Alyssa Rosenzweig	d699ffbf0e	panfrost/decode: Futureproof texture dumping One field was not dumped for some reason. It's observed to be 0, but it's still good to have it available. Also, extra fields might be snuck in the bitmaps array (it's variable-lengthed at the end), and we want to guard against that possibility, so we dump a little more. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-05-16 01:15:37 +00:00
Marek Olšák	ccfcb9d818	ac: rename SI-CIK-VI to GFX6-GFX7-GFX8 Acked-by: Dave Airlie <airlied@redhat.com> We already use GFX9 and I don't want us to have confusing naming in the driver. GFXn naming is better from the driver perspective, because it's the real version of the gfx portion of the hw. Also, CIK means Bonaire-Kaveri-Kabini, it doesn't mean CI. It shouldn't confuse our SDMA, UVD, VCE etc. code much. Those have nothing to do with GFXn and they have their own version numbers.	2019-05-15 20:54:10 -04:00
Marek Olšák	e5cc363f43	ac: add comments to chip enums Reviewed-by: Alex Deucher <alexander.deucher@amd.com> (except GFX2 changes) Reviewed-by: Dave Airlie <airlied@redhat.com> (except <= GFX5 changes)	2019-05-15 20:54:10 -04:00
Anuj Phogat	a42163cbbc	compiler: Add lowering support for 64-bit saturate operations to software Fixes 7 Khronos GL CTS tests: KHR-GL45.gpu_shader_fp64.builtin.smoothstep_dvec{double, 2, 3, 4} KHR-GL45.gpu_shader_fp64.builtin.smoothstep_against_scalar_dvec{2, 3, 4} Suggested-by: Jason Ekstrand <jason@jlekstrand.net> Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-05-15 23:30:30 +00:00
Kenneth Graunke	d305409db5	st/dri: Minor style fixes Trivial.	2019-05-15 14:49:14 -07:00
Chia-I Wu	659c5800e5	virgl: handle DONT_BLOCK and MAP_DIRECTLY Handle PIPE_TRANSFER_DONT_BLOCK and PIPE_TRANSFER_MAP_DIRECTLY. Make virgl_resource_transfer_prepare return an enum instead of a bool for extensibility (e.g., instruct the callers to map differently). Signed-off-by: Chia-I Wu <olvaffe@gmail.com> Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>	2019-05-15 20:51:28 +00:00
Chia-I Wu	e87186fc67	virgl: add virgl_resource_transfer_prepare virgl_resource_transfer_prepare should be called before mapping to prepare the resource. It does flush, readback, and wait as needed. virgl_res_needs_flush and virgl_res_needs_readback become internal helpers to the new function. There should be no externally visible change. Signed-off-by: Chia-I Wu <olvaffe@gmail.com> Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>	2019-05-15 20:51:28 +00:00
Chia-I Wu	cdcf38b98a	virgl: honor DISCARD_WHOLE_RESOURCE in virgl_res_needs_readback Signed-off-by: Chia-I Wu <olvaffe@gmail.com> Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>	2019-05-15 20:51:28 +00:00
Chia-I Wu	a62ab178ce	virgl: clean up virgl_res_needs_readback Add comments and follow the coding style of virgl_res_needs_flush. Signed-off-by: Chia-I Wu <olvaffe@gmail.com> Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>	2019-05-15 20:51:28 +00:00
Lionel Landwerlin	391a836e8f	nir: fix lower_non_uniform_access pass Obviously missing the instruction insertion into the SSA list. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `3bd5457641` ("nir: Add a lowering pass for non-uniform resource access") Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-05-15 18:15:20 +00:00
Alex Villacís Lasso	b2200514af	gbm: gbm_bo_get_handle_for_plane fallback to nonplanar handle Commit `f9567ab435` (gbm: Export a getter for per plane handles) contains an API version check that fails on i915 (API version 7 vs. check for minimum API version 13). Any client that migrates to the planar API will start failing on i915 (see https://gitlab.gnome.org/GNOME/mutter/issues/127 for mutter, and https://bugs.freedesktop.org/show_bug.cgi?id=108487 for weston). This commit adds a fallback for plane 0 when the API check fails and returns the non-planar handle in this scenario, making the call equivalent to gbm_bo_get_handle(). This is enough for weston 6.0.0 to start working again on an i915 system. Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=108487 Signed-off-by: Alex Villacís Lasso <a_villacis@palosanto.com> Reviewed-by: Daniel Stone <daniels@collabora.com>	2019-05-15 18:27:30 +01:00
Alyssa Rosenzweig	a9cef4f0e5	gallium: Add default check for PIPE_CAP_FRAGMENT_SHADER_INTERLOCK Fixes: `c704c0226` ("gallium: Add a PIPE_CAP_FRAGMENT_SHADER_INTERLOCK") Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-05-14 21:34:49 -07:00
Andrii Kryvytskyi	eca53f00aa	iris: Check if resource has stencil before returning it Signed-off-by: Andrii Kryvytskyi <andrii.o.kryvytskyi@globallogic.com> Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-05-14 21:16:11 -07:00
Jordan Justen	49958c4b5d	i965/blorp: Set MOCS for gen11 in blorp_alloc_vertex_buffer v2: * Add build error for gen > 6 if MOCS is not set. (Lionel) Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>	2019-05-14 19:57:01 -07:00
Kenneth Graunke	bb5db02bab	iris: Enable fragment shader interlock on Gen9+. There's some debate about whether we should support this on older hardware as well. Currently i965 turns it off on Gen8- though, so we follow suit. If this changes, we can update this as well. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-05-14 19:34:33 -07:00
Kenneth Graunke	c704c0226c	gallium: Add a PIPE_CAP_FRAGMENT_SHADER_INTERLOCK. Corresponding to GL_ARB_fragment_shader_interlock and GL_NV_fragment_shader_interlock. Currently, only the NIR paths support this functionality, but someone could conceivably add it to TGSI too. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-05-14 19:34:29 -07:00
Dave Airlie	4efd04ab18	intel/compiler: use bitset instead of open-coding a 32-bit bitset. (v2) In the future I want to expand this to 128-bits, for vec16 support, so let's just put the code in place to use bitset ranges now. v2: just declare the bitset to be the max of what we should ever see and change assert to reflect it. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-05-15 07:10:34 +10:00
Dave Airlie	3b2c433167	intel/compiler: remove repeated bit_size / 8 in brw mem lowering pass. Just use a variable already. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-05-15 07:10:30 +10:00
Kenneth Graunke	646924cfa1	intel/compiler: Implement TCS 8_PATCH mode and INTEL_DEBUG=tcs8 Our tessellation control shaders can be dispatched in several modes. - SINGLE_PATCH (Gen7+) processes a single patch per thread, with each channel corresponding to a different patch vertex. PATCHLIST_N will launch (N / 8) threads. If N is less than 8, some channels will be disabled, leaving some untapped hardware capabilities. Conditionals based on gl_InvocationID are non-uniform, which means that they'll often have to execute both paths. However, if there are fewer than 8 vertices, all invocations will happen within a single thread, so barriers can become no-ops, which is nice. We also burn a maximum of 4 registers for ICP handles, so we can compile without regard for the value of N. It also works in all cases. - DUAL_PATCH mode processes up to two patches at a time, where the first four channels come from patch 1, and the second group of four come from patch 2. This tries to provide better EU utilization for small patches (N <= 4). It cannot be used in all cases. - 8_PATCH mode processes 8 patches at a time, with a thread launched per vertex in the patch. Each channel corresponds to the same vertex, but in each of the 8 patches. This utilizes all channels even for small patches. It also makes conditions on gl_InvocationID uniform, leading to proper jumps. Barriers, unfortunately, become real. Worse, for PATCHLIST_N, the thread payload burns N registers for ICP handles. This can burn up to 32 registers, or 1/4 of our register file, for URB handles. For Vulkan (and DX), we know the number of vertices at compile time, so we can limit the amount of waste. In GL, the patch dimension is dynamic state, so we either would have to waste all 32 (not reasonable) or guess (badly) and recompile. This is unfortunate. Because we can only spawn 16 thread instances, we can only use this mode for PATCHLIST_16 and smaller. The rest must use SINGLE_PATCH. This patch implements the new 8_PATCH TCS mode, but leaves us using SINGLE_PATCH by default. A new INTEL_DEBUG=tcs8 flag will switch to using 8_PATCH mode for testing and benchmarking purposes. We may want to consider using 8_PATCH mode in Vulkan in some cases. The data I've seen shows that 8_PATCH mode can be more efficient in some cases, but SINGLE_PATCH mode (the one we use today) is faster in other cases. Ultimately, the TES matters much more than the TCS for performance, so the decision may not matter much. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-05-14 13:16:30 -07:00
Kenneth Graunke	076159b40b	intel/compiler: Move ICP handle fetching into a helper function. This will be significantly different in 8_PATCH mode. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-05-14 13:16:28 -07:00
Kenneth Graunke	3d84fd29e8	intel/compiler: Don't repeat dispatch max fixing condition Having a single flag will keep both places in sync if the condition gets more complicated. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-05-14 13:16:27 -07:00
Kenneth Graunke	f0d52cf2b0	intel/compiler: Rename invocation_id_mask to instance_id_mask The payload field is actually "instance" (thread number), which is used to calculate the invocation ID. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-05-14 13:16:25 -07:00
Kenneth Graunke	d86260719e	intel/compiler: Refactor TCS invocation ID setup into a helper When we add 8_PATCH mode, this will get a bit more complex, so we may as well start by putting it in a helper function. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-05-14 13:16:24 -07:00
Kenneth Graunke	381c2aded2	i965: Pass compiler to default key populators This lets us get devinfo and other misc. compiler settings. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-05-14 13:16:21 -07:00
Marek Olšák	6b0b8f132a	ac: use 1D GEPs for descriptors and constants just a cleanup Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-05-14 15:15:11 -04:00
Marek Olšák	67b4785958	mesa: fix _mesa_max_texture_levels for GL_TEXTURE_EXTERNAL_OES This helps fix: piglit/bin/ext_image_dma_buf_import-sample_yuv -fmt=NV12 -auto Fixes: `d88f3392ff` Reviewed-by: Eric Anholt <eric@anholt.net>	2019-05-14 15:15:11 -04:00
Eric Anholt	e5db87b00b	freedreno: Restore msm_drm.h to a pristine "make headers_install" copy. This diverged back in `f1374805a8` ("drm-uapi: use local files, not system libdrm") to point at drm-uapi's copy, which we don't need now that we're actually in drm-uapi. Reviewed-by: Rob Clark <robdclark@gmail.com>	2019-05-14 11:51:57 -07:00
Eric Anholt	18d11cb4dc	freedreno: Move msm_drm.h to the same spot as other DRM uapi. The new location matches other drivers, and has a README about the rules for updating it. Reviewed-by: Rob Clark <robdclark@gmail.com>	2019-05-14 11:51:55 -07:00
Ian Romanick	32d259713b	nir/algebraic: Commute 1-fsat(a) to fsat(1-a) for all non-fmul instructions The goal is to avoid having an extra MOV instruction to perform the saturate. Doing the subtraction first allows the saturate to be applied to the ADD instruction making the MOV unnecessary. Values generated in a different block and values from non-ALU instructions (e.g., texture instructions) almost always need the extra MOV. Multiply instructions are restricted because doing this rearrangement can interfere with the generation of flrp and ffma instructions. v2: Now that the final method has been selected, squash three commits into one. All Intel platforms had similar results. (Ice Lake shown) total instructions in shared programs: 17223214 -> 17219386 (-0.02%) instructions in affected programs: 1524376 -> 1520548 (-0.25%) helped: 2686 HURT: 26 helped stats (abs) min: 1 max: 32 x̄: 1.44 x̃: 1 helped stats (rel) min: 0.03% max: 16.67% x̄: 0.54% x̃: 0.37% HURT stats (abs) min: 1 max: 2 x̄: 1.69 x̃: 2 HURT stats (rel) min: 0.33% max: 1.67% x̄: 0.54% x̃: 0.35% 95% mean confidence interval for instructions value: -1.46 -1.36 95% mean confidence interval for instructions %-change: -0.56% -0.50% Instructions are helped. total cycles in shared programs: 360811571 -> 360791896 (<.01%) cycles in affected programs: 103650214 -> 103630539 (-0.02%) helped: 1557 HURT: 675 helped stats (abs) min: 1 max: 1773 x̄: 41.44 x̃: 16 helped stats (rel) min: <.01% max: 26.77% x̄: 1.37% x̃: 0.64% HURT stats (abs) min: 1 max: 1513 x̄: 66.44 x̃: 14 HURT stats (rel) min: <.01% max: 46.16% x̄: 2.00% x̃: 0.49% 95% mean confidence interval for cycles value: -14.82 -2.81 95% mean confidence interval for cycles %-change: -0.50% -0.20% Cycles are helped. LOST: 2 GAINED: 0 Reviewed-by: Matt Turner <mattst88@gmail.com> [v1] Reviewed-by: Thomas Helland <thomashelland90@gmail.com>	2019-05-14 11:38:23 -07:00
Ian Romanick	a7f0c57673	nir/algebraic: Eliminate useless fsat() on operand of comparison w/value in (0, 1) v2: Fix copy-and-paste bug in a cmp b vs b cmp a cases. All Gen7+ platforms had similar results. (Ice Lake shown) total instructions in shared programs: 17224337 -> 17224269 (<.01%) instructions in affected programs: 13578 -> 13510 (-0.50%) helped: 68 HURT: 0 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 0.31% max: 3.12% x̄: 0.84% x̃: 0.42% 95% mean confidence interval for instructions value: -1.00 -1.00 95% mean confidence interval for instructions %-change: -1.05% -0.63% Instructions are helped. total cycles in shared programs: 360826090 -> 360825137 (<.01%) cycles in affected programs: 94867 -> 93914 (-1.00%) helped: 58 HURT: 1 helped stats (abs) min: 2 max: 28 x̄: 17.74 x̃: 18 helped stats (rel) min: 0.08% max: 3.17% x̄: 1.39% x̃: 1.22% HURT stats (abs) min: 76 max: 76 x̄: 76.00 x̃: 76 HURT stats (rel) min: 2.86% max: 2.86% x̄: 2.86% x̃: 2.86% 95% mean confidence interval for cycles value: -19.53 -12.78 95% mean confidence interval for cycles %-change: -1.56% -1.08% Cycles are helped. No changes on any other Intel platform. Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Thomas Helland <thomashelland90@gmail.com>	2019-05-14 11:38:23 -07:00
Ian Romanick	281f20e26d	nir/algebraic: Strip double negatives from comparison sources All Intel platforms had similar results. (Ice Lake shown) total instructions in shared programs: 17224623 -> 17224337 (<.01%) instructions in affected programs: 32648 -> 32362 (-0.88%) helped: 148 HURT: 0 helped stats (abs) min: 1 max: 2 x̄: 1.93 x̃: 2 helped stats (rel) min: 0.16% max: 2.74% x̄: 1.07% x̃: 1.08% 95% mean confidence interval for instructions value: -1.97 -1.89 95% mean confidence interval for instructions %-change: -1.15% -1.00% Instructions are helped. total cycles in shared programs: 360828714 -> 360826090 (<.01%) cycles in affected programs: 347416 -> 344792 (-0.76%) helped: 148 HURT: 26 helped stats (abs) min: 1 max: 426 x̄: 26.33 x̃: 18 helped stats (rel) min: 0.03% max: 15.10% x̄: 1.78% x̃: 1.41% HURT stats (abs) min: 2 max: 337 x̄: 48.96 x̃: 6 HURT stats (rel) min: 0.04% max: 18.82% x̄: 2.15% x̃: 0.27% 95% mean confidence interval for cycles value: -23.78 -6.38 95% mean confidence interval for cycles %-change: -1.59% -0.79% Cycles are helped. Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Thomas Helland <thomashelland90@gmail.com>	2019-05-14 11:38:22 -07:00
Ian Romanick	45c7ff95fc	intel/compiler: Repeat nir_opt_algebraic_late A tiny bit of help seems to come from nir_copy_prop. Future patches will benefit from this change. Doing more copy propagation on the vec4 backend led to a disaster in hurt cycles. v2: Fix typo in comment. Noticed by Matt. All Gen8+ platforms had similar results. (Ice Lake shown) total instructions in shared programs: 17224634 -> 17224623 (<.01%) instructions in affected programs: 4586 -> 4575 (-0.24%) helped: 11 HURT: 0 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 0.19% max: 0.53% x̄: 0.27% x̃: 0.23% 95% mean confidence interval for instructions value: -1.00 -1.00 95% mean confidence interval for instructions %-change: -0.36% -0.19% Instructions are helped. total cycles in shared programs: 360828542 -> 360828714 (<.01%) cycles in affected programs: 151159 -> 151331 (0.11%) helped: 49 HURT: 28 helped stats (abs) min: 1 max: 254 x̄: 26.41 x̃: 6 helped stats (rel) min: 0.06% max: 12.02% x̄: 1.34% x̃: 0.42% HURT stats (abs) min: 1 max: 196 x̄: 52.36 x̃: 15 HURT stats (rel) min: 0.05% max: 10.74% x̄: 2.55% x̃: 0.88% 95% mean confidence interval for cycles value: -13.48 17.95 95% mean confidence interval for cycles %-change: -0.69% 0.84% Inconclusive result (value mean confidence interval includes 0). Haswell, Ivy Bridge, and Sandy Bridge had similar results. 
(Haswell shown) total instructions in shared programs: 13529544 -> 13529542 (<.01%) instructions in affected programs: 358 -> 356 (-0.56%) helped: 2 HURT: 0 total cycles in shared programs: 357290311 -> 357289678 (<.01%) cycles in affected programs: 178324 -> 177691 (-0.35%) helped: 48 HURT: 40 helped stats (abs) min: 1 max: 201 x̄: 31.52 x̃: 13 helped stats (rel) min: 0.06% max: 10.92% x̄: 1.71% x̃: 0.66% HURT stats (abs) min: 1 max: 224 x̄: 22.00 x̃: 6 HURT stats (rel) min: 0.05% max: 15.84% x̄: 1.29% x̃: 0.31% 95% mean confidence interval for cycles value: -18.28 3.89 95% mean confidence interval for cycles %-change: -1.01% 0.32% Inconclusive result (value mean confidence interval includes 0). Iron Lake and GM45 had similar results. (Iron Lake shown) total instructions in shared programs: 8159110 -> 8158980 (<.01%) instructions in affected programs: 22719 -> 22589 (-0.57%) helped: 65 HURT: 0 helped stats (abs) min: 1 max: 3 x̄: 2.00 x̃: 2 helped stats (rel) min: 0.07% max: 1.05% x̄: 0.73% x̃: 0.74% 95% mean confidence interval for instructions value: -2.06 -1.94 95% mean confidence interval for instructions %-change: -0.78% -0.68% Instructions are helped. total cycles in shared programs: 188609448 -> 188609214 (<.01%) cycles in affected programs: 1875852 -> 1875618 (-0.01%) helped: 109 HURT: 104 helped stats (abs) min: 2 max: 46 x̄: 5.30 x̃: 4 helped stats (rel) min: 0.02% max: 0.90% x̄: 0.09% x̃: 0.07% HURT stats (abs) min: 2 max: 20 x̄: 3.31 x̃: 2 HURT stats (rel) min: 0.01% max: 0.26% x̄: 0.04% x̃: 0.02% 95% mean confidence interval for cycles value: -1.95 -0.25 95% mean confidence interval for cycles %-change: -0.04% -0.01% Cycles are helped. Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-05-14 11:38:22 -07:00
Ian Romanick	d2a9ba03e3	Revert "nir: add late opt to turn inot/b2f combos back to bcsel" This reverts commit `7acc865226`. With these optimizations in place, the extra constant folding added in the next commit extends some live ranges of 0.0 and ±1.0 constants, and that causes several hundred shaders to have more spills and fills. I believe this optimization was made basically irrelevant by `7725d60938` "intel/fs: Emit better code for b2f(inot(a)) and b2i(inot(a))". All Gen7.5+ platforms had similar results. (Ice Lake shown) total instructions in shared programs: 17225303 -> 17224634 (<.01%) instructions in affected programs: 879402 -> 878733 (-0.08%) helped: 679 HURT: 1 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 0.03% max: 0.93% x̄: 0.24% x̃: 0.05% HURT stats (abs) min: 10 max: 10 x̄: 10.00 x̃: 10 HURT stats (rel) min: 0.45% max: 0.45% x̄: 0.45% x̃: 0.45% 95% mean confidence interval for instructions value: -1.02 -0.95 95% mean confidence interval for instructions %-change: -0.26% -0.22% Instructions are helped. total cycles in shared programs: 360842595 -> 360828542 (<.01%) cycles in affected programs: 110443594 -> 110429541 (-0.01%) helped: 389 HURT: 265 helped stats (abs) min: 1 max: 7525 x̄: 162.81 x̃: 28 helped stats (rel) min: <.01% max: 18.66% x̄: 1.11% x̃: 0.11% HURT stats (abs) min: 1 max: 7614 x̄: 185.96 x̃: 48 HURT stats (rel) min: <.01% max: 25.08% x̄: 0.95% x̃: 0.10% 95% mean confidence interval for cycles value: -75.65 32.67 95% mean confidence interval for cycles %-change: -0.49% -0.06% Inconclusive result (value mean confidence interval includes 0).
total spills in shared programs: 12159 -> 12161 (0.02%) spills in affected programs: 13 -> 15 (15.38%) helped: 0 HURT: 1 total fills in shared programs: 25207 -> 25208 (<.01%) fills in affected programs: 25 -> 26 (4.00%) helped: 0 HURT: 1 Ivy Bridge total instructions in shared programs: 12082019 -> 12082013 (<.01%) instructions in affected programs: 1033 -> 1027 (-0.58%) helped: 6 HURT: 0 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 0.41% max: 0.83% x̄: 0.61% x̃: 0.59% 95% mean confidence interval for instructions value: -1.00 -1.00 95% mean confidence interval for instructions %-change: -0.78% -0.45% Instructions are helped. total cycles in shared programs: 179849270 -> 179849157 (<.01%) cycles in affected programs: 4735 -> 4622 (-2.39%) helped: 4 HURT: 0 helped stats (abs) min: 2 max: 74 x̄: 28.25 x̃: 18 helped stats (rel) min: 0.13% max: 6.53% x̄: 2.85% x̃: 2.36% 95% mean confidence interval for cycles value: -82.73 26.23 95% mean confidence interval for cycles %-change: -7.98% 2.28% Inconclusive result (value mean confidence interval includes 0). Sandy Bridge total instructions in shared programs: 10882750 -> 10882748 (<.01%) instructions in affected programs: 266 -> 264 (-0.75%) helped: 2 HURT: 0 Iron Lake total cycles in shared programs: 188609440 -> 188609448 (<.01%) cycles in affected programs: 4320 -> 4328 (0.19%) helped: 0 HURT: 2 GM45 total cycles in shared programs: 129016868 -> 129016872 (<.01%) cycles in affected programs: 2302 -> 2306 (0.17%) helped: 0 HURT: 1 Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-05-14 11:38:22 -07:00
Ian Romanick	3cb091f8b4	nir/algebraic: Eliminate a tautological compare The value-range tracking pass that is coming is not clever enough to know that the result of the ffma must be non-negative. Making it that smart will require quite a bit of work. It might be possible to add a special case that detects that a whole tree of fadd(fmul(fsat(a), fneg(fsat(a))), 1.0) cannot be negative. For cases when the comparison is used in the domain guard for a square-root (see nir/algebraic: Simplify fsqrt domain guard), the compare may be converted to a fmax. This patch also handles that case. All of the affected cases are in DiRT: Showdown. All Gen7+ platforms had similar results. (Ice Lake shown) total instructions in shared programs: 17225365 -> 17225303 (<.01%) instructions in affected programs: 40051 -> 39989 (-0.15%) helped: 62 HURT: 0 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 0.07% max: 0.66% x̄: 0.27% x̃: 0.26% 95% mean confidence interval for instructions value: -1.00 -1.00 95% mean confidence interval for instructions %-change: -0.31% -0.22% Instructions are helped. total cycles in shared programs: 360842788 -> 360842595 (<.01%) cycles in affected programs: 1818081 -> 1817888 (-0.01%) helped: 29 HURT: 22 helped stats (abs) min: 1 max: 206 x̄: 20.66 x̃: 14 helped stats (rel) min: <.01% max: 9.55% x̄: 0.87% x̃: 0.42% HURT stats (abs) min: 1 max: 108 x̄: 18.45 x̃: 7 HURT stats (rel) min: <.01% max: 4.48% x̄: 0.56% x̃: 0.19% 95% mean confidence interval for cycles value: -14.48 6.91 95% mean confidence interval for cycles %-change: -0.71% 0.21% Inconclusive result (value mean confidence interval includes 0). No changes on any other Intel platform. Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Thomas Helland <thomashelland90@gmail.com>	2019-05-14 11:38:22 -07:00
Ian Romanick	9725e45b3d	nir/algebraic: Simplify fsqrt domain guard All Gen7+ platforms had similar results. (Ice Lake shown) total instructions in shared programs: 17228376 -> 17225365 (-0.02%) instructions in affected programs: 280732 -> 277721 (-1.07%) helped: 1072 HURT: 0 helped stats (abs) min: 1 max: 12 x̄: 2.81 x̃: 2 helped stats (rel) min: 0.16% max: 5.10% x̄: 1.43% x̃: 1.07% 95% mean confidence interval for instructions value: -2.92 -2.70 95% mean confidence interval for instructions %-change: -1.48% -1.37% Instructions are helped. total cycles in shared programs: 360935690 -> 360842788 (-0.03%) cycles in affected programs: 7838017 -> 7745115 (-1.19%) helped: 1569 HURT: 69 helped stats (abs) min: 1 max: 1198 x̄: 63.53 x̃: 20 helped stats (rel) min: 0.06% max: 26.17% x̄: 3.44% x̃: 2.12% HURT stats (abs) min: 1 max: 2820 x̄: 98.22 x̃: 47 HURT stats (rel) min: 0.05% max: 16.67% x̄: 3.50% x̃: 2.31% 95% mean confidence interval for cycles value: -63.55 -49.89 95% mean confidence interval for cycles %-change: -3.33% -2.96% Cycles are helped. No changes on any other platform. Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Thomas Helland <thomashelland90@gmail.com>	2019-05-14 11:38:22 -07:00
Ian Romanick	e2ad047779	nir/search: Don't compare 8-bit or 1-bit constants with floats Without this, adding an algebraic rule like (('bcsel', ('flt', a, 0.0), 0.0, ...), ...), will cause assertion failures inside nir_src_comp_as_float in GTF-GL46.gtf21.GL.lessThan.lessThan_vec3_frag (and related tests) from the OpenGL CTS and shaders/closed/steam/witcher-2/511.shader_test from shader-db. All of these cases have some code that ends up like ('bcsel', ('flt', a, 0.0), 'b@1', ...) When the 'b@1' is tested, nir_src_comp_as_float fails because there's no such thing as a 1-bit float. Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Thomas Helland <thomashelland90@gmail.com>	2019-05-14 11:38:22 -07:00
Ian Romanick	5116646a76	nir/algebraic: Recognize open-coded fsat with modifiers This change also enables a later change (nir/algebraic: Replace 1-fsat(a) with fsat(1-a)) to affect more shaders. Almost all of the affected shaders are in Bioshock Infinite, and all of those shaders require GLSL 4.10. All Intel platforms had similar results. (Ice Lake shown) total instructions in shared programs: 17228584 -> 17228376 (<.01%) instructions in affected programs: 31438 -> 31230 (-0.66%) helped: 105 HURT: 0 helped stats (abs) min: 1 max: 5 x̄: 1.98 x̃: 1 helped stats (rel) min: 0.08% max: 1.53% x̄: 0.73% x̃: 0.70% 95% mean confidence interval for instructions value: -2.20 -1.76 95% mean confidence interval for instructions %-change: -0.80% -0.67% Instructions are helped. total cycles in shared programs: 360936431 -> 360935690 (<.01%) cycles in affected programs: 420100 -> 419359 (-0.18%) helped: 71 HURT: 21 helped stats (abs) min: 1 max: 160 x̄: 19.28 x̃: 10 helped stats (rel) min: <.01% max: 9.78% x̄: 0.95% x̃: 0.48% HURT stats (abs) min: 1 max: 198 x̄: 29.90 x̃: 10 HURT stats (rel) min: 0.05% max: 8.36% x̄: 1.24% x̃: 0.90% 95% mean confidence interval for cycles value: -16.77 0.66 95% mean confidence interval for cycles %-change: -0.85% -0.06% Inconclusive result (value mean confidence interval includes 0). Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Thomas Helland <thomashelland90@gmail.com>	2019-05-14 11:38:22 -07:00
Ian Romanick	c769641c8e	nir/algebraic: Push unary operations into source operands of fsat source Pushing a unary operation, like fneg, into the operation that generates its operand allows the fsat to be applied to the inner instruction instead of on a separate instruction that performs the unary operation. This changes fmul ssa_100, ssa_99, ssa_98 fmov.sat ssa_101, -ssa_100 into fmul.sat ssa_100, -ssa_99, ssa_98 Ice Lake, Skylake, and Broadwell had similar results. (Ice Lake shown) total instructions in shared programs: 17228658 -> 17228584 (<.01%) instructions in affected programs: 3163 -> 3089 (-2.34%) helped: 49 HURT: 0 helped stats (abs) min: 1 max: 2 x̄: 1.51 x̃: 2 helped stats (rel) min: 0.58% max: 9.09% x̄: 3.69% x̃: 3.51% 95% mean confidence interval for instructions value: -1.66 -1.37 95% mean confidence interval for instructions %-change: -4.37% -3.00% Instructions are helped. total cycles in shared programs: 360937144 -> 360936431 (<.01%) cycles in affected programs: 24029 -> 23316 (-2.97%) helped: 47 HURT: 2 helped stats (abs) min: 4 max: 18 x̄: 15.34 x̃: 16 helped stats (rel) min: 0.69% max: 6.18% x̄: 3.78% x̃: 4.27% HURT stats (abs) min: 4 max: 4 x̄: 4.00 x̃: 4 HURT stats (rel) min: 0.34% max: 0.67% x̄: 0.50% x̃: 0.50% 95% mean confidence interval for cycles value: -16.05 -13.05 95% mean confidence interval for cycles %-change: -4.07% -3.15% Cycles are helped. All Gen7 and earlier platforms had similar results. (Haswell shown) total instructions in shared programs: 13536059 -> 13535884 (<.01%) instructions in affected programs: 8797 -> 8622 (-1.99%) helped: 150 HURT: 0 helped stats (abs) min: 1 max: 2 x̄: 1.17 x̃: 1 helped stats (rel) min: 0.40% max: 11.11% x̄: 3.51% x̃: 1.96% 95% mean confidence interval for instructions value: -1.23 -1.11 95% mean confidence interval for instructions %-change: -3.97% -3.05% Instructions are helped. 
total cycles in shared programs: 357696119 -> 357694193 (<.01%) cycles in affected programs: 50216 -> 48290 (-3.84%) helped: 109 HURT: 14 helped stats (abs) min: 2 max: 92 x̄: 18.97 x̃: 16 helped stats (rel) min: 0.26% max: 19.09% x̄: 7.37% x̃: 5.37% HURT stats (abs) min: 2 max: 26 x̄: 10.14 x̃: 5 HURT stats (rel) min: 0.18% max: 4.73% x̄: 1.84% x̃: 0.92% 95% mean confidence interval for cycles value: -19.27 -12.05 95% mean confidence interval for cycles %-change: -7.34% -5.31% Cycles are helped. Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-05-14 11:38:22 -07:00
Ian Romanick	3b74790941	nir/algebraic: Recognize open-coded flrp(a, b, fsat(c)) All Gen6+ GPUs had similar results. (Skylake shown) total instructions in shared programs: 15336712 -> 15336622 (<.01%) instructions in affected programs: 3952 -> 3862 (-2.28%) helped: 24 HURT: 0 helped stats (abs) min: 3 max: 5 x̄: 3.75 x̃: 4 helped stats (rel) min: 1.75% max: 2.70% x̄: 2.34% x̃: 2.46% 95% mean confidence interval for instructions value: -4.06 -3.44 95% mean confidence interval for instructions %-change: -2.47% -2.22% Instructions are helped. total cycles in shared programs: 355722052 -> 355721235 (<.01%) cycles in affected programs: 27326 -> 26509 (-2.99%) helped: 20 HURT: 4 helped stats (abs) min: 1 max: 227 x̄: 44.75 x̃: 14 helped stats (rel) min: 0.12% max: 22.95% x̄: 3.83% x̃: 1.23% HURT stats (abs) min: 2 max: 64 x̄: 19.50 x̃: 6 HURT stats (rel) min: 0.21% max: 3.63% x̄: 1.24% x̃: 0.55% 95% mean confidence interval for cycles value: -61.61 -6.47 95% mean confidence interval for cycles %-change: -5.59% -0.39% Cycles are helped. No changes on Ice Lake, Iron Lake, or GM45. Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-05-14 11:38:21 -07:00
Ian Romanick	a79570099b	intel/fs: Allow cmod propagation to instructions with saturate modifier v2: Add unit tests. Suggested by Matt. All Intel GPUs had similar results. (Ice Lake shown) total instructions in shared programs: 17229441 -> 17228658 (<.01%) instructions in affected programs: 159574 -> 158791 (-0.49%) helped: 489 HURT: 0 helped stats (abs) min: 1 max: 5 x̄: 1.60 x̃: 1 helped stats (rel) min: 0.07% max: 2.70% x̄: 0.61% x̃: 0.59% 95% mean confidence interval for instructions value: -1.72 -1.48 95% mean confidence interval for instructions %-change: -0.64% -0.58% Instructions are helped. total cycles in shared programs: 360944149 -> 360937144 (<.01%) cycles in affected programs: 1072195 -> 1065190 (-0.65%) helped: 254 HURT: 27 helped stats (abs) min: 2 max: 234 x̄: 30.51 x̃: 9 helped stats (rel) min: 0.04% max: 8.99% x̄: 0.75% x̃: 0.24% HURT stats (abs) min: 2 max: 83 x̄: 27.56 x̃: 24 HURT stats (rel) min: 0.09% max: 3.79% x̄: 1.28% x̃: 1.16% 95% mean confidence interval for cycles value: -30.11 -19.75 95% mean confidence interval for cycles %-change: -0.70% -0.41% Cycles are helped. Reviewed-by: Matt Turner <mattst88@gmail.com> [v1]	2019-05-14 11:38:21 -07:00
Ian Romanick	a7724b1cbb	nir/algebraic: Add missing ffma(-1, a, b) pattern All Gen7+ platforms had similar results. (Ice Lake shown) total instructions in shared programs: 17229439 -> 17229377 (<.01%) instructions in affected programs: 9859 -> 9797 (-0.63%) helped: 41 HURT: 0 helped stats (abs) min: 1 max: 6 x̄: 1.51 x̃: 1 helped stats (rel) min: 0.08% max: 11.54% x̄: 1.65% x̃: 0.67% 95% mean confidence interval for instructions value: -1.88 -1.14 95% mean confidence interval for instructions %-change: -2.48% -0.81% Instructions are helped. total cycles in shared programs: 360944145 -> 360942989 (<.01%) cycles in affected programs: 178167 -> 177011 (-0.65%) helped: 36 HURT: 19 helped stats (abs) min: 1 max: 222 x̄: 38.03 x̃: 5 helped stats (rel) min: 0.01% max: 31.01% x̄: 4.01% x̃: 0.45% HURT stats (abs) min: 1 max: 34 x̄: 11.21 x̃: 6 HURT stats (rel) min: 0.03% max: 2.74% x̄: 0.72% x̃: 0.50% 95% mean confidence interval for cycles value: -36.01 -6.02 95% mean confidence interval for cycles %-change: -4.18% -0.57% Cycles are helped. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-05-14 11:25:03 -07:00
Ian Romanick	7b4ff6a1af	nir: Mark ffma as 2src_commutative This doesn't make any real difference now, but future work (not in this series) will add a LOT of ffma patterns. Having to duplicate all of them for ffma(a, b, c) and ffma(b, a, c) is just terrible. No shader-db changes on any Intel platform. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-05-14 11:25:02 -07:00
Ian Romanick	e049a9c92b	nir: Add support for 2src_commutative ops that have 3 sources v2: Instead of handling 3 sources as a special case, generalize with loops to N sources. Suggested by Jason. v3: Further generalize by only checking that number of sources is >= 2. Suggested by Jason. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-05-14 11:25:02 -07:00
Ian Romanick	ede45bf9cf	nir: Rename commutative to 2src_commutative The meaning of the new name is that the first two sources are commutative. Since this is only currently applied to two-source operations, there is no change. A future change will mark ffma as 2src_commutative. It is also possible that future work will add 3src_commutative for opcodes like fmin3. v2: s/commutative_2src/2src_commutative/g. I had originally considered this, but I discarded it because I didn't want to deal with identifiers that (should) start with 2. Jason suggested it in review, so we decided that _2src_commutative would be used in nir_opcodes.py. Also add some comments documenting what 2src_commutative means. Also suggested by Jason. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-05-14 11:25:02 -07:00
Jason Ekstrand	e99081e76d	intel/fs/ra: Spill without destroying the interference graph Instead of re-building the interference graph every time we spill, we modify it in place so we can avoid recalculating liveness and the whole O(n^2) interference graph building process. We make a simplifying assumption in order to do so which is that all spill/fill temporary registers live for the entire duration of the instruction around which we're spilling. This isn't quite true because a spill into the source of an instruction doesn't need to interfere with its destination, for instance. Not re-calculating liveness also means that we aren't adjusting spill costs based on the new liveness. The combination of these things results in a bit of churn in spilling. It takes a large cut out of the run-time of shader-db on my laptop. Shader-db results on Kaby Lake: total instructions in shared programs: 15311224 -> 15311360 (<.01%) instructions in affected programs: 77027 -> 77163 (0.18%) helped: 11 HURT: 18 total cycles in shared programs: 355544739 -> 355830749 (0.08%) cycles in affected programs: 203273745 -> 203559755 (0.14%) helped: 234 HURT: 190 total spills in shared programs: 12049 -> 12042 (-0.06%) spills in affected programs: 2465 -> 2458 (-0.28%) helped: 9 HURT: 16 total fills in shared programs: 25112 -> 25165 (0.21%) fills in affected programs: 6819 -> 6872 (0.78%) helped: 11 HURT: 16 Total CPU time (seconds): 2469.68 -> 2360.22 (-4.43%) Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-05-14 12:30:22 -05:00
Jason Ekstrand	147665d0a2	intel/fs/ra: Put the VGRFs at the end of the nodes This is slightly less convenient in some places but it will make it much easier when we want to start adding nodes dynamically. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-05-14 12:30:22 -05:00
Jason Ekstrand	e7b7d572b3	intel/fs/ra: Re-arrange interference setup The old code was arranged by the type of interference being added. It would set up payload registers and then add payload interference for all VGRFs. It would set up MRFs and add MRF interference for all VGRFs. This commit re-arranges things to be organized differently. It first creates and sets up all RA nodes and then groups interference into two new categories: live range and instruction interference. Once all the RA nodes have been set up, it walks the list of VGRFs and sets up their live range interference and then walks the list of instructions and sets up instruction interference. This new arrangement will be advantageous for a future patch but, at the moment, it cuts 2% off the run-time of shader-db on my laptop. Shader-db results on Kaby Lake: total instructions in shared programs: 15311224 -> 15311224 (0.00%) instructions in affected programs: 0 -> 0 helped: 0 HURT: 0 total cycles in shared programs: 355544739 -> 355544739 (0.00%) cycles in affected programs: 0 -> 0 helped: 0 HURT: 0 Total CPU time (seconds): 2523.45 -> 2469.68 (-2.13%) Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-05-14 12:30:22 -05:00
Jason Ekstrand	0fd60e95fb	intel/fs/ra: Do the spill loop inside RA Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-05-14 12:30:22 -05:00
Jason Ekstrand	47b1dcdcab	intel/fs/ra: Only add MRF hack interference if we're spilling The only use of the MRF hack these days is for spilling and there we don't need the precise MRF usage information. If we're spilling then we know pretty well how many MRFs are going to be used. It is possible if the only things that are spilled have fewer SIMD channels than the dispatch width of the shader that this may be more MRFs than needed. That's a risk we're willing to take. Shader-db results on Kaby Lake: total instructions in shared programs: 15311100 -> 15311224 (<.01%) instructions in affected programs: 16664 -> 16788 (0.74%) helped: 1 HURT: 5 total cycles in shared programs: 355543197 -> 355544739 (<.01%) cycles in affected programs: 731864 -> 733406 (0.21%) helped: 3 HURT: 6 The hurt shaders are all SIMD32 compute shaders where we reserve enough space for a 32-wide spill/fill but don't need it. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-05-14 12:30:22 -05:00
Jason Ekstrand	69878a9bb0	intel/fs/ra: Pull the guts of RA into its own class This accomplishes two things. First, it makes interfaces which are really private to RA private to RA. Second, it gives us a place to store some common stuff as we go through the algorithm. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-05-14 12:30:22 -05:00
Jason Ekstrand	9e00a251be	intel/fs/ra: Move assign_regs further down in the file It's the main function from which all the other functions are called. It belongs at the bottom. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-05-14 12:30:22 -05:00
Jason Ekstrand	5d9ac57c8c	intel/fs/ra: Split building the interference graph into a helper Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-05-14 12:30:22 -05:00
Jason Ekstrand	472ef2f98d	intel/fs/ra: Initialize grf_used with first_non_payload_grf There's no reason why we need to use the calculated payload_node_count value which is just first_non_payload_grf aligned up. The grf_used value will be aligned up to 16 anyway (which is a much bigger alignment) before being handed off to hardware. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-05-14 12:30:22 -05:00
Jason Ekstrand	096ad8a809	intel/fs/ra: Stop adding RA interference to too many SENDS nodes We only have one node per VGRF so this was adding way too much interference. No idea how we didn't catch this before. Shader-db results on Kaby Lake: total instructions in shared programs: 15311100 -> 15311100 (0.00%) instructions in affected programs: 0 -> 0 helped: 0 HURT: 0 total cycles in shared programs: 355468050 -> 355543197 (0.02%) cycles in affected programs: 2472492 -> 2547639 (3.04%) helped: 17 HURT: 20 Fixes: `014edff0d2` "intel/fs: Add interference between SENDS sources" Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-05-14 12:30:22 -05:00
Jason Ekstrand	5911abd76f	util/ra: Assert nodes are in-bounds in add_node_interference Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-05-14 12:30:22 -05:00
Jason Ekstrand	88cac12230	intel/fs/ra: Only add dest interference to sources that exist Fixes: `83dedb6354` "i965: Add src/dst interference for certain" Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-05-14 12:30:22 -05:00
Jason Ekstrand	e291cd8a7e	util/ra: Don't destroy the graph in ra_allocate() We want to be able to call ra_allocate() and, when it fails, mutate the graph and try again rather than re-building the graph from scratch. This commit moves all the scratch bits except the final register allocation (which is really an out value not scratch) into sub-structs named "tmp" to make it clear which things are scratch. It also adds bits to the ra_select() initialization loop to initialize things (since we can't trust rzalloc anymore) and copy q_test and forced_reg over. Reviewed-by: Eric Anholt <eric@anholt.net>	2019-05-14 12:30:22 -05:00
Jason Ekstrand	9040215f5d	util/ra: Add a helper for resetting a node's interference Reviewed-by: Eric Anholt <eric@anholt.net>	2019-05-14 12:30:22 -05:00
Jason Ekstrand	698bb9b984	util/ra: Add helpers for adding nodes to an interference graph Reviewed-by: Eric Anholt <eric@anholt.net>	2019-05-14 12:30:22 -05:00
Jason Ekstrand	6c0f75c953	util/ralloc: Add helpers for growing zero-initialized memory Unfortunately, we can't quite follow the standard C conventions for these because ralloc doesn't know the sizes of pointers. Reviewed-by: Eric Anholt <eric@anholt.net>	2019-05-14 12:30:22 -05:00
Jason Ekstrand	6212326941	intel/fs: Stop doing extra RA calls In the last phase of the schedule and RA loop, the RA call is redundant if we spill. Immediately afterwards, we're going to see that we couldn't allocate without spilling and call back into RA and tell it to go ahead and spill. We've known about it for a while but we've always brushed over it on the theory that, if you're going to spill, you'll be calling RA a bunch anyway and what does one extra RA hurt? As it turns out, it hurts more than you'd expect. Because the RA interference graph gets sparser with each spill and the RA algorithm is more efficient on sparser graphs, the RA call that we're duplicating is actually the most expensive call in the RA-and-spill loop. There's another extra RA call we do that's a bit harder to see which this also removes. If we try to compile a shader that isn't the minimum dispatch width and it fails to allocate without spilling we call fail() to set an error but then go ahead and do the first spilling RA pass and only after that's complete do we detect the fail and bail out. By making minimum dispatch widths part of the spill condition, we side-step this problem. Getting rid of these extra spills takes the compile time of a nasty Aztec Ruins shader from about 28 seconds to about 26 seconds on my laptop. It also makes shader-db 1.5% faster. Shader-db results on Kaby Lake: total instructions in shared programs: 15311100 -> 15311100 (0.00%) instructions in affected programs: 0 -> 0 helped: 0 HURT: 0 total cycles in shared programs: 355468050 -> 355468050 (0.00%) cycles in affected programs: 0 -> 0 helped: 0 HURT: 0 Total CPU time (seconds): 2524.31 -> 2486.63 (-1.49%) Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-05-14 12:30:22 -05:00
Jason Ekstrand	41b310e219	util/ra: Improve the performance of ra_simplify The most expensive part of register allocation is the ra_simplify step which is a fixed-point algorithm with a worst-case complexity of O(n^2) which adds the registers to a stack which we then use later to do the actual allocation. This commit uses bit sets and changes the core loop of ra_simplify to first walk 32-node chunks and then walk each chunk. This lets us skip whole 32-node chunks in one go based on bit operations and compute the minimum q value potentially 32x as fast. Of course, the algorithm still has the same fundamental O(n^2) run-time but the constant is now much lower. In the nasty Aztec Ruins compute shader, this shaves a full four seconds off the 30s compile time for a release build of mesa. In a debug build (needed for accurate stack traces), perf says that ra_select takes 20% of runtime before this patch and only 5-6% of runtime after this patch. It also makes shader-db runs faster. Shader-db results on Kaby Lake: total instructions in shared programs: 15311100 -> 15311100 (0.00%) instructions in affected programs: 0 -> 0 helped: 0 HURT: 0 total cycles in shared programs: 355468050 -> 355468050 (0.00%) cycles in affected programs: 0 -> 0 helped: 0 HURT: 0 Total CPU time (seconds): 2602.37 -> 2524.31 (-3.00%) Reviewed-by: Eric Anholt <eric@anholt.net>	2019-05-14 12:30:22 -05:00
Jason Ekstrand	e1511f1d4c	util/ra: Only update q_total if the reg is not assigned We only use q_total if the reg is not assigned so there's no point in updating it if the reg is assigned. This has no known perf benefit but it will reduce churn in a future commit. Reviewed-by: Eric Anholt <eric@anholt.net>	2019-05-14 12:30:22 -05:00
Jason Ekstrand	9d6d1f47e7	util/ra: Only update best_optimistic_node if !progress This shaves about half a second off the 30 second compile time of one of the compute shaders in Aztec ruins. Reviewed-by: Eric Anholt <eric@anholt.net>	2019-05-14 12:30:22 -05:00
Jason Ekstrand	de56d3a2d1	util/ra: Make in_stack a bitset in the graph Reviewed-by: Eric Anholt <eric@anholt.net>	2019-05-14 12:30:22 -05:00
Jason Ekstrand	7720ad65ae	util/ra: Get rid of tabs Reviewed-by: Eric Anholt <eric@anholt.net>	2019-05-14 12:30:22 -05:00
Chia-I Wu	34810f4237	virgl: clean up virgl_res_needs_flush Add comments and some minor cleanups. v2: document the function Signed-off-by: Chia-I Wu <olvaffe@gmail.com> Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com> (v1) Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org> Signed-off-by: Chia-I Wu <olvaffe@gmail.com>	2019-05-14 17:00:22 +00:00
Chia-I Wu	08241624ad	virgl: comment on a sync issue in transfers Signed-off-by: Chia-I Wu <olvaffe@gmail.com> Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com> Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>	2019-05-14 17:00:22 +00:00
Chia-I Wu	76e45534d2	virgl: PIPE_TRANSFER_READ does not imply flush virgl_res_needs_flush should suffice. Signed-off-by: Chia-I Wu <olvaffe@gmail.com> Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com> Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>	2019-05-14 17:00:22 +00:00
Chia-I Wu	9f8521882a	virgl: do not skip readback because of explicit flush Both apps and we (see virgl_buffer_transfer_flush_region) might flush regions that are unmodified. We have to read back for those flushes. Signed-off-by: Chia-I Wu <olvaffe@gmail.com> Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com> Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>	2019-05-14 17:00:22 +00:00
Chia-I Wu	be8eeb3b59	virgl: remove unused virgl_transfer_inline_write It currently has no user and is probably incorrect (resource_wait is required in some more cases). Remove it so that we can focus on transfers first. Signed-off-by: Chia-I Wu <olvaffe@gmail.com> Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com> Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>	2019-05-14 17:00:22 +00:00
Nanley Chery	e81392868e	iris/resource: Drop redundant checks for aux support Drop some checks that are already done by ISL. Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>	2019-05-14 16:23:12 +00:00
Nanley Chery	75a3947af4	iris/resource: Fall back to no aux if creation fails No surface requires an auxiliary surface to operate correctly. Fall back to an uncompressed surface if mesa fails to create and allocate an auxiliary surface. This enables adding more restrictions to ISL without having to update iris. Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>	2019-05-14 16:23:12 +00:00
Nanley Chery	1423b78633	i965/miptree: Refactor intel_miptree_supports_ccs_e() Update and rename this function to format_supports_ccs_e() to better match its behavior. Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>	2019-05-14 16:23:12 +00:00
Nanley Chery	779bd8d332	i965/miptree: Drop intel_*_supports_hiz() intel_tiling_supports_hiz() and intel_miptree_supports_hiz() duplicate much of the work done by isl_surf_get_hiz_surf(). Replace them with simple expressions. Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>	2019-05-14 16:23:12 +00:00
Nanley Chery	29a13eb71d	isl: Add restrictions to isl_surf_get_hiz_surf() Import some restrictions from intel_tiling_supports_hiz() and intel_miptree_supports_hiz(). Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>	2019-05-14 16:23:12 +00:00
Nanley Chery	942755bec4	i965/miptree: Drop intel_*_supports_ccs() intel_tiling_supports_ccs() and intel_miptree_supports_ccs() duplicate much of the work done by isl_surf_get_ccs_surf(). Drop them both and index a boolean array to choose CCS_D in intel_miptree_choose_aux_usage(). Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>	2019-05-14 16:23:12 +00:00
Nanley Chery	d57242190e	isl: Add restriction and comments to isl_surf_get_ccs_surf() Import some restrictions and comments from intel_miptree_supports_ccs(). Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>	2019-05-14 16:23:12 +00:00
Nanley Chery	91a42537d1	i965/miptree: Drop intel_miptree_supports_mcs() This function duplicates much of the work done by isl_surf_get_mcs_surf(). Replace it with a simple expression. Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>	2019-05-14 16:23:12 +00:00
Nanley Chery	1de089797c	isl: Modify restrictions in isl_surf_get_mcs_surf() Import some restrictions from intel_miptree_supports_mcs() and don't assume that the caller knows which device generations are supported. Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>	2019-05-14 16:23:12 +00:00
Nanley Chery	cf758c4182	i965/miptree: Fall back to no aux if creation fails No surface requires an auxiliary surface to operate correctly. Fall back to an uncompressed surface if mesa fails to create and allocate an auxiliary surface. This enables adding more restrictions to ISL without having to update i965. Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>	2019-05-14 16:23:12 +00:00
Mathias Fröhlich	fc455797c1	mesa: Set _NEW_VARYING_VP_INPUTS iff varying_vp_inputs are set. Reviewed-by: Marek Olšák <marek.olsak@amd.com> Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>	2019-05-14 18:09:49 +02:00
Mathias Fröhlich	b4b1df5a17	mesa: Avoid setting _NEW_VARYING_VP_INPUTS in non fixed function mode. Instead of checking the API variant on entry of set_varying_vp_inputs to check if we can ever be interested in fixed function processing or not, we can check if we are actually fixed function processing. To check this we can use the immediately updated gl_context::VertexProgram._VPMode value that tells us if we have a user provided shader program or if we are in fixed function processing either through an internal TNL shader or directly through hardware. When doing so, we also need to recheck the varying_vp_inputs variable at the time gl_context::VertexProgram._VPMode is set to VP_MODE_FF. Put asserts at the consumers of gl_context::varying_vp_inputs to make sure gl_context::VertexProgram._VPMode is set to VP_MODE_FF. By that gl_context::varying_vp_inputs should be up to date then. By not looking at the opengl api for this decision we should actually catch more cases where we can avoid setting a state change flag, including the ones where we cannot get into VP_MODE_FF by the choice of the api. Reviewed-by: Marek Olšák <marek.olsak@amd.com> Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>	2019-05-14 18:09:49 +02:00
Mathias Fröhlich	663f93c869	mesa: Fix test for setting the _NEW_VARYING_VP_INPUTS flag. The precondition stated in the comment is not true. The values mentioned are only set from _mesa_update_state which in turn may not yet be called. For now set the _NEW_VARYING_VP_INPUTS flag a bit more often, we will narrow that down to a minimum again in a later patch. Reviewed-by: Marek Olšák <marek.olsak@amd.com> Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>	2019-05-14 18:09:49 +02:00
Mathias Fröhlich	df50af19d3	mesa: Make _mesa_set_varying_vp_inputs static in state.c. Is no longer used outside that file. Reviewed-by: Marek Olšák <marek.olsak@amd.com> Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>	2019-05-14 18:09:49 +02:00
Mathias Fröhlich	99952579f3	mesa: Fix old outdated variable name in a comment. Reviewed-by: Marek Olšák <marek.olsak@amd.com> Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>	2019-05-14 18:09:49 +02:00
Mathias Fröhlich	e634ba5116	mesa/vbo: Update Comment to what is actually happening. Reviewed-by: Marek Olšák <marek.olsak@amd.com> Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>	2019-05-14 18:09:49 +02:00
Jonas Ådahl	903ad59407	wayland/egl: Ensure correct buffer size when allocating Whenever a buffer is allocated, e.g. by the first draw call or EGL call after a buffer swap, make sure the size is up to date. Prior to this commit, we failed to do so when querying the buffer age, or swapping buffers without any prior EGL call or draw call. Signed-off-by: Jonas Ådahl <jadahl@gmail.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-05-14 15:33:35 +00:00
Paulo Zanoni	73055ae1c9	egl: check if a window/pixmap is already used on surface creation The spec says we can't create another surface if we already created a surface with the given window or pixmap. Implement this check. This behavior is exercised by piglit/egl-create-surface. Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>	2019-05-14 12:41:14 +00:00
Paulo Zanoni	04ecda3b3c	egl: store the native surface pointer in struct _egl_surface Each platform stores this in a different place: - platform_drm uses dri2_surf->gbm_surf->base - platform_android uses dri2_surf->window - platform_wayland uses dri2_surf->wl_win - platform_x11 uses dri2_surf->drawable - platform_x11_dri3 uses dri3_surf->loader_drawable.drawable - haiku doesn't even store it! We need access to the native surface since the specification asks us to refuse creating a new surface if there's already an EGLSurface associated with native_surface. An alternative to this patch would be to create a new API.GetNativeWindow callback that each platform would have to implement. While that's something we can definitely do, I prefer this approach. Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>	2019-05-14 12:41:14 +00:00
Samuel Pitoiset	9520e7c1e9	radv: add support for VK_KHR_uniform_buffer_standard_layout Nothing to do. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-05-14 09:15:28 +02:00
Gert Wollny	865b9ddae4	softpipe/buffer: load only as many components as the buffer resource type provides Otherwise we risk reading past the end of the buffer. In addition, change the loop counters to unsigned to be consistent with the types. Fixes: `afa8707ba9` softpipe: add SSBO/shader atomics support. Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Reviewed-by: Dave Airlie <airlied@redhat.com>	2019-05-14 06:49:43 +00:00
Tomeu Vizoso	1050273094	panfrost: ci: Reduce batch size to 3000 As with the previous value of 5000 we seemed to be reaching OOM in some circumstances. Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com> Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-05-14 07:43:11 +02:00
Tomeu Vizoso	9beb8aedeb	panfrost: ci: Update expectations Since last Friday, these two tests have been fixed: dEQP-GLES2.functional.shaders.functions.control_flow.return_in_nested_loop_fragment dEQP-GLES2.functional.shaders.linkage.varying_7 Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com> Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-05-14 07:43:06 +02:00
Eric Anholt	db329260bf	freedreno: Fix warning on printing a uint64_t using %llx. Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>	2019-05-13 15:37:01 -07:00
Eric Anholt	40dd28acc3	freedreno: Silence compiler warnings about "*" in boolean context. It sure looks like we just want both of them to be nonzero, and && is probably going to be cheaper than * anyway. Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>	2019-05-13 15:37:01 -07:00
Eric Anholt	06168d3f6a	freedreno: Silence compiler warnings about uninit 'layers' My gcc can't see that the uninitialized value from the PIPE_BUFFER case isn't used from the !PIPE_BUFFER cases later. Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>	2019-05-13 15:37:01 -07:00
Eric Anholt	c49f0159bd	freedreno: Quiet compiler warnings on 64-bit. __u64 is a ulonglong on x86_64, not uint64_t, so my gcc was complaining about the wrong type being passed in. Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>	2019-05-13 15:37:01 -07:00
Eric Anholt	0734905d9a	freedreno: Make emacs indent the way robclark's eclipse does. The .editorconfig helps with the tabs, but we've got this two-tabs-from-previous-indentation line continuation style that requires whacking the c-file-offsets. This will throw emacs warnings when first opening a file in the directory, press '!' to shut it up for the future. Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>	2019-05-13 15:37:01 -07:00
Eric Anholt	257999d9a8	freedreno: Make .editorconfig match .dir-locals.el. The editorconfig takes precedence over dir-locals in emacs26 with editorconfig enabled, so the /.editorconfig was affecting these directories. Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>	2019-05-13 15:37:01 -07:00
Jason Ekstrand	0745d4bd96	anv: Implement VK_KHR_uniform_buffer_standard_layout There's no real work to do here since we already support scalar block layout which is a direct superset of what this extension allows. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-05-13 17:20:33 -05:00
Jason Ekstrand	b464504777	vulkan: Update the XML and headers to 1.1.108 Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-05-13 17:20:33 -05:00
Jason Ekstrand	072227da0a	tu/entrypoints: Import copy It's used without being imported	2019-05-13 17:20:33 -05:00
Karol Herbst	fc800af83b	nv50/ir/nir: make use of SYSTEM_VALUE_MAX when iterating read sysvals Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Pierre Moreau <dev@pmoreau.org>	2019-05-13 23:40:40 +02:00
Karol Herbst	358e52383c	nv50/ir/nir: prefer to shift 1ull instead of 1ll Signed-off-by: Karol Herbst <kherbst@redhat.com> Suggested-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Pierre Moreau <dev@pmoreau.org>	2019-05-13 23:40:40 +02:00
Bas Nieuwenhuizen	1619f20883	radv: Clean up signalled and submitted fields from winsys fences. Other types like syncobj do not need it, so let's make things a bit more uniform. Also reduce confusion about what the signalled/submitted referred to (especially with imported fences). Reviewed-by: Dave Airlie <airlied@redhat.com>	2019-05-13 20:36:29 +00:00
Samuel Pitoiset	5555db103e	radv: bump reported version to 1.1.107 VK_AMD_draw_indirect_count has been promoted with the suffix changed to KHR. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-05-13 21:38:01 +02:00
Eric Anholt	60a64f028d	v3d: Use driconf to expose non-MSAA texture limits for Xorg. The V3D 4.2 HW has a limit to MSAA texture sizes of 4096. With non-MSAA, we can go up to 7680 (actually probably 8138, but that hasn't been validated by the HW team). Exposing 7680 in X11 will allow dual 4k displays.	2019-05-13 12:03:11 -07:00
Eric Anholt	0c31fe9ee7	gallium: Redefine the max texture 2d cap from _LEVELS to _SIZE. The _LEVELS assumes that the max is always power of two. For V3D 4.2, we can support non-power-of-two non-MSAA texture sizes up to 7680, which will let X11 support dual 4k displays on newer hardware. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-05-13 12:03:08 -07:00
Eric Anholt	f33cb272f0	mesa: Replace MaxTextureLevels with MaxTextureSize. In most places (glGetInteger, max_legal_texture_dimensions), we wanted the number of pixels, not the number of levels. Number of levels is easily recovered with util_next_power_of_two() and ffs(). More importantly, for V3D we want to be able to expose a non-power-of-two maximum texture size to cover 2x4k displays on HW that can't quite do 8192 wide. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-05-13 12:03:05 -07:00
Eric Anholt	ce6dbc0417	mesa: Remove proxy image checks for maximum level. We've already verified this by _mesa_legal_texture_dimensions() before this call. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-05-13 12:03:03 -07:00
Eric Anholt	d88f3392ff	mesa: Reuse _mesa_max_texture_levels() instead of open-coding it. The shared function has some extension presence checks, but other than that has the same switch statement contents. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-05-13 12:02:59 -07:00
Vinson Lee	20b42fad9b	intel/tools: Fix build with glibc < 2.27. glibc < 2.27 defines OVERFLOW in /usr/include/math.h. This patch fixes this build error. In file included from ../include/c99_math.h:37:0, from ../src/util/u_math.h:44, from ../src/mesa/main/macros.h:35, from ../src/intel/compiler/brw_reg.h:47, from ../src/intel/tools/i965_asm.h:32, from ../src/intel/tools/i965_gram.y:29: src/intel/tools/i965_gram.tab.c:562:5: error: expected identifier before numeric constant OVERFLOW = 412, ^ Fixes: `70308a5a8a` ("intel/tools: New i965 instruction assembler tool") Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110656 Signed-off-by: Vinson Lee <vlee@freedesktop.org> Acked-by: Eric Engestrom <eric@engestrom.ch>	2019-05-13 11:05:48 -07:00
Marek Olšák	84816d1464	st/mesa: enable the ST_DEBUG env var in release and debugoptimized builds Useful for dumping shaders. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-05-13 13:01:01 -04:00
Nicolai Hähnle	d814c21b1b	radeonsi: overhaul the vertex fetch fixup mechanism The overall goal is to support unaligned loads from vertex buffers natively on SI. In the unaligned case, we fall back to the general case implementation in ac_build_opencoded_load_format. Since this function is fully general, we will also use it going forward for cases requiring fully manual format conversions of dwords anyway. This requires a different encoding of the fix_fetch array, which will now contain the entire format information if a fixup is required. Having to check the alignment of vertex buffers is awkward. To keep the impact on the fast path minimal, the si_context will keep track of which vertex buffers are (not) at least dword-aligned, while the si_vertex_elements will note which vertex buffers have some (at most dword) alignment requirement. Vertex buffers should be dword-aligned most of the time, which allows a fast early-out in almost all cases. Add the radeonsi_vs_fetch_always_opencode configuration variable for testing purposes. Note that it can only be used reliably on LLVM >= 9, because support for byte and short load is required. v2: - add a missing check to si_bind_vertex_elements Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-05-13 17:07:23 +02:00
Nicolai Hähnle	8a951c3d2f	radeonsi: store sctx->vertex_elements in a local in si_shader_selector_key_vs Purely as a shorthand in the remainder of the function. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-05-13 17:07:23 +02:00
Nicolai Hähnle	81fe33735a	amd/common: add ac_build_opencoded_fetch_format Implement software emulation of buffer_load_format for all types required by vertex buffer fetches. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-05-13 17:07:23 +02:00
Jason Ekstrand	712f99934c	nir/validate: Use a single set for SSA def validation The current SSA def validation we do in nir_validate validates three things: 1. That each SSA def is only ever used in the function in which it is defined. 2. That an nir_src exists in an SSA def's use list if and only if it points to that SSA def. 3. That each nir_src is in the correct use list (uses or if_uses) based on whether it's an if condition or not. The way we were doing this before was that we had a hash table which provided a map from SSA def to a small ssa_def_validate_state data structure which contained a pointer to the nir_function_impl and two hash sets, one for each use list. This meant piles of allocation and creating of little hash sets. It also meant one hash lookup for each SSA def plus one per use as well as two per src (because we have to look up the ssa_def_validate_state and then look up the use.) It also involved a second walk over the instructions as a post-validate step. This commit changes us to use a single low-collision hash set of SSA sources for all of this by being a bit more clever. We accomplish the objectives above as follows: 1. The list is clear when we start validating a function. If the nir_src references an SSA def which is defined in a different function, it simply won't be in the set. 2. When validating the SSA defs, we walk the uses and verify that they have is_ssa set and that the SSA def points to the SSA def we're validating. This catches the case of a nir_src being in the wrong list. We then put the nir_src in the set and, when we validate the nir_src, we assert that it's in the set. This takes care of any cases where a nir_src isn't in the use list. After checking that the nir_src is in the set, we remove it from the set and, at the end of nir_function_impl validation, we assert that the set is empty. This takes care of any cases where a nir_src is in a use list but the instruction is no longer in the shader. 3. When we put a nir_src in the set, we set the bottom bit of the pointer to 1 if it's the condition of an if. This lets us detect whether or not a nir_src is in the right list. When running shader-db with an optimized debug build of mesa on my laptop, I get the following shader-db CPU times: With NIR_VALIDATE=0 3033.34 seconds Before this commit 20224.83 seconds After this commit 6255.50 seconds Assuming shader-db is a representative sampling of GLSL shaders, this means that making this change yields an 81% reduction in the time spent in nir_validate. It still isn't cheap but enabling validation now only increases compile times by 2x instead of 6.6x. Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Thomas Helland <thomashelland90@gmail.com>	2019-05-13 14:43:47 +00:00
Jason Ekstrand	bab08c791d	util/set: Add a helper to resize a set Often times you don't know how big a set will be and you want the code to just grow it as needed. However, sometimes you do know and you can avoid a lot of rehashing if you just specify a size up-front. Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Thomas Helland <thomashelland90@gmail.com>	2019-05-13 14:43:47 +00:00
Jason Ekstrand	abb450870e	util/set: Add a search_and_add function This function is identical to _mesa_set_add except that it takes an extra out parameter that lets the caller detect if a replacement happened. Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Thomas Helland <thomashelland90@gmail.com>	2019-05-13 14:43:47 +00:00
Jason Ekstrand	460567eabf	nir/validate: Use a ralloc context for our temporary data All of our hash tables and sets are already using ralloc. There's really no good reason why we don't just make a ralloc context rather than try to remember to clean everything up manually. Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Thomas Helland <thomashelland90@gmail.com>	2019-05-13 14:43:47 +00:00
Patrick Lerda	6963f59cae	lima: add Allwinner H5 support The H5 hardware variant requires a specific plb_max_blk number. This value can't be probed at the hardware level. Signed-off-by: Patrick Lerda <patrick9876@free.fr> Reviewed-by: Qiang Yu <yuq825@gmail.com>	2019-05-13 13:32:55 +02:00
Patrick Lerda	38c5a5a8b5	lima: refactor plb_max_blk Move plb_max_blk to lima_screen, and add a new debug option: LIMA_PLB_MAX_BLK Signed-off-by: Patrick Lerda <patrick9876@free.fr> Reviewed-by: Qiang Yu <yuq825@gmail.com>	2019-05-13 13:32:55 +02:00
Bas Nieuwenhuizen	f53ebfb450	radv: Do not use extra descriptor space for the 3rd plane. While ImageFormatProperties returns the number of internal descriptors, it turns out that applications do not need to actually allocate more descriptors in the descriptor pool. So if we make descriptors with more planes larger we have to be conservative and always allocate space for the larger descriptors which is a waste given the low usage of this ext. So let us make use of the fact that 3plane formats all have the same formats & dimensions for the last two planes. This way we only need the first half of the descriptor of the 3rd plane and can share the second half of the second plane. This allows us to use 16 bytes for the descriptor which nicely fits into the 16 bytes that are unused right next to the sampler. Fixes: `5564c38212` "radv: Update descriptor sets for multiple planes." Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-05-12 23:02:44 +00:00
Bas Nieuwenhuizen	d6dfb2cf50	radv: Add support for icd loader interface v4. Adds support for physical device functions unknown to the loader. Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-05-13 00:41:31 +02:00
Alyssa Rosenzweig	726f0263e1	panfrost/midgard: Handle csel correctly We use an algebraic pass for the csel optimizations, and use proper vectorized csel ops (i/fcsel_v) for mixed, rather than lowering. To avoid regressions along the way, we fix an issue with the copy propagation pass (it should not attempt to propagate constants). Similarly, we take care to break bundles when using csel to fix some scheduler corner cases. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-05-12 22:21:49 +00:00
Illia Iorin	a35269cf44	iris: Implement ARB_indirect_parameters iris_draw_vbo is divided into two functions to remove unnecessary operations from the loop. This implementation of ARB_indirect_parameters takes into account NV_conditional_render by saving MI_PREDICATE_RESULT at the start of a draw call and restoring it at the end. Also, the result of NV_conditional_render is taken into account when computing predicates that limit draw calls for ARB_indirect_parameters, in a similar way to `1952fd8d` in ANV. v2: Optimize indirect draws (suggested by Kenneth Graunke) v3: (by Kenneth Graunke) - Fix an issue where indirect draws wouldn't set patch information before updating the compiled TCS. - Move some code back to iris_draw_vbo to avoid duplicating it. - Fix minor indentation issues. Signed-off-by: Illia Iorin <illia.iorin@globallogic.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-05-11 23:56:52 -07:00
Kenneth Graunke	21a0be4a79	iris: Split iris_update_draw_info into two functions. Shader draw parameters need updating on each iteration of a multidraw loop, but the primitive based information only needs to be updated once. Also, patch information needs to be recorded before filling out the TCS program key, as it determines the number of HS instances.	2019-05-11 23:54:15 -07:00
Ruslan Kabatsayev	974c4d679c	nir: Fix wrong sign in lower_rcp The nested fma calls were supposed to implement x_new = x + x * (1 - x*src), but instead current code is equivalent to x_new = x - x * (1 - x*src). The result is that Newton-Raphson steps don't improve precision at all. This patch fixes this problem. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110435 Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-05-11 09:25:22 -07:00
Mike Blumenkrantz	7b2468bf6e	intel: drop misleading driver name from gen_get_device_info()	2019-05-11 04:14:06 +00:00
Józef Kucia	24af0f1318	radv: clear vertex bindings while resetting command buffer Only vertex inputs accessed by vertex shader must have valid buffers bound. Signed-off-by: Józef Kucia <joseph.kucia@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Fixes: `5010436e09` "radv: bail out when binding the same vertex buffers"	2019-05-11 02:51:00 +02:00
Marek Olšák	83435e748f	st/mesa: fix 2 crashes in st_tgsi_lower_yuv src/mesa/state_tracker/st_tgsi_lower_yuv.c:68: void reg_dst(struct tgsi_full_dst_register *, const struct tgsi_full_dst_register *, unsigned int): assertion "dst->Register.WriteMask" failed The second crash was due to insufficient allocated size for TGSI instructions. Cc: 19.0 19.1 <mesa-stable@lists.freedesktop.org> Reviewed-by: Rob Clark <robdclark@gmail.com>	2019-05-10 20:51:16 -04:00
Kenneth Graunke	72ccefb529	iris: Use full ways for L3 cache setup on Icelake. Anuj fixed this in i965 and anv, but the fix never landed in iris. Fixes tessellation corruption on Icelake. Thanks to Rafael for bisecting this and tracking it down. Fixes: `d0996d5fab` iris: Emit default L3 config for the render pipeline Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>	2019-05-10 16:50:14 -07:00
Caio Marcelo de Oliveira Filho	3610081daa	anv: Fix limits when VK_EXT_descriptor_indexing is used Update various limits in VkPhysicalDeviceDescriptorIndexingPropertiesEXT that were previously zero to their values from VkPhysicalDeviceLimits. When using VK_EXT_descriptor_indexing, the former limits will apply to all the descriptor layout sets -- not only those using the new feature bits. For the reference, VK_EXT_descriptor_indexing says "There are new descriptor set layout and descriptor pool creation flags that are required to opt in to the update-after-bind functionality, and there are separate maxPerStage* and maxDescriptorSet* limits that apply to these descriptor set layouts which may be much higher than the pre-existing limits. The old limits only count descriptors in non-updateAfterBind descriptor set layouts, and the new limits count descriptors in all descriptor set layouts in the pipeline layout." Fixes: `6e230d7607` "anv: Implement VK_EXT_descriptor_indexing" Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-05-10 15:15:11 -07:00
Lionel Landwerlin	ad2b4aa378	vulkan/overlay: keep allocating draw data until it can be reused The original implementation assumed that we could allocate the same amount of command buffers as the number of images in the swapchain. But the application could potentially render much faster and rerender into images that have been submitted for presentation but not yet presented. This change keeps on allocating command buffers, vertex buffer, vertex indices as well as a semaphore and a fence for as long as we can't reuse a previously submitted one. This fixes rendering issues in the overlay at high frame rates. v2: Don't recreate semaphores constantly (Józef) v3: Drop useless surface & FreeCommandBuffers (Józef) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110655 Cc: 19.1 <mesa-stable@lists.freedesktop.org> Reviewed-by: Józef Kucia <joseph.kucia@gmail.com>	2019-05-10 21:54:48 +01:00
Lionel Landwerlin	877b371cbb	vulkan/overlay: fix truncating error on 32bit platforms Non dispatchable handles can be uint64_t. When compiling the layer on a 32bit platform, this will lead to casting uint64_t into (void *) which is 32bit, leading to incorrect handles being mapped internally in the layer. v2: Use more HKEY() (Eric) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reported-by: Józef Kucia <joseph.kucia@gmail.com> Fixes: `2d2927938f` ("vulkan/overlay-layer: fix cast errors") Reviewed-by: Józef Kucia <joseph.kucia@gmail.com>	2019-05-10 21:54:48 +01:00
Kenneth Graunke	3f60810de0	i965: Fix memory leaks in brw_upload_cs_work_groups_surface(). This was taking a reference to the 64kB upload buffer and never returning it, leaking a reference each time this atom triggered. This leaked lots of 64kB upload BOs, eventually running us out of VMA space. This would usually happen when using mpv to watch a movie, after 20-40 minutes. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110134 Fixes: `63d7b33f51` i965/cs: Setup surface binding for gl_NumWorkGroups Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-05-10 12:50:19 -07:00
Julien Isorce	98b852cd07	st/va: set the visible image dimensions in vlVaDeriveImage This fixes video being rendered incorrectly. User wants height of 360 but internally pipe_video_buffer's height is 368 in the test below. Test: GST_GL_PLATFORM=egl gst-launch-1.0 videotestsrc ! video/x-raw, width=868, height=360, format=NV12 ! vaapipostproc ! glimagesink Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110443 Signed-off-by: Julien Isorce <jisorce@oblong.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com>	2019-05-10 17:13:31 +00:00
Alyssa Rosenzweig	292187afcc	swrast: Rename blend_func->swrast_blend_func This avoids a conflict with the new (driver-agnostic) blend_func enum in shader_enum.h, which broke the build of swrast (and i965 by extension). My apologies :( Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Fixes: `f41be53a` ("compiler: Add enums for blend state") Cc: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-05-10 09:34:55 -07:00
Eric Engestrom	6e5728e5c9	travis: fix syntax, and drop unused stuff Fixes: `a988d95389` "ci: Delete autotools build jobs" Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>	2019-05-10 17:26:53 +01:00
Alyssa Rosenzweig	006cafc243	nir: Add blend_const_color_rgba sysval This represents a float vec4 constant color, as passed to glBlendColor. While the existing 4 shader sysvals are retained to minimize code churn, a single vectorized intrinsic is required for efficient blending on vector architectures. (This may also apply to architectures like Bifrost where ALU is scalar but load/store is vector; it largely depends on how blending is implemented per-driver.) Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-05-10 15:49:28 +00:00
Alyssa Rosenzweig	6b0472b181	gallium: Add helper to convert PIPE blending to shader_enum style Complementing the new API-agnostic shader_enum blending style, we add helpers to translate between the two forms. Ideally, we could just use PIPE blending directly, but that makes Vulkan support challenging. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-05-10 15:49:16 +00:00
Alyssa Rosenzweig	f41be53a17	compiler: Add enums for blend state We add enums corresponding to (GLES) blend state to shader_enums.h, complementing the existing advanced blending enums in the file. This allows us to represent blending state in a driver-agnostic, API-agnostic way to permit lowering. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Acked-by: Eric Anholt <eric@anholt.net>	2019-05-10 15:49:01 +00:00
Jonathan Marek	d0bff89159	nir: allow specifying a set of opcodes in lower_alu_to_scalar This can be used by both etnaviv and freedreno/a2xx as they are both vec4 architectures with some instructions being scalar-only. Signed-off-by: Jonathan Marek <jonathan@marek.ca> Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-05-10 15:10:41 +00:00
Jason Ekstrand	f8bda81887	intel/fs/copy-prop: Don't walk all the ACPs for each instruction In order to set up KILL sets, the dataflow code was walking the entire array of ACPs for every instruction. If you assume the number of ACPs increases roughly with the number of instructions, this is O(n^2). As it turns out, regions_overlap() is not nearly as cheap as one would like and shows up as a significant chunk on perf traces. This commit changes things around and instead first builds an array of exec_lists which it uses like a hash table (keyed off ACP source or destination) similar to what's done in the rest of the copy-prop code. By first walking the list of ACPs and populating the table and then walking instructions and only looking at ACPs which probably have the same VGRF number, we can reduce the complexity to O(n). This takes the execution time of the piglit vs-isnan-dvec test from about 56.4 seconds on an unoptimized debug build (what we run in CI) with NIR_VALIDATE=0 to about 38.7 seconds. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-05-10 09:10:17 -05:00
Jason Ekstrand	20bbc175a4	intel/fs/copy-prop: Purge unused ACPs If the destination of an ACP entry exists only within this block, then there's no need to keep it for dataflow analysis. We can delete it from the out_acp table and avoid growing the bitsets any bigger than we absolutely have to. This reduces the maximum number of global ACP entries in the vs-isnan-dvec with software fp64 on Kaby Lake from 8630 to 3942 and takes the execution time of the piglit vs-isnan-dvec test from about 1:16.2 on an unoptimized debug build (what we run in CI) with NIR_VALIDATE=0 to about 56.4 seconds. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-05-10 09:10:17 -05:00
Jason Ekstrand	0b6da5bac6	intel/fs/copy-prop: Bump the hash table size to 64 While the number of ACPs is generally not huge compared to the number of blocks, 16 does seem a bit small. Bumping it to 64 takes the execution time of the piglit vs-isnan-dvec test from about 1:18.1 on an unoptimized debug build (what we run in CI) with NIR_VALIDATE=0 to about 1:16.2. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-05-10 09:10:17 -05:00
Leo Liu	ceba9ff294	winsys/amdgpu: add VCN JPEG to no user fence group There is no user fence for JPEG, the bug triggering kernel WARN_ON(flags & AMDGPU_FENCE_FLAG_64BIT) Signed-off-by: Leo Liu <leo.liu@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Cc: mesa-stable@lists.freedesktop.org	2019-05-10 08:24:49 -04:00
Qiang Yu	e2fc0c4a0c	lima: fix width 4096 resolution GP fail When width=4096 and shift_w=0, block_w=0x100 which overflow the PLBU_CMD 8 bits for it. Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com> Signed-off-by: Qiang Yu <yuq825@gmail.com>	2019-05-10 16:07:40 +08:00
Tomeu Vizoso	1b97d9c180	panfrost: Add CAPFs for conservative rasterization Just do what everybody else but Nouveau does and return 0.0f. This prevents the repeated logging of these messages on startup: Unexpected PIPE_CAPF 6 query Unexpected PIPE_CAPF 7 query Unexpected PIPE_CAPF 8 query Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-05-10 07:40:52 +02:00
Tomeu Vizoso	c3538ab570	panfrost: Only take the fast paths on buffers aligned to block size As the functions operate on 16-byte blocks. Fixes this Valgrind error: Invalid read of size 4 at 0x5857568: swizzle_bpp1_align16 (pan_swizzle.c:85) by 0x585780F: panfrost_texture_swizzle (pan_swizzle.c:171) by 0x584F587: panfrost_tile_texture (pan_resource.c:489) by 0x584F641: panfrost_transfer_unmap (pan_resource.c:525) by 0x587718D: u_transfer_helper_transfer_unmap (u_transfer_helper.c:516) by 0x5875D85: pipe_transfer_unmap (u_inlines.h:515) by 0x5875F13: u_default_texture_subdata (u_transfer.c:80) by 0x53FFDC3: st_TexSubImage (st_cb_texture.c:1480) by 0x54005BB: st_TexImage (st_cb_texture.c:1709) by 0x5391353: teximage (teximage.c:3105) by 0x5391353: teximage_err (teximage.c:3132) by 0x5391B9B: _mesa_TexImage2D (teximage.c:3170) by 0x5097A77: shared_dispatch_stub_183 (glapi_mapi_tmp.h:18833) Address 0x1e94f1e8 is 0 bytes after a block of size 16 alloc'd at 0x483F5C8: malloc (vg_replace_malloc.c:299) by 0x584F47D: panfrost_transfer_map (pan_resource.c:467) by 0x587694D: u_transfer_helper_transfer_map (u_transfer_helper.c:243) by 0x5875EA7: u_default_texture_subdata (u_transfer.c:59) by 0x53FFDC3: st_TexSubImage (st_cb_texture.c:1480) by 0x54005BB: st_TexImage (st_cb_texture.c:1709) by 0x5391353: teximage (teximage.c:3105) by 0x5391353: teximage_err (teximage.c:3132) by 0x5391B9B: _mesa_TexImage2D (teximage.c:3170) by 0x5097A77: shared_dispatch_stub_183 (glapi_mapi_tmp.h:18833) by 0x4DA8AB: glu::CallLogWrapper::glTexImage2D(unsigned int, int, int, int, int, int, unsigned int, unsigned int, void const*) (in /home/tomeu/deqp-build/modules/gles2/deqp-gles2) Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Cc: 19.1 <mesa-stable@lists.freedesktop.org>	2019-05-10 07:39:39 +02:00
Tomeu Vizoso	554975bafa	panfrost: Fix two uninitialized accesses in compiler Valgrind was complaining of those. NIR_PASS only sets progress to TRUE if there was progress. nir_const_load_to_arr() only sets as many constants as the instruction has components. This was causing some dEQP tests to flip-flop, such as: dEQP-GLES2.functional.fragment_ops.blend.equation_src_func_dst_func.add_src_color_constant_color Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com> Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Fixes: `14531d676b` ("nir: make nir_const_value scalar")	2019-05-10 07:37:57 +02:00
Tomeu Vizoso	67b9c196d0	panfrost: ci: Skip running some tests These tests add too much time to the total run time, and some of them even hang the DUTs, even if I haven't been able to reproduce it locally. Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com> Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-05-10 07:37:47 +02:00
Tomeu Vizoso	a94cf20051	panfrost: ci: Don't restart Weston There don't seem to actually be any noticeable memory leaks on Weston when running dEQP. We do seem to leak quite a bit in the client, so we still have to run the dEQP runner in batches. This removes the risk of Weston not restarting properly and introducing spurious failures. Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com> Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-05-10 07:37:30 +02:00
Tomeu Vizoso	0d0823638f	panfrost: ci: Update list of expected failures This matches the current state of things on both RK3288 and RK3399. Hopefully, from now on we'll only remove stuff from this list. Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com> Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-05-10 07:37:23 +02:00
Tomeu Vizoso	8a328c725a	panfrost: ci: Tweak dEQP to improve throughput Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com> Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-05-10 07:37:18 +02:00
Tomeu Vizoso	bbed39bbf2	panfrost: ci: Fix list of tests to run Make sure we have only test case names in the list, excluding names of test groups. Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com> Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-05-10 07:37:13 +02:00
Tomeu Vizoso	7842fe3a45	panfrost: ci: Check for incomplete runs To improve robustness, check that we got the expected number of results. Right now we hard-code the expected number of tests run, but with some effort we may be able to infer it. Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com> Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-05-10 07:37:05 +02:00
Tomeu Vizoso	8e139250aa	panfrost: ci: Add tests to flip-flop list These tests aren't giving reliable results. Mask them for now. Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com> Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-05-10 07:37:00 +02:00
Tomeu Vizoso	dab01348d0	panfrost: ci: Add support for running the tests on RK3288 Build artifacts for armhf and schedule them on a Veyron Chromebook with RK3288. Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com> Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-05-10 07:32:29 +02:00
Vasily Khoruzhick	e44a4bae52	lima: fix tile buffer reloading Buffer needs to be reloaded every time unless explicit clear() was called. Fixes rendering issues with wayland compositors. Reviewed-by: Qiang Yu <yuq825@gmail.com> Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>	2019-05-09 21:45:04 -07:00
Caio Marcelo de Oliveira Filho	f7d53fffa2	anv: Remove special allocation for anv_push_constants The key reason for that mechanism is gone: all the extra optional data that could be in the anv_push_constants was moved elsewhere. At this point, just put anv_push_constants directly in anv_cmd_state (part of anv_cmd_buffer). v2: Remove a NULL check we don't need anymore in anv_cmd_buffer_push_constants(). (Lionel) Fix size we consider for valid push params. (Lionel) Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-05-09 19:01:14 -07:00
Kenneth Graunke	c61862ddfc	iris: Expose PIPE_CAP_DEVICE_RESET_STATUS_QUERY This provides a way for the application to query whether any resets have happened, which lets us expose "robust" contexts. This also enables the KHR_robust_buffer_access_behavior tests.	2019-05-09 16:49:07 -07:00
Kenneth Graunke	343f41781c	iris: Hook up device reset callbacks This mechanism lets the driver inform the state tracker about GPU resets, say for destroying a robust API context and reporting a "device lost" error to the application, making it take action to deal with this.	2019-05-09 16:49:07 -07:00
Kenneth Graunke	c5c12bdd00	iris: Try to recover from GPU hangs. The iris batch module now tries to detect that the kernel has banned our GEM context, creates a new non-banned context, and informs the iris context module that all assumptions about state are now invalid and it needs to reinitialize the relevant state. Based on Chris Wilson's work, but significantly rewritten by me.	2019-05-09 16:49:07 -07:00
Chris Wilson	7402564c07	iris: Add helpers to clone a hardware context. (Chris Wilson wrote this code in a patch titled "i965: Be resilient in the face of GPU hangs"; Ken fixed a bug and copied it to iris.)	2019-05-09 16:49:07 -07:00
Kenneth Graunke	c3701e9070	iris: Mark render batches as non-recoverable. Adapted from Chris Wilson's patch. The comment is largely his. Currently, when iris hangs the GPU, it will continue sending batches which incrementally update the state, assuming it's preserved across batches. However, the kernel's GPU reset support reinitializes the guilty context to the default GPU state (reasonably not wanting to trust the current state). This ends up resetting critical things like STATE_BASE_ADDRESS, causing memory accesses in all subsequent batches to be garbage, and almost certainly result in more hangs until we're banned or we kill the machine. We now ask the kernel to ban our render context immediately, so we notice we've gone off the rails as fast as possible. Eventually, we'll attempt to recover and continue. For now, we just avoid torching the GPU over and over.	2019-05-09 16:49:07 -07:00
Rob Clark	9faf218b8c	freedreno/ir3: fix rasterflat/glxgears Ofc legacy gl features that are broken don't trigger fails in deqp. I should remember to test glxgears more often. Fixes: `7ff6705b8d` freedreno/ir3: convert to "new style" frag inputs Signed-off-by: Rob Clark <robdclark@chromium.org>	2019-05-09 16:21:05 -07:00
Lionel Landwerlin	f2f6ac1c08	anv: Use corresponding type from the vector allocation We didn't notice this issue much because the 2 structs share a similar layout, except for the additional fields... We run into that issue in Anv : ==15236== Invalid write of size 8 ==15236== at 0x8CF3939C: anv_state_table_expand_range (anv_allocator.c:211) ==15236== by 0x8CF394D5: anv_state_table_grow (anv_allocator.c:264) ==15236== by 0x8CF3967E: anv_state_table_add (anv_allocator.c:312) ==15236== by 0x8CF3B13C: anv_state_pool_alloc_no_vg (anv_allocator.c:1167) ==15236== by 0x8CF3B2B0: anv_state_pool_alloc (anv_allocator.c:1190) ==15236== by 0x8CF60871: alloc_surface_state (anv_image.c:1122) ==15236== by 0x8CF61FF9: anv_CreateImageView (anv_image.c:1519) ==15236== by 0x8BCBD2ED: vkCreateImageView (trampoline.c:1358) ==15236== Address 0x8994ef10 is 0 bytes after a block of size 128 alloc'd ==15236== at 0x4C2FB0F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) ==15236== by 0x8D2578E6: u_vector_init (u_vector.c:47) ==15236== by 0x8CF3929A: anv_state_table_init (anv_allocator.c:168) ==15236== by 0x8CF3A99A: anv_state_pool_init (anv_allocator.c:921) ==15236== by 0x8CF56517: anv_CreateDevice (anv_device.c:1909) ==15236== by 0x8BCB4FBA: terminator_CreateDevice (loader.c:6073) ==15236== by 0x8DD2CB3D: ??? (in /home/djdeath/.steam/ubuntu12_64/libVkLayer_steam_fossilize.so) ==15236== by 0x8DF4D241: vkCreateDevice (in /home/djdeath/.steam/ubuntu12_64/steamoverlayvulkanlayer.so) ==15236== by 0x8BCB35C6: loader_create_device_chain (loader.c:5449) ==15236== by 0x8BCBC230: vkCreateDevice (trampoline.c:838) v2: Rename mmap_cleanups to avoid confusion (Caio) v3: s/fail_mmap_cleanups/fail_cleanups/ (Caio) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110648 Cc: <mesa-stable@lists.freedesktop.org> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-05-09 21:57:26 +01:00
Dylan Baker	79ad8acd01	docs: update calendar, and news item and link release notes for 19.0.4	2019-05-09 13:48:47 -07:00
Dylan Baker	723f74c270	docs: Add SHA256 sums for mesa 19.0.4	2019-05-09 13:46:30 -07:00
Dylan Baker	ce32b71a8c	Docs: add 19.0.4 release notes	2019-05-09 13:46:26 -07:00
Pierre-Eric Pelloux-Prayer	62ed82ea1a	mesa: fix GL_PROGRAM_BINARY_RETRIEVABLE_HINT handling When first implemented in `fefd03e16c` Mesa's behavior was aligned on behavior of Nvidia's driver. This caused a failing test in piglit but was ok since the specification is unclear on this subject. Nvidia's driver behavior has been modified because using version 410.104, the problematic test (program_binary_retrievable_hint) now passes. This commit defers BinaryRetrievableHint update until the next linking so the test passes on Mesa as well. Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Signed-off-by: Marek Olšák <marek.olsak@amd.com>	2019-05-09 16:15:20 -04:00
Ian Romanick	1f1007a4ed	nir: Initialize lower_flrp_progress everywhere I don't know why I thought NIR_PASS always set the progress variable. Derp. Fixes: `d41cdef2a5` ("nir: Use the flrp lowering pass instead of nir_opt_algebraic") Reviewed-by: Brian Paul <brianp@vmware.com> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Coverity CID: 1444996 Coverity CID: 1444995 Coverity CID: 1444994 Coverity CID: 1444993 Coverity CID: 1444991 Coverity CID: 1444989	2019-05-09 10:03:51 -07:00
Eric Engestrom	8b3baa2744	gallium: fix typo in comment Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>	2019-05-09 11:14:37 +01:00
Eric Engestrom	86628ed79f	meson: fix a couple typos in comments Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>	2019-05-09 11:14:37 +01:00
Eric Engestrom	6c6af0c8b0	i965_asm: avoid free()ing uninitialized pointers Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-05-09 10:03:15 +00:00
Eric Engestrom	51597eca84	i965_asm: fix memleak Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-05-09 10:03:15 +00:00
Samuel Pitoiset	53dfff1c4d	radv: fix setting the number of rectangles when it's dynamic We need to know the number of rectangles. This fixes new CTS dEQP-VK.draw.discard_rectangles.dynamic_*. Fixes: `5db0bf9994` ("radv: Implement VK_EXT_discard_rectangles.") Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-05-09 11:42:25 +02:00
Chris Wilson	8b81256469	iris: Reorganise execbuf to have a single point of failure Propagate the failure from GEM_EXECBUFFER2, cleanup then report failure if need be. We retain the current behaviour to abort() at the first sign of trouble -- for a non-robustness context, arguably this is the right thing to do as the client cannot recover, and the system state is lost. How to properly integrate with KHR_robustness and reset-strategy is left as a future exercise. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-05-08 17:21:07 -07:00
Chris Wilson	8b7e19dbc5	drm-uapi: Update i915_drm.h for I915_CONTEXT_PARAM_RECOVERABLE Pull i915_drm.h to include kernel commit ba4fda620a5f7db521aa9e0262cf49854c1b1d9c Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Mon Feb 18 10:58:21 2019 +0000 drm/i915: Optionally disable automatic recovery after a GPU reset for improved resilience in handling GPU hangs.	2019-05-08 17:21:07 -07:00
Dave Airlie	0a42d5b98b	kmsro: add _dri.so to two of the kmsro drivers. Fixes: `8cfc17bdda` (kmsro: Add the rest of the current set of tinydrm drivers.) Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-05-09 07:15:26 +10:00
Kenneth Graunke	d9b9bb91ff	iris: Report the same video memory settings as i965. This just copy and pastes Ian's code from i965.	2019-05-08 12:43:08 -07:00
Eric Engestrom	5f8d29ab4b	gitlab-ci: add the vulkan overlay layer to the vulkan build Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-05-08 19:51:46 +02:00
Eric Engestrom	c6306125b5	gitlab-ci: add the vulkan overlay layer to the vulkan build Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> [ Michel Dänzer: Take changes affecting the docker image from !299, plus remove the unzip package again before generating the image ]	2019-05-08 16:59:02 +00:00
Michel Dänzer	fcf75534ec	gitlab-ci: Don't install WINE packages They were just making the docker image larger for no benefit at this point. Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-05-08 16:59:02 +00:00
Michel Dänzer	82b30094ed	gitlab-ci: Reorder jobs a bit to be generally ordered longer => shorter This makes the longer jobs likely to run earlier, which can help the overall pipeline duration. Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-05-08 16:59:02 +00:00
Michel Dänzer	6897715770	gitlab-ci: Build clover against all supported versions of LLVM And consolidate it all into a single job. It doesn't take much longer than a single version, thanks to ccache. Overall, this single job might be faster or at least use fewer CPU cycles than the two jobs before, while covering thrice as many versions of LLVM. v2: * Move "rm -rf _build" to meson-build.sh. * Set GALLIUM_DRIVERS the same way both times in the meson-clover job, for symmetry. Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> # v1	2019-05-08 16:59:02 +00:00
Michel Dänzer	cc2b3a99cc	gitlab-ci: Move meson job script to separate file No functional change intended (except for no longer running meson --version separately, as the version appears early in meson's output anyway). Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-05-08 16:59:02 +00:00
Michel Dänzer	d0b9a7f0d7	gitlab-ci: Remove superfluous comment about image tag counter suffix We really shouldn't ever need a suffix, otherwise it indicates a failure in coordination. :) In which case, it doesn't really matter how the tag is disambiguated. Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-05-08 16:59:02 +00:00
Dylan Baker	0d59459432	meson: Force the use of config-tool for llvm meson git now has a cmake find method for llvm, but it lacks a couple of features that we use from the config tool version. Until that reaches parity we need to use the config-tool version. CC: 19.0 19.1 <mesa-stable@lists.freedesktop.org> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-05-08 09:39:03 -07:00
Brian Paul	a17c1ae165	gallium/util: fix two MSVC compiler warnings Remove stray const qualifier. s/unsigned/enum tgsi_semantic/ Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2019-05-08 10:05:42 -06:00
Brian Paul	4f54e550e9	gallium/pp: s/uint/enum tgsi_semantic/ to fix MSVC warning Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2019-05-08 10:05:42 -06:00
Brian Paul	cf5c7beb63	noop: s/enum pipe_transfer_usage/unsigned/ to fix MSVC warning The function pointer declaration in pipe_context uses unsigned for the bitmask. Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2019-05-08 10:05:41 -06:00
Brian Paul	bc517dbbf7	ddebug: fix a few MSVC compiler warnings Don't return an expression in void functions. Replace an unsigned int with proper enum. Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2019-05-08 10:05:41 -06:00
Brian Paul	2e28983ed2	glsl: s/GLboolean/bool/ to silence MSVC compiler warning It complains about mixing GLboolean and bool in the \|= expression. Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2019-05-08 10:05:41 -06:00
Ian Romanick	ed5f024515	nir/flrp: Reassociate add in flrp(±1, b, c) lowering path With this reassociation, this lowering path is still beneficial. Ice Lake total instructions in shared programs: 17220191 -> 17207181 (-0.08%) instructions in affected programs: 999871 -> 986861 (-1.30%) helped: 3703 HURT: 17 helped stats (abs) min: 1 max: 686 x̄: 3.52 x̃: 3 helped stats (rel) min: 0.09% max: 51.97% x̄: 2.21% x̃: 1.35% HURT stats (abs) min: 1 max: 9 x̄: 1.47 x̃: 1 HURT stats (rel) min: 0.08% max: 4.55% x̄: 0.78% x̃: 0.55% 95% mean confidence interval for instructions value: -4.01 -2.99 95% mean confidence interval for instructions %-change: -2.29% -2.11% Instructions are helped. total cycles in shared programs: 360871298 -> 360755040 (-0.03%) cycles in affected programs: 9931334 -> 9815076 (-1.17%) helped: 2388 HURT: 1569 helped stats (abs) min: 1 max: 10228 x̄: 93.54 x̃: 18 helped stats (rel) min: <.01% max: 74.11% x̄: 3.36% x̃: 1.07% HURT stats (abs) min: 1 max: 1917 x̄: 68.27 x̃: 22 HURT stats (rel) min: <.01% max: 44.90% x̄: 3.44% x̃: 1.72% 95% mean confidence interval for cycles value: -39.48 -19.28 95% mean confidence interval for cycles %-change: -0.86% -0.46% Cycles are helped. total spills in shared programs: 12355 -> 12159 (-1.59%) spills in affected programs: 295 -> 99 (-66.44%) helped: 2 HURT: 1 total fills in shared programs: 25398 -> 25207 (-0.75%) fills in affected programs: 288 -> 97 (-66.32%) helped: 2 HURT: 1 LOST: 3 GAINED: 44 Iron Lake total instructions in shared programs: 8169225 -> 8159729 (-0.12%) instructions in affected programs: 1025712 -> 1016216 (-0.93%) helped: 3352 HURT: 0 helped stats (abs) min: 1 max: 6 x̄: 2.83 x̃: 3 helped stats (rel) min: 0.15% max: 12.00% x̄: 1.51% x̃: 1.05% 95% mean confidence interval for instructions value: -2.86 -2.80 95% mean confidence interval for instructions %-change: -1.56% -1.46% Instructions are helped. 
total cycles in shared programs: 188656796 -> 188612280 (-0.02%) cycles in affected programs: 18633584 -> 18589068 (-0.24%) helped: 3085 HURT: 14 helped stats (abs) min: 2 max: 72 x̄: 14.45 x̃: 12 helped stats (rel) min: 0.02% max: 5.73% x̄: 0.73% x̃: 0.31% HURT stats (abs) min: 2 max: 4 x̄: 3.71 x̃: 4 HURT stats (rel) min: <.01% max: <.01% x̄: <.01% x̃: <.01% 95% mean confidence interval for cycles value: -14.55 -14.18 95% mean confidence interval for cycles %-change: -0.76% -0.69% Cycles are helped. GM45 total instructions in shared programs: 5026905 -> 5021856 (-0.10%) instructions in affected programs: 584169 -> 579120 (-0.86%) helped: 1776 HURT: 0 helped stats (abs) min: 1 max: 6 x̄: 2.84 x̃: 3 helped stats (rel) min: 0.15% max: 11.11% x̄: 1.43% x̃: 0.98% 95% mean confidence interval for instructions value: -2.88 -2.80 95% mean confidence interval for instructions %-change: -1.50% -1.37% Instructions are helped. total cycles in shared programs: 129047376 -> 129018918 (-0.02%) cycles in affected programs: 12941924 -> 12913466 (-0.22%) helped: 1722 HURT: 14 helped stats (abs) min: 4 max: 72 x̄: 16.56 x̃: 18 helped stats (rel) min: 0.02% max: 5.73% x̄: 0.72% x̃: 0.30% HURT stats (abs) min: 2 max: 4 x̄: 3.71 x̃: 4 HURT stats (rel) min: <.01% max: <.01% x̄: <.01% x̃: <.01% 95% mean confidence interval for cycles value: -16.65 -16.13 95% mean confidence interval for cycles %-change: -0.76% -0.66% Cycles are helped. Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-05-08 07:41:54 -07:00
Ian Romanick	ba203a3cd7	nir/flrp: Fix typo on the flrp(±1, b, c) path After Samuel reported the bisect, I was able to find the bug by inspection. Good thing for well-named variables. :) Unfortunately, this undoes almost all of the benefit of the original patch. Ice Lake total instructions in shared programs: 17183159 -> 17218166 (0.20%) instructions in affected programs: 1308722 -> 1343729 (2.67%) helped: 98 HURT: 4746 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 0.47% max: 2.70% x̄: 0.60% x̃: 0.57% HURT stats (abs) min: 1 max: 691 x̄: 7.40 x̃: 8 HURT stats (rel) min: 0.10% max: 700.00% x̄: 5.82% x̃: 2.83% 95% mean confidence interval for instructions value: 6.82 7.64 95% mean confidence interval for instructions %-change: 5.22% 6.15% Instructions are HURT. total cycles in shared programs: 360705959 -> 360853522 (0.04%) cycles in affected programs: 10754380 -> 10901943 (1.37%) helped: 1594 HURT: 3331 helped stats (abs) min: 1 max: 1896 x̄: 119.81 x̃: 60 helped stats (rel) min: <.01% max: 35.48% x̄: 5.06% x̃: 3.64% HURT stats (abs) min: 1 max: 10208 x̄: 101.63 x̃: 38 HURT stats (rel) min: 0.01% max: 878.95% x̄: 9.01% x̃: 2.78% 95% mean confidence interval for cycles value: 21.11 38.81 95% mean confidence interval for cycles %-change: 3.76% 5.15% Cycles are HURT. 
total spills in shared programs: 12158 -> 12355 (1.62%) spills in affected programs: 98 -> 295 (201.02%) helped: 1 HURT: 2 total fills in shared programs: 25204 -> 25398 (0.77%) fills in affected programs: 94 -> 288 (206.38%) helped: 0 HURT: 3 LOST: 15 GAINED: 8 Iron Lake total instructions in shared programs: 8121430 -> 8166733 (0.56%) instructions in affected programs: 1148353 -> 1193656 (3.95%) helped: 2 HURT: 4046 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 1.85% max: 1.92% x̄: 1.89% x̃: 1.89% HURT stats (abs) min: 1 max: 43 x̄: 11.20 x̃: 11 HURT stats (rel) min: 0.20% max: 716.67% x̄: 7.40% x̃: 3.87% 95% mean confidence interval for instructions value: 11.02 11.37 95% mean confidence interval for instructions %-change: 6.84% 7.94% Instructions are HURT. total cycles in shared programs: 188376326 -> 188601568 (0.12%) cycles in affected programs: 27416674 -> 27641916 (0.82%) helped: 68 HURT: 3947 helped stats (abs) min: 2 max: 222 x̄: 13.88 x̃: 6 helped stats (rel) min: <.01% max: 1.28% x̄: 0.15% x̃: 0.01% HURT stats (abs) min: 2 max: 670 x̄: 57.31 x̃: 64 HURT stats (rel) min: <.01% max: 1811.11% x̄: 4.11% x̃: 1.09% 95% mean confidence interval for cycles value: 55.01 57.20 95% mean confidence interval for cycles %-change: 2.88% 5.19% Cycles are HURT. LOST: 35 GAINED: 3 GM45 total instructions in shared programs: 4979794 -> 5003551 (0.48%) instructions in affected programs: 635174 -> 658931 (3.74%) helped: 1 HURT: 2142 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 1.85% max: 1.85% x̄: 1.85% x̃: 1.85% HURT stats (abs) min: 1 max: 43 x̄: 11.09 x̃: 11 HURT stats (rel) min: 0.20% max: 716.67% x̄: 7.00% x̃: 3.53% 95% mean confidence interval for instructions value: 10.85 11.33 95% mean confidence interval for instructions %-change: 6.25% 7.74% Instructions are HURT. 
total cycles in shared programs: 128519586 -> 128654990 (0.11%) cycles in affected programs: 17635304 -> 17770708 (0.77%) helped: 46 HURT: 2088 helped stats (abs) min: 4 max: 220 x̄: 18.13 x̃: 6 helped stats (rel) min: <.01% max: 1.28% x̄: 0.15% x̃: 0.01% HURT stats (abs) min: 2 max: 670 x̄: 65.25 x̃: 66 HURT stats (rel) min: <.01% max: 1464.29% x̄: 4.05% x̃: 0.99% 95% mean confidence interval for cycles value: 61.75 65.15 95% mean confidence interval for cycles %-change: 2.58% 5.34% Cycles are HURT. LOST: 38 GAINED: 38 Fixes: `5b908db604` ("nir/flrp: Lower flrp(±1, b, c) and flrp(a, ±1, c) differently") Reported-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-05-08 07:41:26 -07:00
Lionel Landwerlin	43596e5f34	anv: fix use after free Once mem->bo is removed from the cache, it is likely to be freed. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `b80930a6fe` ("anv: add support for VK_EXT_memory_budget") Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-05-08 12:02:13 +01:00
Lionel Landwerlin	a07d06f103	anv: rework query writes to ensure ordering of memory writes We use a mix of MI & PIPE_CONTROL commands to write our queries' data (results & availability). Those commands' memory write order is not guaranteed with regard to their order in the command stream, unless CS stalls are inserted between them. This is problematic for 2 reasons: 1. We copy results from the device using MI commands even though the values are generated from PIPE_CONTROL, meaning we could copy unlanded values into the results and then copy an availability that is inconsistent with the values. 2. We allow the user to poll on the availability values of the query pool from the CPU. If the availability lands in memory before the values then we could return invalid values. This change does 2 things to address this problem: - We use either PIPE_CONTROL or MI commands to write both the query values and availability, so that the ordering of the memory writes guarantees that if availability is visible, results are also visible. - For the occlusion & timestamp queries we apply a CS stall before copying the results on the device, to ensure that copying with MI commands sees the correct values of previous PIPE_CONTROL writes of availability (required by the Vulkan spec). Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reported-by: Iago Toral Quiroga <itoral@igalia.com> Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-05-08 09:49:09 +00:00
Timothy Arceri	e19a8fe033	radv: call constant folding before opt algebraic The pattern of calling opt algebraic first seems to have originated in i965. The order in OpenGL drivers generally doesn't matter because the GLSL IR optimisations do constant folding before opt algebraic. However in Vulkan drivers calling opt algebraic first can result in missed constant folding opportunities. vkpipeline-db results (VEGA64): Totals from affected shaders: SGPRS: 3160 -> 3176 (0.51 %) VGPRS: 3588 -> 3580 (-0.22 %) Spilled SGPRs: 52 -> 44 (-15.38 %) Spilled VGPRs: 0 -> 0 (0.00 %) Private memory VGPRs: 0 -> 0 (0.00 %) Scratch size: 12 -> 12 (0.00 %) dwords per thread Code Size: 261812 -> 261036 (-0.30 %) bytes LDS: 7 -> 7 (0.00 %) blocks Max Waves: 346 -> 348 (0.58 %) Wait states: 0 -> 0 (0.00 %) Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-05-08 19:45:01 +10:00
Erik Faye-Lund	ecdab0dfea	docs: drop h1 in header It's generally frowned upon to have more than one H1 per document in HTML4. So let's put the text directly inside the header. This means we can drop the flex-based centering, which makes things a bit easier. We also need to change the padding to rem instead of em, because the em has now changed. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-05-08 07:18:15 +00:00
Erik Faye-Lund	6e0e550904	docs: harmonize headings and titles We're pretty inconsistent in what the headings and titles are, especially compared to what the articles are listed as in the sidebar. Let's harmonize this. There's a notable exception for meson.html, where the sidebar uses a short-hand form that makes sense in the sidebar, but not in the article due to the visible context being different. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-05-08 07:18:15 +00:00
Erik Faye-Lund	269474b428	docs: renumber headings It's generally frowned upon to have multiple H1 headings in HTML4. So let's make sure each article has a primary heading for the article, and that that heading is the title that is used in the sidebar. While we're at it, let's update the title in the articles to match the title from the sidebar as well. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-05-08 07:18:15 +00:00
Erik Faye-Lund	87683ba058	docs: give download-article a primary heading It's generally frowned upon to have multiple H1 headings in HTML4. So let's add a primary heading for the article, and source that from the title used in the sidebar. While we're at it, let's update the title in the article to match the title from the sidebar as well. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-05-08 07:18:15 +00:00
Erik Faye-Lund	a8df27b0b2	docs: use title-casing for all headings in sidebar We generally use title-casing for headings in the sidebar. But not all headings were consistently cased like that. Let's make sure this is consistent. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Acked-by: Eric Engestrom <eric.engestrom@intel.com>	2019-05-08 07:18:15 +00:00
Erik Faye-Lund	f06e698aad	docs: spell out "and" in sidebar There's no need to keep this short, we can just spell out "and" here. Besides, a slash kind of implies "or", but these articles are about both of these, not either. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-05-08 07:18:15 +00:00
Erik Faye-Lund	7809331cb9	docs: remove pointless list-entry It's quite visible that there's more docs below, we don't need to spell it out for the reader. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-05-08 07:18:15 +00:00
Erik Faye-Lund	7421fdf68a	docs: spell out faq in sidebar We're not short on space here, so there's little point in abbreviating this. This also matches the heading in the article. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-05-08 07:18:15 +00:00
Erik Faye-Lund	e4fe83c8a0	docs: spell out "and" in sidebar We're not short on space here, so let's just spell out "and" instead of using the ampersand. This is more consistent with the entry above in the sidebar. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-05-08 07:18:15 +00:00
Timothy Arceri	4fd8161773	glsl_to_nir: remove unused type_is_int() This was missed in `e00fa99b08`. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-05-08 14:11:38 +10:00
Timothy Arceri	a01b393c39	Revert "glx: Fix synthetic error generation in __glXSendError" This reverts commit `e91ee763c3`. This seems to have broken a number of wine games. Let's revert everything for now and try again later. Acked-by: Adam Jackson <ajax@redhat.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110632 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110590	2019-05-08 13:16:44 +10:00
Timothy Arceri	024232b26c	radeonsi: add an AMD_TEX_ANISO environment variable This brings it in line with the recently added AMD_DEBUG. Reviewed-by: Marek Olšák <marek.olsak@amd.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109619	2019-05-08 09:32:25 +10:00
Kenneth Graunke	d568fcd0a0	i965: leave the top 4Gb of the high heap VMA unused This ports commit `9e7b0988d6` from anv to i965. Thanks to Lionel for noticing that it was missing! Fixes: `01058a5522` i965: Add virtual memory allocator infrastructure to brw_bufmgr. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-05-07 15:45:56 -07:00
Kenneth Graunke	17210c63a9	i965: Force VMA alignment to be a multiple of the page size. This should happen regardless, but let's be paranoid. Fixes: `01058a5522` i965: Add virtual memory allocator infrastructure to brw_bufmgr. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-05-07 15:45:56 -07:00
Kenneth Graunke	15f134c628	i965: Fix BRW_MEMZONE_LOW_4G heap size. The STATE_BASE_ADDRESS "Size" fields can only hold 0xfffff in pages, and 0xfffff * 4096 = 4294963200, which is 1 page shy of 4GB. So we can't use the top page. Fixes: `01058a5522` i965: Add virtual memory allocator infrastructure to brw_bufmgr. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-05-07 15:45:56 -07:00
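The arithmetic quoted in the BRW_MEMZONE_LOW_4G commit above can be checked with a short sketch. The constant names are illustrative and not taken from the driver source; only the numbers come from the commit message.

```python
# Illustrative names; only the numeric values come from the commit message.
PAGE_SIZE = 4096                # bytes per page
MAX_SIZE_FIELD_PAGES = 0xfffff  # largest page count the "Size" fields can hold

max_heap_bytes = MAX_SIZE_FIELD_PAGES * PAGE_SIZE
four_gib = 1 << 32

print(max_heap_bytes)             # largest addressable heap size in bytes
print(four_gib - max_heap_bytes)  # shortfall relative to a full 4 GiB
```

The shortfall equals exactly one page, which is why the top page of the low 4G zone cannot be used.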
Matt Turner	e8c74a1e16	intel/compiler: Unset flag reg when FB write is not predicated In the FS IR we pretend that the instruction is predicated with (+f0.1) just for flag dependency tracking purposes. Since the instruction doesn't support predication before Haswell, we unset the predicate so we should also unset the flag register so that we can round-trip the disassembly. Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-05-07 14:33:48 -07:00
Sagar Ghuge	5d7a9e0811	intel/disasm: Disassemble immediate value properly for dim On Haswell, the dim instruction encodes its immediate float operand as a double float. v2: Fix comment (Matt Turner) Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-05-07 14:33:48 -07:00
Sagar Ghuge	6c83a68ebc	intel/disasm: Disassemble JIP offset for while Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-05-07 14:33:48 -07:00
Sagar Ghuge	9db616e8a2	intel/compiler: Replicate 16 bit immediate value correctly For the W or UW (signed or unsigned word) source types, the 16-bit value must be replicated in both the low and high words of the 32-bit immediate value. v2: Fix replication in other places as well V3: fix a few nits (Matt Turner) Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-05-07 14:33:48 -07:00
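The replication rule described in the 16-bit immediate commit above can be sketched as follows. This is a minimal illustration of the bit layout only; `replicate_word` is a hypothetical helper name, not a function from the Intel compiler.

```python
def replicate_word(value16: int) -> int:
    """Copy a 16-bit W/UW immediate into both halves of a 32-bit immediate.

    Hypothetical helper for illustration; negative inputs are treated as
    two's-complement 16-bit values.
    """
    value16 &= 0xFFFF                    # keep only the low 16 bits
    return (value16 << 16) | value16     # high word and low word are identical


print(hex(replicate_word(0x1234)))
print(hex(replicate_word(-1)))
```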
Sagar Ghuge	5211159b5b	intel/compiler: Print quad value in hex format Print the quad value the same way as an unsigned quad so that we can distinguish between disassembled quarter control values (e.g. 1/2/3[Q]) and immediate quad values (e.g. 1Q). This allows round-tripping through the assembler/disassembler. Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-05-07 14:33:48 -07:00
Sagar Ghuge	4e828bb48a	intel/tools: Add unit tests for assembler v1: Pass executable object from meson to test(Dylan Baker) v2: Ignore generated output files from git status(Matt Turner) Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2019-05-07 14:33:48 -07:00
Mika Kuoppala	1fb5ce0a11	intel/tools: Initialize offset correctly for i965_asm If we leave offset uninitialized, access to store will be random depending on stack value and can segfault. Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-05-07 14:33:48 -07:00
Mika Kuoppala	85da1194ec	intel/tools: Add meson pthread dependency for i965_asm Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-05-07 14:33:48 -07:00
Sagar Ghuge	70308a5a8a	intel/tools: New i965 instruction assembler tool Tool is inspired by igt's assembler tool. Thanks to Matt Turner, who mentored me throughout this project. v2: Fix memory leaks and naming convention (Caio) v3: Fix meson changes (Dylan Baker) v4: Fix usage options (Matt Turner) Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com> Reviewed-by: Matt Turner <mattst88@gmail.com> Closes: https://gitlab.freedesktop.org/mesa/mesa/merge_requests/141	2019-05-07 14:33:38 -07:00
Kenneth Graunke	a232aa5c50	iris: Also handle res->offset for buffer sampler/image views	2019-05-07 13:36:18 -07:00
Mike Blumenkrantz	ddd716e746	iris: support dmabuf imports with offsets This adds support for imports where the image data begins at an offset from the start of the buffer, as used in h/x264. Fixes kwg/mesa#47 Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-05-07 13:36:08 -07:00
Roland Scheidegger	748f603390	gallivm: fix broken 8-wide s3tc decoding Brian noticed there was an uninitialized var for the 8-wide case and 128 bit blocks, which made it always crash. Likewise, the 64bit block case had another crash bug due to type mismatch. Color decode (used for all s3tc formats) also had a bogus shuffle for this case, leading to decode artifacts. Fix these all up, which makes the code actually work 8-wide. Note that it's still not used - I've verified it works, and the generated assembly does look quite a bit simpler actually (20-30% fewer instructions for the s3tc decode part with avx2), however in practice it still seems to be slightly slower for some unknown reason (tested with openarena) on my haswell box, so for now continue to split things into 4-wide vectors before decoding. Reviewed-by: Brian Paul <brianp@vmware.com> Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2019-05-07 18:59:38 +02:00
Juan A. Suarez Romero	92dba1c66e	docs: Add relnotes stub for 19.2 Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>	2019-05-07 16:07:29 +00:00
Juan A. Suarez Romero	14a7959cfa	Bump version for 19.1 branch Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>	2019-05-07 16:02:34 +00:00
Vasily Khoruzhick	6b46399e2f	lima: enable sin and cos lowering for GP GP doesn't support sin/cos natively, so we have to lower them. Reviewed-by: Qiang Yu <yuq825@gmail.com> Tested-by: Qiang Yu <yuq825@gmail.com> Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>	2019-05-07 15:25:21 +00:00
Vasily Khoruzhick	e67e4e90b2	nir: implement lowering for fsin and fcos Lower sin and cos using Nick's fast sin/cos approximation from https://web.archive.org/web/20180105155939/http://forum.devmaster.net/t/fast-and-accurate-sine-cosine/9648 It's suitable for GLES2, but it throws warnings in dEQP GLES3 precision tests. Reviewed-by: Connor Abbott <cwabbott0@gmail.com> Reviewed-by: Qiang Yu <yuq825@gmail.com> Tested-by: Qiang Yu <yuq825@gmail.com> Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com> Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>	2019-05-07 15:25:21 +00:00
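A rough sketch of the parabola-based approximation discussed in the forum thread linked above may help show why it is cheap but not exact. The constants and function names are assumptions based on how that approximation is commonly quoted; they are not copied from the NIR lowering, and the input is assumed to be already reduced to [-pi, pi].

```python
import math

# Assumed constants of the commonly quoted parabola approximation.
B = 4.0 / math.pi
C = -4.0 / (math.pi * math.pi)
P = 0.225  # weight of the refinement step


def fast_sin(x: float) -> float:
    """Approximate sin(x) for x in [-pi, pi]."""
    y = B * x + C * x * abs(x)        # parabola through 0, +/-pi/2 and +/-pi
    return P * (y * abs(y) - y) + y   # blend with the squared parabola


def fast_cos(x: float) -> float:
    """Approximate cos(x) for x in [-pi, pi] via cos(x) = sin(x + pi/2)."""
    x += math.pi / 2.0
    if x > math.pi:                   # wrap back into [-pi, pi]
        x -= 2.0 * math.pi
    return fast_sin(x)


# Largest absolute error of fast_sin on a uniform grid over [-pi, pi].
worst = max(abs(fast_sin(i * math.pi / 1000) - math.sin(i * math.pi / 1000))
            for i in range(-1000, 1001))
print(worst)
```

The error stays small but is not within a few ulps, which fits the commit's remark that the lowering suits GLES2 while triggering warnings in stricter precision tests.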
Rob Clark	b15c46e6bf	freedreno/ir3: move const_state to ir3_shader For a6xx, we construct/emit a single VS const state used for both binning pass and draw pass. So far we were mostly getting lucky that there were no (obvious) mismatches in the const_state (like different lowered immediates) between the binning and draw pass VS ir3_shader_variant. And I guess this situation will come up more as GS and tess are added into the equation. Since really everything about the const state is not specific to the variant, move this. The main exception is lowered immediates, but these are the last to appear in the layout, and it doesn't hurt for each new shader variant to just append any immed's it lowers to the end of the immediate state. Signed-off-by: Rob Clark <robdclark@chromium.org>	2019-05-07 07:26:00 -07:00
Rob Clark	5690f83bb5	freedreno/ir3: split out const_state setup Next patch moves const_state to ir3_shader, before the compile context is created. So move the code around in prep to call it earlier. Signed-off-by: Rob Clark <robdclark@chromium.org>	2019-05-07 07:26:00 -07:00
Rob Clark	9403184ddd	freedreno/ir3: move immediates to const_state They are really part of the constant state, and it will simplify moving things from ir3_shader_variant to ir3_shader if we combine them. Signed-off-by: Rob Clark <robdclark@chromium.org>	2019-05-07 07:26:00 -07:00
Rob Clark	23e7a34466	freedreno/ir3: consolidate const state Combine the offsets of different parts of the constant space with (what was formerly known as) ir3_driver_const_layout. Bunch of churn, but no functional change. Signed-off-by: Rob Clark <robdclark@chromium.org>	2019-05-07 07:26:00 -07:00
Rob Clark	ef3eecd66b	freedreno/ir3: move ir3_pointer_size() Move to ir3_compiler so it doesn't depend on the compile context. Prep work for moving constant state from variant (where we have compile context) to shader (where we do not). Signed-off-by: Rob Clark <robdclark@chromium.org>	2019-05-07 07:26:00 -07:00
Lionel Landwerlin	2d2927938f	vulkan/overlay-layer: fix cast errors Not quite sure what version of GCC/Clang produces errors (8.3.0 locally was fine). v2: also fix an integer literal issue (Karol) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com> (v1) Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-05-07 10:45:45 +01:00
Samuel Iglesias Gonsálvez	bc66cebc0d	anv: fix alphaToCoverage when there is no color attachment There are tests in CTS for alpha to coverage without a color attachment that are failing. This happens because we remove the shader color outputs when we don't have a valid color attachment for them, but when alpha to coverage is enabled we still want to preserve the output at location 0 since we need the alpha component. In that case we will also need to create a null render target for RT 0. v2: - We already create a null rt when we don't have any, so reuse that for this case (Jason) - Simplify the code a bit (Iago) v3: - Take alpha to coverage from the key and don't tie this to depth-only rendering only, we want the same behavior if we have multiple render targets but the one at location 0 is not used. (Jason). - Rewrite commit message (Iago) v4: - Make sure we take into account the array length of the shader outputs, which we were not handling correctly either and make sure we also create null render targets for any invalid array entries too. v5: - Simplify removal of unused outputs by using rt_used[] so we don't have to special case alpha to coverage there too. Fixes the following CTS tests: dEQP-VK.pipeline.multisample.alpha_to_coverage_no_color_attachment.* Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Signed-off-by: Iago Toral Quiroga <itoral@igalia.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-05-07 09:35:47 +02:00
Ian Romanick	c866500525	intel/compiler: Don't always require precise lowering of flrp No changes on any other Intel platforms. Iron Lake and GM45 had similar results. (Iron Lake shown) total instructions in shared programs: 8164367 -> 8135551 (-0.35%) instructions in affected programs: 3271235 -> 3242419 (-0.88%) helped: 13636 HURT: 90 helped stats (abs) min: 1 max: 30 x̄: 2.13 x̃: 1 helped stats (rel) min: 0.04% max: 10.77% x̄: 1.16% x̃: 0.97% HURT stats (abs) min: 1 max: 4 x̄: 1.80 x̃: 2 HURT stats (rel) min: 0.26% max: 11.11% x̄: 1.76% x̃: 0.78% 95% mean confidence interval for instructions value: -2.13 -2.07 95% mean confidence interval for instructions %-change: -1.16% -1.13% Instructions are helped. total cycles in shared programs: 188719974 -> 188586222 (-0.07%) cycles in affected programs: 70415766 -> 70282014 (-0.19%) helped: 12563 HURT: 515 helped stats (abs) min: 2 max: 600 x̄: 10.90 x̃: 6 helped stats (rel) min: <.01% max: 5.48% x̄: 0.48% x̃: 0.27% HURT stats (abs) min: 2 max: 54 x̄: 6.07 x̃: 4 HURT stats (rel) min: 0.01% max: 4.48% x̄: 0.24% x̃: 0.08% 95% mean confidence interval for cycles value: -10.56 -9.90 95% mean confidence interval for cycles %-change: -0.47% -0.45% Cycles are helped. LOST: 0 GAINED: 13 Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-05-06 22:52:29 -07:00
Ian Romanick	ab86926156	nir/algebraic: Reassociate open-coded flrp(1, b, c) In a previous version of this patch, Jason commented, "Re-associating based on whether or not something has a constant value of 1.0 seems a bit sneaky. I think it's well within the rules but it seems like something that could bite you." That is possibly true. The reassociation will generate different results if fabs(b) >= 2**24 and fabs(c) < 0.5. The delta increases as fabs(c) approaches 0. However, i965 has done this same reassociation indirectly for years. We would previously allow nir_op_flrp on all pre-Gen11 hardware even though Gen4 and Gen5 do not have a LRP instruction. Optimizations in nir_opt_algebraic would convert expressions like a+c(b-a) into flrp(a, b, c). On Gen7+, the hardware performs the same arithmetic as a(1-c)+bc. Gen6 seems to implement LRP as a+c(b-a). On Gen4 and Gen5, we would lower LRP to a sequence of instructions that implement a(1-c)+bc. The lowering happens after all constant folding, so we would literally generate a 1+(-1) instruction sequence in this scenario: one instruction to load either 1 or -1 in a register, and another instruction to add either -1 or 1 to it. This patch just cuts out the middle man. Do the reassociation that we've always done, but do it explicitly at a time when we can benefit from other optimizations. A few cases that were hurt by "nir: Lower flrp(±1, b, c) and flrp(a, ±1, c) differently" are restored by this patch. This includes a few shaders in ET:QW. I tried a similar thing for open-coded flrp(-1, b, c), and it hurt instructions on 35 shaders for ILK without helping any. The helped / hurt cycles was about even. No changes on any other Intel platforms. Iron Lake and GM45 had similar results. 
(Iron Lake shown) total instructions in shared programs: 8172020 -> 8164367 (-0.09%) instructions in affected programs: 1089851 -> 1082198 (-0.70%) helped: 3285 HURT: 64 helped stats (abs) min: 1 max: 6 x̄: 2.35 x̃: 2 helped stats (rel) min: 0.13% max: 12.00% x̄: 1.15% x̃: 0.83% HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 HURT stats (rel) min: 0.24% max: 0.64% x̄: 0.39% x̃: 0.38% 95% mean confidence interval for instructions value: -2.32 -2.25 95% mean confidence interval for instructions %-change: -1.16% -1.09% Instructions are helped. total cycles in shared programs: 188758338 -> 188719974 (-0.02%) cycles in affected programs: 20004922 -> 19966558 (-0.19%) helped: 3012 HURT: 477 helped stats (abs) min: 2 max: 142 x̄: 13.41 x̃: 12 helped stats (rel) min: 0.01% max: 6.37% x̄: 0.52% x̃: 0.24% HURT stats (abs) min: 2 max: 328 x̄: 4.27 x̃: 4 HURT stats (rel) min: <.01% max: 1.55% x̄: 0.14% x̃: 0.11% 95% mean confidence interval for cycles value: -11.38 -10.62 95% mean confidence interval for cycles %-change: -0.46% -0.41% Cycles are helped. Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-05-06 22:52:29 -07:00
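The precision caveat in the reassociation commit above (different results when fabs(b) >= 2**24 and fabs(c) < 0.5) can be illustrated with a small sketch. It emulates 32-bit float arithmetic by rounding after every operation; the helper names are hypothetical, and the operand values are one hand-picked case rather than anything taken from the patch.

```python
import struct


def f32(x: float) -> float:
    """Round a Python float to the nearest 32-bit float (hypothetical helper)."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def lerp_open_coded(a: float, b: float, c: float) -> float:
    """a + c(b - a), rounding to float32 after each operation."""
    return f32(a + f32(c * f32(b - a)))


def lerp_reassociated(a: float, b: float, c: float) -> float:
    """a(1 - c) + bc, rounding to float32 after each operation."""
    return f32(f32(a * f32(1.0 - c)) + f32(b * c))


a = 1.0
b = float(2**24 + 2)   # fabs(b) >= 2**24, so b - 1 is not representable in float32
c = 3.0 * 2.0**-24     # fabs(c) < 0.5

print(lerp_open_coded(a, b, c))
print(lerp_reassociated(a, b, c))
```

For these inputs the two forms land on neighbouring float32 values, which is the kind of discrepancy the commit message says i965 has tolerated for years.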
Ian Romanick	c995d1ca3a	nir/flrp: Lower flrp(a, b, #c) differently This doesn't help on Intel GPUs now because we always take the "always_precise" path first. It may help on other GPUs, and it does prevent a bunch of regressions in "intel/compiler: Don't always require precise lowering of flrp". Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-05-06 22:52:29 -07:00
Ian Romanick	ae02622d8f	nir/flrp: Lower flrp(a, b, c) differently if another flrp(_, b, c) exists There is little effect on Intel GPUs now because we almost always take the "always_precise" path first. It may help on other GPUs, and it does prevent a bunch of regressions in "intel/compiler: Don't always require precise lowering of flrp". No changes on any other Intel platforms. GM45 and Iron Lake had similar results. (Iron Lake shown) total cycles in shared programs: 188852500 -> 188852484 (<.01%) cycles in affected programs: 14612 -> 14596 (-0.11%) helped: 4 HURT: 0 helped stats (abs) min: 4 max: 4 x̄: 4.00 x̃: 4 helped stats (rel) min: 0.09% max: 0.13% x̄: 0.11% x̃: 0.11% 95% mean confidence interval for cycles value: -4.00 -4.00 95% mean confidence interval for cycles %-change: -0.13% -0.09% Cycles are helped. Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-05-06 22:52:29 -07:00
Ian Romanick	6698d861a5	nir/flrp: Lower flrp(a, b, c) differently if another flrp(a, _, c) exists This doesn't help on Intel GPUs now because we always take the "always_precise" path first. It may help on other GPUs, and it does prevent a bunch of regressions in "intel/compiler: Don't always require precise lowering of flrp". No changes on any Intel platform. Before a number of large rebases this helped cycles in a couple shaders on Iron Lake and GM45. Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-05-06 22:52:29 -07:00
Ian Romanick	5b908db604	nir/flrp: Lower flrp(±1, b, c) and flrp(a, ±1, c) differently No changes on any other Intel platforms. v2: Rebase on 424372e5dd5 ("nir: Use the flrp lowering pass instead of nir_opt_algebraic") Iron Lake and GM45 had similar results. (Iron Lake shown) total instructions in shared programs: 8189888 -> 8153912 (-0.44%) instructions in affected programs: 1199037 -> 1163061 (-3.00%) helped: 4124 HURT: 10 helped stats (abs) min: 1 max: 40 x̄: 8.73 x̃: 9 helped stats (rel) min: 0.20% max: 86.96% x̄: 4.96% x̃: 3.02% HURT stats (abs) min: 1 max: 2 x̄: 1.20 x̃: 1 HURT stats (rel) min: 1.06% max: 3.92% x̄: 1.62% x̃: 1.06% 95% mean confidence interval for instructions value: -8.84 -8.56 95% mean confidence interval for instructions %-change: -5.12% -4.77% Instructions are helped. total cycles in shared programs: 188606710 -> 188426964 (-0.10%) cycles in affected programs: 27505596 -> 27325850 (-0.65%) helped: 4026 HURT: 77 helped stats (abs) min: 2 max: 646 x̄: 44.99 x̃: 46 helped stats (rel) min: <.01% max: 94.58% x̄: 2.35% x̃: 0.85% HURT stats (abs) min: 2 max: 376 x̄: 17.79 x̃: 6 HURT stats (rel) min: <.01% max: 2.60% x̄: 0.22% x̃: 0.04% 95% mean confidence interval for cycles value: -44.75 -42.87 95% mean confidence interval for cycles %-change: -2.44% -2.17% Cycles are helped. LOST: 3 GAINED: 35 Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-05-06 22:52:29 -07:00
Ian Romanick	23c5501b77	nir/flrp: Lower flrp(#a, #b, c) differently If the magnitudes of #a and #b are such that (b-a) won't lose too much precision, lower as a+c(b-a). No changes on any other Intel platforms. v2: Rebase on 424372e5dd5 ("nir: Use the flrp lowering pass instead of nir_opt_algebraic") Iron Lake and GM45 had similar results. (Iron Lake shown) total instructions in shared programs: 8192503 -> 8192383 (<.01%) instructions in affected programs: 18417 -> 18297 (-0.65%) helped: 68 HURT: 0 helped stats (abs) min: 1 max: 18 x̄: 1.76 x̃: 1 helped stats (rel) min: 0.19% max: 7.89% x̄: 1.10% x̃: 0.43% 95% mean confidence interval for instructions value: -2.48 -1.05 95% mean confidence interval for instructions %-change: -1.56% -0.63% Instructions are helped. total cycles in shared programs: 188662536 -> 188661956 (<.01%) cycles in affected programs: 744476 -> 743896 (-0.08%) helped: 62 HURT: 0 helped stats (abs) min: 4 max: 60 x̄: 9.35 x̃: 6 helped stats (rel) min: 0.02% max: 4.84% x̄: 0.27% x̃: 0.06% 95% mean confidence interval for cycles value: -12.37 -6.34 95% mean confidence interval for cycles %-change: -0.48% -0.06% Cycles are helped. Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-05-06 22:52:29 -07:00
Ian Romanick	dd7135d55d	intel/compiler: Use the flrp lowering pass for all stages on Gen4 and Gen5 Previously lower_flrp32 was only set for vertex shaders. Fragment shaders performed a(1-c)+bc lowering during code generation. The shaders with loops hurt are SIMD8 and SIMD16 shaders for a text-identical fragment shader. v2: Rebase on `26391cceaa` ("intel/compiler: Lower ffma on Gen4 and Gen5"). v3: Rebase on `a004e95dd7` ("radeonsi/nir: create si_nir_opts() helper") Iron Lake total instructions in shared programs: 8211385 -> 8185974 (-0.31%) instructions in affected programs: 2503898 -> 2478487 (-1.01%) helped: 9936 HURT: 921 helped stats (abs) min: 1 max: 155 x̄: 2.86 x̃: 2 helped stats (rel) min: 0.10% max: 35.48% x̄: 1.67% x̃: 1.11% HURT stats (abs) min: 1 max: 12 x̄: 3.24 x̃: 2 HURT stats (rel) min: 0.21% max: 13.64% x̄: 1.86% x̃: 0.89% 95% mean confidence interval for instructions value: -2.43 -2.25 95% mean confidence interval for instructions %-change: -1.41% -1.33% Instructions are helped. total cycles in shared programs: 188523186 -> 188401198 (-0.06%) cycles in affected programs: 71541604 -> 71419616 (-0.17%) helped: 11649 HURT: 1871 helped stats (abs) min: 2 max: 930 x̄: 12.62 x̃: 6 helped stats (rel) min: <.01% max: 44.61% x̄: 0.68% x̃: 0.25% HURT stats (abs) min: 2 max: 138 x̄: 13.38 x̃: 8 HURT stats (rel) min: <.01% max: 10.99% x̄: 0.49% x̃: 0.17% 95% mean confidence interval for cycles value: -9.42 -8.63 95% mean confidence interval for cycles %-change: -0.54% -0.50% Cycles are helped. total loops in shared programs: 852 -> 856 (0.47%) loops in affected programs: 0 -> 4 helped: 0 HURT: 4 HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 HURT stats (rel) min: 0.00% max: 0.00% x̄: 0.00% x̃: 0.00% 95% mean confidence interval for loops value: 1.00 1.00 95% mean confidence interval for loops %-change: 0.00% 0.00% Loops are HURT. 
LOST: 3 GAINED: 12 GM45 total instructions in shared programs: 5046407 -> 5033694 (-0.25%) instructions in affected programs: 1303584 -> 1290871 (-0.98%) helped: 5010 HURT: 464 helped stats (abs) min: 1 max: 155 x̄: 2.85 x̃: 2 helped stats (rel) min: 0.10% max: 34.38% x̄: 1.63% x̃: 1.08% HURT stats (abs) min: 1 max: 75 x̄: 3.39 x̃: 2 HURT stats (rel) min: 0.20% max: 13.04% x̄: 1.84% x̃: 0.87% 95% mean confidence interval for instructions value: -2.45 -2.20 95% mean confidence interval for instructions %-change: -1.40% -1.28% Instructions are helped. total cycles in shared programs: 128889476 -> 128812366 (-0.06%) cycles in affected programs: 44845402 -> 44768292 (-0.17%) helped: 6079 HURT: 940 helped stats (abs) min: 2 max: 930 x̄: 15.16 x̃: 8 helped stats (rel) min: <.01% max: 41.03% x̄: 0.71% x̃: 0.25% HURT stats (abs) min: 2 max: 138 x̄: 16.01 x̃: 8 HURT stats (rel) min: <.01% max: 10.99% x̄: 0.50% x̃: 0.17% 95% mean confidence interval for cycles value: -11.63 -10.34 95% mean confidence interval for cycles %-change: -0.58% -0.52% Cycles are helped. total loops in shared programs: 633 -> 635 (0.32%) loops in affected programs: 0 -> 2 helped: 0 HURT: 2 total spills in shared programs: 60 -> 69 (15.00%) spills in affected programs: 54 -> 63 (16.67%) helped: 0 HURT: 1 total fills in shared programs: 92 -> 105 (14.13%) fills in affected programs: 80 -> 93 (16.25%) helped: 0 HURT: 1 LOST: 15 GAINED: 15 Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> [v2] Reviewed-by: Matt Turner <mattst88@gmail.com> [v2]	2019-05-06 22:52:29 -07:00
Ian Romanick	d41cdef2a5	nir: Use the flrp lowering pass instead of nir_opt_algebraic I tried to be very careful while updating all the various drivers, but I don't have any of that hardware for testing. :( i965 is the only platform that sets always_precise = true, and it is only set true for fragment shaders. Gen4 and Gen5 both set lower_flrp32 only for vertex shaders. For fragment shaders, nir_op_flrp is lowered during code generation as a(1-c)+bc. On all other platforms 64-bit nir_op_flrp and on Gen11 32-bit nir_op_flrp are lowered using the old nir_opt_algebraic method. No changes on any other Intel platforms. v2: Add panfrost changes. Iron Lake and GM45 had similar results. (Iron Lake shown) total cycles in shared programs: 188647754 -> 188647748 (<.01%) cycles in affected programs: 5096 -> 5090 (-0.12%) helped: 3 HURT: 0 helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2 helped stats (rel) min: 0.12% max: 0.12% x̄: 0.12% x̃: 0.12% Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-05-06 22:52:29 -07:00
Ian Romanick	158370ed2a	nir/flrp: Add new lowering pass for flrp instructions This pass will soon grow to include some optimizations that are difficult or impossible to implement correctly within nir_opt_algebraic. It also includes the ability to generate strictly correct code, which the current nir_opt_algebraic lowering lacks (though that could be changed). v2: Document the parameters to nir_lower_flrp. Rebase on top of `3766334923` ("compiler/nir: add lowering for 16-bit flrp") Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-05-06 22:52:28 -07:00
Ian Romanick	dc566a033c	nir/algebraic: Pull common multiplication out of flrp arguments All Intel platforms had similar results. (Skylake shown) total instructions in shared programs: 15342485 -> 15337495 (-0.03%) instructions in affected programs: 217456 -> 212466 (-2.29%) helped: 1539 HURT: 1 helped stats (abs) min: 1 max: 17 x̄: 3.24 x̃: 3 helped stats (rel) min: 0.22% max: 18.75% x̄: 3.10% x̃: 1.91% HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 HURT stats (rel) min: 0.56% max: 0.56% x̄: 0.56% x̃: 0.56% 95% mean confidence interval for instructions value: -3.39 -3.09 95% mean confidence interval for instructions %-change: -3.24% -2.96% Instructions are helped. total cycles in shared programs: 355734320 -> 355728237 (<.01%) cycles in affected programs: 1851555 -> 1845472 (-0.33%) helped: 835 HURT: 575 helped stats (abs) min: 1 max: 658 x̄: 40.62 x̃: 14 helped stats (rel) min: <.01% max: 35.69% x̄: 3.78% x̃: 1.81% HURT stats (abs) min: 1 max: 322 x̄: 48.40 x̃: 14 HURT stats (rel) min: 0.04% max: 71.02% x̄: 8.06% x̃: 2.43% 95% mean confidence interval for cycles value: -8.50 -0.13 95% mean confidence interval for cycles %-change: 0.48% 1.62% Inconclusive result (value mean confidence interval and %-change mean confidence interval disagree). Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-05-06 22:52:28 -07:00
Ian Romanick	a83a6e9690	nir/algebraic: Pull common addition out of flrp arguments v2: Augment the late optimization patterns with a couple pre-ffma pass patterns. All Gen7+ platforms had similar results. (Skylake shown) total instructions in shared programs: 15342982 -> 15342485 (<.01%) instructions in affected programs: 56304 -> 55807 (-0.88%) helped: 235 HURT: 0 helped stats (abs) min: 1 max: 8 x̄: 2.11 x̃: 1 helped stats (rel) min: 0.11% max: 8.82% x̄: 1.27% x̃: 0.74% 95% mean confidence interval for instructions value: -2.31 -1.92 95% mean confidence interval for instructions %-change: -1.46% -1.09% Instructions are helped. total cycles in shared programs: 355734740 -> 355734320 (<.01%) cycles in affected programs: 1028807 -> 1028387 (-0.04%) helped: 134 HURT: 104 helped stats (abs) min: 1 max: 212 x̄: 25.69 x̃: 8 helped stats (rel) min: <.01% max: 9.36% x̄: 1.33% x̃: 0.61% HURT stats (abs) min: 1 max: 203 x̄: 29.06 x̃: 8 HURT stats (rel) min: 0.02% max: 15.76% x̄: 1.76% x̃: 0.46% 95% mean confidence interval for cycles value: -8.51 4.98 95% mean confidence interval for cycles %-change: -0.35% 0.39% Inconclusive result (value mean confidence interval includes 0). Sandy Bridge total instructions in shared programs: 10886815 -> 10886390 (<.01%) instructions in affected programs: 36883 -> 36458 (-1.15%) helped: 147 HURT: 0 helped stats (abs) min: 1 max: 7 x̄: 2.89 x̃: 3 helped stats (rel) min: 0.35% max: 8.00% x̄: 1.60% x̃: 1.23% 95% mean confidence interval for instructions value: -3.12 -2.67 95% mean confidence interval for instructions %-change: -1.83% -1.38% Instructions are helped. 
total cycles in shared programs: 154188360 -> 154186902 (<.01%) cycles in affected programs: 388094 -> 386636 (-0.38%) helped: 90 HURT: 58 helped stats (abs) min: 1 max: 243 x̄: 36.80 x̃: 15 helped stats (rel) min: 0.04% max: 9.23% x̄: 1.26% x̃: 0.83% HURT stats (abs) min: 1 max: 684 x̄: 31.97 x̃: 10 HURT stats (rel) min: 0.03% max: 13.50% x̄: 1.15% x̃: 0.51% 95% mean confidence interval for cycles value: -22.62 2.92 95% mean confidence interval for cycles %-change: -0.68% 0.05% Inconclusive result (value mean confidence interval includes 0). Iron Lake and GM45 had similar results. (Iron Lake shown) total instructions in shared programs: 8221239 -> 8220357 (-0.01%) instructions in affected programs: 54560 -> 53678 (-1.62%) helped: 186 HURT: 0 helped stats (abs) min: 1 max: 14 x̄: 4.74 x̃: 3 helped stats (rel) min: 0.34% max: 10.77% x̄: 1.97% x̃: 1.17% 95% mean confidence interval for instructions value: -5.21 -4.28 95% mean confidence interval for instructions %-change: -2.23% -1.72% Instructions are helped. total cycles in shared programs: 188654442 -> 188650364 (<.01%) cycles in affected programs: 1454384 -> 1450306 (-0.28%) helped: 204 HURT: 0 helped stats (abs) min: 2 max: 84 x̄: 19.99 x̃: 18 helped stats (rel) min: 0.02% max: 4.69% x̄: 0.56% x̃: 0.22% 95% mean confidence interval for cycles value: -22.38 -17.60 95% mean confidence interval for cycles %-change: -0.67% -0.46% Cycles are helped. Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-05-06 22:52:28 -07:00
Christian Gmeiner	e00fa99b08	glsl_to_nir: drop supports_ints At initial nir level all drivers are supporting ints. Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-05-07 07:35:59 +02:00
Christian Gmeiner	4e110eca42	nir: nir_shader_compiler_options: drop native_integers Drivers that do not support native integers should use a lowering pass to go from integers to floats. Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-05-07 07:35:52 +02:00
Alyssa Rosenzweig	050b934a24	panfrost: Refactor blend descriptors This commit does a fairly large cleanup of blend descriptors, although there should not be any functional changes. In particular, we split apart the Midgard and Bifrost blend descriptors, since they are radically different. From there, we can identify that the Midgard descriptor as previously written was really two render targets' descriptors stuck together. From this observation, we split the Midgard descriptor into what a single RT actually needs. This enables us to correctly dump blending configuration for MRT samples on Midgard. It also allows the Midgard and Bifrost blend code to peacefully coexist, with runtime selection rather than a #ifdef. So, as a bonus, this will help the future Bifrost effort, eliminating one major source of compile-time architectural divergence. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-05-07 03:21:08 +00:00
Vasily Khoruzhick	d4a249aa09	lima/gpir: enable lowering for ftrunc Reviewed-by: Qiang Yu <yuq825@gmail.com> Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>	2019-05-07 01:07:27 +00:00
Vasily Khoruzhick	f4659bea7c	lima/gpir: implement nir_op_fmov Reviewed-by: Qiang Yu <yuq825@gmail.com> Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>	2019-05-07 01:07:27 +00:00
Vasily Khoruzhick	cf1ab4b96b	lima: use int_to_float lowering pass Neither GP nor PP in Mali4x0 support integers, so utilize new pass and set native_integers to true for now until this flag is dropped. Reviewed-by: Qiang Yu <yuq825@gmail.com> Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>	2019-05-07 01:07:27 +00:00
Vasily Khoruzhick	443c5a3cd6	nir: add int_to_float lowering pass This new pass lowers ints and bools to floats. It allows hardware that doesn't have native integers (e.g. Mali4x0) to use the same code paths as modern hardware. It uses a newly introduced pass to gather SSA types and should be used as late as possible. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com> Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>	2019-05-07 01:07:27 +00:00
Timothy Arceri	49025292fb	radeonsi: add config entry for Counter-Strike Global Offensive This fixes rendering issues with gun scopes which is rather important. Cc: "19.0" "19.1" <mesa-stable@lists.freedesktop.org> Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100239	2019-05-07 09:42:09 +10:00
Vasily Khoruzhick	d085920b64	lima/gpir: fix float uniform alignment issue If PIPE_CAP_PACKED_UNIFORMS is not set uniforms are vec4 aligned, so lima_nir_lower_uniform_to_scalar should use first channel of vec4 for float uniforms. Reviewed-by: Qiang Yu <yuq825@gmail.com> Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>	2019-05-06 14:08:09 -07:00
Erik Faye-Lund	d84b85bc28	draw: flush when setting stream-out targets We need to re-prepare the middle-end state to pick up changes to this state to react correctly to pausing/resuming stream-out. So let's add a flush here. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Fixes: `ec8cbd79ac` "draw/softpipe: EXT_transform_feedback support (v2)" Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2019-05-06 22:42:37 +02:00
Erik Faye-Lund	ed53e61bec	llvmpipe: pass stream-out targets to draw-module early We currently set this state in the draw-module twice on each draw, which trashes this state. So far that's not a problem, because we don't really do much from that function. But it turns out we're going to have to do more; namely, flush when the state changes. This will incur a large performance penalty due to the excessive setting. Instead, let's rely on the CSO caching making sure that llvmpipe_set_so_targets doesn't get called needlessly, and set up the state directly there instead. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2019-05-06 22:42:37 +02:00
Uros Bizjak	fc7649c4b7	doc: Update GL_KHR_robustness in features.txt for r600 glxinfo for Cypress XT [Radeon HD 5870] lists GL_KHR_robustness as supported extension. This was the last missing extension for GL 4.5, so Mark GL 4.5 as all DONE for r600. Reviewed-by: Dave Airlie <airlied@redhat.com>	2019-05-07 06:21:48 +10:00
Chia-I Wu	c7078397ca	virgl: do not use inline writes for subdata Inline writes skip transfer map/unmap at the cost of an extra copy of the data during execbuffer. That is generally a win for small transfers. But the heuristic to use inline writes based on buffer sizes rather than transfer sizes makes little sense. More importantly, inline writes miss optimizations that are done for buffer transfers. Let's just use transfers. Signed-off-by: Chia-I Wu <olvaffe@gmail.com> Reviewed-By: Gert Wollny <gert.wollny@collabora.com>	2019-05-06 10:31:56 -07:00
Chia-I Wu	898be8036d	virgl: rework queries virglrenderer has been changed such that - VIRGL_CCMD_GET_QUERY_RESULT is fenced - query buffers (PIPE_BIND_CUSTOM) are coherent We can check if a query is ready using DRM_IOCTL_VIRTGPU_WAIT, and also avoid a synchronized transfer to retrieve the query result. When running against an older virglrenderer, it falls back to the old behavior automatically. TF2 @ 640x480 for pts4.dem went from 17fps to 40fps on my testing machine. Signed-off-by: Chia-I Wu <olvaffe@gmail.com> Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>	2019-05-06 10:20:40 -07:00
Chia-I Wu	b4da53b0c3	virgl: export resource_is_busy from winsys Signed-off-by: Chia-I Wu <olvaffe@gmail.com> Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>	2019-05-06 10:20:38 -07:00
Samuel Pitoiset	c10808441c	radv: fix rowPitch for R32G32B32 formats on GFX9 The pitch is actually the number of components per row. We found the problem when we implemented some meta operations for these formats and the wrong pitch has been confirmed with a small test case. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108325 Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-05-06 19:07:44 +02:00
Kenneth Graunke	a032a9665f	iris: Enable PIPE_CAP_SURFACE_REINTERPRET_BLOCKS This makes CompressedTexSubImage from a PBO source do proper GPU rendering to upload instead of stalling to map the PBO source on the CPU (then copying it on the CPU). Thanks Bas Nieuwenhuizen for pointing out that Vulkan includes this functionality, and to Jason Ekstrand for writing the code I adapted. Vulkan only supports a single layer, however, and this code tries to support multiple layers as long as it's miplevel 0. Improves performance in Sid Meier's Civilization VI: Average frame time (ms): -3.67423% +/- 1.46201% (n=5) 99th percentile frame time (ms): -5.09910% +/- 3.87874% (n=5)	2019-05-06 09:50:32 -07:00
Bas Nieuwenhuizen	8139efbbbd	radv: Use given stride for images imported from Android. Handled similarly as radeonsi. I checked the offsets are actually used. Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-05-06 15:36:39 +00:00
Erico Nunes	11602ccd5d	lima/ppir: abort compilation in case of unsupported intrinsic Currently ppir continues compilation when there is an unsupported intrinsic, resulting in a shader that will surely not work as intended. This is a problem during piglit runs as some tests don't compile properly due to this but actually still get submitted to the gpu and leave the system in an unstable state after executing, causing further tests to fail. Signed-off-by: Erico Nunes <nunes.erico@gmail.com> Reviewed-by: Qiang Yu <yuq825@gmail.com>	2019-05-06 17:15:27 +02:00
Erico Nunes	60a128fe81	lima/ir: print names of unsupported intrinsics While lima still doesn't support some kinds of intrinsics, it is more helpful to display the name of the unsupported instr->intrinsic to make debugging easier. Signed-off-by: Erico Nunes <nunes.erico@gmail.com> Reviewed-by: Qiang Yu <yuq825@gmail.com>	2019-05-06 17:15:06 +02:00
John Stultz	c7f2145b4b	mesa: Makefile.sources: Add nir_lower_fb_read.c to Makefile.sources list In commit `a99c360a46` (nir: add pass to lower fb reads), a new file was added that needs to also be added to the Makefile.sources list used by the Android and SCons build system. Cc: Rob Clark <robdclark@chromium.org> Cc: Emil Velikov <emil.l.velikov@gmail.com> Cc: Amit Pundir <amit.pundir@linaro.org> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: Alistair Strachan <astrachan@google.com> Cc: Greg Hartman <ghartman@google.com> Cc: Tapani Pälli <tapani.palli@intel.com> Cc: Jason Ekstrand <jason@jlekstrand.net> Fixes: `a99c360a46` ("nir: add pass to lower fb reads") Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Signed-off-by: John Stultz <john.stultz@linaro.org>	2019-05-06 11:29:26 +00:00
John Stultz	d04f44a459	mesa: Makefile.sources: Add ir3_nir_lower_load_barycentric_at_sample/offset to Makefile.sources In commit `2f0b9d2249` ("freedreno/ir3: lower load_barycentric_at_offset") a new file was added that needs to also be added to the Makefile.sources list used by Android and SCons build system. Cc: Rob Clark <robdclark@chromium.org> Cc: Emil Velikov <emil.l.velikov@gmail.com> Cc: Amit Pundir <amit.pundir@linaro.org> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: Alistair Strachan <astrachan@google.com> Cc: Greg Hartman <ghartman@google.com> Cc: Tapani Pälli <tapani.palli@intel.com> Cc: Jason Ekstrand <jason@jlekstrand.net> Fixes: `2f0b9d2249` ("freedreno/ir3: lower load_barycentric_at_offset") Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Signed-off-by: John Stultz <john.stultz@linaro.org>	2019-05-06 11:29:26 +00:00
John Stultz	c935862127	mesa: android: freedreno: Fix build failure due to path change The ir3_nir_trig.py file was moved in a previous commit, `aa0fed10d3` (freedreno: move ir3 to common location), so update the Android.gen.mk file to match. Cc: Rob Clark <robdclark@chromium.org> Cc: Emil Velikov <emil.l.velikov@gmail.com> Cc: Amit Pundir <amit.pundir@linaro.org> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: Alistair Strachan <astrachan@google.com> Cc: Greg Hartman <ghartman@google.com> Cc: Tapani Pälli <tapani.palli@intel.com> Cc: Jason Ekstrand <jason@jlekstrand.net> Fixes: `aa0fed10d3` ("freedreno: move ir3 to common location") Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Signed-off-by: John Stultz <john.stultz@linaro.org>	2019-05-06 11:29:26 +00:00
Amit Pundir	88105375c9	mesa: android: freedreno: build libfreedreno_{drm,ir3} static libs Add libfreedreno_drm/ir3 to the build Cc: Rob Clark <robdclark@chromium.org> Cc: Emil Velikov <emil.l.velikov@gmail.com> Cc: Amit Pundir <amit.pundir@linaro.org> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: Alistair Strachan <astrachan@google.com> Cc: Greg Hartman <ghartman@google.com> Cc: Tapani Pälli <tapani.palli@intel.com> Cc: Jason Ekstrand <jason@jlekstrand.net> Fixes: `b4476138d5` ("freedreno: move drm to common location") Fixes: `aa0fed10d3` ("freedreno: move ir3 to common location") Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Signed-off-by: Amit Pundir <amit.pundir@linaro.org> [jstultz: Tweaked to add extra ir3 files from master] Signed-off-by: John Stultz <john.stultz@linaro.org>	2019-05-06 11:29:26 +00:00
Alistair Strachan	0fda3eac31	mesa: android: Remove unnecessary dependency tracking rules The current AOSP master build system breaks building mesa due to the following error: external/mesa3d/src/compiler/Android.glsl.gen.mk:94: error: writing to readonly directory: "external/mesa3d/src/compiler/glsl/ir.h" This error is bogus -- nothing "writes" to ir.h -- but the rule is unnecessary because the generated header that is a dependency of the non-generated header should be added to LOCAL_GENERATED_SOURCES and this will track if the dependency needs to be regenerated. (This change fixes a similar problem affecting nir.h too.) Cc: Rob Clark <robdclark@chromium.org> Cc: Emil Velikov <emil.l.velikov@gmail.com> Cc: Amit Pundir <amit.pundir@linaro.org> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: Alistair Strachan <astrachan@google.com> Cc: Greg Hartman <ghartman@google.com> Cc: Tapani Pälli <tapani.palli@intel.com> Cc: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Signed-off-by: Alistair Strachan <astrachan@google.com> [jstultz: Forward ported and tweaked commit subject] Signed-off-by: John Stultz <john.stultz@linaro.org>	2019-05-06 11:29:25 +00:00
Bas Nieuwenhuizen	5692351264	radv: Implement cosited_even sampling. Apparently cosited_even was the required one instead of midpoint. This adds a slight offset of 0.5 pixels to the coordinates (and we need the image size to convert to normalized coords). Fixes: `91702374d5` "radv: Add ycbcr lowering pass." Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-05-06 11:09:30 +00:00
Michel Dänzer	28784e494e	Restore erroneously removed .gitignore entry for "build" directory It was removed in "delete autotools .gitignore files", but the build directory is created by scons. [Skip CI] Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-05-06 12:11:44 +02:00
Bas Nieuwenhuizen	5cbe12ad1b	radv: Disable subsampled formats. Broken on Polaris and since I discovered NV12 is not subsampled, but a 2-plane format I decided I don't really care. Work to do to re-enable: 1) Figure out which devices support it natively. 2) Write some software emulation for the others. Fixes: `52c1adda21` "radv: Add ycbcr format features." Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-05-06 09:53:37 +00:00
Timothy Arceri	1af72fa4d6	util/drirc: add workarounds for bugs in Doom 3: BFG This makes the game playable on radeonsi. Cc: "19.0" "19.1" <mesa-stable@lists.freedesktop.org> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110143	2019-05-06 17:32:36 +10:00
Rob Clark	bdd273d873	freedreno: remove unused forward struct declaration Signed-off-by: Rob Clark <robdclark@chromium.org>	2019-05-04 13:59:56 -07:00
Alyssa Rosenzweig	6823873246	panfrost/midgard: iabs cannot run on mul Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-05-04 19:08:51 +00:00
Alyssa Rosenzweig	cdd9189aad	panfrost/midgard: Lower mixed csel (NIR) Basically, when the conditions of a csel diverge, we scalarize to avoid going into weird code paths during emit. We could be doing better, but this case can't occur organically from GLSL as far as I can tell, though it does fix lowered atan2. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-05-04 19:08:51 +00:00
Alyssa Rosenzweig	58a1e1f86c	panfrost/midgard: Fix RA when temp_count = 0 A previous commit by Tomeu aborted RA early, which solves the memory corruption issue, but then generates an incorrect compile. This fixes that. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-05-04 19:08:51 +00:00
Alyssa Rosenzweig	3d7874c699	panfrost/midgard: Fix integer selection Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-05-04 19:08:51 +00:00
Alyssa Rosenzweig	31f5a43bf0	panfrost: Support RGB565 FBOs Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-05-04 19:08:50 +00:00
Alyssa Rosenzweig	f8c7ffa07a	panfrost/midgard/disasm: Handle dest_override generalized Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-05-04 19:08:50 +00:00
Alyssa Rosenzweig	b6b534c733	panfrost/midgard/disasm: Stub out 64-bit Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-05-04 19:08:50 +00:00
Alyssa Rosenzweig	8c36ecd4b1	panfrost/midgard/disasm: Print 8-bit sources This handles the usual case. 8-bit register access parallels 16-bit access, but with one major caveat: in 8-bit mode, only half of the register file is actually (directly) accessible as sources. In particular, for each 16-bit integer register (hrN), we can only index a single 8-bit integer (qrN), corresponding to the lower 8-bits. To get the upper 8-bits, it is required to do an explicit shift. For example, to add the bytes of a 16-bit integer hr0.x and get the result as an 8-bit qr0, you'd need to do something like: ilsr hr1.x, hr0.x, #8 iadd qr0.x, qr0.x, qr1.x This scheme diverges from 32-bit registers, in that both the upper and lower halves of a 32-bit register are individually accessible as a pair of half registers. For contrast, to add the lower and upper 16-bits of a 32-bit integer r0.x, you can just: iadd hr0.x, hr0.x, hr1.x Since hr1.x = upper 16-bit of r0.x. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-05-04 19:08:50 +00:00
Alyssa Rosenzweig	2800e822a4	panfrost/midgard/disasm: Support 8-bit destination Meanwhile, we're forced to disable dest_override, since it's not yet clear how this interacts with other bitnesses (it'll likely need to be overhauled in any case). Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-05-04 19:08:50 +00:00
Alyssa Rosenzweig	d42c37e494	panfrost/midgard: Rename ilzcnt8 -> iclz Per OpenCL. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-05-04 19:08:50 +00:00
Alyssa Rosenzweig	9559280fc3	panfrost/midgard: Fix crash on unknown op Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-05-04 19:08:50 +00:00
Alyssa Rosenzweig	96eed4e04b	panfrost/midgard/disasm: Fill in .int mod Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-05-04 19:08:50 +00:00
Alyssa Rosenzweig	7469df70c8	panfrost/midgard/disasm: Extend print_reg to 8-bit Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-05-04 19:08:50 +00:00
Alyssa Rosenzweig	055f6def30	panfrost/midgard/disasm: Catch mask errors We silently ignored certain bits of the mask, which causes issues when disassembling 8/64-bit ops. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-05-04 19:08:50 +00:00
Alyssa Rosenzweig	576a27fd55	panfrost/midgard: reg_mode_full -> reg_mode_32, etc In preparation for 8-bit and 64-bit operands, let's not reinforce the 32-bit-centric biases in the ISA. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-05-04 19:08:50 +00:00
Rob Clark	2da36dd0b6	freedreno/a6xx: deduplicate a few lines Signed-off-by: Rob Clark <robdclark@chromium.org>	2019-05-04 11:50:44 -07:00
Rob Clark	555ca49d2b	freedreno: add ubwc_enabled helper Since it is dependent on the tile mode (ie. disabled for smaller mipmap levels), we should handle it a similar way to fd_resource_level_linear(). The code previously mostly did the right thing because the old helper took the tile mode. Signed-off-by: Rob Clark <robdclark@chromium.org>	2019-05-04 11:50:44 -07:00
Rob Clark	62c0b02717	freedreno: move UBWC color offset to fd_resource_offset() Best to keep it encapsulated in the helper which returns layer/level offset (and actually use that helper everywhere) rather than spreading the logic around the code. Also add a helper to find UBWC offset, to complete the encapsulation. Signed-off-by: Rob Clark <robdclark@chromium.org>	2019-05-04 11:50:44 -07:00
Rob Clark	a871b5ffaa	freedreno/a6xx: buffer resources cannot be compressed Small cleanup. They are just an array of data and only ever linear/ uncompressed. Signed-off-by: Rob Clark <robdclark@chromium.org>	2019-05-04 11:50:44 -07:00
Rob Clark	05f5122d4a	freedreno: mark imported resources as valid If someone is importing a buffer, we can't really know the state of its contents, so assume it is valid. Signed-off-by: Rob Clark <robdclark@chromium.org>	2019-05-04 11:50:44 -07:00
Rob Clark	11583dc655	freedreno/a6xx: UBWC support for images There are still some fallbacks we'll need to handle before we can enable UBWC by default. I think we may need to fallback to uncompressed if image atomic operations are used. And we still need to sort out how to handle image and sampler views of compressed resources if the image/ sampler view is using a format that does not support compression. (I think the latter should hopefully be uncommon outside of deqp/piglit.) But at least this gets us to the point where supertuxkart works properly with UBWC enabled ;-) Signed-off-by: Rob Clark <robdclark@chromium.org>	2019-05-04 11:50:44 -07:00
Rob Clark	857d9f3b02	freedreno/a6xx: UBWC fixes A few fixes that get UBWC working for the games/benchmarks where I noticed problems before (in particular and manhattan, and stk (modulo image support for UBWC when compute shaders are used for post-process effects)): + fix the size of the UBWC meta buffer (ie, the offset to color pixel data) that is returned by ->fill_ubwc_buffer_sizes() + correct size/layout for 8 and 16 byte per pixel formats + limit the supported formats. Not all formats that can be tiled can be compressed. Signed-off-by: Rob Clark <robdclark@chromium.org>	2019-05-04 11:50:44 -07:00
Rob Clark	6ffb58726b	freedreno: update generated headers Corrects tex state ubwc pitch/size Signed-off-by: Rob Clark <robdclark@chromium.org>	2019-05-04 11:50:44 -07:00
Rob Clark	fb1488a800	freedreno/a6xx: OUT_RELOC vs OUT_RELOCW fixes Signed-off-by: Rob Clark <robdclark@chromium.org>	2019-05-04 11:50:44 -07:00
Rob Clark	8c97b3c546	freedreno/ir3: remove assert Fixes dEQP-GLES31.functional.ubo.random.all_per_block_buffers.13 and .20 `ca3eb5db66` went from silently truncating the constant state, which was also the wrong thing to do, to an assert. Which then showed up in a couple of dEQPs. Actually there is nothing wrong with larger constant file so just drop the assert. Signed-off-by: Rob Clark <robdclark@chromium.org>	2019-05-04 11:50:44 -07:00
Karol Herbst	7f85283103	spirv/cl: support vload/vstore Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-05-04 12:27:51 +02:00
Karol Herbst	d11b807da5	nir: Add nir_op_vec helper with that we can simplify code where nir vectors are created v2: merge both lines in nir_vec Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-05-04 12:27:51 +02:00
Karol Herbst	681fb7ea05	nir: Add a nir_builder_alu variant which takes an array of components v2: rename to nir_build_alu_src_arr Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-05-04 12:27:51 +02:00
Karol Herbst	c91ea6343f	vtn: handle bitcast with pointer src/dest v2: use vtn_push_ssa and vtn_ssa_value Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-05-04 12:27:51 +02:00
Mathias Fröhlich	c989661985	mesa: Leave aliasing of vertex and generic0 attribute to the dlist code. Now that dlist compilation again knows if it is inside glBegin/glEnd, we can leave the decision if aliasing should occur to the vertex attribute setter functions instead of doing that at glArrayElement time. Reviewed-by: Brian Paul <brianp@vmware.com> Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>	2019-05-04 07:40:35 +02:00
Mathias Fröhlich	c869387d8a	mesa: Correct the is_vertex_position decision for dlists. We have to use _mesa_inside_dlist_begin_end instead of _mesa_inside_begin_end to see if we are inside a glBegin/glEnd block in case of display lists. So split the is_vertex_position function used in vertex attribute processing into a imm and dlist variant and use the appropriate _mesa_inside_begin_end variant. Reviewed-by: Brian Paul <brianp@vmware.com> Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>	2019-05-04 07:40:35 +02:00
Mathias Fröhlich	5ad54217ff	mesa: Set CurrentSavePrimitive in vbo_save_NotifyBegin. That seems to have been lost somewhere. It is needed for correct outside begin/end detection in display list compilation, and for the correct aliasing in dlists reestablished in the next changes. Reviewed-by: Brian Paul <brianp@vmware.com> Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>	2019-05-04 07:40:35 +02:00
Mathias Fröhlich	0ed7603d97	mesa: Remove the _glapi_table argument from _mesa_array_element. The value is now unused. Reviewed-by: Brian Paul <brianp@vmware.com> Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>	2019-05-04 07:40:35 +02:00
Mathias Fröhlich	3b6f32907f	mesa: Constify static const array in api_arrayelt.c Reviewed-by: Brian Paul <brianp@vmware.com> Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>	2019-05-04 07:40:35 +02:00
Mathias Fröhlich	68aaf0a4e3	mesa: Remove the now unused _NEW_ARRAY state change flag. Is no longer used, so we have less occasions where NewState is non zero. Reviewed-by: Brian Paul <brianp@vmware.com> Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>	2019-05-04 07:40:35 +02:00
Mathias Fröhlich	7af047c373	mesa: Rip out now unused gl_context::aelt_context. Now this part of gl_context state is unused and can be removed. Reviewed-by: Brian Paul <brianp@vmware.com> Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>	2019-05-04 07:40:35 +02:00
Mathias Fröhlich	b9de48581a	mesa: Implement _mesa_array_element by walking enabled arrays. In glArrayElement, use the bitmask trick to just walk the enabled vao arrays. This should be about equivalent in execution time to walking the prepared aelt_context list. Finally, this will allow us to reduce the _mesa_update_state calls in a few patches. v2: Add comments. Reviewed-by: Brian Paul <brianp@vmware.com> Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>	2019-05-04 07:40:19 +02:00
Mathias Fröhlich	7a5dea6320	mesa: Use glVertexAttribNV functions for fixed function attribs. In the glArrayElement implementation, use glVertexAttribNV type functions for fixed function attributes. We do the same in display execution when the list is replayed using immediate mode attribute functions. Using a single set of function pointers enables to use a unified loop to walk the vertex array attributes. Reviewed-by: Brian Paul <brianp@vmware.com> Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>	2019-05-04 07:39:42 +02:00
Mathias Fröhlich	60076a6171	mesa: Factor out index function that will have multiple use. For access to glArrayElement methods factor out a function to get the table lookup index for normalized/integer/double access. The function will be used in the next patch at least twice. v2: Use vertex_format_to_index instead of NORM_IDX. Reviewed-by: Brian Paul <brianp@vmware.com> Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>	2019-05-04 07:39:18 +02:00
Jason Ekstrand	91899495a1	nir: Add a SSA type gathering pass This new pass (which isn't even compile-tested) attempts to determine the ALU type of all the SSA values in a function impl. It takes a greedy approach and assigns intness or floatness to everything it thinks can possibly contain an int or a float. Some values will be labeled as both int and float and some will be labeled as neither and it is up to the caller to decide what to do with this information. However, for a "nice" shader where the original source contained no bit-casts and no implicit bit-casts were introduced by optimizations, there shouldn't be any overlap in the two sets save for the odd CSEd zero constant. Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>	2019-05-04 03:52:05 +00:00
Kenneth Graunke	694d1a08d3	iris: Delete bucketing allocators These add a lot of complexity, and I currently can't measure any performance benefit from having them. In the past, I seem to recall seeing a benefit in drawoverhead scores, but currently it looks like dropping them is either a wash or 1-2% faster. Drop them to simplify allocations.	2019-05-03 19:50:26 -07:00
Kenneth Graunke	bd4b18d255	iris: Force VMA alignment to be a multiple of the page size. This should happen regardless, but let's be paranoid.	2019-05-03 19:48:37 -07:00
Kenneth Graunke	068a700195	iris: leave the top 4Gb of the high heap VMA unused This ports commit `9e7b0988d6` from anv to iris. Thanks to Lionel for noticing that it was missing!	2019-05-03 19:48:37 -07:00
Kenneth Graunke	21062e21d9	iris: Fix 4GB memory zone heap sizes. The STATE_BASE_ADDRESS "Size" fields can only hold 0xfffff in pages, and 0xfffff * 4096 = 4294963200, which is 1 page shy of 4GB. So we can't use the top page.	2019-05-03 19:48:37 -07:00
Julien Isorce	8cd71f399e	st/va: check resource_get_info nullity in vlVaDeriveImage This pipe_screen function is not implemented by all backends. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110443 Signed-off-by: Julien Isorce <jisorce@oblong.com> Reviewed-by: Leo Liu <leo.liu@amd.com>	2019-05-03 16:11:55 -07:00
Jason Ekstrand	30fa15e36b	anv,i965: Stop warning about incomplete gen11 support Both drivers are feature-complete and should be running more-or-less at perf at this point. Drop the warning. Acked-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>	2019-05-03 22:57:35 +00:00
Connor Abbott	d0ea9877b8	nir/algebraic: Don't emit empty initializers for MSVC Just don't emit the transform array at all if there are no transforms v2: - Don't use len(array) > 0 (Dylan) - Keep using ARRAY_SIZE to make the generated C code easier to read (Jason).	2019-05-04 00:13:21 +02:00
Kenneth Graunke	8987152ac1	iris: Resolve textures used by the program, not merely bound textures st/mesa's PBO upload path binds a vertex shader that doesn't use any textures, but leaves the existing sampler views bound in place. This was tricking us into thinking the PBO destination might be bound for texturing in some cases. In Civilization VI, this fixes a false self-dependency issue that was preventing CCS_E compression on upload. Fixing this slightly improves frame times.	2019-05-03 13:03:22 -07:00
Dylan Baker	c613861b23	meson: Don't build glsl cache_test when shader cache is disabled v2: - Use new with_shader_cache variable instead of host_machine.system() == 'windows' Reviewed-by: Eric Anholt <eric@anholt.net>	2019-05-03 10:58:31 -07:00
Dylan Baker	a216aea7af	tests/vma: fix build with MSVC Reviewed-by: Eric Anholt <eric@anholt.net>	2019-05-03 10:58:27 -07:00
Dylan Baker	5eb0f33e4f	glsl/tests: define ssize_t on windows Reviewed-by: Eric Anholt <eric@anholt.net>	2019-05-03 10:58:24 -07:00
Dylan Baker	76338933e9	util/tests: Use define instead of VLA To allow this test to be built with MSVC, which doesn't support VLAs. Reviewed-by: Eric Anholt <eric@anholt.net>	2019-05-03 10:58:17 -07:00
Dylan Baker	ff9bf223c2	meson: make nm binary optional This makes nm not required, but used if found. In general I imagine that this means that on windows nm won't be found, and on other platforms it will. v2: - fix gbm and egl symbols check tests to only be run if nm is found - reword commit message to reflect the code change Reviewed-by: Eric Anholt <eric@anholt.net>	2019-05-03 10:58:05 -07:00
Dylan Baker	f5eafc2dc6	meson: Make shader-cache a trillean instead of boolean So that it can be implicitly disabled on windows, where it doesn't compile. v2: - Use an auto-option rather than automagic. - fix shader_cache check (== -> !=) v4: - Use new with_shader_cache instead of get_option('shader-cache') elsewhere in the meson build Reviewed-by: Eric Anholt <eric@anholt.net>	2019-05-03 10:57:36 -07:00
Dylan Baker	ddc15fba2b	meson: switch gles1 and gles2 to auto options This allows them to default to false on windows, but default to true elsewhere. As a side effect turning off shared-glapi now automatically turns off gles. Shared glapi remains a boolean defaulting to true. v5: - new in this version Reviewed-by: Eric Anholt <eric@anholt.net>	2019-05-03 10:57:19 -07:00
Dylan Baker	113bb8d448	glsl: fix general_ir_test with mingw Somewhere down in the depths of the mingw headers 'interface' is defined, change it to iface like a similar patch did. Signed-off-by: Dylan Baker <dylan.c.baker@intel.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-05-03 10:57:17 -07:00
Dylan Baker	f1d5f2aff3	meson: always define libglapi This allows the identifier to be used even if shared-glapi isn't built, which simplifies a bunch of things. Signed-off-by: Dylan Baker <dylan.c.baker@intel.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-05-03 10:57:10 -07:00
Chuck Atkins	a381dbf253	meson: Fix missing glproto dependency for gallium-glx Signed-off-by: Chuck Atkins <chuck.atkins@kitware.com> Cc: mesa-stable <mesa-stable@lists.freedesktop.org> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2019-05-03 13:36:25 -04:00
Samuel Pitoiset	4f18c43d1d	radv: apply the indexing workaround for atomic buffer operations on GFX9 Because the new raw/struct intrinsics are buggy with LLVM 8 (they weren't marked as source of divergence), we fall back to the old intrinsics for atomic buffer operations only. This means we need to apply the indexing workaround for GFX9. The load/store operations still use the new LLVM 8 intrinsics. The fact that we need another workaround is painful but we should be able to clean that up a bit once LLVM 7 support is dropped. This fixes a GPU hang with AC Odyssey and some rendering problems with Nioh. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110573 Fixes: `31164cf5f7` ("ac/nir: only use the new raw/struct image atomic intrinsics with LLVM 9+") Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-05-03 17:59:12 +02:00
Alyssa Ross	e340d7beef	get_reviewer.pl: improve portability Not all package managers / users will install perl into /usr/bin, but /usr/bin/env /should/ always be present. Using /usr/bin/env means that we can't give the -w argument to Perl, so I added `use warnings' in the script. Reviewed-by: Rob Clark <robclark@freedesktop.org> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-05-03 14:32:44 +01:00
Lionel Landwerlin	80dc78407d	anv: fix crash when application does not provide push constants Found while running Talos Principle. As far as I can tell running a draw call with a pipeline having push constants without the application having called vkCmdPushConstants gives undefined push constant values. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Cc: mesa-stable@lists.freedesktop.org	2019-05-03 10:21:40 +01:00
Samuel Pitoiset	e68d7bec67	radv: fix radv_get_aspect_format() for D+S formats This restores the previous behaviour before YCBCR landed. For D+S formats, it returns the depth format. This fixes an assertion with Thrones of Britannia. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110540 Fixes: `66507cc656` ("radv: Add single plane image views & meta operations") Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-05-03 09:01:10 +02:00
Caio Marcelo de Oliveira Filho	aa675cef5e	intel/fs: Assert when brw_fs_nir sees a nir_deref_instr Since `09f1de97a7` "anv,i965: Lower away image derefs in the driver" the backend compiler is not expected to handle any derefs, so let's assert on it. This helps identifying problems when a deref is not lowered and "leaks" into the backend compiler. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-05-02 23:25:30 -07:00
Julien Isorce	a77512635e	r600: implement resource_get_info Factoring code with resource_get_handle. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110443 Signed-off-by: Julien Isorce <jisorce@oblong.com> Reviewed-by: Dave Airlie <airlied@redhat.com>	2019-05-03 05:54:28 +00:00
Dave Airlie	512a31a412	util/bitset: fix bitset range mask calculations. The MASK macro is used in the RANGE macro, and it should return the pre-bitset word mask for the (b) value. i.e. BITSET_MASK(0) should be undefined since it's meaningless. BITSET_MASK(31) should give 0x7fffffff BITSET_MASK(32) should give 0xffffffff BITSET_MASK(33) should give 0x00000001 BITSET_MASK(64) should give 0xffffffff However then BITSET_RANGE ends up broken for cases where its (b) value is the 0,32,64 value as in that case the lower mask would be 0 not 0xffffffff. This fixes the unit tests that I've added, and my code that uses bitsets. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Fixes: `bb38cadb1c` "More GLSL code" Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>	2019-05-03 15:23:04 +10:00
Dave Airlie	18973a450e	util/tests: add basic unit tests for bitset The last test here currently fails as there is a bug in bitset.h Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-05-03 15:23:04 +10:00
Dave Airlie	6fd6246d92	nir: fix lower vars to ssa for larger vector sizes. This has a couple of hardcoded vec4 limits in it, change them to the proper sizing to avoid future issues. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-05-03 15:23:00 +10:00
Dave Airlie	2774d39366	spirv: fix SpvOpBitSize return value. The spir-v spec says this returns a bool. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-05-03 15:22:57 +10:00
Kenneth Graunke	5ff5d0a895	iris: Disable dual source blending when shader doesn't handle it This is a port of Danylo's `eca4a6548d` which fixed the hang on i965. It fixes GPU hangs in his new Piglit test, arb_blend_func_extended-dual-src-blending-discard-without-src1. I avoided my own review feedback here, and decided to simply adjust 3DSTATE_PS_BLEND rather than BLEND_STATE_ENTRY[0]. It has never been clear to me which the hardware uses in every case. However, whacking the enable in 3DSTATE_PS_BLEND seems to be sufficient to fix the hang, and that packet is already dynamic, so it's easy to handle. I'd rather avoid making BLEND_STATE_ENTRY[0] dynamic unless I have to.	2019-05-02 21:14:49 -07:00
Jason Ekstrand	be7e9870d6	anv: Stop including POS in FS input limits It is an input but it comes in as part of the shader payload and doesn't count towards the limits. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-05-02 18:56:51 -05:00
Rob Clark	b73dd91f60	nir: fix nir tex print harder Fixes: `691d5a825a` nir: rework tex instruction printing Reviewed-by: Eric Anholt <eric@anholt.net> Signed-off-by: Rob Clark <robdclark@chromium.org>	2019-05-02 15:06:01 -07:00
Erik Faye-Lund	96924aa92e	docs: fixup mistake in contents During a rebase, it seems I accidentally broke the contents-menu, leading to a duplicate link to freedesktop.org. This was obviously not intended. Let's fix this. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Fixes: `7eee13c467` ("docs: use dl/dd instead of blockquote for freedesktop link") Reviewed-by: Dylan Baker <dylan@pnwbakers.com> Reviewed-by: Eric Engestrom <eric@engestrom.ch>	2019-05-02 23:23:15 +02:00
Erico Nunes	568e8fc736	lima/ppir: support nir_op_ftrunc Support nir_op_ftrunc by turning it into a mov with a round to integer output modifier. Signed-off-by: Erico Nunes <nunes.erico@gmail.com> Reviewed-by: Qiang Yu <yuq825@gmail.com>	2019-05-02 20:55:56 +00:00
Eric Engestrom	1291c68c9c	gitlab-ci: merge meson-glvnd into meson-swr There's no need to have a whole build just for that flag, we can add it to any build. v2: Add a note about why we put glvnd where we did (by anholt). Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Michel Dänzer <michel.daenzer@amd.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> (v2) Acked-by: Dylan Baker <dylan@pnwbakers.com>	2019-05-02 20:39:12 +00:00
Eric Engestrom	043b54a35d	gitlab-ci: simplify meson job names Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Michel Dänzer <michel.daenzer@amd.com> Acked-by: Dylan Baker <dylan@pnwbakers.com>	2019-05-02 20:39:12 +00:00
Eric Engestrom	43f1546420	gitlab-ci: meson-gallium-radeonsi was a subset of meson-gallium-clover-llvm Let's just drop it. Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Michel Dänzer <michel.daenzer@amd.com> Acked-by: Dylan Baker <dylan@pnwbakers.com>	2019-05-02 20:39:12 +00:00
Eric Engestrom	41407c602c	gitlab-ci: merge several meson jobs Merge the following into `meson-main`/`meson-loader-classic-dri`/ `meson-gallium-swr`: - meson-vulkan - meson-gallium-drivers-other - meson-gallium-st-other Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> [ Michel Dänzer ] * Rebase and fix up commit log. * Don't set VULKAN_DRIVERS in meson-loader-classic-dri. * Remove extraneous whitespace. * Squash in follow-up fixes. Signed-off-by: Michel Dänzer <michel.daenzer@amd.com> [ anholt] * Add a note why nine and swrast landed where they did. * Switch from s/meson-vulkan/meson-main/ to s/meson-loader-classic-dri/meson-main/ which I think was the original intent Reviewed-by: Eric Anholt <eric@anholt.net> Acked-by: Eric Engestrom <eric.engestrom@intel.com> (anholt changes) Acked-by: Dylan Baker <dylan@pnwbakers.com>	2019-05-02 20:39:12 +00:00
Heinrich	9b80322532	gbm: Improve documentation of BO import - Add GBM_BO_IMPORT_FD_MODIFIER to documentation of supported foreign object types - Add newline before documentation block - Improve language Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Daniel Stone <daniels@collabora.com>	2019-05-02 20:36:38 +00:00
Samuel Pitoiset	62001f3dff	radv: only need to force emit the TCS regs on Vega10 and Raven1 Other GFX9 chips aren't affected. Cc: "19.0" "19.1" <mesa-stable@lists.freedesktop.org> Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-05-02 22:29:01 +02:00
Marek Olšák	b3a26d4628	glsl: fix and clean up NV_compute_shader_derivatives support - make sure compute shader derivatives are exposed for all extensions - unify duplicated code Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-05-02 16:09:24 -04:00
Marek Olšák	20909284f2	st/dri: decrease input lag by syncing sooner in SwapBuffers It's done by: - decrease the number of frames in flight by 1 - flush before throttling in SwapBuffers (instead of wait-then-flush, do flush-then-wait) The improvement is apparent with Unigine Heaven. Previously: draw frame 2 wait frame 0 flush frame 2 present frame 2 The input lag is 2 frames. Now: draw frame 2 flush frame 2 wait frame 1 present frame 2 The input lag is 1 frame. Flushing is done before waiting, because otherwise the device would be idle after waiting. Nine is affected because it also uses the pipe cap.	2019-05-02 16:09:24 -04:00
Erik Faye-Lund	d30ce03bc0	meson: add build-summary This roughly mirrors what we get from autotools. There are a few differences, though: 1. The "exec_prefix" output has been dropped. Meson doesn't support this, so it makes no sense here. 2. The "llvm-config" output has been dropped. Meson abstracts dependency discovery a bit more than our autotools build-system does, so it's not easy to get this information as-is. 3. HUD extra stats, SWR archs, Shared/Static libs and CFLAGS / CXXFLAGS / LDFLAGS have been dropped. These can be inspected by "meson configure". 4. How we set defines works quite differently in our Meson build-system, and the result isn't quite the same. In particular, the DEFINES output has been dropped, to avoid having to refactor the code too much. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109326 Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> Acked-by: Dylan Baker <dylan@pnwbakers.com>	2019-05-02 18:30:29 +00:00
Erik Faye-Lund	2127403439	meson: give dri- and gallium-drivers separate vars Variables are cheap, and there's little reason for the dri and gallium drivers to work on the same variable for the driver list. So let's split these in two separate lists instead. This makes it easier to inspect these after-the fact, for instance for generating a summary of build-settings. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> Acked-by: Dylan Baker <dylan@pnwbakers.com>	2019-05-02 18:30:29 +00:00
Erik Faye-Lund	28f18915b8	meson: lift driver-collection out into parent build-file This way we can mark the dri_drivers and dri_link arrays as temporary, as all knowledge about them is contained in a single build-file with clearly visible limited life-span. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> Acked-by: Dylan Baker <dylan@pnwbakers.com>	2019-05-02 18:30:29 +00:00
Rob Clark	c14b13d0ff	docs: mark KHR_blend_equation_advanced done on a6xx Signed-off-by: Rob Clark <robdclark@chromium.org>	2019-05-02 11:19:22 -07:00
Rob Clark	8c77e669a8	freedreno/a6xx: smaller hammer for fb barrier We just need to do a sequence of commands to flush the cache. Signed-off-by: Rob Clark <robdclark@chromium.org> Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>	2019-05-02 11:19:22 -07:00
Rob Clark	6fa8a6d60f	freedreno/a6xx: KHR_blend_equation_advanced support Wire up support to sample from the fb (and force GMEM rendering when we have fb reads). The existing GLSL IR lowering for blend_equation_advanced does the rest. Signed-off-by: Rob Clark <robdclark@chromium.org> Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>	2019-05-02 11:19:22 -07:00
Rob Clark	650246523b	freedreno/ir3: fb read support Lower load_output to txf_ms_fb and add support for the new texture fetch instruction. Signed-off-by: Rob Clark <robdclark@chromium.org> Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>	2019-05-02 11:19:22 -07:00
Rob Clark	0704ddb2e5	freedreno/drm: expose GMEM_BASE address Needed for sampling from tile buffer (GMEM). Signed-off-by: Rob Clark <robdclark@chromium.org> Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>	2019-05-02 11:19:22 -07:00
Rob Clark	a99c360a46	nir: add pass to lower fb reads Signed-off-by: Rob Clark <robdclark@chromium.org> Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>	2019-05-02 11:19:22 -07:00
Rob Clark	a2c89a85f4	nir: fix lower_wpos_ytransform in load_frag_coord case Apparently we never hit this path. Or at least haven't for a rather long time. But in either case (load_deref or load_frag_coord), we can just directly use the intrinsic's ssa dest. So stop passing the nir_variable (which would be NULL in the load_frag_coord case) around and instead just use &intr->dest.ssa. (This ofc means we need to setup the cursor to insert after the instruction, which seems to be another bug of the original implementation.) Signed-off-by: Rob Clark <robdclark@chromium.org> Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>	2019-05-02 11:19:22 -07:00
Rob Clark	691d5a825a	nir: rework tex instruction printing The extra comma at the end was annoying me. Signed-off-by: Rob Clark <robdclark@chromium.org> Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>	2019-05-02 11:19:22 -07:00
Rob Clark	ca3eb5db66	freedreno/ir3: add some ubo range related asserts And a comment.. since we are mixing units of bytes/dwords/vec4, hopefully this will avoid some unit confusion. Signed-off-by: Rob Clark <robdclark@chromium.org>	2019-05-02 11:19:22 -07:00
Rob Clark	e941faf3e8	freedreno/ir3: add IR3_SHADER_DEBUG flag to disable ubo lowering It isn't quite as simple as not running the pass, since with packed varyings we get load_ubo for block==0 (ie. the "real" uniforms). So instead run the pass normally but decline to lower anything in block > 0 Signed-off-by: Rob Clark <robdclark@chromium.org>	2019-05-02 11:19:22 -07:00
Rob Clark	f697f61590	freedreno/ir3: fix lowered ubo region alignment Since we emit UBO regions INDIRECTly (ie. not copied into cmdstream but emit by EXT_SRC_ADDR) we need to keep them 4*vec4 aligned. Which the code already mostly did, except for aligning the first UBO region itself (ie. the one after block==0 which is the "real" uniforms). Fixes: `893425a607` freedreno/ir3: Push UBOs to constant file Fixes: `3c8779af32` freedreno/ir3: Enable PIPE_CAP_PACKED_UNIFORMS Signed-off-by: Rob Clark <robdclark@chromium.org>	2019-05-02 11:19:22 -07:00
Rob Clark	32925f4072	freedreno/ir3: fix shader variants vs UBO analysis Otherwise we zero out the state again, but all the UBO loads that we could lower are already lowered. End result is that we didn't emit the uniforms for lowered UBO access in any case where multiple shader variants are used. Fixes: `893425a607` freedreno/ir3: Push UBOs to constant file Fixes: `3c8779af32` freedreno/ir3: Enable PIPE_CAP_PACKED_UNIFORMS Signed-off-by: Rob Clark <robdclark@chromium.org>	2019-05-02 11:19:22 -07:00
Lionel Landwerlin	ff4168c418	vulkan/overlay: add TODO list Keen on having other people contribute. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-05-02 17:02:57 +01:00
Lionel Landwerlin	99cb2d325f	vulkan/overlay: make overridden functions static And fix the unused CmdDrawIndirect. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-05-02 17:02:57 +01:00
Lionel Landwerlin	f2afd6bd76	vulkan/overlay: make overlay size configurable Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-05-02 17:02:55 +01:00
Lionel Landwerlin	7d908038ad	vulkan/overlay: add a frame counter option This is useful to normalize the numbers written into the output file as those numbers are accumulated over a period of time and number of frames. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-05-02 17:02:35 +01:00
Lionel Landwerlin	81fd6ba7cc	vulkan/overlay: record all select metrics into output file The output looks something like this (csv style) : fps, frame, frame_timing(us), submit, draw_indexed, pipeline_graphics, acquire_timing(us), vert_invocations, frag_invocations, gpu_timing(ns) 480.55, 242, 501512, 247, 1444, 1204, 714, 5827272, 113043296, 121424174 467.80, 234, 500214, 234, 1412, 1176, 648, 5635680, 109436188, 117743760 424.37, 213, 501923, 213, 2130, 1704, 623, 5132448, 99657292, 105474683 472.15, 237, 501962, 237, 2370, 1896, 667, 5710752, 110924644, 122226004 411.32, 206, 500826, 206, 2060, 1648, 709, 4963776, 96491764, 95333273 458.87, 230, 501228, 230, 2300, 1840, 634, 5542080, 107758204, 123112090 475.01, 238, 501044, 238, 2380, 1904, 631, 5734848, 111477480, 122087426 471.08, 236, 500972, 236, 2360, 1888, 655, 5686656, 110498496, 114816162 Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-05-02 17:02:34 +01:00
Lionel Landwerlin	74a9fdd8a2	vulkan/overlay: add a margin to the size of the window Looks a bit better. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-05-02 17:02:07 +01:00
Lionel Landwerlin	7ba50d8040	vulkan/overlay: add no display option In case you're just interested in data being recorded to the output file. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-05-02 17:02:07 +01:00
Lionel Landwerlin	ea7a6fa980	vulkan/overlay: add pipeline statistic & timestamps support v2: switch to VkBase{In,Out}Structure v3: Add timestamps at begin/end of primary command buffers to estimate gpu time spent per submission (Lionel) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Acked-by: Eric Engestrom <eric.engestrom@intel.com> (v2)	2019-05-02 17:02:06 +01:00
Lionel Landwerlin	4438188f49	vulkan/overlay: record stats in command buffers and accumulate on exec/submit This significantly reworks how numbers displayed are computed. We accumulate operations written into command buffers and add those to the device when submitted to a queue. These collected values are then used to compute per frame overlay data. We also accumulate the data over the sampling fps period to produce numbers for that period of time. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-05-02 17:02:06 +01:00
Lionel Landwerlin	9eddceef44	vulkan/overlay: update help printout Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-05-02 17:02:06 +01:00
Lionel Landwerlin	a1e6b5e9be	vulkan/util: generate a helper function to return pNext struct sizes This will be used to copy chains of structures so that we can alter some of them. v2: Drop vk_util.h include (Eric) Use VkBaseInStructure directly (Eric) v3: Drop --platforms= param to generator script, instead produce a file with #ifdef based on what platforms are compiled. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-05-02 17:02:02 +01:00
Tomeu Vizoso	ad7c9ba0ec	panfrost/midgard: Skip liveness analysis for instructions without dest [Alyssa: Add comment explanation] Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com> Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-05-02 15:29:48 +00:00
Tomeu Vizoso	a5dddc2d42	panfrost/midgard: Skip register allocation if there's no work to do Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com> Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-05-02 15:29:41 +00:00
Eric Engestrom	7c15a87aea	gitlab-ci: add scons windows build using mingw Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>	2019-05-02 15:10:59 +00:00
Eric Engestrom	a34ee4dec7	egl: hard-code destroy function instead of passing it around as a pointer Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2019-05-02 14:44:16 +00:00
Connor Abbott	6ec4ed48fc	nir/search: Add debugging code to dump the pattern matched This was useful while debugging the previous commit. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-05-02 16:14:06 +02:00
Connor Abbott	7ce86e6938	nir/search: Add automaton-based pre-searching nir_opt_algebraic is currently one of the most expensive NIR passes, because of the many different patterns we've added over the years. Even though patterns are already sorted by opcode, there are still way too many patterns for common opcodes like bcsel and fadd, which means that many patterns are tried but only a few actually match. One way to fix this is to add a pre-pass over the code that scans it using an automaton constructed beforehand, similar to the automatons produced by lex and yacc for parsing source code. This automaton has to walk the SSA graph and recognize possible pattern matches. It turns out that the theory to do this is quite mature already, having been developed for instruction selection as well as other non-compiler things. I followed the presentation in the dissertation cited in the code, "Tree algorithms: Two Taxonomies and a Toolkit," trying to keep the naming similar. To create the automaton, we have to perform something like the classical NFA to DFA subset construction used by lex, but it turns out that actually computing the transition table for all possible states would be way too expensive, with the dissertation reporting times of almost half an hour for an example of size similar to nir_opt_algebraic. Instead, we adopt one of the "filter" approaches explained in the dissertation, which trade much faster table generation and table size for a few more table lookups per instruction at runtime. I chose the filter which resulted in the fastest table generation time, with medium table size. Right now, the table generation takes around .5 seconds, despite being implemented in pure Python, which I think is good enough. Based on the numbers in the dissertation, the other choice might make table compilation time 25x slower to get 4x smaller table size, but I don't think that's worth it. 
As of now, we get the following binary size before and after this patch: text data bss dec hex filename 11979455 464720 730864 13175039 c908ff before i965_dri.so text data bss dec hex filename 12037835 616244 791792 13445871 cd2aef after i965_dri.so There are a number of places where I've simplified the automaton by getting rid of details in the LHS patterns rather than complicate things to deal with them. For example, right now the automaton doesn't distinguish between constants with different values. This means that it isn't as precise as it could be, but the decrease in compile time is still worth it -- these are the compilation time numbers for a shader-db run with my (admittedly old) database on Intel skylake: Difference at 95.0% confidence -42.3485 +/- 1.375 -7.20383% +/- 0.229926% (Student's t, pooled s = 1.69843) We can always experiment with making it more precise later. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-05-02 16:14:06 +02:00
Samuel Pitoiset	08be23bfde	radv: set WD_SWITCH_ON_EOP=1 when drawing primitives from a stream output buffer According to RadeonSI, this seems to be required by the hardware to avoid GPU hangs. I think I just forgot to set that bit when I implemented VK_EXT_transform_feedback. This fixes a GPU hang with Space Engineers and DXVK. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110291 Fixes: `b4eb029062` ("radv: implement VK_EXT_transform_feedback") Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-05-02 15:55:46 +02:00
Brian Paul	48107b5a2b	glsl: fix typo in #warning message Trivial. Spotted by Eric Engestrom.	2019-05-02 06:32:57 -06:00
Brian Paul	f0f7c3b03a	svga: add SVGA_NO_LOGGING env var (v2) valgrind crashes when we try to initialize host logging. This env var can be used to disable logging. v2: rebase onto "svga: move host logging to winsys". Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Neha Bhende <bhenden@vmware.com>	2019-05-02 06:09:35 -06:00
Charmaine Lee	9c5f407b0b	svga: move host logging to winsys This patch adds a host_log interface to svga_winsys and moves the host logging code to the winsys layer. Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Brian Paul <brianp@vmware.com> Reviewed-by: Neha Bhende <bhenden@vmware.com>	2019-05-02 06:09:35 -06:00
Eric Engestrom	da8d9e2d88	wsi/wayland: document lack of vkAcquireNextImageKHR timeout support Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>	2019-05-02 11:51:03 +00:00
Daniel Stone	9826e04eca	vulkan/wsi/wayland: Respect non-blocking AcquireNextImage If the client has requested that AcquireNextImage not block at all, with a timeout of 0, then don't make any blocking calls. This will still potentially block infinitely given a non-infinite timeout, but the fix for that is much more involved. Signed-off-by: Daniel Stone <daniels@collabora.com> Cc: mesa-stable@lists.freedesktop.org Cc: Chad Versace <chadversary@chromium.org> Cc: Jason Ekstrand <jason@jlekstrand.net> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108540 Acked-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Chad Versace <chadversary@chromium.org> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-05-02 11:51:03 +00:00
Erik Faye-Lund	8a67e4d30a	docs: reorder heading and notice All other pages have the heading as the first thing in the article. Let's clean this up for consistency. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-05-02 11:09:16 +00:00
Erik Faye-Lund	561c2b9bfa	docs: drop centered heading for faq The FAQ is the only article we have that uses a centered heading, which makes it look odd compared to the other articles. Let's drop the centering for consistency. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-05-02 11:09:16 +00:00
Erik Faye-Lund	da4994f252	docs: turn faq-index into an ordered list HTML already has a way of doing automatically ordered lists, so let's use that instead of open-coding one. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-05-02 11:09:16 +00:00
Erik Faye-Lund	afda72dc10	docs: replace empty list with a none-paragraph Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-05-02 11:09:16 +00:00
Erik Faye-Lund	a4ee15d5fe	docs: fix closing of list-items Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-05-02 11:09:16 +00:00
Erik Faye-Lund	b9eaeffaba	docs: fixup list-item tags The list items need to contain everything that is part of the item, not just the first paragraph. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-05-02 11:09:16 +00:00
Erik Faye-Lund	830821aaa4	docs: fix closing of paragraphs Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-05-02 11:09:16 +00:00
Erik Faye-Lund	02a5698017	docs: add missing lists Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-05-02 11:09:16 +00:00
Erik Faye-Lund	767c517816	docs: fixup bad paragraphing This markup seems to assume paragraphs survive across block-elements, which isn't the case. Let's rectify that. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-05-02 11:09:16 +00:00
Erik Faye-Lund	b877722d75	docs: remove stray list-start Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-05-02 11:09:16 +00:00
Erik Faye-Lund	c61e9aef76	docs: don't pointlessly close and re-start definition lists Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-05-02 11:09:16 +00:00
Erik Faye-Lund	0ea4ef2473	docs: fix incorrectly closed paragraph Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-05-02 11:09:16 +00:00
Erik Faye-Lund	d69c790c22	docs: drop paragraph around preformatted text Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-05-02 11:09:16 +00:00
Erik Faye-Lund	8ef86c9240	docs: start paragraph before closing it Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-05-02 11:09:16 +00:00
Erik Faye-Lund	41573d486f	docs: close paragraphs before preformatted text It's illegal to nest block-level elements such as <pre> inside <p> in HTML. This means that when the paragraphs get closed after a <pre>-tag, we end up closing a non-existent tag, so the browser inserts a dummy <p>-tag. This is entirely pointless, so let's just close these tags before the <pre>-tag instead. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-05-02 11:09:16 +00:00
Erik Faye-Lund	5630540a27	docs: remove stray paragraph-close This isn't matching any paragraph-open tags, so let's get rid of it. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-05-02 11:09:15 +00:00
Erik Faye-Lund	3bda82b2e5	docs: close lists These lists never got closed. Let's fix that to avoid issues with bad parsers. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-05-02 11:09:15 +00:00
Erik Faye-Lund	92917e82e8	docs: close paragraphs before lists Paragraphs can't contain lists, and attempting to close them after the list just causes an extra, empty paragraph to be created. We don't want that, so let's close the paragraphs before the list instead. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-05-02 11:09:15 +00:00
Erik Faye-Lund	0c3bab7761	docs: open list-item before closing it A list-item must be opened before it can be closed. So let's replace this closing tag with an opening tag. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-05-02 11:09:15 +00:00
Erik Faye-Lund	7eee13c467	docs: use dl/dd instead of blockquote for freedesktop link The blockquote happens to match the indentation of the other lists for most browsers, but this isn't a guarantee. Let's instead use a definition-list, which is more strongly connected to a list, so it's more likely to have the same indentation. This also makes sure that we don't have similar padding on the right-hand side, in case we change the text-size. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-05-02 11:09:15 +00:00
Erik Faye-Lund	3f0568d7e5	docs: use h2 instead of b-tag for headings <b>-tags aren't allowed in the root of <body>, so let's replace these with <h2>-tags with some CSS to make them appear as bold. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-05-02 11:09:15 +00:00
Erik Faye-Lund	e81b6aa311	docs: remove stray paragraph-close This tag tries to close a non-existent paragraph. Let's get rid of it! Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-05-02 11:09:15 +00:00
Erik Faye-Lund	5b2a7062ff	docs: properly escape ampersand Even in preformatted blocks, ampersands should be escaped. Let's correct this, in case of strict parsers. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-05-02 11:09:15 +00:00
Erik Faye-Lund	13b990000f	docs: properly escape '>' The '>'-symbol should usually be escaped to avoid confusing strict parsers. While it's very unlikely to cause issues as-is, let's quote it for good measure. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-05-02 11:09:15 +00:00
Rhys Perry	13c423629e	radv: fix set_output_usage_mask() with composite and 64-bit types It previously used var->type instead of deref_instr->type and didn't handle 64-bit outputs. This fixes lots of transform feedback CTS tests involving transform feedback and geometry shaders (mostly dEQP-VK.transform_feedback.fuzz.random_geometry.*) v2: fix writemask widening when comp != 0 v3: fix 64-bit variables when comp != 0, again Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Cc: 19.0 19.1 <mesa-stable@lists.freedesktop.org> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-05-02 10:24:20 +01:00
Erik Faye-Lund	8194d3887e	docs: do not hard-code header-height It's generally nicer to do this in terms of em units, as that scales better with text-sizes, if we ever decide to change them. The result is slightly larger than before, but only by a couple of pixels. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-05-02 08:45:57 +00:00
Erik Faye-Lund	5ffe4879b6	docs: simplify css-centering With "display: flex;" we can make this a bit more automatic, not requiring a bunch of values to be of specific values to get the right centering. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-05-02 08:45:57 +00:00
Erik Faye-Lund	130400b904	docs: use multiple background-images for header This is a bit tidier than to set a background on the h1-text, requiring it to be full height and all. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-05-02 08:45:57 +00:00
Erik Faye-Lund	cb0123e37a	docs: remove spurious newline Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-05-02 08:45:57 +00:00
Erik Faye-Lund	3eec974143	docs: avoid repeating the color The color attribute is inherited in CSS, so there's no point in repeating this. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-05-02 08:45:57 +00:00
Erik Faye-Lund	86e38330d3	docs: avoid repeating the font The font attribute is inherited in CSS, so there's no point in repeating this. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-05-02 08:45:57 +00:00
Erik Faye-Lund	755c118a4f	docs: add missing semicolon While it's legal to omit the last semicolon in a CSS block, it's generally not considered good style, as it makes it harder to add new lines. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-05-02 08:45:57 +00:00
Erik Faye-Lund	3085eb90a0	docs: remove long commented out css These attributes have been commented out since 2005; I don't think there's a big chance of them making a return as-is. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-05-02 08:45:57 +00:00
Erik Faye-Lund	b6321d2f67	docs: remove non-existent css attribute There's no CSS-attribute named "link", so let's remove it. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-05-02 08:45:57 +00:00
Erik Faye-Lund	a2b0000d3c	docs: normalize css-indent style Tabs have been around as the indentation style of this file since it was created. Some newer CSS has added double-spaces, but let's keep it consistent. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-05-02 08:45:57 +00:00
Thomas Hellstrom	20b7839392	winsys/svga: Don't abort on EBUSY errors from execbuffer This error code typically indicated that a buffer object that was referenced by the command stream was being used for CPU access by another client. The correct action here is to retry after a while. Use usleep() until we have proper kernel support for this wait. Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com> Reviewed-by: Brian Paul <brianp@vmware.com>	2019-05-02 09:51:15 +02:00
Thomas Hellstrom	c69557c4a2	winsys/svga: Update the drm interface file The file vmwgfx_drm.h was a bit outdated. Update to a recent version, including defines supporting coherent memory. Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com> Reviewed-by: Brian Paul <brianp@vmware.com>	2019-05-02 09:51:07 +02:00
Thomas Hellstrom	978d66e4d5	svga: Avoid bouncing buffer data in malloced buffers Some constant- and texture upload buffer data may bounce in malloced buffers before being transferred to hardware buffers. In the case of texture upload buffers this seems to be an oversight. In the case of constant buffers, code comments indicate that we want to avoid mapping hardware buffers for reading when copying out of buffers that need modification before being passed to hardware. In this case we avoid data bouncing for upload manager buffers but make sure buffers that we read out from stay in malloced memory. Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com> Reviewed-by: Brian Paul <brianp@vmware.com>	2019-05-02 09:51:00 +02:00
Thomas Hellstrom	5961189f4e	winsys/svga: Enable the transfer_from_buffer GPU command for vgpu10 We didn't have the path using this command enabled as typically we take an alternate path using DMA uploads. Enable it so that we can exercise that code-path by turning off the DMA path. Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com> Reviewed-by: Brian Paul <brianp@vmware.com>	2019-05-02 09:50:52 +02:00
Thomas Hellstrom	50e58966fa	winsys/svga: Add an environment variable to force host-backed operation The vmwgfx kernel module has a compatibility mode for user-space that is not guest-backed resource aware. Add an environment variable to facilitate testing of this mode on guest-backed aware kernels: if the environment variable SVGA_FORCE_HOST_BACKED is defined, the driver will use host-backed operation. Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com> Reviewed-by: Deepak Rawat <drawat@vmware.com> Reviewed-by: Brian Paul <brianp@vmware.com>	2019-05-02 09:50:22 +02:00
Samuel Pitoiset	492e828848	ac: tidy up ac_build_llvm8_tbuffer_{load,store} For consistency with ac_build_llvm8_buffer_{load,store}_common helpers and that will help a bit for removing the vec3 restriction. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-05-02 09:24:05 +02:00
Samuel Pitoiset	6ac10e07c2	radv: implement a workaround for VK_EXT_conditional_rendering Per the Vulkan spec 1.1.107, the predicate is a 32-bit value. However, the AMD hardware treats it as a 64-bit value, which means it might fail to discard. I don't know why this extension has been drafted like that, but this definitely does not fit with AMD. The hardware doesn't seem to support a 32-bit value for the predicate, so we need to implement a workaround. This fixes an issue when DXVK enables conditional rendering with RADV; this also fixes the Sasha conditionalrender demo. Fixes: `e45ba51ea4` ("radv: add support for VK_EXT_conditional_rendering") Reported-by: Philip Rebohle <philip.rebohle@tu-dortmund.de> Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-05-02 09:24:05 +02:00
Samuel Pitoiset	e03e7c510f	radv: fix color conversions for normalized uint/sint formats The hardware actually rounds before conversion. This now matches what values are used when performing fast clears vs slow clears. This fixes a rendering issue with Far Cry 3&4. This also fixes a bunch of CTS tests that use an 8-bit UNORM format (only when the 512*512 image size hint is manually disabled). Cc: "19.0" "19.1" <mesa-stable@lists.freedesktop.org> Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-05-02 09:24:05 +02:00
Samuel Pitoiset	6162543999	radv: do not need to force emit the TCS regs on Vega20 This chip doesn't need the fixup. This fixes a bunch of dEQP-VK.tessellation tests and avoids random GPU hangs. Cc: "19.0" "19.1" <mesa-stable@lists.freedesktop.org> Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-05-02 09:24:05 +02:00
Jason Ekstrand	bf774b56be	util/bitset: Return an actual bool from test macros I want to be able to do BITSET_TEST() != BITSET_TEST() and this isn't currently possible because BITSET_TEST() returns a random bit. Compare to zero to get an actual Boolean. Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2019-05-02 03:12:54 +00:00
Brian Paul	413e55b5b9	glsl: work around MinGW 7.x compiler bug I'm not sure what triggered this, but building with scons platform=windows toolchain=crossmingw machine=x86 build=profile with MinGW g++ 7.3 or 7.4 causes an internal compiler error. We can work around it by forcing -O1 optimization. Reviewed-by: Jose Fonseca <jfonseca@vmware.com> Reviewed-by: Neha Bhende <bhenden@vmware.com>	2019-05-01 20:06:54 -06:00
Brian Paul	96540e4f0a	llvmpipe: init some vars to NULL to silence MinGW compiler warnings Reviewed-by: Neha Bhende <bhenden@vmware.com>	2019-05-01 20:06:54 -06:00
Marek Olšák	2d48a6959f	radeonsi: set sampler state and view functions for compute-only contexts	2019-05-01 21:16:13 -04:00
Marek Olšák	bfd3d50487	radeonsi: use new atomic LLVM helpers This depends on "ac,ac/nir: use a better sync scope for shared atomics"	2019-05-01 21:16:13 -04:00
Marek Olšák	181dcf0792	st/mesa: don't flush the front buffer if it's a pbuffer This is the best guess I can make here. Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2019-05-01 21:15:33 -04:00
Marek Olšák	35294f2eca	mesa: fix pbuffers because internally they are front buffers This fixes the egl_ext_device_base piglit test, which uses EGL pbuffers. Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2019-05-01 21:15:33 -04:00
Marek Olšák	f753f913f5	mesa: rework error handling in glDrawBuffers It's needed by the next pbuffer fix, which changes the behavior of draw_buffer_enum_to_bitmask, so it can't be used to help with error checking. Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2019-05-01 21:15:33 -04:00
Bas Nieuwenhuizen	0c99b5ace8	radv: Restrict YUVY formats to 1 layer. Fixes: `8bb3cec7c9` "radv: Expose VK_EXT_ycbcr_image_arrays." Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-05-02 02:29:51 +02:00
Bas Nieuwenhuizen	aab201635e	radv: Set is_array in lowered ycbcr tex instructions. Fixes array tests. Fixes: `91702374d5` "radv: Add ycbcr lowering pass." Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-05-02 02:29:51 +02:00
Bas Nieuwenhuizen	2c57d3361a	radv: Fix hang with YCBCR array textures. Forgot to apply the width/height divisor for CB writes resulting in the CB using larger than expected slice sizes. Fixes: `42d159f276` "radv: Add multiple planes to images." Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110530 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110526 Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-05-02 02:29:51 +02:00
Erico Nunes	257a9b0a94	lima/gpir: add limit of max 512 instructions It has been noted that the lima GP has a limit of 512 instructions, after which the shaders don't work and fail silently. This commit adds a check to make the shader compilation abort when the shader exceeds this limit, so that we get a clear reason for why the program will not work. Signed-off-by: Erico Nunes <nunes.erico@gmail.com> Reviewed-by: Qiang Yu <yuq825@gmail.com>	2019-05-02 00:02:58 +00:00
Alyssa Rosenzweig	09c669260f	panfrost: Fix blend shader upload Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-05-01 23:20:51 +00:00
Alyssa Rosenzweig	910608b29a	panfrost/decode: Hit MRT blend shader enable bits Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-05-01 23:20:50 +00:00
Alyssa Rosenzweig	b304b30f2c	panfrost: Remove shader dump Redundant via the midgard shader dump. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-05-01 23:20:48 +00:00
David Riley	dec68e32ea	virgl: Re-use and extend queue transfers for intersecting buffer subdatas. Small buffer subdatas which are essentially doing a memcpy were getting bogged down by all the overhead of creating new transfers. Signed-off-by: David Riley <davidriley@chromium.org> Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>	2019-05-01 15:48:51 -07:00
David Riley	a54c231b56	virgl: Allow transfer queue entries to be found and extended. Intersecting transfer queue entries allow for the possibility of extending an existing transfer instead of creating a new one (and all the associated mapping/unmapping). Signed-off-by: David Riley <davidriley@chromium.org> Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>	2019-05-01 15:48:46 -07:00
David Riley	e94a9a7f38	virgl: Store mapped hw resource with transfer object. Signed-off-by: David Riley <davidriley@chromium.org> Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>	2019-05-01 15:48:28 -07:00
Kenneth Graunke	ebbb05b3c9	iris: Fix imageBuffer and PBO download. Recently we added checks to try and deny multisampled shader images. Unfortunately, this messed up imageBuffers, which have sample_count = 0, which are also used in PBO download, causing us to hit CPU map fallbacks. Fixes: `b15f5cfd20` iris: Do not advertise multisampled image load/store. Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>	2019-05-01 14:37:46 -07:00
Dave Airlie	e2fecf57e3	r600: reset tex array override even when no view bound If no view is bound we still should reset the override to 0 and array mode. This should fix misrendering in firefox WebRender since the pbo sampler was removed. Fixes: `1250383e36` (st/mesa: remove sampler associated with buffer texture in pbo logic)	2019-05-02 07:34:32 +10:00
Ian Romanick	85e6865ff6	nir: Saturating integer arithmetic is not associative In 8-bits, iadd_sat(iadd_sat(0x7f, 0x7f), -1) = iadd_sat(0x7f, -1) = 0x7e but, iadd_sat(0x7f, iadd_sat(0x7f, -1)) = iadd_sat(0x7f, 0x7e) = 0x7f Fixes: `272e927d0e` ("nir/spirv: initial handling of OpenCL.std extension opcodes") Reviewed-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-05-01 09:07:47 -07:00
Eric Engestrom	70da00ffd6	util: move #include out of #if linux This #include is needed for `NULL`, which is used on all OSes, not just Linux. Reported-by: Juan A. Suarez Romero <jasuarez@igalia.com> Fixes: `316964709e` "util: add os_read_file() helper" Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>	2019-05-01 15:45:47 +00:00
Alok Hota	a44420d9cc	swr/rast: Add general SWTag statistics Update Archrast parser to use stats, used with an internal tool Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>	2019-05-01 15:11:30 +00:00
Alok Hota	b8adb540a0	swr/rast: Add string handling to AR event framework For use by an internal tool Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>	2019-05-01 15:11:30 +00:00
Alok Hota	f355f03388	swr/rast: Add initial SWTag proto definitions Update gen_archrast.py to properly generate event IDs Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>	2019-05-01 15:11:30 +00:00
Alok Hota	396831adf8	swr/rast: Cleanup and generalize gen_archrast - Update meson.build - Includes current_build_dir() fix meson/swr: replace hard-coded path with current_build_dir() Fixes: `93cd9905c8` "swr/rast: Cleanup and generalize gen_archrast" Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Alok Hota <alok.hota@intel.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com> - Clean up meson.build (remove foreach loop, replace with single call) - Update SConscript - use `$SOURCES` to call `CodeGenerate` with multiple source files Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>	2019-05-01 15:11:30 +00:00
Eric Engestrom	47f419d0b3	gitlab-ci: build vulkan drivers in clang build Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2019-05-01 14:37:31 +00:00
Erik Faye-Lund	f753ac355e	softpipe: setup pixel_offset for all primitive types If we don't update this for all primitive-types, we end up rendering slightly offset points and lines up until the point where the first triangle gets drawn. This is obviously not correct, and violates OpenGL's repeatability rule. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Fixes: `ca9c413647` ("softpipe: Respect gl_rasterization_rules in primitive setup.") Reviewed-By: Gert Wollny <gert.wollny@collabora.com>	2019-05-01 13:53:02 +00:00
Jonathan Marek	0c6702cfa5	nir: improve convert_yuv_to_rgb Use a different arrangement of constants to allow more ffma. A vec4 backend will now use 3 fma for yuv_to_rgb. On freedreno/ir3, it is down from 10 to 7 alu (4 fma, 3 mul, 3 add to 7 fma). Other backends shouldn't be hurt. Signed-off-by: Jonathan Marek <jonathan@marek.ca> Reviewed-by: Eric Anholt <eric@anholt.net> Tested-by: Ian Romanick <ian.d.romanick@intel.com>	2019-05-01 04:13:36 -07:00
Gert Wollny	becd192801	doc: Update feature matrix Since softpipe doesn't truly support multisample, I've not added softpipe to the "Enhanced per-sample shading" even though with the advertised GLSL level ARB_gpu_shader5 is advertised. Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2019-05-01 08:40:46 +02:00
Gert Wollny	6162ce6c60	softpipe: Increase the GLSL feature level This will enable calls to the interpolateAt* functions, but also a bunch of other features. Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2019-05-01 08:40:39 +02:00
Gert Wollny	338017c58a	softpipe: Add support for TGSI_OPCODE_INTERP_CENTROID Like with interpolateAtSample this is also not really implementing the corresponding sampling and will only work correctly for pixels that are fully covered, but since softpipe only supports one sample this is good enough for now. v2: Correct spelling (Roland Scheidegger) Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2019-05-01 08:40:20 +02:00
Gert Wollny	c3df4e0601	softpipe: Add support for TGSI_OPCODE_INTERP_OFFSET Since for this opcode the offsets are given manually the function should actually also work for non-zero offsets, but the related piglits only ever test with offset 0. Accordingly the patch satisfies "fs-interpolateatoffset-*". Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2019-05-01 08:40:16 +02:00
Gert Wollny	27bfd57bc7	softpipe: Add (fake) support for TGSI_OPCODE_INTERP_SAMPLE Softpipe doesn't support more than one sample, so this function implements the interpolation at sample 0 and adds a stub to make it possible to interpolate at other samples. As it is this makes the piglits "fs-interpolateatsample-*" pass, but they only ever test sample 0 anyway. Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2019-05-01 08:40:10 +02:00
Gert Wollny	e405e32d36	softpipe: Add a per-input array for interpolator correctors to machine This adds entry points for correcting the interpolation values if the interpolation is done by using one of the interpolateAt* functions. Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2019-05-01 08:40:06 +02:00
Gert Wollny	5f0959f8df	softpipe: Factor out evaluation of the source indices We will need these for per sample interpolation as well Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2019-05-01 08:39:58 +02:00
Gert Wollny	7d5c8d3589	softpipe: evaluate the cube faces on a per sample basis Now that the LOD is evaluated up front the cube faces can also be evaluated on a per sample basis instead of using the quad. This fixes a large number of deqp gles 3 and 31 cube texture tests. Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2019-05-01 08:23:23 +02:00
Gert Wollny	aacdce2879	softpipe: keep input lod for explicit derivatives This only affects anisotropic interpolation. Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2019-05-01 08:23:19 +02:00
Gert Wollny	d4b6ae223f	softpipe: tie in new code path for lod evaluation This enables the use of explicit gradients. Also remove an unused parameter when changing the interfaces. Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2019-05-01 08:23:07 +02:00
Gert Wollny	9e26a0ed8f	softpipe: Move selection of shadow values up and clean parameter list The shadow evaluation compare parameter is stored in different locations, depending on the texture type. Move the values to a common location to free the lod storage and to be able to reduce the number of parameters. Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2019-05-01 08:23:02 +02:00
Gert Wollny	41dc16b928	softpipe: Pipe gather_comp through from st_tgsi_get_samples The value is stored in the lod components and this will be overwritten when switching to the new code path. Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2019-05-01 08:22:56 +02:00
Gert Wollny	724a73509e	softpipe: Prepare handling explicit gradients This only adds code that is not yet enabled. Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2019-05-01 08:22:47 +02:00
Gert Wollny	7c004d093a	softpipe: Factor gradient evaluation out of the lambda evaluation this is useful when we want to use explicit gradients. Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2019-05-01 08:22:28 +02:00
Andrii Simiklit	5c581b3dd6	egl: return correct error code for a case req ver < 3 with forward-compatible The EGL_KHR_create_context spec says: "If an OpenGL context is requested and the values for attributes EGL_CONTEXT_MAJOR_VERSION_KHR and EGL_CONTEXT_MINOR_VERSION_KHR, when considered together with the value for attribute EGL_CONTEXT_OPENGL_FORWARD_COMPATIBLE_BIT_KHR, specify an OpenGL version and feature set that are not defined, than an EGL_BAD_MATCH error is generated." This case is already correctly handled a bit below in the same source file. The correct handling was added by commit: `63beb3df` Reported-by: Ian Romanick <idr@freedesktop.org> Here: https://bugzilla.freedesktop.org/show_bug.cgi?id=92552#c9 Fixes: `11cabc45b7` "egl: rework handling EGL_CONTEXT_FLAGS" Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Signed-off-by: Andrii Simiklit <andrii.simiklit@globallogic.com>	2019-05-01 00:14:00 +00:00
Timothy Arceri	90f3bf7437	radeonsi/nir: call radeonsi nir opts before the scan pass Some of the opts are not called in the general optimisation loop in the state tracker's glsl -> nir conversion. We need to call the radeonsi specific optimisation once before scanning over the nir otherwise we can end up gathering info on code that is later removed. Fixes an assert in the piglit test: ./bin/varying-struct-centroid_gles3 Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-05-01 09:41:07 +10:00
Timothy Arceri	a004e95dd7	radeonsi/nir: create si_nir_opts() helper We will make use of this in the following commit. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-05-01 09:41:07 +10:00
Alok Hota	4c68acba37	swr/rast: early exit on empty triangle mask Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>	2019-04-30 19:48:12 +00:00
Alok Hota	e7f381e9ca	swr/rast: add guards for cpuid on Linux Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>	2019-04-30 19:48:12 +00:00
Alok Hota	ae436203d9	swr/rast: add flat shading Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>	2019-04-30 19:48:12 +00:00
Alok Hota	9d01f4d631	swr/rast: add SWR_STATIC_ASSERT() macro Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>	2019-04-30 19:48:12 +00:00
Alok Hota	3851c6c9bf	swr/rast: update guardband rects at draw setup It's dependent on other state fields Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>	2019-04-30 19:48:12 +00:00
Alok Hota	2729d847ce	swr/rast: add more llvm intrinsics Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>	2019-04-30 19:48:12 +00:00
Julien Isorce	0e3a348bec	st/va: properly set stride and offset in vlVaDeriveImage Using the new resource_get_info function. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110443 Signed-off-by: Julien Isorce <jisorce@oblong.com> Reviewed-by: Leo Liu <leo.liu@amd.com>	2019-04-30 17:53:12 +00:00
Julien Isorce	1cec049d4d	radeonsi: implement resource_get_info Re-use existing si_texture_get_offset. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110443 Signed-off-by: Julien Isorce <jisorce@oblong.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-04-30 17:53:12 +00:00
Julien Isorce	a3c202de0a	gallium: add resource_get_info to pipe_screen Generic plumbing. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110443 Signed-off-by: Julien Isorce <jisorce@oblong.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-04-30 17:53:12 +00:00
Rob Clark	ec6c229763	freedreno/ir3: fixes for half reg in/out Needs to update max_half_reg, or be remapped to full reg and update max_reg accordingly, depending on generation.. Signed-off-by: Rob Clark <robdclark@chromium.org>	2019-04-30 10:39:24 -07:00
Axel Davy	ce57f4f7c4	st/nine: Check discard_delayed_release is set before allocating more When discard_delayed_release is set (default), we allocate more buffers and use a different buffer wait path. Check if it is set, and use the old paths if not (the alternative buffer wait path could still be used, but there is no advantage to using it in this case). Signed-off-by: Axel Davy <davyaxel0@gmail.com>	2019-04-30 19:18:52 +02:00
Axel Davy	b71c300c70	st/nine: Throttle rendering similarly for thread_submit thread_submit's throttling depended on the number of internal back buffers, and wasn't affected by the driver requested throttling value. Now it is. Signed-off-by: Axel Davy <davyaxel0@gmail.com>	2019-04-30 19:18:52 +02:00
Axel Davy	562f5a35c8	st/nine: Optimize a bit writeonly buffers Optimize writeonly by passing PIPE_TRANSFER_WRITE for these buffers instead of the safer PIPE_TRANSFER_READ_WRITE. This seems to improve the performance of d3d8 games using d3d8to9. Signed-off-by: Axel Davy <davyaxel0@gmail.com>	2019-04-30 19:18:52 +02:00
Axel Davy	92117c989c	st/nine: Use TGSI_SEMANTIC_GENERIC for fog We used TGSI_SEMANTIC_FOG for fog, however on vs/ps 3, fog is allowed to have 4 components (even on the ff pipeline according to a wine test). Since gallium's TGSI_SEMANTIC_FOG has only one component, use TGSI_SEMANTIC_GENERIC instead. Fixes: https://github.com/iXit/Mesa-3D/issues/346 Signed-off-by: Axel Davy <davyaxel0@gmail.com>	2019-04-30 19:18:52 +02:00
Axel Davy	bade3bf615	st/nine: Enable computing const_ranges All the pieces for constant compact are ready, thus enable the path. Signed-off-by: Axel Davy <davyaxel0@gmail.com>	2019-04-30 19:18:52 +02:00
Axel Davy	5c67db6889	st/nine: Handle const_ranges in nine_state Handle slot mapping if there is one. Signed-off-by: Axel Davy <davyaxel0@gmail.com>	2019-04-30 19:18:52 +02:00
Axel Davy	9942ba2ea3	st/nine: Cache constant buffer size The shader constant buffer size with the constant compaction code can vary depending on the shader variant compiled (for example if fog constants are required, etc). Thus instead of using fixed size for the shader, add in the variant cache the size required, pass it to the context, and use this value. Signed-off-by: Axel Davy <davyaxel0@gmail.com>	2019-04-30 19:18:52 +02:00
Axel Davy	a3cdc466e7	st/nine: Propagate const_range to context As with the constant compaction we map the constant slots to new slots, we need to pass that information to the context which is in charge of uploading the constants. Signed-off-by: Axel Davy <davyaxel0@gmail.com>	2019-04-30 19:18:52 +02:00
Axel Davy	7761cda686	st/nine: Prepare constant compaction in nine_shader When indirect addressing is not used, we know exactly which constants are accessed, and thus can have them located in consecutive slots. We thus parse again the shader with a slot map for compaction. The patch contains the work inside nine_shader.c for this path, but it needs some other commits to work, and thus is not enabled yet by this commit. Signed-off-by: Axel Davy <davyaxel0@gmail.com>	2019-04-30 19:18:52 +02:00
Axel Davy	db404507b4	st/nine: Refactor counting of constants Track the number of slots used Signed-off-by: Axel Davy <davyaxel0@gmail.com>	2019-04-30 19:18:51 +02:00
Axel Davy	737df40a63	st/nine: Track constant slots used This tracking will be useful for constant compaction Signed-off-by: Axel Davy <davyaxel0@gmail.com>	2019-04-30 19:18:51 +02:00
Axel Davy	d2cab4562c	st/nine: Refactor ct_ctor The refactoring will make it easier to parse the shader twice for the constant compaction path. Signed-off-by: Axel Davy <davyaxel0@gmail.com>	2019-04-30 19:18:51 +02:00
Axel Davy	6f3da226e6	st/nine: Make swvp_on imply IS_VS swvp cannot happen with ps, thus it makes sense to force it to false with ps. Signed-off-by: Axel Davy <davyaxel0@gmail.com>	2019-04-30 19:18:51 +02:00
Axel Davy	d57d1436d3	st/nine: Refactor shader constants ureg_src computation Put the shader constant code in one place to better change that code in future commits. Signed-off-by: Axel Davy <davyaxel0@gmail.com>	2019-04-30 19:18:51 +02:00
Axel Davy	6d86292f8a	st/nine: Manually upload vs and ps constants In future commits we will introduce more fine-grained uploads Signed-off-by: Axel Davy <davyaxel0@gmail.com>	2019-04-30 19:18:51 +02:00
Axel Davy	1ddeb43537	st/nine: use helper ureg_DECL_sampler everywhere Signed-off-by: Axel Davy <davyaxel0@gmail.com>	2019-04-30 19:18:51 +02:00
Axel Davy	3717ec4157	st/nine: Compact pixel shader key Compact the shader key to make room for new elements. Signed-off-by: Axel Davy <davyaxel0@gmail.com>	2019-04-30 19:18:51 +02:00
Axel Davy	2acbd977d7	st/nine: Compact nine_ff_get_projected_key Only the first four sampler slots can be used by ff ps < 0x14, thus the size of the key can be reduced. Signed-off-by: Axel Davy <davyaxel0@gmail.com>	2019-04-30 19:18:51 +02:00
Axel Davy	a92a43d41d	st/nine: Refactor param->rel Refactor param->rel to enable different paths for constants and inputs relative addressing. Signed-off-by: Axel Davy <davyaxel0@gmail.com>	2019-04-30 19:18:51 +02:00
Axel Davy	5974401a4a	st/nine: Regroup param->rel tests Regroup all the param->rel assertions into one assertion for better clarity and better covering. param->rel on an input can only happen with float constants for vs, or with inputs on vs/ps 3.0. Signed-off-by: Axel Davy <davyaxel0@gmail.com>	2019-04-30 19:18:51 +02:00
Axel Davy	12654a2fda	st/nine: Control shader constant inlining with drirc Until we use async shader compilation for constant inlining, don't enable it unless user asks for it. Signed-off-by: Axel Davy <davyaxel0@gmail.com>	2019-04-30 19:18:51 +02:00
Axel Davy	95f25bef54	st/nine: Recompile optimized shaders based on b/i consts Boolean and Integer constants are used in d3d9 for flow control. Boolean are used for if/then/else and Integer constants for loops. The compilers can generate better code if these values are known at compilation. I haven't met so far a game that would change the values of these constants frequently (and when they do, they set to the values used for the previous draw call, and thus the changes get filtered out). Thus it makes sense to inline these constants and recompile the shaders. The commit sets a bound to the number of variants for a given shader to avoid too many shaders to be generated. One drawback is it means more shader compilations. It would probably make sense to compile these shaders asynchronously or let the user control the behaviour with an env var, but this is not done here. The games I tested hit very few shader variants, and the performance impact was negligible, but it could help for games with uber shaders. Signed-off-by: Axel Davy <davyaxel0@gmail.com>	2019-04-30 19:18:51 +02:00
Axel Davy	e57267a09e	drirc: Add Gallium nine workaround for Rayman Legends The game requires it to display many textures properly. Signed-off-by: Axel Davy <davyaxel0@gmail.com>	2019-04-30 19:18:51 +02:00
Axel Davy	c097ff3617	st/nine: Add drirc option to use data_internal for dynamic textures dynamic textures seem to have predictable stride. This stride should be the same as for a ram buffer. It seems some games don't check the actual stride value, assuming it to be the expected one. Thus this workaround (protected by drirc option) is to use an intermediate ram buffer. Fixes Rayman Legends texture issues when enabled. Signed-off-by: Axel Davy <davyaxel0@gmail.com>	2019-04-30 19:18:51 +02:00
Axel Davy	7dcc85b46e	st/nine: Support internal compressed format for volumes Reuse the generic path to support compressed formats. Signed-off-by: Axel Davy <davyaxel0@gmail.com>	2019-04-30 19:18:51 +02:00
Axel Davy	1b0a7d0557	st/nine: Support internal compressed format for surfaces Reuse the generic path to support compressed formats. Signed-off-by: Axel Davy <davyaxel0@gmail.com>	2019-04-30 19:18:51 +02:00
Axel Davy	22c41d2d81	st/nine: Refactor volume GetSystemMemPointer It will make it easier to reuse in another place. Signed-off-by: Axel Davy <davyaxel0@gmail.com>	2019-04-30 19:18:51 +02:00
Axel Davy	85c9d92067	st/nine: Refactor surface GetSystemMemPointer It will make it easier to reuse in another place. Signed-off-by: Axel Davy <davyaxel0@gmail.com>	2019-04-30 19:18:50 +02:00
Axel Davy	8ba4f73911	st/nine: rename _conversion to _internal Rename these variables to a new name which will fit new usages introduced in later commits. Signed-off-by: Axel Davy <davyaxel0@gmail.com>	2019-04-30 19:18:50 +02:00
Axel Davy	4a51a7c1da	st/nine: Optimize volume upload with conversion Use nine_context_box_upload instead of locking the pipe for volume upload with format conversion. nine_context_box_upload already handles format conversion. Signed-off-by: Axel Davy <davyaxel0@gmail.com>	2019-04-30 19:18:50 +02:00
Axel Davy	fac3f99377	st/nine: Optimize surface upload with conversion Use nine_context_box_upload instead of locking the pipe for surface upload with format conversion. nine_context_box_upload already handles format conversion. Signed-off-by: Axel Davy <davyaxel0@gmail.com>	2019-04-30 19:18:50 +02:00
Axel Davy	4ca6b1dfd1	st/nine: Fix SINCOS input SINCOS takes an input with replicated swizzle. the swizzle can be on any component, not just x. Enable it to read from any component, but also use a temporary register to avoid dst/src aliasing. No known game is fixed by this change as it seems the input swizzle is commonly on x for this instruction, and src and dst don't alias. Signed-off-by: Axel Davy <davyaxel0@gmail.com>	2019-04-30 19:18:50 +02:00
Axel Davy	f4ae483c44	st/nine: Ignore nooverwrite for systemmem Systemmem has a specific behaviour we don't mimic exactly. That makes Halo feel free to use nooverwrite with it all the time, even when reading again at the same location. Ignore nooverwrite to have proper synchronization. Fixes: https://github.com/iXit/Mesa-3D/issues/348 Signed-off-by: Axel Davy <davyaxel0@gmail.com>	2019-04-30 19:18:50 +02:00
Axel Davy	fd3a870401	st/nine: Enable modifiers on ps 1.X texcoords For many ps 1.X instructions, we were reading the texcoords directly, instead of through tx_src_param, resulting in modifiers getting ignored. Use tx_src_param for all these instructions. Fixes: https://github.com/iXit/Mesa-3D/issues/337 Signed-off-by: Axel Davy <davyaxel0@gmail.com>	2019-04-30 19:18:50 +02:00
Axel Davy	1fc0714039	st/nine: Always return OK on SetSoftwareVertexProcessing This would need more tests to know exactly if INVALIDCALL can be returned in some situations. It seems some games expect D3D_OK, even when noop and illegal. Fixes: https://github.com/iXit/Mesa-3D/issues/302 https://github.com/iXit/Mesa-3D/issues/338 Signed-off-by: Axel Davy <davyaxel0@gmail.com>	2019-04-30 19:18:50 +02:00
Axel Davy	d9a4025fa3	st/nine: Finish if nooverwrite after normal mapping d3d's nooverwrite and gallium's unsynchronized have different semantics. Indeed nooverwrite says the applications won't write to locations needed by previous draws, which is less strong than unsynchronized which won't synchronize previous writes. Thus in case app is locking without discard/nooverwrite, then using nooverwrite, we need to add a synchronization. Fixes: https://github.com/iXit/wine-nine-standalone/issues/29 Signed-off-by: Axel Davy <davyaxel0@gmail.com>	2019-04-30 19:18:50 +02:00
Axel Davy	e502c4d892	st/nine: Fix buffer/texture unbinding in nine_state_clear Previously nine_state_clear was not using NineBindBufferToDevice and NineBindTextureToDevice to unbind buffers and textures (but used nine_bind). This was resulting in an incorrect bind count for these resources. Combined with `0ec4e5f630`, some buffers were scheduled to be uploaded directly after they were locked (because the bind count incorrectly assumed they were needed for the next draw call), which resulted in uploads before the data was written. To simplify a bit the code (and because I needed to add a pointer to device), remove the stateblock usage from nine_state_clear and rename to nine_device_state_clear. Fixes: https://github.com/iXit/Mesa-3D/issues/345 Signed-off-by: Axel Davy <davyaxel0@gmail.com>	2019-04-30 19:18:50 +02:00
Axel Davy	bb3b8f8e01	st/nine: Upload managed buffers only at draw using them When a draw call is emitted, buffers in the device->update_buffers list are uploaded. This patch removes buffers from the list if they are not bound anymore. Behaviour found studying: https://github.com/iXit/Mesa-3D/issues/345 Signed-off-by: Axel Davy <davyaxel0@gmail.com>	2019-04-30 19:18:50 +02:00
Axel Davy	5df96995ef	st/nine: Upload managed textures only at draw using them When a draw call is emitted, textures in the device->update_textures list are uploaded. This patch removes textures from the list if they are not bound anymore. Signed-off-by: Axel Davy <davyaxel0@gmail.com>	2019-04-30 19:18:50 +02:00
Axel Davy	394420ebb3	st/nine: Use FLT_MAX/2 for RCP clamping This seems to fix Rayman (which adds things to the RCP result, and thus gets an Inf), while not having regressions. Signed-off-by: Axel Davy <davyaxel0@gmail.com>	2019-04-30 19:18:49 +02:00
Axel Davy	64a45ba7f8	st/nine: Fix D3DWindowBuffer_release for old wine nine support No-one reported bugs for that, but it seems `c442dd7890` and previous commits used APIs not defined until nine minor version 3. This patch should prevent crash in this case. Also turn off the resize feature in this case, as we won't prevent a buffer leak anymore. Cc: "19.0" mesa-stable@lists.freedesktop.org Signed-off-by: Axel Davy <davyaxel0@gmail.com>	2019-04-30 19:18:49 +02:00
Eric Engestrom	0cff98c8a0	turnip: update to use the new features struct names These were updated in version 1.1.106 of vulkan.h to make more sense with the extension names. We may as well keep with the times. See also: `90108deb27` "anv: Update to use the new features struct names" Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Chia-I Wu <olvaffe@gmail.com>	2019-04-30 16:55:38 +01:00
Eric Engestrom	941b2f4dcd	radv: update to use the new features struct names These were updated in version 1.1.106 of vulkan.h to make more sense with the extension names. We may as well keep with the times. See also: `90108deb27` "anv: Update to use the new features struct names" Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-04-30 16:55:18 +01:00
Eric Engestrom	b80930a6fe	anv: add support for VK_EXT_memory_budget Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-04-30 15:40:33 +00:00
Eric Engestrom	316964709e	util: add os_read_file() helper readN() taken from igt. os_read_file() inspired by igt_sysfs_get() Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-04-30 15:40:33 +00:00
Rafael Antognolli	2fae99bcbd	iris: Enable fast clear colors on gen11. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-04-30 08:31:44 -07:00
Rafael Antognolli	cf3cadacdf	iris: Update the surface state clear color address when available. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-04-30 08:31:44 -07:00
Rafael Antognolli	91bcbfc351	iris: Use the linear version of the surface format during fast clears. Newer gens (> 9) will start doing the linear -> sRGB conversion of the clear color for us, if we use a sRGB surface format. So let's make sure that doesn't happen and keep the same semantics as before. Even though the hardware could convert the clear color for us during fast clear, that converted color is only used for sampling. For resolve, the original color would be used (without the conversion). So we convert it ourselves and the same converted color gets used for both sampling and resolving, simplifying the whole logic. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-04-30 08:31:44 -07:00
Rafael Antognolli	56927a8cf5	iris: Support sRGB fast clears even if the colorspaces differ. We were disabling fast clears if the view format had a different colorspace than the resource format (sRGB vs linear or vice-versa). But we actually support them if we use the view format to decide if we should encode the clear color into sRGB colorspace. Also add a missing linear -> sRGB surface format conversion (we don't want the clear color to be encoded to sRGB again during resolve). v2: Do not track sRGB colorspace during fast clears (Nanley). Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-04-30 08:31:44 -07:00
Eric Engestrom	abb2c7c9d3	egl: fixup autotools-specific wording Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2019-04-30 15:25:40 +00:00
Eric Engestrom	fe73c74691	docs: haiku can be built using meson Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2019-04-30 15:25:40 +00:00
Eric Engestrom	88ed5f611d	docs: use past tense when talking about autotools Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2019-04-30 15:25:40 +00:00
Eric Engestrom	46d6883a13	docs: replace autotools instructions with meson equivalent Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2019-04-30 15:25:40 +00:00
Eric Engestrom	1936bad9ec	docs: drop autotools python information Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2019-04-30 15:25:40 +00:00
Eric Engestrom	8c7b8fcd0c	docs: remove unsupported GL function name mangling This was only supported in autotools, which has since been deleted. Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2019-04-30 15:25:40 +00:00
Ian Romanick	bfc6486819	mesa: Add missing display list support for GL_FOG_COORDINATE_SOURCE Fixes: `fe5d67d95f` ("Implement EXT_fog_coord and EXT_secondary_color.") Reviewed-by: Mathias Fröhlich <Mathias.Froehlich@web.de> Cc: Brian Paul <brianp@vmware.com>	2019-04-30 07:52:59 -07:00
Alejandro Piñeiro	9b6a00e66e	docs: document MESA_GLSL=errors keyword Added with commit `0161691f35`, still checked on shaderapi.c _mesa_get_shader_flag method. Fixes: `0161691f35` "mesa: add GLSL_REPORT_ERRORS debug flag" Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-04-30 15:45:33 +01:00
Khem Raj	da84d071a6	winsys/svga/drm: Include sys/types.h vmw_screen.h uses dev_t, which is defined in sys/types.h; this header must be included to get the dev_t definition. This issue happens on the musl C library; it is hidden on glibc since sys/types.h is included through other system headers. Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-04-30 13:50:25 +01:00
Ross Burton	1c1efa4ca9	Revert "meson: drop GLESv1 .so version back to 1.0.0" This patch claimed that the autotools build generates libGLESv1_CM.so.1.0.0, but it doesn't: es1api_libGLESv1_CM_la_LDFLAGS = \ -no-undefined \ -version-number 1:1 \ $(GC_SECTIONS) \ $(LD_NO_UNDEFINED) Revert commit `cc15460e18` to ensure that the autotools and meson builds produce the same libraries. Fixes: `cc15460e18` "meson: drop GLESv1 .so version back to 1.0.0" Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2019-04-30 13:49:20 +01:00
Juan A. Suarez Romero	8d621e8ff7	anv: enable descriptor indexing capabilities This enables the remaining capabilities in SPV_EXT_descriptor_indexing. Fixes: `6e230d7607` "anv: Implement VK_EXT_descriptor_indexing" Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-04-30 09:23:46 +02:00
Juan A. Suarez Romero	06c9d7f9f9	radv: enable descriptor indexing capabilities This enables the remaining capabilities in SPV_EXT_descriptor_indexing. Fixes: `0e10790558` "radv: Enable VK_EXT_descriptor_indexing." Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-04-30 09:23:23 +02:00
Juan A. Suarez Romero	bbbe00a101	spirv: add missing SPV_EXT_descriptor_indexing capabilities Add ShaderNonUniformEXT, UniformBufferArrayNonUniformIndexingEXT, SampledImageArrayNonUniformIndexingEXT, StorageBufferArrayNonUniformIndexingEXT, StorageImageArrayNonUniformIndexingEXT, InputAttachmentArrayNonUniformIndexingEXT, UniformTexelBufferArrayNonUniformIndexingEXT and StorageTexelBufferArrayNonUniformIndexingEXT capabilities. Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-04-30 09:22:45 +02:00
Caio Marcelo de Oliveira Filho	1fb6630636	spirv: Properly handle SpvOpAtomicCompareExchangeWeak The code was handling the Weak variant in some cases, but missing others, e.g. the get_deref_nir_atomic_op. Add all the missing cases with the same behavior of the non-Weak SpvOpAtomicCompareExchange. Note that the Weak variant is basically an alias, as SPIR-V 1.3, Revision 7 says "OpAtomicCompareExchangeWeak Deprecated (use OpAtomicCompareExchange). Has the same semantics as OpAtomicCompareExchange." Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-04-29 19:02:44 -07:00
Tomeu Vizoso	cc6bbf6397	panfrost/ci: Initial commit These files implement running almost all of deqp-gles2 on Chromebooks of the rk3399-gru-kevin type in Collabora's LAVA lab. The approach follows what is currently being used for virglrenderer, but scheduling the actual test jobs via LAVA. We start by building a container in Docker that contains a suitable rootfs and kernel for the DUT, deqp and all dependencies for building Mesa itself. Then Mesa is built and the rootfs, deqp and Mesa are combined in a cpio ramdisk. A LAVA job is generated, submitted to LAVA and the results are processed by simply comparing them to the expectations that are stored in git. Any code that changes the expectations (hopefully tests are fixed) needs to also update the expectations file. The next step is adding support for other devices, possibly in other LAVA labs. In order to use this, the repository has to be configured to run the gitlab-ci.yaml file from the panfrost/ci dir, and a LAVA token needs to be set up. Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com> Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-04-30 01:22:43 +00:00
Rafael Antognolli	b15f5cfd20	iris: Do not advertise multisampled image load/store. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-04-29 17:04:04 -07:00
Rob Clark	9cb8037e54	freedreno/a6xx: pre-bake UBWC flags in texture-view Small cleanup. No need to defer this to emit time. Signed-off-by: Rob Clark <robdclark@chromium.org>	2019-04-29 17:01:01 -07:00
Rob Clark	8506ebfb95	freedreno/a6xx: small texture emit cleanup Prep work for fb_read (blend_equation_advanced) Switch to using 'enum pipe_shader_type' everywhere, and (optional, in non-cache / slowpath case) pass ctx instead of image/ssbo state. In the fb_read case we also need to access the framebuffer state, so having the ctx simplifies things. Signed-off-by: Rob Clark <robdclark@chromium.org>	2019-04-29 17:01:01 -07:00
Rob Clark	da327afb2a	freedreno/ir3: switch fragcoord to sysval Because who are we kidding... it is a sysval. Signed-off-by: Rob Clark <robdclark@chromium.org>	2019-04-29 17:01:01 -07:00
Plamena Manolova	11518384c4	i965: Re-enable fast color clears for GEN11. This patch re-enables fast color clears for GEN11. It also ensures that we use linear color formats for sRGB surfaces during fast clears. Signed-off-by: Plamena Manolova <plamena.n.manolova@gmail.com> Reviewed-by: Nanley Chery <nanley.g.chery@intel.com> Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>	2019-04-29 21:19:59 +00:00
Rafael Antognolli	9175c7058e	intel/blorp: Make blorp update the clear color in gen11. Hardware docs say that Gen11 requires the use of two MI_ATOMICs of size QWORD when updating the clear color. The second MI_ATOMIC also needs CS Stall and Return Data Control set. v2: Remove include of srgb header (Lionel) Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-04-29 21:19:59 +00:00
Rafael Antognolli	f8c3f408a6	intel/genxml: Update MI_ATOMIC genxml definition. Change some of the single bit fields to booleans, and add an enum with the definition of the ATOMIC_OPCODE. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-04-29 21:19:59 +00:00
Jordan Justen	38ffd7ce79	intel/genxml: Support base-16 in value & start fields in gen_sort_tags.py With python's int(), if the optional second parameter is 0, then python will support the 0x prefix for hex numbers. Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-04-29 21:19:58 +00:00
Plamena Manolova	232c0f6489	isl: Set ClearColorConversionEnable. The ClearColorConversionEnable bit needs to be set for GEN11 when indirect clear colors are used. Signed-off-by: Plamena Manolova <plamena.n.manolova@gmail.com> Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com> Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>	2019-04-29 21:19:58 +00:00
Eric Engestrom	1587586182	delete autotools input files Leftovers from when autotools was deleted. Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2019-04-29 21:17:19 +00:00
Eric Engestrom	7ca8ba199f	delete autotools .gitignore files One special case, `src/util/xmlpool/.gitignore` is not entirely deleted, as `xmlpool.pot` still gets generated (eg. by `ninja xmlpool-pot`). Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2019-04-29 21:17:19 +00:00
Kenneth Graunke	f3bdffc33d	iris: Only enable GL_AMD_depth_clamp_separate on Gen9+ The hardware feature is new as of Gen9+. I accidentally enabled it on Gen8.	2019-04-29 13:25:12 -07:00
Kenneth Graunke	dcfca0af7c	iris: Set XY Clipping correctly. I was setting it based off a pipe_rasterizer_state field that appears to be entirely dead outside of the draw module respecting it. I should be setting it when the primitive type reaching the SF is neither points nor lines. This is, unfortunately, rather dirty, as we have to look at the rasterizer state, the geometry shader state, the tessellation evaluation shader state, and the primitive type...	2019-04-29 10:53:23 -07:00
Rhys Perry	bd4c661ad0	ac,ac/nir: use a better sync scope for shared atomics https://reviews.llvm.org/rL356946 (present in LLVM 9 and later) changed the meaning of the "system" sync scope, making it no longer restricted to the memory operation's address space. So a single address space sync scope is needed for shared atomic operations (such as "system-one-as" or "workgroup-one-as") otherwise buffer_wbinvl1 and s_waitcnt instructions can be created at each shared atomic operation. This mostly reimplements LLVMBuildAtomicRMW and LLVMBuildAtomicCmpXchg to allow for more sync scopes and uses the new functions in ac->nir with the "workgroup-one-as" or "workgroup" sync scopes. F1 2017 (4K, Ultra High settings, TAA), avg FPS : 59 -> 59.67 (+1.14%) Strange Brigade (4K, ~highest settings), avg FPS : 51.5 -> 51.6 (+0.19%) RotTR/mountain (4K, VeryHigh settings, FXAA), avg FPS : 57.2 -> 57.2 (+0.0%) RotTR/tomb (4K, VeryHigh settings, FXAA), avg FPS : 42.5 -> 43.0 (+1.17%) RotTR/valley (4K, VeryHigh settings, FXAA), avg FPS : 40.7 -> 41.6 (+2.21%) Warhammer II/fallen, avg FPS : 31.63 -> 31.83 (+0.63%) Warhammer II/skaven, avg FPS : 37.77 -> 38.07 (+0.79%) Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-04-29 18:20:44 +01:00
Hal Gentz	e91ee763c3	glx: Fix synthetic error generation in __glXSendError To quote Uli Schlachter, who understands this stuff more than I do: > The function __glXSendError() in mesa's src/glx/glx_error.c invents an X11 > protocol error out of thin air. For the sequence number it uses dpy->request. > This is the sequence number of the last request that was sent. _XError() will > then update dpy->last_request_read based on the sequence number of the error > that just "came in". > > If now another something comes in with a sequence number less than > dpy->last_request_read, since sequence numbers are monotonically increasing, > widen() will incorrectly add 1<<32 to the sequence number and things might go > downhill afterwards. `__glXSendErrorForXcb` was also patched, as that's the function that `glXCreateContextAttribsARB` actually uses. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99781 Cc: mesa-stable@lists.freedesktop.org Fixes: `ad503c41` 'apple: Initial import of libGL for OSX from AppleSGLX svn repository' Reviewed-by: Adam Jackson <ajax@redhat.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Signed-off-by: Hal Gentz <zegentzy@protonmail.com>	2019-04-29 12:52:48 -04:00
Lionel Landwerlin	9628631a38	Revert "anv: limit URB reconfigurations when using blorp" In commit 0d46e404 ("anv: limit URB reconfigurations when using blorp") we tried to limit the number of URB reconfiguration by checking if the last allocation is large enough to fit the blorp dispatch. We used the last bound pipeline to compare the allocation. The problem with this is that the pipeline is bound but its commands might not have been emitted into the command buffer yet. Let's just revert commit `0d46e40467` since it didn't seem to yield any performance improvement. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: 0d46e404 ("anv: limit URB reconfigurations when using blorp") Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110535 Acked-by: Jason Ekstrand <jason@jlekstrand.net>	2019-04-29 11:41:27 +00:00
Erik Faye-Lund	cc5b8a938a	mesa/st: remove always-false state This code is essentially dead now. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-04-29 10:28:38 +00:00
Erik Faye-Lund	be110ba2e4	mesa/st: accept NULL and empty buffer objects It's perfectly legal and well-defined to render using a non-existing or empty buffer object. The data coming out of the buffer object isn't well defined unless we have the robustness flag set on the context, but that's a different matter, and up to the shader hardware; it's the same as out-of-bounds reads. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-04-29 10:28:38 +00:00
Erik Faye-Lund	ef13691e0c	swr: support NULL-resources It's legal for a buffer-object to have a NULL-resource, but let's just skip over it, as there's nothing to do. This patch switches the order of the conditionals in swr_update_derived, so the logic becomes a bit more straight forward: if (is_user_buffer) ... else if (resource) ... else ... ...instead of this: if (!is_user_buffer) if (resource) ... else ... else ... Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Alok Hota <alok.hota@intel.com>	2019-04-29 10:28:38 +00:00
Erik Faye-Lund	04b0c6e9df	nouveau: support NULL-resources It's legal for a buffer-object to have a NULL-resource, but let's just skip over it, as there's nothing to do. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Acked-by: Karol Herbst <kherbst@redhat.com>	2019-04-29 10:28:38 +00:00
Erik Faye-Lund	a11945d179	i915: support NULL-resources It's legal for a buffer-object to have a NULL-resource, but let's just skip over it, as there's nothing to do. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>	2019-04-29 10:28:38 +00:00
Erik Faye-Lund	a8e8204b18	gallium/u_vbuf: support NULL-resources It's legal for a buffer-object to have a NULL-resource, but let's just skip over it, as there's nothing to do. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-04-29 10:28:38 +00:00
Erik Faye-Lund	0607ceb655	mesa/st: remove impossible error-check st_setup_current never sets this flag, and it's already checked against right before. So let's remove this pointless check. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-04-29 10:28:38 +00:00
Andres Gomez	c81fbb42d9	glsl/linker: check for xfb_offset aliasing From page 76 (page 80 of the PDF) of the GLSL 4.60 v.5 spec: " No aliasing in output buffers is allowed: It is a compile-time or link-time error to specify variables with overlapping transform feedback offsets." Currently, this is expected to fail, but it succeeds: " ... layout (xfb_offset = 0) out vec2 a; layout (xfb_offset = 0) out vec4 b; ... " Fixes the following piglit test: tests/spec/arb_enhanced_layouts/compiler/transform-feedback-layout-qualifiers/xfb_offset/invalid-overlap.vert Fixes the following test: KHR-GL44.enhanced_layouts.xfb_output_overlapping v2: - Use a data structure to track the used components instead of a nested loop (Ilia). v3: - Take the BITSET_WORD array out from the gl_transform_feedback_buffer struct and make it local to the validation process (Timothy). - Do not use a nested scope for the validation (Timothy). v4: - Add reference to the fixed piglit test in the commit log. - Add reference to the fixed VK-GL-CTS test in the commit log (Tapani). - Empty initialize the BITSET_WORD pointers array (Tapani). Cc: Timothy Arceri <tarceri@itsqueeze.com> Cc: Ilia Mirkin <imirkin@alum.mit.edu> Signed-off-by: Andres Gomez <agomez@igalia.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2019-04-29 12:13:29 +02:00
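The v2 note above mentions tracking used components in a data structure rather than with a nested loop. A minimal sketch of that idea follows, assuming one bit per 4-byte component of a single transform feedback buffer. The names `xfb_used`, `xfb_claim`, and the 512-byte limit are hypothetical and are not taken from the Mesa linker.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical limit for this sketch: track 512 bytes of one buffer. */
#define XFB_MAX_BYTES 512
/* One bit per 4-byte component, 32 bits per word. */
#define XFB_WORDS (XFB_MAX_BYTES / 4 / 32)

struct xfb_used {
   uint32_t bits[XFB_WORDS];
};

/* Claim the byte range [offset, offset + size) for one output variable.
 * Returns false as soon as a component is found that an earlier variable
 * already claimed, which is the aliasing case the spec forbids. */
static bool
xfb_claim(struct xfb_used *used, unsigned offset, unsigned size)
{
   assert(offset % 4 == 0 && size % 4 == 0);
   assert(offset + size <= XFB_MAX_BYTES);

   for (unsigned c = offset / 4; c < (offset + size) / 4; c++) {
      uint32_t mask = UINT32_C(1) << (c % 32);
      if (used->bits[c / 32] & mask)
         return false;
      used->bits[c / 32] |= mask;
   }
   return true;
}
```

With the shader from the commit message, claiming 8 bytes at offset 0 for `vec2 a` succeeds, and the later claim of 16 bytes at offset 0 for `vec4 b` is rejected.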
Patrick Lerda	812288bf0f	lima/ppir: fix pointer referenced after a free Issue detected by valgrind. Fixes: `92d7ca4b1c` ("gallium: add lima driver") Signed-off-by: Patrick Lerda <patrick9876@free.fr> Reviewed-by: Qiang Yu <yuq825@gmail.com>	2019-04-29 10:43:51 +02:00
Eleni Maria Stea	bb953de96c	radv: consider MESA_VK_VERSION_OVERRIDE when setting the api version Before setting the physical device API version, we should check if the MESA_VK_VERSION_OVERRIDE environment variable is set and take it into account. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-04-29 09:00:51 +02:00
Kenneth Graunke	9dcf90d7ba	intel/fs: Don't emit empty ELSE blocks. While we can clean this up later, it's trivial to not generate the stupid code in the first place, which saves some optimization work. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-04-28 22:36:09 -07:00
Kenneth Graunke	2b44b27dbe	nir: Add a new nir_cf_list_is_empty_block() helper. Helper and name suggested by Eric Anholt. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-04-28 22:36:08 -07:00
Kenneth Graunke	08dc93c67c	glsl/list: Add an exec_list_is_singular() helper. Similar to list_is_singular() in util/list.h. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-04-28 22:35:42 -07:00
Tapani Pälli	376c3e8f87	anv: expose VK_EXT_queue_family_foreign on Android VK_ANDROID_external_memory_android_hardware_buffer requires this extension. It is safe to enable it since currently aux usage is disabled for ahw buffers. Fixes following dEQP extension dependency test on Android: dEQP-VK.api.info.device#extensions Cc: <mesa-stable@lists.freedesktop.org> Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Acked-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-04-29 07:31:02 +03:00
Andreas Baierl	c960323a81	lima/ppir: Add gl_FragCoord handling Treat gl_FragCoord variable as a system value and lower the w component with a nir pass. Add the necessary bits for correct codegen. Signed-off-by: Andreas Baierl <ichgeh@imkreisrum.de> Reviewed-by: Qiang Yu <yuq825@gmail.com>	2019-04-29 02:46:44 +00:00
Andreas Baierl	b82de2b4d7	nir: add rcp(w) lowering for gl_FragCoord On some hardware (e.g. Mali400) the shader needs to apply some transformations for correct gl_FragCoord handling. The lowering actions look like the following in pseudocode: gl_FragCoord.xyz = gl_FragCoord_orig.xyz; gl_FragCoord.w = 1.0 / gl_FragCoord_orig.w; Add this lowering as a nir pass in preparation for using it in the driver. Signed-off-by: Andreas Baierl <ichgeh@imkreisrum.de> Reviewed-by: Qiang Yu <yuq825@gmail.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-04-29 02:46:44 +00:00
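The pseudocode in the commit above can be written out on plain floats. This is only an illustration of the arithmetic, not the NIR pass itself; the `vec4` struct and the function name `lower_fragcoord_w` are hypothetical.

```c
#include <assert.h>

struct vec4 {
   float x, y, z, w;
};

/* Copy x, y and z unchanged and replace w with its reciprocal,
 * mirroring the two pseudocode assignments in the commit message. */
static struct vec4
lower_fragcoord_w(struct vec4 orig)
{
   assert(orig.w != 0.0f); /* this sketch does not handle w == 0 */

   struct vec4 out = { orig.x, orig.y, orig.z, 1.0f / orig.w };
   return out;
}
```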
Romain Failliot	7050eccd77	docs: changed "Done" to "DONE" in features.txt Mesamatrix.net expects uppercase. Acked-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-04-29 09:32:01 +10:00
Alyssa Rosenzweig	ec65e1b763	panfrost: Workaround -bshadow regression I have no idea what's happening here, but let's not regress an app that used to work in the meantime while we're figuring it out. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-04-28 21:39:20 +00:00
Alyssa Rosenzweig	3978614d88	panfrost/midgard: Safety check immediate precision degradations Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-04-28 21:34:32 +00:00
Alyssa Rosenzweig	0ebf1047a4	panfrost: Use fp32 (not fp16) varyings In a perfect world, we'd use fp16 varyings for mediump and fp32 for highp, allowing us to get a performance win without sacrificing conformance. Unfortunately, we're not there (yet), so it's better we assume always fp32 than always fp16 to avoid artefacts / breaking a lot of deqp. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-04-28 21:34:32 +00:00
Alyssa Rosenzweig	a81267f228	panfrost/midgard: imov workaround Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-04-28 21:34:32 +00:00
Alyssa Rosenzweig	53d6e11393	panfrost/midgard: Fix tex propagation Unbreaks mpv. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-04-28 21:34:32 +00:00
Alyssa Rosenzweig	68a1508dc9	panfrost/midgard: Fix regressions in -bjellyfish Two fixes here, one is that we tried to copyprop non-strictly-SSA values which was bound to fly in our face. The other was peeling back the imov workaround. Turns out we still need that. More research is needed still, but let's not regress real apps. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-04-28 21:34:32 +00:00
Alyssa Rosenzweig	bdaa23b32b	panfrost/midgard: Only copyprop without an outmod With an outmod, we would need to propagate that through, which is for future work. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-04-28 21:34:32 +00:00
Alyssa Rosenzweig	a3d6a3dfc4	Revert "panfrost/midgard: Extend copy propagation pass" Fixes: commit `b53b4573c3`. Optimization gone wrong. In the future, we should try this again (it's a net win if implemented right), but at the moment this just regresses. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-04-28 21:34:32 +00:00
Samuel Pitoiset	07745f9494	radv: add missing VEGA20 chip in radv_get_device_name() Otherwise it returns "AMD RADV unknown". Cc: 19.0 <mesa-stable@lists.freedesktop.org> Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-04-27 12:16:23 +02:00
Kenneth Graunke	6bd4cb920e	iris: Fix zeroing of transform feedback offsets in strange cases. Some of the dEQP.functional.transform_feedback tests end up doing the following sequence of operations: 1. BeginTransformFeedback 2. PauseTransformFeedback 3. Draw 4. ResumeTransformFeedback At step 1, we'd pack 3DSTATE_SO_BUFFER commands saying to zero the SO_WRITE_OFFSET registers. At step 2, we disable streamout, so step 3 doesn't bother emitting those commands. Then, step 4 re-packs new 3DSTATE_SO_BUFFER commands with offset = 0xFFFFFFFF, saying to continue appending at the existing offset. This loads the value from the BO as the offsets - but we never actually zeroed it. So, just maintain a flag saying "we actually emitted the commands", and stomp offset back to zero until we emit some.	2019-04-27 01:07:14 -07:00
Eric Anholt	edb04953c8	vc4: Fall back to renderonly if the vc4 driver doesn't have v3d. I have a platform with vc4 display but V3D 4.x. We can fall back on kmsro's probing to bring up the v3d gallium driver. Acked-by: Rob Clark <robdclark@chromium.org>	2019-04-26 15:02:03 -07:00
Eric Anholt	7e069832a0	kmsro: Add support for V3D. Like vc4, we expect to have SOCs with various displays that have a single V3D instance for rendering. v2: Add v3d to the list of drivers that make enabling kmsro valid. Acked-by: Rob Clark <robdclark@chromium.org>	2019-04-26 14:59:32 -07:00
Marek Olšák	a8a0e5c03c	radeonsi: don't ignore PIPE_FLUSH_ASYNC Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2019-04-26 15:44:39 -04:00
Eric Anholt	fb0611df3d	v3d: Fix detection of TMU write sequences in register spilling. We can't use the QPU functions to detect this until register allocation is done and we've moved inst->dst into inst->qpu. Fixes bad TMU sequences from register spilling in KHR-GLES31.core.compute_shader.shared-max.	2019-04-26 12:42:30 -07:00
Eric Anholt	18894a5e5a	v3d: Fix detection of the last ldtmu before a new TMU op. We were looking at the start instruction, instead of scanning through the list of following instructions to find any more ldtmus.	2019-04-26 12:42:30 -07:00
Eric Anholt	575caab895	v3d: Re-add support for memory_barrier_shared. Looks like I lost it in a rebase conflict resolution. We'd hit the unknown intrinsic assertion in KHR-GLES31.core.compute_shader.shared-struct. Fixes: `6b1c659825` ("v3d: Add Compute Shader compilation support.")	2019-04-26 12:42:30 -07:00
Eric Anholt	971a13d805	Revert "v3d: Disable PIPE_CAP_BLIT_BASED_TEXTURE_TRANSFER." This reverts commit `ccce940947`, leaving a note as to why we had to (corruption in chromium, breaking some GLES3.1 tests).	2019-04-26 12:42:30 -07:00
Eric Anholt	49071b2e3f	v3d: Don't try to update the shadow texture for separate stencil. There are two cases where v3d's sampler view's resource doesn't match the base's: shadow textures for sampling from raster, and pointing at the separate depth texture for z32f_s8x24. We only want to update shadow for the first case. Fixes dEQP-GLES31.functional.stencil_texturing.render.depth32f_stencil8_draw when run after the previous testcase.	2019-04-26 12:42:30 -07:00
Eric Anholt	4358904c06	v3d: Add a note about i/o indirection for future performance work.	2019-04-26 12:42:30 -07:00
Eric Anholt	c74d0e7f62	vc4: Use _mesa_hash_table_remove_key() where appropriate.	2019-04-26 12:42:30 -07:00
Eric Anholt	d8486c2ad7	v3d: Use _mesa_hash_table_remove_key() where appropriate.	2019-04-26 12:42:30 -07:00
Eric Anholt	24587ae8ae	v3d: Assert that we do request the normal texturing return data. An unused tex should be DCEed, but if it wasn't we'd run into trouble with not doing a TMUWT.	2019-04-26 12:42:30 -07:00
Eric Anholt	42210a4351	v3d: Apply the GFXH-930 workaround to the case where the VS loads attrs. We were emitting a dummy load for when the VS doesn't load any attributes, but we also need to emit a dummy load for when the render VS loads attributes but the binner VS doesn't. Fixes simulator assertion failures and GPU hangs on KHR-GLES31.core.texture_gather.\*	2019-04-26 12:42:30 -07:00
Eric Anholt	448fc3ea42	v3d: Fill in the ignored segment size fields to appease new simulator. We are assured that the input segment size field is ignored for !separate_segs mode, and now the simulator wants an in-range value set regardless of whether it's functionally ignored or not.	2019-04-26 12:40:31 -07:00
Tapani Pälli	af06963d24	glsl: use empty brace initializer fixes following warning with clang: warning: suggest braces around initialization of subobject Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>	2019-04-26 12:24:41 -07:00
coypu	976004d0e7	gbm: don't return void Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-04-26 12:04:26 -07:00
Tapani Pälli	7a7f182dac	nir: use braces around subobject in initializer Used same syntax as elsewhere with Mesa sources, verified result against MSVC with godbolt.org. fixes following warning with clang: warning: suggest braces around initialization of subobject v2: empty braces -> braces around subobject (Caio, Kristian) Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>	2019-04-26 12:01:22 -07:00
Kristian H. Kristensen	a7c70bb2a1	freedreno/drm: Quiet pointer to u64 conversion warning	2019-04-26 11:58:44 -07:00
Alok Hota	8bfb34fd0a	swr/rast: enforce use of tile offsets Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>	2019-04-26 13:00:45 -05:00
Alok Hota	0e49963212	swr/rast: AVX512 support compiled in by default - Emulation of AVX512 built into SIMDLIB - Remove associated macros - Remove knobs controlling AVX512 and let emulation handle it - Refactor variable names for SIMD16 Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>	2019-04-26 13:00:38 -05:00
Alok Hota	0bf1df2bb6	swr/rast: Remove deprecated 4x2 backend code - Use 8x2 tiling by default - Remove associated macros - Use SIMDLIB emulation for SIMD16 on SIMD8 hardware - Remove code rot in Load/StoreTile Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>	2019-04-26 13:00:24 -05:00
Tomasz Figa	e8bf4efceb	llvmpipe: Always return some fence in flush (v2) If there is no last fence, due to no rendering happening yet, just create a new signaled fence and return it, to match the expectations of the EGL sync fence API. Fixes random "Could not create sync fence 0x3003" assertion failures from Skia on Android, coming from the following code: https://android.googlesource.com/platform/frameworks/base/+/master/libs/hwui/pipeline/skia/SkiaOpenGLPipeline.cpp#427 Reproducible especially with thread count >= 4. One could make the driver always keep the reference to the last fence, but: - the driver seems to explicitly destroy the fence whenever a rendering pass completes and changing that would require a significant functional change to the code. (Specifically, in lp_scene_end_rasterization().) - it still wouldn't solve the problem of an EGL sync fence being created and waited on without any rendering happening at all, which is also likely to happen with Android code pointed to in the commit. Therefore, the simple approach of always creating a fence is taken, similarly to other drivers, such as radeonsi. Tested with piglit llvmpipe suite with no regressions and following tests fixed: egl_khr_fence_sync conformance eglclientwaitsynckhr_flag_sync_flush eglclientwaitsynckhr_nonzero_timeout eglclientwaitsynckhr_zero_timeout eglcreatesynckhr_default_attributes eglgetsyncattribkhr_invalid_attrib eglgetsyncattribkhr_sync_status v2: - remove the useless lp_fence_reference() dance (Nicolai), - explain why creating the dummy fence is the right approach. Signed-off-by: Tomasz Figa <tfiga@chromium.org> Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2019-04-26 11:26:33 +01:00
Emil Velikov	591955d82d	llvmpipe: correctly handle waiting in llvmpipe_fence_finish Currently if the timeout differs from 0, we'll end up with infinite wait... even if the user is perfectly clear they don't want that. Use the new lp_fence_timedwait() helper guarding both waits in an !lp_fence_signalled block like the rest of llvmpipe. Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2019-04-26 11:26:33 +01:00
Emil Velikov	5b284fe6bc	llvmpipe: add lp_fence_timedwait() helper The function is analogous to lp_fence_wait() while taking a timeout (ns) parameter, as needed for EGL fence/sync. v2: - use absolute UTC time, as per spec (Gustaw) - bail out on cnd_timedwait() failure (Gustaw) v3: - check count/rank under mutex (Gustaw) Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Roland Scheidegger <sroland@vmware.com> (v1) Reviewed-by: Gustaw Smolarczyk <wielkiegie@gmail.com>	2019-04-26 11:26:33 +01:00
Emil Velikov	bd0c4e360d	vulkan/wsi: don't use DUMB_CLOSE for normal GEM handles Currently we get normal GEM handles from PrimeFDToHandle, yet we close them with DUMB_CLOSE. Use GEM_CLOSE instead. Fixes: `da997ebec9` ("vulkan: Add KHR_display extension using DRM [v10]") Cc: Jason Ekstrand <jason@jlekstrand.net> Cc: Keith Packard <keithp@keithp.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Signed-off-by: Emil Velikov <emil.velikov@collabora.com>	2019-04-26 11:26:33 +01:00
Emil Velikov	c962a78f18	vulkan/wsi: check if the display_fd given is master As effectively required by the extension, we need to ensure we're master. Currently drivers employ vendor specific solutions, which check if the device behind the fd is capable, yet none of them do the master check. In the radv case, if acceleration is available. Instead of duplicating the check in each driver, keep it where it's needed and used. Note this copies libdrm's drmIsMaster() to avoid depending on bleeding edge version of the library. v2: set the fd to -1 if not master (Bas) Fixes: `da997ebec9` ("vulkan: Add KHR_display extension using DRM [v10]") Cc: Andres Rodriguez <andresx7@gmail.com> Cc: Jason Ekstrand <jason@jlekstrand.net> Cc: Keith Packard <keithp@keithp.com> Reported-by: Andres Rodriguez <andresx7@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Signed-off-by: Emil Velikov <emil.velikov@collabora.com>	2019-04-26 11:26:33 +01:00
Emil Velikov	1a9367c134	turnip: drop dead close(master_fd) The fd is -1, thus the block of if (fd != -1) close(fd) is dead code. Cc: Chad Versace <chadversary@chromium.org> Reviewed-by: Chia-I Wu <olvaffe@gmail.com> Signed-off-by: Emil Velikov <emil.velikov@collabora.com>	2019-04-26 11:26:33 +01:00
Jason Ekstrand	00d4e78ea9	nir/algebraic: Optimize integer cast-of-cast These have been popping up more and more with the OpenCL work and other bits causing extra conversions to/from 64-bit. Reviewed-by: Karol Herbst <kherbst@redhat.com>	2019-04-26 04:26:08 -05:00
Jason Ekstrand	934f178341	anv/descriptor_set: Don't fully destroy sets in pool destroy/reset In `105002bd2d`, we fixed a memory leak bug where we weren't properly destroying descriptor sets when destroying/resetting a descriptor pool. However, the only real leak that happened was that we take a reference to the descriptor set layout in the descriptor set and we weren't dropping our reference. Everything else in the descriptor set is tied to the pool itself and doesn't need to be freed on a per-set basis. This commit changes the destroy/reset functions to only bother walking the list of sets to unref the layouts and otherwise we just assume that the whole-pool destroy/reset takes care of the rest. Now that we're doing more non-trivial things with descriptor sets such as allocating things with util_vma_heap, per-set destruction is starting to show up on perf traces. This takes reset back to where it's supposed to be as a cheap whole-pool operation. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-04-26 05:40:28 +00:00
Jason Ekstrand	baf4802e3e	anv: Better handle 32-byte alignment of descriptor set buffers In `c520f4dec9`, we chose to align the sizes of descriptor set buffers to 32 bytes. We have to align the descriptor set buffer to 32B so that it's valid for using with push constants. We align the size as well so we don't leave lots of holes with util_vma_heap_alloc. Unfortunately, we were only aligning it for alloc and not for free so we were still creating piles of holes when we delete descriptor sets. This causes terrible perf for the allocator once we've deleted piles of descriptor sets. This commit reworks the code so that we align the descriptor set buffer size to 32B for both alloc and free. The result is that it takes the new crucible vkResetDescriptorPool from 104.567719 to 2.898354 seconds. Fixes: `c520f4dec9` "anv: Add a concept of a descriptor buffer" Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110497 Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-04-26 05:40:28 +00:00
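The alloc/free mismatch described in the commit above can be shown with a toy heap that only counts bytes in use. This is a sketch under that simplification, not the util_vma_heap API; `toy_heap`, `toy_alloc`, `toy_free`, and `align_u32` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* The 32-byte alignment mentioned in the commit message. */
#define DESC_BUFFER_ALIGN 32u

/* Round v up to the next multiple of a, where a is a power of two. */
static uint32_t
align_u32(uint32_t v, uint32_t a)
{
   assert(a != 0 && (a & (a - 1)) == 0);
   return (v + a - 1) & ~(a - 1);
}

/* Toy stand-in for a heap: it only tracks how many bytes are in use. */
struct toy_heap {
   uint64_t in_use;
};

static void
toy_alloc(struct toy_heap *h, uint32_t size)
{
   h->in_use += align_u32(size, DESC_BUFFER_ALIGN);
}

/* Free with the same aligned size that toy_alloc() reserved. Freeing only
 * `size` bytes would leave the padding behind as a small hole. */
static void
toy_free(struct toy_heap *h, uint32_t size)
{
   uint32_t aligned = align_u32(size, DESC_BUFFER_ALIGN);
   assert(h->in_use >= aligned);
   h->in_use -= aligned;
}
```

For a 40-byte descriptor buffer, allocation reserves 64 bytes. Freeing 64 returns the heap to empty, whereas freeing the unaligned 40 would strand 24 bytes per set.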
Dave Airlie	d946cbe9f5	nir: fix bit_size in lower indirect derefs. This fixes a case where we are expecting 64-bit but generate 32-bit consts and validate gets angry. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Signed-off-by: Dave Airlie <airlied@redhat.com>	2019-04-26 12:59:43 +10:00
Kenneth Graunke	529ace7887	iris: Silence unused function warning	2019-04-25 17:33:56 -07:00
Marek Olšák	c5f65bfe6c	glsl: fix shader_storage_blocks_write_access for SSBO block arrays (v2) This fixes KHR-GL45.compute_shader.resources-max on radeonsi. Fixes: `4e1e8f684b` "glsl: remember which SSBOs are not read-only and pass it to gallium" v2: use is_interface_array, protect against assertion failures in u_bit_consecutive Reviewed-by: Dave Airlie <airlied@redhat.com>	2019-04-25 18:57:38 -04:00
Rob Clark	a6ab27dcab	docs/features: update GL too Forgot to update corresponding entries for desktop GL.. kinda wish we didn't have to update both GLES and GL tables. Signed-off-by: Rob Clark <robdclark@chromium.org>	2019-04-25 15:48:19 -07:00
Rob Clark	7a57cfbed6	freedreno/a6xx: sample-shading support Enables: OES_sample_shading OES_sample_variables OES_shader_multisample_interpolation Signed-off-by: Rob Clark <robdclark@chromium.org>	2019-04-25 14:13:31 -07:00
Rob Clark	ee2e3a07bb	freedreno/ir3: sample-shading support The compiler support for: OES_sample_shading OES_sample_variables OES_shader_multisample_interpolation Signed-off-by: Rob Clark <robdclark@chromium.org>	2019-04-25 14:13:31 -07:00
Rob Clark	85949c52b4	freedreno: wire up core sample-shading support Signed-off-by: Rob Clark <robdclark@chromium.org>	2019-04-25 14:13:31 -07:00
Rob Clark	c8e825aaac	freedreno/ir3: fix load_interpolated_input slot The so->inputs[] table is in units of vec4 Fixes: `7ff6705b8d` freedreno/ir3: convert to "new style" frag inputs Signed-off-by: Rob Clark <robdclark@chromium.org>	2019-04-25 14:13:31 -07:00
Rob Clark	49f922d96c	freedreno/a6xx: add VALIDREG/CONDREG helper macros There are a few places that we check if a shader stage input reg is used/valid (ie. not r63.x).. and there are about to be a bunch more. So add some helper macros for less open-coding. Signed-off-by: Rob Clark <robdclark@chromium.org>	2019-04-25 14:13:31 -07:00
Rob Clark	f4b4d6cf23	freedreno/ir3: rename frag_vcoord -> ij_pixel Since this is what the value actually is. Cleanup the name before adding more different i,j related values for sample-shading. Signed-off-by: Rob Clark <robdclark@chromium.org>	2019-04-25 14:13:31 -07:00
Rob Clark	5be415fc2b	freedreno/ir3: remove bogus assert tex instruction can actually return 16b values. Signed-off-by: Rob Clark <robdclark@chromium.org>	2019-04-25 14:13:31 -07:00
Rob Clark	2f0b9d2249	freedreno/ir3: lower load_barycentric_at_offset Calculates i,j at specified offset within a pixel. A new load_size_ir3 intrinsic is used in conjunction with fddx/fddy to translate the offset into primitive space and adjust the i,j from load_barycentric_pixel accordingly. Signed-off-by: Rob Clark <robdclark@chromium.org>	2019-04-25 14:13:31 -07:00
Rob Clark	c4f423aa36	freedreno/ir3: lower load_barycentric_at_sample This lowers load_barycentric_at_sample to load_sample_pos_from_id plus load_barycentric_at_offset. Signed-off-by: Rob Clark <robdclark@chromium.org>	2019-04-25 14:13:31 -07:00
Rob Clark	4e3ce224a7	freedreno: update generated headers Pull in updates for sample shading. Signed-off-by: Rob Clark <robdclark@chromium.org>	2019-04-25 14:13:31 -07:00
Rob Clark	6d6ec2d4d2	freedreno/ir3: cleanup instruction builder macros De-duplicate the "normal" and "flags" versions of the macros, and while at it go ahead and add "flags" versions for all the remaining macros, since we'll at least need INSTR1F in a following commit. Signed-off-by: Rob Clark <robdclark@chromium.org>	2019-04-25 14:13:31 -07:00
Rob Clark	77b3b96a3b	freedreno/ir3: more emit-cat5 fixes Couple more opcodes which don't take a sampler id as first arg. Signed-off-by: Rob Clark <robdclark@chromium.org>	2019-04-25 14:13:31 -07:00
Rob Clark	9032f0690c	freedreno/ir3: fix rgetpos decoding It takes an argument. Signed-off-by: Rob Clark <robdclark@chromium.org>	2019-04-25 14:13:31 -07:00
Rob Clark	4d08c1b595	compiler: rename SYSTEM_VALUE_VARYING_COORD And add corresponding enums for different sorts of varying interpolation. Signed-off-by: Rob Clark <robdclark@chromium.org>	2019-04-25 14:13:31 -07:00
Rob Clark	96d2e4ab8a	freedreno: add robustness support Signed-off-by: Rob Clark <robdclark@chromium.org>	2019-04-25 14:13:31 -07:00
Rob Clark	6503918689	freedreno/drm: update for robustness Update UABI header and add FD_PP_PGTABLE and FD_NR_FAULTS params. Robustness can be supported by a kernel which provides the new ABI if it also indicates that per-process pagetables are in use. Signed-off-by: Rob Clark <robdclark@chromium.org>	2019-04-25 14:13:07 -07:00
Alyssa Rosenzweig	77d091d0c5	panfrost/midgard: Add new bitwise ops These fused NOT-ops could maybe help somehow...? Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-04-25 20:37:46 +00:00
Alyssa Rosenzweig	bcabcfe3ad	panfrost/midgard: Identify inand This was previously thought to be inot, but it's actually a bit more general than that! :) Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-04-25 20:37:45 +00:00
Alyssa Rosenzweig	5f942db190	panfrost/midgard: Copy prop for texture registers We'll want to unify this with main copy prop (and extend to varyings), but that'll take more care to handle some special cases, so leave it as a stub pass for now. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-04-25 20:37:45 +00:00
Alyssa Rosenzweig	4d821a1101	panfrost/midgard: Optimize csel involving 0 Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-04-25 20:37:45 +00:00
Alyssa Rosenzweig	b53b4573c3	panfrost/midgard: Extend copy propagation pass This extends copy propagation to respect output modifiers for ALU instructions, as well as potentially fixing some bugs related to looping (all dEQP loop tests pass). Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-04-25 20:37:45 +00:00
Alyssa Rosenzweig	7bc91b487b	panfrost/midgard: Reduce fmax(a, 0.0) to fmov.pos This will allow us to copyprop away the move and eliminate the instruction entirely. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-04-25 20:37:45 +00:00
Bas Nieuwenhuizen	295536d47a	radv: Expose Vulkan 1.1 for Android. We have the YCBCR feature now. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-04-25 19:56:20 +00:00
Bas Nieuwenhuizen	8bb3cec7c9	radv: Expose VK_EXT_ycbcr_image_arrays. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-04-25 19:56:20 +00:00
Bas Nieuwenhuizen	fc9248e13e	radv: Enable YCBCR conversion feature. This enables the basic YCBCR features. We support basic multiplane formats using 8-bit and 16-bit unorms, as well as YUV2 formats. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-04-25 19:56:20 +00:00
Bas Nieuwenhuizen	379b82dace	radv: Add ycbcr subsampled & multiplane formats to csv. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-04-25 19:56:20 +00:00
Bas Nieuwenhuizen	52c1adda21	radv: Add ycbcr format features. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-04-25 19:56:20 +00:00
Bas Nieuwenhuizen	b769a549ee	radv: Add hashing for the ycbcr samplers. Otherwise caching gets very confused. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-04-25 19:56:20 +00:00
Bas Nieuwenhuizen	5c3467e74a	radv: Run the new ycbcr lowering pass. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-04-25 19:56:20 +00:00
Bas Nieuwenhuizen	91702374d5	radv: Add ycbcr lowering pass. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-04-25 19:56:20 +00:00
Bas Nieuwenhuizen	5564c38212	radv: Update descriptor sets for multiple planes. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-04-25 19:56:20 +00:00
Bas Nieuwenhuizen	7f6732ac69	radv: Add ycbcr samplers in descriptor set layouts. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-04-25 19:56:20 +00:00
Bas Nieuwenhuizen	427024bf2e	ac/nir: Add support for planes. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-04-25 19:56:20 +00:00
Bas Nieuwenhuizen	dc917c8073	radv: Allow mixed src/dst aspects in copies. e.g. COLOR + PLANE_2, as well COLOR + COLOR for multiplane images. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-04-25 19:56:20 +00:00
Bas Nieuwenhuizen	b2cfa231d0	radv: Add support for image views with multiple planes. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-04-25 19:56:20 +00:00
Bas Nieuwenhuizen	65c4f612aa	radv: Add ycbcr conversion structs. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-04-25 19:56:20 +00:00
Bas Nieuwenhuizen	a837768857	radv: Support different source & dest aspects for planar images in blit2d. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-04-25 19:56:20 +00:00
Bas Nieuwenhuizen	66507cc656	radv: Add single plane image views & meta operations. Copies & clears of multiplane images are not allowed so we do not have to handle that case. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-04-25 19:56:20 +00:00
Bas Nieuwenhuizen	42d159f276	radv: Add multiple planes to images. No functional changes. This temporarily uses plane 0 for everything. Long term plan is that only single plane images get to use metadata like htile/dcc/cmask/fmask. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-04-25 19:56:20 +00:00
Bas Nieuwenhuizen	d3225e533f	radv: Add logic for multisample format descriptions. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-04-25 19:56:20 +00:00
Bas Nieuwenhuizen	09c4a911e5	radv: Add logic for subsampled format descriptions. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-04-25 19:56:20 +00:00
Caio Marcelo de Oliveira Filho	055f6281d4	intel/fs: Don't handle texop_tex for shaders without implicit LOD These will be lowered by nir_lower_tex() with the lower_tex_when_implicit_lod_not_supported, so don't need the extra handling here. Reviewed-by: Rob Clark <robdclark@gmail.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-04-25 12:13:06 -07:00
Caio Marcelo de Oliveira Filho	d5ac5d6e83	nir: Add option to lower tex to txl when shaders don't support implicit LOD We already add the LOD src, so go ahead and update the texop as well when this option is set. v2: Make it an option. (Rob Clark) v3: Use a more concise name suggested by Jason. Reviewed-by: Rob Clark <robdclark@gmail.com>	2019-04-25 12:13:06 -07:00
Topi Pohjolainen	ff642fb0e6	intel/compiler/fs/icl: Use dummy masked urb write for tess eval One cannot write the URB arbitrarily and therefore the message has to be carefully constructed. The clever tricks originate from Kenneth and Jason, I'm just writing the patch. Fixes GPU hangs on ICL with Vulkan CTS. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2019-04-25 22:00:43 +03:00
Andrii Simiklit	4e9592c5fa	iris: make the TFB result visible to others OpenGL 4.6 Spec: "5.3.3 Rules ....... Note: “Updates” via rendering or transform feedback are treated consistently with updates via GL commands. Once EndTransformFeedback has been issued, any subsequent command in the same context that uses the results of the transform feedback operation will see the results." v2: removed a wrong comment ( Kenneth Graunke <kenneth@whitecape.org> ) v3: - flush+dirty depends on buffers usage history - removed an old hack ( Kenneth Graunke <kenneth@whitecape.org> ) Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110404 Signed-off-by: Andrii Simiklit <andrii.simiklit@globallogic.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-04-25 11:48:04 -07:00
Kenneth Graunke	aa7306b4cf	iris: Some tidying for preemption support Just enable it during init_render_context on Gen10+, and move the Gen9 state tracking into iris_genx_state so it only exists on Gen9. Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com> Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>	2019-04-25 11:26:24 -07:00
Marek Olšák	383f406591	radeonsi: remove dirty slot masks from scissor and viewport states All registers in the array need to be updated if any of them is changed. Only apps writing gl_ViewportIndex were affected by this bug.	2019-04-25 11:49:38 -04:00
Marek Olšák	440135e5a0	radeonsi/gfx9: rework the gfx9 scissor bug workaround (v2) Needed to track context rolls caused by streamout and ACQUIRE_MEM. ACQUIRE_MEM can occur outside of draw calls. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110355 v2: squashed patches and done more rework Cc: 19.0 <mesa-stable@lists.freedesktop.org>	2019-04-25 11:49:38 -04:00
Marek Olšák	bc0d924507	radeonsi/gfx9: set that window_rectangles always roll the context Cc: 19.0 <mesa-stable@lists.freedesktop.org>	2019-04-25 11:49:38 -04:00
Jon Turney	5d310015c5	meson: Force '.so' extension for DRI drivers DRI driver loadable modules are always installed with install_megadriver.py with names ending with '.so', irrespective of platform. Force the name the loadable module is built with to match, so install_megadriver.py doesn't spin trying to remove non-existent symlinks. Fixes: `c77acc3c` "meson: remove meson-created megadrivers symlinks"	2019-04-25 12:40:16 +01:00
Nicolai Hähnle	9445a4ab43	radeonsi: add radeonsi_sync_compile option Force the driver thread to sync immediately with a compiler thread (but compilation still happens in a separate thread). This can be useful to simplify debugging compiler issues. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-04-25 12:35:29 +02:00
Nicolai Hähnle	ca95adf8ff	radeonsi: add radeonsi_aux_debug option for aux context debug dumps Enabling this option will create ddebug-style dumps for the aux context, except that instead of intercepting the pipe_context layer we just dump the IB contents on flush. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-04-25 12:35:27 +02:00
Nicolai Hähnle	fea3dcb844	ddebug: expose some helper functions as non-inline Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-04-25 12:35:24 +02:00
Nicolai Hähnle	ac0b60fa47	ddebug: dump driver state into a separate file Due to asynchronous execution, it's not clear which of the draws the state may refer to. This also works around an issue encountered with radeonsi where dumping the driver state itself caused a hang. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-04-25 12:35:21 +02:00
Nicolai Hähnle	b7fab7b02d	ddebug: log calls to pipe->flush This can be useful when internal draws lead to a hang. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-04-25 12:35:19 +02:00
Nicolai Hähnle	fe0d2b3d37	ddebug: set thread name For better debuggability. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-04-25 12:35:16 +02:00
Nicolai Hähnle	563faa3903	util/u_log: flush auto loggers before starting a new page Without this, command stream dumps of radeonsi may misleadingly end up in a later page. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-04-25 12:35:09 +02:00
Nicolai Hähnle	8bef4df196	radeonsi: add si_debug_options for convenient adding/removing of options Move the definition of radeonsi_clear_db_cache_before_clear there, as well as radeonsi_enable_nir. This removes the AMD_DEBUG=nir option. We currently still have two places for options: the driconf machinery and AMD_DEBUG/R600_DEBUG. If we are to have a single place for options, then the driconf machinery should be preferred since it's more flexible. The only downside of the driconf machinery was that adding new options was quite inconvenient. With this change, a simple boolean option can be added with a single line of code, same as for AMD_DEBUG. One technical limitation of this particular implementation is that while almost all driconf features are available, the translation machinery doesn't pick up the description strings for options added in si_debug_options. In practice, translations haven't been provided anyway, and this is intended for developer options, so I'm not too worried. It could always be added later if anybody really cares. v2: - use bool instead of uint8_t for options - si_debug_options.inc -> si_debug_options.h Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-04-25 12:31:02 +02:00
Michel Dänzer	5078d66a86	gitlab-ci: Use meson buildtype debug instead of default debugoptimized This can save a lot of time for some of the meson CI jobs. Acked-by: Eric Engestrom <eric.engestrom@intel.com>	2019-04-25 10:51:41 +02:00
Juan A. Suarez Romero	b06ae53606	Revert "intel/compiler: split is_partial_write() into two variants" This reverts commit `40b3abb4d1`. It is not clear that this commit was entirely correct, and unfortunately it was pushed by error. CC: Jason Ekstrand <jason@jlekstrand.net> Acked-by: Jason Ekstrand <jason@jlekstrand.net>	2019-04-25 09:19:10 +02:00
Timothy Arceri	b155f74d7b	nir: fix nir_remove_unused_varyings() We were only setting the used mask for the first component of a varying. Since the linking opts split vectors into scalars this has mostly worked ok. However this causes an issue where for example if we split a struct on one side of the interface but not the other, then we can possibly end up removing the first components on the side that was split and then incorrectly remove the whole struct on the other side of the varying. With this change we simply mark all 4 components for each slot used by a struct. We could possibly make this more fine-grained but that would require a more complex change. This fixes a bug in Strange Brigade on RADV when tessellation is enabled, all credit goes to Samuel Pitoiset for tracking down the cause of the bug. Fixes: `f1eb5e6399` ("nir: add component level support to remove_unused_io_vars()") Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-04-25 16:37:36 +10:00
Lionel Landwerlin	f15409ee55	i965: fix icelake performance query enabling This was a rebase issue which lost a change to a file moved from i965 to src/intel/perf. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `134e750e16` ("i965: extract performance query metrics") Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-04-25 01:11:54 +00:00
Marek Olšák	36cfe5fd62	radeonsi: add BOs after need_cs_space need_cs_space may clear the buffer list. Fixes: `951d60f8cd` "radeonsi: delay adding BOs at the beginning of IBs until the first draw" Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-04-24 20:59:07 -04:00
Marek Olšák	45ca7798dc	glsl: handle interactions between EXT_gpu_shader4 and texture extensions also, EXT_texture_buffer_object has to be enabled separately. Reviewed-by: Eric Anholt <eric@anholt.net>	2019-04-24 20:45:15 -04:00
Marek Olšák	e71936a731	st/mesa: expose EXT_gpu_shader4 if GLSL 1.40 is supported Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-04-24 20:45:15 -04:00
Marek Olšák	503f94b43f	mesa: only allow EXT_gpu_shader4 in the compatibility profile Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-04-24 20:45:15 -04:00
Marek Olšák	ba265d1144	mesa: expose EXT_texture_buffer_object This is needed for exposing the samplerBuffer functions under EXT_gpu_shader4. v2: - expose it in the compat profile only - make it an alias of EXT_gpu_shader4 Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> (v1) Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-04-24 20:45:15 -04:00
Marek Olšák	825c35999c	glsl: allow "varying out" for fragment shader outputs with EXT_gpu_shader4 Reviewed-by: Eric Anholt <eric@anholt.net>	2019-04-24 20:45:15 -04:00
Marek Olšák	4ff3b8e18a	glsl: add texture builtin functions for EXT_gpu_shader4 v2: some fixes to texture functions thanks to piglit tests Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> (v1) Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> (v1) Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> (v1) Reviewed-by: Eric Anholt <eric@anholt.net>	2019-04-24 20:45:15 -04:00
Marek Olšák	8dbe23c8c6	glsl: add arithmetic builtin functions for EXT_gpu_shader4 Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-04-24 20:45:15 -04:00
Marek Olšák	7004114102	glsl: add builtin variables for EXT_gpu_shader4 Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-04-24 20:45:15 -04:00
Marek Olšák	1a973aa5e1	glsl: apply some 1.30 and other rules to EXT_gpu_shader4 as well Reviewed-by: Eric Anholt <eric@anholt.net>	2019-04-24 20:45:15 -04:00
Chris Forbes	85fefd1913	glsl: enable types for EXT_gpu_shader4 Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-04-24 20:45:15 -04:00
Marek Olšák	a7f38e7fbd	glsl: add `unsigned int` type for EXT_GPU_shader4 Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-04-24 20:45:15 -04:00
Chris Forbes	2d8f4fff49	glsl: enable noperspective\|flat\|centroid for EXT_gpu_shader4 Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-04-24 20:45:15 -04:00
Chris Forbes	8740726e46	glsl: add scaffolding for EXT_gpu_shader4 Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-04-24 20:45:15 -04:00
Marek Olšák	1faf833949	mesa: enable glGet for EXT_gpu_shader4 Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-04-24 20:45:15 -04:00
Eric Anholt	d23b47fda5	v3d: Disable SSBOs and atomic counters on vertex shaders. The CTS fails on dEQP-GLES31.functional.shaders.opaque_type_indexing.atomic_counter.*vertex when they are enabled, due to the VS being run for both bin and render. I think this behavior is expected to be valid, but I can't find text in atomic counters or SSBO specs saying so (the closest I found was in shader_image_load_store). Just disable it for now, since the closed source driver doesn't expose vertex atomic counters/SSBOs either.	2019-04-24 17:24:11 -07:00
Eric Anholt	97316d3783	st/mesa: Don't set atomic counter size != 0 if MAX_SHADER_BUFFERS == 0. This is just asking for tests to get confused about the HW supporting atomics in this shader stage or not, such as dEQP-GLES31.functional.shaders.opaque_type_indexing.atomic_counter.const_expression_vertex. v2: Rebase on the other atomic cleanups that have happened since posting. v3: Commit message tweak by Marek. Signed-off-by: Eric Anholt <eric@anholt.net> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-04-24 17:24:11 -07:00
Kenneth Graunke	2812ef2a26	iris: Advertise EXT_texture_sRGB_R8 support Using the luminance format, like both brw and anv do.	2019-04-24 16:49:13 -07:00
Kenneth Graunke	59aa7c924d	iris: Enable GL_AMD_depth_clamp_separate We support this, we just forgot to turn it on.	2019-04-24 16:49:13 -07:00
Marek Olšák	131d56edfb	util: fix a compile failure in u_compute.c on windows	2019-04-24 19:04:20 -04:00
Mike Blumenkrantz	c7c59f75e5	iris: enable preemption support for gen10 this automatically enables preemption on gen10 where it is disabled by default but still available Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>	2019-04-24 14:47:47 -07:00
Mike Blumenkrantz	7315882023	iris: add preemption support on gen9 this is basically just porting the following two commits to gallium: `d8b50e152a` `5c454661c6` resolves kwg/mesa#49 Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>	2019-04-24 14:47:08 -07:00
Kenneth Graunke	21688a306b	iris: Split iris_flush_and_dirty_for_history into two helpers. We create two new helpers, iris_flush_bits_for_history, and iris_dirty_for_history, then use them in the existing function. The first accumulates flush bits based on res->bind_history, but doesn't actually perform a flush. This allows us to accumulate flush bits by looping over multiple resources, but ultimately emit a single flush for all of them. The latter flags dirty bits without flushing, which again allows us to handle multiple resources, but also is more convenient when writing from the CPU where we don't need a flush (as in commit `4d12236072`).	2019-04-24 13:31:32 -07:00
Dave Airlie	3323cf08f0	intel/compiler: fix uninit non-static variable. (v2) Pointed out by coverity. v2: init nir_locals also. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-04-25 06:06:57 +10:00
Dave Airlie	ce17e413de	virgl/drm: insert correct handles into the table. (v3) This inserts a handle for the flink name and a handle the correct gem handle for the bo. v2: fix handles/names confusion (Lepton Wu) v3: set flink name correctly (Lepton Wu) Reviewed-by: Chia-I Wu <olvaffe@gmail.com>	2019-04-25 06:05:43 +10:00
Dave Airlie	8a39f83fb2	virgl/drm: handle flink name better. This realigns this code with code from radeon. Reviewed-by: Chia-I Wu <olvaffe@gmail.com>	2019-04-25 06:05:43 +10:00
Dave Airlie	92ef4cf9f0	virgl/drm: cleanup buffer from handle creation (v2) This cleans up and realigns this code with what is in radeon v2: fix names->handles (Lepton Wu) Reviewed-by: Chia-I Wu <olvaffe@gmail.com>	2019-04-25 06:05:43 +10:00
Kenneth Graunke	19b246257d	iris: Actually put Mesa in GL_RENDERER string I constructed the right thing and then returned the other one.	2019-04-24 12:54:27 -07:00
Jiang, Sonny	69430d7e59	va: use a compute shader for the blit Signed-off-by: Sonny Jiang <sonny.jiang@amd.com> Signed-off-by: Marek Olšák <marek.olsak@amd.com>	2019-04-24 15:47:41 -04:00
Marek Olšák	7fc3d21646	gallium: add PIPE_CAP_PREFER_COMPUTE_BLIT_FOR_MULTIMEDIA	2019-04-24 15:47:41 -04:00
Dylan Baker	5aedf48713	docs: update calendar, and news item and link release notes for 19.0.3	2019-04-24 10:53:04 -07:00
Dylan Baker	6bd7d4f19e	docs: Add SHA256 sums for mesa 19.0.3	2019-04-24 10:50:39 -07:00
Dylan Baker	7cb9043879	docs: add relnotes for 19.0.3	2019-04-24 10:50:37 -07:00
Marek Olšák	09e4771af9	gallium: set PIPE_CAP_MAX_FRAMES_IN_FLIGHT to 2 for all drivers Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-04-24 10:41:04 -04:00
Rafael Antognolli	f2041d2a92	intel/isl: Resize clear color buffer to full cacheline Fixes MCS fast clear gpu hangs with Vulkan CTS on ICL in CI. v2 (Nanley): In the title s/Align/Resize/ Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Nanley Chery <nanley.g.chery@intel.com> Tested-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>	2019-04-24 08:56:42 +03:00
Jason Ekstrand	45957c05b0	anv/descriptor_set: Properly align descriptor buffer to a page Instead of aligning and then taking inline uniforms into account, we need to take inline uniforms into account and then align to a page. Otherwise, we may not be aligned to a page and allocation may fail. Fixes: `43f40dc7cb` "anv: Implement VK_EXT_inline_uniform_block" Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-04-24 05:40:27 +00:00
Jason Ekstrand	3d33c13eca	anv/descriptor_set: Only vma_heap_finish if we have a descriptor buffer Fixes: `7bb34ecff9` "anv: release memory allocated by bo_heap when..." Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-04-24 05:40:27 +00:00
Jason Ekstrand	0bc1942c9d	anv/descriptor_set: Destroy sets before pool finalization Fixes: `105002bd2d` "anv: destroy descriptor sets when pool gets..." Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-04-24 05:40:27 +00:00
Jason Ekstrand	6be603edf7	anv/descriptor_set: Unlink sets from the pool in set_destroy anv_descriptor_pool_free_set is called on the clean-up path of anv_descriptor_set_create and the set may not have been added to the pool's list of sets yet. While we're here, we move adding it to that list into set_create for symmetry. Fixes: `105002bd2d` "anv: destroy descriptor sets when pool gets..." Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-04-24 05:40:27 +00:00
Tapani Pälli	4add3c6880	android/iris: fix driinfo header filename Fixes iris driver Android build. Fixes: `faa52e328e` "iris: Add mechanism for iris-specific driconf options" Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-04-23 22:25:17 -07:00
Ian Romanick	21223acf7d	intel/fs: Fix D to W conversion in opt_combine_constants Found by GCC warning: src/intel/compiler/brw_fs_combine_constants.cpp: In function ‘bool needs_negate(const fs_reg*, const imm*)’: src/intel/compiler/brw_fs_combine_constants.cpp:306:34: warning: comparison of unsigned expression < 0 is always false [-Wtype-limits] return ((reg->d & 0xffffu) < 0) != (imm->w < 0); ~~~~~~~~~~~~~~~~~~~^~~ The result of the bit-and is a 32-bit value with the top bits all zero. This will never be < 0. Instead of masking off the bits, just cast to int16_t and let the compiler handle the actual conversion. Fixes: `e64be391dd` ("intel/compiler: generalize the combine constants pass") Cc: Iago Toral Quiroga <itoral@igalia.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-04-23 19:48:33 -07:00
Alyssa Rosenzweig	e4ec814c39	panfrost/midgard: Remove assembler This code is outdated and unused; now that the compiler is mature, there's no point keeping it around in-tree (or at all). Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-04-24 02:44:00 +00:00
Ryan Houdek	2cd1aa3429	panfrost: Adds Bifrost shader disassembler utility This code is stable and can live upstream independently while the rest of the Bifrost stack comes up. v2: Added a verbose flag to hide away some of the more verbose features that nobody really needs [The Bifrost disassembler is written by Connor Abbott, Lyude Paul, and Ryan Houdek.] Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-04-24 02:39:01 +00:00
Alyssa Rosenzweig	bb1aff3007	panfrost/midgard: Add "op commutes?" property Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-04-24 02:22:32 +00:00
Alyssa Rosenzweig	1f345bc7d6	panfrost/midgard: Refactor opcode tables We create an all-encompassing opcode table for handling name and properties, removing a number of ad hoc opcode tables which became brittle and quickly out of date. While we're at it, we fix some incorrect opcodes relating to ball/bany, and move a small function out to midgard_compile.c. Together these changes should allow compilation without warnings, along with helping the codebase health considerably. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-04-24 02:22:32 +00:00
Alyssa Rosenzweig	4d995e0da8	panfrost/midgard: Optimize MIR in progress loop Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-04-24 02:22:32 +00:00
Alyssa Rosenzweig	e9f84f1447	panfrost/midgard: Implement copy propagation Most copy prop should occur at the NIR level, but we generate a fair number of moves implicitly ourselves, etc... long story short, it's a net win to also do simple copy prop + DCE on the MIR. As a bonus, this fixes the weird imov precision bug once and for good, I think. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-04-24 02:22:32 +00:00
Alyssa Rosenzweig	fcdfb67711	panfrost/midgard: Set integer mods Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-04-24 02:22:32 +00:00
Alyssa Rosenzweig	422aceb407	panfrost/midgard: Document sign-extension/zero-extension bits (vector) For floating point ops, these bits determine the "negate?" and "abs?" modifiers. For integer ops, it turns out they control how sign/zero extension work, useful for mixing types. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-04-24 02:22:32 +00:00
Alyssa Rosenzweig	b453c877d9	panfrost/midgard: Update integer op list In the future, we might want to switch to a table-based approach, but for now, at least have it current. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-04-24 02:22:32 +00:00
Alyssa Rosenzweig	0b380a7868	panfrost/midgard: Remove unused mir_next_block Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-04-24 02:22:32 +00:00
Alyssa Rosenzweig	879ff866b6	panfrost/midgard: Fix off-by-one in successor analysis This reduces register pressure substantially since we get smaller liveness ranges. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-04-24 02:22:31 +00:00
Alyssa Rosenzweig	521ac6e5b1	panfrost/midgard: Track loop depth This fixes nested loops. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-04-24 02:22:31 +00:00
Alyssa Rosenzweig	84f09ff433	panfrost/midgard: Dead code eliminate MIR We reshuffle the existing "dead move elimination" pass into a generic dead code elimination layer, fixing bugs incurred with looping in the process. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-04-24 02:22:31 +00:00
Alyssa Rosenzweig	328a5ef598	panfrost: Use actual imov instruction The bug this worked around is no longer applicable, it seems -- remove the hack that breaks more than it fixes. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-04-24 02:22:31 +00:00
Alyssa Rosenzweig	12cd89da81	panfrost: Disable indirect outputs for now The hardware needs this lowered anyway; for now, might as well use mesa's default lowering for pure conformance reasons. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-04-24 02:22:31 +00:00
Alyssa Rosenzweig	9db5816e02	panfrost/midgard: imul can only run on mul This restriction makes sense logically. Not sure why it wasn't obeyed before. In conjunction with previous commit's disclaimer, fixes dEQP-GLES2.functional.shaders.loop.for_dynamic_iterations. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-04-24 02:22:31 +00:00
Alyssa Rosenzweig	a1aaf72915	panfrost/midgard: Don't try to inline constants on branches Along with a corresponding fix to the move elimination pass (not included here yet -- I just have it disabled for now), this will fix dEQP-GLES2.functional.shaders.loops.for_uniform_iterations.* Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-04-24 02:22:31 +00:00
Alyssa Rosenzweig	c0fb2605dc	panfrost: Respect backwards branches in RA Fixes a bunch of issues with looping. Honestly, I'm not sure why loops worked at all before. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-04-24 02:22:31 +00:00
Alyssa Rosenzweig	7d45bd9c91	panfrost/midgard: Remove useless MIR dump Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-04-24 02:22:31 +00:00
Alyssa Rosenzweig	8b15f8a343	panfrost/midgard: Respect component of bcsel condition Fixes a bunch of non-vec4 indexing.varying_array tests. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-04-24 02:22:31 +00:00
Alyssa Rosenzweig	6a466c0a06	panfrost/midgard: Implement indirect loads of varyings/UBOs This adds preliminary support for indirect loads of varying arrays and uniform arrays, bringing a few new tests in shader.indexing.* to passing, although there remains a number of cases still missing. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-04-24 02:22:31 +00:00
Alyssa Rosenzweig	1f7b3884c9	panfrost/midgard: Pipe through varying arrays Varying arrays sometimes are lowered to a series of directly accessed varyings (which we handled okay), but when indirectly accessed, they appear as a single array; we need to handle this as well. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-04-24 02:22:31 +00:00
Alyssa Rosenzweig	042d0bb5c3	panfrost/mdg/disasm: Print raw varying_parameters The semantics of this field are not well understood; it is better to print it unconditionally along with the other unknown state, rather than silently eat the value. Without this change, some critical state was being lost in some shaders (notably, the offset for load/store scratchpad instructions found in shaders that spill registers.) Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-04-24 02:22:31 +00:00
Kenneth Graunke	864873dea9	iris: Prefer staging blits when destination supports CCS_E. Otherwise our textures don't get color compression. Thanks to Eero Tamminen for noticing this was missing! Improves performance of GLB27_FillTestC24Z16 on my Apollolake laptop with single channel RAM by 2.3x. Reported-by: Eero Tamminen <eero.t.tamminen@intel.com>	2019-04-23 18:59:27 -07:00
Marek Olšák	d8b296d3ad	gallium: replace drm_driver_descriptor::configuration with driconf_xml PIPE_CAPs are better. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-04-23 21:20:26 -04:00
Marek Olšák	8ae50e6004	gallium: replace DRM_CONF_SHARE_FD with PIPE_CAP_DMABUF Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-04-23 21:20:26 -04:00
Marek Olšák	e3841368f3	gallium: replace DRM_CONF_THROTTLE with PIPE_CAP_MAX_FRAMES_IN_FLIGHT Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-04-23 21:20:24 -04:00
Marek Olšák	a20800f49d	st/dri: simplify throttling code Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-04-23 21:19:48 -04:00
Marek Olšák	d9838f653a	gallium: document conservative rasterization flags Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-04-23 21:19:48 -04:00
Ian Romanick	26391cceaa	intel/compiler: Lower ffma on Gen4 and Gen5 flrp32 is also a 3-source instruction, but there is another pending series that handles that for Gen4 and Gen5. v2: Rebase on "intel/compiler: Don't have sepearate, per-Gen nir_options" Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-04-23 17:50:28 -07:00
Ian Romanick	fd1fa9afc7	intel/compiler: Don't have sepearate, per-Gen nir_options Instead, just have separate scalar vs. vector nir_options and do per-Gen "fix ups". Suggested-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-04-23 17:50:16 -07:00
Ian Romanick	3b087f668f	glsl: Silence may unused parameter warnings in glsl/ir.h Every file that included glsl/ir.h had a warning like: src/compiler/glsl/ir.h: In member function ‘virtual bool ir_rvalue::is_lvalue(const _mesa_glsl_parse_state*) const’: src/compiler/glsl/ir.h:236:64: warning: unused parameter ‘state’ [-Wunused-parameter] virtual bool is_lvalue(const struct _mesa_glsl_parse_state *state = NULL) const ^ Cc: Samuel Pitoiset <samuel.pitoiset@gmail.com> Fixes: `fa4ebf6b8d` ("glsl: add _mesa_glsl_parse_state object to is_lvalue()") Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-04-23 17:49:19 -07:00
Timothy Arceri	a6b7068ff5	st/mesa/radeonsi: fix race between destruction of types and shader compilation Commit `624789e370` moved the destruction of types out of atexit() and made use of a ref count instead. This is useful for avoiding a crash where drivers such as radeonsi are still compiling in a thread when the app exits and has not called MakeCurrent to change from the current context. While the above scenario is technically an app bug we shouldn't crash. However that change caused another race condition between the shader compilation thread in radeonsi and context teardown functions. This patch makes two changes to fix this new problem: First we explicitly call _mesa_destroy_shader_compiler_types() when destroying the st context rather than calling it indirectly via _mesa_free_context_data(). We do this as we must call it after st_destroy_context_priv() so that we don't destroy the glsl types before the compilation threads finish. Next, we wait for the shader threads to finish in si_destroy_context(); this also means we need to call context destroy before destroying the queues in si_destroy_screen(). Fixes: `624789e370` ("compiler/glsl: handle case where we have multiple users for types") Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-04-24 10:23:10 +10:00
Bas Nieuwenhuizen	3844ed8d44	radv: Add adaptive_sync driconfig option and enable it by default. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-04-23 23:49:39 +00:00
Bas Nieuwenhuizen	f2e0f5c3c4	vulkan/wsi: Add X11 adaptive sync support based on dri options. The dri options are optional. When the dri options are not provided the WSI will not use adaptive sync. FWIW I think for xf86-video-amdgpu this still requires an X11 config option, so only people who opt in can get possible regressions from this. So then the remaining question is: why do this in the WSI? It has been suggested in another MR that the application sets this. However, I disagree with that as I don't think we'll ever get a reasonable set of applications setting it. The next question is whether this can be a layer. It definitely can be as implemented now. However, I think this generally fits well with the function of the WSI. Furthermore, for e.g. the DISPLAY WSI this is much harder to do in a layer. Of course, most of the WSI could almost be a layer, but I think this still fits best in the WSI. Acked-by: Jason Ekstrand <jason@jlekstrand.net>	2019-04-23 23:49:39 +00:00
Bas Nieuwenhuizen	3c2e8267d0	radv: Add support for driconf. This includes 0 options. The cache parsing is located at a position where we can easily add config filtering by VkApplicationInfo. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-04-23 23:49:39 +00:00
Mike Blumenkrantz	b53d256db8	iris: add support for INTEL_conservative_rasterization this hooks up the iris gallium driver to existing mesa bits which handle the implementation resolves kwg/mesa#8 Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-04-23 16:36:30 -07:00
Mike Blumenkrantz	e00f6a0605	st/mesa: indicate intel extension support for inner_coverage based on cap if the driver (iris) indicates support for the inner_coverage pipe cap, this will set the necessary states in the driver flags and rasterizer structs Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-04-23 16:36:16 -07:00
Mike Blumenkrantz	1b9041c76a	gallium: add pipe cap for inner_coverage conservative raster mode this can be used by drivers which support the extension to indicate support Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-04-23 16:36:00 -07:00
Kenneth Graunke	2208d5a683	iris: Fix DrawTransformFeedback math when there's a buffer offset We need to subtract the starting offset from the final offset before dividing by the stride. See src/intel/vulkan/genX_cmd_buffer.c:3142. Not known to fix anything.	2019-04-23 15:57:07 -07:00
Kenneth Graunke	38db20245b	iris: Make some offset math helpers take a const isl_surf pointer	2019-04-23 15:47:10 -07:00
Caio Marcelo de Oliveira Filho	7e2684ce01	spirv: Handle SpvOpDecorateId This operation decorates with an Id instead of a Literal or String. It is used by HlslCounterBufferGOOGLE (provided by SPV_GOOGLE_hlsl_functionality1). Even if we don't do anything with that decoration, we must be able to parse SPIR-V that uses it. Fixes: `891886da2f` "spirv: Add no-op support for VK_GOOGLE_hlsl_functionality1" Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-04-23 14:58:01 -07:00
Caio Marcelo de Oliveira Filho	7b66d584a3	spirv: Rename vtn_decoration literals to operands Decorations (and ExecutionModes) can have not only literals, but also Ids associated with them. So rename the field to the more general name "Operand" used by the spec. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-04-23 14:58:01 -07:00
Lionel Landwerlin	0fb0058f18	anv: fix argument name for vkCmdEndQuery Doesn't fix anything but it's not the right function prototype. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `673f33c77d` ("anv: Implement CmdBegin/EndQueryIndexed") Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>	2019-04-24 04:33:26 +08:00
Chia-I Wu	cc53815ae1	virgl: skip empty cmdbufs Several empty cmdbufs are submitted by app/xserver per frame, from glamor_block_handler for example. Let's skip them. Signed-off-by: Chia-I Wu <olvaffe@gmail.com> Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>	2019-04-23 19:07:48 +00:00
Eric Anholt	ec686a66db	gallium: Remove the malloc pipebuffer manager. This has been unused since r600 stopped using it in 2010. Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Jose Fonseca <jfonseca@vmware.com> Reviewed-by: Kristian Høgsberg <hoegsberg@gmail.com>	2019-04-23 10:36:07 -07:00
Eric Anholt	6345dfc8f3	gallium: Remove the "alt" pipebuffer manager interface. This one would allocate from two underlying pools, but has never been used. Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Jose Fonseca <jfonseca@vmware.com> Reviewed-by: Kristian Høgsberg <hoegsberg@gmail.com>	2019-04-23 10:36:07 -07:00
Eric Anholt	8e31a4f27f	gallium: Remove the ondemand pipebuffer manager. I couldn't find any uses in the tree since its introduction. Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Jose Fonseca <jfonseca@vmware.com> Reviewed-by: Kristian Høgsberg <hoegsberg@gmail.com>	2019-04-23 10:36:07 -07:00
Eric Anholt	f5c08d9818	gallium: Remove the pool pipebuffer manager. Noticed while trying to decide if pipebuffer was of any use to me, and found that nothing has used it in the last 10 years at least. Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Jose Fonseca <jfonseca@vmware.com> Reviewed-by: Kristian Høgsberg <hoegsberg@gmail.com>	2019-04-23 10:36:07 -07:00
Jonathan Marek	d133f55a99	freedreno: a2xx: same gmem2mem sequence for all tiles Set REG_A2XX_RB_COPY_DEST_OFFSET in the tile init as it won't get touched by the draw batch. Then gmem2mem is the same for all tiles. Similar to what is done in a6xx, but only for gmem2mem. Signed-off-by: Jonathan Marek <jonathan@marek.ca> Reviewed-by: Rob Clark <robdclark@gmail.com>	2019-04-23 17:13:32 +00:00
Jonathan Marek	4107e0678a	freedreno: a2xx: enable batch reordering Batch reordering on a2xx is now tested and functional. Signed-off-by: Jonathan Marek <jonathan@marek.ca> Reviewed-by: Rob Clark <robdclark@gmail.com>	2019-04-23 17:13:32 +00:00
Jonathan Marek	7f670ca5fd	freedreno: a2xx: use nir_lower_io for TGSI shaders Allows removing the load_deref/store_deref code in the compiler. tgsi_to_nir now uses screen instead of options so we can simplify that too. Signed-off-by: Jonathan Marek <jonathan@marek.ca> Reviewed-by: Rob Clark <robdclark@gmail.com>	2019-04-23 17:13:32 +00:00
Jonathan Marek	bce4f11dbc	freedreno: a2xx: disable PIPE_CAP_PACKED_UNIFORMS a2xx driver is currently broken when PIPE_CAP_PACKED_UNIFORMS is enabled, disable it for now. Signed-off-by: Jonathan Marek <jonathan@marek.ca> Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com> Reviewed-by: Rob Clark <robdclark@gmail.com>	2019-04-23 17:13:32 +00:00
Jonathan Marek	418c3d9a4f	freedreno: a2xx: fix builtin blit program compilation tgsi_to_nir now requires a screen pointer and is used by fd2_prog_init. fd2_prog_init is used before fd_context_init so set the pointer manually. Signed-off-by: Jonathan Marek <jonathan@marek.ca> Reviewed-by: Rob Clark <robdclark@gmail.com>	2019-04-23 17:13:32 +00:00
Jonathan Marek	33cafb41a2	svga: add new ATC formats to the format conversion table Fixes the static assertion error. Signed-off-by: Jonathan Marek <jonathan@marek.ca> Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2019-04-23 17:11:56 +00:00
Jonathan Marek	0e696416f9	freedreno: a2xx: add GL_AMD_compressed_ATC_texture support Signed-off-by: Jonathan Marek <jonathan@marek.ca> Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2019-04-23 17:11:56 +00:00
Jonathan Marek	734409096b	freedreno: a3xx: add GL_AMD_compressed_ATC_texture support Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Signed-off-by: Jonathan Marek <jonathan@marek.ca> Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2019-04-23 17:11:56 +00:00
Jonathan Marek	0719a5f646	st/mesa: add ATC support Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Signed-off-by: Jonathan Marek <jonathan@marek.ca> Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2019-04-23 17:11:56 +00:00
Jonathan Marek	bfa72e4d52	llvmpipe, softpipe: no support for ATC textures Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Signed-off-by: Jonathan Marek <jonathan@marek.ca> Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2019-04-23 17:11:56 +00:00
Jonathan Marek	ea254fcd3c	gallium: add ATC format support Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Signed-off-by: Jonathan Marek <jonathan@marek.ca> Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2019-04-23 17:11:56 +00:00
Jonathan Marek	73c1d7e8c9	mesa: add GL_AMD_compressed_ATC_texture support Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Signed-off-by: Jonathan Marek <jonathan@marek.ca> Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2019-04-23 17:11:56 +00:00
Marek Olšák	951d60f8cd	radeonsi: delay adding BOs at the beginning of IBs until the first draw so that bound compute shader resources won't be added when they are not needed and same for graphics. Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2019-04-23 11:36:36 -04:00
Marek Olšák	09bb8c8557	radeonsi: add helper si_get_minimum_num_gfx_cs_dwords Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2019-04-23 11:36:34 -04:00
Marek Olšák	c59d238bb0	radeonsi: add si_cp_copy_data Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2019-04-23 11:36:33 -04:00
Marek Olšák	694e320643	winsys/amdgpu: clean up and remove nonsensical assertion The assertion considers max_dw from the current IB in the chain, but big_ib_buffer is a buffer for the next IB, which can be smaller. Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2019-04-23 11:36:31 -04:00
Marek Olšák	1807f6cfe9	winsys/amdgpu: enable chaining for compute IBs Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2019-04-23 11:36:06 -04:00
Marek Olšák	b99bed6246	winsys/amdgpu: reorder chunks, make BO_HANDLES first, IB and FENCE last Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2019-04-23 11:28:56 -04:00
Marek Olšák	437d032b7d	winsys/amdgpu: make IBs writable and expose their address Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2019-04-23 11:28:56 -04:00
Marek Olšák	2313176817	ac: add REWIND and GDS registers to register headers Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2019-04-23 11:28:56 -04:00
Marek Olšák	35cd57df2e	ac: add ac_get_i1_sgpr_mask Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2019-04-23 11:28:56 -04:00
Marek Olšák	bfb9287599	ac: add radeon_info::is_pro_graphics Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2019-04-23 11:28:56 -04:00
Marek Olšák	64d6cc982d	ac: add radeon_info::marketing_name, replacing the winsys callback Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2019-04-23 11:28:56 -04:00
Marek Olšák	9b33465481	tgsi/scan: add uses_drawid Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2019-04-23 11:28:56 -04:00
Kenneth Graunke	77449d7c41	iris: Track valid data range and infer unsynchronized mappings. Applications frequently call glBufferSubData() to consecutive regions of a VBO to append new vertex data. If no data exists there yet, we can promote these to unsynchronized writes, even if the buffer is busy, since the GPU can't be doing anything useful with undefined content. This can avoid a bunch of unnecessary blitting on the GPU. u_threaded_context would do this for us, and in fact prohibits us from doing so (see TC_TRANSFER_MAP_NO_INFER_UNSYNCHRONIZED). But we haven't hooked that up yet, and it may be useful to disable u_threaded_context when debugging...at which point we'd still want this optimization. At the very least, it would let us measure the benefit of threading independently from this optimization. And it's not a lot of code. Removes most stall avoidance blits in "Total War: WARHAMMER." On my Skylake GT4e at 1920x1080, this appears to improve performance in games by the following (but I did not do many runs for proper statistics gathering): ---------------------------------------------- \| DiRT Rally \| +2% (avg) \| + 2% (max) \| \| Bioshock Infinite \| +3% (avg) \| + 9% (max) \| \| Shadow of Mordor \| +7% (avg) \| +20% (max) \| ----------------------------------------------	2019-04-23 00:24:08 -07:00
Kenneth Graunke	768b17a7ad	iris: Make a resource_is_busy() helper This checks both "is it busy" and "do we have work queued up for it"?	2019-04-23 00:24:08 -07:00
Kenneth Graunke	5ad0c88dbe	iris: Replace buffer backing storage and rebind to update addresses. This implements PIPE_CAP_INVALIDATE_BUFFER and invalidate_resource(), as well as the PIPE_TRANSFER_DISCARD_WHOLE_RESOURCE flag. When either of these happen, we swap out the backing storage of the buffer for a new idle BO, allowing us to write to it immediately without stalling or queueing a blit. On my Skylake GT4e at 1920x1080, this improves performance in games: ----------------------------------------------- \| DiRT Rally \| +25% (avg) \| +17% (max) \| \| Bioshock Infinite \| +22% (avg) \| +11% (max) \| \| Shadow of Mordor \| +27% (avg) \| +83% (max) \| -----------------------------------------------	2019-04-23 00:24:08 -07:00
Kenneth Graunke	0a082b6560	iris: Make memzone_for_address non-static I want to use this in iris_resource.c.	2019-04-23 00:24:08 -07:00
Kenneth Graunke	72277044e2	iris: Make a gl_shader_stage -> pipe_shader_stage helper function This is probably not the best place for it, but I don't feel like moving the one out of the TGSI translator today, and we already have the other direction here, so...shrug	2019-04-23 00:24:08 -07:00
Kenneth Graunke	b45dff1da8	iris: Rework image views to store pipe_image_view. This will be useful when rebinding images.	2019-04-23 00:24:08 -07:00
Kenneth Graunke	2f60850a3f	iris: Rework UBOs and SSBOs to use pipe_shader_buffer This unifies a bunch of the UBO and SSBO code to use common structures. Beyond iris_state_ref, pipe_shader_buffer also gives us a buffer size, which can be useful when filling out the surface state.	2019-04-23 00:24:08 -07:00
Kenneth Graunke	00d4019676	iris: Track bound constant buffers This helps avoid having to iterate over [0, PIPE_MAX_CONSTANT_BUFFERS) looking to see if any resources are bound.	2019-04-23 00:24:08 -07:00
Kenneth Graunke	4d12236072	iris: Mark constants dirty on transfer unmap even if no flushes occur I have various conditions in place to try and avoid unnecessary PIPE_CONTROL flushes, especially to batches which may have never used the buffer being mapped. But if we do a CPU map to a bound constant buffer, we still need to mark push constants dirty, even if there's nothing happening in batches that would warrant a flush. Fixes obvious misrendering in the "XCOM 2: War of the Chosen" menus (lots of rainbow colored triangles). Fixes lots of blinking elements in "Shadow of Mordor". Fixes missing crowd rendering in "DiRT Rally".	2019-04-23 00:24:08 -07:00
Lionel Landwerlin	b1ba7ffdbd	intel: workaround VS fixed function issue on Gen9 GT1 parts The issue is noticeable in the dEQP-GLES31.functional.geometry_shading.layered.render_with_default_layer_3d test where a triangle goes missing when we use the maximum number of URB entries as specified by the documentation. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107505 Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com> Acked-by: Kenneth Graunke <kenneth@whitecape.org>	2019-04-23 13:41:20 +08:00
Matt Turner	4ec258ac3c	intel/compiler: Improve fix_3src_operand() Allow ATTR and IMM sources unconditionally (ATTR are just GRFs, IMM will be handled by opt_combine_constants()). Both are already allowed by opt_copy_propagation(). Also allow FIXED_GRF if the regioning is 8,8,1. Could also allow other stride=1 regions (e.g., 4,4,1) and scalar regions but I don't think those occur. This is sufficient to allow a pass added in a future commit (fs_visitor::lower_linterp) to avoid emitting extra MOV instructions. I removed the 'src.stride > 1' case because it seems wrong: 3-src instructions on Gen6-9 are align16-only and can only do stride=1 or stride=0. A run through Jenkins with an assert(src.stride <= 1) never triggers, so it seems that it was dead code. Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>	2019-04-22 16:54:31 -07:00
Matt Turner	8aae7a3998	intel/compiler: Add unit tests for sat prop for different exec sizes The two new unit tests verify that propagating a saturate between instructions of different exec sizes does not happen. Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>	2019-04-22 16:54:21 -07:00
Matt Turner	54d4d34b96	intel/compiler: Use SIMD16 instructions in fs saturate prop unit test Will allow us to test that propagation between instructions of different exec sizes does not happen (in the next commit). The stray-looking change in intervening_dest_write is to adjust the size of the texture result to keep the test functioning identically when the instructions' exec sizes are doubled. Without the change, the texture does not overwrite the destination fully as the unit test intends. Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>	2019-04-22 16:54:17 -07:00
Rafael Antognolli	70e03e220c	intel/fs: Remove fs_generator::generate_linterp from gen11+. We now have a lowering pass that will do this at the fs_visitor level, so we can remove this code from gen11+. v2: Reduce size of the "i" array from 4 to 2 (Matt). Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-04-22 16:54:00 -07:00
Rafael Antognolli	9ea90aae1e	intel/fs: Add a lowering pass for linear interpolation. On gen11, instead of using a PLN instruction, we convert FS_OPCODE_LINTERP to 2 or 4 multiply adds. That is done in the fs_generator code. This patch adds a lowering pass that does the same thing at the fs_visitor. It also drops the usage of NF types, since we don't need the extra precision and it lets us skip the accumulator. With all that, some optimizations will still be run on the generated code, and we should get better scheduling. v2: Update comment about saturation and conditional mod (Matt) Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-04-22 16:54:00 -07:00
Rafael Antognolli	c0504569ea	intel/fs: Move the scalar-region conversion to the generator. Move the scalar-region conversion from the IR to the generator, so it doesn't affect the Gen11 path. We need the non-scalar regioning for a later lowering pass that we are adding. v2: Better commit message (Matt) Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-04-22 16:54:00 -07:00
Rafael Antognolli	0778748eba	intel/fs: Only propagate saturation if exec_size is the same. Otherwise it could propagate the saturation from a SIMD16 instruction into a SIMD8 instruction. With that, only part of the destination register, which is the source of the move with saturation, would have been updated. Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-04-22 16:53:55 -07:00
Kenneth Graunke	087f92c59a	i965: Tidy bogus indentation left by previous commit I left code indented one level too far in the previous commit to make the diff easier to review. Drop that extra level now. Fixes: `6981069fc8` i965: Ignore uniform storage for samplers or images, use binding info Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-04-22 15:41:56 -07:00
Kenneth Graunke	6981069fc8	i965: Ignore uniform storage for samplers or images, use binding info gl_nir_lower_samplers_as_deref creates new top level sampler and image uniforms which have been split from structure uniforms. i965 assumed that it could walk through gl_uniform_storage slots by starting at var->data.location and walking forward based on a simple slot count. This assumed that structure types were walked in a particular order. With samplers and images split out of structures, it becomes impossible to assign meaningful locations. Consider: struct S { sampler2D a; sampler2D b; } s[2]; The gl_uniform_storage locations for these follow this map: 0 => a[0], 1 => b[0], 2 => a[1], 3 => b[1]. But the new split variables look like: sampler2D lowered_a[2]; sampler2D lowered_b[2]; and there is no way to know that there's effectively a stride to get to the location for successive elements of a[] or b[]. So, working with location becomes effectively impossible. Ultimately, the point of looking at uniform storage was to pull out the bindings from the opaque index fields. gl_nir_lower_samplers_as_derefs can obtain this information while doing the splitting, however, and sets up var->data.binding to have the desired values. We move gl_nir_lower_samplers before brw_nir_lower_image_load_store so gl_nir_lower_samplers_as_derefs has the opportunity to set proper image bindings. Then, we make the uniform handling code skip sampler(-array) variables, and handle image param setup based on var->data.binding. Fixes Piglit tests/spec/glsl-1.10/execution/samplers/uniform-struct, this time without regressing dEQP-GLES2.functional.uniform_api.random.3. Fixes: `f003859f97` nir: Make gl_nir_lower_samplers use gl_nir_lower_samplers_as_deref Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-04-22 15:39:55 -07:00
Kenneth Graunke	47303b466c	Revert "glsl: Set location on structure-split sampler uniform variables" This reverts commit `9e0c744f07`, which regressed dEQP-GLES2.functional.uniform_api.random.3. It turns out that the newly produced location is meaningless and impossible to consume by drivers that want to look at gl_uniform_storage, so it's probably better to leave it unset (0) than a number that looks usable. Leave a tombstone^Wcomment to discourage the next person from making the obvious looking fix. See the next commit for a longer description of the problem. This breaks tests/spec/glsl-1.10/execution/samplers/uniform-struct on i965, which was originally fixed by the revert. The next commit will fix it again. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-04-22 15:39:55 -07:00
Marek Olšák	b58e5fb6f3	radeonsi: use CP DMA for the null const buffer clear on CIK This is a workaround for a thread deadlock that I have no idea why it occurs. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108879 Fixes: `9b331e462e` Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-04-22 16:05:52 -04:00
Danylo Piliaiev	f280c36c08	drirc: Add workaround for Epic Games Launcher Epic Games Launcher could be launched in opengl mode with "-opengl" option. It creates 4.4 opengl core context however it uses deprecated functionality e.g. default vertex buffer object. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110462 Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com> Signed-off-by: Marek Olšák <marek.olsak@amd.com>	2019-04-22 16:04:19 -04:00
Kenneth Graunke	1566054459	iris: Track bound and writable SSBOs Marek recently extended pipe->set_shader_buffers() to take an extra writable_bitmask parameter, indicating which SSBOs are writable (some may be bound read-only). We can use this to decide whether to set EXEC_OBJECT_WRITE when pinning. Avoiding the write flag can save us some cross-batch flushing if the SSBO is used for reading in both the render and compute engines.	2019-04-22 11:31:14 -07:00
Chia-I Wu	e9c5e13344	virgl: clear vertex_array_dirty Clear vertex_array_dirty after the state is emitted. Signed-off-by: Chia-I Wu <olvaffe@gmail.com> Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>	2019-04-22 10:19:47 -07:00
Lubomir Rintel	e983a975c6	gallivm: disable NEON instructions if they are not supported The LLVM project made some questionable decisions about defaults for armv7 (e.g. they enable NEON that is not there on NVIDIA and Marvell platforms). On top of that, getHostCPUFeatures() doesn't disable missing machine attributes. Finally, -neon alone is not sufficient to disable emission of NEON instructions. Signed-off-by: Lubomir Rintel <lkundrak@v3.sk> Cc: <mesa-stable@lists.freedesktop.org> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-04-22 09:47:49 -07:00
Lubomir Rintel	bc6bfc861f	gallivm: guess CPU features also on ARM getHostCPUFeatures() is also available on ARM, for even longer time than for x86. Use it -- it potentially enables instructions that may speed things up. Signed-off-by: Lubomir Rintel <lkundrak@v3.sk> Cc: <mesa-stable@lists.freedesktop.org> Closes: https://gitlab.freedesktop.org/mesa/mesa/merge_requests/518 Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-04-22 09:47:39 -07:00
Kenneth Graunke	36478b9f77	iris: Enable the dual_color_blend_by_location driconf option. This fixes rendering in Unigine Valley 1.0 and Heaven 4.0.	2019-04-22 09:36:36 -07:00
Kenneth Graunke	faa52e328e	iris: Add mechanism for iris-specific driconf options Based on Nicolai's `0f8c5de869`. Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2019-04-22 09:35:36 -07:00
Jason Ekstrand	ccb25aaeaf	nir: Use the NIR_SRC_AS_ macro to define nir_src_as_deref We have a macro for this now; no reason to hand-roll it for derefs. While we're here, move the NIR_DEFINE_CAST for derefs down to where all the other ones are. Reviewed-by: Eric Anholt <eric@anholt.net>	2019-04-22 15:23:24 +00:00
Jason Ekstrand	2314db10bf	anv,radv: Update release notes for newly implemented extensions A lot has happened in those two drivers since the 19.0 release and we keep forgetting to update release notes. Time to bring everything up to date again before 19.1 gets released. Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-04-22 14:47:23 +00:00
Samuel Pitoiset	b3e3440c87	radv: add VK_NV_compute_shader_derivatives support Only computeDerivativeGroupLinear is supported for now. All crucible tests pass. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-04-22 14:51:57 +02:00
Ian Romanick	a6ccc4c0c8	intel/fs: Add support for float16 to the fsign optimizations Commit `ad98fbc217` ("intel/fs: Refactor code generation for nir_op_fsign to its own function") criss-crossed with `c2b8fb9a81` ("anv/device: expose VK_KHR_shader_float16_int8 in gen8+"), and I was not paying enough attention when I rebased. This adds back the float16 changes and enables the optimization. v2: Incorporate more changes from `19cd2f5deb` and `a8d8b1a139` that I missed in the previous version. Fixes: `ad98fbc217` ("intel/fs: Refactor code generation for nir_op_fsign to its own function") Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110474 Reviewed-by: Matt Turner <mattst88@gmail.com> [v1]	2019-04-20 20:49:34 -07:00
Icenowy Zheng	3e91c7d544	lima: add Android build Currently only meson build support is added for the lima driver. Add Android build support for lima. Signed-off-by: Icenowy Zheng <icenowy@aosc.io> Acked-by: Qiang Yu <yuq825@gmail.com>	2019-04-21 01:05:19 +00:00
Andre Heider	8b13aac966	st/nine: skip position checks in SetCursorPosition() For HW cursors, "cursor.pos" doesn't hold the current position of the pointer, just the position of the last call to SetCursorPosition(). Skip the check against stale values and bump the d3dadapter9 drm version to expose this change of behaviour. Signed-off-by: Andre Heider <a.heider@gmail.com> Reviewed-by: Axel Davy <davyaxel0@gmail.com>	2019-04-20 13:06:29 +02:00
Jason Ekstrand	828ec41154	anv: Rework the descriptor set layout create loop Previously, we were storing the per-binding create info pointer in the immutable_samplers field temporarily so that we can switch the order in which we walk the loop. However, now that we have multiple arrays of structs to walk, it makes more sense to store an index of some sort. Because we want to leave immutable_samplers as NULL for undefined bindings, we store index + 1 and then subtract one later. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-04-19 23:26:41 +00:00
Jason Ekstrand	2b388c3d04	anv: Ignore descriptor binding flags if bindingCount == 0 I missed this on the first go round. The bindingCount field of VkDescriptorSetLayoutBindingFlagsCreateInfoEXT is allowed to be zero which means the flags array is ignored. Fixes: `d6c9bd6e01` "anv: Put binding flags in descriptor set layouts" Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-04-19 23:26:41 +00:00
Alyssa Rosenzweig	648cda258b	panfrost/mdg: Use shared fsign lowering Fixes failures in shaders.operator.common_functions.sign.* Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-04-19 23:15:57 +00:00
Alyssa Rosenzweig	31d9caa239	panfrost: Fixup vertex offsets to prevent shadow copy Mali attribute buffers have to be 64-byte aligned. However, Gallium enforces no such requirement; for unaligned buffers, we were previously forced to create a shadow copy (slow!). To prevent this, we instead use the offseted buffer's address with the lower bits masked off, and then add those masked off bits to the src_offset. Proof of correctness included, possibly for the opportunity to say "QED" unironically. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-04-19 22:50:20 +00:00
Alyssa Rosenzweig	e008d4f011	panfrost: Track BO lifetime with jobs and reference counts This (fairly large) patch continues work surrounding the panfrost_job abstraction to improve job lifetime management. In particular, we add infrastructure to track which BOs are used by a particular job (currently limited to the vertex buffer BOs), to reference count these BOs, and to automatically manage the BOs memory based on the reference count. This set of changes serves as a code cleanup, as a way of future proofing for allowing flushing BOs, and immediately as a bugfix to workaround the missing reference counting for vertex buffer BOs. Meanwhile, there are a few cleanups to vertex buffer handling code itself, so in the short-term, this allows us to remove the costly VBO staging workaround, since this patch addresses the underlying causes. v2: Use pipe_reference for BO reference counting, rather than managing it ourselves. Don't duplicate hash-table key removal. Fix vertex buffer counting. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-04-19 22:50:20 +00:00
Andres Gomez	a151500dd1	docs/relnotes: add support for VK_KHR_shader_float16_int8 v2: radv also supports it now (Samuel Pitoiset). Signed-off-by: Andres Gomez <agomez@igalia.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-04-20 00:29:16 +02:00
Jason Ekstrand	9ce7c29724	anv/nir: Add a central helper for figuring out SSBO address formats Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-04-19 19:56:42 +00:00
Jason Ekstrand	470422870a	nir: Add helpers for getting the type of an address format Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-04-19 19:56:42 +00:00
Jason Ekstrand	6e230d7607	anv: Implement VK_EXT_descriptor_indexing Now that everything is in place to do bindless for all resource types except input attachments and UBOs, VK_EXT_descriptor_indexing is "trivial". Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-04-19 19:56:42 +00:00
Jason Ekstrand	d6c9bd6e01	anv: Put binding flags in descriptor set layouts Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-04-19 19:56:42 +00:00
Jason Ekstrand	c0d9926df7	anv: Use bindless handles for images Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-04-19 19:56:42 +00:00
Jason Ekstrand	83af92e593	intel/fs: Add support for bindless image load/store/atomic Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-04-19 19:56:42 +00:00
Jason Ekstrand	e6803f6b6f	anv: Use bindless textures and samplers This commit changes anv to put bindless handles and sampler pointers into the descriptor buffer and use those instead of bindful when we run out of binding table space. This "spilling" of descriptors allows us to advertise an almost unbounded number of images and samplers. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-04-19 19:56:42 +00:00
Jason Ekstrand	bf61f057f7	anv: Pass the plane into lower_tex_deref Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-04-19 19:56:42 +00:00
Jason Ekstrand	f16fcb9db7	anv: Use write_image_view to initialize immutable samplers Instead of setting it manually, call the helper. When setting descriptor sets becomes more complicated than just setting some struct values, this will keep immutable sampler handling correct. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-04-19 19:56:42 +00:00
Jason Ekstrand	e612c3b9bf	anv: Count the number of planes in each descriptor binding Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-04-19 19:56:42 +00:00
Jason Ekstrand	843286d324	intel/fs: Add support for bindless texture ops We add two new texture sources for bindless surface and sampler handles. Bindless surface handles are expected to be pre-shifted so that the 20-bit surface state table index is in the top 20 bits of the 32-bit handle. This lets us avoid any extra shifts in the shader. Bindless sampler handles are 32-byte aligned byte offsets from general state base address. We use 32-byte aligned instead of 16-byte aligned to avoid having to use more indirect messages than needed. It means we can't tightly pack samplers but that's probably not a big deal. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-04-19 19:56:42 +00:00
Jason Ekstrand	2edf29b933	intel,nir: Lower TXD with a bindless sampler When we have a bindless sampler, we need an instruction header. Even in SIMD8, this pushes the instruction over the sampler message size maximum of 11 registers. Instead, we have to lower TXD to TXL. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-04-19 19:56:42 +00:00
Jason Ekstrand	bd56ce8ce5	anv: Implement VK_KHR_shader_atomic_int64 Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-04-19 19:56:42 +00:00
Jason Ekstrand	79fb0d27f3	anv: Implement SSBOs bindings with GPU addresses in the descriptor BO This commit adds a new way for ANV to do SSBO bindings by just passing a GPU address in through the descriptor buffer and using the A64 messages to access the GPU address directly. This means that our variable pointers are now "real" pointers instead of a vec2(BTI, offset) pair. This carries a few advantages: 1. It lets us support a virtually unbounded number of SSBO bindings. 2. It lets us implement VK_KHR_shader_atomic_int64 which we couldn't implement before because those atomic messages are only available in the bindless A64 form. 3. It's way better than messing around with bindless handles for SSBOs which is the only other option for VK_EXT_descriptor_indexing. 4. It's more future looking, maybe? At the least, this is what NVIDIA does (they don't have binding based SSBOs at all). This doesn't a priori mean it's better, it just means it's probably not terrible. The big disadvantage, of course, is that we have to start doing our own bounds checking for robustBufferAccess again and have to push in dynamic offsets. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-04-19 19:56:42 +00:00
Jason Ekstrand	3cf78ec2bd	anv: Lower some SSBO operations in apply_pipeline_layout In order to avoid the potential overhead of A64 operations on all SSBO ops, we look for those SSBO ops where we can get to the descriptor set from the SSBO access operation and lower those to a binding-table approach. When robustBufferAccess is enabled, this lets the hardware do the bounds checking for us. It also avoids some potentially expensive 64-bit integer calculations. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-04-19 19:56:42 +00:00
Jason Ekstrand	e7a1e8f735	anv: Add a has_a64_buffer_access to anv_physical_device This is more descriptive and a bit nicer than checking for gen >= 8 && use_softpin everywhere. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-04-19 19:56:42 +00:00
Jason Ekstrand	b1a633d9fb	intel/nir: Re-run int64 lowering in postprocess_nir We're about to start doing 64-bit pointer calculations in ANV. They will get applied after brw_preprocess_nir which is where we currently do 64-bit integer arithmetic lowering. Because we're adding 64-bit integer arithmetic after the initial lowering has happened, we need to lower again. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-04-19 19:56:42 +00:00
Jason Ekstrand	995dc4e5c3	nir/lower_io: Expose some explicit I/O lowering helpers Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-04-19 19:56:42 +00:00
Jason Ekstrand	146deec9ef	anv/pipeline: Add skeleton support for spilling to bindless If the number of surfaces or samplers exceeds what we can put in a table, we will want to spill out to bindless. There is no bindless support yet but this gets us the basic framework that will be used by later commits. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-04-19 19:56:42 +00:00
Jason Ekstrand	a7d4871846	anv/pipeline: Sort bindings by most used first This commit just sorts the bindings by how often they're used vs the array size of the binding. This will let us make more nuanced decisions about what goes in the binding table vs. what to make bindless. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-04-19 19:56:42 +00:00
Jason Ekstrand	a5a0dc08f1	anv: Add a #define for the max binding table size This also fixes a bug where we mis-calculate maximum binding table sizes and may return true in vkGetDescriptorSetLayoutSupport even for sets too large to fit in a binding table. Fixes: `ddc4069122` "anv: Implement VK_KHR_maintenance3" Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-04-19 19:56:42 +00:00
Jason Ekstrand	3b755b52e8	anv: Put image params in the descriptor set buffer on gen8 and earlier This is really where they belong; not push constants. The one downside here is that we can't push them anymore for compute shaders. However, that's a general problem and we should figure out how to push descriptor sets for compute shaders. This lets us bump MAX_IMAGES to 64 on BDW and earlier platforms because we no longer have to worry about push constant overhead limits. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-04-19 19:56:42 +00:00
Jason Ekstrand	83b943cc2f	anv: Make all VkDeviceMemory BOs resident permanently We spend a lot of time in the driver adding things to hash sets to track residency. The reality is that a properly built Vulkan app uses large memory objects and sub-allocates from them. In a typical frame, most if not all of those allocations are going to be resident for the entire frame so we're really not saving ourselves much by tracking fine-grained residency. Just throwing everything in the validation list does make it a little bit more expensive inside the kernel to walk the list and ensure that all our VA is in order. However, without relocations, the overhead of that is pretty small. If we ever do run into a memory pressure situation where the fine-grained residency could even potentially help, we would likely be swapping one page out to make room for another within the draw call and performance is totally lost at that point. We're better off swapping out other apps and just letting ours run a whole frame. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-04-19 19:56:42 +00:00
Rob Clark	a9241edfa3	freedreno/ir3: fix const assert Fixes: `fe8c57e859` freedreno/ir3: use nir_src_as_uint in a few places Signed-off-by: Rob Clark <robdclark@chromium.org>	2019-04-19 12:36:06 -07:00
Kristian H. Kristensen	bcb81b4d48	gallium/auxiliary/vl: Fix a couple of warnings Remove unused functions and mark unhandled default case with unreachable. Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2019-04-19 16:17:37 +00:00
Kristian H. Kristensen	0719fc4c31	egl/dri2: Mark potentially unused 'display' variable with MAYBE_UNUSED Sometimes there is no X11 platform. Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2019-04-19 16:17:37 +00:00
Kristian H. Kristensen	b5a3567b51	ralloc: Fully qualify non-virtual destructor call This suppresses warning about calling a non-virtual destructor in a non-final class with virtual functions: src/compiler/glsl/ast.h:53:4: warning: destructor called on non-final 'ast_node' that has virtual functions but non-virtual destructor [-Wdelete-non-virtual-dtor] DECLARE_LINEAR_ZALLOC_CXX_OPERATORS(ast_node); Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2019-04-19 16:17:37 +00:00
Kristian H. Kristensen	41593f3c37	nir_opcodes.py: Saturate to expression that doesn't overflow Compiler warns about overflow when assigning UINT64_MAX to something smaller than a uint64_t: src/compiler/nir/nir_constant_expressions.c:16909:50: warning: implicit conversion from 'unsigned long long' to 'uint1_t' (aka 'unsigned char') changes value from 18446744073709551615 to 255 [-Wconstant-conversion] uint1_t dst = (src0 + src1) < src0 ? UINT64_MAX : (src0 + src1); ~~~ ^~~~~~~~~~ Shift UINT64_MAX down to the appropriate maximum value for the type being assigned to. Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-04-19 16:17:37 +00:00
Kristian H. Kristensen	15605cc9d4	glsl_to_nir: Initialize debug variable If we want to assert on found == true when the loop exits early, we need to initialize it to false. Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2019-04-19 16:17:37 +00:00
Kristian H. Kristensen	3ecfe20648	tgsi: Mark tgsi_strings_check() unused It's there to hold the static asserts, don't warn about it being unused. Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>	2019-04-19 16:17:37 +00:00
Lionel Landwerlin	0d46e40467	anv: limit URB reconfigurations when using blorp If the last graphics pipeline bound to the command buffer has enough space in its VS URB entries for Blorp then avoid reconfiguring the URB partitions. v2: s/0/MESA_SHADER_VERTEX/ (Caio) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-04-19 16:58:06 +01:00
Lionel Landwerlin	84e70556fb	intel/devinfo: add basic sanity tests on device database v2: #undef NDEBUG (Eric) Use inc_include & inc_src (Eric) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>	2019-04-19 15:56:21 +00:00
Lionel Landwerlin	773e6aa9fd	intel/devinfo: fix missing num_thread_per_eu on ICL There was an assumption that num_thread_per_eu would be set in the Gen8 features. Since this is mostly the same for all gen8->11 (except GEN9_LP that overwrites it) let's just factor it out. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: mesa-stable@lists.freedesktop.org Acked-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>	2019-04-19 15:56:21 +00:00
Eric Anholt	38c75aff4c	nir: Use the nir_builder _imm helpers in setting up deref offsets. When looking at the dEQP nested_struct_array_dynamic_index_fragment code after lowering, I was horrified at the amount of adding and multiplying by 0 we were doing. The builder _imm helpers handle that for you so that the following optimization passes have less work to do. Plus, it's easier to read. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-04-19 08:45:14 -07:00
Eric Anholt	9ac5ec2f90	nir: Fix deref offset calculation for structs. We were calculating the offset for the field within the struct, and just dropping it on the floor. Fixes a regression in KHR-GLES3.shaders.struct.local.nested_struct_array_dynamic_index_fragment and a few of its friends since the scratch lowering commit. Fixes: `e8e159e9df` ("nir/deref: Add helpers for getting offsets") Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-04-19 08:45:14 -07:00
Erico Nunes	2288b59ddc	lima: enable nir fsign lowering in ppir The mali utgard pp doesn't support a sign instruction. Use the nir lowering function for fsign to implement fsign in ppir. Signed-off-by: Erico Nunes <nunes.erico@gmail.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-04-19 15:42:23 +00:00
Erico Nunes	4577eb7b7c	nir/algebraic: add lowering for fsign The mali utgard pp doesn't support a sign instruction. In the ARM offline shader compiler, the sign function is implemented using sub(gt(0.0, a), lt(0.0, a)). This is a generic optimization, so implement it in the nir level when lower_fsign is set, alongside the lowering for isign. Signed-off-by: Erico Nunes <nunes.erico@gmail.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-04-19 15:42:23 +00:00
Brian Paul	f9c594cdf5	docs: s/Aptril/April/ Found by Manuel Huber. Trivial.	2019-04-19 08:30:27 -06:00
Erico Nunes	56230f0428	lima/ppir: support ppir_op_ceil Add a few missing ppir_op_ceil enum handling entries to implement nir_op_fceil in lima ppir. Signed-off-by: Erico Nunes <nunes.erico@gmail.com> Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com> Reviewed-by: Qiang Yu <yuq825@gmail.com>	2019-04-19 10:22:03 +00:00
Bas Nieuwenhuizen	8d2654a419	radv: Support VK_EXT_inline_uniform_block. Basically just reserve the memory in the descriptor sets. On the shader side we construct a buffer descriptor, since AFAIU VGPR indexing on 32-bit pointers in LLVM is still broken. This fully supports update after bind and variable descriptor set sizes. However, the limits are somewhat arbitrary and are mostly about finding a reasonable division of a 2 GiB max memory size over the set. v2: - rebased on top of master (Samuel) - remove the loading resources rework (Samuel) - only load UBO descriptors if it's a pointer (Samuel) - use LLVMBuildPtrToInt to avoid IR failures (Samuel) Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> (v2)	2019-04-19 09:21:47 +02:00
Samuel Pitoiset	2b515a8259	ac/nir: use the new raw/struct SSBO atomic intrinsics for comp_swap This is actually fixed now. This change requires LLVM r358579. Make sure to have it in your tree, otherwise the following piglit will hang: tests/spec/arb_shader_storage_buffer_object/execution/ssbo-atomicCompSwap-int.shader_test Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-04-19 09:20:15 +02:00
Samuel Pitoiset	895e10d2db	ac/nir: only use the new raw/struct SSBO atomic intrinsics with LLVM 9+ They are buggy with older LLVM versions, see r358579. Fixes: `78c551aca1` ("ac/nir: use new LLVM 8 intrinsics for SSBO atomics except cmpswap") Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-04-19 09:20:13 +02:00
Samuel Pitoiset	31164cf5f7	ac/nir: only use the new raw/struct image atomic intrinsics with LLVM 9+ They are buggy with LLVM 8 because they weren't marked as source of divergence, see r358579. Fixes: `dd0172e865` ("radv: Use structured intrinsics instead of indexing workaround for GFX9.") Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-04-19 09:20:09 +02:00
Kenneth Graunke	a913fbf124	iris: Be less aggressive at postdraw work skipping We empty the cache sets when flushing the batch, at which point we need to add any framebuffer related BOs even though the bindings haven't changed. So, we now do the cache set tracking unconditionally. For now, we continue skipping resolve work based on the same conditions in the predraw functions - the thinking is if we didn't trigger resolves, there's nothing to update here. Time will tell if this works. Partly reverts commit `365886ebe1`, and fixes Unigine Valley rendering on Gen9+. Drops drawoverhead scores by about 10-12%. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110353	2019-04-18 18:51:58 -07:00
Jason Ekstrand	cd4ffb376f	intel/fs: Account for live range lengths in spill costs The current register allocator has a concept of "spill benefit" which is based on the number of nodes with which a given node interferes. The idea is that you want to spill stuff with high interference because those are the most likely registers to help when spilling. However, this fails to take into account the length of the live range so the allocator frequently picks "cheap" (not many uses) registers which are actually very short lived and so spilling them doesn't help with the pressure situation. This commit takes into account the length of the live range to make long-lived registers more likely to get spilled than short-lived ones. This encourages the spill chooser to choose slightly larger registers which will affect a larger area of the program and hopefully we have to spill fewer of them to get the same reduction in over-all register pressure. Shader-db results on Kaby Lake: total spills in shared programs: 23664 -> 12050 (-49.08%) spills in affected programs: 19243 -> 7629 (-60.35%) helped: 296 HURT: 8 total fills in shared programs: 32028 -> 25139 (-21.51%) fills in affected programs: 20378 -> 13489 (-33.81%) helped: 295 HURT: 16 Of course, most of that is in Deus Ex... Shader-db results on Kaby Lake (without Deus Ex): total spills in shared programs: 6479 -> 5834 (-9.96%) spills in affected programs: 3231 -> 2586 (-19.96%) helped: 40 HURT: 4 total fills in shared programs: 17165 -> 17099 (-0.38%) fills in affected programs: 6951 -> 6885 (-0.95%) helped: 40 HURT: 7 Even without Deus Ex, the spill help is pretty respectable. The worst hurt shaders were one compute shader in Aztec Ruins and one fragment shader in KSP that were each hurt by around 13% fill 9% spill. 
VkPipeline-db results on Kaby Lake: total spills in shared programs: 9149 -> 8069 (-11.80%) spills in affected programs: 5197 -> 4117 (-20.78%) helped: 27 HURT: 16 total fills in shared programs: 26390 -> 25477 (-3.46%) fills in affected programs: 12662 -> 11749 (-7.21%) helped: 24 HURT: 22 The Vulkan results were decidedly more mixed but we don't have nearly as many apps in that database yet. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-04-18 23:04:45 +00:00
Gurchetan Singh	1fd635862f	virgl/vtest: bump up protocol version + support encoded transfers This more accurately reflects what the drm winsys does. Signed-off-by: Gurchetan Singh <gurchetansingh@chromium.org> Reviewed-By: Gert Wollny <gert.wollny@collabora.com> Reviewed-By: Piotr Rak <p.rak@samsung.com>	2019-04-18 15:39:23 -07:00
Gurchetan Singh	b5698562e4	virgl/vtest: wait after issuing a transfer get Otherwise, there are artifacts when running Unigine Valley with protocol version 2. We can get away with not waiting for most buffers, but let's be conservative. Signed-off-by: Gurchetan Singh <gurchetansingh@chromium.org> Reviewed-By: Gert Wollny <gert.wollny@collabora.com> Reviewed-By: Piotr Rak <p.rak@samsung.com>	2019-04-18 15:39:18 -07:00
Gurchetan Singh	581ab2bc70	virgl/vtest: modify sending and receiving data for shared memory We need to copy the shared memory region to the display target. Signed-off-by: Gurchetan Singh <gurchetansingh@chromium.org> Reviewed-By: Gert Wollny <gert.wollny@collabora.com> Reviewed-By: Piotr Rak <p.rak@samsung.com>	2019-04-18 15:39:12 -07:00
Gurchetan Singh	96c3418e06	virgl/vtest: receive and handle shared memory fd The only tricky part is with protocol 0 we can either have a display target or resource backing store. With protocol 2 we can have both. Make the map/unmap functions only deal with the resource backing store. v2: Handle MSAA texture case. v3: spelling v4: Fix dangling else (@prak) v5: mmap --> os_mmap (@prak) + added comments (@gerddie) Signed-off-by: Gurchetan Singh <gurchetansingh@chromium.org> Reviewed-By: Gert Wollny <gert.wollny@collabora.com> Reviewed-By: Piotr Rak <p.rak@samsung.com>	2019-04-18 15:39:05 -07:00
Gurchetan Singh	9a638bc7c2	virgl/vtest: plumb support for shared memory Signed-off-by: Gurchetan Singh <gurchetansingh@chromium.org> Reviewed-By: Gert Wollny <gert.wollny@collabora.com> Reviewed-By: Piotr Rak <p.rak@samsung.com>	2019-04-18 15:38:58 -07:00
Gurchetan Singh	9881733e32	virgl/vtest: add utilities for receiving fds v2: recieve --> receive (airlied@) Signed-off-by: Gurchetan Singh <gurchetansingh@chromium.org> Reviewed-By: Gert Wollny <gert.wollny@collabora.com> Reviewed-By: Piotr Rak <p.rak@samsung.com>	2019-04-18 15:38:52 -07:00
Gurchetan Singh	0dd661777a	virgl/vtest: execute a transfer_get when flushing the front buffer This just moves everything to a helper function -- "flush_front_buffer" will be used later. virgl_vtest_resource_map / virgl_vtest_resource_unmap already take care to map the display target. Signed-off-by: Gurchetan Singh <gurchetansingh@chromium.org> Reviewed-By: Gert Wollny <gert.wollny@collabora.com> Reviewed-By: Piotr Rak <p.rak@samsung.com>	2019-04-18 15:38:44 -07:00
Gurchetan Singh	599d55371c	virgl: wait after a flush We really need to wait under certain circumstances, or we can end up writing to memory at the same time the host is reading. Partial revert of d6dc68 ("virgl: use uint16_t mask instead of separate booleans"). Test cases: - dEQP-GLES31.functional.texture.texture_buffer.render_modify.as_vertex_array.bufferdata on vtest protocol version 2 - Flickering during Alien Isolation Fixes: d6dc68 ("virgl: use uint16_t mask instead of separate booleans") Signed-off-by: Gurchetan Singh <gurchetansingh@chromium.org> Reviewed-By: Gert Wollny <gert.wollny@collabora.com> Reviewed-By: Piotr Rak <p.rak@samsung.com>	2019-04-18 15:38:04 -07:00
Lionel Landwerlin	dfd79079da	anv: fix uninitialized pthread cond clock domain Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `843775bab7` ("anv: Rework fences") Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-04-18 23:23:03 +01:00
Kristian H. Kristensen	e731f2648d	.gitignore: Remove autotool artifacts Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com> Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>	2019-04-18 14:12:43 -07:00
Eric Anholt	12f6c34806	v3d: Fix atomic cmpxchg in shaders on hardware. In what might be my first case of finding a divergence between hardware and simpenrose for v3d 4.x, it seems that despite what the spec claims, you actually need specific values in the TYPE field for atomic ops. Fixes dEQP-GLES31.functional..compswap.	2019-04-18 13:24:55 -07:00
Eric Anholt	1ce143ca19	v3d: Fix an invalid reuse of flags generation from before a thrsw. Noticed while debugging the last GLES 3.1 failure, though it doesn't seem to affect that bug.	2019-04-18 13:24:55 -07:00
Jason Ekstrand	db4a70e678	anv: Drop some unneeded ANV_FROM_HANDLE for physical devices Ever since `48ed2a7bb0`, we've had one at the top of the function. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-04-18 20:12:57 +00:00
Jason Ekstrand	981209d175	anv: Re-sort the GetPhysicalDeviceFeatures2 switch statement Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-04-18 20:12:57 +00:00
Marek Olšák	7bc33a5cd5	radeonsi/gfx9: use the correct condition for the DPBB + QUANT_MODE workaround Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-04-18 15:58:45 -04:00
Ian Romanick	6b97fa9a99	nir/algebraic: Strength reduce some compares of x and -x Converting the x vs -x comparison to an x vs 0 comparison enables cmod propagation to help. This seems to be a win everywhere except Gen7. Skylake and Broadwell had similar results. (Broadwell shown) total instructions in shared programs: 15566733 -> 15566014 (<.01%) instructions in affected programs: 72617 -> 71898 (-0.99%) helped: 302 HURT: 0 helped stats (abs) min: 1 max: 8 x̄: 2.38 x̃: 2 helped stats (rel) min: 0.15% max: 7.69% x̄: 1.28% x̃: 0.98% 95% mean confidence interval for instructions value: -2.55 -2.21 95% mean confidence interval for instructions %-change: -1.40% -1.16% Instructions are helped. total cycles in shared programs: 413014786 -> 413015475 (<.01%) cycles in affected programs: 707594 -> 708283 (0.10%) helped: 227 HURT: 101 helped stats (abs) min: 1 max: 612 x̄: 36.07 x̃: 20 helped stats (rel) min: 0.04% max: 19.39% x̄: 2.25% x̃: 1.49% HURT stats (abs) min: 2 max: 334 x̄: 87.90 x̃: 45 HURT stats (rel) min: 0.07% max: 14.51% x̄: 4.54% x̃: 3.36% 95% mean confidence interval for cycles value: -8.12 12.32 95% mean confidence interval for cycles %-change: -0.67% 0.34% Inconclusive result (value mean confidence interval includes 0). Haswell and Ivy Bridge had similar results. (Haswell shown) total instructions in shared programs: 13828220 -> 13827881 (<.01%) instructions in affected programs: 60887 -> 60548 (-0.56%) helped: 253 HURT: 6 helped stats (abs) min: 1 max: 5 x̄: 1.36 x̃: 1 helped stats (rel) min: 0.16% max: 3.85% x̄: 0.81% x̃: 0.64% HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 HURT stats (rel) min: 0.26% max: 0.89% x̄: 0.47% x̃: 0.27% 95% mean confidence interval for instructions value: -1.39 -1.23 95% mean confidence interval for instructions %-change: -0.85% -0.70% Instructions are helped. 
total cycles in shared programs: 386870095 -> 386894412 (<.01%) cycles in affected programs: 1537307 -> 1561624 (1.58%) helped: 127 HURT: 188 helped stats (abs) min: 1 max: 381 x̄: 17.89 x̃: 4 helped stats (rel) min: 0.02% max: 14.33% x̄: 1.00% x̃: 0.33% HURT stats (abs) min: 2 max: 5585 x̄: 141.43 x̃: 14 HURT stats (rel) min: 0.03% max: 11.50% x̄: 1.65% x̃: 1.06% 95% mean confidence interval for cycles value: 21.95 132.45 95% mean confidence interval for cycles %-change: 0.32% 0.85% Cycles are HURT. Sandy Bridge total instructions in shared programs: 10896339 -> 10896276 (<.01%) instructions in affected programs: 10757 -> 10694 (-0.59%) helped: 49 HURT: 0 helped stats (abs) min: 1 max: 2 x̄: 1.29 x̃: 1 helped stats (rel) min: 0.12% max: 1.85% x̄: 0.87% x̃: 0.89% 95% mean confidence interval for instructions value: -1.42 -1.15 95% mean confidence interval for instructions %-change: -1.03% -0.72% Instructions are helped. total cycles in shared programs: 155091003 -> 155090480 (<.01%) cycles in affected programs: 102761 -> 102238 (-0.51%) helped: 51 HURT: 0 helped stats (abs) min: 1 max: 36 x̄: 10.25 x̃: 4 helped stats (rel) min: 0.02% max: 2.57% x̄: 0.76% x̃: 0.36% 95% mean confidence interval for cycles value: -12.98 -7.53 95% mean confidence interval for cycles %-change: -0.97% -0.56% Cycles are helped. Iron Lake and GM45 had similar results. (Iron Lake shown) total instructions in shared programs: 8234667 -> 8234652 (<.01%) instructions in affected programs: 2063 -> 2048 (-0.73%) helped: 15 HURT: 0 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 0.30% max: 1.56% x̄: 0.82% x̃: 0.81% 95% mean confidence interval for instructions value: -1.00 -1.00 95% mean confidence interval for instructions %-change: -0.97% -0.67% Instructions are helped. 
total cycles in shared programs: 188700906 -> 188700598 (<.01%) cycles in affected programs: 283480 -> 283172 (-0.11%) helped: 83 HURT: 3 helped stats (abs) min: 2 max: 8 x̄: 3.78 x̃: 4 helped stats (rel) min: 0.04% max: 0.55% x̄: 0.15% x̃: 0.12% HURT stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2 HURT stats (rel) min: 0.02% max: 0.04% x̄: 0.03% x̃: 0.04% 95% mean confidence interval for cycles value: -3.87 -3.29 95% mean confidence interval for cycles %-change: -0.16% -0.12% Cycles are helped. Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-04-18 12:37:48 -07:00
Ian Romanick	f3d6df719c	nir/algebraic: Fix some 1-bit Boolean weirdness Skylake, Broadwell, and Haswell had similar results. (Skylake shown) total cycles in shared programs: 372594532 -> 372594460 (<.01%) cycles in affected programs: 46854 -> 46782 (-0.15%) helped: 9 HURT: 0 helped stats (abs) min: 2 max: 22 x̄: 8.00 x̃: 2 helped stats (rel) min: 0.02% max: 0.41% x̄: 0.16% x̃: 0.09% 95% mean confidence interval for cycles value: -14.34 -1.66 95% mean confidence interval for cycles %-change: -0.28% -0.04% Cycles are helped. Ivy Bridge total instructions in shared programs: 12038379 -> 12038373 (<.01%) instructions in affected programs: 1278 -> 1272 (-0.47%) helped: 3 HURT: 0 helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2 helped stats (rel) min: 0.31% max: 0.77% x̄: 0.54% x̃: 0.55% total cycles in shared programs: 180889027 -> 180888997 (<.01%) cycles in affected programs: 29979 -> 29949 (-0.10%) helped: 5 HURT: 0 helped stats (abs) min: 1 max: 16 x̄: 6.00 x̃: 5 helped stats (rel) min: 0.02% max: 0.34% x̄: 0.11% x̃: 0.07% 95% mean confidence interval for cycles value: -13.40 1.40 95% mean confidence interval for cycles %-change: -0.27% 0.05% Inconclusive result (value mean confidence interval includes 0). Sandy Bridge total cycles in shared programs: 155091021 -> 155091003 (<.01%) cycles in affected programs: 8842 -> 8824 (-0.20%) helped: 2 HURT: 0 No changes on Iron Lake or GM45. Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-04-18 12:37:48 -07:00
Ian Romanick	403aac7500	nir/algebraic: Replace a pattern where iand with a Boolean is used as a bcsel All of the affected shaders are in Mad Max. I noticed this while looking at some other things. I tried a couple similar patterns, but the effect on cycles was generally negative. It may be worth revisiting this later. v2: Rebase on 1-bit Boolean changes. All Gen7+ platforms had similar results. (Skylake shown) total instructions in shared programs: 15282073 -> 15282053 (<.01%) instructions in affected programs: 1192 -> 1172 (-1.68%) helped: 14 HURT: 0 helped stats (abs) min: 1 max: 2 x̄: 1.43 x̃: 1 helped stats (rel) min: 1.16% max: 2.17% x̄: 1.65% x̃: 1.39% 95% mean confidence interval for instructions value: -1.73 -1.13 95% mean confidence interval for instructions %-change: -1.91% -1.38% Instructions are helped. total cycles in shared programs: 372595954 -> 372594532 (<.01%) cycles in affected programs: 11477 -> 10055 (-12.39%) helped: 14 HURT: 0 helped stats (abs) min: 76 max: 122 x̄: 101.57 x̃: 104 helped stats (rel) min: 7.76% max: 15.62% x̄: 12.94% x̃: 14.78% 95% mean confidence interval for cycles value: -111.05 -92.09 95% mean confidence interval for cycles %-change: -14.90% -10.98% Cycles are helped. No changes on any Gen6 or earlier platforms. Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-04-18 12:37:48 -07:00
Ian Romanick	25bfba3335	nir/algebraic: Recognize open-coded copysign(1.0, a) All of the affected shaders are in Mad Max. The inner part of the pattern is itself an open-coded sign(a). I tried using that as a pattern, but the results were not good. A bunch of shaders were helped for instructions, but overall cycles, spill, and fills were hurt. v2: Rebase on 1-bit Boolean changes. v3: Fix order of copysign() parameters in comments and commit message. Noticed by Matt. All Gen7+ platforms had similar results. (Skylake shown) total instructions in shared programs: 15282141 -> 15282073 (<.01%) instructions in affected programs: 6106 -> 6038 (-1.11%) helped: 17 HURT: 0 helped stats (abs) min: 4 max: 4 x̄: 4.00 x̃: 4 helped stats (rel) min: 1.02% max: 2.20% x̄: 1.15% x̃: 1.06% 95% mean confidence interval for instructions value: -4.00 -4.00 95% mean confidence interval for instructions %-change: -1.30% -1.00% Instructions are helped. total cycles in shared programs: 372597886 -> 372595954 (<.01%) cycles in affected programs: 32701 -> 30769 (-5.91%) helped: 17 HURT: 0 helped stats (abs) min: 6 max: 216 x̄: 113.65 x̃: 118 helped stats (rel) min: 0.40% max: 21.86% x̄: 6.20% x̃: 5.83% 95% mean confidence interval for cycles value: -152.84 -74.45 95% mean confidence interval for cycles %-change: -8.89% -3.51% Cycles are helped. No changes on any Gen6 or earlier platforms. Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-04-18 12:37:48 -07:00
Ian Romanick	1711bf6cf2	intel/fs: Generate better code for fsign multiplied by a value v2: Rebase on v2 changes in previous two commits. v3: Rebase on `85c35885b3` ("nir: Rework nir_src_as_alu_instr to not take a pointer"). shader-db results: Skylake and Broadwell had similar results. (Skylake shown) total instructions in shared programs: 15297100 -> 15282141 (-0.10%) instructions in affected programs: 956685 -> 941726 (-1.56%) helped: 4527 HURT: 0 helped stats (abs) min: 1 max: 221 x̄: 3.30 x̃: 2 helped stats (rel) min: 0.07% max: 10.53% x̄: 1.85% x̃: 1.37% 95% mean confidence interval for instructions value: -3.48 -3.12 95% mean confidence interval for instructions %-change: -1.88% -1.81% Instructions are helped. total cycles in shared programs: 372809551 -> 372597886 (-0.06%) cycles in affected programs: 13645512 -> 13433847 (-1.55%) helped: 4362 HURT: 125 helped stats (abs) min: 1 max: 2088 x̄: 50.73 x̃: 28 helped stats (rel) min: 0.01% max: 28.20% x̄: 2.77% x̃: 2.39% HURT stats (abs) min: 1 max: 1836 x̄: 76.90 x̃: 28 HURT stats (rel) min: <.01% max: 34.36% x̄: 3.03% x̃: 1.42% 95% mean confidence interval for cycles value: -50.98 -43.37 95% mean confidence interval for cycles %-change: -2.67% -2.55% Cycles are helped. total spills in shared programs: 23465 -> 23463 (<.01%) spills in affected programs: 42 -> 40 (-4.76%) helped: 1 HURT: 0 total fills in shared programs: 31766 -> 31763 (<.01%) fills in affected programs: 69 -> 66 (-4.35%) helped: 1 HURT: 0 Haswell total instructions in shared programs: 13839992 -> 13828311 (-0.08%) instructions in affected programs: 712503 -> 700822 (-1.64%) helped: 3477 HURT: 0 helped stats (abs) min: 1 max: 221 x̄: 3.36 x̃: 2 helped stats (rel) min: 0.07% max: 10.64% x̄: 1.96% x̃: 1.52% 95% mean confidence interval for instructions value: -3.58 -3.14 95% mean confidence interval for instructions %-change: -2.01% -1.92% Instructions are helped. 
total cycles in shared programs: 387026330 -> 386872483 (-0.04%) cycles in affected programs: 11329966 -> 11176119 (-1.36%) helped: 3307 HURT: 139 helped stats (abs) min: 2 max: 1776 x̄: 49.58 x̃: 18 helped stats (rel) min: 0.01% max: 20.38% x̄: 2.27% x̃: 1.79% HURT stats (abs) min: 1 max: 2314 x̄: 72.68 x̃: 20 HURT stats (rel) min: <.01% max: 33.99% x̄: 2.28% x̃: 0.96% 95% mean confidence interval for cycles value: -49.31 -39.98 95% mean confidence interval for cycles %-change: -2.15% -2.01% Cycles are helped. LOST: 1 GAINED: 0 Ivy Bridge total instructions in shared programs: 12045602 -> 12038463 (-0.06%) instructions in affected programs: 623837 -> 616698 (-1.14%) helped: 2498 HURT: 0 helped stats (abs) min: 1 max: 39 x̄: 2.86 x̃: 2 helped stats (rel) min: 0.05% max: 10.00% x̄: 1.30% x̃: 1.05% 95% mean confidence interval for instructions value: -2.96 -2.75 95% mean confidence interval for instructions %-change: -1.34% -1.26% Instructions are helped. total cycles in shared programs: 181025675 -> 180891323 (-0.07%) cycles in affected programs: 11329329 -> 11194977 (-1.19%) helped: 2439 HURT: 47 helped stats (abs) min: 1 max: 1565 x̄: 57.06 x̃: 26 helped stats (rel) min: 0.02% max: 24.56% x̄: 2.02% x̃: 1.64% HURT stats (abs) min: 1 max: 1269 x̄: 102.51 x̃: 43 HURT stats (rel) min: 0.11% max: 52.94% x̄: 4.15% x̃: 1.34% 95% mean confidence interval for cycles value: -59.91 -48.17 95% mean confidence interval for cycles %-change: -1.99% -1.82% Cycles are helped. Sandy Bridge, Iron Lake, and GM45 had similar results. (Sandy Bridge shown) total instructions in shared programs: 10896368 -> 10896339 (<.01%) instructions in affected programs: 3767 -> 3738 (-0.77%) helped: 17 HURT: 0 helped stats (abs) min: 1 max: 4 x̄: 1.71 x̃: 1 helped stats (rel) min: 0.13% max: 9.52% x̄: 3.58% x̃: 2.73% 95% mean confidence interval for instructions value: -2.27 -1.14 95% mean confidence interval for instructions %-change: -5.14% -2.03% Instructions are helped. 
total cycles in shared programs: 155091109 -> 155091021 (<.01%) cycles in affected programs: 47241 -> 47153 (-0.19%) helped: 15 HURT: 8 helped stats (abs) min: 2 max: 81 x̄: 15.73 x̃: 4 helped stats (rel) min: 0.03% max: 10.59% x̄: 1.55% x̃: 0.71% HURT stats (abs) min: 14 max: 32 x̄: 18.50 x̃: 17 HURT stats (rel) min: 0.32% max: 2.79% x̄: 2.43% x̃: 2.71% 95% mean confidence interval for cycles value: -14.59 6.93 95% mean confidence interval for cycles %-change: -1.41% 1.08% Inconclusive result (value mean confidence interval includes 0). Reviewed-by: Matt Turner <mattst88@gmail.com> [v2]	2019-04-18 12:38:05 -07:00
Ian Romanick	06d2c11641	intel/fs: Add a scale factor to emit_fsign Normally fsign generates -1, 0, or +1. The new scale factor, S, causes fsign to generate -S, 0, or +S. v2: Rebase on v2 changes in previous commit. v3: Rebase on `85c35885b3` ("nir: Rework nir_src_as_alu_instr to not take a pointer"). Reviewed-by: Matt Turner <mattst88@gmail.com> [v2]	2019-04-18 12:37:48 -07:00
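The scale-factor behaviour described in this commit can be illustrated with a scalar reference function. This is only a sketch of the stated semantics (-S, 0, or +S); the name `scaled_fsign` is hypothetical and the real `emit_fsign` generates GPU instructions rather than computing a value on the CPU.

```c
#include <assert.h>

/* Hypothetical scalar reference, not the backend code itself:
 * returns -scale, 0 or +scale depending on the sign of a. */
static float scaled_fsign(float a, float scale)
{
    if (a > 0.0f)
        return scale;
    if (a < 0.0f)
        return -scale;
    /* Covers +0.0 and -0.0; NaN also lands here in this sketch. */
    return 0.0f;
}
```

With a scale of 1.0 this reduces to the usual fsign result of -1, 0, or +1.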
Ian Romanick	ad98fbc217	intel/fs: Refactor code generation for nir_op_fsign to its own function v2: Call emit_fsign from inside the existing switch statement. Suggested by Matt. Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-04-18 12:37:48 -07:00
Ian Romanick	90430d0488	intel/fs: Eliminate dead code first This simplifies the later patch "i965/fs: Generate better code for fsign multiplied by a value". shader-db results: Broadwell and Skylake had similar results. (Skylake shown) total cycles in shared programs: 372808735 -> 372809551 (<.01%) cycles in affected programs: 1519520 -> 1520336 (0.05%) helped: 243 HURT: 277 helped stats (abs) min: 1 max: 226 x̄: 34.05 x̃: 5 helped stats (rel) min: 0.01% max: 13.88% x̄: 1.46% x̃: 0.27% HURT stats (abs) min: 1 max: 1810 x̄: 32.82 x̃: 5 HURT stats (rel) min: 0.01% max: 16.03% x̄: 1.56% x̃: 0.29% 95% mean confidence interval for cycles value: -7.18 10.32 95% mean confidence interval for cycles %-change: -0.17% 0.46% Inconclusive result (value mean confidence interval includes 0). Sandy Bridge, Haswell and Ivy Bridge had similar results. (Sandy Bridge shown) total cycles in shared programs: 155091458 -> 155091109 (<.01%) cycles in affected programs: 370797 -> 370448 (-0.09%) helped: 24 HURT: 36 helped stats (abs) min: 1 max: 331 x̄: 103.17 x̃: 41 helped stats (rel) min: 0.02% max: 7.70% x̄: 2.07% x̃: 0.56% HURT stats (abs) min: 1 max: 291 x̄: 59.08 x̃: 10 HURT stats (rel) min: 0.02% max: 5.29% x̄: 1.02% x̃: 0.15% 95% mean confidence interval for cycles value: -37.92 26.28 95% mean confidence interval for cycles %-change: -0.88% 0.45% Inconclusive result (value mean confidence interval includes 0). Iron Lake and GM45 had similar results. 
(GM45 shown) total cycles in shared programs: 129133970 -> 129133978 (<.01%) cycles in affected programs: 111966 -> 111974 (<.01%) helped: 3 HURT: 1 helped stats (abs) min: 2 max: 4 x̄: 2.67 x̃: 2 helped stats (rel) min: <.01% max: <.01% x̄: <.01% x̃: <.01% HURT stats (abs) min: 16 max: 16 x̄: 16.00 x̃: 16 HURT stats (rel) min: 0.07% max: 0.07% x̄: 0.07% x̃: 0.07% 95% mean confidence interval for cycles value: -12.93 16.93 95% mean confidence interval for cycles %-change: -0.05% 0.08% Inconclusive result (value mean confidence interval includes 0). Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-04-18 12:37:48 -07:00
Kristian H. Kristensen	a90aa14f5a	freedreno: Fix format string warning Modifiers are uint64_t. Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>	2019-04-18 11:46:13 -07:00
Kristian H. Kristensen	9c82a55efc	freedreno/a6xx: Add helper for incrementing regid Increments the regid by specified amount unless regid is r63.x (invalid). Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>	2019-04-18 11:46:13 -07:00
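A minimal sketch of the helper described above, assuming a register id packs the register number in the upper bits and the component in the low two bits, so that r63.x is the invalid marker. The macro and function names here are hypothetical and are not taken from the freedreno sources.

```c
#include <stdint.h>

/* Assumed encoding for this sketch: register number shifted left by two,
 * component (x/y/z/w) in the low two bits. */
#define SKETCH_REGID(r, c) ((uint32_t)(((r) << 2) | (c)))
#define SKETCH_INVALID_REG SKETCH_REGID(63, 0) /* r63.x */

/* Advance a regid by `increment` components, leaving the invalid
 * marker untouched so it never turns into a valid-looking register. */
static uint32_t sketch_next_regid(uint32_t reg, uint32_t increment)
{
    if (reg == SKETCH_INVALID_REG)
        return SKETCH_INVALID_REG;
    return reg + increment;
}
```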
Kristian H. Kristensen	6aa211b316	freedreno: Use enum values from matching enum We get a couple of warnings from using mismatched enum values. This fixes that. Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>	2019-04-18 11:46:13 -07:00
Kristian H. Kristensen	c34b285b38	freedreno/a2xx: Fix redundant if statement We test the condition, declare a few variables, then test the exact same condition again. Let's not do that. Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>	2019-04-18 11:46:13 -07:00
Kristian H. Kristensen	18ce6ac632	freedreno/ir3: Mark ir3_context_error() as NORETURN Fixes a few warnings. Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>	2019-04-18 11:46:13 -07:00
Jason Ekstrand	c6463f8ac2	nir: Add a nir_src_as_intrinsic() helper Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-04-18 17:12:44 +00:00
Jason Ekstrand	85c35885b3	nir: Rework nir_src_as_alu_instr to not take a pointer Other nir_src_as_* functions just take a nir_src. It's not that much more memory copying and the constness preserving really isn't worth the cognitive dissonance. Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-04-18 17:12:44 +00:00
Jason Ekstrand	eee994e769	nir: Drop "struct" from some nir_* declarations Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-04-18 17:12:44 +00:00
Lionel Landwerlin	db5b372bb9	anv: implement WaEnableStateCacheRedirectToCS This 3d performance workaround was initially put in the kernel but the media driver requires different settings so the register has been whitelisted in i915 [1] and userspace drivers are left initializing it as they wish. [1] : https://patchwork.freedesktop.org/series/59494/ Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>	2019-04-18 17:43:08 +01:00
Lionel Landwerlin	eaadb62c9e	i965: implement WaEnableStateCacheRedirectToCS This 3d performance workaround was initially put in the kernel but the media driver requires different settings so the register has been whitelisted in i915 [1] and userspace drivers are left initializing it as they wish. [1] : https://patchwork.freedesktop.org/series/59494/ Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>	2019-04-18 17:43:08 +01:00
Lionel Landwerlin	d1be67db39	iris: implement WaEnableStateCacheRedirectToCS This 3d performance workaround was initially put in the kernel but the media driver requires different settings so the register has been whitelisted in i915 [1] and userspace drivers are left initializing it as they wish. [1] : https://patchwork.freedesktop.org/series/59494/ Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>	2019-04-18 17:43:08 +01:00
Iago Toral Quiroga	c2b8fb9a81	anv/device: expose VK_KHR_shader_float16_int8 in gen8+ v2 (Jason): - Merge shaderFloat16 and shaderInt8 enablement into a single patch. - Merge extension enable. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v1)	2019-04-18 13:23:03 +02:00
Iago Toral Quiroga	5a5d44b713	anv/pipeline: support Float16 and Int8 SPIR-V capabilities in gen8+ v2: - Merge Float16 and Int8 capabilities into a single patch (Jason) - Merged patch that enabled SPIR-V front-end checks for these caps (except for Int8, which was already merged) v3: - Keep capabilities sorted (Jason) v4: - SpvCapabilityFloat16 support already added in master (Juan) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v1)	2019-04-18 13:23:03 +02:00
Iago Toral Quiroga	e6ee07a664	compiler/spirv: move the check for Int8 capability So it is right after the checks for the other various Int* capabilities. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-04-18 13:23:03 +02:00
Iago Toral Quiroga	8ed6d74c92	intel/compiler: validate region restrictions for mixed float mode v2: - Adapted unit tests to make them consistent with the changes done to the validation of half-float conversions. v3 (Curro): - Check all the accumulators - Constify declarations - Do not check src1 type in single-source instructions. - Check for all instructions that read accumulator (either implicitly or explicitly) - Check restrictions in src1 too. - Merge conditional block - Add invalid test case. v4 (Curro): - Assert on 3-src instructions, as they are not validated. - Get rid of types_are_mixed_float(), as we know instruction is mixed float at that point. - Remove conditions from not verified case. - Fix brackets on conditional. Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2019-04-18 13:22:46 +02:00
Iago Toral Quiroga	58d6417e59	intel/compiler: validate conversions between 64-bit and 8-bit types v2: - Add some tests with UB type too (Jason) v3: - consider implicit conversions from 2src instructions too (Curro). v4: - Do not check src1 type in single-source instructions (Curro). Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v2)	2019-04-18 11:05:18 +02:00
Iago Toral Quiroga	7376d57a9c	intel/compiler: validate region restrictions for half-float conversions v2: - Consider implicit conversions in 2-src instructions too (Curro) - For restrictions that involve destination stride requirements only validate them for Align1, since Align16 always requires packed data. - Skip general rule for the dst/execution type size ratio for mixed float instructions on CHV and SKL+, these have their own set of rules that will be validated separately. v3 (Curro): - Do not check src1 type in single-source instructions. - Check restriction on src1. - Remove invalid test. Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2019-04-18 11:05:18 +02:00
Iago Toral Quiroga	6ff52f0628	intel/compiler: also set F execution type for mixed float mode in BDW The section 'Execution Data Types' of 3D Media GPGPU volume, which describes execution types, is exactly the same in BDW and SKL+. Also, this section states that there is a single execution type, so it makes sense that this is the wider of the two floating point types involved in mixed float mode, which is what we do for SKL+ and CHV. v2: - Make sure we also account for the destination type in mixed mode (Curro). Acked-by: Francisco Jerez <currojerez@riseup.net>	2019-04-18 11:05:18 +02:00
Iago Toral Quiroga	100debc3c9	intel/compiler: implement SIMD16 restrictions for mixed-float instructions v2: f32to16/f16to32 can use a :W destination (Curro) v3: check destination is packed (Curro). Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2019-04-18 11:05:18 +02:00
Iago Toral Quiroga	6d87c651c9	intel/compiler: skip MAD algebraic optimization for half-float or mixed mode It is very likely that this optimization is never useful and we'll probably just end up removing it, so let's not bother adding more cases to it for now. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-04-18 11:05:18 +02:00
Iago Toral Quiroga	64b93292ac	intel/compiler: remove inexact algebraic optimizations from the backend NIR already has these and correctly considers exact/inexact qualification, whereas the backend doesn't and can apply the optimizations where it shouldn't. This happened to be the case in a handful of Tomb Raider shaders, where NIR would skip the optimizations because of a precise qualification but the backend would then (incorrectly) apply them anyway. Besides this, considering that we are not emitting much math in the backend these days it is unlikely that these optimizations are useful in general. A shader-db run confirms that MAD and LRP optimizations, for example, were only being triggered in cases where NIR would skip them due to precise requirements, so in the near future we might want to remove more of these, but for now we just remove the ones that are not completely correct. Suggested-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-04-18 11:05:18 +02:00
Iago Toral Quiroga	ddd1706ab3	intel/compiler: fix cmod propagation for non 32-bit types v2: - Do not propagate if the bit-size changes Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-04-18 11:05:18 +02:00
Iago Toral Quiroga	66002eeebe	intel/compiler: add a brw_reg_type_is_integer helper v2: - Fixed typo: meant BRW_REGISTER_TYPE_UB instead BRW_REGISTER_TYPE_UV Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v1)	2019-04-18 11:05:18 +02:00
Iago Toral Quiroga	44e1affaec	intel/compiler: implement is_zero, is_one, is_negative_one for 8-bit/16-bit There are no 8-bit immediates, so assert in that case. 16-bit immediates are replicated in each word of a 32-bit immediate, so we only need to check the lower 16-bits. v2: - Fix is_zero with half-float to consider -0 as well (Jason). - Fix is_negative_one for word type. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-04-18 11:05:18 +02:00
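The 16-bit case described above can be sketched as follows, assuming the immediate is held as a raw 32-bit value with the half-float bit pattern replicated in both words. The function names are hypothetical; the constants are the IEEE 754 binary16 encodings of 1.0 (0x3c00) and -1.0 (0xbc00).

```c
#include <stdbool.h>
#include <stdint.h>

/* Only the low 16 bits are inspected because the high word is a copy. */

/* Masking off the sign bit makes both +0.0 (0x0000) and -0.0 (0x8000) match. */
static bool hf_imm_is_zero(uint32_t imm)
{
    return (imm & 0x7fffu) == 0;
}

static bool hf_imm_is_one(uint32_t imm)
{
    return (imm & 0xffffu) == 0x3c00u;
}

static bool hf_imm_is_negative_one(uint32_t imm)
{
    return (imm & 0xffffu) == 0xbc00u;
}
```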
Iago Toral Quiroga	e64be391dd	intel/compiler: generalize the combine constants pass At the very least we need it to handle HF too, since we are doing constant propagation for MAD and LRP, which relies on this pass to promote the immediates to GRF in the end, but ideally we want it to support even more types so we can take advantage of it to improve register pressure in some scenarios. v2 (Jason): - Support 64-bit types too. - Check if we need to set the half-float flag if the immediate already existed. - Multiply the size of the immediate by the width of the copy Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-04-18 11:05:18 +02:00
Iago Toral Quiroga	fb990bd76e	intel/eu: force stride of 2 on NULL register for Byte instructions The hardware only allows a stride of 1 on a Byte destination for raw byte MOV instructions. This is required even when the destination is the NULL register. Rather than making sure that we emit a proper NULL:B destination every time we need one, just fix it at emission time. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-04-18 11:05:18 +02:00
Iago Toral Quiroga	ce68a061de	intel/compiler: ask for an integer type if requesting an 8-bit type v2: - Assign BRW_REGISTER_TYPE_B directly for 8-bit (Jason) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-04-18 11:05:18 +02:00
Iago Toral Quiroga	092b147774	intel/compiler: rework conversion opcodes Now that we have the regioning lowering pass we can just put all of these opcodes together in a single block and we can just assert on the few cases of conversion instructions that are not supported in hardware and that should be lowered in brw_nir_lower_conversions. The only cases that we still handle separately are the conversions from float to half-float since the rounding variants would need to fallthrough and we are already doing this for boolean opcodes (since they need to negate), plus there is also a large comment about these opcodes that we probably want to keep so it is just easier to keep these separate. Suggested-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-04-18 11:05:18 +02:00
Iago Toral Quiroga	472244b374	intel/compiler: activate 16-bit bit-size lowerings also for 8-bit Particularly, we need the same lowerings we use for 16-bit integers. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-04-18 11:05:18 +02:00
Iago Toral Quiroga	40b3abb4d1	intel/compiler: split is_partial_write() into two variants This function is used in two different scenarios that for 32-bit instructions are the same, but for 16-bit instructions are not. One scenario is that in which we are working at a SIMD8 register level and we need to know if a register is fully defined or written. This is useful, for example, in the context of liveness analysis or register allocation, where we work with units of registers. The other scenario is that in which we want to know if an instruction is writing a full scalar component or just some subset of it. This is useful, for example, in the context of some optimization passes like copy propagation. For 32-bit instructions (or larger), a SIMD8 dispatch will always write at least a full SIMD8 register (32B) if the write is not partial. The function is_partial_write() checks this to determine if we have a partial write. However, when we deal with 16-bit instructions, that logic disables some optimizations that should be safe. For example, a SIMD8 16-bit MOV will only update half of a SIMD register, but it is still a complete write of the variable for a SIMD8 dispatch, so we should not prevent copy propagation in this scenario because we don't write all 32 bytes in the SIMD register or because the write starts at offset 16B (where we pack components Y or W of 16-bit vectors). This is a problem for SIMD8 executions (VS, TCS, TES, GS) of 16-bit instructions, which lose a number of optimizations because of this, most important of which is copy-propagation. This patch splits is_partial_write() into is_partial_reg_write(), which represents the current is_partial_write(), useful for things like liveness analysis, and is_partial_var_write(), which considers the dispatch size to check if we are writing a full variable (rather than a full register) to decide if the write is partial or not, which is what we really want in many optimization passes.
Then the patch goes on and rewrites all uses of is_partial_write() to use one or the other version. Specifically, we use is_partial_var_write() in the following places: copy propagation, cmod propagation, common subexpression elimination, saturate propagation and sel peephole. Notice that the semantics of is_partial_var_write() exactly match the current implementation of is_partial_write() for anything that is 32-bit or larger, so no changes are expected for 32-bit instructions. Tested against ~5000 tests involving 16-bit instructions in CTS produced the following changes in instruction counts: Patched \| Master \| % \| ================================================ SIMD8 \| 621,900 \| 706,721 \| -12.00% \| ================================================ SIMD16 \| 93,252 \| 93,252 \| 0.00% \| ================================================ As expected, the change only affects SIMD8 dispatches. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2019-04-18 11:05:18 +02:00
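The distinction between the two variants can be sketched with a simplified model. This is an illustration under stated assumptions, not the backend implementation: the struct, its fields, and the function names are hypothetical, a register is assumed to be 32 bytes, and details such as non-contiguous destinations are left out.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_REG_SIZE 32u /* bytes in one SIMD8 register (assumed) */

/* Hypothetical, simplified description of an instruction's destination write. */
struct sketch_write {
    uint32_t exec_size;  /* channels written by the instruction */
    uint32_t type_size;  /* bytes per channel */
    uint32_t dst_offset; /* byte offset of the destination */
    bool predicated;
};

/* Register-level view: anything short of a whole, aligned register is partial. */
static bool sketch_is_partial_reg_write(const struct sketch_write *w)
{
    return w->predicated ||
           w->exec_size * w->type_size < SKETCH_REG_SIZE ||
           w->dst_offset % SKETCH_REG_SIZE != 0;
}

/* Variable-level view: compare against the size of one variable for this
 * dispatch width, capped at one register. */
static bool sketch_is_partial_var_write(const struct sketch_write *w,
                                        uint32_t dispatch_width)
{
    uint32_t var_size = dispatch_width * w->type_size;
    if (var_size > SKETCH_REG_SIZE)
        var_size = SKETCH_REG_SIZE;
    return w->predicated ||
           w->exec_size * w->type_size < var_size ||
           w->dst_offset % var_size != 0;
}
```

In this model a SIMD8 16-bit MOV at byte offset 16 is a partial register write but a complete variable write, while for 32-bit channels at SIMD8 or wider the two checks agree because the variable size equals the register size.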
Iago Toral Quiroga	0986199b31	intel/compiler: workaround for SIMD8 half-float MAD in gen8 Empirical testing shows that gen8 has a bug where MAD instructions with a half-float source starting at a non-zero offset fail to execute properly. This scenario usually happened in SIMD8 executions, where we used to pack vector components Y and W in the second half of SIMD registers (therefore, with a 16B offset). It looks like we are not currently doing this any more but this would handle the situation properly if we ever happen to produce code like this again. v2 (Jason): - Move this workaround to the lower_regioning pass as an additional case to has_invalid_src_region() - Do not apply the workaround if the stride of the source operand is 0, testing suggests the problem doesn't exist in that case. v3 (Jason): - We want offset % REG_SIZE > 0, not just offset > 0 - Use a helper to compute the offset Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> (v1)	2019-04-18 11:05:18 +02:00
Iago Toral Quiroga	aaae24179f	intel/compiler: fix ddy for half-float in Broadwell Broadwell has restrictions that apply to Align16 half-float that make the Align16 implementation of this invalid for this platform. Use the gen11 path for this instead, which uses Align1 mode. The restriction is not present in cherryview, gen9 or gen10, where the Align16 implementation seems to work just fine. v2: - Rework the comment in the code, move the PRM citation from the commit message to the comment in the code (Matt) - Cherryview isn't affected, only Broadwell (Matt) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v1) Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-04-18 11:05:18 +02:00
Iago Toral Quiroga	60c7c6d3ba	intel/compiler: fix ddx and ddy for 16-bit float We were assuming 32-bit elements. Also, in SIMD8 we pack 2 vector components in a single SIMD register, so for example, component Y of a 16-bit vec2 starts at byte offset 16B. This means that when we compute the offset of the elements to be differentiated we should not stomp whatever base offset we have, but instead add to it. v2 - Use byte_offset() helper (Jason) - Merge the fix for SIMD8: using byte_offset() fixes that too. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v1) Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-04-18 11:05:18 +02:00
Iago Toral Quiroga	8f40d392b9	intel/compiler: set correct precision fields for 3-source float instructions Source0 and Destination extract the floating-point precision automatically from the SrcType and DstType instruction fields respectively when they are set to types :F or :HF. For Source1 and Source2 operands, we use the new 1-bit fields Src1Type and Src2Type, where 0 means normal precision and 1 means half-precision. Since we always use the type of the destination for all operands when we emit 3-source instructions, we only need to set Src1Type and Src2Type to 1 when we are emitting a half-precision instruction. v2: - Set the bit separately for each source based on its type so we can do mixed floating-point mode in the future (Topi). v3: - Use regular citation style for the comment referencing the PRM (Matt). - Decided not to add asserts in the emission code to check that only mixed HF/F types are used since such checks would break negative tests for brw_eu_validate.c (Matt) Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-04-18 11:05:18 +02:00
Iago Toral Quiroga	e6b7410187	intel/compiler: allow half-float on 3-source instructions since gen8 Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-04-18 11:05:18 +02:00
Iago Toral Quiroga	ee049f6b71	intel/compiler: don't compact 3-src instructions with Src1Type or Src2Type bits We are now using these bits, so don't assert that they are not set. In gen8, if these bits are set compaction is not possible. On gen9 and CHV platforms set_3src_control_index() checks these bits (and others) against a table to validate if the particular bit combination is eligible for compaction or not. v2 - Add more detail in the commit message explaining the situation for SKL+ and CHV (Jason) Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-04-18 11:05:18 +02:00
Iago Toral Quiroga	120c970619	intel/compiler: add new half-float register type for 3-src instructions This is available since gen8. v2: restore previously existing assertion. v3: don't use separate tables for gen7 and gen8, just assert that we don't use half-float before gen8 (Matt) Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> (v1) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-04-18 11:05:18 +02:00
Iago Toral Quiroga	4ab2b97a8f	intel/compiler: add instruction setters for Src1Type and Src2Type. The original SrcType is a 3-bit field that takes a subset of the types supported for the hardware for 3-source instructions. Since gen8, when the half-float type was added, 3-source floating point operations can use mixed precision mode, where not all the operands have the same floating-point precision. While the precision for the first operand is taken from the type in SrcType, the bits in Src1Type (bit 36) and Src2Type (bit 35) define the precision for the other operands (0: normal precision, 1: half precision). Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com> Acked-by: Jason Ekstrand <jason@jlekstrand.net>	2019-04-18 11:05:18 +02:00
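The two 1-bit fields named above can be sketched as plain bit manipulation. This assumes, for illustration only, that the low 64 bits of the instruction are held in a `uint64_t`; the helper names are hypothetical and do not correspond to the setters the commit adds.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions as stated in the commit message. */
#define SKETCH_SRC2_TYPE_BIT 35
#define SKETCH_SRC1_TYPE_BIT 36

/* Set or clear a single bit of a 64-bit word. */
static uint64_t sketch_set_bit(uint64_t qword, unsigned bit, bool value)
{
    const uint64_t mask = UINT64_C(1) << bit;
    return value ? (qword | mask) : (qword & ~mask);
}

/* 0 means normal precision, 1 means half precision, set per source. */
static uint64_t sketch_set_3src_precision(uint64_t qword0,
                                          bool src1_is_hf, bool src2_is_hf)
{
    qword0 = sketch_set_bit(qword0, SKETCH_SRC1_TYPE_BIT, src1_is_hf);
    qword0 = sketch_set_bit(qword0, SKETCH_SRC2_TYPE_BIT, src2_is_hf);
    return qword0;
}
```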
Iago Toral Quiroga	a8d8b1a139	intel/compiler: drop unnecessary temporary from 32-bit fsign implementation Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-04-18 11:05:18 +02:00
Iago Toral Quiroga	19cd2f5deb	intel/compiler: implement 16-bit fsign v2: - make 16-bit be its own separate case (Jason) v3: - Drop the result_int temporary (Jason) Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> (v1) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-04-18 11:05:18 +02:00
Iago Toral Quiroga	4588f4a604	intel/compiler: handle extended math restrictions for half-float Extended math with half-float operands is only supported since gen9, but it is limited to SIMD8. In gen8 we lower it to 32-bit. v2: quashed together the following patches (Jason): - intel/compiler: allow extended math functions with HF operands - intel/compiler: lower 16-bit extended math to 32-bit prior to gen9 - intel/compiler: extended Math is limited to SIMD8 on half-float Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> (allow extended math functions with HF operands, extended Math is limited to SIMD8 on half-float)	2019-04-18 11:05:18 +02:00
Iago Toral Quiroga	114f4e6c29	intel/compiler: lower some 16-bit float operations to 32-bit The hardware doesn't support half-float for these. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-04-18 11:05:18 +02:00
Iago Toral Quiroga	b6a454791b	intel/compiler: assert restrictions on conversions to half-float There are some hardware restrictions that brw_nir_lower_conversions should have taken care of before we get here. v2: - rebased on top of regioning lowering pass Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> (v1) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-04-18 11:05:18 +02:00
Iago Toral Quiroga	66806405af	intel/compiler: handle b2i/b2f with other integer conversion opcodes Since we handle booleans as integers this makes more sense. v2: - rebased to incorporate new boolean conversion opcodes v3: - rebased on top regioning lowering pass Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v1) Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> (v2)	2019-04-18 11:05:18 +02:00
Iago Toral Quiroga	92f4761198	intel/compiler: split float to 64-bit opcodes from int to 64-bit Going forward having these split is a bit more convenient since these two groups have different restrictions. v2: - Rebased on top of new regioning lowering pass. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> (v1) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-04-18 11:05:18 +02:00
Iago Toral Quiroga	3e377c68f8	intel/compiler: add a NIR pass to lower conversions Some conversions are not directly supported in hardware and need to be split in two conversion instructions going through an intermediary type. Doing this at the NIR level simplifies a bit the complexity in the backend. v2: - Consider fp16 rounding conversion opcodes - Properly handle swizzles on conversion sources. v3 - Run the pass earlier, right after nir_opt_algebraic_late (Jason) - NIR alu output types already have the bit-size (Jason) - Use 'is_conversion' to identify conversion operations (Jason) v4: - Be careful about the intermediate types we use so we don't lose range and avoid incorrect rounding semantics (Jason) Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> (v1) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-04-18 11:05:18 +02:00
Dominik Drees	829f278ad0	Add no_aos_sampling GALLIVM_PERF option This forces using general sampling and should improve precision and performance in some cases.	2019-04-17 22:16:19 +00:00
Samuel Pitoiset	ad6dc13fc7	ac: use struct/raw store intrinsics for 8-bit/16-bit int with LLVM 9+ This change requires LLVM r356465. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-04-17 22:10:30 +02:00
Samuel Pitoiset	26ea506235	ac: use struct/raw load intrinsics for 8-bit/16-bit int with LLVM 9+ This change requires LLVM r356465. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-04-17 22:10:28 +02:00
Samuel Pitoiset	6fd5e39b60	ac: add support for more types with struct/raw LLVM intrinsics LLVM 9+ now supports 8-bit and 16-bit types. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-04-17 22:10:25 +02:00
Samuel Pitoiset	9cf55b022d	radv: add VK_KHR_shader_atomic_int64 but disable it for now No support for 64-bit compare&swap atomic operations. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-04-17 21:59:56 +02:00
Samuel Pitoiset	d118e382dd	ac/nir: add 64-bit SSBO atomic operations support Except compare&swap which is still buggy. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-04-17 21:59:54 +02:00
Samuel Pitoiset	78c551aca1	ac/nir: use new LLVM 8 intrinsics for SSBO atomics except cmpswap Use the raw version (ie. IDXEN=0) because vindex is unused. Use the old intrinsic for compare&swap because the new one hangs the GPU for some reason. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-04-17 21:59:52 +02:00
Roland Scheidegger	dded2edf8b	gallivm: fix saturated signed add / sub with llvm 9 llvm 8 removed saturated unsigned add / sub x86 sse2 intrinsics, and now llvm 9 removed the signed versions as well - they were proposed for removal earlier, but the pattern to recognize those was very complex, so it wasn't done then. However, instead of these arch-specific intrinsics, there's now arch-independent intrinsics for saturated add / sub, both for signed and unsigned, so use these. They should have only advantages (work with arbitrary vector sizes, optimal code for all archs), although I don't know how well they work in practice for other archs (at least for x86 they do the right thing). Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110454 Reviewed-by: Brian Paul <brianp@vmware.com>	2019-04-17 17:42:13 +02:00
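For readers unfamiliar with saturated arithmetic, the per-element behaviour that such intrinsics provide can be sketched as a scalar C reference for 8-bit values. This only illustrates the semantics (clamp instead of wrap); the function names are hypothetical and gallivm emits LLVM IR rather than code like this.

```c
#include <stdint.h>

/* Signed saturating add: compute in a wider type, then clamp to int8_t range. */
static int8_t sketch_sadd_sat8(int8_t a, int8_t b)
{
    int sum = (int)a + (int)b;
    if (sum > INT8_MAX)
        return INT8_MAX;
    if (sum < INT8_MIN)
        return INT8_MIN;
    return (int8_t)sum;
}

/* Unsigned saturating subtract: clamp at zero instead of wrapping around. */
static uint8_t sketch_usub_sat8(uint8_t a, uint8_t b)
{
    return a > b ? (uint8_t)(a - b) : 0;
}
```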
Juan A. Suarez Romero	b74e605cf4	meson: Add dependency on genxml to anvil genfiles This fixes a race condition where anv_gen_files are executed before genxml files, which causes a build failure v2: add dependency on idep_genxml (Lionel) Fixes: `d1992255bb` ("meson: Add build Intel "anv" vulkan driver") Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-04-17 15:49:55 +02:00
Lionel Landwerlin	baf59e40cd	intel/perf: constify accumulator parameter Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Mark Janes <mark.a.janes@intel.com>	2019-04-17 14:10:42 +01:00
Lionel Landwerlin	93dbe52ab0	intel/perf: drop counter size field We can deduce the size from another field, let's just save some space. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Mark Janes <mark.a.janes@intel.com>	2019-04-17 14:10:42 +01:00
Lionel Landwerlin	a646485c28	i965: perf: add mdapi pipeline statistics queries on gen10/11 The Gen10+ expected format adds an additional counter which we can't disclose yet. We can still make the size of the expected query result match. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Mark Janes <mark.a.janes@intel.com>	2019-04-17 14:10:42 +01:00
Lionel Landwerlin	d855906366	intel/perf: stub gen10/11 missing definitions Reviewed-by: Mark Janes <mark.a.janes@intel.com>	2019-04-17 14:10:42 +01:00
Lionel Landwerlin	d47cc4acbf	i965: move mdapi guid into intel/perf One more thing we want to share between the different APIs. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Mark Janes <mark.a.janes@intel.com>	2019-04-17 14:10:42 +01:00
Lionel Landwerlin	b48d6d7471	i965: move mdapi result data format to intel/perf We want to reuse this in Anv. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Mark Janes <mark.a.janes@intel.com>	2019-04-17 14:10:42 +01:00
Lionel Landwerlin	2be07fc751	i965: move brw_timebase_scale to device info Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Mark Janes <mark.a.janes@intel.com>	2019-04-17 14:10:42 +01:00
Lionel Landwerlin	41b54b5faf	i965: move OA accumulation code to intel/perf We'll want to reuse this in our Vulkan extension. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Mark Janes <mark.a.janes@intel.com>	2019-04-17 14:10:42 +01:00
Lionel Landwerlin	f6bba7760f	i965: move mdapi data structure to intel/perf We'll want to reuse those structures later on. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Mark Janes <mark.a.janes@intel.com>	2019-04-17 14:10:42 +01:00
Lionel Landwerlin	134e750e16	i965: extract performance query metrics We would like to reuse performance query metrics in other APIs. Let's make the query code dealing with the processing of raw counters into human readable values API agnostic. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Mark Janes <mark.a.janes@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-04-17 14:10:42 +01:00
Lionel Landwerlin	603ddda622	i965: store device revision in gen_device_info Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-04-17 14:10:42 +01:00
Topi Pohjolainen	ea42ba36b9	intel/compiler/icl: Use tcs barrier id bits 24:30 instead of 24:27 Similarly to `1cc17fb731` Fixes gpu hangs with dEQP-VK.tessellation.shader_input_output.barrier Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com> Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2019-04-17 14:55:49 +03:00
Erik Faye-Lund	ce1761edab	virgl: document potentially failing blit This blit can fail, but this is not new; in the old version we didn't even try to blit in this case. So let's just document the limitation for now, and leave this for another day. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>	2019-04-17 07:27:08 +00:00
Erik Faye-Lund	3fdacf1c39	virgl: do color-conversion when mapping transfer When running on OpenGL ES, we can't just map any format for reading, because of limitations on glReadPixels. So let's fall back to the blit code-path, and translate the pixels to the correct format in the end. This fixes the remaining failures of KHR-GL32.packed_pixels.* apart from the sRGB tests. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>	2019-04-17 07:27:08 +00:00
Erik Faye-Lund	9e9d9b352e	virgl: only blit if resource is read Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>	2019-04-17 07:27:08 +00:00
Erik Faye-Lund	fba03322a2	virgl: get readback-formats from host Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>	2019-04-17 07:27:08 +00:00
Erik Faye-Lund	749bbd39c7	gallium/util: support translating between uint and sint formats Without this, we can't for instance convert between r8_sint and r8g8b8a8_sint. But that's pretty useful, so let's support it as well. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>	2019-04-17 07:27:08 +00:00
Erik Faye-Lund	f31b65f1c1	virgl: make sure bind is set for non-buffers Otherwise, virglrenderer will reject the resource. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>	2019-04-17 07:27:08 +00:00
Erik Faye-Lund	afbd68378a	virgl: support write-back with staged transfers We currently don't support writing to resources that use a temporary staging-resource to resolve the pixels. If a write-bit was set, we forgot to perform a blit back to the old resource, followed by trying to update the wrong resource, which lacks backing-storage. The end-result would be that nothing useful happened. This approach also fixes a few smaller bugs, like using the wrong box (without x y and z zeroed out), which means a partial update of a multisampled texture could result in the wrong part of the texture being updated. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>	2019-04-17 07:27:08 +00:00
Erik Faye-Lund	0bc8683ffa	virgl: use pipe_box for blit dst-rect Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>	2019-04-17 07:27:08 +00:00
Erik Faye-Lund	121e366632	virgl: rewrite core of virgl_texture_transfer_map Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>	2019-04-17 07:27:08 +00:00
Erik Faye-Lund	1f27bd3f2b	virgl: return error if allocating resolve_tmp fails Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>	2019-04-17 07:27:08 +00:00
Erik Faye-Lund	fc8b1ca33a	virgl: wait for the right resource In case we're resolving, we need to wait for the resolved resource instead of the original one. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>	2019-04-17 07:27:08 +00:00
Erik Faye-Lund	6263304b2d	virgl: check for readback on correct resource Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>	2019-04-17 07:27:08 +00:00
Erik Faye-Lund	ac932ff822	virgl: make unmap queuing a bit more straight-forward It's hard to read the code that decides if we want to queue up an unmap or destroy the transfer right away. So let's make it a bit simpler, by setting a bool in case we want to queue it. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>	2019-04-17 07:27:08 +00:00
Erik Faye-Lund	b08e73308e	virgl: simplify virgl_texture_transfer_unmap logic There's no reason to keep an extra indentation level here, let's merge the two if-conditions. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>	2019-04-17 07:27:08 +00:00
Erik Faye-Lund	7dd601a399	virgl: track full virgl_resource instead of just virgl_hw_res Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>	2019-04-17 07:27:08 +00:00
Erik Faye-Lund	c62434f106	virgl: tmp_resource -> templ This isn't the temporary resource itself, it's the template that we'll create the resource from. So let's name it appropriately. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>	2019-04-17 07:27:08 +00:00
Erik Faye-Lund	18a721fd56	virgl: remove pointless transfer-counter This is only written to, never read. Let's just get rid of it. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>	2019-04-17 07:27:07 +00:00
Timothy Arceri	3c5a9ab9f0	radeonsi/nir: fix scanning of bindless images Fixes: `d62d434fe9` ("ac/nir_to_llvm: add image bindless support")	2019-04-17 09:56:56 +10:00
Kenneth Graunke	c4478889b7	iris: Add texture cache flushing hacks for blit and resource_copy_region This is a port of Jason's `8379bff6c4` from i965 to iris. We can't find anything relevant in the documentation and no one we've talked to has been able to help us pin down a solution. Unfortunately, we have to put the hack in both iris_blit() and iris_copy_region(). st/mesa's CopyImage() implementation sometimes chooses to use pipe->blit() instead of pipe->resource_copy_region(). For blits, we only do the hack if the blit source format doesn't match the underlying resource (i.e. it's reinterpreting the bits). Hopefully this should not be too common.	2019-04-16 13:04:22 -07:00
Eric Anholt	697e2e1f26	v3d: Always set up the qregs for CSD payload. We were failing to set up payload[1] for use by LocalInvocationIndex/ID and shared variable accesses if gl_WorkGroupID/gl_GlobalInvocationID wasn't used (possibly because you only have one workgroup). You're always going to use payload[1], and payload[0] is common enough and we have DCE in the backend to clean it up if it happens to not be used.	2019-04-16 12:10:39 -07:00
Eric Anholt	1bc71e8b65	v3d: Only look up the 3rd texture gather offset for non-arrays. Fixes assertion failures in the CTS since Karol's cleanup when NIR started noticing that we were reading an invalid component. Fixes: `5450f1c9fb` ("v3d: prefer using nir_src_comp_as_int over nir_src_as_const_value")	2019-04-16 12:07:59 -07:00
Caio Marcelo de Oliveira Filho	a0dae78e72	spirv: Tell which opcode or value is unhandled when failing v2: When available, include the opcode name too. (Karol) v3: Use more to_string helpers. (Karol) Include the wrong bit_size in those failures. Include the capability number in spv_check_supported. Provide vtn_fail_with_* macros to avoid noise in the call sites. v4: Provide macros only for opcode and decoration, which have enough usages to justify them. (Jason) Acked-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Karol Herbst <kherbst@redhat.com>	2019-04-16 11:11:10 -07:00
Caio Marcelo de Oliveira Filho	0ccfe741b1	spirv: Add more to_string helpers Also, use a set to identify repeated values. The previous arrangement worked when the repetitions were one after another, but in some of the new cases they are not. Reviewed-by: Karol Herbst <kherbst@redhat.com>	2019-04-16 11:11:10 -07:00
Jason Ekstrand	583a4d9a27	intel/mi_builder: Disable mem_mem tests on IVB Tested-by: Clayton Craft <clayton.a.craft@intel.com>	2019-04-16 12:59:12 -05:00
Kenneth Graunke	33314cf410	iris: Change vendor and renderer strings This patch changes the GL_VENDOR string from "Mesa Project" to "Intel". This makes GLX_MESA_query_renderer report "Vendor: Intel (0x8086)" instead of "Vendor: Mesa Project (0x8086)" which is arguably wrong. We now also use a consistent vendor string across Windows and Linux. It also prepends "Mesa" to the GL_RENDERER string, both to credit the community and have a distinguishing mark between the two drivers. We drop "DRI" compared to i965, as it's not really that important. Improves performance in Portal by 1.8x. Iris is now 3.86% faster than i965 at the portal-d1.dem timedemo on my Kabylake laptop. One change is that Portal selects the MapBufferRange path based on the vendor string, and iris's BufferSubData path is still missing the storage invalidation optimization.	2019-04-16 10:27:20 -07:00
Jason Ekstrand	56d9532316	intel/mi_builder: Re-order an initializer The order doesn't matter in C99 but some C++ compilers seem to care. Tested-by: Clayton Craft <clayton.a.craft@intel.com>	2019-04-16 12:07:15 -05:00
Jason Ekstrand	ba0f203ae8	nir/algebraic: Use a cache to avoid re-emitting structs This takes the stupid simplest and most reliable approach to reducing redundancy that I could come up with: Just use the struct declaration as the cache key. This cuts the size of the generated C file to about half and takes about 50 KiB off the .data section. size before (release build): text data bss dec hex filename 5363833 336880 13584 5714297 573179 _install/lib64/libvulkan_intel.so size after (release build): text data bss dec hex filename 5229017 285264 13584 5527865 545939 _install/lib64/libvulkan_intel.so Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2019-04-16 16:40:15 +00:00
Jason Ekstrand	0c712fd404	nir/algebraic: Move the template closer to the render function Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2019-04-16 16:40:15 +00:00
Kenneth Graunke	4c3c417b00	iris: Move iris_debug_recompile calls before uploading. Order of operations is important, otherwise we'll find the program we just uploaded as the "old" compile and get confused why nothing is different between the two keys. Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>	2019-04-16 09:01:20 -07:00
Kenneth Graunke	04f97eefa3	iris: Print the reason for shader recompiles. I was lazy earlier and hadn't bothered typing / refactoring this. Now I'm hitting some extra recompiles and would like to see why. Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>	2019-04-16 09:01:18 -07:00
Kenneth Graunke	fad7801afd	i965: Move program key debugging to the compiler. The i965 driver has a bunch of code to compare two sets of program keys and print out the differences. This can be useful for debugging why a shader needed to be recompiled on the fly due to non-orthogonal state dependencies. anv doesn't do recompiles, so we didn't need to share this in the past - but I'd like to use it in iris. This moves the bulk of the code to the compiler where it can be reused. To make that possible, we need to decouple it from i965 - we can't get at the brw program cache directly, nor use brw_context to print things. Instead, we use compiler->shader_perf_log(), and simply pass in keys. We put all of this debugging code in brw_debug_recompile.c, and only export a single function, for simplicity. I also tidied the code a bit while moving it, now that it all lives in one file. Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>	2019-04-16 09:01:15 -07:00
Marek Olšák	4f715868a9	winsys/amdgpu: don't set GTT with GDS & OA placements on APUs Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2019-04-16 10:24:19 -04:00
Marek Olšák	d3ce8a7f6b	nir: optimize gl_SampleMaskIn to gl_HelperInvocation for radeonsi when possible Acked-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-04-16 10:24:19 -04:00
suresh guttula	d98f6380cb	st/va/enc: Add support for frame_cropping_flag of VAEncSequenceParameterBufferH264 This patch will add support for frame_cropping when the input size is not matched with aligned size. Currently vaapi driver ignores frame cropping values provided by client. This change will update SPS nalu with proper cropping values. Signed-off-by: Satyajit Sahu <satyajit.sahu@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com>	2019-04-16 10:15:09 -04:00
suresh guttula	05cc018ae6	radeon/vce: Add support for frame_cropping_flag of VAEncSequenceParameterBufferH264 This patch will add support for frame_cropping when the input size is not matched with aligned size. Currently vaapi driver ignores frame cropping values provided by client. This change will update SPS nalu with proper cropping values. v2: Moving default crop setting to else when enc_frame_cropping_flag is not set. Signed-off-by: Satyajit Sahu <satyajit.sahu@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com>	2019-04-16 10:15:09 -04:00
suresh guttula	8becf5b46d	vl: Add cropping flags for H264 This patch adds cropping flags for H264 in pipe_h264_enc_pic_control. Signed-off-by: Satyajit Sahu <satyajit.sahu@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com>	2019-04-16 10:15:09 -04:00
Tapani Pälli	624789e370	compiler/glsl: handle case where we have multiple users for types Both Vulkan and OpenGL might be using glsl_types simultaneously or we can also have multiple concurrent Vulkan instances using glsl_types. Patch adds a one time init to track number of users and will release types only when last user calls _glsl_type_singleton_decref(). This change fixes glsl_type memory leaks we have with anv driver. v2: reuse hash_mutex, cleanup, apply fix also to radv driver and rename helper functions (Jason) v3: move init, destroy to happen on GL context init and destroy Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-04-16 12:58:00 +03:00
Danylo Piliaiev	04508f57d1	intel/compiler: Do not reswizzle dst if instruction writes to flag register If we write to the flag register changing the swizzle would change what channels are written to the flag register. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110201 Fixes: `4cd1a0be` Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com> Reviewed-by: <ian.d.romanick@intel.com>	2019-04-16 09:42:08 +00:00
Michel Dänzer	9b2473c7a4	gitlab-ci: Use LLVM 3.4 from Debian jessie for scons-llvm job This gets us closer to the officially supported minimum version of LLVM, which is 3.3. Acked-by: Eric Engestrom <eric.engestrom@intel.com>	2019-04-16 10:57:55 +02:00
Michel Dänzer	5789bd935e	gitlab-ci: Do not use subshells for compiling dependencies bash subshells don't inherit the -e option by default, so failures in the subshell commands wouldn't cause the CI job to fail. Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-04-16 10:57:55 +02:00
Michel Dänzer	172ccfffda	gitlab-ci: Drop unused clang 5/6 packages Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-04-16 10:57:55 +02:00
Michel Dänzer	3fca2b760c	gitlab-ci: Use clang 8 instead of 7 Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-04-16 10:57:55 +02:00
Michel Dänzer	979df83940	gitlab-ci: Remove unused Debian packages from Docker image v2: * Also remove autotools, now that the Mesa autotools build system has been dropped. Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> # v1	2019-04-16 10:41:07 +02:00
Michel Dänzer	792d6987a3	gitlab-ci: Remove unneeded (stuff from) APT command lines We either compile these locally, or they are dependencies of other packages we install. v2: * Adapt to leaving self-compiled packages untouched. Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-04-16 10:41:07 +02:00
Michel Dänzer	e9de19ffca	gitlab-ci: Install most packages from Debian buster We now use the C frontend of GCC 8 instead of 6 (required tweaking the before_script for the clang job). We cannot use the C++ frontend of GCC 7 or newer yet, because upstream GCC 7 changed some C++ name mangling stuff in backwards incompatible ways, and LLVM < 6.0 packages aren't available in buster. Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-04-16 10:41:07 +02:00
Michel Dänzer	ecb3eedc54	gitlab-ci: Use Debian packages instead of pip ones for meson and scons Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-04-16 10:41:07 +02:00
Michel Dänzer	caf83e96e4	gitlab-ci: Use HTTPS for APT repositories Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-04-16 10:41:07 +02:00
Michel Dänzer	d00b1c4511	gitlab-ci: Use Debian stretch instead of Ubuntu bionic The APT archive used by the Ubuntu docker image can be slow, even timing out sometimes, causing spurious failures of the containers-build job. The Debian docker image uses deb.debian.org, which is backed by a content distribution network. One downside is that stretch only has GCC 6, whereas bionic had 7. Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-04-16 10:14:21 +02:00
Gert Wollny	1c5ff3a6d0	doc/features: Add a few extensions to the feature matrix These additions already landed but I forgot to update the feature matrix. Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-04-16 08:01:13 +00:00
Samuel Pitoiset	ecbe6cb805	radv: sort the shader capabilities alphabetically Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-04-16 09:14:22 +02:00
Kenneth Graunke	024a57d23c	iris: Make shader_perf_log print to stderr if INTEL_DEBUG=perf is set This matches i965's behavior, and makes sure that shader compiler messages are visible when setting INTEL_DEBUG=perf.	2019-04-15 23:33:03 -07:00
Samuel Pitoiset	8704bd5588	radv: enable shaderInt8 on SI and CIK No CTS failures. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-04-16 08:22:54 +02:00
Chia-I Wu	c45c889f95	virgl: fix fence fd version check Fixes: `d1a1c21e76` ("virgl: native fence fd support") Signed-off-by: Chia-I Wu <olvaffe@gmail.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2019-04-15 23:25:47 +00:00
Chia-I Wu	442e75071b	virgl: introduce virgl_drm_fence virgl_drm_fence can wrap either a fence fd or a virgl_hw_res. Because a fence fd is cheaper than a virgl_hw_res, we use it whenever it is available. Signed-off-by: Chia-I Wu <olvaffe@gmail.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2019-04-15 23:25:47 +00:00
Chia-I Wu	334103efbf	virgl: hide fence internals from the driver Fence fds are cheaper than resources. We want to let winsys make the decision and use fence fds whenever they are supported. This commit prepares the work. For the moment, we create a resource _and_ a fence fd when supports_fences is true. This will be fixed such that we create a resource _or_ a fence fd. (And because of a version check bug that we will fix later, supports_fences is actually never true). Signed-off-by: Chia-I Wu <olvaffe@gmail.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2019-04-15 23:25:47 +00:00
Chia-I Wu	a23c091988	virgl: handle fence_server_sync in winsys It does not need help from the driver. This also fixes one issue where the fence is ignored when the transfer queue is full. Signed-off-by: Chia-I Wu <olvaffe@gmail.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2019-04-15 23:25:47 +00:00
Roland Scheidegger	88e0bbf24a	gallivm: fix bogus assert in get_indirect_index 0 is a valid value as max index, and the code handles it fine. This isn't commonly seen, as it will only happen with array declarations of size 1. Fixes piglit tests/shaders/complex-loop-analysis-bug.shader_test Fixes: `a3c898dc97` "gallivm: fix improper clamping of vertex index when fetching gs inputs" Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110441 Reviewed-by: Brian Paul <brianp@vmware.com>	2019-04-16 00:49:38 +02:00
Andres Gomez	42351c21bb	glsl/linker: always validate explicit locations for first and last interfaces Until now, we were only doing this when linking a SSO program. However, nothing prevents linking a non SSO program which doesn't have both a VS and FS. In those cases, we also need to report the usual linking errors, if any occur. v2: Use a better name for the renamed function (Timothy). Signed-off-by: Andres Gomez <agomez@igalia.com> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-04-15 22:34:50 +00:00
Rhys Perry	6281517f3e	vc4: fix build Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Fixes: `5131b7a43f` ('gallium: add support for formatted image loads')	2019-04-15 23:27:21 +01:00
Andres Gomez	dbb309dd71	docs: drop Andres Gomez from the release cycles Juan A. Suarez takes his place and the shorter loop makes Dylan repeat earlier. Signed-off-by: Andres Gomez <agomez@igalia.com> Reviewed-by: Juan A. Suarez <jasuarez@igalia.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2019-04-15 22:03:17 +00:00
Kenneth Graunke	0f3dc832bc	iris: Fix FLUSH_EXPLICIT handling with staging buffers. I neglected to blit the staging buffer back to the real one at transfer_flush_region (FlushMappedBufferRange) time.	2019-04-15 14:51:01 -07:00
Kenneth Graunke	62b2ce0592	iris: Preserve all PIPE_TRANSFER flags in xfer->usage We need to preserve PIPE_TRANSFER_FLUSH_EXPLICIT, DISCARD_RANGE, and so on, but don't want to pass them to iris_bo_map(). So, keep them all, but mask them off when calling map. Chris Wilson told me to do this a long time ago and he was right.	2019-04-15 14:51:01 -07:00
Kenneth Graunke	9c52dce6a9	iris: Actually mark blorp_copy_buffer destinations as written.	2019-04-15 14:51:01 -07:00
grmat	8cb50edebf	drirc: add Spectacle, Falkon to a-sync blacklist Spectacle is the plasma screenshot utility Falkon is a KDE web browser that should succeed Konqueror	2019-04-15 17:38:44 -04:00
davidbepo	10d33ddd50	drirc: add Waterfox to adaptive-sync blacklist	2019-04-15 17:27:15 -04:00
El Christianito	4d02f591cb	drirc: add Budgie WM to adaptive-sync blacklist Budgie Window Manager is an increasingly used alternative to GNOME and MATE. Default in Solus OS, also used in other distros. Signed-off-by: Marek Olšák <marek.olsak@amd.com>	2019-04-15 17:27:15 -04:00
Dylan Baker	a988d95389	ci: Delete autotools build jobs Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> Acked-by: Marek Olšák <marek.olsak@amd.com> Acked-by: Jason Ekstrand <jason@jlekstrand.net> Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Acked-by: Matt Turner <mattst88@gmail.com>	2019-04-15 13:44:41 -07:00
Dylan Baker	b165ac972b	docs: drop most autoconf references There's still a few in here, but those docs are already so out of date that it probably makes more sense to delete them. Such as the GLES docs which still claim we only support 1.1 and 2.0, with no mention of 3.x at all. v2: - Add docs for testing back end (Eric Engestrom) - Drop more autotools references - meson is now required not recommended - Add $PWD Acked-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> Acked-by: Marek Olšák <marek.olsak@amd.com> Acked-by: Jason Ekstrand <jason@jlekstrand.net> Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Acked-by: Matt Turner <mattst88@gmail.com>	2019-04-15 13:44:34 -07:00
Dylan Baker	95aefc94a9	Delete autotools Acked-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> Acked-by: Marek Olšák <marek.olsak@amd.com> Acked-by: Jason Ekstrand <jason@jlekstrand.net> Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Acked-by: Matt Turner <mattst88@gmail.com>	2019-04-15 13:44:29 -07:00
Marek Olšák	de0c97c817	radeonsi: enable GL_EXT_shader_image_load_formatted no changes - the driver doesn't use the format Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-04-15 16:18:07 -04:00
Rhys Perry	a35f2bbb85	st/mesa: add support for EXT_shader_image_load_formatted v3: rebase Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v2) Signed-off-by: Marek Olšák <marek.olsak@amd.com>	2019-04-15 16:18:07 -04:00
Rhys Perry	082d180a22	mesa, glsl: add support for EXT_shader_image_load_formatted v3: rebase Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v2) Signed-off-by: Marek Olšák <marek.olsak@amd.com>	2019-04-15 16:18:07 -04:00
Rhys Perry	5131b7a43f	gallium: add support for formatted image loads v3: rebase v3: make use of u_pipe_screen_get_param_defaults Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Signed-off-by: Marek Olšák <marek.olsak@amd.com>	2019-04-15 16:18:07 -04:00
Samuel Pitoiset	bf4a0485d9	radv: set ACCESS_NON_READABLE on stores for copy/fill/clear meta shaders The compiler will emit GLC=1. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-04-15 21:36:53 +02:00
Bas Nieuwenhuizen	f6fdd39eab	radv: Use local buffers for the global bo list. Even if we don't use local buffers in general. Turns out that even though the performance is not the best the kernel still does it better than our own list. We still have to keep the radv bo list for buffers that are shared externally. This improves Talos on lowest quality setting (so as CPU bound as possible) by ~10% if the global bo list is enabled. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-04-15 20:39:38 +02:00
Bas Nieuwenhuizen	af9534b9f3	ac: Move has_local_buffers disable to radeonsi. In radv we had a separate flag to actually use it + an env option to experimentally use it. The common code setting has_local_buffers to false of course broke that experimental option. Also the "enable on APU" did not make sense for RADV as it is still disabled by default. Fixes: `b21a4efb55` "radv/winsys: allow local BOs on APUs" Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-04-15 20:39:28 +02:00
Bas Nieuwenhuizen	a589d8c0ab	radv: Add bolist RADV_PERFTEST flag. To test global_bo_list performance. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-04-15 20:39:05 +02:00
Marek Olšák	dbab755ecf	ac: fix incorrect bindless atomic code in visit_image_atomic Coverity: CID 1444664 Fixes: `d62d434fe9` ("ac/nir_to_llvm: add image bindless support") Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-04-15 12:52:02 -04:00
Rhys Perry	8671cfe2a2	nir,ac/nir: fix cube_face_coord Seems it was missing the "/ ma + 0.5" and the order was swapped. Fixes: `a1a2a8dfda` ('nir: add AMD_gcn_shader extended instructions') Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-04-15 17:22:47 +01:00
Jason Ekstrand	90108deb27	anv: Update to use the new features struct names These were updated in version 1.1.106 of vulkan.h to make more sense with the extension names. We may as well keep with the times. Acked-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-04-15 13:25:43 +00:00
Jason Ekstrand	7f113c07b2	vulkan: Update the XML and headers to 1.1.106 Acked-by: Dave Airlie <airlied@redhat.com> Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-04-15 13:25:43 +00:00
Timothy Arceri	8f74a60c43	nir: fix packing components with arrays When gathering info for unmovable types we need to handle arrays. While we don't support packing/moving arrays we do support packing scalar components with these arrays. Fixes piglit: tests/spec/arb_enhanced_layouts/execution/component-layout/vs-fs-array-interleave-range.shader_test Fixes: `5eb17506e1` ("nir: do not pack varying with different types") Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-04-15 19:25:12 +10:00
Samuel Pitoiset	14f03978ed	radv: enable VK_KHR_shader_float16_int8 Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-04-15 10:43:55 +02:00
Samuel Pitoiset	bbe8febd93	spirv: add SpvCapabilityFloat16 support Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-04-15 10:43:52 +02:00
Kenneth Graunke	8bf9b7b5b6	intel: Emit 3DSTATE_VF_STATISTICS dynamically Pipeline statistics queries should not count BLORP's rectangles. (23) How do operations like Clear, TexSubImage, etc. affect the results of the newly introduced queries? DISCUSSION: Implementations might require "helper" rendering commands be issued to implement certain operations like Clear, TexSubImage, etc. RESOLVED: They don't. Only application submitted rendering commands should have an effect on the results of the queries. Piglit's arb_pipeline_statistics_query-vert_adj exposes this bug when the driver is hacked to always perform glBufferData via a GPU staging copy (for debugging purposes). Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-04-14 19:58:04 -07:00
Jason Ekstrand	47709ca146	nir/validate: Require unused bits of nir_const_value to be zero Reviewed-by: Karol Herbst <kherbst@redhat.com>	2019-04-14 22:25:56 +02:00
Jason Ekstrand	c4b28d1730	nir/load_const_to_scalar: Get rid of a bit size switch statement Now that nir_const_value is a scalar, we don't need the switch on bit size in order to pluck off components properly. Reviewed-by: Karol Herbst <kherbst@redhat.com>	2019-04-14 22:25:56 +02:00
Jason Ekstrand	893dd34702	spirv: Drop some unneeded bit size switch statements Now that nir_const_value is a scalar, we don't need the switch on bit size in order to copy components around properly. Reviewed-by: Karol Herbst <kherbst@redhat.com>	2019-04-14 22:25:56 +02:00
Jason Ekstrand	b8197a01a9	nir/constant_folding: Get rid of a bit size switch statement Now that nir_const_value is a scalar, we don't need the switch on bit size in order to swizzle them properly. Reviewed-by: Karol Herbst <kherbst@redhat.com>	2019-04-14 22:25:56 +02:00
Karol Herbst	14531d676b	nir: make nir_const_value scalar v2: remove & operator in a couple of memsets add some memsets v3: fixup lima Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v2)	2019-04-14 22:25:56 +02:00
Karol Herbst	73d883037d	spirv: reduce array size in vtn_handle_constant we already assert above that there are no more than 3 sources, so it doesn't make sense to use an array of 4 sources Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-04-14 22:25:56 +02:00
Karol Herbst	e72beacb95	nir/loop_analyze: use nir_const_value.b for boolean results, not u32 Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-04-14 22:25:56 +02:00
Jason Ekstrand	10602db78c	nir/print: Use nir_src_as_int for array indices Reviewed-by: Karol Herbst <kherbst@redhat.com>	2019-04-14 22:25:56 +02:00
Jason Ekstrand	9b1e4bab6b	nir/builder: Add a nir_imm_zero helper v2: replace nir_zero_vec with nir_imm_zero (Karol Herbst) Reviewed-by: Karol Herbst <kherbst@redhat.com>	2019-04-14 22:25:56 +02:00
Karol Herbst	daaf777376	nir/builder: Move nir_imm_vec2 from blorp into the builder While we're here, fix a typo which caused it to actually return a vec4 with the third and fourth components zero. Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-04-14 22:25:56 +02:00
Karol Herbst	606b74035e	lima: use nir_src_as_float Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Qiang Yu <yuq825@gmail.com>	2019-04-14 22:25:56 +02:00
Karol Herbst	fe8c57e859	freedreno/ir3: use nir_src_as_uint in a few places v2 (Jason Ekstrand): - Add even more places Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-04-14 22:25:56 +02:00
Karol Herbst	bbf2ecaf35	intel/nir: use nir_src_is_const and nir_src_as_uint Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-04-14 22:25:56 +02:00
Jason Ekstrand	6b1c398bcb	intel/nir: Take a nir_tex_instr and src index in brw_texture_offset This makes things a bit simpler and it's also more robust because it no longer has a hard dependency on the offset being a 32-bit value.	2019-04-14 22:25:56 +02:00
Karol Herbst	2a36699ed3	radv: use nir constant helpers Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-04-14 22:25:56 +02:00
Karol Herbst	adb2263014	amd/nir: some cleanups Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-04-14 22:25:56 +02:00
Alyssa Rosenzweig	1e2cb3e964	panfrost/midgard: Use shared nir_lower_viewport_transform v2: Run before lowering I/O. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Qiang Yu <yuq825@gmail.com>	2019-04-14 19:16:29 +00:00
Alyssa Rosenzweig	2ce4adefa5	nir: Add nir_lower_viewport_transform On Mali hardware (supported by Panfrost and Lima), the fixed-function transformation from world-space to screen-space coordinates is done in the vertex shader prior to writing out the gl_Position varying, rather than in dedicated hardware. This commit adds a shared NIR pass for implementing coordinate transformation and lowering gl_Position writes into screen-space gl_Position writes. v2: Run directly on derefs before io/vars are lowered to cleanup the code substantially. Thank you to Qiang for this suggestion! v3: Bikeshed continues. v4: Add to Makefile.sources (per Jason's comment). Bikeshed comment. Ian and Qiang's reviews are from v3, but no real functional changes from v4. Rob's review is from v4. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Suggested-by: Qiang Yu <yuq825@gmail.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Qiang Yu <yuq825@gmail.com> Reviewed-by: Rob Clark <robdclark@gmail.com>	2019-04-14 19:15:13 +00:00
Alyssa Rosenzweig	89b02bffcb	panfrost: Cleanup indexed draw handling As part of this cleanup, we use the newly-exposed u_vbuf_get_minmax_index, deduplicating quite a bit of bookkeeping. We also centralize the draw_flags tracking to make this code cleaner / futureproofed; we have already had bugs regarding this field so we might as well get it right now. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-04-14 15:25:46 +00:00
Alyssa Rosenzweig	74b17b9a9f	panfrost/midgard: Drop dependence on mesa/st This was used as a workaround for uniform sizing which was fixed in `771adffe` ("st: Lower uniforms in st in the...") Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-04-14 15:25:46 +00:00
Mauro Rossi	1af7701666	draw: fix building error in draw_gs_init() Fixes the following building error happening with Android build system: external/mesa/src/gallium/auxiliary/draw/draw_gs.c:740:79: error: address of array 'draw->gs.tgsi.machine->PrimitiveOffsets' will always evaluate to 'true' [-Werror,-Wpointer-bool-conversion] if (!draw->gs.tgsi.machine->Primitives[i] || !draw->gs.tgsi.machine->PrimitiveOffsets) ~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~ 1 error generated. Fixes: `7720ce3` ("draw: add support to tgsi paths for geometry streams. (v2)") Signed-off-by: Mauro Rossi <issor.oruam@gmail.com> Reviewed-by: Dave Airlie <airlied@redhat.com>	2019-04-14 18:07:02 +10:00
Qiang Yu	b46b661f53	lima/gpir: fix alu check miss last store slot Fixes: `92d7ca4b1c` "gallium: add lima driver" Signed-off-by: Qiang Yu <yuq825@gmail.com> Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>	2019-04-14 12:10:23 +08:00
Qiang Yu	8d91cd64aa	lima/gpir: fix compile fail when two slot node Come from glmark2-es2 jellyfish test. Fixes: `92d7ca4b1c` "gallium: add lima driver" Signed-off-by: Qiang Yu <yuq825@gmail.com> Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>	2019-04-14 12:10:23 +08:00
Vasily Khoruzhick	fef2f10cc2	lima: add support for depth/stencil fbo attachments and textures Hardware supports writing back Z/S buffers and sampling from them, so add support for that. Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com> Reviewed-by: Qiang Yu <yuq825@gmail.com> Tested-by: Icenowy Zheng <icenowy@aosc.io>	2019-04-14 01:16:00 +00:00
Vasily Khoruzhick	a817f0fec6	lima: use individual tile heap for each GP job. Looks like it's somehow used by subsequent PP job, so we have to preserve its contents until PP job is done. Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com> Reviewed-by: Qiang Yu <yuq825@gmail.com> Tested-by: Icenowy Zheng <icenowy@aosc.io>	2019-04-14 01:16:00 +00:00
Christian Gmeiner	b6bed115a5	nir: add lower_ftrunc Port TGSI TRUNC lowering to nir Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-04-13 17:54:48 +00:00
Mauro Rossi	e538dd67de	android: fix LLVM version string related building errors Adding \ prior to " in llvm version string fixes the following building errors: external/mesa/src/gallium/drivers/r600/r600_pipe_common.c:1290:14: error: expected ')' ", LLVM " MESA_LLVM_VERSION_STRING ^ <command line>:8:34: note: expanded from here ^ external/mesa/src/gallium/drivers/r600/r600_pipe_common.c:1287:10: note: to match this '(' snprintf(rscreen->renderer_string, sizeof(rscreen->renderer_string), ^ 1 error generated. Fixes: 05b114e ("simplify LLVM version string printing") Signed-off-by: Mauro Rossi <issor.oruam@gmail.com> Reviewed-by: Eric Engestrom <eric@engestrom.ch>	2019-04-13 18:56:14 +02:00
Lionel Landwerlin	9e7b0988d6	anv: leave the top 4Gb of the high heap VMA unused In `628c9ca908` I forgot to apply the same -4Gb of the high address of the high heap VMA. This was previously computed in the HIGH_HEAP_MAX_ADDRESS. Many thanks to James for pointing this out. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reported-by: Xiong, James <james.xiong@intel.com> Fixes: `628c9ca908` ("anv: store heap address bounds when initializing physical device") Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-04-13 12:08:23 +00:00
Eric Anholt	dc402be73e	v3d: Use the new lower_to_scratch implementation for indirects on temps. We can use the same register spilling infrastructure for our loads/stores of indirect access of temp variables, instead of doing an if ladder. Cuts 50% of instructions and max-temps from 2 KSP shaders in shader-db. Also causes several other KSP shaders with large bodies and large loop counts to not be force-unrolled. The change was originally motivated by NOLTIS slightly modifying register pressure in piglit temp mat4 array read/write tests, triggering register allocation failures.	2019-04-12 16:16:58 -07:00
Jason Ekstrand	18ed82b084	nir: Add a pass for selectively lowering variables to scratch space This commit adds new nir_load/store_scratch opcodes which read and write a virtual scratch space. It's up to the back-end to figure out what to do with it and where to put the actual scratch data. v2: Drop const_index comments (by anholt) Reviewed-by: Eric Anholt <eric@anholt.net>	2019-04-12 15:59:31 -07:00
Eric Anholt	8a2d91e124	v3d: Detect the correct number of QPUs and use it to fix the spill size. We were missing a * 4 even if the particular hardware matched our assumption.	2019-04-12 15:59:31 -07:00
Eric Anholt	11ba8a46e4	v3d: Add missing dumping for the spill offset/size uniforms.	2019-04-12 15:59:31 -07:00
Eric Anholt	42cf57f186	v3d: Add missing base offset to CS shared memory accesses. This code is so touchy, trying to emit the minimum amount of address math. Some day we'll move it all to NIR, I hope.	2019-04-12 15:59:31 -07:00
Eric Anholt	6b1c659825	v3d: Add Compute Shader compilation support. While waiting for the CSD UABI to get reviewed, I keep having to rebase the CS patch. Just land the compiler side for now to keep it from diverging. For now this covers just GLES 3.1 compute shaders, not CL kernels.	2019-04-12 15:59:31 -07:00
Eric Anholt	1e0a72ce09	v3d: Replace the old shader-db env var output with the ARB_debug_output. We're using ARB_debug_output for the main shader-db, but I had this env var left around from the shader-db-2 support (vc4 apitrace-based). Keep the env var around since it's nice sometimes to get the stats on a shader you're optimizing without having to do a shader-db run, but drop the old formatting that's not useful and keeps tricking me when I go to add another measurement to the shader-db output.	2019-04-12 15:59:31 -07:00
Eric Anholt	b02dbaa8ce	v3d: Include the number of max temps used in the shader-db output. This gives us finer-grained feedback on how we're doing on register pressure than "did we trigger a new shader to spill or drop thread count?"	2019-04-12 15:59:24 -07:00
Eric Anholt	276ec879fd	v3d: Drop a note for the future about PIPE_CAP_PACKED_UNIFORMS.	2019-04-12 15:58:28 -07:00
Eric Anholt	89b7df552b	v3d: Add and use a define for the number of channels in a QPU invocation. A shader invocation always executes 16 channels together, so we often end up multiplying things by this magic 16 number. Give it a name.	2019-04-12 15:58:28 -07:00
Eric Anholt	b88ef3bd76	nir: Add a comment about how intrinsic definitions work. I was thinking about a refactor, and needed to read this first. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-04-12 15:56:12 -07:00
Eric Anholt	35355b4860	nir: Drop remaining references to const_index in favor of the call to use. Please don't make me read a const_index[] expression ever again. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-04-12 15:56:04 -07:00
Eric Anholt	6e4d3d0a2f	nir: Drop comments about the constant_index slots for load/stores. The constant_index slots are named right there in the intrinsic definition, and the comment is just a chance to get out of sync. Noticed while reviewing the lower_to_scratch changes that copy-and-pasted wrong comments, and load_ubo and load_per_vertex_output had incorrect comments currently. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-04-12 15:55:55 -07:00
Sagar Ghuge	066d2aebc0	intel/fs: Remove unused condition from opt_algebraic case We will never hit a condition where we have src1 and src2 as immediate operands. Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-04-12 13:47:57 -07:00
Kenneth Graunke	9e0c744f07	glsl: Set location on structure-split sampler uniform variables gl_nir_lower_samplers_as_deref splits structure uniform variables, creating new variables for individual fields. As part of that, it calculates a new location. It then never set this on the new variables. Thanks to Michael Fiano for finding this bug. Fixes crashes on i965 with Piglit's new tests/spec/glsl-1.10/execution/samplers/uniform-struct test, which was reduced from the failing case in Michael's app. Fixes: `f003859f97` nir: Make gl_nir_lower_samplers use gl_nir_lower_samplers_as_deref Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-04-12 10:35:08 -07:00
Mateusz Krzak	f4fc2ece57	panfrost: use os_mmap and os_munmap 32-bit needs mmap64 for 64-bit offsets. We get 64-bit offsets from kernel. Signed-off-by: Mateusz Krzak <kszaquitto@gmail.com> Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-04-12 16:33:00 +00:00
Mateusz Krzak	411da8b80d	panfrost: cast bo_handles pointer to uintptr_t first Required for 64-bit kernel to interpret the pointer from 32-bit userspace. Signed-off-by: Mateusz Krzak <kszaquitto@gmail.com> Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-04-12 16:33:00 +00:00
Jason Ekstrand	7eaaff18cb	anv/pipeline: Fix MEDIA_VFE_STATE::PerThreadScratchSpace on gen7 We were always programming it with the Broadwell convention which is too large by a factor of two on Haswell and just plain wrong on IVB and BYT. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: mesa-stable@lists.freedesktop.org	2019-04-12 16:08:35 +00:00
Eric Engestrom	da1a5a19bd	gitlab-ci: add lima to the build Suggested-by: Karol Herbst <karolherbst@gmail.com> Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>	2019-04-12 15:43:19 +00:00
Marek Olšák	f4ae188d50	ac: use the common helper ac_apply_fmask_to_sample Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-04-12 11:35:31 -04:00
Marek Olšák	971bc10177	radeonsi: set AC_FUNC_ATTR_READNONE for image opcodes where it was missing Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-04-12 11:34:39 -04:00
Marek Olšák	467ff6ebfe	mesa: don't overwrite existing shader files with MESA_SHADER_CAPTURE_PATH Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2019-04-12 11:34:39 -04:00
Marek Olšák	bd2995c8b7	glsl: allow the #extension directive within code blocks for the dri option for Viewperf 13 Acked-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-04-12 11:34:39 -04:00
Samuel Pitoiset	6718bb57ac	ac/nir: remove some useless integer casts for ALU operations Sources are always casted to integers. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-04-12 17:30:55 +02:00
Samuel Pitoiset	8a6442075f	ac/nir: remove useless integer cast in visit_image_load() ac_build_image_opcode() casts if necessary and buffer images are casted too. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-04-12 17:30:55 +02:00
Samuel Pitoiset	ffbb62f808	ac/nir: remove useless integer cast in adjust_sample_index_using_fmask() It's already casted if necessary in ac_build_image_opcode(). Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-04-12 17:30:55 +02:00
Samuel Pitoiset	7b5b27a685	ac/nir: remove useless LLVMGetUndef for nir_op_pack_64_2x32_split Trivial. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-04-12 17:30:55 +02:00
Samuel Pitoiset	fd4041987b	ac: add ac_build_load_helper_invocation() helper Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-04-12 17:30:55 +02:00
Samuel Pitoiset	590a4c8981	ac: add ac_build_ddxy_interp() helper Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-04-12 17:30:55 +02:00
Samuel Pitoiset	4cb13e9462	ac: add ac_build_umax() and use it where possible This changes the predicate from LessThan to Equal. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-04-12 17:30:55 +02:00
Samuel Pitoiset	cf88bfa75a	ac/nir: make use of ac_build_umin() where possible This changes the predicate from LessThan to Equal. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-04-12 17:30:54 +02:00
Samuel Pitoiset	15dd81913f	ac/nir: make use of ac_build_imin() where possible This changes the predicate from LessThan to Equal. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-04-12 17:30:54 +02:00
Samuel Pitoiset	d7a0c0d53b	ac/nir: make use of ac_build_imax() where possible Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-04-12 17:30:54 +02:00
Karol Herbst	a55c7352d6	lima: add bool parameter to type_size function Fixes: `035759b61b` ("nir/i965/freedreno/vc4: add a bindless bool to type size functions") Signed-off-by: Karol Herbst <kherbst@redhat.com> Tested-by: Icenowy Zheng <icenowy@aosc.io> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-04-12 17:08:53 +02:00
Karol Herbst	98934e6aa1	nvc0/nir: enable bindless texture Signed-off-by: Karol Herbst <kherbst@redhat.com>	2019-04-12 09:02:59 +02:00
Karol Herbst	89a81fbd98	nv50/ir/nir: add support for bindless images Signed-off-by: Karol Herbst <kherbst@redhat.com>	2019-04-12 09:02:59 +02:00
Karol Herbst	b286cdedb7	nv50/ir/nir: handle bindless texture Signed-off-by: Karol Herbst <kherbst@redhat.com>	2019-04-12 09:02:59 +02:00
Timothy Arceri	d62d434fe9	ac/nir_to_llvm: add image bindless support With this all piglit bindless image tests pass on radeonsi. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-04-12 09:02:59 +02:00
Timothy Arceri	55fb93b586	ac/nir_to_llvm: make get_sampler_desc() more generic and pass it the image intrinsic This will be required by the bindless support in the following patches. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-04-12 09:02:59 +02:00
Karol Herbst	4a3c04a11f	glsl/nir: add support for lowering bindless images_derefs v2: handle atomics as well make use of nir_rewrite_image_intrinsic v3: remove call to nir_remove_dead_derefs v4: (Timothy Arceri) don't actually call lowering yet Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v3) Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-04-12 09:02:59 +02:00
Karol Herbst	0b2e8d9e17	glsl/nir: fetch the type for images from the deref instruction fixes retrieving the sampler type for bindless images stored inside structs. Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-04-12 09:02:59 +02:00
Karol Herbst	d7bbb3caf1	glsl_to_nir: handle bindless textures v2: add support for AMD Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v1) Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-04-12 09:02:59 +02:00
Timothy Arceri	035759b61b	nir/i965/freedreno/vc4: add a bindless bool to type size functions This required to calculate sizes correctly when we have bindless samplers/images. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-04-12 09:02:59 +02:00
Karol Herbst	3b2a9ffd60	nir: move brw_nir_rewrite_image_intrinsic into common code Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-04-12 09:02:59 +02:00
Icenowy Zheng	400f0bfba1	lima: lower bool to float when building shaders Both processors of Mali Utgard are float-only, so bool is not an acceptable data type for them. Fortunately the NIR compiler infrastructure has a lower pass to lower bool to float. Call this lower pass to lower bool to float for both GP and PP. With this, Glamor on Xorg server 1.20.3 at least doesn't hang when starting gtk3-demo. The old map of nir op bcsel is changed to fcsel, and the map of b2f32 in PP is dropped because it's not needed now (it was originally only mapped to ppir_op_mov). Signed-off-by: Icenowy Zheng <icenowy@aosc.io> Reviewed-by: Qiang Yu <yuq825@gmail.com>	2019-04-12 13:40:47 +08:00
Tomeu Vizoso	8f1c686bca	panfrost: Guard against reading past end of buffer Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com> Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-04-12 07:12:17 +02:00
Tomeu Vizoso	c35ae93803	panfrost: split asserts in pandecode Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-04-12 07:11:52 +02:00
Dave Airlie	604d89c2d1	llvmpipe: fix undefined shift 1 << 31. Pointed out by coverity. Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2019-04-12 08:54:02 +10:00
Dave Airlie	4690f90728	swrast: fix undefined shift of 1 << 31 Pointed out by coverity Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2019-04-12 08:53:59 +10:00
Dave Airlie	e4ed08873b	draw: fix undefined shift of (1 << 31) Pointed out by a coverity scan. Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2019-04-12 08:53:10 +10:00
Kenneth Graunke	4fcb749044	iris: Actually pin the scratch BO. We were pinning it for compute shaders, and pinning it when restoring saved buffers, but we never actually pinned it in the original batch for VS/TCS/TES/GS/FS. Fixes rendering in GFXBench5's Tessellation demo and a bunch of Piglit geometry shader tests.	2019-04-11 15:03:27 -07:00
Lionel Landwerlin	628c9ca908	anv: store heap address bounds when initializing physical device We can then reuse those bounds to initialize the VMA heaps at logical device creation. This fixes an issue on EHL which has only 36bits of VMA. We were incorrectly using the fixed 48bits upper bound to initialize the logical device heap, resulting in addresses beyond the device's limits. v2: Don't confuse heap size (limited by system memory) and VMA size (limited by number of addressing bits the platform has) v3: Fix low heap vma_size :( (Lionel) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reported-by: James Xiong <james.xiong@intel.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> (v1) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v2)	2019-04-11 22:56:43 +01:00
Jason Ekstrand	316a98dec9	intel/common: Support bigger right-shifts with mi_builder Because why not?	2019-04-11 18:04:09 +00:00
Jason Ekstrand	0d6dea0ac8	anv/cmd_buffer: Use gen_mi_sub instead of gen_mi_add with a negative Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-04-11 18:04:09 +00:00
Jason Ekstrand	d17dd46b09	anv: Move mi_memcpy and mi_memset to gen_mi_builder Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-04-11 18:04:09 +00:00
Jason Ekstrand	bacb21fc6b	anv: Use gen_mi_builder for queries Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-04-11 18:04:09 +00:00
Jason Ekstrand	48da45891e	anv: Use gen_mi_builder for conditional rendering Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-04-11 18:04:09 +00:00
Jason Ekstrand	a3b0894afc	anv: Use gen_mi_builder for indirect dispatch Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-04-11 18:04:09 +00:00
Jason Ekstrand	b829dc30c1	anv: Use gen_mi_builder for indirect draw parameters Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-04-11 18:04:09 +00:00
Jason Ekstrand	0122a6f037	anv: Use gen_mi_builder for computing resolve predicates Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-04-11 18:04:09 +00:00
Jason Ekstrand	83b46ad6d8	anv: Use gen_mi_builder for CmdDrawIndirectByteCount Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-04-11 18:04:09 +00:00
Jason Ekstrand	8b8deeca78	intel/common: Add unit tests for gen_mi_builder Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-04-11 18:04:09 +00:00
Jason Ekstrand	2f7fcd103e	intel/common: Add a MI command builder Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-04-11 18:04:09 +00:00
Eric Anholt	8f065596d2	v3d: Add an optimization pass for redundant flags updates. Our exec masking introduces lots of redundant flags updates, and even without that there will be cases where NIR comparisons on the same sources for different reasons may generate the same comparison instruction before the selection. total instructions in shared programs: 6492930 -> 6460934 (-0.49%) total uniforms in shared programs: 2117460 -> 2115106 (-0.11%) total spills in shared programs: 4983 -> 4987 (0.08%) total fills in shared programs: 6408 -> 6416 (0.12%)	2019-04-11 09:24:02 -07:00
Lubomir Rintel	3dd2001993	kmsro: Extend to include armada-drm This allows using the Marvell Armada display controllers (with the armada drm modesetting driver) along with the render-only drivers, such as Etnaviv on an OLPC XO-1.75 laptop. v2: - Add to Android.mk too Signed-off-by: Lubomir Rintel <lkundrak@v3.sk> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-04-11 15:53:29 +00:00
Icenowy Zheng	a155c26a66	lima: implement blit with util_blitter As we have already prepared for using util_blitter, use it to implement lima_blit. Signed-off-by: Icenowy Zheng <icenowy@aosc.io> Reviewed-by: Qiang Yu <yuq825@gmail.com>	2019-04-11 13:45:51 +00:00
Icenowy Zheng	318ccbe7b2	lima: make lima_context_framebuffer subtype of pipe_framebuffer_state Currently the lima driver saves the framebuffer state in its from-scratch struct lima_context_framebuffer. However, util_blitter requires to save framebuffer with standard struct pipe_framebuffer_state. Make the lima_context_framebuffer a subtype of the standard pipe_framebuffer_state, thus the standard part can be used for util_blitter framebuffer state saving. Signed-off-by: Icenowy Zheng <icenowy@aosc.io> Reviewed-by: Qiang Yu <yuq825@gmail.com>	2019-04-11 13:45:51 +00:00
Icenowy Zheng	8d27bc351f	lima: add dummy set_sample_mask function The set_sample_mask function is required in util_blitter. Add a dummy one to make util_blitter work. Signed-off-by: Icenowy Zheng <icenowy@aosc.io> Reviewed-by: Qiang Yu <yuq825@gmail.com>	2019-04-11 13:45:51 +00:00
Eric Engestrom	8c780e54a3	gitlab-ci: build gallium extra hud Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>	2019-04-11 13:15:18 +00:00
Eric Engestrom	c77acc3ceb	meson: remove meson-created megadrivers symlinks Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110356 Fixes: `aa7afe324c` "meson: strip rpath from megadrivers" Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Tested-by: Mike Lothian <mike@fireburn.co.uk> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-04-11 12:40:16 +00:00
Timothy Arceri	9e3740c47f	nir: initialise some variables in opt_if_loop_last_continue() Fixes a couple of Coverity warnings CID 1444626. Fixes: `e30804c602` ("nir/radv: remove restrictions on opt_if_loop_last_continue()") Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2019-04-11 20:38:03 +10:00
Juan A. Suarez Romero	83f1b0e95b	nir/xfb: do not use bare interface type In commit `3b3653c4cf` we decided not to use bare types; hence do not use bare type when comparing with interface type to find out if the xfb variable is an array block. This fixes dEQP-VK.transform_feedback.* tests. Fixes: `3b3653c4cf` ("nir/spirv: don't use bare types, remove assert in split vars for testing") CC: Dave Airlie <airlied@redhat.com> CC: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-04-11 11:52:45 +02:00
Michel Dänzer	b48e64f903	gitlab-ci: Run CI pipeline for all branches in the main repository In turn, do not run the pipeline for the master branch in forked repositories. Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-04-11 11:22:41 +02:00
Erik Faye-Lund	b60a13d5cb	virgl: use debug_printf instead of fprintf While we're at it, prefix the string with "VIRGL: ", to match similar code elsewhere in virgl. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>	2019-04-11 09:53:25 +02:00
Erik Faye-Lund	7394ef4a72	virgl: do not warn about display-target binding We never want to display a transfer-temp surface, so let's ignore that flag when calculating the new binding flags. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>	2019-04-11 09:53:22 +02:00
Erik Faye-Lund	27d94a83cd	virgl: only warn about unchecked flags The other flags are already vetted, so there's no point in reporting them. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>	2019-04-11 09:53:15 +02:00
Erik Faye-Lund	8f1a147d68	virgl: unsigned int -> unsigned We don't usually spell out the int part of unsigned. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>	2019-04-11 09:53:10 +02:00
Tapani Pälli	ef923088d2	egl: setup fds array correctly when exporting dmabuf For formats with multiple planes, application will pass a num_planes sized fds array which should be initialized properly in case fds amount utilized by the driver is less than the number of planes. Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-04-11 10:16:03 +03:00
Dylan Baker	4122f55574	docs: update calendar, and news item and link release notes for 19.0.2	2019-04-10 20:51:58 -07:00
Dylan Baker	9cb011e7c8	docs: Add sha256 sums for 19.0.2	2019-04-10 20:50:41 -07:00
Dylan Baker	9725c59756	docs: Add release notes for 19.0.2	2019-04-10 20:50:39 -07:00
Jan Vesely	6ec9733b9f	gallium/aux: Report error if loading of a pipe driver fails. Skip over non-existent files. Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-04-10 22:17:09 -04:00
Rob Herring	2b780fe893	kmsro: Add platform support for exynos and sun4i v2: - add Android.mk change Reviewed-by: Eric Anholt <eric@anholt.net> Signed-off-by: Rob Herring <robh@kernel.org> Signed-off-by: Qiang Yu <yuq825@gmail.com>	2019-04-11 09:57:53 +08:00
Rob Herring	b1da1946c7	kmsro: Add lima renderonly support Enable using lima for KMS renderonly. This still needs KMS driver name mapping to kmsro to be used automatically. Reviewed-by: Eric Anholt <eric@anholt.net> Signed-off-by: Rob Herring <robh@kernel.org> Signed-off-by: Qiang Yu <yuq825@gmail.com>	2019-04-11 09:57:53 +08:00
Qiang Yu	92d7ca4b1c	gallium: add lima driver v2: - use renamed util_dynarray_grow_cap - use DEBUG_GET_ONCE_FLAGS_OPTION for debug flags - remove DRM_FORMAT_MOD_ARM_AGTB_MODE0 usage - compute min/max index in driver v3: - fix plbu framebuffer state calculation - fix color_16pc assemble - use nir_lower_all_source_mods for lowering neg/abs/sat - use float array for static GPU data - add disassemble comment for static shader code - use drm_find_modifier v4: - use lima_nir_lower_uniform_to_scalar v5: - remove nir_opt_global_to_local when rebase Cc: Rob Clark <robdclark@gmail.com> Cc: Alyssa Rosenzweig <alyssa@rosenzweig.io> Acked-by: Eric Anholt <eric@anholt.net> Signed-off-by: Andreas Baierl <ichgeh@imkreisrum.de> Signed-off-by: Arno Messiaen <arnomessiaen@gmail.com> Signed-off-by: Connor Abbott <cwabbott0@gmail.com> Signed-off-by: Erico Nunes <nunes.erico@gmail.com> Signed-off-by: Heiko Stuebner <heiko@sntech.de> Signed-off-by: Koen Kooi <koen@dominion.thruhere.net> Signed-off-by: Marek Vasut <marex@denx.de> Signed-off-by: marmeladema <xademax@gmail.com> Signed-off-by: Paweł Chmiel <pawel.mikolaj.chmiel@gmail.com> Signed-off-by: Rob Herring <robh@kernel.org> Signed-off-by: Rohan Garg <rohan@garg.io> Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com> Signed-off-by: Qiang Yu <yuq825@gmail.com>	2019-04-11 09:57:53 +08:00
Qiang Yu	64eaf60ca7	drm-uapi: add lima_drm.h Acked-by: Eric Anholt <eric@anholt.net> Signed-off-by: Qiang Yu <yuq825@gmail.com>	2019-04-11 09:57:53 +08:00
Qiang Yu	d26faef2e9	gallium/u_vbuf: export u_vbuf_get_minmax_index This helper function can be used by drivers which always need the min/max index. Reviewed-by: Eric Anholt <eric@anholt.net> Signed-off-by: Qiang Yu <yuq825@gmail.com>	2019-04-11 09:57:53 +08:00
Qiang Yu	dc37942c4e	u_dynarray: add util_dynarray_grow_cap This is for the case where the user only knows a max size it wants to append to the array and enlarges the array capacity before writing into it. v2: - rename newsize to newcap - rename util_dynarray_enlarge to util_dynarray_grow_cap Signed-off-by: Qiang Yu <yuq825@gmail.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-04-11 09:57:53 +08:00
Qiang Yu	509dd6e20b	u_math: add ushort_to_float/float_to_ushort v2: - return 0 for NaN too Signed-off-by: Qiang Yu <yuq825@gmail.com> Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2019-04-11 09:57:53 +08:00
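The u_math entry above adds a pair of conversion helpers and notes that NaN maps to 0. A minimal Python sketch of that behaviour follows. It assumes the conversion is a normalized one (0..65535 mapped to 0.0..1.0) with round-to-nearest; the commit message states neither, and the Python names only mirror the C helper names for illustration.

```python
USHORT_MAX = 65535  # assumption: normalized 16-bit range


def ushort_to_float(us: int) -> float:
    """Map an unsigned 16-bit value to the range [0.0, 1.0]."""
    return us / USHORT_MAX


def float_to_ushort(f: float) -> int:
    """Map a float to an unsigned 16-bit value, clamping out-of-range input."""
    # `not (f > 0.0)` is true for negative values, zero, and NaN alike,
    # because every ordered comparison involving NaN is false.
    if not (f > 0.0):
        return 0
    if f >= 1.0:
        return USHORT_MAX
    # Round to nearest by adding 0.5 before truncating (an assumption).
    return int(f * USHORT_MAX + 0.5)
```

Writing the first test as `not (f > 0.0)` rather than `f <= 0.0` is what makes NaN fall into the zero branch without a separate check.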
Guido Günther	c73fd79cee	gallium: trace: Add missing fence related wrappers Without that kmscube with GALLIUM_TRACE would segfault like: #0 0x0000000000000000 in () #1 0x0000ffff8f311760 in dri2_create_fence_fd (_ctx=0xaaaae266b8b0, fd=10) at ../src/gallium/state_trackers/dri/dri_helpers.c:122 #2 0x0000ffff90788670 in dri2_create_sync (drv=0xaaaae2667910, disp=0xaaaae26691f0, type=12612, attrib_list=0xaaaae26b9290) at ../src/egl/drivers/dri2/egl_dri2.c:2993 #3 0x0000ffff90776a9c in _eglCreateSync (disp=0xaaaae26691f0, type=12612, attrib_list=0xaaaae26b9290, orig_is_EGLAttrib=0, invalid_type_error=12292) at ../src/egl/main/eglapi.c:1823 #4 0x0000ffff90776be4 in eglCreateSyncKHR (dpy=0xaaaae26691f0, type=12612, int_list=0xfffff662e828) at ../src/egl/main/eglapi.c:1848 Signed-off-by: Guido Günther <agx@sigxcpu.org> Signed-off-by: Marek Olšák <marek.olsak@amd.com>	2019-04-10 21:31:16 -04:00
Mark Janes	eda36feb2b	intel/tools: Remove redundant definitions of INTEL_DEBUG INTEL_DEBUG is declared extern and defined in gen_debug.c Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-04-10 13:15:33 -07:00
Mark Janes	2393cc7f00	intel/common: move gen_debug to intel/dev libintel_common depends on libintel_compiler, but it contains debug functionality that is needed by libintel_compiler. Break the circular dependency by moving gen_debug files to libintel_dev. Suggested-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-04-10 13:15:33 -07:00
Mike Blumenkrantz	03d6d01fe2	iris: support INTEL_NO_HW environment variable Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-04-10 12:59:17 -07:00
Jian-Hong Pan	7295487c6d	intel: Fix the description of Coffeelake pci-id 0x3E98 According to Intel website [1], the description of chipset 8086:3E98 is Intel(R) UHD Graphics 630. Besides, xserver also mentions it as "Intel(R) UHD Graphics 630 (Coffeelake 3x8 GT2)" in commit d3a26bbf (DRI2: Add another Coffeelake PCI ID) [2]. This patch modifies the description to sync with xserver. [1]: https://ark.intel.com/content/www/us/en/ark/products/134896/intel-core-i5-9600k-processor-9m-cache-up-to-4-60-ghz.html [2]: `d3a26bbf61` Fixes: commit `44f1dcf9b3` "i965: Add a new CFL PCI ID." Signed-off-by: Jian-Hong Pan <jian-hong@endlessm.com> Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>	2019-04-10 12:31:00 -07:00
Jan Vesely	460846981a	Partially revert "gallium: fix autotools build of pipe_msm.la" This partially reverts commit `356ec7a219`. There are symbols needed by libglsl missing, so we might as well skip the entire library. Fixes: `356ec7a219` Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Acked-by: Dieter Nützel <Dieter@nuetzel-hh.de> Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Tested-by: Vinson Lee <vlee@freedesktop.org>	2019-04-10 14:52:52 -04:00
Eric Anholt	afad1f7d62	vc4: Upload CS/VS UBO uniforms together. Same as I did for V3D, drop all this code trying to GC the non-indirectly-loaded uniforms from the UBO that's used for indirect access of gallium cb[0]. While it does successfully drop some of those, it came at the cost of uploading the VS's indirect uniforms twice, for the bin and render versions of the shader. With the UBO loads simplified, I was also able to easily backport V3D's change to pack a UBO offset into the uniform_data[] field so that we don't need to do the add of the uniform base in the shader. As a bonus, now vc4 doesn't depend on mesa/st type_size functions. total uniforms in shared programs: 25514 -> 25490 (-0.09%) total instructions in shared programs: 77019 -> 76836 (-0.24%)	2019-04-10 11:45:30 -07:00
Eric Anholt	0204fb77e0	vc4: Split UBO0 and UBO1 address uniform handling. I'm going to extend how UBO0 works in a moment.	2019-04-10 11:45:30 -07:00
Eric Anholt	7347d09d6a	vc4: Don't forget to set the range when scalarizing our uniforms. In the next commit, we'll want this for handling UBO access clamping. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-04-10 11:45:30 -07:00
Eric Anholt	771adffec1	st: Lower uniforms in st in the !PIPE_CAP_PACKED_UNIFORMS case as well. PIPE_CAP_PACKED_UNIFORMS conflates several things: Lowering uniforms i/o at the st level instead of the backend, packing uniforms with no padding at all, and lowering to UBOs. Requiring backends to lower uniforms i/o for !PIPE_CAP_PACKED_UNIFORMS leads to the driver needing to either link against the type size function in mesa/st, or duplicating it in the backend. Given that all backends want this lower-io as far as I can tell, just move it to mesa/st to resolve the link issue and avoid the driver author needing to understand st's uniforms layout. Incidentally, fixes uniform layout failures in nouveau in: dEQP-GLES2.functional.shaders.struct.uniform.sampler_nested_fragment dEQP-GLES2.functional.shaders.struct.uniform.sampler_nested_vertex dEQP-GLES2.functional.shaders.struct.uniform.sampler_array_fragment dEQP-GLES2.functional.shaders.struct.uniform.sampler_array_vertex and I think in Lima as well. v2: fix indents Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-04-10 11:44:20 -07:00
Lionel Landwerlin	3053d5a4f2	anv: don't use default pipeline cache for hits for VK_EXT_pipeline_creation_feedback If the user didn't provide a pipeline cache and we're using the default internal pipeline cache, then we shouldn't consider a cache hit for VK_EXT_pipeline_creation_feedback as the application did not provide a cache. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `6601e5d6fc` ("anv: implement VK_EXT_pipeline_creation_feedback") Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-04-10 18:45:04 +01:00
Marek Olšák	53f715fafb	Revert "glsl: fix shader_storage_blocks_write_access for SSBO block arrays" This reverts commit `b7ca074cc0`. It broke a lot of tests.	2019-04-10 10:48:56 -04:00
Karol Herbst	0c4706563a	glsl/standalone: add GLES3.1 and GLES3.2 compatibility also set some constants for SSBOs. With that it can compile the shader from: dEQP-GLES31.functional.ssbo.layout.random.all_per_block_buffers.18 Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-04-10 16:16:36 +02:00
Erik Faye-Lund	7c05c95d05	virgl: use debug_printf instead of fprintf While we're at it, prefix the string with "VIRGL: ", to match similar code elsewhere in virgl. Fixes: `d7b3196976` ("virgl: Return an error if we use fp64 on top of GLES") Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Elie Tournier <elie.tournier@collabora.com>	2019-04-10 14:27:45 +02:00
Gert Wollny	04e672257c	virgl: Enable passing arrays as input to fragment shaders This is needed to properly handle interpolateAt* when the input to be interpolated is passed as array in the original GLSL. Currently, the GLSL compiler would lower selecting the correct input so that the interpolant parameter to interpolateAt* is a temporary, and this can not be used to create a valid shader on the host side, because there the parameter must be a shader input. By allowing arrays to be passed, the created TGSI can be used to create proper GLSL. This is related to the virglrenderer bug https://gitlab.freedesktop.org/virgl/virglrenderer/issues/74 v2: Squash the two patches handling these flags into another Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>	2019-04-10 11:09:40 +02:00
Gert Wollny	872519c663	Gallium: Add new CAP that indicates whether IO array definitions can be shrunk PIPE_CAP_TGSI_SKIP_SHRINK_IO_ARRAYS is added to indicate whether the TGSI pass to shrink IO arrays should be skipped to enforce the originally declared array sizes and locations instead. Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>	2019-04-10 11:09:40 +02:00
Samuel Pitoiset	a182adfd83	wsi: allow to override the present mode with MESA_VK_WSI_PRESENT_MODE This is common to all Vulkan drivers and all WSI. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-04-10 09:10:01 +02:00
Samuel Pitoiset	09b4049be3	radv: enable VK_AMD_gpu_shader_half_float Should be safe to enable as all instructions seem to support 16-bit. Unfortunately, there is no CTS test. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-04-10 09:07:17 +02:00
Rhys Perry	fd1fc255d9	ac: add 16-bit support to ac_build_ddxy() Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-04-10 09:05:58 +02:00
Samuel Pitoiset	bc6d486c78	ac/nir: fix nir_op_b2f16 Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-04-10 09:05:55 +02:00
Lepton Wu	1f063c0bfb	virgl: Set bind when creating temp resource. virgl render complains about "Illegal resource" when running dEQP-EGL.functional.color_clears.single_context.gles2.rgb888_window, the reason is that a zero bind value was given for temp resource. Signed-off-by: Lepton Wu <lepton@chromium.org> Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>	2019-04-09 19:25:25 -07:00
Bas Nieuwenhuizen	028ce52739	radv: Add non-uniform indexing lowering. This patch does it as late as possible so the potential extra basic blocks don't inhibit other optimizations. Big thanks to Jason for writing the lowering pass. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-04-10 02:04:13 +02:00
Bas Nieuwenhuizen	282bacab4a	nir: Add access qualifiers on load_ubo intrinsic. Otherwise nir_lower_non_uniform_access crashes when it tries to get the access of a load_ubo. Fixes: `8ed583fe52` "spirv: Handle the NonUniformEXT decoration" Fixes: `e50ab2c0f2` "nir: Add access flags to deref and SSBO atomics" Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-04-10 02:04:04 +02:00
Marek Olšák	b7ca074cc0	glsl: fix shader_storage_blocks_write_access for SSBO block arrays CTS: GL45-CTS.compute_shader.resources-max Fixes: `4e1e8f684b` "glsl: remember which SSBOs are not read-only and pass it to gallium" Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-04-09 19:25:35 -04:00
Khaled Emara	f0fb73dcf6	freedreno: PIPE_CAP_SHADER_BUFFER_OFFSET_ALIGNMENT unreachable statement There seems to be a duplicate return statement, as A2XX doesn't support shader buffers. Reviewed-by: Rob Clark <robdclark@gmail.com>	2019-04-09 17:31:06 -04:00
Lionel Landwerlin	ed009e68c5	genxml: sort xml files using new script Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Acked-by: Kenneth Graunke <kenneth@whitecape.org>	2019-04-09 18:24:03 +01:00
Lionel Landwerlin	903e142f0d	genxml: add a sorting script Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-04-09 18:23:34 +01:00
Eric Engestrom	eb699c1575	bin: drop unused import from install_megadrivers.py Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-04-09 16:20:37 +00:00
Juan A. Suarez Romero	ec7a33af58	anv: advertise 8 subtexel/mipmap precision bits So far ANV was advertising 4 bits for both subTexelPrecisionBits and mipmapPrecisionBits. But these values were not actually verified. But it seems the right value is actually 8 bits for both cases. Unfortunately Intel PRM does not clarify how many bits the hardware uses. For the mipmap case, there is the following reference in PRM Volume 6 (3D Media GPGPU), specifically in LOD Computation Pseudocode: ``` Bias: S4.8 MinLod: U4.8 MaxLod: U4.8 Base: U4.1 MIPCnt: U4 SurfMinLod: U4.8 ResMinLod: U4.8 ``` We have other clues, though: - On one side, dEQP-VK.texture.explicit_lod.* tests fail when using 4 bits, but work when using 8 bits. These tests try to mimic the expected behaviour as closely as possible, and they use the reported subTexelPrecisionBits and mipmapPrecisionBits to do so. - On the other side, the equivalent driver for Windows is reporting 8 bits for both elements. Not sure if they got to verify it from the PRM or from a different source. CC: Jason Ekstrand <jason@jlekstrand.net> CC: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-04-09 15:28:42 +00:00
Boyuan Zhang	d507bcdcf2	st/va: reverse qt matrix back to its original order The quantiser matrix that VAAPI provides has been applied with inverse z-scan. However, what we expect in MPEG2 picture description is the original order. Therefore, we need to reverse it back to its original order. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110257 Cc: mesa-stable@lists.freedesktop.org Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com>	2019-04-09 10:51:03 -04:00
Andres Gomez	75a3dd97aa	glsl/linker: location aliasing requires types to have the same width From the OpenGL 4.60.5 spec, section 4.4.1 Input Layout Qualifiers, Page 67, (Location aliasing): " Further, when location aliasing, the aliases sharing the location must have the same underlying numerical type and bit width (floating-point or integer, 32-bit versus 64-bit, etc.) and the same auxiliary storage and interpolation qualification." Additionally, we have improved the linker error descriptions. Specifically, when taking structs into account we were producing a linker error because we assumed that all components in each location were used and that would cause component aliasing. This is not accurate of the actual problem. Now, the failure specifies that the underlying numerical type incompatibility is the cause for the failure. Fixes the following piglit test: tests/spec/arb_enhanced_layouts/linker/component-layout/vs-to-fs-width-mismatch-double-float.shader_test v2: - Do not assert if we see invalid numerical types. These come straight from shader code, so we should produce linker errors if shaders attempt to do location aliasing on variables that are not numerical such as records. - While we are at it, improve error reporting for the case of numerical type mismatch to include the shader stage. v3: - Allow location aliasing of images and samplers. If we get these it means bindless support is active and they should be handled as 64-bit integers (Ilia) - Make sure we produce link errors for any non-numerical type for which we attempt location aliasing, not just structs. v4: - Rebased with minor fixes (Andres). - Added fixing tag to the commit log (Andres). v5: - Remove the helper function and check individually for the underlying numerical type and bit width (Timothy). - Implicitly, assume that any non-treated type which is checked for its underlying numerical type is either integer or float and has a defined bit width (Timothy). 
- Implicitly, assume that structs are the only non-treated non-numerical type (Timothy). - Improve the linker error descriptions and commit log (Andres). Fixes: `13652e7516` ("glsl/linker: Fix type checks for location aliasing") Cc: Ilia Mirkin <imirkin@alum.mit.edu> Cc: Timothy Arceri <tarceri@itsqueeze.com> Cc: Iago Toral Quiroga <itoral@igalia.com> Signed-off-by: Andres Gomez <agomez@igalia.com> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-04-09 12:56:50 +02:00
Gert Wollny	b999865f55	softpipe: Enable PIPE_CAP_TEXTURE_BUFFER_OFFSET_ALIGNMENT The offset alignment must be set to s16 because the tile cache is implemented to require this. This enables ARB_buffer_texture_range and OES_texture_buffer for softpipe. The according deqp-gles31 tests pass. Also update the feature table. Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Reviewed-by: Dave Airlie <airlied@redhat.com>	2019-04-09 08:17:45 +00:00
Gert Wollny	8cf8dfe408	softpipe: Add an extra code path for the buffer texel lookup With buffers the addressing is done on a per-byte basis so the code path for normal textures doesn't work properly. Also add an assert to make sure that the bit count for storing the X coordinate is large enough. Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Reviewed-by: Dave Airlie <airlied@redhat.com>	2019-04-09 08:17:44 +00:00
Gert Wollny	47dd7c4054	softpipe: raise number of bits used for X coordinate texture lookup With buffers the addressing is done on a per-byte basis, and with a maximal block size of 16 bytes we have to take into account four more bits. For simplicity just remove the TEX_TILE_SIZE_LOG2, which is 5 bit. Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Reviewed-by: Dave Airlie <airlied@redhat.com>	2019-04-09 08:17:44 +00:00
Gert Wollny	11f219a5ee	softpipe: Don't use mag filter for gather op For the gather op no magnification filter is provided, so always use the filter given for minification (which is the linear filter) Fixes: `0dff1533f2` softpipe: Use mag texture filter also for clamped lod == 0 Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Reviewed-by: Dave Airlie <airlied@redhat.com>	2019-04-09 09:50:13 +02:00
Jason Ekstrand	6279074de1	nir: Get rid of global registers We have a pass to lower global registers to locals and many drivers dutifully call it. However, no one ever creates a global register ever so it's all dead code. It's time we bury it. Acked-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-04-09 00:29:36 -05:00
Jason Ekstrand	b28bad89b9	nir: Get rid of nir_register::is_packed All we ever do is initialize it to zero, clone it, print it, and validate it. No one ever sets or uses it. Acked-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-04-09 00:29:36 -05:00
Dave Airlie	ff852fdc05	virgl: add support for ARB_indirect_parameters The protocol changes are already in place for it. Reviewed-By: Gert Wollny <gert.wollny@collabora.com>	2019-04-09 14:25:01 +10:00
Dave Airlie	05ff2dbf13	virgl: add support for ARB_multi_draw_indirect This will pass the multi draw through to the host if it has support for it instead of using the st to emulate it Reviewed-By: Gert Wollny <gert.wollny@collabora.com>	2019-04-09 14:15:24 +10:00
Dave Airlie	316b785c59	virgl: add support for missing command buffer binding. When I added indirect support I forgot this, however to use it now we need to check for a new enough capability on the host side. Reviewed-By: Gert Wollny <gert.wollny@collabora.com>	2019-04-09 14:15:12 +10:00
Caio Marcelo de Oliveira Filho	899fd66b44	docs: Add NV_compute_shader_derivatives to 19.1.0 relnotes	2019-04-08 19:29:33 -07:00
Caio Marcelo de Oliveira Filho	45a4129392	anv: Implement VK_NV_compute_shader_derivatives Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-04-08 19:29:33 -07:00
Caio Marcelo de Oliveira Filho	bd73531677	spirv: Add support for DerivativeGroup capabilities As defined in SPV_NV_compute_shader_derivatives. These control how the invocations are arranged in a CS when doing derivative and related operations (which are also enabled by the extension). Since we expect valid SPIR-V, we don't need to do more work at SPIR-V level to enable the derivative and related operations to be called. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-04-08 19:29:33 -07:00
Caio Marcelo de Oliveira Filho	956226c8ba	iris: Enable NV_compute_shader_derivatives Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2019-04-08 19:29:33 -07:00
Caio Marcelo de Oliveira Filho	f9b29c4a58	gallium: Add PIPE_CAP_COMPUTE_SHADER_DERIVATIVES To enable NV_compute_shader_derivatives, which allows derivatives (and texture lookups with implicit derivatives) in compute shaders. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2019-04-08 19:29:33 -07:00
Caio Marcelo de Oliveira Filho	c9d1569689	i965: Advertise NV_compute_shader_derivatives Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2019-04-08 19:29:33 -07:00
Caio Marcelo de Oliveira Filho	94abc53030	intel/fs: Use NIR_PASS_V when lowering CS intrinsics This will make that step visible in NIR_PRINT=1. v2: Also use the macro for the cleanup passes. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-04-08 19:29:33 -07:00
Caio Marcelo de Oliveira Filho	0425b34b79	intel/fs: Don't loop when lowering CS intrinsics This was needed when certain intrinsics were lowered to other ones that were defined by the same pass. After `060817b2` "intel,nir: Move gl_LocalInvocationID lowering to nir_lower_system_values" we don't need the loop anymore. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-04-08 19:29:33 -07:00
Caio Marcelo de Oliveira Filho	3ee3024804	intel/fs: Add support for CS to group invocations in quads When using quads, instead of mapping the elements to the next 4 local invocation indices, we map the two next in the "current" row and two next in the "next row". A side effect is that a thread will execute the indices in a different order. We now perform the lowering of both local invocation ID and index together -- and don't rely anymore on lowering done by nir_lower_system_values. That is convenient when doing the math for quads, because we need X and Y to get the right invocation index. When the pass progresses, fold the constants and clean up to reduce the noise from the indexing math. This implements the derivative_group_quadsNV semantics from NV_compute_shader_derivatives. v2: Take subgroup_id into account, otherwise only values in the first subgroup would be used. (Jason) v3: Calculate invocation index and ID together, to avoid duplicating some math in the quads case when both index and ID are used. (Jason) v4: Don't call cleanup passes as part of the lowering, leave that to the call site. (Jason) Change calculation to use fewer instructions. (Jason) Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> (v3) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-04-08 19:29:33 -07:00
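The quad arrangement described in the entry above (two elements in the "current" row, two in the "next" row) can be illustrated with a small Python sketch. The function name, the row-major ordering of quads, and the requirement that the workgroup width be even are assumptions made for illustration; the sketch is not the lowering code from the commit.

```python
def quad_coords(linear_index, width):
    """Map a linear invocation number to (x, y) so that each run of four
    consecutive numbers covers a 2x2 block of the workgroup.

    Hypothetical helper: assumes `width` is even and quads are laid out
    row-major across the workgroup.
    """
    if width <= 0 or width % 2:
        raise ValueError("width must be a positive even number")
    quad, lane = divmod(linear_index, 4)   # which quad, and position inside it
    quads_per_row = width // 2
    x = (quad % quads_per_row) * 2 + (lane & 1)    # lanes 1 and 3 sit on the right
    y = (quad // quads_per_row) * 2 + (lane >> 1)  # lanes 2 and 3 sit in the next row
    return x, y
```

For a workgroup four invocations wide, `[quad_coords(i, 4) for i in range(4)]` yields `(0, 0), (1, 0), (0, 1), (1, 1)`, so one run of four consecutive numbers forms a complete 2x2 quad instead of a 4x1 strip.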
Caio Marcelo de Oliveira Filho	ef0339d5ea	intel/fs: Use TEX_LOGICAL whenever implicit lod is supported Make sure we include compute shaders that have a derivative group defined. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-04-08 19:29:33 -07:00
Caio Marcelo de Oliveira Filho	fcbc5ccaae	nir: Don't set LOD=0 for compute shader that has derivative group When using NV_compute_shader_derivatives to set a derivative group, a compute shader supports texture with implicit LOD calculation, so don't set an explicit LOD. Note if the extension is used but the derivative group is not specified, it will default to LOD=0 as before. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-04-08 19:29:33 -07:00
Caio Marcelo de Oliveira Filho	d08a74d2bf	nir/algebraic: Lower CS derivatives to zero when no group defined In compute shaders if no derivative group is defined, the derivatives will always be zero. Specified in NV_compute_shader_derivatives. To make the check more convenient, add a "info" local variable to the generated code so we can refer to it in the Python rules. (Jason) Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-04-08 19:29:32 -07:00
Caio Marcelo de Oliveira Filho	3c5ddaeacd	glsl: Parse and propagate derivative_group to shader_info NV_compute_shader_derivatives allow selecting between two possible arrangements (quads and linear) when calculating derivatives and certain subgroup operations in case of Vulkan. So parse and propagate those up to shader_info.h. v2: Do not fail when ARB_compute_variable_group_size is being used, since we are still clarifying what is the right thing to do here. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2019-04-08 19:29:32 -07:00
Caio Marcelo de Oliveira Filho	ca60f0b7ba	glsl: Enable texture builtins for NV_compute_shader_derivatives Renamed a few predicates from "fs_only" to be "derivative_only" (or similar pairs). Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2019-04-08 19:29:32 -07:00
Caio Marcelo de Oliveira Filho	09a3273fe7	glsl: Enable derivative builtins for NV_compute_shader_derivatives Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2019-04-08 19:29:32 -07:00
Caio Marcelo de Oliveira Filho	289478ea89	glsl: Remove redundant conditions when asserting in_qualifier As the code evolved, we ended up with redundant conditions. Clean this up. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2019-04-08 19:29:32 -07:00
Caio Marcelo de Oliveira Filho	163655b33e	mesa: Extension boilerplate for NV_compute_shader_derivatives Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2019-04-08 19:29:32 -07:00
Timothy Arceri	e30804c602	nir/radv: remove restrictions on opt_if_loop_last_continue() When I implemented opt_if_loop_last_continue() I had restricted this pass from moving other if-statements inside the branch opposite the continue. At the time it was causing a bunch of spilling in shader-db for i965. However Samuel Pitoiset noticed that making this pass more aggressive significantly improved the performance of Doom on RADV. Below are the statistics he gathered. 28717 shaders in 14931 tests Totals: SGPRS: 1267317 -> 1267549 (0.02 %) VGPRS: 896876 -> 895920 (-0.11 %) Spilled SGPRs: 24701 -> 26367 (6.74 %) Code Size: 48379452 -> 48507880 (0.27 %) bytes Max Waves: 241159 -> 241190 (0.01 %) Totals from affected shaders: SGPRS: 23584 -> 23816 (0.98 %) VGPRS: 25908 -> 24952 (-3.69 %) Spilled SGPRs: 503 -> 2169 (331.21 %) Code Size: 2471392 -> 2599820 (5.20 %) bytes Max Waves: 586 -> 617 (5.29 %) The code size increase is related to Wolfenstein II; it seems largely due to an increase in phis rather than the existing jumps. This gives +10% FPS with Doom on my Vega56. Rhys Perry also benchmarked Doom on his VEGA64: Before: 72.53 FPS After: 80.77 FPS v2: disable pass on non-AMD drivers Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> (v1) Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-04-09 11:29:41 +10:00
Dave Airlie	c6cf602121	softpipe: add support for vertex streams (v2) This enables the ARB_gpu_shader5 vertex streams on softpipe. v2: only enable when not using llvm. Reviewed-by: Roland Scheidegger <sroland@vmware.com> Signed-off-by: Dave Airlie <airlied@redhat.com>	2019-04-09 11:20:39 +10:00
Dave Airlie	7720ce32aa	draw: add support to tgsi paths for geometry streams. (v2) This hooks up the geometry shader processing to the TGSI support added in the previous commits. It doesn't change the llvm interface other than to keep things building. v2: fix some regressions caused by primitiveoffsets Reviewed-by: Roland Scheidegger <sroland@vmware.com> Signed-off-by: Dave Airlie <airlied@redhat.com>	2019-04-09 11:19:38 +10:00
Dave Airlie	ddb9ad363d	softpipe: add support for indexed queries. We need indexed queries to retrieve the geom shader info. Reviewed-by: Roland Scheidegger <sroland@vmware.com> Signed-off-by: Dave Airlie <airlied@redhat.com>	2019-04-09 11:19:38 +10:00
Dave Airlie	00fe67c015	tgsi: add support for geometry shader streams. This adds support to retrieve the primitive counts for each stream, along with the offset for each primitive into the output array. It also adds support for parsing the stream argument to the emit and end instructions. Reviewed-by: Roland Scheidegger <sroland@vmware.com> Signed-off-by: Dave Airlie <airlied@redhat.com>	2019-04-09 11:19:38 +10:00
Dave Airlie	333746011d	draw: add stream member to stats callback This just adds space for the member to the callback, doesn't change anything else. Reviewed-by: Roland Scheidegger <sroland@vmware.com> Signed-off-by: Dave Airlie <airlied@redhat.com>	2019-04-09 11:19:38 +10:00
Chia-I Wu	63b823130d	vulkan/wsi: make wl_drm optional When wl_drm is missing and the driver supports modifiers, use zwp_linux_dmabuf_v1 for the list of supported formats and for buffer creation. Limit the supported formats to those with modifiers, which are WL_DRM_FORMAT_{ARGB8888,XRGB8888} currently. Signed-off-by: Chia-I Wu <olvaffe@gmail.com> Reviewed-by: Daniel Stone <daniels@collabora.com>	2019-04-09 00:42:30 +00:00
Chia-I Wu	5318858f35	vulkan/wsi: add wsi_wl_display_dmabuf Add wsi_wl_display_dmabuf for zwp_linux_dmabuf_v1-related states. Signed-off-by: Chia-I Wu <olvaffe@gmail.com> Reviewed-by: Daniel Stone <daniels@collabora.com>	2019-04-09 00:42:30 +00:00
Chia-I Wu	fd7fecf59a	vulkan/wsi: add wsi_wl_display_drm Add wsi_wl_display_drm for wl_drm-related states. We will move formats into the struct in a later commit. Remove the unnecessary check for wl_registry_bind failures. Signed-off-by: Chia-I Wu <olvaffe@gmail.com> Reviewed-by: Daniel Stone <daniels@collabora.com>	2019-04-09 00:42:30 +00:00
Chia-I Wu	22dcb080d9	vulkan/wsi: refactor drm_handle_format Refactor the switch statement in drm_handle_format out to wsi_wl_display_add_wl_format. Signed-off-by: Chia-I Wu <olvaffe@gmail.com> Reviewed-by: Daniel Stone <daniels@collabora.com>	2019-04-09 00:42:30 +00:00
Chia-I Wu	2d214d9405	vulkan/wsi: create wl_drm wrapper as needed When modifiers are specified, we have to use dmabuf rather than wl_drm. We don't need the wrapper in that case. Signed-off-by: Chia-I Wu <olvaffe@gmail.com> Reviewed-by: Daniel Stone <daniels@collabora.com>	2019-04-09 00:42:30 +00:00
Chia-I Wu	ab74937b2c	vulkan/wsi: move modifier array into wsi_wl_swapchain This avoids repeated checks for each wsi_wl_image. Signed-off-by: Chia-I Wu <olvaffe@gmail.com> Reviewed-by: Daniel Stone <daniels@collabora.com>	2019-04-09 00:42:30 +00:00
Adam Jackson	52426ce4a9	drisw: Try harder to probe whether MIT-SHM works XQueryExtension merely tells you whether the extension exists, it doesn't tell you whether you're local enough for it to work. XShmQueryVersion is not enough to discover this either, you need to provoke the server to do actual work, and if it thinks you're remote it will throw BadRequest at you. So send an invalid ShmDetach and use the error code to distinguish local from remote. [airlied: fixed bug not resetting xshm_error to 0 on success, which made later stuff fail completely.] Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Dave Airlie <airlied@redhat.com> Signed-off-by: Adam Jackson <ajax@redhat.com>	2019-04-09 09:50:24 +10:00
Jason Ekstrand	50f3535d1f	nir/search: Search for all combinations of commutative ops Consider the following search expression and NIR sequence: ('iadd', ('imul', a, b), b) ssa_2 = imul ssa_0, ssa_1 ssa_3 = iadd ssa_2, ssa_0 The current algorithm is greedy and, the moment the imul finds a match, it commits those variable names and returns success. In the above example, it maps a -> ssa_0 and b -> ssa_1. When we then try to match the iadd, it sees that ssa_0 is not b and fails to match. The iadd match will attempt to flip itself and try again (which won't work) but it cannot ask the imul to try a flipped match. This commit instead counts the number of commutative ops in each expression and assigns an index to each. It then does a loop and loops over the full combinatorial matrix of commutative operations. In order to keep things sane, we limit it to at most 4 commutative operations (16 combinations). There is only one optimization in opt_algebraic that goes over this limit and it's the bitfieldReverse detection for some UE4 demo. Shader-db results on Kaby Lake: total instructions in shared programs: 15310125 -> 15302469 (-0.05%) instructions in affected programs: 1797123 -> 1789467 (-0.43%) helped: 6751 HURT: 2264 total cycles in shared programs: 357346617 -> 357202526 (-0.04%) cycles in affected programs: 15931005 -> 15786914 (-0.90%) helped: 6024 HURT: 3436 total loops in shared programs: 4360 -> 4360 (0.00%) loops in affected programs: 0 -> 0 helped: 0 HURT: 0 total spills in shared programs: 23675 -> 23666 (-0.04%) spills in affected programs: 235 -> 226 (-3.83%) helped: 5 HURT: 1 total fills in shared programs: 32040 -> 32032 (-0.02%) fills in affected programs: 190 -> 182 (-4.21%) helped: 6 HURT: 2 LOST: 18 GAINED: 5 Reviewed-by: Thomas Helland <thomashelland90@gmail.com>	2019-04-08 21:38:48 +00:00
Lionel Landwerlin	48e48b8560	intel: add dependency on genxml generated files Drivers using genxml will start compilation before generated files are created, so add a dependency to it. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com> Cc: mesa-stable@lists.freedesktop.org	2019-04-08 20:52:47 +00:00
Marek Olšák	4b63f57cbc	radeonsi: fix a crash when unbinding sampler states Acked-by: James Zhu <James.Zhu@amd.com>	2019-04-08 15:23:32 -04:00
Samuel Pitoiset	775191cd99	radv: fix getting the vertex strides if the bindings aren't contiguous Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110349 Fixes: `a66b186beb` ("radv: use typed buffer loads for vertex input fetches") Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-04-08 21:17:15 +02:00
Lionel Landwerlin	ce790c96a9	anv: implement VK_KHR_swapchain revision 70 This revision allows for images to be : - created by reusing image parameters from swapchain - bound to memory from a swapchain v2: Add color attachment flag Use same implicit WSI parameters (tiling, samples, usage) v3: Fix missing break in vk_foreach_struct_const() switch (Lionel) v4: Fix accessing image aspects before android resolve (Tapani) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2019-04-08 18:27:02 +01:00
Eric Engestrom	ed91ca0629	vk/util: remove unneeded array index This is an array of 1, so [0] is the only content, and meson already flattens the list so this is unnecessary. Also, all the other uses of vk_api_xml don't do that. Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2019-04-08 17:03:00 +00:00
Samuel Pitoiset	27b8f3ecc3	ac/nir: fix intrinsic names for atomic operations with LLVM 9+ This fixes the following LLVM error when using RADV_DEBUG=checkir: Intrinsic name not mangled correctly for type arguments! Should be: llvm.amdgcn.buffer.atomic.add.i32 i32 (i32, <4 x i32>, i32, i32, i1)* @llvm.amdgcn.buffer.atomic.add The cmpswap operation still uses the old intrinsic. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-04-08 13:16:50 +02:00
Alyssa Rosenzweig	4209a27c61	panfrost: Remove "mali_unknown6" nonsense This structure was used maaaany moons ago as a placeholder for the varying meta (now unified with mali_attr_meta and essentially fully decoded). I don't know why it's still in the file. Let's wack it. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-04-07 16:05:42 +00:00
Alyssa Rosenzweig	b19d1a1e63	panfrost/midgard: Enable lower_find_lsb This is exactly what the blob does. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-04-07 16:01:49 +00:00
Alyssa Rosenzweig	65816ad6e8	panfrost/midgard: Add ibitcount8 op The mechanics of this opcode are a little opaque, but essentially, it's used in 8-bit mode to do a bit count in parallel of a uint and then doing a ton of clever iadd/imov ops to recombine. v2: Correct opcode. Thank you to jernej on IRC for noticing this awkward typo! Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-04-07 16:01:12 +00:00
Alyssa Rosenzweig	6cba9acb75	panfrost/midgard: Add ilzcnt op Used for implementing findLSB/MSB Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-04-07 16:00:39 +00:00
Alyssa Rosenzweig	2e7555b14b	panfrost/midgard: Add umin/umax opcodes Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-04-07 15:59:05 +00:00
Alyssa Rosenzweig	d84ee49027	panfrost: Add tilebuffer load? branch Also document branches better. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-04-07 15:58:44 +00:00
Alyssa Rosenzweig	7cccc89f80	panfrost/decode: Add flags for tilebuffer readback These flags are set when reading back the tilebuffer from a fragment shader via various mechanisms (including ARM_shader_framebuffer_fetch and EXT_pixel_local_storage). Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-04-07 15:58:19 +00:00
Karol Herbst	1aabb79bdc	panfrost/midgard: use nir_src_is_const and nir_src_as_uint Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-04-07 15:56:10 +00:00
Jason Ekstrand	10a2fdacfa	vc4: Prefer nir_src_comp_as_uint over nir_src_as_const_value Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-04-07 15:13:36 +02:00
Karol Herbst	5450f1c9fb	v3d: prefer using nir_src_comp_as_int over nir_src_as_const_value Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-04-07 15:13:36 +02:00
Kenneth Graunke	4e802089bc	gallium/util: Add const to u_range_intersect This doesn't modify the range, so it can accept a const pointer. Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-04-07 00:21:12 -07:00
Greg V	c5a6e72e15	gallium/hud: add CPU usage support for FreeBSD Reviewed-by: Eric Engestrom <eric@engestrom.ch>	2019-04-07 06:47:57 +00:00
Kenneth Graunke	9c46046f79	iris: Silence unused variable warnings in release mode	2019-04-06 15:58:16 -07:00
Jason Ekstrand	ad8c145658	nir/algebraic: Add some logical OR and AND patterns The new OR pattern has been seen in the wild and can end up being generated by GLSLang. Not sure about the other two new patterns but we may as well throw them in for completeness. While we're here, we can drop the '@bool' specifier from the one pattern because specifying True already implies 1-bit which basically implies boolean. Shader-db results on Kaby Lake: total instructions in shared programs: 15321227 -> 15321129 (<.01%) instructions in affected programs: 3594 -> 3496 (-2.73%) helped: 6 HURT: 0 total cycles in shared programs: 357481321 -> 357479725 (<.01%) cycles in affected programs: 44109 -> 42513 (-3.62%) helped: 6 HURT: 0 VkPipeline-DB results on Kaby Lake: total instructions in shared programs: 3770504 -> 3769734 (-0.02%) instructions in affected programs: 19058 -> 18288 (-4.04%) helped: 163 HURT: 0 total cycles in shared programs: 1417583701 -> 1417569727 (<.01%) cycles in affected programs: 750958 -> 736984 (-1.86%) helped: 158 HURT: 1 Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-04-05 18:39:06 -05:00
Jason Ekstrand	03a72d96d8	nir/algebraic: Drop some @bool specifiers Now that we have one-bit booleans, we don't need to rely on looking at parent instructions in order to figure out if a value is a Boolean most of the time. We can drop these specifiers and now the optimizations will apply more generally. Shader-DB results on Kaby Lake: total instructions in shared programs: 15321168 -> 15321227 (<.01%) instructions in affected programs: 8836 -> 8895 (0.67%) helped: 1 HURT: 31 total cycles in shared programs: 357481781 -> 357481321 (<.01%) cycles in affected programs: 146524 -> 146064 (-0.31%) helped: 22 HURT: 10 total spills in shared programs: 23675 -> 23673 (<.01%) spills in affected programs: 11 -> 9 (-18.18%) helped: 1 HURT: 0 total fills in shared programs: 32040 -> 32036 (-0.01%) fills in affected programs: 27 -> 23 (-14.81%) helped: 1 HURT: 0 No change in VkPipeline-DB Looking at the instructions hurt, a bunch of them seem to be a case where doing exactly the right thing in NIR ends up doing the wrong-ish thing in the back-end because flags are dumb. In particular, there's a case where we have a MUL followed by a CMP followed by a SEL and when we turn that SEL into an OR, it uses the GRF result of the CMP rather than the flag result so the CMP can't be merged with the MUL. Those shaders appear to schedule better according to the cycle estimates so I guess it's a win? Also it helps spilling in one Car Chase compute shader. Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-04-05 18:39:00 -05:00
Andrii Simiklit	cade9001b1	util: clean the 24-bit unused field to avoid issues This is a field of FLOAT_32_UNSIGNED_INT_24_8_REV texture pixel. OpenGL spec "8.4.4.2 Special Interpretations" is saying: "the second word contains a packed 24-bit unused field, followed by an 8-bit index" The spec doesn't require us to clear this unused field; however, it makes sense to do it to avoid some undefined behavior in some apps. Suggested-by: Eric Anholt <eric@anholt.net> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Eric Anholt <eric@anholt.net> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110305 Signed-off-by: Andrii Simiklit <andrii.simiklit@globallogic.com>	2019-04-05 21:33:53 +00:00
Caio Marcelo de Oliveira Filho	c037dbb0ef	nir: Take if_uses into account when repairing SSA If a def is used as a condition before its definition, we should also consider this a case to repair. When repairing, make sure we rewrite any if conditions too. Found while inspecting a SPIR-V conversion from a 'continue block' that contains a conditional branch. We pull the continue block up to the beginning of the loop, and the condition in the branch ends up defined afterwards. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Fixes: `364212f1ed` "nir: Add a pass to repair SSA form"	2019-04-05 09:43:46 -07:00
Marek Olšák	26e161b1e9	tegra: fix the build after the set_shader_buffers change	2019-04-05 11:18:39 -04:00
James Zhu	0f416b85fb	gallium/auxiliary/vl: Add barrier/unbind after compute shader launch. Add memory barrier sync for multiple launch cases, and unbind completed resources after launch. Signed-off-by: James Zhu <James.Zhu@amd.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-04-05 09:50:52 -04:00
James Zhu	4bbc9c493f	gallium/auxiliary/vl: Fixed blank issue with compute shader Multiple init buffer within one open instance will cause blank issue. Updating viewport per frame will fix this issue. Signed-off-by: James Zhu <James.Zhu@amd.com> Tested-by: Bruno Milreu <bmilreu@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-04-05 09:50:52 -04:00
James Zhu	32b861d46d	gallium/auxiliary/vl: Fixed blur issue with weave compute shader Correct wrong interpolation with top/bottom row which caused blur issue. Signed-off-by: James Zhu <James.Zhu@amd.com> Tested-by: Bruno Milreu <bmilreu@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-04-05 09:50:52 -04:00
Emil Velikov	a28dc6b57f	docs: update calendar, add news item and link release notes for 18.3.6 Signed-off-by: Emil Velikov <emil.velikov@collabora.com>	2019-04-05 13:24:29 +01:00
Emil Velikov	d5ba84dc52	docs: add sha256 checksums for 18.3.6 Signed-off-by: Emil Velikov <emil.velikov@collabora.com> (cherry picked from commit `eb9da68cbf`)	2019-04-05 13:20:26 +01:00
Emil Velikov	9b537f2d21	docs: add release notes for 18.3.6 Signed-off-by: Emil Velikov <emil.velikov@collabora.com> (cherry picked from commit `b03f51c4b4`)	2019-04-05 13:20:25 +01:00
Samuel Pitoiset	5eb17506e1	nir: do not pack varying with different types The current algorithm only supports packing 32-bit types. If a shader uses both 16-bit and 32-bit varyings, we shouldn't compact them together. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-04-05 13:57:42 +02:00
Gert Wollny	0dff1533f2	softpipe: Use mag texture filter also for clamped lod == 0 Follow the spec when selecting the magnification filter (OpenGL 4.5, section 8.14): If λ(x, y) is less than or equal to the constant c (see section 8.15) the texture is said to be magnified; While we're here also silence a potential warning about implicit float to double conversion. v2: Update commit message to contain a reference to the spec as pointed out by Eric. Fixes a number of dEQP GLES2 and GLES3 tests out of: dEQP-GLES2.functional.texture.filtering.* dEQP-GLES2.functional.texture.vertex.2d.filtering.* dEQP-GLES3.functional.texture.vertex..filtering. dEQP-GLES3.functional.texture.filtering.* dEQP-GLES3.functional.texture.shadow.2d.* Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-04-05 09:07:45 +02:00
Tapani Pälli	361f3d19f1	iris: handle aux properly in iris_resource_get_handle Disable aux when resource seen the first time and EXPLICIT_FLUSH not being set. This fixes issues seen when launching Xorg and CCS_E getting utilized. Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-04-04 23:35:24 -07:00
Eric Anholt	276d22c52d	v3d: Add some more new packets for V3D 4.x. The T/G shader references and common state will be needed for GLES 3.2.	2019-04-04 17:30:35 -07:00
Eric Anholt	4c70f276bc	v3d: Don't try to use the TFU blit path if a scissor is enabled. We'll need to do a render-based blit for scissors, since the TFU (as seen in this conditional) can only update a whole surface. Fixes: `976ea90bdc` ("v3d: Add support for using the TFU to do some blits.") Fixes piglit fbo-scissor-blit.	2019-04-04 17:30:35 -07:00
Eric Anholt	62360e92ec	v3d: Bump the maximum texture size to 4k for V3D 4.x. 4.1 and 4.2 both have the same 16k limit, but I'm seeing GPU hangs in the CTS at 8k and 16k. 4k at least lets us get one 4k display working. Cc: mesa-stable@lists.freedesktop.org	2019-04-04 17:30:35 -07:00
Eric Anholt	e3063a8b2f	v3d: Add support for handling OOM signals from the simulator. I have v3d allocating enough initial allocation memory that we've been passing tests without it, but to match kernel behavior more it would be good to actually exercise the OOM path.	2019-04-04 17:30:35 -07:00
Illia Iorin	a113a42e73	mesa/main: Fix multisample texture initialize Sampler of Multisample textures wasn't initialized correctly. So when a texture object is created as multisample, its sampler is initialized in an individual case. We change the initial state of TEXTURE_MIN_FILTER and TEXTURE_MAG_FILTER to NEAREST. These changes are approved by KhronosGroup. https://github.com/KhronosGroup/OpenGL-API/issues/45 Signed-off-by: Sergii Romantsov <sergii.romantsov@globallogic.com> Signed-off-by: Illia Iorin <illia.iorin@globallogic.com> Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109057	2019-04-05 11:28:10 +11:00
Sergii Romantsov	a7d40a13ec	glsl: Fix input/output structure matching across shader stages Section 7.4.1 (Shader Interface Matching) of the OpenGL 4.30 spec says: "Variables or block members declared as structures are considered to match in type if and only if structure members match in name, type, qualification, and declaration order." Fixes: * layout-location-struct.shader_test v2: rebased against master and small fixes Signed-off-by: Vadym Shovkoplias <vadym.shovkoplias@globallogic.com> Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108250	2019-04-05 11:02:23 +11:00
Dave Airlie	738921afd9	ddebug: add compute functions to help hang detection Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-04-05 10:01:08 +10:00
Dave Airlie	0ea386128b	iris: avoid use after free in shader destruction While playing with compute shaders, I was getting a random crash, noticed that bind_state was using the old shader info for comparison, but gallium allows the shader to be deleted while bound, so this could lead to a use after free. This can't happen using the cso cache, as it tracks all of this. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-04-05 09:57:44 +10:00
Marek Olšák	42f63e6334	radeonsi: set exact shader buffer read/write usage in CS Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-04-04 19:28:52 -04:00
Marek Olšák	4e1e8f684b	glsl: remember which SSBOs are not read-only and pass it to gallium Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-04-04 19:28:52 -04:00
Marek Olšák	66a82ec6f0	gallium: add writable_bitmask parameter into set_shader_buffers to indicate write usage per buffer. This is just a hint (it will be used by radeonsi). Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-04-04 19:28:52 -04:00
Danylo Piliaiev	b19494c54e	iris: Fix assert when using vertex attrib without buffer binding The GL 4.5 spec says: "If any enabled array’s buffer binding is zero when DrawArrays or one of the other drawing commands defined in section 10.4 is called, the result is undefined." The result is undefined but it should not crash. Fixes: gl-3.1-vao-broken-attrib Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-04-04 22:57:24 +00:00
Tapani Pälli	61cc379371	iris: move iris_flush_resource so we can call it from get_handle Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-04-04 13:36:51 -07:00
Kenneth Graunke	8d9e169bdd	iris: Save/restore MI_PREDICATE_RESULT, not MI_PREDICATE_DATA. MI_PREDICATE_DATA is an intermediate storage for the MI_PREDICATE command's calculations - it holds the result of the subtraction when the compare operation is SRCS_EQUAL or DELTAS_EQUAL. But the actual result of the predication is MI_PREDICATE_RESULT, which is what we want to copy from the render context to the compute context.	2019-04-04 11:41:10 -07:00
Eric Engestrom	d1dd3cbcc7	util/process: document memory leak We consider it acceptable, but let's still document it in case people notice it and are not sure why it's there. Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2019-04-04 16:09:52 +00:00
Eric Engestrom	05b114e526	simplify LLVM version string printing Figure it out once in the build system, then just use that all over the place. Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-04-04 16:08:11 +00:00
Guido Günther	593614f4d4	gallium/u_dump: util_dump_sampler_view: Dump u.tex.first_level Dump u.tex.first_level instead of dumping u.tex.last_level twice. Signed-off-by: Guido Günther <agx@sigxcpu.org> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-04-04 17:30:19 +02:00
Guido Günther	a5e24dc416	gallium: ddebug: Add missing fence related wrappers Without that `GALLIUM_DDEBUG=always kmscube -A` would segfault like #0 0x0000000000000000 in () #1 0x0000ffffa72a3c54 in dri2_get_fence_fd (_screen=0xaaaaed4f2090, _fence=0xaaaaed9ef880) at ../src/gallium/state_trackers/dri/dri_helpers.c:140 #2 0x0000ffffa8744824 in dri2_dup_native_fence_fd (drv=0xaaaaed5010c0, disp=0xaaaaed5029a0, sync=0xaaaaed9ef7c0) at ../src/egl/drivers/dri2/egl_dri2.c:3050 #3 0x0000ffffa87339b8 in eglDupNativeFenceFDANDROID (dpy=0xaaaaed5029a0, sync=0xaaaaed9ef7c0) at ../src/egl/main/eglapi.c:2107 #4 0x0000aaaabd29ca90 in () #5 0x0000aaaabd401000 in () Signed-off-by: Guido Günther <agx@sigxcpu.org> Reviewed-by: Lucas Stach <l.stach@pengutronix.de>	2019-04-04 17:30:15 +02:00
Danylo Piliaiev	3fdfface3e	st/mesa: Fix GL_MAP_COLOR with glDrawPixels GL_COLOR_INDEX Documentation for glDrawPixels with GL_COLOR_INDEX says: "If the GL is in color index mode, and if GL_MAP_COLOR is true, the index is replaced with the value that it references in lookup table GL_PIXEL_MAP_I_TO_I" We are always in RGBA mode and there is nothing in documentation about GL_MAP_COLOR in RGBA mode for GL_COLOR_INDEX. Scale and bias are also only applicable for RGBA format and not mentioned for GL_COLOR_INDEX. Thus the behaviour will be on par with i965. Fixes: gl-1.0-drawpixels-color-index Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com> Signed-off-by: Marek Olšák <marek.olsak@amd.com>	2019-04-04 10:38:32 -04:00
Eric Engestrom	f6ceed205c	gallium/hud: fix rounding error in nic bps computation While at it, fix typo in "rounding error" :P Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-04-04 13:59:24 +00:00
Eric Engestrom	9d6ea55263	gallium/hud: prevent buffer overflow Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-04-04 13:59:24 +00:00
Eric Engestrom	4633d13854	gallium/hud: fix memory leaks Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-04-04 13:59:24 +00:00
Marek Olšák	b563460b49	radeonsi: enable displayable DCC on Ravens	2019-04-04 09:53:24 -04:00
Marek Olšák	1f21396431	radeonsi: add support for displayable DCC for multi-RB chips A compute shader is used to reorder DCC data from aligned to unaligned.	2019-04-04 09:53:24 -04:00
Marek Olšák	2c09eb4122	radeonsi: add support for displayable DCC for 1 RB chips This is the simpler codepath - just disable RB and pipe alignment for DCC.	2019-04-04 09:53:24 -04:00
Marek Olšák	029bfa3d25	radeonsi: add ability to bind images as image buffers so that we can bind DCC (texture) as an image buffer.	2019-04-04 09:53:24 -04:00
Marek Olšák	fe3bfd7971	radeonsi/gfx9: add support for PIPE_ALIGNED=0 Needed by displayable DCC. We need to flush L2 after rendering if PIPE_ALIGNED=0 and DCC is enabled.	2019-04-04 09:53:24 -04:00
Marek Olšák	e457454cb6	amd/addrlib: fix uninitialized values for Addr2ComputeDccAddrFromCoord Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-04-04 09:30:40 -04:00
Tapani Pälli	41f76dd513	iris: move variable to the scope where it is being used iris_upload_border_color is passed a pointer which points to variable that is introduced in a different scope. CID: 1444296 Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-04-04 04:43:20 +00:00
Tapani Pälli	3cea9f981a	st/nir: run st_nir_opts after 64bit ops lowering CID: 1444309 Fixes: `9ab1b1d022` "st/nir: Move 64-bit lowering later" Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-04-04 07:38:10 +03:00
Alyssa Rosenzweig	b34d8222c7	panfrost: Size tiled temp buffers correctly This should lower transient memory usage and improve performance slightly (due to less memory to malloc/free, better cache locality, etc). Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-04-04 03:51:43 +00:00
Alyssa Rosenzweig	c0183e8eed	panfrost: Respect box->width in tiled stores This fixes a regression uploading partial tiled textures introduced sometime during the cubemap series. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-04-04 03:51:43 +00:00
Alyssa Rosenzweig	3b38a7e505	panfrost: Cleanup some indirection in pan_resource Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-04-04 03:51:43 +00:00
Alyssa Rosenzweig	7e8de5a707	panfrost: Implement system values This patch implements system values via specially-crafted uniforms. While we previously had an ad hoc system for passing the viewport into the vertex shader, this commit generalizes the system to allow for arbitrary system values to be added to both shader stages. While we're at it, we clean up uniform handling code (which was considerably muddied to handle the ad hoc viewport uniform). This commit serves as both a cleanup of the existing codebase and the precursor to new functionality, like implementing textureSize(). Concurrent with these changes is respecting the depth transform, which was not possible with the old fixed uniform system and here serves as a proof-of-correctness test (as well as justifying the NIR changes). Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-04-04 03:44:15 +00:00
Alyssa Rosenzweig	a83862754e	nir: Add "viewport vector" system values While a partial set of viewport system values exist, these are scalar values, which is a poor fit for viewport transformations on vector ISAs like Midgard (where the vec3 values for scale and offset each need to be coherent in a vec4 uniform slot to take advantage of vectorized transform math). This patch adds vec3 scale/offset fields corresponding to the 3D Gallium viewport / glViewport+depth Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-04-04 03:44:09 +00:00
Erik Faye-Lund	b85ca86c1e	virgl: also destroy all read-transfers For texture write-transfers, we either free them on the transfer-queue or right away. But for read-transfers, we currently only destroy them in case they used a temp-resource. This leads to occasional resource-leaks. Let's add a call to virgl_resource_destroy_transfer in the missing case. Do the same thing for buffers as well, but the logic is a bit easier to follow there. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Fixes: `f0e71b1088` ("virgl: use transfer queue") Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>	2019-04-03 18:59:23 +02:00
Dylan Baker	4c332a1f9f	meson: Error if LLVM is turned off but clover is turned on Since clover has a hard requirement on LLVM v2: - make error message more specific Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-04-03 09:41:24 -07:00
Dylan Baker	29912f2ea4	meson: Error if LLVM doesn't have rtti when building clover We already do this for nouveau, but it's required for clover too.	2019-04-03 09:41:24 -07:00
Alyssa Rosenzweig	138865e676	panfrost: Remove support for legacy kernels Previously, there was minimal support for interoperating with legacy kernels (reusing kernel modules originally designed for proprietary legacy userspaces, rather than for upstream-friendly free software stacks). Now that the Panfrost kernel is stabilising, this commit drops the legacy code path. Panfrost users need to use a modern, mainline kernel supporting the Panfrost kernel driver from this commit forward. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>	2019-04-03 15:21:30 +00:00
Lucas Stach	43db0632e7	etnaviv: only try to construct scanout resource when on KMS winsys Trying to construct a scanout capable buffer will only ever work when we are on top of a KMS winsys, as the render node isn't capable of allocating contiguous buffers. Tested-by: Marius Vlad <marius.vlad@collabora.com> Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>	2019-04-03 12:54:09 +02:00
Lucas Stach	3d8da347ac	etnaviv: flush all pending contexts when accessing a resource with the CPU When setting up a transfer to a resource, all contexts where the resource is pending must be flushed. Otherwise a write transfer might be started in the current context before all contexts that access the resource in shared (read) mode have been executed. Fixes: `64813541d5` (etnaviv: fix resource usage tracking across different pipe_context's) Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com> Tested-By: Guido Günther <agx@sigxcpu.org>	2019-04-03 12:54:09 +02:00
Lucas Stach	f317ee1aff	etnaviv: don't flush own context when updating resource use The context is self synchronizing at the GPU side, as commands are executed in order. We must not flush our own context when updating the resource use, as that leads to excessive flushing on effectively every draw call, causing huge CPU overhead. Fixes: `64813541d5` (etnaviv: fix resource usage tracking across different pipe_context's) Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>	2019-04-03 12:54:09 +02:00
Christian Gmeiner	c7cddc2787	etnaviv: shrink struct etna_3d_state Drop struct members which are only written to but never read from. Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com> Reviewed-by: Lucas Stach <l.stach@pengutronix.de>	2019-04-03 12:54:09 +02:00
Dave Airlie	11e1fa11d6	intel/compiler: use defined size for vector components If we increase vector sizing later it would be nice to avoid tripping over this again. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-04-03 13:59:06 +10:00
Dave Airlie	eb8fefe090	nir: use proper array sizing define for vectors If we increase the vector size in the future it would be good to not have to fix these up, this should change nothing at present. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-04-03 13:59:06 +10:00
Timothy Arceri	d8ce915a61	Revert "nir: propagate known constant values into the if-then branch" This reverts commit `4218b6422c`. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110311	2019-04-03 13:24:18 +11:00
Timothy Arceri	4218b6422c	nir: propagate known constant values into the if-then branch Helps Max Waves / VGPR use in a bunch of Unigine Heaven shaders. shader-db results radeonsi (VEGA): Totals from affected shaders: SGPRS: 5505440 -> 5505872 (0.01 %) VGPRS: 3077520 -> 3077296 (-0.01 %) Spilled SGPRs: 39032 -> 39030 (-0.01 %) Spilled VGPRs: 16326 -> 16326 (0.00 %) Private memory VGPRs: 0 -> 0 (0.00 %) Scratch size: 744 -> 744 (0.00 %) dwords per thread Code Size: 123755028 -> 123753316 (-0.00 %) bytes Compile Time: 2751028 -> 2560786 (-6.92 %) milliseconds LDS: 1415 -> 1415 (0.00 %) blocks Max Waves: 972192 -> 972240 (0.00 %) Wait states: 0 -> 0 (0.00 %) vkpipeline-db results RADV (VEGA): Totals from affected shaders: SGPRS: 160 -> 160 (0.00 %) VGPRS: 88 -> 88 (0.00 %) Spilled SGPRs: 0 -> 0 (0.00 %) Spilled VGPRs: 0 -> 0 (0.00 %) Private memory VGPRs: 0 -> 0 (0.00 %) Scratch size: 0 -> 0 (0.00 %) dwords per thread Code Size: 18268 -> 18152 (-0.63 %) bytes LDS: 0 -> 0 (0.00 %) blocks Max Waves: 26 -> 26 (0.00 %) Wait states: 0 -> 0 (0.00 %) Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-04-03 10:04:48 +11:00
Lepton Wu	250fffac15	virgl: close drm fd when destroying virgl screen. This fd was created in virgl_drm_screen_create and should be closed in virgl_drm_screen_destroy. Signed-off-by: Lepton Wu <lepton@chromium.org> Reviewed-by: Chia-I Wu <olvaffe@gmail.com>	2019-04-02 15:29:47 -07:00
Rafael Antognolli	08c44b47a9	iris: Enable fast clears on gen8. Since we are now properly storing the clear color with SCS bits, we can now enable fast clears on gen8 too. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-04-02 15:26:48 -07:00
Rafael Antognolli	7339660e80	iris: Add aux.sampler_usages. We want to skip some types of aux usages (for instance, ISL_AUX_USAGE_HIZ when the hardware doesn't support it, or when we have multisampling) when sampling from the surface. Instead of checking for those cases while filling the surface state and leaving it blank, let's have a version of aux.possible_usages for sampling. This way we can also avoid allocating surface state for the cases we don't use. Fixes: `a8b5ea8ef0` "iris: Add function to update clear color in surface state." Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-04-02 15:26:45 -07:00
Rafael Antognolli	dfc5620a41	iris: Do not allocate clear_color_bo for gen8. Since we are not using it for the clear color, there's no need to allocate it. Fixes: `a8b5ea8ef0` "iris: Add function to update clear color in surface state." Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-04-02 15:26:41 -07:00
Rafael Antognolli	c26d8a887d	iris: Manually apply fast clear color channel overrides. At the fast clear time, the only swizzle we have available is actually the identity swizzle (which we use for most rendering). So the call to swizzle_color_value() becomes simply a no-op, and doesn't properly zero out the unused channels. We have to manually override those channels. Fixes: `a8b5ea8ef0` "iris: Add function to update clear color in surface state." Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-04-02 15:26:38 -07:00
Rafael Antognolli	2660667284	iris/gen8: Re-emit the SURFACE_STATE if the clear color changed. The swizzle for rendering surfaces is always identity. So when we are doing the fast clear, we don't have enough information to store the clear color OR'ed with the Shader Channel Select bits for the dword in the SURFACE_STATE. Instead of trying to patch up the SURFACE_STATE correctly later, by reading the color from the clear color state buffer and then doing all the operations to store it, let's just re-emit the whole SURFACE_STATE. That should make things way simpler on gen8, and we can still use the clear color state buffer for gen9+. Fixes: `a8b5ea8ef0` "iris: Add function to update clear color in surface state." Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-04-02 15:26:33 -07:00
Rafael Antognolli	6a02873687	iris: Only update clear color for gens 8 and 9. Newer gens can read it directly. Also properly skip updating the ISL_AUX_USAGE_NONE surface. Fixes: `a8b5ea8ef0` "iris: Add function to update clear color in surface state." Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-04-02 15:24:15 -07:00
Alexander von Gluck IV	5f467fe08e	haiku: Fix hgl dispatch build. Tested under meson/scons. Reviewed-by: Brian Paul <brianp@vmware.com>	2019-04-02 16:06:00 -05:00
Guido Günther	10b90570d1	docs: Fix 19.0.x version numbers The list has 19.0.2 twice. Signed-off-by: Guido Günther <agx@sigxcpu.org> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2019-04-02 09:12:47 -07:00
Marek Olšák	40b9eec8bd	docs/relnotes: document parallel_shader_compile changes in 19.1.0, not 19.0.0	2019-04-02 10:47:37 -04:00
Benjamin Tissoires	7f8a9a1fbb	CI: use wayland ci-templates repo to create the base image There shouldn't be a difference for users, but this way we do manage all of our containers from freedesktop.org note: compared to the previous Dockerfile, we need to manually add gcc, g++ and python*-wheel Signed-off-by: Benjamin Tissoires <benjamin.tissoires@gmail.com> Reviewed-by: Michel Dänzer <michel.daenzer@amd.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-04-02 13:41:05 +00:00
Marek Olšák	7be26976b8	radeonsi: don't use PFP_SYNC_ME with compute-only contexts Compute rings don't have PFP. Fixes: `a1378639ab` "radeonsi: always use compute rings for clover on CI and newer (v2)" Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Tested-by: Jan Vesely <jan.vesely@rutgers.edu> Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>	2019-04-02 08:46:49 -04:00
Gert Wollny	1e5381f934	virgl: define MAX_VERTEX_STREAMS based on availability of TF3 Since with gles hosts we lie about the GLSL feature level it is better to set the number of streams based on actual hosts capabilities. v2: Make use of feature check level to avoid regressions. Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>	2019-04-02 11:28:09 +00:00
Gert Wollny	33d9b9436c	softpipe: Implement ATOMFADD and enable cap TGSI_ATOMFADD This enables the following piglits with PASS: nv_shader_atomic_float/execution/ shared-atomicadd-float shared-atomicexchange-float ssbo-atomicadd-float ssbo-atomicexchange-float v2: Minimize the patch by using type punning (Eric Anholt) Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-04-02 09:58:16 +00:00
Erik Faye-Lund	4f153fcd5c	virgl: stricter usage of compressed 3d textures Using RGTC, ETC1, ETC2 or S3TC for 3D-textures isn't allowed by any of OpenGL 4.6, OpenGL ES 3.2, ARB_texture_compression_rgtc, EXT_texture_compression_rgtc, OES_compressed_ETC1_RGB8_texture, S3_s3tc or EXT_texture_compression_s3tc specifications. So let's not allow any of those compressed 3d-textures at all. It's not going to work once it hits the OpenGL driver in virglrenderer. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>	2019-04-02 07:48:46 +00:00
Erik Faye-Lund	f53001324f	virgl: do not allow compressed formats for buffers Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>	2019-04-02 07:48:45 +00:00
Eric Anholt	edc7deec42	dri3: Return the current swap interval from glXGetSwapIntervalMESA(). We were caching only the value set with glXSwapIntervalSGI(), missing out on the default setting of the swap interval by the loader. This fixes glxgears's warning about being vblank synchronized by default. Fixes: `9777c4234b` ("loader: drop the [gs]et_swap_interval callbacks") Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2019-04-01 16:06:38 -07:00
Anuj Phogat	82f6a746e8	intel: Add support for Comet Lake Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>	2019-04-01 14:07:40 -07:00
Chris Wilson	80e1ca9d28	iris: Adapt to variable ppGTT size Not all hardware is made equal and some does not have the full complement of 48b of address space. Ask what the actual size of virtual address space allocated for contexts, and bail if that is not enough to satisfy our static partitioning needs. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-04-01 10:01:02 -07:00
Samuel Pitoiset	c25f63872b	radv: partially enable VK_KHR_shader_float16_int8 Only 8-bit integers for now, float16 requires a bit more work. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-04-01 18:53:59 +02:00
Samuel Pitoiset	d099bc5829	ac: add 8-bit and 64-bit support to ac_build_bitfield_reverse() Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-04-01 18:53:57 +02:00
Samuel Pitoiset	2cecf6c5cc	ac: add 8-bit support to ac_build_umsb() Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-04-01 18:53:55 +02:00
Samuel Pitoiset	a45d9e3e8d	ac: add 8-bit support to ac_find_lsb() Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-04-01 18:53:53 +02:00
Samuel Pitoiset	89cf8ca0ae	ac: add 8-bit support to ac_build_bit_count() Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-04-01 18:53:52 +02:00
Samuel Pitoiset	869af0464a	ac/nir: add support for nir_op_b2i8 Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-04-01 18:53:49 +02:00
Marek Olšák	b9d627e076	radeonsi: implement ARB/KHR_parallel_shader_compile callbacks	2019-04-01 12:37:52 -04:00
Marek Olšák	050fae3983	util/queue: add util_queue_adjust_num_threads for ARB_parallel_shader_compile Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2019-04-01 12:37:52 -04:00
Marek Olšák	b7317b6ce0	util/queue: hold a lock when reading num_threads in util_queue_finish Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2019-04-01 12:37:52 -04:00
Marek Olšák	bb111559f2	util/queue: add ability to kill a subset of threads for ARB_parallel_shader_compile	2019-04-01 12:37:52 -04:00
Marek Olšák	d99cdc9d59	util/queue: move thread creation into a separate function Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2019-04-01 12:37:52 -04:00
Marek Olšák	e871cbd625	gallium: implement ARB/KHR_parallel_shader_compile	2019-04-01 12:37:52 -04:00
Marek Olšák	c5c38e831e	mesa: implement ARB/KHR_parallel_shader_compile Tested by piglit.	2019-04-01 12:37:52 -04:00
Marek Olšák	3ad2a9b3fa	radeonsi: fix assertion failure by using the correct type src/gallium/drivers/radeonsi/si_state_viewport.c:196: si_emit_guardband: Assertion `vp_as_scissor.maxx <= max_viewport_size[vp_as_scissor.quant_mode] && vp_as_scissor.maxy <= max_viewport_size[vp_as_scissor.quant_mode]' failed. The comparison was unsigned, so negative maxx or maxy would fail. Fixes: `3c540e0a74` "radeonsi: Fix guardband computation for large render targets"	2019-04-01 12:21:20 -04:00
Leo Liu	d4e0fbc92f	radeon/vcn/vp9: search the render target from the whole list The number of render targets could be more than max of references, so we search the full list of the render pictures for the current render target index https://bugs.freedesktop.org/show_bug.cgi?id=109648 Signed-off-by: Leo Liu <leo.liu@amd.com> Tested-by: James Zhu <James.Zhu@amd.com> Acked-by: James Zhu <James.Zhu@amd.com> Cc: <mesa-stable@lists.freedesktop.org>	2019-04-01 08:59:38 -04:00
Rhys Perry	0af95f0ffc	radv: lower 16-bit flrp Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-04-01 09:58:48 +02:00
Samuel Pitoiset	4d5fce29c3	ac: fix ac_build_umsb() for 16-bit integer type Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-04-01 09:51:56 +02:00
Samuel Pitoiset	7a088d1ac8	ac: fix ac_find_lsb() for 16-bit integer type Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-04-01 09:51:54 +02:00
Samuel Pitoiset	b16dffff23	ac: fix ac_build_bitfield_reverse() for 16-bit integer type Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-04-01 09:51:52 +02:00
Samuel Pitoiset	9d13b9e53e	ac: fix ac_build_bit_count() for 16-bit integer type Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-04-01 09:51:49 +02:00
Samuel Pitoiset	e39a6b940f	ac/nir: fix nir_op_b2i16 Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-04-01 09:51:47 +02:00
Eric Engestrom	aa7afe324c	meson: strip rpath from megadrivers More specifically, use the library file that has been post-processed by Meson when creating the hardlinks. Bugs: https://bugs.freedesktop.org/show_bug.cgi?id=108766 Fixes: `3218056e0e` "meson: Build i965 and dri stack" Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2019-04-01 07:04:13 +00:00
Tapani Pälli	06f40f5765	spirv: fix a compiler warning Fixes implicit conversion from enumeration type 'SpvOp' warning. Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-04-01 07:43:10 +03:00
Lionel Landwerlin	f0b472b301	i965: perf: update render basic configs for big core gen9/gen10 This update allows an MI_LRI to trigger an OA report write in the global OA buffer. This isn't really useful for us, we just keep close to the internal public configs. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Acked-by: Kenneth Graunke <kenneth@whitecape.org>	2019-04-01 00:59:31 +03:00
Lionel Landwerlin	052ace0c81	i965: perf: add ring busyness metric for cfl gt2 Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Acked-by: Kenneth Graunke <kenneth@whitecape.org>	2019-04-01 00:59:26 +03:00
Lionel Landwerlin	7e54857b4a	i965: perf: enable Icelake metrics Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Acked-by: Kenneth Graunke <kenneth@whitecape.org>	2019-03-31 10:36:37 +01:00
Lionel Landwerlin	897efc2059	i965: perf: add Icelake metrics Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Acked-by: Kenneth Graunke <kenneth@whitecape.org>	2019-03-31 10:36:37 +01:00
Lionel Landwerlin	b910e40956	i965: perf: sklgt2: drop programming of an unused NOA register Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Acked-by: Kenneth Graunke <kenneth@whitecape.org>	2019-03-31 10:35:16 +01:00
Lionel Landwerlin	29ce64a77a	i965: perf: hsw: drop register programming not needed on HSW This register is flagged as IVB only in the documentation. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Acked-by: Kenneth Graunke <kenneth@whitecape.org>	2019-03-31 10:35:16 +01:00
Lionel Landwerlin	46250d7dac	i965: perf: chv: fixup counters names Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Acked-by: Kenneth Graunke <kenneth@whitecape.org>	2019-03-31 10:35:16 +01:00
Lionel Landwerlin	046041b2a0	i965: perf: add PMA stall metrics These are new metrics for Gen8/9 to measure the effect of the PMA stall workaround fix. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Acked-by: Kenneth Graunke <kenneth@whitecape.org>	2019-03-31 10:35:16 +01:00
Lionel Landwerlin	dc9e598f3c	i965: perf: sklgt2: update memory write config This reworks the programming between older pre-production steppings & new ones. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Acked-by: Kenneth Graunke <kenneth@whitecape.org>	2019-03-31 10:35:16 +01:00
Lionel Landwerlin	0d618bb635	i965: perf: sklgt2: update compute metrics config This unifies some of the programming between pre-production stepping and production ones. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Acked-by: Kenneth Graunke <kenneth@whitecape.org>	2019-03-31 10:35:16 +01:00
Lionel Landwerlin	4edaa6f003	i965: perf: sklgt2: update a priority for register programming This makes no difference in terms of programming, it's just a cleanup. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Acked-by: Kenneth Graunke <kenneth@whitecape.org>	2019-03-31 10:35:16 +01:00
Alyssa Rosenzweig	e4e6a3deaf	panfrost: Implement FIXED formats Fixes crash in dEQP-GLES2.functional.draw.random.9 Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-03-31 04:42:37 +00:00
Alyssa Rosenzweig	ed160a1160	panfrost: Fix index calculation types and asserts Fixes crash in dEQP-GLES2.functional.draw.draw_elements.points.single_attribute. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-03-31 04:42:22 +00:00
Alyssa Rosenzweig	0e4c321c15	panfrost: Clean index state between indexed draws Fixes subsequent tests in dEQP-GLES2.functional.draw.draw_elements.indices.* Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-03-31 04:41:54 +00:00
Alyssa Rosenzweig	4fcd3189ae	panfrost/decode: Print negative_start This property slipped through. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-03-31 04:41:06 +00:00
Alyssa Rosenzweig	9237204400	panfrost: Implement missing texture formats - Implements RGB565/RGBA5551 formats - Don't advertise support for flipped RGBA5551 and ETC Fixes remaining tests in dEQP-GLES2.functional.texture.format.* which is now at 36/36. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-03-31 02:36:38 +00:00
Alyssa Rosenzweig	01fce794dc	panfrost: Extend tiling for cubemaps transfer_unmap now tiles for any tiled resource, not just TEXTURE_2D, which should cover more than just cubemaps! Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-03-31 02:36:38 +00:00
Alyssa Rosenzweig	c87f3ce97f	panfrost: Implement command stream for linear cubemaps Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-03-31 02:36:38 +00:00
Alyssa Rosenzweig	70b3e5db7d	panfrost/midgard: Emit cubemap coordinates Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-03-31 02:36:38 +00:00
Alyssa Rosenzweig	b5f02bdd99	panfrost: Include all cubemap faces in bitmap list Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-03-31 02:36:38 +00:00
Alyssa Rosenzweig	3197b30c6e	panfrost/decode: Decode all cubemap faces Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-03-31 02:36:38 +00:00
Alyssa Rosenzweig	e658f7225d	panfrost: Preliminary work for cubemaps Again, not yet functional, but this sets up the memory management for cube maps. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-03-31 02:36:37 +00:00
Alyssa Rosenzweig	499f31aab8	panfrost/midgard: Add L/S op for writing cubemap coordinates Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-03-31 02:36:37 +00:00
Alyssa Rosenzweig	f67616ce60	panfrost/midgard: Disassemble `cube` texture op Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-03-31 02:36:37 +00:00
Alyssa Rosenzweig	28b234a092	panfrost: Fix vertex buffer corruption Fixes crash in dEQP-GLES2.functional.buffer.* Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-03-31 02:36:37 +00:00
Rob Clark	b2d651b862	iris: fix set_sampler_view Update to match docs. Signed-off-by: Rob Clark <robdclark@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-03-30 13:05:31 -04:00
Rob Clark	e167e8f8a2	gallium/docs: clarify set_sampler_views (v2) Signed-off-by: Rob Clark <robdclark@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-03-30 13:04:00 -04:00
Rob Clark	7ff6705b8d	freedreno/ir3: convert to "new style" frag inputs Add support for load_barycentric_pixel, load_interpolated_input, and friends. For now, this retains support for old-style inputs, which can probably be dropped with some ttn work. Prep work for sample-shading support. Signed-off-by: Rob Clark <robdclark@gmail.com>	2019-03-30 12:56:01 -04:00
Rob Clark	fc865de777	freedreno/ir3: add pass to move varying loads Signed-off-by: Rob Clark <robdclark@gmail.com>	2019-03-30 12:56:01 -04:00
Rob Clark	831f1a05c0	freedreno/ir3: rework varying packing Originally we kept track of a table of inputs. But with new-style frag inputs this becomes awkward. Re-work it so that initially we assigned un-packed varying locations, and then after the shader is compiled scan to find actual used inputs, and re-pack. Signed-off-by: Rob Clark <robdclark@gmail.com>	2019-03-30 12:56:01 -04:00
Rob Clark	91a1354cd6	freedreno/ir3: re-indent comment Make it more clear that it applies to the following 'case' statements, rather than the previous one. Signed-off-by: Rob Clark <robdclark@gmail.com>	2019-03-30 12:56:01 -04:00
Rob Clark	1ae0c030cb	nir: add lower_all_io_to_elements I need this part of lower_all_io_to_temps but without the actual lowering to temps part. Signed-off-by: Rob Clark <robdclark@gmail.com>	2019-03-30 12:56:01 -04:00
Rob Clark	e5e67228f5	nir: print var name for load_interpolated_input too Signed-off-by: Rob Clark <robdclark@gmail.com> Acked-by: Karol Herbst <kherbst@redhat.com>	2019-03-30 12:55:47 -04:00
Sergii Romantsov	72a921e12a	i965,iris/blorp: do not blit 0-sizes Seems there is no sense in blitting 0-sized sources or destinations. Additionally it may cause segfaults for i965. v2: Function call replaced with inline check v3: Added check to avoid division by zero (L. Landwerlin) v4: Added similar check for Iris (L. Landwerlin) Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110239 Signed-off-by: Sergii Romantsov <sergii.romantsov@globallogic.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-03-30 11:50:40 +00:00
Vinson Lee	e757a2481f	gallium: Fix autotools build with libxatracker.la. CXXLD libxatracker.la /usr/bin/ld: ../../../../src/gallium/auxiliary/.libs/libgallium.a(tgsi_to_nir.o): in function `ttn_finalize_nir': src/gallium/auxiliary/nir/tgsi_to_nir.c:2111: undefined reference to `gl_nir_lower_samplers_as_deref' /usr/bin/ld: src/gallium/auxiliary/nir/tgsi_to_nir.c:2113: undefined reference to `gl_nir_lower_samplers' Fixes: `9a834447d6` ("tgsi_to_nir: Produce optimized NIR for a given pipe_screen.") Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109929 Signed-off-by: Vinson Lee <vlee@freedesktop.org>	2019-03-29 23:24:05 -07:00
Timur Kristóf	356ec7a219	gallium: fix autotools build of pipe_msm.la Signed-off-by: Vinson Lee <vlee@freedesktop.org> Fixes: `9a834447d6` ("tgsi_to_nir: Produce optimized NIR for a given pipe_screen.") Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109929	2019-03-29 23:12:40 -07:00
Jason Ekstrand	7dbd934e26	nir: Lock around validation fail shader dumping This prevents getting mixed-up results if a multi-threaded app has two validation errors in different threads. Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-03-29 21:57:51 -05:00
Brian Paul	b8e077daee	util: no-op __builtin_types_compatible_p() for non-GCC compilers __builtin_types_compatible_p() is GCC-specific and breaks the MSVC build. This intrinsic has been in u_vector_foreach() for a long time, but that macro has only recently been used in code (nir/nir_opt_comparison_pre.c) that's built with MSVC. Fixes: `2cf59861a` ("nir: Add partial redundancy elimination for compares") Reviewed-by: José Fonseca <jfonseca@vmware.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2019-03-29 15:33:43 -06:00
Caio Marcelo de Oliveira Filho	3b20ca34ae	iris: Clean up compiler warnings about unused Removed a few unused variables and iris_getparam_boolean(). Kept 'name' around since there's a commented debug that makes use of it. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-03-29 12:07:26 -07:00
Eric Engestrom	8d9c2044a4	egl: hide entrypoints that shouldn't be exported when using glvnd From GLVND author: > From a functional standpoint, exporting additional symbols doesn't > really matter, since libglvnd will load the vendor libraries with > RTLD_LOCAL. Suggested-by: Kyle Brenneman <kbrenneman@nvidia.com> Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Acked-by: Kyle Brenneman <kbrenneman@nvidia.com>	2019-03-29 16:54:08 +00:00
Karol Herbst	fea0caea2b	nir/validate: validate that tex deref sources are actually derefs Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-03-29 16:03:22 +01:00
Karol Herbst	6ffc72472c	nir/print: fix printing the image_array intrinsic index Fixes: `0de003be03` ("nir: Add handle/index-based image intrinsics") Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-03-29 16:03:22 +01:00
Timothy Arceri	4478c5374b	Revert "ac/nir: use new LLVM 8 intrinsics for SSBO atomic operations" This reverts commit `29132af234`. It seems the new intrinsic causes a hang on radeonsi (VEGA) when running the piglit test: tests/spec/arb_shader_storage_buffer_object/execution/ssbo-atomicCompSwap-int.shader_test	2019-03-29 21:04:01 +11:00
Samuel Pitoiset	cc752dea61	ac: fix return type for llvm.amdgcn.frexp.exp.i32.64 This fixes the following piglit with RadeonSI tests/spec/arb_gpu_shader_fp64/execution/built-in-functions/fs-frexp-dvec4.shader_test Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-03-29 09:18:24 +01:00
Gert Wollny	a0edceb00d	virgl: Add a caps feature check version When we add new feature checks on the host side that is used to enable a cap conditionally that was enabled unconditionally before we might end up with a feature regression when a new mesa version is used with an old virglrenderer version that doesn't check for that cap. To work around this problem add a version id to the caps that corresponds to the features that are actually checked on the host and check that version too when enabling the cap. Fixes: `2ee197d6e8` virgl: Enable mixed color FBO attachemnets only when the host supports it Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Reviewed-by: Pohsien Wang <pwang@chromium.org>	2019-03-29 07:55:31 +00:00
Samuel Pitoiset	62a9d757e6	radv: do not always initialize HTILE in compressed state Especially when performing a transition from UNDEFINED->GENERAL, the driver shouldn't initialize HTILE metadata in compressed state because it doesn't decompress when the src layout is GENERAL. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110259 Fixes: `3a2e93147f` ("radv: always initialize HTILE when the src layout is UNDEFINED") Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-03-29 08:28:18 +01:00
Kenneth Graunke	3fee3d1319	iris: Print the memzone name when allocating BOs with INTEL_DEBUG=buf This gives me an idea of what kinds of buffers are being allocated on the fly which could help inform our cache decisions.	2019-03-28 23:37:32 -07:00
Brian Paul	4ee057eaef	nir: use {0} initializer instead of {} to fix MSVC build Trivial change. Fixes: `c6ee46a75` ("nir: Add nir_alu_srcs_negative_equal")	2019-03-28 20:34:23 -06:00
Ian Romanick	7832fb7889	intel/compiler: Use partial redundancy elimination for compares Almost all of the hurt shaders are repeated instances of the same shader in synmark's compilation speed tests. shader-db results: All Gen6+ platforms had similar results. (Skylake shown) total instructions in shared programs: 15256840 -> 15256389 (<.01%) instructions in affected programs: 54137 -> 53686 (-0.83%) helped: 288 HURT: 0 helped stats (abs) min: 1 max: 15 x̄: 1.57 x̃: 1 helped stats (rel) min: 0.06% max: 26.67% x̄: 1.99% x̃: 0.74% 95% mean confidence interval for instructions value: -1.76 -1.38 95% mean confidence interval for instructions %-change: -2.47% -1.50% Instructions are helped. total cycles in shared programs: 372286583 -> 372283851 (<.01%) cycles in affected programs: 833829 -> 831097 (-0.33%) helped: 265 HURT: 16 helped stats (abs) min: 2 max: 74 x̄: 11.81 x̃: 4 helped stats (rel) min: 0.04% max: 9.07% x̄: 0.99% x̃: 0.35% HURT stats (abs) min: 2 max: 130 x̄: 24.88 x̃: 8 HURT stats (rel) min: <.01% max: 12.31% x̄: 1.44% x̃: 0.27% 95% mean confidence interval for cycles value: -12.30 -7.15 95% mean confidence interval for cycles %-change: -1.06% -0.64% Cycles are helped. Iron Lake and GM45 had similar results. (GM45 shown) total instructions in shared programs: 5038653 -> 5038495 (<.01%) instructions in affected programs: 13939 -> 13781 (-1.13%) helped: 50 HURT: 1 helped stats (abs) min: 1 max: 15 x̄: 3.18 x̃: 4 helped stats (rel) min: 0.33% max: 13.33% x̄: 2.24% x̃: 1.09% HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 HURT stats (rel) min: 0.83% max: 0.83% x̄: 0.83% x̃: 0.83% 95% mean confidence interval for instructions value: -3.73 -2.47 95% mean confidence interval for instructions %-change: -3.16% -1.21% Instructions are helped. 
total cycles in shared programs: 128118922 -> 128118228 (<.01%) cycles in affected programs: 134906 -> 134212 (-0.51%) helped: 50 HURT: 0 helped stats (abs) min: 2 max: 60 x̄: 13.88 x̃: 18 helped stats (rel) min: 0.06% max: 3.19% x̄: 0.74% x̃: 0.70% 95% mean confidence interval for cycles value: -16.54 -11.22 95% mean confidence interval for cycles %-change: -0.95% -0.53% Cycles are helped. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-03-28 15:35:53 -07:00
Ian Romanick	2cf59861a8	nir: Add partial redundancy elimination for compares This pass attempts to detect code sequences like if (x < y) { z = y - x; ... } and replace them with sequences like t = x - y; if (t < 0) { z = -t; ... } On architectures where the subtract can generate the flags used by the if-statement, this saves an instruction. It's also possible that moving an instruction out of the if-statement will allow nir_opt_peephole_select to convert the whole thing to a bcsel. Currently only floating point compares and adds are supported. Adding support for integer will be a challenge due to integer overflow. There are a couple possible solutions, but they may not apply to all architectures. v2: Fix a typo in the commit message and a couple typos in comments. Fix possible NULL pointer deref from result of push_block(). Add missing (-A + B) case. Suggested by Caio. v3: Fix is_not_const_zero to work correctly with types other than nir_type_float32. Suggested by Ken. v4: Add some comments explaining how this works. Suggested by Ken. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-03-28 15:35:53 -07:00
Ian Romanick	c6ee46a753	nir: Add nir_alu_srcs_negative_equal v2: Move bug fix in get_neg_instr from the next patch to this patch (where it was intended to be in the first place). Noticed by Caio. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-03-28 15:35:52 -07:00
Ian Romanick	be1cc3552b	nir: Add nir_const_value_negative_equal v2: Rebase on 1-bit Boolean changes. Reviewed-by: Thomas Helland <thomashelland90@gmail.com> [v1] Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-03-28 15:35:52 -07:00
Ian Romanick	ae21b52e1d	nir/algebraic: Add missing 16-bit extract_[iu]8 patterns No shader-db changes on any Intel platform. v2: Use a loop to generate patterns. Suggested by Jason. v3: Fix a copy-and-paste bug in the extract_[ui] of ishl loop that would replace an extract_i8 with an extract_u8. This broke ~180 tests. This bug was introduced in v2. Reviewed-by: Matt Turner <mattst88@gmail.com> [v1] Reviewed-by: Dylan Baker <dylan@pnwbakers.com> [v2] Acked-by: Jason Ekstrand <jason@jlekstrand.net> [v2]	2019-03-28 15:35:52 -07:00
Ian Romanick	cbad201c2b	nir/algebraic: Add missing 64-bit extract_[iu]8 patterns No shader-db changes on any Intel platform. v2: Use a loop to generate patterns. Suggested by Jason. v3: Fix a copy-and-paste bug in the extract_[ui] of ishl loop that would replace an extract_i8 with an extract_u8. This broke ~180 tests. This bug was introduced in v2. Reviewed-by: Matt Turner <mattst88@gmail.com> [v1] Reviewed-by: Dylan Baker <dylan@pnwbakers.com> [v2] Acked-by: Jason Ekstrand <jason@jlekstrand.net> [v2]	2019-03-28 15:35:52 -07:00
Ian Romanick	bc17f5a2a3	nir/algebraic: Remove redundant extract_[iu]8 patterns No shader-db changes on any Intel platform. Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com> Acked-by: Jason Ekstrand <jason@jlekstrand.net>	2019-03-28 15:35:52 -07:00
Ian Romanick	c152672e68	nir/algebraic: Fix up extract_[iu]8 after loop unrolling Skylake, Broadwell, and Haswell had similar results. (Skylake shown) total instructions in shared programs: 15256840 -> 15256837 (<.01%) instructions in affected programs: 4713 -> 4710 (-0.06%) helped: 3 HURT: 0 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 0.06% max: 0.08% x̄: 0.06% x̃: 0.06% total cycles in shared programs: 372286583 -> 372286583 (0.00%) cycles in affected programs: 198516 -> 198516 (0.00%) helped: 1 HURT: 1 helped stats (abs) min: 10 max: 10 x̄: 10.00 x̃: 10 helped stats (rel) min: <.01% max: <.01% x̄: <.01% x̃: <.01% HURT stats (abs) min: 10 max: 10 x̄: 10.00 x̃: 10 HURT stats (rel) min: 0.01% max: 0.01% x̄: 0.01% x̃: 0.01% No changes on any other Intel platform. v2: Use a loop to generate patterns. Suggested by Jason. v3: Fix a copy-and-paste bug in the extract_[ui] of ishl loop that would replace an extract_i8 with an extract_u8. This broke ~180 tests. This bug was introduced in v2. Reviewed-by: Matt Turner <mattst88@gmail.com> [v1] Reviewed-by: Dylan Baker <dylan@pnwbakers.com> [v2] Acked-by: Jason Ekstrand <jason@jlekstrand.net> [v2]	2019-03-28 15:35:52 -07:00
Dave Airlie	b779baa9bf	nir/deref: fix struct wrapper casts. (v3) llvm/spir-v spits out some struct a { struct b {} }, but it doesn't deref, it casts (struct a) to (struct b), reconstruct struct derefs instead of casts for these. v2: use ssa_def_rewrite uses, rework the type restrictions (Jason) v3: squish more stuff into one function, drop unused temp (Jason) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-03-29 08:10:50 +10:00
Rafael Antognolli	8e0469f629	i965/blorp: Remove unused parameter from blorp_surf_for_miptree. It seems pretty useless nowadays. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-03-28 14:38:23 -07:00
Anuj Phogat	9c421d6b47	iris/icl: Add WA_2204188704 to disable pixel shader panic dispatch Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-03-28 19:59:59 +00:00
Anuj Phogat	e0f4359ec1	iris/icl: Set Enabled Texel Offset Precision Fix bit h/w specification requires this bit to be always set. See Mesa commit `5eb173304b`. Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-03-28 19:59:59 +00:00
Rob Clark	78825ca2d0	freedreno/ir3: align const size to vec4 This is no longer true since PIPE_CAP_PACKED_UNIFORMS was enabled. Fixes: `3c8779af32` freedreno/ir3: Enable PIPE_CAP_PACKED_UNIFORMS Signed-off-by: Rob Clark <robdclark@gmail.com>	2019-03-28 14:36:24 -04:00
Rob Clark	26e2906382	freedreno/ir3: reads/writes to unrelated arrays are not dependent Signed-off-by: Rob Clark <robdclark@gmail.com>	2019-03-28 14:36:24 -04:00
Rob Clark	d71ce69d9c	freedreno/ir3: sched fix Not sure why new-style frag inputs start triggering this. But we probably shouldn't consider src's from other blocks. Signed-off-by: Rob Clark <robdclark@gmail.com>	2019-03-28 14:36:24 -04:00
Rob Clark	c557fcaf2b	freedreno/a6xx: small cleanup Signed-off-by: Rob Clark <robdclark@gmail.com>	2019-03-28 14:36:23 -04:00
Kenneth Graunke	ee8370c766	iris: Fix blits with S8_UINT destination For depth and stencil blits, we always want the main mask to be Z, and the secondary pass mask to be S. If asked to blit Z+S to S, we should handle the blit in the second pass which properly gets the stencil resources. Before, we were trying to handle S as the main mask, and accidentally blitting a Z source to a S destination, which doesn't work out well. Fixes Piglit's "framebuffer-blit-levels {draw,read} stencil" tests.	2019-03-28 10:47:26 -07:00
Kenneth Graunke	ce89c19b88	st/mesa: Fix blitting from GL_DEPTH_STENCIL to GL_STENCIL_INDEX Fixes assertion failures in Piglit's "framebuffer-blit-levels {draw,read} stencil" tests on iris. Also fixes assert failures in frameretrace, which tries to ReadPixels the stencil values (only) from a Z24S8 depth/stencil attachment. Reviewed-by: Kristian H. Kristensen <hoegsberg@chromium.org>	2019-03-28 10:47:23 -07:00
Kristian H. Kristensen	107a8ec3b3	freedreno/ir3: Add workaround for VS samgq This instruction needs a workaround when used from vertex shaders. Fixes: dEQP-GLES3.functional.shaders.texture_functions.texturegradoffset.sampler2dshadow_vertex dEQP-GLES3.functional.shaders.texture_functions.texturegradoffset.sampler3d_fixed_vertex dEQP-GLES3.functional.shaders.texture_functions.texturegradoffset.sampler3d_float_vertex dEQP-GLES3.functional.shaders.texture_functions.textureprojgradoffset.sampler2dshadow_vertex dEQP-GLES3.functional.shaders.texture_functions.textureprojgradoffset.sampler3d_fixed_vertex dEQP-GLES3.functional.shaders.texture_functions.textureprojgradoffset.sampler3d_float_vertex dEQP-GLES3.functional.shaders.texture_functions.textureprojgrad.sampler2dshadow_vertex Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org> Reviewed-by: Rob Clark <robdclark@gmail.com>	2019-03-28 10:26:32 -07:00
Kristian H. Kristensen	f30d4a1cca	freedreno/ir3: Don't access beyond available regs emit_cat5() needs to check if the last optional reg is there before it accesses it. Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org> Reviewed-by: Rob Clark <robdclark@gmail.com>	2019-03-28 10:26:32 -07:00
Eric Engestrom	7fefa4610d	util/disk_cache: close fd in the fallback path There are multiple `goto path_fail` with an open fd, but none that go to `fail:` without going through `path_fail:` first, so let's just move the `close(fd)` there. Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-03-28 16:41:27 +00:00
Samuel Pitoiset	6596eb2b30	radv: skip updating depth/color metadata for conditional rendering I don't think we should update metadata when conditional rendering is enabled. For some reason, some CTS tests break only on SI. This fixes the following CTS on SI: dEQP-VK.conditional_rendering.draw_clear.clear.depth.* Cc: 19.0 <mesa-stable@lists.freedesktop.org> Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-03-28 17:37:12 +01:00
Kenneth Graunke	1d72de3bcc	st/nir: Free the GLSL IR after linking. i965 does this, and st's tgsi path does this. st/nir did not. Cuts 138MB of memory from a DiRT Rally trace, which is about 44% of the total GLSL IR memory. Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-03-28 09:31:12 -07:00
Samuel Pitoiset	227b191206	radv: enable VK_AMD_gpu_shader_int16 This extension allows 16-bit support to Frexp/FrexpStruct. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-03-28 13:02:53 +01:00
Samuel Pitoiset	8a6e61cc52	radv: do not lower frexp_exp and frexp_sig Hardware has two instructions. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-03-28 13:02:51 +01:00
Samuel Pitoiset	52c02d921f	ac: add ac_build_frex_exp() helper and 16-bit/32-bit support Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-03-28 13:02:48 +01:00
Samuel Pitoiset	1bf9311c59	ac: add ac_build_frexp_mant() helper and 16-bit/32-bit support Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-03-28 13:02:46 +01:00
Kenneth Graunke	de783a6897	iris: Actually advertise some modifiers I neglected to fill out this driver function, causing us to advertise 0 modifiers. Now we advertise the various tilings and let the driver pick them. I've verified that X tiling works with Weston (by hacking the list to skip Y tiling). Y+CCS doesn't work yet because it's multiplane and the Gallium dri state tracker isn't really prepared for that. Leave it off for now.	2019-03-27 21:27:54 -07:00
Toni Lönnberg	505854f84b	intel/genxml: Media instructions and structures for gen11 v2: Lionel Landwerlin <lionel.g.landwerlin@intel.com> - fix missing type - fix _FQM_/_QM_ commands - shorten some media structs using groups - factor out memory attributes - switch MI_FLUSH_DW fields to bool Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-03-28 04:26:30 +00:00
Toni Lönnberg	4dccf2edef	intel/genxml: Media instructions and structures for gen10 v2: Lionel Landwerlin <lionel.g.landwerlin@intel.com> - fix missing type - fix _FQM_/_QM_ commands - shorten some media structs using groups - factor out memory attributes - switch MI_FLUSH_DW fields to bool Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-03-28 04:26:30 +00:00
Toni Lönnberg	8e74cacdad	intel/genxml: Media instructions and structures for gen9 v2: Lionel Landwerlin <lionel.g.landwerlin@intel.com> - fix missing type - fix _FQM_/_QM_ commands - shorten some media structs using groups - factor out memory attributes - switch MI_FLUSH_DW fields to bool Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-03-28 04:26:30 +00:00
Toni Lönnberg	2f075c5ccc	intel/genxml: Media instructions and structures for gen8 v2: Lionel Landwerlin <lionel.g.landwerlin@intel.com> - switch MI_FLUSH_DW fields to bool Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-03-28 04:26:30 +00:00
Toni Lönnberg	2bf89a05f4	intel/genxml: Media instructions and structures for gen7.5 v2: Fixed MI_WAIT_FOR_EVENT to be for video also Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-03-28 04:26:30 +00:00
Toni Lönnberg	416e1567ee	intel/genxml: Media instructions and structures for gen7 v2: Fixed MI_WAIT_FOR_EVENT to be for blitter and video also Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-03-28 04:26:30 +00:00
Toni Lönnberg	9e6ffe3741	intel/genxml: Media instructions and structures for gen6 v2: Fixed MI_WAIT_FOR_EVENT to be for blitter and video also Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-03-28 04:26:30 +00:00
Toni Lönnberg	b6f7b40d81	intel/genxml: Only handle instructions meant for render engine when generating headers v2: Fixed the check for engine v3: Changed engine into an argument given to the scripts Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-03-28 04:26:30 +00:00
Dave Airlie	ce6faa57ae	softpipe: add indirect store buffer/image unit The code to handle image unit indirect was missing Fixes piglit tests/spec/arb_arrays_of_arrays/execution/image_store/basic-imageStore-mixed-const-non-const-uniform-index.shader_test Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2019-03-28 14:13:08 +10:00
Dave Airlie	9f9d9c948d	softpipe/draw: fix vertex id in soft paths. This fixes the vertex id fetch in the non-llvm drawing paths. This vertex id in elt mode comes from the elts not just a linear value. Note we don't add basevertex in the elts case as it's already included in the elts by the looks of it (at least tests fail if I add it) Fixes piglit end-primitive tests and some others. Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2019-03-28 14:13:08 +10:00
Kristian H. Kristensen	893425a607	freedreno/ir3: Push UBOs to constant file We have a rather big constant file and it seems that the best way to use it is to upload all UBOs and lower UBO access to load_uniform. Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org> Reviewed-by: Rob Clark <robdclark@gmail.com>	2019-03-27 13:26:02 -07:00
Kristian H. Kristensen	3c8779af32	freedreno/ir3: Enable PIPE_CAP_PACKED_UNIFORMS This commit turns on the gallium cap and adds a pass to lower the load_ubo intrinsics for block 0 back to load_uniform intrinsics and adjust the backend where the cap switches units from vec4s to dwords. As we stop using ir3_glsl_type_size() for uniform layout, this also corrects an issue where we would allocate a vec4 slot for samplers in uniforms, fixing: dEQP-GLES3.functional.shaders.struct.uniform.sampler_array_fragment dEQP-GLES3.functional.shaders.struct.uniform.sampler_array_vertex dEQP-GLES3.functional.shaders.struct.uniform.sampler_nested_fragment dEQP-GLES2.functional.shaders.struct.uniform.sampler_nested_vertex dEQP-GLES2.functional.shaders.struct.uniform.sampler_nested_fragment Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org> Reviewed-by: Rob Clark <robdclark@gmail.com>	2019-03-27 13:26:02 -07:00
Kristian H. Kristensen	56b4bc292f	st/glsl_to_nir: Calculate num_uniforms from NumParameterValues We don't need to determine the number of uniform slots here, it's already available as prog->Parameters->NumParameterValues. The way we previously determined the number of slots was also broken for PackedDriverUniformStorage, where we would add loc (in dwords) and type_size() (in vec4s). Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Rob Clark <robdclark@gmail.com>	2019-03-27 13:26:02 -07:00
Anuj Phogat	dce13e58b0	intel: Add Elkhart Lake PCI-IDs Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-03-27 19:34:48 +00:00
Anuj Phogat	a583f86305	intel: Add Elkhart Lake device info V2: Fix L3 bank count (Vivek) Fix simulator_id and num_eu_per_subslice (Lionel) Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-03-27 19:34:48 +00:00
Leo Liu	f8ef8b56a6	radeon/vcn: add H.264 constrained baseline support VCN supports this profile as well as UVD, so add it Signed-off-by: Leo Liu <leo.liu@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> CC: <mesa-stable@lists.freedesktop.org>	2019-03-27 14:33:55 -04:00
Gurchetan Singh	ac839bbf79	egl/android: choose node type based on swrast and preprocessor flags kms_swrast can work with primary nodes out of the box, but also with rendernodes if the build environment specifies the EGL_FORCE_RENDERNODE flag. Suggested-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2019-03-27 17:26:21 +00:00
Gurchetan Singh	a87096b79e	egl/android: use software rendering when appropriate Now the init logic falls back to or forces software rendering. v2: simplify flow (@eric) Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-03-27 17:26:21 +00:00
Gurchetan Singh	d4e7982b6e	egl/android: use swrast option in droid_load_driver Load the kms_swrast driver when specified. Doesn't work with drm_gralloc. v2: remove unneeded line (@eric) v3: Remove swrast_loader_extensions (@evelikov) Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-03-27 17:26:21 +00:00
Gurchetan Singh	f90fc102ed	egl/android: plumb swrast option It's good to have options. Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-03-27 17:26:21 +00:00
Gurchetan Singh	7d9719db83	egl/android: refactor droid_load_driver a bit This way, we can use primary nodes with kms_swrast too. Also fix up some whitespace issues. Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-03-27 17:26:21 +00:00
Gurchetan Singh	f1dd1be0c2	egl/android: droid_open_device_drm_gralloc --> droid_open_device Makes things easier to follow. Suggested-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-03-27 17:26:21 +00:00
Gurchetan Singh	95ad1744c1	egl/android: move droid_open_device_drm_gralloc down a bit 1) Removes a forward declaration. 2) Makes next patch easier. Suggested-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-03-27 17:26:21 +00:00
Gurchetan Singh	49d52539fb	egl/android: move droid_image_loader_extension down a bit This removes some #ifdefs. Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-03-27 17:26:21 +00:00
Dylan Baker	15f131b7b7	docs: update calendar, add news item and link release notes for 19.0.1	2019-03-27 10:14:50 -07:00
Dylan Baker	3f1a79989d	docs: Add SHA256 sums for mesa 19.0.1	2019-03-27 10:14:50 -07:00
Dylan Baker	fcf8be8a8a	docs: Add release notes for 19.0.1	2019-03-27 10:14:47 -07:00
Jason Ekstrand	ce47999cee	Revert "anv/radv: release memory allocated by glsl types during spirv_to_nir" This reverts commit `4e1bbb000c`. It turns out that some DXVK apps due to some implementation detail of DXVK or other create and destroy instances in an interleaved way. Freeing the glsl_type memory without being a bit more careful causes use-after-free issues. Looks like we need to try again.	2019-03-27 11:24:58 -05:00
Tomeu Vizoso	b817d00278	panfrost: Wait for last job to finish in force_flush_fragment Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com> Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-03-27 17:03:34 +01:00
Tomeu Vizoso	53ab812230	panfrost: Pass the context BOs to the kernel so they aren't unmapped while in use Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com> Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-03-27 17:03:34 +01:00
Tomeu Vizoso	b0f67c066f	panfrost: Also tell the kernel about the checksum_slab Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com> Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-03-27 17:03:34 +01:00
Tomeu Vizoso	95748f6483	panfrost: Set the GEM handle for AFBC buffers Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com> Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-03-27 17:03:34 +01:00
Tomeu Vizoso	02081edfaf	panfrost: Fix sscanf format options Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com> Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-03-27 17:03:34 +01:00
Alexandros Frantzis	3bccf70211	virgl: Fake MSAA when max samples is 1 When the host is running on softpipe/llvmpipe the maximum number of samples for multisampling is 1. GL 3.0 requires at least 4 samples, and softpipe/llvmpipe get around this by enabling PIPE_CAP_FAKE_SW_MSAA. This patch mimics softpipe/llvmpipe behavior in virgl by enabling the same PIPE_CAP_FAKE_SW_MSAA workaround when the max sample count reported by the host is 1. This change allows virgl on a softpipe/llvmpipe host to advertise support for GL 3.0 and beyond. Signed-off-by: Alexandros Frantzis <alexandros.frantzis@collabora.com> Reviewed-By: Gert Wollny <gert.wollny@collabora.com>	2019-03-27 15:46:14 +02:00
Samuel Pitoiset	d6a07732c9	ac: use llvm.amdgcn.fmed3 intrinsic for nir_op_fmed3 Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-03-27 14:45:52 +01:00
Michel Dänzer	6140ed3d2c	gitlab-ci: Automatically retry jobs after runner system failure Up to twice, for a total of 3 attempts maximum. This will hopefully avoid spurious CI pipeline failures due to intermittent GitLab/docker infrastructure issues. Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-03-27 10:05:43 +01:00
Michel Dänzer	a3f34f9d85	gitlab-ci: Only pull/push cache contents in build+test stage jobs The containers-build stage job doesn't use the cache, so this might save some wasted time for it. Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-03-27 10:05:43 +01:00
Michel Dänzer	1aca01dcf1	gitlab-ci: Make sure clang job actually uses ccache Meson didn't automatically pick up ccache in this job for some reason. Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-03-27 10:05:43 +01:00
Samuel Pitoiset	bea540173c	spirv: propagate the access flag for store and load derefs It was only propagated when UBO/SSBO access are lowered to offsets. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-03-27 09:57:30 +01:00
Samuel Pitoiset	4d0b03c83d	nir: add nir_{load,store}_deref_with_access() helpers Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-03-27 09:57:27 +01:00
Timothy Arceri	d163780f81	spirv: make use of the select control support in nir Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108841	2019-03-27 02:39:12 +00:00
Timothy Arceri	e76ae39ae2	nir: add support for user defined select control This will allow us to make use of the selection control support in spirv and the GL support provided by EXT_control_flow_attributes. Note this only supports if-statements as we don't support switches in NIR. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108841	2019-03-27 02:39:12 +00:00
Timothy Arceri	24037ff228	spirv: make use of the loop control support in nir Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108841	2019-03-27 02:39:12 +00:00
Timothy Arceri	b56451f82c	nir: add support for user defined loop control This will allow us to make use of the loop control support in spirv and the GL support provided by EXT_control_flow_attributes. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108841	2019-03-27 02:39:12 +00:00
Alyssa Rosenzweig	6170814c42	panfrost: Preliminary work for mipmaps This patch refactors a substantial amount of code in preparation for mipmaps. In particular, we now have a correct slice abstraction based on offsets; cpu/gpu are no longer arbitrary pointers. We additionally shuffle around other code to accompany these changes and cleanup how tiled textures are handled, while drawing some attention to the blit code. Mipmaps are still disabled at this point, as autogeneration is not yet implemented; enabling as-is would cause regressions. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-03-27 02:11:24 +00:00
Alyssa Rosenzweig	04a72391f3	panfrost/midgard: fpow is a two-part operation In fact, the native "fpow" instruction only does half of it; more work is needed for the actual instruction. For now, just lower. Fixes: `1ea42894c` ("panfrost/midgard: Implement fpow") Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-03-26 23:36:09 +00:00
Alyssa Rosenzweig	12d1d99fee	panfrost/midgard: Handle i2b constant Fixes dEQP-GLES2.functional.shaders.conversions.scalar_to_scalar.int_to_bool_fragment Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-03-26 23:36:09 +00:00
Alyssa Rosenzweig	7b78af8e00	panfrost/midgard: Expand fge lowering to more types Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-03-26 23:36:09 +00:00
Alyssa Rosenzweig	b8739c24ee	panfrost/midgard: Add ult/ule ops Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-03-26 23:36:09 +00:00
Alyssa Rosenzweig	f277bd3c22	panfrost: Stub out ES3 caps/callbacks Although this is not functional (and the command stream side is not aiming for ES3 right now), this is enough to run dEQP-GLES3 shader tests with the version override directive; this is useful, as some ES3 shader features can occur in ES2 class shaders due to lowering. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-03-26 23:35:58 +00:00
Alyssa Rosenzweig	89989e653e	panfrost/midgard: Cleanup midgard_nir_algebraic.py Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-03-26 23:35:37 +00:00
Alyssa Rosenzweig	effe6fb08d	panfrost/midgard: Lower source modifiers for ints On Midgard, float ops support standard source modifiers (abs/neg) and destination modifiers (sat/pos/round). Integer ops do not support these, however. To cope, we use native NIR source modifiers for floats, but lower them away to iabs/ineg for integers, implementing those ops simultaneously to avoid regressions. Fixes the integer tests in dEQP-GLES2.functional.shaders.operator.unary_operator.minus.* Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-03-26 23:35:36 +00:00
Alyssa Rosenzweig	3208c9d9a2	panfrost/midgard: Implement b2i; improve b2f/f2b Fixes dEQP-GLES2.functional.shaders.conversions.scalar_to_scalar.bool_to_int_fragment Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-03-26 23:35:27 +00:00
Alyssa Rosenzweig	5b95fef493	panfrost/midgard: Lower i2b32 Fixes dEQP-GLES2.functional.shader.conversions.scalar_to_scalar.int_to_bool_vertex Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-03-26 23:35:26 +00:00
Alyssa Rosenzweig	ae43b8faa7	panfrost/midgard: Lower f2b32 to fne Fixes dEQP-GLES2.functional.shaders.swizzles.vector_swizzles.mediump_bvec2_x_vertex Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-03-26 23:35:24 +00:00
Alyssa Rosenzweig	3fb884259b	panfrost/midgard: Lower bool_to_int32 Fixes dEQP-GLES2.functional.shaders.linkage.varying_type_vec2 (among many others). Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-03-26 23:35:22 +00:00
Alyssa Rosenzweig	53664108c2	panfrost/midgard: Map more bany/ball opcodes Some of these are not yet fully functional due to related bugs, but this is the correct op mapping. The native ball/bany opcodes act on vec4's unconditionally. That said, both ball and bany have the nice property that duplicating an argument does not affect their output, so the default "hanging swizzles" allow us to implement 2/3-component opcodes correctly, implicitly lowering. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-03-26 23:35:20 +00:00
Alyssa Rosenzweig	88b2a6b451	panfrost/midgard: Add more ball/bany, iabs ops Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-03-26 23:35:18 +00:00
Alyssa Rosenzweig	72cd677bac	panfrost/midgard: Schedule ball/bany to vectors Though they output scalars, they need a vector unit to make sense. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-03-26 23:35:17 +00:00
Alyssa Rosenzweig	89fdbb6707	panfrost/midgard: Add fcsel_i opcode Whereas a normal fcsel acts on a boolean input in r31.w, the fcsel_i variant acts on an integer input in r31.w, which can be preloaded with an instruction like imov (with the appropriate negate flag on the source). Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-03-26 23:35:15 +00:00
Alyssa Rosenzweig	121417ef1d	panfrost: Implement scissor test This preliminary implementation should handle some basic cases. Future work should scissor the FRAGMENT job as well for efficiency. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-03-26 23:35:14 +00:00
Alyssa Rosenzweig	bd9446e719	panfrost: Fix viewports Our viewport code hardcoded a number of wrong assumptions, which sort of sometimes worked but was definitely wrong (and broke most of dEQP). This corrects the logic, accounting for flipped-Y framebuffers, which fixes... most of dEQP. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-03-26 23:35:10 +00:00
Alyssa Rosenzweig	9da4603fb6	panfrost/midgard: Fix b2f32 swizzle for vectors Fixes issues in most of dEQP-GLES2.functional.shaders.* Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-03-26 23:35:08 +00:00
Dave Airlie	e77013fb7f	softpipe: fix clears to only clear specified color buffers. This fixes piglit clearbuffer-mixed-format Reviewed-by: Brian Paul <brianp@vmware.com>	2019-03-27 07:53:32 +10:00
Dave Airlie	7f7c9425a8	draw/vs: partly fix basevertex/vertex id This gets the basevertex from the draw depending on whether it's an indexed or non-indexed draw. We still fail a transform feedback test for vertex id, as the vertex id is actually an index id, and isn't getting translated properly to a vertex id, suggestions on how/where to fix that welcome. Reviewed-by: Brian Paul <brianp@vmware.com>	2019-03-27 07:52:28 +10:00
Nicolai Hähnle	e16ac33f37	amd/surface: provide firstMipIdInTail for metadata surface calculations This field was added in a recent addrlib update, and while there currently seems to be no issue with skipping it, we will have to set it correctly in the future. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-03-26 10:00:55 +01:00
Bas Nieuwenhuizen	82075e3c42	ac/nir: Return frag_coord as integer. To preserve the invariant that nir ssa defs are integers or pointers in LLVM. CC: <mesa-stable@lists.freedesktop.org> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> Reviewed-by: Dave Airlie <airlied@redhat.com>	2019-03-26 09:41:15 +01:00
Kristian H. Kristensen	c7c432738a	freedreno/ir3: Fix operand order for DSX/DSY Most cat5 instructions are constructed using ir3_SAM, which uses regs[1] for the (sampler, tex) src. Not DSX/DSY though, so we look up src1 and src2 differently for those two. Fixes: `1dffb089` ("freedreno/ir3: fix sam.s2en encoding") Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org> Reviewed-by: Rob Clark <robdclark@gmail.com>	2019-03-25 18:36:48 -07:00
Kristian H. Kristensen	a752422bd4	freedreno/ir3: Track whether shader needs derivatives In `1088b788` ("freedreno/ir3: find # of samplers from uniform vars") we started counting number of samplers based on the uniform vars instead of number of cat5 instructions. We used the number of samplers to determine whether to enable derivatives, but when we only use derivatives and no samplers, that now breaks. Track whether we need derivatives explicitly and use that to enable the state. Fixes: `1088b788` ("freedreno/ir3: find # of samplers from uniform vars") Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org> Reviewed-by: Rob Clark <robdclark@gmail.com>	2019-03-25 18:36:48 -07:00
Andre Heider	12f11e6fe6	st/nine: enable csmt per default on iris iris is thread safe, enable csmt for a ~5% performance boost. Signed-off-by: Andre Heider <a.heider@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Axel Davy <davyaxel0@gmail.com>	2019-03-25 22:21:19 +01:00
Jason Ekstrand	8ed583fe52	spirv: Handle the NonUniformEXT decoration	2019-03-25 16:12:09 -05:00
Jason Ekstrand	e50ab2c0f2	nir: Add access flags to deref and SSBO atomics We will need them for a new ACCESS_NON_UNIFORM flag that's about to be added in the next commit. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-03-25 16:12:09 -05:00
Jason Ekstrand	40074ebf74	nir: Add texture sources and intrinsics for bindless On Intel, we have both bindless and bindful and we'd like to use them at the same time if we can so we need to be able to distinguish at the NIR level between the two. This also fixes nir_lower_tex to properly handle bindless in its tex_texture_size and get_texture_lod helpers. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-03-25 16:12:09 -05:00
Danylo Piliaiev	e0db0c74b9	intel/fs: Make alpha test work with MRT and sample mask Fix the order of src0_alpha and sample mask in fb payload. From SKL PRM Volume 7, "Data Payload Register Order for Render Target Write Messages": Type S0A oM sZ oS M2 M3 M4 SIMD8 1 1 0 0 s0A oM R SIMD16 1 1 0 0 1/0s0A 3/2s0A oM It also fixes working of alpha to coverage with sample mask on GEN6 since now they are in correct order. Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com> Signed-off-by: Francisco Jerez <currojerez@riseup.net> Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2019-03-25 13:54:55 -07:00
Danylo Piliaiev	c8abe03f3b	i965,iris,anv: Make alpha to coverage work with sample mask From "Alpha Coverage" section of SKL PRM Volume 7: "If Pixel Shader outputs oMask, AlphaToCoverage is disabled in hardware, regardless of the state setting for this feature." From OpenGL spec 4.6, "15.2 Shader Execution": "The built-in integer array gl_SampleMask can be used to change the sample coverage for a fragment from within the shader." From OpenGL spec 4.6, "17.3.1 Alpha To Coverage": "If SAMPLE_ALPHA_TO_COVERAGE is enabled, a temporary coverage value is generated where each bit is determined by the alpha value at the corresponding sample location. The temporary coverage value is then ANDed with the fragment coverage value to generate a new fragment coverage value." Similar wording can be found in Vulkan spec 1.1.100 "25.6. Multisample Coverage". Thus we need to compute alpha to coverage dithering manually in the shader and replace the sample mask store with the bitwise-AND of sample mask and alpha to coverage dithering. The following formula is used to compute the final sample mask: `m = int(16.0 * clamp(src0_alpha, 0.0, 1.0))`; `dither_mask = 0x1111 * ((0xfea80 >> (m & ~3)) & 0xf) | 0x0808 * (m & 2) | 0x0100 * (m & 1)`; `sample_mask = sample_mask & dither_mask`. Credits to Francisco Jerez <currojerez@riseup.net> for creating it. It gives a number of ones proportional to the alpha for 2, 4, 8 or 16 least significant bits of the result. GEN6 hardware does not have an issue with simultaneous usage of sample mask and alpha to coverage; however, due to the wrong sending order of oMask and src0_alpha, it is still affected by it. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109743 Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2019-03-25 13:54:55 -07:00
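As a rough illustration of the dither-mask formula quoted in c8abe03f3b, here is a minimal Python sketch. The function names `dither_mask`, `popcount` and `apply_alpha_to_coverage` are hypothetical and are not taken from the Mesa sources, where the computation is emitted as shader code. The sketch only evaluates the formula as written and lets one check the stated property that the number of set bits in the 2, 4, 8 or 16 least significant bits grows in proportion to alpha.

```python
def dither_mask(src0_alpha: float) -> int:
    """Evaluate the alpha-to-coverage dither formula from the commit message."""
    # m = int(16.0 * clamp(src0_alpha, 0.0, 1.0)), so m lies in 0..16.
    m = int(16.0 * min(max(src0_alpha, 0.0), 1.0))
    return (
        0x1111 * ((0xFEA80 >> (m & ~3)) & 0xF)  # coarse pattern, repeated per nibble
        | 0x0808 * (m & 2)                      # adds two bits when bit 1 of m is set
        | 0x0100 * (m & 1)                      # adds one bit when bit 0 of m is set
    )


def popcount(value: int) -> int:
    """Count set bits (illustrative helper only)."""
    return bin(value).count("1")


def apply_alpha_to_coverage(sample_mask: int, src0_alpha: float) -> int:
    """sample_mask = sample_mask & dither_mask, as in the commit message."""
    return sample_mask & dither_mask(src0_alpha)
```

For example, with 4 samples only the low 4 bits matter, and `popcount(dither_mask(alpha) & 0xF)` steps through 0, 1, 2, 3, 4 as alpha goes from 0.0 to 1.0 in quarters.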
Jason Ekstrand	3bd5457641	nir: Add a lowering pass for non-uniform resource access Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-03-25 15:00:36 -05:00
Jason Ekstrand	39da1deb49	nir/lower_io: Add a bounds-checked 64-bit global address format Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-03-25 14:40:54 -05:00
Dave Airlie	551950cacd	draw/gs: fix point size outputs from geometry shader. If the geom shader emits a point size we failed to find it here, use the correct API to look it up. Fixes: tests/spec/glsl-1.50/execution/geometry/point-size-out.shader_test Reviewed-by: Brian Paul <brianp@vmware.com>	2019-03-26 05:17:06 +10:00
Dave Airlie	d3836510d2	draw: bail instead of assert on instance count (v2) With indirect rendering it's fine to set the instance count parameter to 0, and expect the rendering to be ignored. Fixes assert in KHR-GLES31.core.compute_shader.pipeline-gen-draw-commands on softpipe v2: return earlier before changing fpstate Reviewed-by: Brian Paul <brianp@vmware.com>	2019-03-26 05:16:56 +10:00
Leo Liu	382401aab7	vl/dri3: remove the wait before getting back buffer The wait here is unnecessary since we got a pool of back buffers, and the wait for swap buffer will happen before the present pixmap, at the same time the previous back buffer will be put back to pool for reuse after the check for PresentIdleNotify event Signed-off-by: Leo Liu <leo.liu@amd.com> Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2019-03-25 12:20:31 -04:00
Iago Toral Quiroga	763c8aabed	compiler/nir: add lowering for 16-bit ldexp v2 (Topi): - Make bit-size handling order be 16-bit, 32-bit, 64-bit - Clamp lower exponent range at -28 instead of -30. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-03-25 16:08:25 +01:00
Iago Toral Quiroga	3766334923	compiler/nir: add lowering for 16-bit flrp And enable it on Intel. v2: - Squash the change to enable it on Intel (Jason) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-03-25 16:08:25 +01:00
Iago Toral Quiroga	ca31df6f1f	compiler/nir: add lowering option for 16-bit fmod And enable it on Intel. v2: - Squash the change to enable this lowering on Intel (Jason) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-03-25 16:08:25 +01:00
Brian Paul	08d97aadd1	st/mesa: fix texture deletion context mix-up issues (v2) When we destroy a context, we need to temporarily make that context the current one for the thread. That's because during context tear-down we make many calls to _mesa_reference_texobj(&texObj, NULL). Note there's no context parameter. If the texture's refcount goes to zero and we need to delete it, we use the thread's current context. But if that context isn't the context we're tearing down, we get into trouble when deallocating sampler views. See patch `593e36f956` ("st/mesa: implement "zombie" sampler views (v2)") for background information. Also, we need to release any sampler views attached to the fallback textures. Fixes a crash on exit with a glretrace of the Nobel Clinician application. v2: at end of st_destroy_context(), check if save_ctx == ctx and unbind the context if so. Reviewed-by: Roland Scheidegger <sroland@vmware.com> Reviewed-by: Neha Bhende <bhenden@vmware.com> Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2019-03-25 06:57:57 -06:00
Brian Paul	d13167cd21	nir: fix a few signed/unsigned comparison warnings Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-03-25 06:51:31 -06:00
Kishore Kadiyala	e1d8057160	android: static link with libexpat with Android O+ In Android O, MESA needs to statically link libexpat so that it's in same VNDK namespace. v2: apply change also to anv driver (Tapani) v3: use += in anv change (Eric Engestrom) Change-Id: I82b0be5c817c21e734dfdf5bfb6a9aa1d414ab33 Signed-off-by: Kishore Kadiyala <kishore.kadiyala@intel.com> Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-03-25 10:11:57 +02:00
Samuel Iglesias Gonsálvez	01cf390035	radv: write availability status vkGetQueryPoolResults() when the data is not available If VK_QUERY_RESULT_WITH_AVAILABILITY_BIT is set and VK_QUERY_RESULT_WAIT_BIT and VK_QUERY_RESULT_PARTIAL_BIT are both not set, we need to return VK_NOT_READY only and set the availability status field for each query. From Vulkan spec: "If VK_QUERY_RESULT_WAIT_BIT and VK_QUERY_RESULT_PARTIAL_BIT are both not set then no result values are written to pData for queries that are in the unavailable state at the time of the call, and vkGetQueryPoolResults returns VK_NOT_READY. However, availability state is still written to pData for those queries if VK_QUERY_RESULT_WITH_AVAILABILITY_BIT is set." Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-03-25 08:21:22 +01:00
Samuel Iglesias Gonsálvez	cb3ea50ec2	radv: don't overwrite results in VkGetQueryPoolResults() when queries are not available If the query is not available and VK_QUERY_RESULT_WAIT_BIT and VK_QUERY_RESULT_PARTIAL_BIT are both not set, the spec doesn't allow modifying its result. From Vulkan spec: "If VK_QUERY_RESULT_WAIT_BIT and VK_QUERY_RESULT_PARTIAL_BIT are both not set then no result values are written to pData for queries that are in the unavailable state at the time of the call, and vkGetQueryPoolResults returns VK_NOT_READY. However, availability state is still written to pData for those queries if VK_QUERY_RESULT_WITH_AVAILABILITY_BIT is set." v2: - Move VK_NOT_READY change to next patch (Samuel Pitoiset) Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-03-25 08:21:22 +01:00
Tapani Pälli	2c240a5216	st/mesa: fix warnings about implicit conversion on enumeration type These enums match but compiler warns about implicit conversion. Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Dave Airlie <airlied@redhat.com>	2019-03-25 07:44:27 +02:00
Tapani Pälli	ec12316489	st/mesa: fix compilation warning on storage_flags_to_buffer_flags (warning: 'const' type qualifier on return type has no effect) Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Dave Airlie <airlied@redhat.com>	2019-03-25 07:44:05 +02:00
Dave Airlie	9417793fb1	nir/split_vars: fixup some more explicit_stride related issues. With vkpipelinedb Samuel discovered a regression since we stopped stripping types at the spir-v level. This adds a check to the var splitting for the case where it asserts the type hasn't changed, when it has just created a bare type, and it's different than the original type which has an explicit stride. This also removes a pointless assert that also triggers. Fixes: `3b3653c4cf` (nir/spirv: don't use bare types, remove assert in split vars for testing) Acked-by: Jason Ekstrand <jason@jlekstrand.net>	2019-03-25 13:57:16 +10:00
Caio Marcelo de Oliveira Filho	9d0ae777dd	spirv: Use interface type for block and buffer block Also handle GLSL_TYPE_INTERFACE the same way we do GLSL_TYPE_STRUCT in various places. Motivated by ARB_gl_spirv work, that will take advantage of the interface types when handling NIR coming from SPIR-V. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-03-23 10:22:39 -07:00
Caio Marcelo de Oliveira Filho	fb024f5e72	intel/compiler: handle GLSL_TYPE_INTERFACE as GLSL_TYPE_STRUCT Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-03-23 10:22:39 -07:00
Caio Marcelo de Oliveira Filho	15012077bc	spirv: Add an execution environment to the options Also updates gl_spirv to pick the right one. At the moment nothing uses it, but upcoming functionality part of ARB_gl_spirv will use it, and we also later can be more assertful when handling certain features for each of the execution environments. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Acked-by: Karol Herbst <kherbst@redhat.com>	2019-03-23 09:29:21 -07:00
Eric Anholt	dacb11a585	egl: Add a 565 pbuffer-only EGL config under X11. The CTS requires a 565-no-depth-no-stencil (meaning d/s not-required, not not-present) config for ES 3.0, but at depth 24 of X11 we wouldn't do so. We can satisfy that bad requirement using a pbuffer-only visual with whatever other buffers the driver happens to have given us. I've tried to raise this as an absurd requirement with Khronos and made no progress. v2: Make sure it's single sample, no depth, no stencil. Comment typo fix Reviewed-by: Adam Jackson <ajax@redhat.com>	2019-03-22 15:22:40 -07:00
Caio Marcelo de Oliveira Filho	e5830e1132	nir: Handle array-deref-of-vector case in loop analysis SPIR-V can produce those for SSBO and UBO access. Found when testing the ARB_gl_spirv series. Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-03-22 13:50:39 -07:00
Rob Clark	cdd90a7502	docs: update freedreno status Signed-off-by: Rob Clark <robdclark@gmail.com>	2019-03-22 16:39:14 -04:00
Rob Clark	6fd5a7ff8c	freedreno: add ESSL cap Report 320 for a6xx, which isn't quite true (no geom/tess, in particular), but other caps keep the reported GL and GLSL versions correct (3.1 / 3.10 es). But reporting 320 will switch on EXT_gpu_shader5, which is the goal. Signed-off-by: Rob Clark <robdclark@gmail.com>	2019-03-22 16:39:14 -04:00
Rob Clark	6cd9876047	mesa/st: use ESSL cap to enable gpu_shader5 For GLES2+ contexts, enable EXT_gpu_shader5 if the driver exposes a sufficiently high ESSL feature level, even if the GLSL feature level isn't high enough. This allows drivers to support EXT_gpu_shader5 in GLES contexts before they support all the additional features of ARB_gpu_shader5 in GL contexts. Signed-off-by: Rob Clark <robdclark@gmail.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>	2019-03-22 16:39:13 -04:00
Rob Clark	de481947d9	gallium: add PIPE_CAP_ESSL_FEATURE_LEVEL Adds a new cap to allow drivers to expose higher shading language versions in GLES contexts, to avoid having to report an artificially low version for the benefit of GL contexts. The motivation is to expose EXT_gpu_shader5 even though a driver may not support all the features needed for the corresponding GL extension (ARB_gpu_shader5). Signed-off-by: Rob Clark <robdclark@gmail.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>	2019-03-22 16:39:13 -04:00
Vinson Lee	93c81ca336	swr: Fix build with llvm-9.0. Fix build error after llvm-9.0svn r352827 ("[opaque pointer types] Add a FunctionCallee wrapper type, and use it."). In file included from ./rasterizer/jitter/builder.h:158:0, from swr_shader.cpp:35: ./rasterizer/jitter/gen_builder_meta.hpp: In member function ‘llvm::Value* SwrJit::Builder::VGATHERPD(llvm::Value, llvm::Value, llvm::Value, llvm::Value, llvm::Value, const llvm::Twine&)’: ./rasterizer/jitter/gen_builder_meta.hpp:51:117: error: no matching function for call to ‘cast(llvm::FunctionCallee)’ Function pFunc = cast<Function>(JM()->mpCurrentModule->getOrInsertFunction("meta.intrinsic.VGATHERPD", pFuncTy)); ^ Suggested-by: Philip Meulengracht <the_meulengracht@hotmail.com> Signed-off-by: Vinson Lee <vlee@freedesktop.org> Reviewed-by: Alok Hota <alok.hota@intel.com>	2019-03-22 13:13:51 -07:00
Dylan Baker	ed96038e55	bin/install_megadrivers.py: Fix regression for set DESTDIR The previous patch tried to address a bug when DESTDIR is '', however, it introduces a bug when DESTDIR is not '', and fakeroot is used. This patch does fix that, and has been tested with the arch pkg-build to ensure it isn't regressed. Fixes: 093a1ade4e24b7dd701a093d30a71efd669fe9c8 ("bin/install_megadrivers.py: Correctly handle DESTDIR=''") Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110221 Reviewed-by: Eric Engestrom <eric@engestrom.ch>	2019-03-22 19:09:00 +00:00
Samuel Pitoiset	23d30f4099	spirv,nir: lower frexp_exp/frexp_sig inside a new NIR pass This lowering isn't needed for RADV because AMDGCN has two instructions. It will be disabled for RADV in an upcoming series. While we are at it, factorize a little bit. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-03-22 19:41:46 +01:00
Samuel Pitoiset	6ae5797243	nir: use generic float types for frexp_exp and frexp_sig Only the exponent needs to be 32-bit signed integer. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-03-22 19:41:44 +01:00
Vinson Lee	77aa11ca32	nir: Fix anonymous union initialization with older GCC. Fix this build error with GCC 4.4.7. CC nir/nir_opt_copy_prop_vars.lo nir/nir_opt_copy_prop_vars.c: In function ‘load_element_from_ssa_entry_value’: nir/nir_opt_copy_prop_vars.c:454: error: unknown field ‘ssa’ specified in initializer nir/nir_opt_copy_prop_vars.c:455: error: unknown field ‘def’ specified in initializer nir/nir_opt_copy_prop_vars.c:456: error: unknown field ‘component’ specified in initializer nir/nir_opt_copy_prop_vars.c:456: error: extra brace group at end of initializer nir/nir_opt_copy_prop_vars.c:456: error: (near initialization for ‘(anonymous).<anonymous>’) nir/nir_opt_copy_prop_vars.c:456: warning: excess elements in union initializer nir/nir_opt_copy_prop_vars.c:456: warning: (near initialization for ‘(anonymous).<anonymous>’) Fixes: `96c32d7776` ("nir/copy_prop_vars: handle load/store of vector elements") Signed-off-by: Vinson Lee <vlee@freedesktop.org> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109810 Reviewed-by: Andres Gomez <agomez@igalia.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-03-22 10:43:41 -07:00
Chris Wilson	db99d02fce	iris: Push heavy memchecker code to DEBUG Invoking VALGRIND_CHECK_MEM_IS_DEFINED pulls in enough code to convince gcc to not inline __gen_uint and results in a lot of packing code ending up out-of-line with lots of stack copying. To ameliorate this, only insert the check inside the packer if DEBUG is defined and instead perform the validation checking before submitting the batch to the kernel. This should give accurate results if --trace-origins=yes is used, and failing that we can recompile in full debug mode to check on insertion. Improve drawoverhead baseline by 25% with a default build with valgrind-dev installed (with effectively no loss of vg coverage). Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-03-22 10:38:03 -07:00
Kenneth Graunke	87f865aab3	iris: Fix batch chaining map_next increment. Caught by Chris Wilson; split out from his valgrind patch.	2019-03-22 09:31:15 -07:00
Rob Clark	bf5a92811d	freedreno/ir3: disable early-z for SSBO/image writes Fixes: dEQP-GLES31.functional.image_load_store.early_fragment_tests.no_early_fragment_tests_depth dEQP-GLES31.functional.image_load_store.early_fragment_tests.no_early_fragment_tests_stencil dEQP-GLES31.functional.image_load_store.early_fragment_tests.no_early_fragment_tests_depth_fbo dEQP-GLES31.functional.image_load_store.early_fragment_tests.no_early_fragment_tests_stencil_fbo Signed-off-by: Rob Clark <robdclark@gmail.com>	2019-03-22 08:53:28 -04:00
Rob Clark	dbac1a80d1	freedreno/ir3: rename has_kill to no_earlyz There are other cases where we need to disable early-z, like image writes. So rename to something more generic. Signed-off-by: Rob Clark <robdclark@gmail.com>	2019-03-22 08:53:28 -04:00
Rhys Perry	f736250ab4	ac/nir: implement 16-bit pack/unpack opcodes Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-03-22 12:50:16 +01:00
Lionel Landwerlin	87dadbce5b	vulkan/overlay: improve error reporting We can show the actual command & line where the failure happened Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Suggested-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2019-03-22 11:26:04 +00:00
Lionel Landwerlin	9f3727351d	vulkan/overlay: check return value of swapchain get images Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2019-03-22 11:26:01 +00:00
Lionel Landwerlin	1fbf355597	vulkan/overlay: silence validation layer warnings v2: Drop call to FreeDescriptorSet Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2019-03-22 11:25:58 +00:00
Lionel Landwerlin	de14107741	vulkan/overlay: properly register layer object with loader This is required by the validation layers if we want to validate the commands inserted by the overlay layer. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2019-03-22 11:25:55 +00:00
Józef Kucia	c077d5d7de	radv: Fix driverUUID Fixes: `14cad8786a` ("radv: generate the same driver UUID as radeonsi") Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-03-22 08:57:16 +01:00
Danylo Piliaiev	ea9bde151f	glsl: Cross validate variable's invariance by explicit invariance only 'invariant' qualifier is propagated on variables which are used to calculate other invariant variables, however when we are matching variable's declarations we should take into account only explicitly declared invariance because invariance propagation is an implementation specific detail. Thus new flag is added to ir_variable_data which indicates 'invariant' qualifier being explicitly set in the shader. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100316 Fixes: `89b60492` ('glsl: Add a pass to propagate the "invariant" and "precise" qualifiers') Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>	2019-03-21 23:28:08 -07:00
Józef Kucia	1d996ef714	mesa: Fix GL_NUM_DEVICE_UUIDS_EXT Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2019-03-22 07:37:14 +02:00
Kenneth Graunke	66c100a8d6	iris: Skip resolves and flushes altogether if unnecessary Improves drawoverhead baseline scores by 1.17x.	2019-03-21 20:28:17 -07:00
Kenneth Graunke	365886ebe1	iris: Skip framebuffer resolve tracking if framebuffer isn't dirty Improves drawoverhead baseline score by 1.86x.	2019-03-21 20:28:17 -07:00
Kenneth Graunke	1d05d24b1d	iris: Skip input resolve handling if bindings haven't changed This brings the drawoverhead 16 Tex w/ no state change score from 22% of baseline to 97% of baseline.	2019-03-21 20:28:17 -07:00
Kenneth Graunke	a342f2deb1	iris: Fix util_vma_heap_init size for IRIS_MEMZONE_SHADER Fixes assertions when disabling bucket allocators.	2019-03-21 19:07:17 -07:00
Dave Airlie	9dd92d08a5	softpipe: fix integer texture swizzling for 1 vs 1.0f The swizzling was putting float one in not integer 1. This fixes a lot of arb_texture_view-rendering-formats cases. Reviewed-by: Brian Paul <brianp@vmware.com>	2019-03-22 09:30:35 +10:00
Dave Airlie	aae5ba72ab	softpipe: remove shadow_ref assert. I don't think this really buys us anything and TG4 with cubemap arrays falls over because sampler == 2, but otherwise works fine. Fixes: ./bin/textureGather fs shadow r CubeArray repeat on softpipe with ARB_gpu_shader5 enabled. Reviewed-by: Brian Paul <brianp@vmware.com>	2019-03-22 09:30:29 +10:00
Dave Airlie	8dc8b1361a	softpipe: handle 32-bit bitfield inserts Fixes piglits if ARB_gpu_shader5 is enabled Reviewed-by: Brian Paul <brianp@vmware.com>	2019-03-22 09:30:26 +10:00
Dave Airlie	7b7cb1bc35	softpipe: fix 32-bit bitfield extract These didn't deal with the width == 32 case that TGSI is defined with. Fixes piglit tests if ARB_gpu_shader5 is enabled. Reviewed-by: Brian Paul <brianp@vmware.com>	2019-03-22 09:30:21 +10:00
Timothy Arceri	a1bd9dd5bc	nir: fix opt_if_loop_last_continue() Rather than skipping code that looked like this: `loop { ... if (cond) { do_work_1(); continue; } else { break; } do_work_2(); }` Previously we would turn this into: `loop { ... if (cond) { do_work_1(); continue; } else { do_work_2(); break; } }` This was clearly wrong. This change checks for this case and makes sure we now leave it for nir_opt_dead_cf() to clean up. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2019-03-22 09:58:18 +11:00
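To make the problem described in a1bd9dd5bc concrete, the following Python sketch simulates both loop shapes and records which work functions run. It is only an illustration of the control flow quoted in the commit message, not NIR code; `run_original`, `run_bad_transform` and the `conds` list (one condition value per loop iteration) are hypothetical stand-ins.

```python
def run_original(conds):
    """Original shape: do_work_2 follows an if whose branches both jump away."""
    trace = []
    for cond in conds:  # stands in for `loop { ... }`
        if cond:
            trace.append("do_work_1")
            continue
        else:
            break
        trace.append("do_work_2")  # unreachable: both branches leave this point
    return trace


def run_bad_transform(conds):
    """Shape produced by the old pass: do_work_2 moved into the else branch."""
    trace = []
    for cond in conds:
        if cond:
            trace.append("do_work_1")
            continue
        else:
            trace.append("do_work_2")  # now executes once before the break
            break
    return trace
```

In the original shape `do_work_2` is dead code, so moving it into the else branch changes behaviour; leaving the block for a dead-control-flow cleanup pass, as the commit describes, preserves it.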
Gurchetan Singh	620df57dbb	anv: fix build on Nougat AHardwareBuffer is only available on O and above. Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2019-03-21 15:36:39 -07:00
Gurchetan Singh	139f908d8f	anv: move anv_GetMemoryAndroidHardwareBufferANDROID up a bit No functional change, just makes the next patch a little easier. Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2019-03-21 15:36:39 -07:00
Gurchetan Singh	b070861045	configure.ac / meson: depend on libnativewindow when appropriate libnativewindow is only available on O or greater, and it's required for some features. Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2019-03-21 15:36:39 -07:00
Eric Anholt	bfed0a7099	v3d: Remove some dead members of struct v3d_compile. These are more vc4 leftovers.	2019-03-21 14:20:50 -07:00
Eric Anholt	16f2770eb4	v3d: Upload all of UBO[0] if any indirect load occurs. The idea was that we could skip uploading the constant-indexed uniform data and just upload the uniforms that are variably-indexed. However, since the VS bin and render shaders may have a different set of uniforms used, this meant that we had to upload the UBO for each of them. The first case is generally a fairly small impact (usually the uniform array is the most space, other than a couple of FSes in shader-db), while the second is a larger impact: 3DMMES2 was uploading 38k/frame of uniforms instead of 18k. Given that the optimization is of dubious value, has a big downside, and is quite a bit of code, just drop it. No change in shader-db. No change on 3DMMES2 (n=15).	2019-03-21 14:20:50 -07:00
Eric Anholt	320e96bace	v3d: Move constant offsets to UBO addresses into the main uniform stream. We'd end up with the constant offset in the uniform stream anyway, since they're bigger than small immediates. Avoids the extra uniforms and adds in the shader in favor of just adding once on the CPU. shader-db: total instructions in shared programs: 6496865 -> 6494851 (-0.03%) total uniforms in shared programs: 2119511 -> 2117243 (-0.11%)	2019-03-21 14:20:50 -07:00
Eric Anholt	c36d2793ec	v3d: Rename v3d_tmu_config_data to v3d_unit_data. I want to reuse this for encoding small constant UBO/SSBO offsets into the uniform stream to reduce the extra uniform loads and adds for the small constant offsets.	2019-03-21 14:20:50 -07:00
Benjamin Gordon	b30aad552c	configure.ac/meson.build: Add options for library suffixes When building the Chrome OS Android container, we need to build copies of mesa that don't conflict with the Android system-supplied libraries. This adds options to create suffixed versions of EGL and GLES libraries: libEGL.so -> libEGL${egl-lib-suffix}.so; libGLESv1_CM.so -> libGLESv1_CM${gles-lib-suffix}.so; libGLESv2.so -> libGLES${gles-lib-suffix}.so. This is similar to what happens when --enable-libglvnd is specified, but without the side effects of linking against libglvnd. To avoid unexpected clashes with the suffixes appended by libglvnd, make it an error to specify both --enable-libglvnd and --with-egl-lib-suffix. Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-03-21 10:18:31 -07:00
Kenneth Graunke	e426c3a6cb	nir: Record non-vector/scalar varyings as unmovable when compacting In some cases, we can end up with varying structs that aren't split to their member variables. nir_compact_varyings attempted to record these as unmovable, so it would leave them be. Unfortunately, it didn't do it right for non-vector/scalar types. It set the mask to: ((1 << (elements * dmul)) - 1) << var->data.location_frac where elements is the number of vector elements. For structures and other non-vector/scalars, elements is 0...so the whole mask became 0. This caused nir_compact_varyings to assign other varyings on top of the structure varying's location (as it appeared to take up no space). To combat this, we just set elements to 4 for non-vector/scalar types, so that the entire slot gets marked as unmovable. Fixes KHR-GL45.tessellation_shader.tessellation_control_to_tessellation_evaluation.gl_in on iris. Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-03-21 16:03:58 +00:00
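The mask arithmetic described in e426c3a6cb can be checked with a small Python sketch. The function names and the `is_vector_or_scalar` flag are hypothetical, and the reading of `dmul` as a per-component slot multiplier is an assumption; only the mask expression and the "set elements to 4" fix come from the commit message.

```python
def old_unmovable_mask(elements: int, dmul: int, location_frac: int) -> int:
    """Mask expression quoted in the commit message, before the fix."""
    return ((1 << (elements * dmul)) - 1) << location_frac


def fixed_unmovable_mask(elements: int, dmul: int, location_frac: int,
                         is_vector_or_scalar: bool) -> int:
    """Same expression, but non-vector/scalar types claim the whole slot."""
    if not is_vector_or_scalar:
        # Structures and similar types report 0 vector elements, so treat
        # them as 4 elements to mark the entire slot as unmovable.
        elements = 4
    return ((1 << (elements * dmul)) - 1) << location_frac
```

With `elements == 0` the old expression yields `0`, which is why the structure appeared to occupy no space; with the fix and `dmul == 1` the mask becomes `0xF`.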
Rob Clark	6e781a01b9	freedreno/ir3: dynamic UBO indexing vs 64b pointers Fixes dEQP-GLES31.functional.shaders.opaque_type_indexing.ubo.uniform_fragment and similar things with multiple UBOs Signed-off-by: Rob Clark <robdclark@gmail.com>	2019-03-21 09:13:05 -04:00
Rob Clark	2e01c534f4	freedreno/ir3: fix bit_count Seems like it can only work 16b at a time. Fixes dEQP-GLES31.functional.shaders.builtin_functions.integer.bitcount.* TODO need to check if this limitation applies to a3xx as well. Signed-off-by: Rob Clark <robdclark@gmail.com>	2019-03-21 09:13:05 -04:00
Rob Clark	3d8349048b	freedreno/ir3: additional lowering For some things that show up when we expose higher glsl. TODO: check blob traces to see if we have instructions for some of this? I guess we don't, but worth a check. Signed-off-by: Rob Clark <robdclark@gmail.com>	2019-03-21 09:13:05 -04:00
Rob Clark	bcd81d2387	freedreno/ir3: optimize sam.s2en to sam Detect when sampler/texture idx are immediate and switch to non s2en encoding. Signed-off-by: Rob Clark <robdclark@gmail.com>	2019-03-21 09:13:05 -04:00
Rob Clark	1443694ee5	freedreno/ir3: enable indirect tex/samp (sam.s2en) For now it uses indirect for everything. The next step is for the ir3_cp pass to detect the case that tex and samp idx are immediate and convert the sam instruction back to the non .s2en variant. But doing that in a following patch so we can shake out the bugs with .s2en more easily. Signed-off-by: Rob Clark <robdclark@gmail.com>	2019-03-21 09:13:05 -04:00
Rob Clark	1088b788d8	freedreno/ir3: find # of samplers from uniform vars When we have indirect samplers, we cannot tell the max sampler referenced. Instead just refer to the number of sampler uniforms. Signed-off-by: Rob Clark <robdclark@gmail.com>	2019-03-21 09:13:05 -04:00
Rob Clark	d4cbc94685	nir: move glsl_type_get_{sampler,image}_count() I need at least the sampler variant in ir3. Signed-off-by: Rob Clark <robdclark@gmail.com> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-03-21 09:13:05 -04:00
Rob Clark	8eb16ae8bf	freedreno/ir3: fix regmask for merged regs On a6xx+ with half-regs conflicting with full-regs, the legalize pass needs to set appropriate sync bits, such as (sy), on writes to full regs that conflict with half regs, and vice versa. Signed-off-by: Rob Clark <robdclark@gmail.com>	2019-03-21 09:13:05 -04:00
Rob Clark	1dffb089f9	freedreno/ir3: fix sam.s2en encoding Signed-off-by: Rob Clark <robdclark@gmail.com>	2019-03-21 09:13:05 -04:00
Rob Clark	45b7a581b4	freedreno/ir3: fix sam.s2en decoding Signed-off-by: Rob Clark <robdclark@gmail.com>	2019-03-21 09:13:05 -04:00
Rob Clark	2d31cf9d3b	freedreno/ir3/ra: fix half-class conflicts On a6xx, half-regs conflict with full-regs. But we were only setting up conflicts for the first class (ie. scalar, but not hvec2/hvec3/hvec4), resulting in higher half-reg classes getting assigned to regs that overwrite full-regs. Noticed while trying to enable indirect-sampler (sam.s2en) which uses an hvec2 argument to pass the sampler/tex index. Signed-off-by: Rob Clark <robdclark@gmail.com>	2019-03-21 09:13:05 -04:00
Rob Clark	cc5ca9391c	freedreno/ir3 better cat6 encoding detection These two bits seem to be a better way to detect which encoding we are looking at. Signed-off-by: Rob Clark <robdclark@gmail.com>	2019-03-21 09:13:05 -04:00
Samuel Pitoiset	00327f827f	ac: fix incorrect argument type for tbuffer.{load,store} with LLVM 7 GLC/SLC are boolean. This fixes the following LLVM error when checkir is set: Intrinsic has incorrect argument type! void (i32, <4 x i32>, i32, i32, i32, i32, i32, i32, i32, i32)* @llvm.amdgcn.tbuffer.store.i32 Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-03-21 14:02:00 +01:00
Samuel Pitoiset	20cac1f498	ac: fix 16-bit shifts This fixes the following LLVM error when checkir is set: Type too small for ZExt Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-03-21 14:01:58 +01:00
Samuel Pitoiset	2ac5c5c1b5	ac: add 16-bit support to fract Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-03-21 12:13:09 +01:00
Samuel Pitoiset	0eb1478ac2	ac: add 16-bit support to fsign Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-03-21 12:13:07 +01:00
Samuel Pitoiset	ff11c9dcc7	ac: add f16_0 and f16_1 constants Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-03-21 12:13:05 +01:00
Timothy Arceri	427a6fee43	nir: only override previous alu during loop analysis if supported Users of this function expect alu to be a supported comparison if the induction variable is not NULL. Since we attempt to override the return values if the first limit is not a const, we must make sure we are dealing with a valid comparison before overriding the alu instruction. Fixes an unreachable in inverse_comparison() with the game Assassin's Creed Odyssey. Fixes: `3235a942c1` ("nir: find induction/limit vars in iand instructions") Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110216	2019-03-21 21:51:21 +11:00
Michel Dänzer	6d0a7f798c	gitlab-ci: Use 8 CPU cores in autotools job This cuts down the job runtime from ~9.5 to ~7 minutes with my personal runner on an 8-core Ryzen 7 1700. While this might result in slightly higher load on shared runners, it should be OK, since libtool doesn't use the CPU cores as effectively as e.g. ninja does; a significant part of the CPU load tends to be in bash processes at any time, which should be relatively light on memory. Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-03-21 09:58:31 +01:00
Michel Dänzer	a2cce701e6	gitlab-ci: List some longer-running jobs before others of the same stage This increases the chance of them running earlier, which can have an impact on the total duration of the pipeline. v2: * Minor style fix-up to moved comment (Eric Anholt) Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> Acked-by: Eric Anholt <eric@anholt.net>	2019-03-21 09:55:08 +01:00
Samuel Pitoiset	db07f0554a	radv: add missing initializations since VK_EXT_pipeline_creation_feedback This fixes the world. Fixes: `5f5ac19f13` ("radv: Implement VK_EXT_pipeline_creation_feedback.") Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-03-21 09:42:31 +01:00
Rhys Perry	037f11d42e	radv: enable VK_KHR_8bit_storage Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-03-21 09:02:27 +01:00
Rhys Perry	3cc72a88d8	ac/nir: implement 8-bit conversions Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-03-21 09:02:25 +01:00
Rhys Perry	c73f8b6576	ac/nir: add 8-bit types to glsl_base_to_llvm_type Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-03-21 09:02:22 +01:00
Rhys Perry	9c5067acf1	ac/nir: implement 8-bit ssbo stores Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-03-21 09:02:20 +01:00
Samuel Pitoiset	b235d77e18	ac: add ac_build_tbuffer_store_byte() helper Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-03-21 09:02:18 +01:00
Rhys Perry	b12e074b89	ac/nir: implement 8-bit push constant, ssbo and ubo loads Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-03-21 09:02:16 +01:00
Samuel Pitoiset	104dbc64a5	ac: add ac_build_tbuffer_load_byte() helper Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-03-21 09:02:14 +01:00
Samuel Pitoiset	6e632eb24b	ac: add various int8 definitions Original patch by Rhys Perry. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-03-21 09:02:10 +01:00
Tapani Pälli	4e1bbb000c	anv/radv: release memory allocated by glsl types during spirv_to_nir Fixes leaks for each glsl_type generated: ==32470== 384 bytes in 3 blocks are possibly lost in loss record 18 of 18 ==32470== at 0x483880B: malloc (vg_replace_malloc.c:309) ==32470== by 0x4C43F4A: ralloc_size (ralloc.c:119) ==32470== by 0x4C44014: rzalloc_size (ralloc.c:151) ==32470== by 0x4C44258: rzalloc_array_size (ralloc.c:215) ==32470== by 0x4D38957: glsl_type::glsl_type(glsl_struct_field const, unsigned int, char const) (glsl_types.cpp:114) ==32470== by 0x4D3BEED: glsl_type::get_struct_instance(glsl_struct_field const, unsigned int, char const) (glsl_types.cpp:1146) ==32470== by 0x4D42ECC: glsl_struct_type (nir_types.cpp:501) ==32470== by 0x4CDB5A1: vtn_handle_type (spirv_to_nir.c:1269) ==32470== by 0x4CE53DD: vtn_handle_variable_or_type_instruction (spirv_to_nir.c:4018) ==32470== by 0x4CD8CFF: vtn_foreach_instruction (spirv_to_nir.c:365) ==32470== by 0x4CE5E6B: spirv_to_nir (spirv_to_nir.c:4490) ==32470== by 0x497AF10: anv_shader_compile_to_nir (anv_pipeline.c:173) v2: move release call to vkDestroyInstance v3: apply fix also to radv driver Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-03-21 08:30:22 +02:00
Jason Ekstrand	6e19348ad1	spirv: Drop inline tg4 lowering Reviewed-by: Karol Herbst <kherbst@redhat.com>	2019-03-21 02:58:41 +00:00
Jason Ekstrand	08f804ec0c	anv,radv,turnip: Lower TG4 offsets with nir_lower_tex v2: turn on for turnip as well (Karol Herbst) Reviewed-by: Karol Herbst <kherbst@redhat.com>	2019-03-21 02:58:41 +00:00
Karol Herbst	d8a0658d8b	nir/lower_tex: Add support for tg4 offsets lowering Signed-off-by: Karol Herbst <kherbst@redhat.com>	2019-03-21 02:58:41 +00:00
Karol Herbst	99f202432b	nv50/ir/nir: support gather offsets v2: only emit offsets if those are !0 Signed-off-by: Karol Herbst <kherbst@redhat.com>	2019-03-21 02:58:41 +00:00
Karol Herbst	71c66c254b	nir: add support for gather offsets Values inside the offsets parameter of textureGatherOffsets are required to be constants in the range of [GL_MIN_PROGRAM_TEXTURE_GATHER_OFFSET, GL_MAX_PROGRAM_TEXTURE_GATHER_OFFSET]. As this range is never outside [-32, 31] for all existing drivers inside mesa, we can simply store the offsets as a int8_t[4][2] array inside nir_tex_instr. Right now only Nvidia hardware supports this in hardware, so we can turn this on inside Nouveau for the NIR path as it is already enabled with the TGSI one. v2: use memcpy instead of for loops add missing bits to nir_instr_set don't show offsets if they are all 0 v3: default offsets aren't all 0 v4: rename offsets -> tg4_offsets rename nir_tex_instr_has_explicit_offsets -> nir_tex_instr_has_explicit_tg4_offsets Signed-off-by: Karol Herbst <kherbst@redhat.com>	2019-03-21 02:58:41 +00:00
Dave Airlie	b95b33a5c7	nir/deref: remove casts of casts which are likely redundant (v3) Not sure how ptr_stride should be taken into account if at all here v2: reorder check to avoid src walking (Jason) v3: remove is_cast_cast checks, keep going afterwards (Jason) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-03-21 10:58:06 +10:00
Dave Airlie	3b3653c4cf	nir/spirv: don't use bare types, remove assert in split vars for testing For OpenCL we never want to strip the info from the types, and it makes type comparisons easier in later stages. We might later need a nir pass to strip this for GLSL, but so far the only regression is the assert and Jason said removing that is fine. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Signed-off-by: Dave Airlie <airlied@redhat.com>	2019-03-21 10:25:40 +10:00
Rafael Antognolli	e7c8402163	iris: Let blorp update the clear color for us. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-03-20 16:46:26 -07:00
Rafael Antognolli	93123417dd	iris: Track fast clear color. v2: Update tracked clear color when we update the surface state. v3: Update all aux surface states when updating the clear color. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-03-20 16:46:26 -07:00
Rafael Antognolli	5658c661de	iris: Stall on the CPU and resolve predication during fast clears. Only if the clear color/depth is changing. In those cases, it's hard to keep track of the current clear color, and aux state of some layers, when predication is enabled. So simplify everything by stalling on the few cases where we would have a fast clear color change with predication. v2: - fix comment (Ken) - explicitly check for predicate state after resolving it (Ken) Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-03-20 16:46:26 -07:00
Rafael Antognolli	ce830a364e	iris: Add iris_resolve_conditional_render(). This function can be used to stall on the CPU and resolve the predicate for the conditional render. It will convert ice->state.predicate from IRIS_PREDICATE_STATE_USE_BIT to either IRIS_PREDICATE_STATE_RENDER or IRIS_PREDICATE_STATE_DONT_RENDER, depending on the result of the query. v2: - return void (Ken) - update the stored condition (Ken) - simplify the code leading to resolve the predicate (Ken) Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-03-20 16:46:25 -07:00
Rafael Antognolli	131b42f0aa	iris: Implement fast clear color. If all the restrictions are satisfied, do a fast clear instead of regular clear. v2: - add perf_debug() when we can't fast clear (Ken) - improve comment: s/miptree/resource/ (Ken) - use swizzle_color_value from blorp (Ken) Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-03-20 16:46:25 -07:00
Rafael Antognolli	bd6f51ec21	intel/blorp: Make swizzle_color_value public. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-03-20 16:46:25 -07:00
Rafael Antognolli	d97eddff25	intel/isl: Add isl_format_has_color_component() function. v2: Get luminance bits from luminance component (Ken). Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-03-20 16:46:25 -07:00
Rafael Antognolli	7f6344a726	iris: Bring back check for srgb and fast clear color. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-03-20 16:46:25 -07:00
Rafael Antognolli	a8b5ea8ef0	iris: Add function to update clear color in surface state. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-03-20 16:46:25 -07:00
Rafael Antognolli	32c8fa6411	iris: Add helper to convert fast clear color. It needs to be converted to a value that can be used by ISL (and our hardware SURFACE_STATE structure). Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-03-20 16:46:25 -07:00
Rafael Antognolli	51638cf18a	iris: Fast clear depth buffers. Check and do a fast clear instead of a regular clear on depth buffers. v3: - remove switch with some cases that we shouldn't worry about (Ken) - more parens into the has_hiz check (Ken) Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-03-20 16:46:25 -07:00
Rafael Antognolli	34d00b4410	iris: Use the clear depth when emitting 3DSTATE_CLEAR_PARAMS. Take the clear depth into account when IRIS_DIRTY_DEPTH_BUFFER is marked as dirty. Also update the blorp surface clear color. v2: Use a single if (zres && zres->aux.bo) (Ken). Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-03-20 16:46:25 -07:00
Rafael Antognolli	37f2692591	iris: Allocate buffer space for the fast clear color. Also store clear color in the iris_resource. Always allocate clear color state buffer. v2: - Make clear_color_offset be 64 bits (Ken). - Simplify the logic to decide when to memset the aux buffer (Ken). Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-03-20 16:46:25 -07:00
Bas Nieuwenhuizen	5f5ac19f13	radv: Implement VK_EXT_pipeline_creation_feedback. Does what it says on the tin. The per stage time is only an approximation due to linking and the Vega merged stages. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-03-20 21:19:46 +00:00
Samuel Pitoiset	72e366b4c2	ac: use new LLVM 8 intrinsics in ac_build_buffer_store_dword() New buffer intrinsics have a separate soffset parameter. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-03-20 22:19:19 +01:00
Samuel Pitoiset	9d960c17a8	ac: use new LLVM 8 intrinsic when storing 16-bit values vindex is always 0. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-03-20 22:19:14 +01:00
Samuel Pitoiset	2a9d331898	ac: add ac_build_{struct,raw}_tbuffer_store() helpers Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-03-20 22:19:12 +01:00
Samuel Pitoiset	30c2aca67f	ac: use new LLVM 8 intrinsics in ac_build_buffer_load() Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-03-20 22:19:08 +01:00
Samuel Pitoiset	da46dbb1be	ac/nir: use ac_build_buffer_store_dword() for SSBO store operations Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-03-20 22:19:06 +01:00
Samuel Pitoiset	6b573c00c9	ac/nir: use ac_build_buffer_load() for SSBO load operations Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-03-20 22:19:02 +01:00
Samuel Pitoiset	29132af234	ac/nir: use new LLVM 8 intrinsics for SSBO atomic operations Use the raw version (ie. IDXEN=0) because vindex is unused. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-03-20 22:18:56 +01:00
Samuel Pitoiset	b39844457f	ac/nir: remove one useless check in visit_store_ssbo() Trivial. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-03-20 22:18:54 +01:00
Samuel Pitoiset	a2073f49f1	ac: add ac_build_buffer_store_format() helper Similar to ac_build_buffer_load_format(). Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-03-20 22:18:50 +01:00
Samuel Pitoiset	4debe49d44	ac/nir: set attrib flags for SSBO and image store operations For consistency regarding other store operations. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-03-20 22:18:37 +01:00
Samuel Pitoiset	1b553dd47f	ac: make use of ac_get_store_intr_attribs() where possible Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-03-20 22:18:35 +01:00
Dylan Baker	4188dd7879	bin/install_megadrivers.py: Correctly handle DESTDIR='' Currently if destdir is set to '' then the resulting libdir will have its first character replaced by / instead of / being prepended to the string. This was the result of ensuring that DESTDIR wouldn't be ignored if libdir was absolute, since the only cases that meson allows the libdir to be absolute is if the prefix is /, this won't be a problem. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110211 Fixes: `ae3f45c11e` ("bin/install_megadrivers: fix DESTDIR and -D*-path") Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-03-20 20:26:44 +00:00
Juan A. Suarez Romero	efcf9c9f9f	nir: deref only for OpTypePointer Fixes dEQP-VK.binding_model.buffer_device_address.* and dEQP-VK.ssbo.phys.layout* Vulkan CTS tests. v2: set val->type->stride in the section below (Jason) v3: restore val->type->type to original place (Jason) Fixes: `d0ba326f23` ("nir/spirv: support physical pointers") CC: Karol Herbst <kherbst@redhat.com> CC: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-03-20 19:26:32 +00:00
Dave Airlie	04189565a0	softpipe: fix texture view crashes I noticed we crashed piglit arb_texture_view-rendering-formats when run on softpipe. This fixes the clear tiles to use the surface format not the underlying storage format. This fixes a bunch of srgb piglits as well. Fixes: `396ac41fc2` (softpipe: add integer support) Reviewed-by: Brian Paul <brianp@vmware.com> Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2019-03-21 05:06:07 +10:00
Kenneth Graunke	3c3f250456	nvc0: Skip new update barrier bits I added new barrier bits in `220c1dce1e` and made most drivers skip them. I thought nvc0 was already skipping those but missed the else case here, which does something. So make it explicitly skip like I did everywhere else. Thanks to Ilia for catching this. Fixes: `220c1dce1e` gallium: Add PIPE_BARRIER_UPDATE_BUFFER and UPDATE_TEXTURE bits.	2019-03-20 10:30:32 -07:00
Lionel Landwerlin	6601e5d6fc	anv: implement VK_EXT_pipeline_creation_feedback An extension reporting cache hit in the user supplied pipeline cache as well as timing information for creating the pipelines & stages. v2: Don't consider no cache for cache hits (Jason) Rework duration accumulation (Jason) v3: Fold feedback creation writing into pipeline compile functions (Jason/Lionel) v4: Get cache hit information from anv_device_search_for_kernel() (Jason) Only set cache hit from the whole pipeline if all stages also have that bit (Lionel) v5: Always user_cache_hit in anv_device_search_for_kernel() (Jason) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-03-20 16:18:35 +00:00
Rob Clark	70904eb99a	freedreno/ir3/a6xx: fix ssbo comp_swap One line left out of the conversion to ir3 ssbo intrinsics on a6xx. Fixes: `2e4525883f` ir3/compiler: Enable lower_io_offsets pass and handle new SSBO intrinsics Signed-off-by: Rob Clark <robdclark@gmail.com>	2019-03-20 11:48:13 -04:00
Jason Ekstrand	0b7e5bdbd4	nir: Constant values are per-column not per-component Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Karol Herbst <kherbst@redhat.com>	2019-03-20 09:26:56 -05:00
Jason Ekstrand	9a129510f5	anv: Bump maxComputeWorkgroupInvocations We initially set this lower because we didn't have SIMD32 support yet but we've supported SIMD32 for quite some time now. We should bump it up to the real limit. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-03-20 09:26:56 -05:00
Samuel Pitoiset	4fa61273a8	radv: fix binding transform feedback buffers The mask should be accumulated if two calls are used for binding two buffers at different indexes. Otherwise, the driver only accounts for the last one. Noticed while glancing at this code. Cc: 18.3 19.0 <mesa-stable@lists.freedesktop.org> Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-03-20 09:06:40 +01:00
Samuel Pitoiset	f4f0e3a395	ac: use llvm.amdgcn.fract intrinsic for nir_op_ffract Noticed with a Doom shader. 29077 shaders in 15096 tests Totals: SGPRS: 1282125 -> 1282133 (0.00 %) VGPRS: 908716 -> 908616 (-0.01 %) Spilled SGPRs: 24811 -> 24779 (-0.13 %) Code Size: 49048176 -> 48936488 (-0.23 %) bytes Max Waves: 244232 -> 244226 (-0.00 %) Totals from affected shaders: SGPRS: 229584 -> 229592 (0.00 %) VGPRS: 163268 -> 163168 (-0.06 %) Spilled SGPRs: 8682 -> 8650 (-0.37 %) Code Size: 12819572 -> 12707884 (-0.87 %) bytes Max Waves: 24398 -> 24392 (-0.02 %) Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-03-20 09:06:35 +01:00
Kenneth Graunke	220c1dce1e	gallium: Add PIPE_BARRIER_UPDATE_BUFFER and UPDATE_TEXTURE bits. The glMemoryBarrier() function makes shader memory stores ordered with respect to things specified by the given bits. Until now, st/mesa has ignored GL_TEXTURE_UPDATE_BARRIER_BIT and GL_BUFFER_UPDATE_BARRIER_BIT, saying that drivers should implicitly perform the needed flushing. This seems like a pretty big assumption to make. Instead, this commit opts to translate them to new PIPE_BARRIER bits, and adjusts existing drivers to continue ignoring them (preserving the current behavior). The i965 driver performs actions on these memory barriers. Shader memory stores go through a "data cache" which is separate from the render cache and other read caches (like the texture cache). All memory barriers need to flush the data cache (to ensure shader memory stores are visible), and possibly invalidate read caches (to ensure stale data is no longer visible). The driver implicitly flushes for most caches, but not for data cache, since ARB_shader_image_load_store introduced MemoryBarrier() precisely to order these explicitly. I would like to follow i965's approach in iris, flushing the data cache on any MemoryBarrier() call, so I need st/mesa to actually call the pipe->memory_barrier() callback. Fixes KHR-GL45.shader_image_load_store.advanced-sync-textureUpdate and Piglit's spec/arb_shader_image_load_store/host-mem-barrier on the iris driver. Roland said this looks reasonable to him. Reviewed-by: Eric Anholt <eric@anholt.net>	2019-03-19 23:43:33 -07:00
Tapani Pälli	3e534489ec	iris: mark switch case fallthrough CID: 1444103 Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-03-20 08:21:50 +02:00
Tapani Pälli	03cbfbd913	iris: initialize num_cbufs Currently initialized only if 'ish' is non-NULL. CID: 1444106 Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-03-20 08:20:09 +02:00
Daniel Stone	d258b787fa	panfrost: Properly align stride Handle buffers whose width is not aligned to 16px by padding the stride and storing it accordingly. This does not reject imports for images whose stride is not sufficiently aligned. v2: make sure bo->stride is set on imported buffers, and add missing variable definition. (Tomeu) Tested-by: Tomeu Vizoso <tomeu.vizoso@collabora.com> Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-03-20 04:20:42 +00:00
Anuj Phogat	2be60e0c73	anv/icl: Add WA_2204188704 to disable pixel shader panic dispatch Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com> Acked-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-03-19 14:42:19 -07:00
Anuj Phogat	85ecd14ef6	i965/icl: Add WA_2204188704 to disable pixel shader panic dispatch Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com> Acked-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-03-19 14:42:02 -07:00
Eric Engestrom	b3aa37046b	gitlab-ci: drop most autotools builds With autotools this close to being not supported anymore, let's not waste half of the CI cycles on it. The default build will catch most issues, and the rest can be tested by the old Travis. Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-03-19 17:40:05 +00:00
Eric Anholt	17115da6ad	v3d: Expose the dma-buf modifiers query. This allows DRI3 to pick between UIF and raster according to whether we're pageflipping or not and whether the pageflipping display can do UIF, avoiding copies for the windowed/composited case that previously was forced to linear. Improves windowed glmark2 -b build:use-vbo=false performance by 30.7783% +/- 13.1719% (n=3)	2019-03-19 08:59:01 -07:00
Eric Anholt	bf6973199d	v3d: Allow the UIF modifier with renderonly. We ask the other side to make a buffer with the right number of pages, and then just store the UIF in it. This avoids an extra silent copy of the buffer from linear to UIF if it gets used for texturing (X11 copy-based swapbuffers, GL compositors).	2019-03-19 08:54:46 -07:00
Eric Anholt	eb5903a908	v3d: Always lay out shared tiled buffers with UIF_TOP set. The samplers are already ready for this, we just needed to make sure that layout chose UIF for level 0.	2019-03-19 08:54:46 -07:00
Andres Gomez	ab28dca033	Revert "glsl: relax input->output validation for SSO programs" This reverts commit `1aa5738e66`. This patch incorrectly assumed that for SSOs no inner interface matching check was needed. From the ARB_separate_shader_objects spec v.25: " With separable program objects, interfaces between shader stages may involve the outputs from one program object and the inputs from a second program object. For such interfaces, it is not possible to detect mismatches at link time, because the programs are linked separately. When each such program is linked, all inputs or outputs interfacing with another program stage are treated as active. The linker will generate an executable that assumes the presence of a compatible program on the other side of the interface. If a mismatch between programs occurs, no GL error will be generated, but some or all of the inputs on the interface will be undefined." This completes the fix from commit: `3be05dd267` ("glsl/linker: don't fail non static used inputs without matching outputs") Fixes: `1aa5738e66` ("glsl: relax input->output validation for SSO programs") Cc: Tapani Pälli <tapani.palli@intel.com> Cc: Timothy Arceri <tarceri@itsqueeze.com> Cc: Ilia Mirkin <imirkin@alum.mit.edu> Cc: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Cc: Ian Romanick <ian.d.romanick@intel.com> Signed-off-by: Andres Gomez <agomez@igalia.com> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-03-19 17:36:20 +02:00
Andres Gomez	422882e78f	glsl/linker: simplify xfb_offset vs xfb_stride overflow check Current implementation uses a complicated calculation which relies on an implicit conversion to check the integral part of 2 division results. However, the calculation actually checks that the xfb_offset is smaller than or a multiple of the xfb_stride. For example, while this is expected to fail, it actually succeeds: " ... layout(xfb_buffer = 2, xfb_stride = 12) out block3 { layout(xfb_offset = 0) vec3 c; layout(xfb_offset = 12) vec3 d; // ERROR, requires stride of 24 }; ... " Fixes: `2fab85aaea` ("glsl: add xfb_stride link time validation") Cc: Timothy Arceri <tarceri@itsqueeze.com> Signed-off-by: Andres Gomez <agomez@igalia.com> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-03-19 17:23:27 +02:00
Andres Gomez	3be05dd267	glsl/linker: don't fail non static used inputs without matching outputs If there is no Static Use of an input variable, the linker shouldn't fail whenever there is no defined matching output variable in the previous stage. From page 47 (page 51 of the PDF) of the GLSL 4.60 v.5 spec: " Only the input variables that are statically read need to be written by the previous stage; it is allowed to have superfluous declarations of input variables." Now, we complete this exception whenever the input variable has an explicit location. Previously, `18004c338f` ("glsl: fail when a shader's input var has not an equivalent out var in previous") took care of the cases in which the input variable didn't have an explicit location. v2: do the location based interface matching check regardless of whether it is a separable program or not (Ilia). Fixes: `1aa5738e66` ("glsl: relax input->output validation for SSO programs") Cc: Timothy Arceri <tarceri@itsqueeze.com> Cc: Iago Toral Quiroga <itoral@igalia.com> Cc: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Cc: Tapani Pälli <tapani.palli@intel.com> Cc: Ian Romanick <ian.d.romanick@intel.com> Cc: Ilia Mirkin <imirkin@alum.mit.edu> Signed-off-by: Andres Gomez <agomez@igalia.com> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-03-19 17:23:27 +02:00
Andres Gomez	de1bc2d19a	glsl/linker: always validate explicit location among inputs Outputs are always validated when having explicit locations and we were trusting its outcome to catch similar problems with the inputs since, in case of having undefined outputs for existing inputs, we would be already reporting a linker error. However, consider this case: " Shader stage n: --------------- ... layout(location = 0) out float a; ... Shader stage n+1: ----------------- ... layout(location = 0) in float b; layout(location = 0) in float c; ... " Currently, this won't report a linker error even though location aliasing is happening for the inputs. Therefore, we also need to validate the inputs independently from the outcome of the outputs validation. Cc: Timothy Arceri <tarceri@itsqueeze.com> Cc: Iago Toral Quiroga <itoral@igalia.com> Cc: Ilia Mirkin <imirkin@alum.mit.edu> Signed-off-by: Andres Gomez <agomez@igalia.com> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-03-19 17:23:27 +02:00
Andres Gomez	a96093136b	glsl: correctly validate component layout qualifier for dvec{3,4} From page 62 (page 68 of the PDF) of the GLSL 4.50 v.7 spec: " A dvec3 or dvec4 can only be declared without specifying a component." Therefore, using the "component" qualifier with a dvec3 or dvec4 should result in a compiling error. v2: enhance the error message (Timothy). Fixes: `94438578d2` ("glsl: validate and store component layout qualifier in GLSL IR") Cc: Timothy Arceri <tarceri@itsqueeze.com> Cc: Kenneth Graunke <kenneth@whitecape.org> Signed-off-by: Andres Gomez <agomez@igalia.com> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-03-19 17:23:27 +02:00
Jason Ekstrand	cbfe31ccbe	Revert "nir: const `nir_call_instr::callee`" This reverts commit `db57db5317`. When building IR, nothing is really immutable and, since C has no concept of constness propagating beyond the first pointer, we have to be very careful with how we use it. To just throw const into a function like this is a lie. Instead, we should just drop the unneeded const in spirv_to_nir which this commit does along with the revert.	2019-03-19 10:19:42 -05:00
Eric Engestrom	43b6dd05f7	gitlab-ci: add clang build `clang` has a different set of warnings and errors than `gcc`, so it's useful to do at least a generic pass over Mesa with it. Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>	2019-03-19 12:59:38 +00:00
Eric Engestrom	db57db5317	nir: const `nir_call_instr::callee` Fixes: `c95afe56a8` "nir/spirv: handle kernel function parameters" Cc: Jason Ekstrand <jason@jlekstrand.net> Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Acked-by: Tapani Pälli <tapani.palli@intel.com> Acked-by: Karol Herbst <kherbst@redhat.com>	2019-03-19 12:51:53 +00:00
Rafael Antognolli	76f9ca6cf9	iris: Make intel_hiz_exec public. Need to use it for fast clearing depth buffers. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-03-18 22:27:02 -07:00
Rafael Antognolli	9c63ec26ea	iris: Enable HiZ for multisampled depth surfaces. Fix this check so that we can get a HiZ aux buffer for multisampled surfaces as well. Also make sure we don't try to emit a sampler view surface state for multisampled depth surfaces with HiZ enabled, as the sampler can't HiZ for multisampled buffers and isl would assert. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-03-18 22:21:30 -07:00
Karol Herbst	d0ba326f23	nir/spirv: support physical pointers v2: add load_kernel_input Signed-off-by: Karol Herbst <kherbst@redhat.com> squash! nir/spirv: support physical pointers	2019-03-19 04:08:07 +00:00
Karol Herbst	c95afe56a8	nir/spirv: handle kernel function parameters the idea here is to generate an entry point stub function wrapping around the actual kernel function and turn all parameters into shader inputs with byte addressing instead of vec4. This gives us several advantages: 1. calling kernel functions doesn't differ from calling any other function 2. CL inputs match uniforms in most ways and we can just take advantage of most of nir_lower_io v2: move code into a separate function v3: verify the entry point got a name fix minor typo v4: make vtn_emit_kernel_entry_point_wrapper take the old entry point as an arg Signed-off-by: Karol Herbst <kherbst@redhat.com>	2019-03-19 04:08:07 +00:00
Karol Herbst	0ccdf23a57	nir/lower_locals_to_regs: cast array index to 32 bit local memory is too small to require 64 bit pointers, so cast the array index to a 32 bit value to save up on 64 bit operations. Signed-off-by: Karol Herbst <kherbst@redhat.com>	2019-03-19 04:08:07 +00:00
Karol Herbst	44d32e62fb	glsl: add cl_size and cl_alignment Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>	2019-03-19 04:08:07 +00:00
Karol Herbst	659f333b3a	glsl: add packed for struct types We need this for OpenCL kernels because we have to apply C rules for alignment and padding inside structs and for this we also have to know if a struct is packed or not. v2: fix for kernel params Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>	2019-03-19 04:08:07 +00:00
Alyssa Rosenzweig	b98955e128	panfrost: Rewrite varying assembly There are two stages to varying assembly in the command stream: creating the varying buffers in the command stream, and creating the varying meta descriptors (also in the command stream) linked to the aforementioned buffers. The previous code for this was ad hoc and brittle, making some invalid assumptions causing unmaintainable workarounds to pile up across the driver (both compiler and command stream side). This patch completely rewrites the varying assembly code. There's a trivial performance penalty (we now memcpy the varying meta to the command stream on draw, rather than on compile). That said, the improvement in flexibility and clarity is well-worth it. The motivator for these changes was support for gl_PointCoord (and eventually point sprites for legacy GL), which was impossible to implement with the old varying assembly code. With the new refactor, it's super easy; support for gl_PointCoord is included with this patch. All in all, I'm quite happy with how this turned out. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-03-19 03:55:10 +00:00
Alyssa Rosenzweig	5e6d33a7b6	panfrost: Replay more varying buffers This is required for gl_PointCoord to show up on decodes. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-03-19 03:53:56 +00:00
Alyssa Rosenzweig	b517e36842	panfrost/decode: Respect primitive size pointers Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-03-19 03:53:48 +00:00
Alyssa Rosenzweig	4f89e4437c	panfrost: Disable PIPE_CAP_TGSI_TEXCOORD I don't know why this was on to begin with...? Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-03-19 03:52:43 +00:00
Alyssa Rosenzweig	7c02c4f114	panfrost: Fix primconvert check In addition to fixing actual primconvert bugs, this prevents an infinite loop when trying to draw POINTS. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-03-19 03:52:20 +00:00
Alyssa Rosenzweig	60d5b85261	panfrost: Workaround buffer overrun with mip level Mipmaps are still broken, but at least this way we don't crash on some apps using mipmaps. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-03-19 03:50:59 +00:00
Bas Nieuwenhuizen	a777c3d7cb	radv: Use correct image view comparison for fast clears. The if is actually returning true on success, enabling fast clears, so we need to have the test succeed when the iview dimensions are right. Fixes: `d5400a5ec2` "radv: provide a helper for comparing an image extents." Reviewed-by: Dave Airlie <airlied@redhat.com>	2019-03-19 00:39:47 +01:00
Jason Ekstrand	493b3ada9b	anv,radv: Implement VK_KHR_surface_capability_protected Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Eric Engestrom <eric@engestrom.ch>	2019-03-18 17:02:10 +00:00
Danylo Piliaiev	ecb98c6898	anv: Treat zero size XFB buffer as disabled Vulkan spec doesn't explicitly forbid zero size transform feedback buffers. Having zero size xfb caused SurfaceSize overflow and triggered assert in debug build. The only way to have zero size SO_BUFFER is to disable SO_BUFFER as stated in hardware spec. From SKL PRM, Vol 2a, "3DSTATE_SO_BUFFER": "If set, stream output to SO Buffer is enabled, if 3DSTATE_STREAMOUT::SO Function ENABLE is also enabled. If clear, the SO Buffer is considered "not bound" and effectively treated as a zero- length buffer for the purposes of SO output and overflow detection. If an enabled stream's Stream to Buffer Selects includes this buffer it is by definition an overflow condition. That stream will cause no writes to occur, and only SO_PRIM_STORAGE_NEEDED[<stream>] will increment." Fixes: `36ee2fd61c` "anv: Implement the basic form of VK_EXT_transform_feedback" Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-03-18 16:09:42 +00:00
Emil Velikov	f5b71b18ef	docs: update calendar, add news item and link release notes for 18.3.5 Signed-off-by: Emil Velikov <emil.velikov@collabora.com>	2019-03-18 16:02:27 +00:00
Emil Velikov	d4e26b36b2	docs: add sha256 checksums for 18.3.5 Signed-off-by: Emil Velikov <emil.velikov@collabora.com> (cherry picked from commit `ec770b43b9`)	2019-03-18 15:58:06 +00:00
Emil Velikov	cb9fe1e89b	docs: add release notes for 18.3.5 Signed-off-by: Emil Velikov <emil.velikov@collabora.com> (cherry picked from commit `022708cb40`)	2019-03-18 15:58:05 +00:00
Bas Nieuwenhuizen	d1aa37dfff	radv: Implement VK_EXT_host_query_reset. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Acked-by: Eric Engestrom <eric@engestrom.ch>	2019-03-18 14:48:41 +00:00
Jason Ekstrand	887041c763	anv: Implement VK_EXT_host_query_reset Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Eric Engestrom <eric@engestrom.ch> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-03-18 14:48:41 +00:00
Bas Nieuwenhuizen	42ea88c673	vulkan: Update the XML and headers to 1.1.104 Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Acked-by: Eric Engestrom <eric@engestrom.ch> Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-03-18 14:48:41 +00:00
Bas Nieuwenhuizen	eb5cda1c3e	vulkan/util: Handle enums that are in platform-specific headers. VkFullScreenExclusiveEXT comes from the win32 header. Mostly took the logic from the entrypoint scripts: 1) If there is an ext that has it in the requires and has a platform, take the guard for that platform. 2) Otherwise assume it is from the core headers. Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Eric Engestrom <eric@engestrom.ch>	2019-03-18 14:48:41 +00:00
Lionel Landwerlin	5abe488d18	vulkan: factor out wsi dependencies In commit `530927d3f6` ("vulkan/util: generate instance/device dispatch tables") we started generating instance dispatch tables, some of which (like wayland) require external headers. This commit moves the dependencies up one level so that they apply to the whole vulkan directory. We use them for both the util & overlay layer. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `530927d3f6` ("vulkan/util: generate instance/device dispatch tables") Reviewed-by: Eric Engestrom <eric@engestrom.ch>	2019-03-18 12:05:13 +00:00
Tapani Pälli	791198a54b	android: Build fixes for OMR1 Some of the header file locations are changed between Android versions (when VNDK is used), patch makes sure we get all the required headers. v2: cleanups, put SDK version checks in all places (Tapani) Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Signed-off-by: Chen Lin Z <lin.z.chen@intel.com> Tested-by: Clayton Craft <clayton.a.craft@intel.com> Acked-by: Eric Engestrom <eric.engestrom@intel.com>	2019-03-18 11:53:59 +02:00
Bas Nieuwenhuizen	8ebc7dcb59	radv: Allow fast clears with concurrent queue mask for some layouts. For VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL and VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL we do not care about the queue mask because 1) using these is only allowed on the gfx queue 2) transitions for these are only allowed on the gfx queue. This enables some fast clears for Doom that uses VK_SHARING_MODE_CONCURRENT. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-03-18 09:10:55 +00:00
Kenneth Graunke	d5974aeeae	iris: Slightly better bounds on buffer sizes	2019-03-18 01:39:43 -07:00
Kenneth Graunke	836b47ca4e	iris: Don't flush the batch for unsynchronized mappings I messed this up when adding the GPU copy path.	2019-03-18 01:02:18 -07:00
Tapani Pälli	a1cd0040b6	isl: fix automake build when sse41 is not supported Fixes: `864cc419eb` "intel/isl: move tiled_memcpy static libs from i965 to isl" Cc: mesa-stable@lists.freedesktop.org Reported-by: Milav Soni <milav.soni@teqdiligent.com> Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-03-18 08:51:57 +02:00
Brian Paul	f7332fbc08	gallium/util: remove pipe_sampler_view_release() It's no longer used. Reviewed-by: Roland Scheidegger <sroland@vmware.com> Reviewed-by: Neha Bhende <bhenden@vmware.com> Reviewed-by: Mathias Fröhlich <Mathias.Froehlich@web.de> Reviewed-By: Jose Fonseca <jfonseca@vmware.com>	2019-03-17 20:07:22 -06:00
Brian Paul	c473090b09	i915g: remove calls to pipe_sampler_view_release() As with previous patches for svga, llvmpipe, swr drivers. Compile tested only. Reviewed-by: Roland Scheidegger <sroland@vmware.com> Reviewed-by: Neha Bhende <bhenden@vmware.com> Reviewed-by: Mathias Fröhlich <Mathias.Froehlich@web.de> Reviewed-By: Jose Fonseca <jfonseca@vmware.com>	2019-03-17 20:07:22 -06:00
Brian Paul	768b770a86	swr: remove call to pipe_sampler_view_release() As with svga, llvmpipe drivers in previous patches. Compile tested only. Reviewed-by: Roland Scheidegger <sroland@vmware.com> Reviewed-by: Neha Bhende <bhenden@vmware.com> Reviewed-by: Mathias Fröhlich <Mathias.Froehlich@web.de> Reviewed-By: Jose Fonseca <jfonseca@vmware.com>	2019-03-17 20:07:22 -06:00
Brian Paul	2ff2a58774	llvmpipe: stop using pipe_sampler_view_release() This was used to avoid freeing a sampler view which was created by a context that was already deleted. But the state tracker does not allow that. Reviewed-by: Roland Scheidegger <sroland@vmware.com> Reviewed-by: Neha Bhende <bhenden@vmware.com> Reviewed-by: Mathias Fröhlich <Mathias.Froehlich@web.de> Reviewed-By: Jose Fonseca <jfonseca@vmware.com>	2019-03-17 20:07:22 -06:00
Brian Paul	a7afab7952	svga: stop using pipe_sampler_view_release() This function was used in the past to avoid deleting a sampler view for a context that no longer exists. But the Mesa state tracker ensures that cannot happen. Use the standard refcounting function instead. Also, remove the code which checked for context mis-matches in svga_sampler_view_destroy(). It's no longer needed since implementing the zombie sampler view code in the state tracker. Testing Done: google chrome, variety of GL demos/games Reviewed-by: Roland Scheidegger <sroland@vmware.com> Reviewed-by: Neha Bhende <bhenden@vmware.com> Reviewed-by: Mathias Fröhlich <Mathias.Froehlich@web.de> Reviewed-By: Jose Fonseca <jfonseca@vmware.com>	2019-03-17 20:07:22 -06:00
Brian Paul	20de0359b5	st/mesa: stop using pipe_sampler_view_release() In all instances here we can replace pipe_sampler_view_release(pipe, view) with pipe_sampler_view_reference(view, NULL) because the views in question are private to the state tracker context. So there's no danger of freeing a sampler view with the wrong context. Testing done: google chrome, misc GL demos, games Reviewed-by: Roland Scheidegger <sroland@vmware.com> Reviewed-by: Neha Bhende <bhenden@vmware.com> Reviewed-by: Mathias Fröhlich <Mathias.Froehlich@web.de> Reviewed-By: Jose Fonseca <jfonseca@vmware.com>	2019-03-17 20:07:22 -06:00
Brian Paul	41c4c49463	st/mesa: implement "zombie" shaders list As with the preceding patch for sampler views, this patch does basically the same thing but for shaders. However, reference counting isn't needed here (instead of calling cso_delete_XXX_shader() we call st_save_zombie_shader()). The Redway3D Watch is one app/demo that needs this change. Otherwise, the vmwgfx driver generates an error about trying to destroy a shader ID that doesn't exist in the context. Note that if PIPE_CAP_SHAREABLE_SHADERS = TRUE, then we can use/delete any shader with any context and this mechanism is not used. Tested with: google-chrome, google earth, Redway3D Watch/Turbine demos and a few Linux games. Reviewed-by: Roland Scheidegger <sroland@vmware.com> Reviewed-by: Neha Bhende <bhenden@vmware.com> Reviewed-by: Mathias Fröhlich <Mathias.Froehlich@web.de> Reviewed-By: Jose Fonseca <jfonseca@vmware.com>	2019-03-17 20:07:22 -06:00
Brian Paul	593e36f956	st/mesa: implement "zombie" sampler views (v2) When st_texture_release_all_sampler_views() is called the texture may have sampler views belonging to several contexts. If we unreference a sampler view and its refcount hits zero, we need to be sure to destroy the sampler view with the same context which created it. This was not the case with the previous code which used pipe_sampler_view_release(). That function could end up freeing a sampler view with a context different than the one which created it. In the case of the VMware svga driver, we detected this but leaked the sampler view. This led to a crash with google-chrome when the kernel module had too many sampler views. VMware bug 2274734. Alternately, if we try to delete a sampler view with the correct context, we may be "reaching into" a context which is active on another thread. That's not safe. To fix these issues this patch adds a per-context list of "zombie" sampler views. These are views which are to be freed at some point when the context is active. Other contexts may safely add sampler views to the zombie list at any time (it's mutex protected). This avoids the context/view ownership mix-ups we had before. Tested with: google-chrome, google earth, Redway3D Watch/Turbine demos and a few Linux games. If anyone can recommend some other multi-threaded, multi-context GL apps to test, please let me know. v2: avoid potential race issue by always adding sampler views to the zombie list if the view's context doesn't match the current context, ignoring the refcount. Reviewed-by: Roland Scheidegger <sroland@vmware.com> Reviewed-by: Neha Bhende <bhenden@vmware.com> Reviewed-by: Mathias Fröhlich <Mathias.Froehlich@web.de> Reviewed-By: Jose Fonseca <jfonseca@vmware.com>	2019-03-17 20:07:22 -06:00
Brian Paul	e547a1ccb5	docs: link to the meson_options.txt file gitlab.freedesktop.org	2019-03-17 20:07:22 -06:00
Brian Paul	16fb82d189	docs: separate information for compiler selection and compiler options Split up the "Environment Variables" section into "Compiler Options" and "Compiler Specification". I think this makes the information easier to find and understand.	2019-03-17 20:07:22 -06:00
Mauro Rossi	bfba0ecc1c	android: nouveau: add support for nir Add the necessary build rules for android, to avoid building errors. Fixes: `f014ae3` ("nouveau: add support for nir") Signed-off-by: Mauro Rossi <issor.oruam@gmail.com> Reviewed-by: Karol Herbst <kherbst@redhat.com>	2019-03-18 00:29:39 +01:00
Timothy Arceri	010570c8e3	ac/nir_to_llvm: add assert to emit_bcsel() nir to llvm assumes we have already split vectors to scalars via nir_lower_alu_to_scalar(). Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-03-18 09:39:04 +11:00
Timothy Arceri	de8ec6e117	radeonsi/nir: call some more var optimisation passes shader-db results (VEGA64): Totals from affected shaders: SGPRS: 5328912 -> 5329680 (0.01 %) VGPRS: 2969308 -> 2969164 (-0.00 %) Spilled SGPRs: 37921 -> 37917 (-0.01 %) Spilled VGPRs: 32882 -> 29024 (-11.73 %) Private memory VGPRs: 0 -> 0 (0.00 %) Scratch size: 1400 -> 1200 (-14.29 %) dwords per thread Code Size: 121126000 -> 121282784 (0.13 %) bytes LDS: 1501 -> 1501 (0.00 %) blocks Max Waves: 933188 -> 933229 (0.00 %) Wait states: 0 -> 0 (0.00 %) Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-03-18 09:29:40 +11:00
Tobias Klausmann	29179f58c6	vulkan/util: meson build - add wayland client include Without this the build breaks with: In file included from ../src/vulkan/util/vk_util.h:32, from ../src/vulkan/util/vk_util.c:28: ../include/vulkan/vulkan.h:51:10: fatal error: wayland-client.h: No such file or directory #include <wayland-client.h> ^~~~~~~~~~~~~~~~~~ compilation terminated. The above misses the include directory for wayland: -I/usr/include/wayland Signed-off-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de> Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-03-17 17:55:29 +00:00
Karol Herbst	58376c6b9b	nv50ir/nir: move immediates before use Signed-off-by: Karol Herbst <kherbst@redhat.com>	2019-03-17 17:14:54 +01:00
Karol Herbst	4ded1cdef9	nv50/ir/nir: handle user clip planes for each emitted vertex v9: convert to C++ style comments handle for tess eval shaders as well Signed-off-by: Karol Herbst <kherbst@redhat.com>	2019-03-17 17:14:21 +01:00
Karol Herbst	b866012f7b	nv50/ir/nir: implement intrinsic shader_clock v9: mark as fixed Signed-off-by: Karol Herbst <kherbst@redhat.com>	2019-03-17 10:33:28 +01:00
Karol Herbst	c00d45cb45	nv50/ir/nir: implement load_per_vertex_output v4: use smarter getIndirect helper use new getSlotAddress helper v5: use loadFrom helper v8: don't require C++11 features Signed-off-by: Karol Herbst <kherbst@redhat.com>	2019-03-17 10:33:28 +01:00
Karol Herbst	9c44f4e043	nv50/ir/nir: add memory barriers v5: add more barrier intrinsics Signed-off-by: Karol Herbst <kherbst@redhat.com>	2019-03-17 10:33:28 +01:00
Karol Herbst	753ae68ca0	nv50/ir/nir: implement images v3: fix compiler warnings v4: use loadFrom helper v5: fix signed min/max v6: set tex mask add support for indirect image access set cache mode v7: make compatible with `884d27bcf6` rework the whole deref thing to prepare for bindless v8: port to deref instructions don't require C++11 features v9: implement MS images rebase on master (image modifiers) fix regressions due to variable src components replace '(*it).' with 'it->' convert to C++ style comments Signed-off-by: Karol Herbst <kherbst@redhat.com>	2019-03-17 10:33:28 +01:00
Karol Herbst	2cdcb364f0	nv50/ir/nir: implement ssbo intrinsics v4: use loadFrom helper v5: support indirect buffer access v8: don't require C++11 features Signed-off-by: Karol Herbst <kherbst@redhat.com>	2019-03-17 10:33:28 +01:00
Karol Herbst	8dca02955a	nv50/ir/nir: implement nir_intrinsic_load_ubo v4: use loadFrom helper v8: don't require C++11 features Signed-off-by: Karol Herbst <kherbst@redhat.com>	2019-03-17 10:33:28 +01:00
Karol Herbst	1bef2b7bf5	nv50/ir/nir: implement geometry shader nir_intrinsics v4: use smarter getIndirect helper use new getSlotAddress helper use loadFrom helper v8: don't require C++11 features Signed-off-by: Karol Herbst <kherbst@redhat.com>	2019-03-17 10:33:28 +01:00
Karol Herbst	d2de40f07e	nv50/ir/nir: implement variable indexing We store those arrays in local memory and reserve some space for each of the arrays. With NIR we could store those arrays packed, but we don't do that yet as it causes MemoryOpt to generate unaligned memory accesses. v3: use fixed size vec4 arrays until we fix MemoryOpt v4: fix for 64 bit types v5: use loadFrom helper v8: don't require C++11 features v9: convert to C++ style comments Signed-off-by: Karol Herbst <kherbst@redhat.com>	2019-03-17 10:33:28 +01:00
Karol Herbst	fa361a3c1e	nv50/ir/nir: implement vote and ballot v2: add vote_eq support use the new subop intrinsic helper add ballot v3: add read_(first_)invocation v8: handle vectorized intrinsics don't require C++11 features v9: lower_subgroups to 32 bit (produces less instructions) use getSSA and getScratch instead of new_LValue Signed-off-by: Karol Herbst <kherbst@redhat.com>	2019-03-17 10:33:28 +01:00
Karol Herbst	4dec7f81e0	nv50/ir/nir: add skeleton getOperation for intrinsics v7: don't assert in default case for getSubOp Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Pierre Moreau <pierre.morrow@free.fr>	2019-03-17 10:33:28 +01:00
Karol Herbst	bb032d8b62	nv50/ir/nir: implement nir_instr_type_tex a lot of those fields are not valid for a lot of tex ops. Not quite sure if it's worth the effort to check for those or just keep it like that. It seems to kind of work. v2: reworked offset handling add tex support with indirect R/S arguments handle GLSL_SAMPLER_DIM_EXTERNAL drop reference in convert(glsl_sampler_dim&, bool, bool) fix tg4 component selection v5: fill up coords args with scratch values if coords provided is less than TexTarget.getArgCount() v7: prepare for bindless_texture support v8: don't require C++11 features v9: convert to C++ style comments fix txf with a uniform constant 0 lod Signed-off-by: Karol Herbst <kherbst@redhat.com>	2019-03-17 10:33:28 +01:00
Karol Herbst	83cb790bf0	nv50/ir/nir: implement nir_ssa_undef_instr v2: use mkOp v8: don't require C++11 features Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Pierre Moreau <pierre.morrow@free.fr>	2019-03-17 10:33:28 +01:00
Karol Herbst	ad61f7e20d	nv50/ir/nir: implement loading system values v2: support more sys values fixed a bug where for multi component reads all values ended up in x v3: add load_patch_vertices_in v4: add subgroup stuff v5: add helper invocation v6: fix loading 64 bit system values v8: don't require C++11 features v9: convert to C++ style comments Signed-off-by: Karol Herbst <kherbst@redhat.com>	2019-03-17 10:33:28 +01:00
Karol Herbst	b05494c216	nv50/ir/nir: implement intrinsic_discard(_if) v9: use getSSA instead of new_LValue Signed-off-by: Karol Herbst <kherbst@redhat.com>	2019-03-17 10:33:28 +01:00
Karol Herbst	9e68b7bef2	nv50/ir/nir: implement load_(interpolated_)input/output v3: and load_output v4: use smarter getIndirect helper use new getSlotAddress helper v5: don't use const_offset directly fix for indirects v6: add support for interpolateAt v7: fix compiler warnings add load_barycentric_sample handle load_output for fragment shaders v8: set info->prop.fp.readsSampleLocations for at_sample interpolation don't require C++11 features v9: convert to C++ style comments Signed-off-by: Karol Herbst <kherbst@redhat.com>	2019-03-17 10:33:28 +01:00
Karol Herbst	6bc32bf653	nv50/ir/nir: implement nir_intrinsic_store_(per_vertex_)output v3: add workaround for RA issues indirects have to be multiplied by 0x10 fix indirect access v4: use smarter getIndirect helper use storeTo helper v5: don't use const_offset directly v8: don't require C++11 features v9: convert to C++ style comments handle clip planes correctly Signed-off-by: Karol Herbst <kherbst@redhat.com>	2019-03-17 10:33:28 +01:00
Karol Herbst	8c257a0201	nv50/ir/nir: implement nir_intrinsic_load_uniform v2: use new getIndirect helper fixes symbols for 64 bit types v4: use smarter getIndirect helper simplify address calculation use loadFrom helper v8: don't require C++11 features Signed-off-by: Karol Herbst <kherbst@redhat.com>	2019-03-17 10:33:28 +01:00
Karol Herbst	6513c675ad	nv50/ir/nir: implement nir_alu_instr handling v2: use bitfield_insert instead of bfi rework switch helper macros remove some lowering code (LoweringHelper is now used for this) v3: add pack_half_2x16_split add unpack_half_2x16_split_x/y v5: replace first argument with nullptr in loadImm calls prefer getSSA over getScratch v8: fix setting precise modifier for first instruction inside a block add guard in case no instruction gets inserted into an empty block don't require C++11 features v9: use CC_NE for integer compares convert to C++ style comments fix b2f for doubles remove macros around nir ops to make it easier to grep them add handling for fpow Signed-off-by: Karol Herbst <kherbst@redhat.com>	2019-03-17 10:33:28 +01:00
Karol Herbst	c69b814728	nv50/ir/nir: add skeleton for nir_intrinsic_instr Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Pierre Moreau <pierre.morrow@free.fr>	2019-03-17 10:33:28 +01:00
Karol Herbst	8379dc300d	nv50/ir/nir: implement nir_load_const_instr v8: fix loading 8/16 bit constants Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Pierre Moreau <pierre.morrow@free.fr>	2019-03-17 10:33:28 +01:00
Karol Herbst	88c909e9a7	nv50/ir/nir: parse NIR shader info v2: parse a few more fields v3: add special handling for GL_ISOLINES v8: set info->prop.fp.readsSampleLocations don't require C++11 features v9: replace '(*it).' with 'it->' convert to C++ style comments Signed-off-by: Karol Herbst <kherbst@redhat.com>	2019-03-17 10:33:28 +01:00
Karol Herbst	e8d9be40cb	nv50/ir/nir: add loadFrom and storeTo helper v8: don't require C++11 features Signed-off-by: Karol Herbst <kherbst@redhat.com>	2019-03-17 10:33:28 +01:00
Karol Herbst	39929a8164	nv50/ir/nir: run assignSlots v2: add support for geometry shaders set idx add some missing mappings fix for 64bit inputs/outputs fix up some FP color output index messup parse centroid flag v3: fix arrays in outputs as well fix input/output size calculation for tessellation shaders v4: add getSlotAddress helper fix for 64 bit typed inputs v5: change getSlotAddress interface for easier use fix sample inputs fix slot counting for mat v7: fix driver_location of images v8: don't require C++11 features v9: convert to C++ style comments support VERT_ATTRIB_POINT_SIZE add more error checking to slots Signed-off-by: Karol Herbst <kherbst@redhat.com>	2019-03-17 10:33:28 +01:00
Karol Herbst	ccc4de0bdd	nv50/ir/nir: add nir type helper functions v4: treat imul as unsigned v5: remove pointless !! v7: inot is unsigned as well v8: don't require C++11 features v9: convert to C++ style comments improve formatting print error in all cases where codegen doesn't support a given type Signed-off-by: Karol Herbst <kherbst@redhat.com> Acked-by: Pierre Moreau <pierre.morrow@free.fr>	2019-03-17 10:33:28 +01:00
Karol Herbst	7481abcd0c	nv50/ir/nir: track defs and provide easy access functions v2: add helper function for indirects v4: add new getIndirect overload for easier use v5: use getSSA for ssa values we can just create the values for unassigned registers in getSrc v6: always create at least 32 bit values v8: don't require C++11 features v9: include unordered_map on supported stdlibs replace '(*it).' with 'it->' Signed-off-by: Karol Herbst <kherbst@redhat.com>	2019-03-17 10:33:28 +01:00
Karol Herbst	9298664a5f	nv50/ir/nir: run some passes to make the conversion easier v2: add constant_folding v6: print non final NIR only for verbose debugging v8: add passes we will need for OpenCL compute shaders v9: move type_size into anonymous namespace convert to C++ style comments lower bools to int32 Signed-off-by: Karol Herbst <kherbst@redhat.com> Acked-by: Pierre Moreau <pierre.morrow@free.fr>	2019-03-17 10:33:28 +01:00
Karol Herbst	78c5336ca9	nouveau: fix nir and TGSI shader cache collision v9: rename variable to driver_flags use constants for shader cache flags Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Pierre Moreau <pierre.morrow@free.fr>	2019-03-17 10:33:28 +01:00
Karol Herbst	f014ae3c7c	nouveau: add support for nir not all those nir options are actually required, it just made the work a little easier. v2: fix asserts parse compute shaders don't lower bitfield_insert v3: fix memory leak v4: don't lower fmod32 v5: set lower_all_io_to_temps to false fix memory leak because we take over ownership of the nir shader merge: use the lowering helper v6: include TGSI debug header for proper assert call add nv50 support v7: fix Automake build v8: free shader only for the set shader type v9: check for IR type inside get_compiler_options squash "nouveau: add env var to make nir default" fix memory leak when creating compute shaders use debug_get_bool_option as it is available in non debug builds return failure if unsupported IR is encountered don't lower fpow in nir lower int 64 divmod inside nir to prevent crashes Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Pierre Moreau <pierre.morrow@free.fr>	2019-03-17 10:33:28 +01:00
Karol Herbst	a211c92c4b	nv50/ir: add lowering helper if we start supporting multiple input IRs we might want to move lowering code into a common place and keep the initial translation simpler. This will also allow us to react on ISA changes more easily. v5: also handle SAT v6: rename type variables fixed lowering of NEG add lowering of NOT v8: don't require C++11 features Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Pierre Moreau <pierre.morrow@free.fr>	2019-03-17 10:33:28 +01:00
Karol Herbst	a0393010c4	nv50/ir: move common converter code in base class v2: remove TGSI related bits Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Pierre Moreau <pierre.morrow@free.fr>	2019-03-17 10:33:28 +01:00
Karol Herbst	bb50cb66f0	nvc0: print the shader type when dumping headers this makes debugging the shader header a little easier Acked-by: Pierre Moreau <pierre.morrow@free.fr> Signed-off-by: Karol Herbst <kherbst@redhat.com>	2019-03-17 10:33:27 +01:00
Bas Nieuwenhuizen	213de3ea99	radeonsi: Remove implicit const cast. Fixes: `b9e02fe138` "gallium: add pipe_grid_info::last_block" Reviewed-by: Eric Engestrom <eric@engestrom.ch>	2019-03-17 00:07:38 +01:00
Bas Nieuwenhuizen	158d45db0c	gitlab-ci: Build turnip. No autotools build to care about. The half baked turnips param is kind of ugly, but felt like a waste defining more variables for it now. Reviewed-by: Eric Engestrom <eric@engestrom.ch> Reviewed-by: Kristian H. Kristensen <hoegsberg@chromium.org>	2019-03-16 14:38:51 +00:00
Bas Nieuwenhuizen	42ed6d9789	turnip: Deconflict vk_format_table regeneration Avoids src/freedreno/vulkan/meson.build:42:0: ERROR: Tried to create target "vk_format_table.c", but a target of that name already exists. when building both radv and turnip. Fixes: `26380b3a9f` "turnip: Add driver skeleton (v2)" Reviewed-by: Eric Engestrom <eric@engestrom.ch> Reviewed-by: Kristian H. Kristensen <hoegsberg@chromium.org>	2019-03-16 14:38:51 +00:00
Bas Nieuwenhuizen	e1161d2ea7	turnip: Fix GCC compiles. Apparently GCC does not consider static const variables to be integer constants, and hence the array size and the static assert result in compile failures. Fixes: `4b9f967cd1` "turnip: add a more complete format table" Reviewed-by: Eric Engestrom <eric@engestrom.ch> Reviewed-by: Kristian H. Kristensen <hoegsberg@chromium.org>	2019-03-16 14:38:51 +00:00
Jason Ekstrand	d3386e73c5	intel/nir: Lower array-deref-of-vector UBO and SSBO loads This fixes a serious performance issue with DXVK: https://github.com/doitsujin/dxvk/issues/937 This was caused by a recent change meant to improve performance on RADV, which backfired on ANV and killed performance for some apps: `e5a06d3f4a` Throwing in this bit of lowering lets us come along and CSE those UBO loads (or copy-prop for SSBO load) and get one load where we previously would have gotten several. VkPipeline-db results on Kaby Lake: total instructions in shared programs: 5115361 -> 5073185 (-0.82%) instructions in affected programs: 1754333 -> 1712157 (-2.40%) helped: 5331 HURT: 63 total cycles in shared programs: 2544501169 -> 2481144545 (-2.49%) cycles in affected programs: 2531058653 -> 2467702029 (-2.50%) helped: 9202 HURT: 4323 total loops in shared programs: 3340 -> 3331 (-0.27%) loops in affected programs: 9 -> 0 helped: 9 HURT: 0 total spills in shared programs: 3246 -> 3053 (-5.95%) spills in affected programs: 384 -> 191 (-50.26%) helped: 10 HURT: 5 total fills in shared programs: 4626 -> 4452 (-3.76%) fills in affected programs: 439 -> 265 (-39.64%) helped: 10 HURT: 5 All of the shaders with hurt spilling were in Rise of the Tomb Raider which also had shaders solidly helped in the spilling department. Not shown in those results (because I've not had success dumping the shaders) is Witcher 3 where this reduces spilling and improves over-all perf by around 20-25%. There were no shader-db changes. Apparently, this just isn't a pattern that happens in OpenGL. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Cc: "19.0" mesa-stable@lists.freedesktop.org	2019-03-15 23:10:27 -05:00
Jason Ekstrand	35b8f6f40b	nir: Add a new pass to lower array dereferences on vectors This pass was originally written for lowering TCS output reads and writes but it is also applicable just about anything including UBOs, SSBOs, and shared variables. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-03-15 23:10:27 -05:00
Jason Ekstrand	fe9a6c0f14	nir/builder: Add a vector extract helper This one's a tiny bit better than what we had in spirv_to_nir because it emits a binary tree rather than a linear walk. It also doesn't leave around unneeded bcsel instructions for a constant index and returns an undef for constant OOB access. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-03-15 23:10:26 -05:00
Gert Wollny	9bb63e9a7c	softpipe: Enable PIPE_CAP_MIXED_COLORBUFFER_FORMATS It seems softpipe actually supports this. This change enables the following piglits as passing without regressions in the gpu test set: gl-3.1-mixed-int-float-fbo gl-3.1-mixed-int-float-fbo int_second fbo-blending-format-quirks Changes for deqp: dEQP-GLES2.functional.fbo.completeness.attachment_combinations.rbo_tex_none_none QualityWarning -> Pass dEQP-GLES2.functional.fbo.completeness.attachment_combinations.rbo_tex_none_rbo QualityWarning -> Pass dEQP-GLES2.functional.fbo.completeness.attachment_combinations.rbo_tex_none_tex QualityWarning -> Pass dEQP-GLES2.functional.fbo.completeness.attachment_combinations.rbo_tex_rbo_none QualityWarning -> Pass dEQP-GLES2.functional.fbo.completeness.attachment_combinations.rbo_tex_tex_none QualityWarning -> Pass dEQP-GLES2.functional.fbo.completeness.attachment_combinations.tex_rbo_none_none QualityWarning -> Pass dEQP-GLES2.functional.fbo.completeness.attachment_combinations.tex_rbo_none_rbo QualityWarning -> Pass dEQP-GLES2.functional.fbo.completeness.attachment_combinations.tex_rbo_none_tex QualityWarning -> Pass dEQP-GLES2.functional.fbo.completeness.attachment_combinations.tex_rbo_rbo_none QualityWarning -> Pass dEQP-GLES2.functional.fbo.completeness.attachment_combinations.tex_rbo_tex_none QualityWarning -> Pass dEQP-GLES3.functional.fbo.completeness.samples.rbo0_rbo0_tex Fail -> Pass dEQP-GLES3.functional.fbo.completeness.samples.rbo0_tex_none Fail -> Pass dEQP-GLES3.functional.fbo.completeness.samples.rbo1_rbo1_rbo1 Fail -> Pass dEQP-GLES3.functional.fragment_out.random.* NotSupported -> Pass dEQP-GLES31.functional.shaders.builtin_functions.common.frexp._fragment Fail -> Pass dEQP-GLES31.functional.shaders.builtin_functions.common.frexp._vertex Fail -> Pass dEQP-GLES31.functional.shaders.builtin_functions.precision.frexp._fragment. Fail -> Pass dEQP-GLES31.functional.shaders.builtin_functions.precision.frexp._vertex. Fail -> Pass Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-03-15 19:04:05 +01:00
Rob Clark	ca11f9263e	freedreno/ir3/cp: fix ldib bug Something that we didn't hit earlier because of the extra shr.b Signed-off-by: Rob Clark <robdclark@gmail.com> Reviewed-by: Kristian H. Kristensen <hoegsberg@chromium.org>	2019-03-15 10:52:11 -07:00
James Zhu	abfd572bd2	gallium/auxiliary/vl: Change weave compute shader implementation Use 2D_ARRAY instead of RECT to fetch texels for weave compute shader. Problem 2,3: Fixed interpolation issue with weave de-interlace Fixes: `9364d66cb7` (Add video compositor compute shader render) Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109646 Signed-off-by: James Zhu <James.Zhu@amd.com> Acked-by: Leo Liu <leo.liu@amd.com> Tested-by: Bruno Milreu <bmilreu@gmail.com>	2019-03-15 11:53:15 -04:00
James Zhu	a8ee07d83e	gallium/auxiliary/vl: Change grid setting Using draw area for grid setting instead of destination buffer size. Signed-off-by: James Zhu <James.Zhu@amd.com> Acked-by: Leo Liu <leo.liu@amd.com> Tested-by: Bruno Milreu <bmilreu@gmail.com>	2019-03-15 11:53:15 -04:00
James Zhu	998dca4dbb	gallium/auxiliary/vl: Increase shader_params size Increase shader_params size to pass sampler data to compute shader during weave de-interlace. Signed-off-by: James Zhu <James.Zhu@amd.com> Acked-by: Leo Liu <leo.liu@amd.com> Tested-by: Bruno Milreu <bmilreu@gmail.com>	2019-03-15 11:53:15 -04:00
Marek Olšák	b276e8358a	omx: add a compute path in enc_LoadImage_common Acked-by: Leo Liu <leo.liu@amd.com>	2019-03-15 11:53:08 -04:00
Marek Olšák	323e7be91c	omx: clean up enc_LoadImage_common - add *pipe - add documentation Acked-by: Leo Liu <leo.liu@amd.com>	2019-03-15 11:53:08 -04:00
Marek Olšák	b9e02fe138	gallium: add pipe_grid_info::last_block The OpenMAX state tracker will use this. RadeonSI is adapted to use pipe_grid_info::last_block instead of its internal state. Acked-by: Leo Liu <leo.liu@amd.com>	2019-03-15 11:53:08 -04:00
Alejandro Piñeiro	34b3b92bbe	nir/xfb: move varyings info out of nir_xfb_info When varyings were added, we moved to dynamically allocated pointers instead of allocating just one block for everything. That breaks some assumptions of some Vulkan drivers (like anv) that make serialization and copying easier. At the same time, varyings are not needed for Vulkan. So this commit moves them out. Although it seems a little overkill, fixing the anv side would require similar, or more, changes, so in the end it is about deciding where we want to put our effort. v2: (from Jason review) * Don't use a temp variable on the _create methods, just return result of rzalloc_size * Wrap some lines too long. Fixes: `cf0b2ad486` ("nir/xfb: adding varyings on nir_xfb_info and gather_info") Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-03-15 11:59:32 +01:00
Samuel Pitoiset	d5befdbe4a	radv: always load 3 channels for formats that need to be shuffled This fixes a rendering issue with Hellblade and DXVK. Fixes: `a66b186beb` ("radv: use typed buffer loads for vertex input fetches") Reported-by: Philip Rebohle <philip.rebohle@tu-dortmund.de> Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-03-15 11:35:52 +01:00
Mathias Fröhlich	ebc15ecde5	mesa: Add assert to _mesa_primitive_restart_index. Make sure the index_size parameter is meant to be in bytes. Reviewed-by: Brian Paul <brianp@vmware.com> Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>	2019-03-15 06:06:42 +01:00
Mathias Fröhlich	d66faa54b2	vbo: Fix GL_PRIMITIVE_RESTART_FIXED_INDEX in display list compiles. The maximum value primitive restart index is different for each index data type. Use the appropriate fixed restart index value. Reviewed-by: Brian Paul <brianp@vmware.com> Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>	2019-03-15 06:06:42 +01:00
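As a rough illustration of why the fixed restart index depends on the index type: with GL_PRIMITIVE_RESTART_FIXED_INDEX the restart value is the largest value representable in the index type. The function name `fixed_restart_index` is hypothetical and is not the Mesa function; the size is taken in bytes, as the neighbouring assert commit requires.

```c
#include <stdint.h>

/* index_size is in bytes: 1 (ubyte), 2 (ushort) or 4 (uint) indices.
 * Hypothetical sketch; any other size returns 0 here. */
uint32_t
fixed_restart_index(unsigned index_size)
{
   switch (index_size) {
   case 1:
      return 0xFFu;
   case 2:
      return 0xFFFFu;
   case 4:
      return 0xFFFFFFFFu;
   default:
      return 0; /* unsupported size in this sketch */
   }
}
```

Passing a size in bits or an enum by mistake falls through to the default case, which is the kind of misuse an assert on the parameter is meant to catch.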
Mathias Fröhlich	a503f0562a	vbo: Fix basevertex handling in display list compiles. The standard requires that the primitive restart comparison happens before the basevertex value is added. Do this now, drop a reference to the standard why this happens at this place. Reviewed-by: Brian Paul <brianp@vmware.com> Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>	2019-03-15 06:06:42 +01:00
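The ordering described above (compare against the restart index first, add basevertex afterwards) can be sketched as follows. `resolve_index` is a hypothetical helper written for illustration, not code from the vbo module.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if raw_index is the restart marker. Otherwise stores
 * raw_index + basevertex in *vertex and returns false. The comparison
 * deliberately uses the raw index, before basevertex is applied. */
bool
resolve_index(uint32_t raw_index, uint32_t restart_index,
              int32_t basevertex, uint32_t *vertex)
{
   if (raw_index == restart_index)
      return true;

   *vertex = raw_index + (uint32_t)basevertex;
   return false;
}
```

With this ordering, an index whose sum with basevertex happens to equal the restart value is still drawn as an ordinary vertex.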
Mathias Fröhlich	94b64eb462	mesa: Use mapping tools in debug prints. Reviewed-by: Brian Paul <brianp@vmware.com> Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>	2019-03-15 06:06:42 +01:00
Mathias Fröhlich	a8183c1334	mesa: Remove _ae_{,un}map_vbos and dependencies. Since mapping and unmapping the buffer objects in a VAO is handled directly from the VAO, this part of the _NEW_ARRAY state is no longer used. So remove this part of array element state. Reviewed-by: Brian Paul <brianp@vmware.com> Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>	2019-03-15 06:06:42 +01:00
Mathias Fröhlich	b89ae55a70	mesa: Replace _ae_{,un}map_vbos with _mesa_vao_{,un}map_arrays Due to the use of bitmaps, the _mesa_vao_{,un}map_arrays functions should provide comparable runtime efficiency to the currently used _ae_{,un}map_vbos functions. So use these functions and enable further cleanup. Reviewed-by: Brian Paul <brianp@vmware.com> Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>	2019-03-15 06:06:42 +01:00
Mathias Fröhlich	b43fae364f	mesa: Use _mesa_array_element in dlist save. Make use of the newly factored out _mesa_array_element function in display list compilation. For now that duplicates out the primitive restart logic. But that turns out to need a fix in display list handling anyhow. Reviewed-by: Brian Paul <brianp@vmware.com> Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>	2019-03-15 06:06:42 +01:00
Mathias Fröhlich	80e319485a	mesa: Factor out _mesa_array_element. The factored out function handles emitting the vertex attributes at the given index. The now public accessible function gets used in the following patches. Reviewed-by: Brian Paul <brianp@vmware.com> Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>	2019-03-15 06:06:42 +01:00
Mathias Fröhlich	85fd380878	mesa: Implement helper functions to map and unmap a VAO. Provide a set of functions that maps or unmaps all VBOs held in a VAO. The functions will be used in the following patches. v2: Update comments. Reviewed-by: Brian Paul <brianp@vmware.com> Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>	2019-03-15 06:06:42 +01:00
Jason Ekstrand	efa4fc0ebd	st/mesa: Let NIR lower UBO and SSBO access when we have it Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-03-15 01:02:19 +00:00
Jason Ekstrand	be2990d8fb	i965: Stop setting LowerBufferInterfaceBlocks Instead, we do UBO and SSBO deref lowering in NIR after we've given it a chance to optimize SSBO access: Shader-db results on Kaby Lake: total instructions in shared programs: 15235775 -> 15235484 (<.01%) instructions in affected programs: 14992 -> 14701 (-1.94%) helped: 19 HURT: 20 total cycles in shared programs: 339220331 -> 339027307 (-0.06%) cycles in affected programs: 79831981 -> 79638957 (-0.24%) helped: 540 HURT: 602 total loops in shared programs: 4402 -> 4348 (-1.23%) loops in affected programs: 186 -> 132 (-29.03%) helped: 27 HURT: 0 total spills in shared programs: 23261 -> 23234 (-0.12%) spills in affected programs: 38 -> 11 (-71.05%) helped: 1 HURT: 0 total fills in shared programs: 31442 -> 31371 (-0.23%) fills in affected programs: 98 -> 27 (-72.45%) helped: 1 HURT: 0 LOST: 12 GAINED: 12 Most of the help and hurt in instruction counts was just churn caused by re-ordering of optimizations and the fact that the NIR deref lowering code is emitting slightly different instructions. Nothing was hurt by more than three instructions and most things weren't helped by more than four. The primary exception to this is one Car Chase shader: shaders/non-free/gfxbench4/carchase/341.shader_test CS SIMD32: 1144 -> 821 (-28.23%) There is also one compute shader in Manhattan 3.1 and a fragment shader in the UE4 Shooter Game demo that now get a loop partially unrolled. Those showed up in the results as hurt instructions but were manually removed to get the results above. 
The lost/gained was a dozen Car Chase shaders that went from SIMD8 to SIMD16 thanks to improved register pressure: shaders/non-free/gfxbench4/carchase/366.shader_test CS shaders/non-free/gfxbench4/carchase/368.shader_test CS shaders/non-free/gfxbench4/carchase/370.shader_test CS shaders/non-free/gfxbench4/carchase/372.shader_test CS shaders/non-free/gfxbench4/carchase/376.shader_test CS shaders/non-free/gfxbench4/carchase/378.shader_test CS shaders/non-free/gfxbench4/carchase/380.shader_test CS shaders/non-free/gfxbench4/carchase/382.shader_test CS shaders/non-free/gfxbench4/carchase/384.shader_test CS shaders/non-free/gfxbench4/carchase/388.shader_test CS shaders/non-free/gfxbench4/carchase/4.shader_test CS shaders/non-free/gfxbench4/carchase/6.shader_test CS Given how much it appeared to be improved, I ran Car Chase on my laptop. Unfortunately, I wasn't able to see any measurable improvement. It might be helped by 1-2% but it's in the noise. It does render correctly as far as I can tell so the improvement is legitimate. All of the loops that got deleted were in dolphin uber shaders. I've had no opportunity to test them for correctness or performance. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-03-15 01:02:19 +00:00
Jason Ekstrand	810dde2a6b	glsl/nir: Add a pass to lower UBO and SSBO access Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-03-15 01:02:19 +00:00
Jason Ekstrand	77e5ec394e	glsl/nir: Handle unlowered SSBO atomic and array_length intrinsics We didn't have any of these before because all NIR consumers always called lower_ubo_references. Soon, we want to pass the derefs straight through to NIR so we need to handle these intrinsics directly. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-03-15 01:02:19 +00:00
Jason Ekstrand	76ba225184	glsl/nir: Set explicit types on UBO/SSBO variables We want to be able to use variables and derefs for UBO/SSBO access in NIR. In order to do this, the rest of NIR needs to know the type layout information. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-03-15 01:02:19 +00:00
Jason Ekstrand	8f3ab8aa78	glsl: Don't lower vector derefs for SSBOs, UBOs, and shared All of these are backed by some sort of memory so if you have multiple threads writing to different components of the same vector at the same time, the load-vec-store pattern that GLSL IR emits won't work. This shouldn't affect any drivers today as they all call GLSL IR lowering which lowers access to these variables to index+offset intrinsics before we get to this point. However, NIR will start handling the derefs itself and won't want the lowering. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-03-15 01:02:19 +00:00
Jason Ekstrand	3c11fc7654	nir/lower_io: Add a new buffer_array_length intrinsic and lowering Reviewed-by: Kristian H. Kristensen <hoegsberg@chromium.org> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-03-15 01:02:19 +00:00
Jason Ekstrand	c8d42c8cf6	nir: Rename nir_address_format_vk_index_offset to not be vk It's just a 32-bit index and offset. We're going to want to use it in GL as well so stop talking about Vulkan. Reviewed-by: Kristian H. Kristensen <hoegsberg@chromium.org> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-03-15 01:02:19 +00:00
Jason Ekstrand	60af3a93e9	nir/deref: Consider COHERENT decorated var derefs as aliasing If we get to two deref_var paths with different variables, we usually know they don't alias. However, if both of the paths are marked coherent, we do have to worry about it. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-03-15 01:02:19 +00:00
Jason Ekstrand	8b073832ff	compiler/types: Add helpers to get explicit types for standard layouts We also need to modify the current size/align helpers to not blow up when they encounter an explicitly laid out type. Previously we considered using the size/align helpers mutually exclusive with standard layouts but now we just assert that they match. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-03-15 01:02:19 +00:00
Jason Ekstrand	5b2b144566	compiler/types: Add a C wrapper to get full struct field data Reviewed-by: Kristian H. Kristensen <hoegsberg@chromium.org> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-03-15 01:02:19 +00:00
Jason Ekstrand	ef4ca44780	compiler/types: Add a new is_interface C wrapper Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Kristian H. Kristensen <hoegsberg@chromium.org> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-03-15 01:02:19 +00:00
Jason Ekstrand	b315f6f82b	nir/validate: Allow 32-bit boolean load/store intrinsics With UBOs and SSBOs we have boolean types but they're actually 32-bit values. Make the validator a little less strict so that we can do a 32-bit load/store on boolean types. We're about to add a lowering pass called gl_nir_lower_buffers which will lower boolean load/store operations to 32-bit and insert i2b and b2i instructions to convert to/from 1-bit booleans. We want that to be legal. Reviewed-by: Kristian H. Kristensen <hoegsberg@chromium.org> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-03-15 01:02:19 +00:00
Jason Ekstrand	5d26f2d3d5	nir/validate: Only require bare types to match for copy_deref If we want to be able to use copy_deref instructions on explicitly laid out types, we have to be a little more flexible about what types we allow. Instead, of requiring the types to exactly match, only require the bare types to match. Reviewed-by: Kristian H. Kristensen <hoegsberg@chromium.org> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-03-15 01:02:19 +00:00
Jason Ekstrand	2b76de9b5d	nir/algebraic: Add a couple optimizations for iabs and ishr Shader-db results on Kaby Lake: total instructions in shared programs: 15225213 -> 15222365 (-0.02%) instructions in affected programs: 43524 -> 40676 (-6.54%) helped: 203 HURT: 0 Lots of shaders in Shadow Warrior had this pattern along with Deus Ex, Civ, Shadow of Mordor, and several others. Reviewed-by: Kristian H. Kristensen <hoegsberg@chromium.org>	2019-03-15 01:02:19 +00:00
Eric Anholt	0803bef006	mesa/st: Fix leaks of TGSI tokens in VP variants. Starting a glxgears and closing it, I was seeing a lot of leaked TGSI for the fixed function VPs. v2: drop unused delete_ir() arg. Fixes: `3b4929ec6e` ("st/mesa: Copy VP TGSI tokens if they exist, even for NIR shaders.") Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-03-14 16:18:59 -07:00
Eric Anholt	e0806c1ea0	mesa/st: Make sure that prog_to_nir NIR gets freed. GLSL NIR gets freed on relink by _mesa_delete_program(), but for ARB programs we need to free the old NIR when PSN is used to set up new NIR in the same gl_program. Additionally, set the base .nir field so that it will get freed by _mesa_delete_program(). Fixes: `3d7611e9a6` ("st/nir: use NIR for asm programs") Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-03-14 16:18:38 -07:00
Alyssa Rosenzweig	1ea42894c7	panfrost/midgard: Implement fpow We have a native op for this, which was just found in a disassembly -- so instead of lowering, use it! Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-03-14 22:50:24 +00:00
Alyssa Rosenzweig	2eb65c2173	panfrost: Compute viewport state on the fly Previously, we were caching this incorrectly; there's no real reason to given how variable it is (sensitive to changes in viewport, framebuffer dimensions, and scissors) and how cheap it is to recompute. So, just do it on the fly each draw. Fixes glmark-es2 -bshadow and -brefract. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-03-14 22:47:12 +00:00
Alyssa Rosenzweig	c6a725888f	panfrost: Disable AFBC for depth buffers For inexplicable reasons, the depth buffer is faster if kept as linear, whereas the colour buffers are faster if AFBC. Given both code paths are available, we'll choose the faster one of each (which also helps with testing coverage). Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-03-14 22:47:12 +00:00
Alyssa Rosenzweig	54e45d1d73	panfrost: Allocate extra data for depth buffer It's not clear why the hardware "spills" a little bit, but if we don't do this, we get MMU faults with linear depth buffers. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-03-14 22:47:12 +00:00
Alyssa Rosenzweig	79e474fa46	panfrost: Comment spelling fix Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-03-14 22:47:12 +00:00
Alyssa Rosenzweig	8c26890ac2	panfrost/mfbd: Respect per-job depth write flag While a depth buffer may be supplied, it only needs to be written to if the depth writemask is set for any draw AND if the depth buffer is not immediately invalidated (as is the case for scanout). This refactors panfrost_job to provide a depth write requirement, which is now implemented for MFBD depth buffers. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-03-14 22:47:11 +00:00
Alyssa Rosenzweig	9bf6024c6b	panfrost/mfbd: Implement linear depth buffers This removes a clunky hack where the depth buffer was enabled during the clear, instead of during depth buffer linking. That said, this does not yet support writeback like AFBC depth buffers. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-03-14 22:47:11 +00:00
Alyssa Rosenzweig	23e0135723	panfrost: Minor comment cleanup (version detection) Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-03-14 22:47:11 +00:00
Alyssa Rosenzweig	c119c282af	panfrost: Remove staging MFBD Same idea as the previous commit, but for the MFBD this time instead of the SFBD. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-03-14 22:47:11 +00:00
Alyssa Rosenzweig	d47f090738	panfrost: Remove staging SFBD for pan_context The fragment framebuffer descriptor should not be a context entry; rather, it should be constructed only at fragment time to keep analysis tractable. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-03-14 22:47:11 +00:00
Alyssa Rosenzweig	9dd84db7a5	panfrost: Break out fragment to SFBD/MFBD files This substantially cleans up the corresponding logic at the expense of a bit of code duplication; nevertheless, it's a net win since otherwise incompatible hardware code is mixed confusingly. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-03-14 22:47:11 +00:00
Alyssa Rosenzweig	4d1a356a57	freedreno: Use shared drm_find_modifier util Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Rob Clark <robdclark@gmail.com>	2019-03-14 22:43:08 +00:00
Alyssa Rosenzweig	dd12142e34	vc4: Use shared drm_find_modifier util Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-03-14 22:43:06 +00:00
Alyssa Rosenzweig	cca270bb03	v3d: Use shared drm_find_modifier util Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-03-14 22:42:51 +00:00
Alyssa Rosenzweig	8a1ab9a166	util: Add a drm_find_modifier helper This function is replicated across vc4/v3d/freedreno and is needed in Panfrost; let's make this shared code. v2: Supply generic util_array_contains_u64 version (Eric Engestrom). Add missing stdbool.h include (Eric Anholt). Mark inline (Christian Gmeiner). Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Rob Clark <robdclark@gmail.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-03-14 22:41:08 +00:00
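A helper of the kind described above boils down to a linear search over a list of 64-bit modifier values. The following is a minimal sketch, assuming a hypothetical name `array_contains_u64`; it is not the Mesa implementation, and the modifier values used with it are arbitrary.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if needle occurs in the first count entries of haystack.
 * Hypothetical stand-in for a shared "find modifier" utility. */
bool
array_contains_u64(uint64_t needle, const uint64_t *haystack, unsigned count)
{
   for (unsigned i = 0; i < count; i++) {
      if (haystack[i] == needle)
         return true;
   }
   return false;
}
```

A driver would typically call such a helper once per candidate layout, preferring the most efficient modifier that the caller's list contains.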
Mark Janes	16d108b502	mesa: add logging function for formatted string Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>	2019-03-14 12:56:59 -07:00
Mark Janes	b8a1a3214a	mesa: rename logging functions to reflect that they format strings In preparation for the definition of a function to log a formatted string. Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>	2019-03-14 12:56:45 -07:00
Mark Janes	eb1a869a5d	mesa: properly report the length of truncated log messages _mesa_log_msg must provide the length of the string passed into the KHR_debug api. When the string formatted by _mesa_gl_vdebugf exceeds MAX_DEBUG_MESSAGE_LENGTH, the length is incorrectly set to the number of characters that would have been written if enough space had been available. Fixes: `3025680578` ("mesa: Add support for GL_ARB_debug_output with dynamic ID allocation.") Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>	2019-03-14 12:56:19 -07:00
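The bug class described above comes from the C standard behaviour of `vsnprintf`, which returns the number of characters that would have been written had the buffer been large enough. A minimal sketch of clamping that value to what was actually stored follows; `format_message` is a hypothetical name, not the Mesa function.

```c
#include <stdarg.h>
#include <stdio.h>

/* Formats into buf and returns the number of characters actually stored
 * (excluding the terminating NUL), even when the output was truncated. */
int
format_message(char *buf, size_t size, const char *fmt, ...)
{
   if (size == 0)
      return 0;

   va_list args;
   va_start(args, fmt);
   int len = vsnprintf(buf, size, fmt, args);
   va_end(args);

   if (len < 0)
      return 0;              /* encoding error: report an empty message */
   if ((size_t)len >= size)
      len = (int)size - 1;   /* truncated: clamp to what fits */
   return len;
}
```

Without the clamp, a caller that forwards the returned length together with the buffer would claim more bytes than the buffer holds.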
Jason Ekstrand	162286eb75	anv: Only set 3DSTATE_PS::VectorMaskEnable on gen8+ We don't set it on HSW and earlier in i965 and disabling it appears to make derivatives somewhat more reliable. Acked-by: Kenneth Graunke <kenneth@whitecape.org>	2019-03-14 12:22:20 -05:00
Eric Engestrom	b63fe65bf6	travis: fix osx meson build	2019-03-14 17:06:03 +00:00
Samuel Pitoiset	3a2e93147f	radv: always initialize HTILE when the src layout is UNDEFINED HTILE should always be initialized when transitioning from VK_IMAGE_LAYOUT_UNDEFINED to other image layouts. Otherwise, if an app does a transition from UNDEFINED to GENERAL, the driver doesn't initialize HTILE and it tries to decompress the depth surface. For some reason, this results in VM faults. Cc: mesa-stable@lists.freedesktop.org Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107563 Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-03-14 17:22:23 +01:00
Tomeu Vizoso	27b0661e30	panfrost: Adapt to uapi changes Two ioctls had wrong DRM_IO* flags. Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com> Reviewed-by: Rob Herring <robh@kernel.org>	2019-03-14 15:24:27 +01:00
Plamena Manolova	19ab082001	i965: Disable ARB_fragment_shader_interlock for platforms prior to GEN9 ARB_fragment_shader_interlock depends on memory fences to ensure fragment ordering and this ordering guarantee is only supported from GEN9 onwards. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109980 Fixes: `939312702e` "i965: Add ARB_fragment_shader_interlock support." Signed-off-by: Plamena Manolova <plamena.n.manolova@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-03-14 13:04:12 +00:00
Kenneth Graunke	0c3adaad22	iris: Don't mutate box in transfer map code Not mutating the boxes is arguably cleaner. Split from a patch by Chris Wilson but reworked to use a pointer to the original box rather than making a copy at all.	2019-03-13 23:31:51 -07:00
Tapani Pälli	3b41175c22	i965: remove scaling factors from P010, P012 Patch removes scaling factors introduced in `2a2e69f975` but leaves option to use scaling in place as it could be useful with other upcoming YUV formats. We did this scaling because ffmpeg was shifting channel bits down, however it seems this is not the right place as compositor wants to flip same buffers directly to display as well and therefore bitshifting needs to be done by the client when receiving frame from ffmpeg. Now P0x formats are treated the same, e.g. P010 is same as P016 but with lower 6 bits set to zeros. Fixes: `2a2e69f975` "i965: add P0x formats and propagate required scaling factors" Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-03-14 07:41:44 +02:00
Jason Ekstrand	489bf2de23	anv/pass: Flag the need for a RT flush for resolve attachments Reviewed-by: Nanley Chery <nanley.g.chery@intel.com> Cc: mesa-stable@lists.freedesktop.org	2019-03-13 17:58:27 -05:00
Jason Ekstrand	13099d4490	anv: Stop using VK_TRUE/FALSE We've been fairly inconsistent about this so we should really choose whether we're going to use VK_TRUE/FALSE or the C boolean values. The Vulkan #defines are set to 1 and 0 respectively so it's the same value as C gives you when you cast a boolean expression to an integer. Since there are several places where we set a VkBool32 to a C logical expression, let's just embrace C booleans and stop using the VK defines. Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-03-13 17:58:27 -05:00
Gurchetan Singh	d6dc68e7b5	virgl: use uint16_t mask instead of separate booleans This should save some space. Suggested-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2019-03-13 22:58:22 +00:00
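The space saving mentioned above comes from packing several yes/no fields into the bits of one 16-bit integer. A small sketch follows, with hypothetical flag and struct names that are not taken from the virgl driver.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flags; each occupies one bit of a uint16_t. */
enum {
   CAP_A = 1u << 0,
   CAP_B = 1u << 1,
   CAP_C = 1u << 2,
};

struct caps {
   uint16_t flags; /* replaces three separate bool members */
};

static inline void
caps_set(struct caps *c, uint16_t flag, bool value)
{
   if (value)
      c->flags |= flag;
   else
      c->flags &= (uint16_t)~flag;
}

static inline bool
caps_has(const struct caps *c, uint16_t flag)
{
   return (c->flags & flag) != 0;
}
```

Up to sixteen flags fit in the same two bytes, whereas separate `bool` members usually take at least one byte each.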
Albert Pal	56717e13a6	Fix link release notes for 19.0.0. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2019-03-13 22:36:42 +00:00
Rafael Antognolli	2b2b449dd1	iris: Enable auxiliary buffer support again Now that we are properly resolving buffers before giving them to the window system, let's enable aux support again. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-03-13 14:45:13 -07:00
Rafael Antognolli	1281368d02	iris: Convert RGBX to RGBA always. In i965, we disable the use of RGBX formats, so the higher layers of Mesa choose the equivalent RGBA format, and swizzle the alpha channel to 1.0. However, Gallium won't do that. We need to explicitly convert it to RGBA. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-03-13 14:45:13 -07:00
Rafael Antognolli	9159a5bbf8	iris: Add resolve on iris_flush_resource. The flush_resource hook is supposedly called when the resource content needs to be made visible to external (okay, that's pretty vague). For instance, it gets called before a surface gets handed to the window system. So we need to resolve it if it's not resolved yet. v2 (Ken): - Check mod_info in iris_flush_resource instead of ISL_AUX_USAGE_NONE - Drop my old broken resolve code from iris_resource_get_handle() now that Rafael's got it hooked up in the right place. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-03-13 14:45:13 -07:00
Eduardo Lima Mitev	759ceda07e	ir3/lower_io_offsets: Try to propagate SSBO's SHR into a previous shift instruction While we lack value range tracking, this patch tries to 'manually' propagate the division by 4 used to calculate the SSBO element-offset into a possible previous shift operation (shift left or right), checking that it is safe to do so. This should help in cases such as accessing a field in an array of structs, where the offset is likely defined as base plus a multiplication by a struct or array element size. See dEQP test 'dEQP-GLES31.functional.ssbo.atomic.xor.highp_uint' for an example of a shader that benefits from this. Reviewed-by: Rob Clark <robdclark@gmail.com>	2019-03-13 21:19:44 +01:00
Eduardo Lima Mitev	2e4525883f	ir3/compiler: Enable lower_io_offsets pass and handle new SSBO intrinsics These intrinsics have the offset in dwords already computed in the last source, so the change here is basically using that instead of emitting the ir3_SHR to divide the byte-offset by 4. The improvement in shader stats is significant, of up to ~15% in instruction count in some cases. Tested only on a5xx. shader-db is unfortunately not very useful here because shaders that use SSBO require GLSL versions that are not supported by freedreno yet. For examples, most Khronos CTS tests under 'dEQP-GLES31.functional.ssbo.*' are helped. A random case: dEQP-GLES31.functional.ssbo.layout.2_level_array.packed.row_major_mat3x2 with current master: ; CL prog 14/1: 1252 instructions, 0 half, 48 full ; 8 const, 8 constlen ; 61 (ss), 43 (sy) with the SSBO dword-offset moved to NIR: ; CL prog 14/1: 1053 instructions, 0 half, 45 full ; 7 const, 7 constlen ; 34 (ss), 73 (sy) The SHR previously emitted for every single SSBO instruction disappears in most cases, and the dword-offset ends up embedded in the STGB instruction as immediate in many cases as well. There are also a few of those tests that are currently failing on register allocation, that start to pass as a result of reducing the pressure. At least these, probably more: dEQP-GLES31.functional.ssbo.layout.random.unsized_arrays.24 dEQP-GLES31.functional.ssbo.layout.random.arrays_of_arrays.6 dEQP-GLES31.functional.ssbo.layout.random.arrays_of_arrays.17 dEQP-GLES31.functional.ssbo.layout.random.nested_structs_arrays.14 dEQP-GLES31.functional.ssbo.layout.random.nested_structs_arrays_instance_arrays.5 dEQP-GLES31.functional.ssbo.layout.random.nested_structs_arrays_instance_arrays.7 No regressions observed with relevant CTS and piglit tests. Reviewed-by: Rob Clark <robdclark@gmail.com>	2019-03-13 21:19:44 +01:00
Eduardo Lima Mitev	9dd0cfafc9	ir3/nir: Add a new pass 'ir3_nir_lower_io_offsets' This NIR->NIR pass implements offset computations that are currently done on the IR3 backend compiler, to give NIR a better chance of optimizing them. For now, it supports lowering the dword-offset computation for SSBO instructions. It will take an SSBO intrinsic and replace it with the new ir3-specific version that adds an extra source. That source will hold the SSA value resulting from inserting a division by 4 (an SHR op) of the original byte-offset source already provided by NIR in one of the intrinsic sources. Note that on a6xx the original byte-offset is not needed, so we could potentially replace that source instead of adding a new one. But to keep things simple and consistent we always add the new source and a6xx will just ignore the original one. Reviewed-by: Rob Clark <robdclark@gmail.com>	2019-03-13 21:19:44 +01:00
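The offset computation this pass moves into NIR is a division of the byte offset by 4, expressed as a right shift by 2. A minimal sketch follows; `struct ssbo_access` and `lower_ssbo_offset` are hypothetical names used only to show that the original byte offset is kept alongside the new dword offset, as the commit describes.

```c
#include <stdint.h>

/* Hypothetical pair of sources carried by a lowered SSBO access. */
struct ssbo_access {
   uint32_t byte_offset;  /* original source, kept unchanged */
   uint32_t dword_offset; /* extra source: byte_offset / 4 */
};

struct ssbo_access
lower_ssbo_offset(uint32_t byte_offset)
{
   struct ssbo_access access = {
      .byte_offset = byte_offset,
      .dword_offset = byte_offset >> 2, /* SHR by 2 == divide by 4 */
   };
   return access;
}
```

Because the shift is an ordinary value in the intermediate representation, later optimization passes can fold it into constants or neighbouring arithmetic.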
Eduardo Lima Mitev	6ff50a488a	nir: Add ir3-specific version of most SSBO intrinsics These are ir3 specific versions of SSBO intrinsics that add an extra source to hold the element offset (dword), which is what the backend instructions need. The original byte-offset source provided by NIR is not replaced because on a4xx and a5xx the backend still needs it. Reviewed-by: Rob Clark <robdclark@gmail.com>	2019-03-13 21:19:44 +01:00
Dylan Baker	03a0801bcb	docs: update calendar, add news item, and link release notes for 19.0.0	2019-03-13 12:36:27 -07:00
Dylan Baker	0cd487f375	docs: Add SHA256 sums for 19.0.0	2019-03-13 12:22:58 -07:00
Dylan Baker	44273b4806	docs: Add release notes for 19.0.0	2019-03-13 12:22:57 -07:00
Kevin Strasser	70b36c0ef9	egl/dri: Avoid out of bounds array access indexConfigAttrib iterates over every index in the dri driver, possibly exceeding __DRI_ATTRIB_MAX. In other words, if the dri driver has newer attributes libEGL will end up reading from uninitialized memory through dri2_to_egl_attribute_map[]. Signed-off-by: Kevin Strasser <kevin.strasser@intel.com> Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2019-03-13 18:28:53 +00:00
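The fix described above amounts to checking an index against the size of the lookup table before using it. A minimal sketch under stated assumptions: `ATTRIB_MAX`, `attrib_map` and `map_attribute` are hypothetical names and values, not the libEGL definitions.

```c
/* Hypothetical table size and contents. */
#define ATTRIB_MAX 4

static const int attrib_map[ATTRIB_MAX] = {0, 11, 22, 33};

/* Returns the mapped value, or 0 for attributes the table does not know
 * about (for example, ones reported by a newer driver). */
int
map_attribute(unsigned attrib)
{
   if (attrib >= ATTRIB_MAX)
      return 0;
   return attrib_map[attrib];
}
```

Treating unknown attributes as "no mapping" lets an older library keep working against a driver that reports more attributes than it was built for.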
Chris Wilson	97ad0efba0	iris: Use streaming loads to read from tiled surfaces Always use the streaming load (since we know we have Broadwell+, all of our target CPUs support sse41) for reading back from the tiled surface for mapping the resource. This means we hit the fast WC handling paths on Atoms (without LLC), and for big Core (with LLC) using the streaming load is no less efficient as we do not require the tiled buffer to be pulled into the CPU cache. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-03-13 10:54:16 -07:00
Chris Wilson	797fb6c6ac	iris: Use coherent allocation for PIPE_RESOURCE_STAGING On !llc machines (Atoms), reading from linear buffers is slow and so copying from one resource into the linear staging buffer is still slow. However, we can tell the GPU to snoop the CPU cache when reading from and writing to the staging buffer, eliminating the slow uncached reads. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-03-13 10:54:16 -07:00
Chris Wilson	01b224047b	iris: Use PIPE_BUFFER_STAGING for the query objects We prefer fast CPU access to read back the query results. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-03-13 10:54:16 -07:00
Caio Marcelo de Oliveira Filho	65e8761474	intel/nir: Combine store_derefs to improve code from SPIR-V Due to lack of write mask in SPIR-V store, generators may produce multiple stores to the same vector but using different array derefs. Use the combining store pass to clean this up. For example, layout(binding = 3) buffer block { vec4 v; }; void main() { v.x = 11; v.y = 22; } after going to SPIR-V and NIR, ends up with two store_derefs to v[0] and v[1] vec2 32 ssa_4 = deref_struct &ssa_3->field0 (ssbo vec4) /* &((block *)ssa_2)->field0 */ vec2 32 ssa_6 = deref_array &(*ssa_4)[0] (ssbo float) /* &((block *)ssa_2)->field0[0] */ intrinsic store_deref (ssa_6, ssa_7) (1, 0) /* wrmask=x */ /* access=0 */ vec1 32 ssa_13 = load_const (0x00000001 /* 0.000000 */) vec2 32 ssa_14 = deref_array &(*ssa_4)[1] (ssbo float) /* &((block *)ssa_2)->field0[1] */ intrinsic store_deref (ssa_14, ssa_15) (1, 0) /* wrmask=x */ /* access=0 */ producing two different sends instructions in skl. The combining pass transforms the snippet above into vec2 32 ssa_4 = deref_struct &ssa_3->field0 (ssbo vec4) /* &((block *)ssa_2)->field0 */ vec4 32 ssa_18 = vec4 ssa_7, ssa_15, ssa_16, ssa_17 intrinsic store_deref (ssa_4, ssa_18) (3, 0) /* wrmask=xy */ /* access=0 */ producing a single sends instruction. v2: Move this from spirv_to_nir into the general optimization pass for intel compiler. (Jason) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-03-13 08:39:16 -07:00
Caio Marcelo de Oliveira Filho	10dfb0011e	intel/nir: Combine store_derefs after vectorizing IO Shader-db results for skl: total instructions in shared programs: 15232903 -> 15224781 (-0.05%) instructions in affected programs: 61246 -> 53124 (-13.26%) helped: 221 HURT: 0 total cycles in shared programs: 371440470 -> 371398018 (-0.01%) cycles in affected programs: 281363 -> 238911 (-15.09%) helped: 221 HURT: 0 Results for bdw are very similar. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-03-13 08:39:16 -07:00
Caio Marcelo de Oliveira Filho	822a8865e4	nir: Add a pass to combine store_derefs to same vector v2: (all from Jason) Reuse existing function for the end of the block combinations. Check the SSA values are coming from the right place in tests. Document the case when the store to array_deref is reused. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-03-13 08:39:16 -07:00
Samuel Pitoiset	cbf022cb31	ac: use the raw tbuffer version for 16-bit SSBO loads vindex is always 0. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-03-13 14:16:14 +01:00
Samuel Pitoiset	045fae0f73	ac: add ac_build_{struct,raw}_tbuffer_load() helpers The struct version sets IDXEN=1, while the raw version sets IDXEN=0. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-03-13 14:15:05 +01:00
Samuel Pitoiset	a66b186beb	radv: use typed buffer loads for vertex input fetches This drastically reduces the number of SGPRs because the driver now uses descriptors per vertex binding, instead of per vertex attribute format. 29077 shaders in 15096 tests Totals: SGPRS: 1354285 -> 1282109 (-5.33 %) VGPRS: 909896 -> 908800 (-0.12 %) Spilled SGPRs: 24840 -> 24811 (-0.12 %) Code Size: 49221144 -> 48986628 (-0.48 %) bytes Max Waves: 243930 -> 244229 (0.12 %) Totals from affected shaders: SGPRS: 390648 -> 318472 (-18.48 %) VGPRS: 288432 -> 287336 (-0.38 %) Spilled SGPRs: 94 -> 65 (-30.85 %) Code Size: 11548412 -> 11313896 (-2.03 %) bytes Max Waves: 86460 -> 86759 (0.35 %) This gives a really tiny boost. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-03-13 13:31:11 +01:00
Samuel Pitoiset	0b9a06a1a0	radv: store more vertex attribute infos as pipeline keys They are required for using typed buffer loads. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-03-13 13:31:08 +01:00
Samuel Pitoiset	489dac0d21	ac: rework typed buffers loads for LLVM 7 Be more generic, this will be used by an upcoming series. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-03-13 13:31:06 +01:00
Tomeu Vizoso	56e04f67f9	panfrost: Set bo->gem_handle when creating a linear BO So we can free it later. Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com> Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-03-13 07:35:39 +01:00
Tomeu Vizoso	bfbad30543	panfrost: Set bo->size[0] in the DRM backend So we can unmap it later. Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com> Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-03-13 07:35:25 +01:00
Kenneth Graunke	3570d15b6d	intel/fs: Fix opt_peephole_csel to not throw away saturates. We were not copying the saturate bit from the original instruction to the new replacement instruction. This caused major misrendering in DiRT Rally on iris, where comparisons leading to discards failed due to the missing saturate, causing lots of extra garbage pixels to be drawn in text rendering, trees, and so on. This did not show up on i965 because st/nir performs a more aggressive version of nir_opt_peephole_select, yielding more b32csel operations. Fixes: `52c7df1643` i965/fs: Merge CMP and SEL into CSEL on Gen8+ Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2019-03-12 20:11:55 -07:00
Jason Ekstrand	bd17bdc56b	glsl/lower_vector_derefs: Don't use a temporary for TCS outputs Tessellation control shader outputs act as if they have memory backing them and you can have multiple writes to different components of the same vector in-flight at the same time. When this happens, the load vec store pattern that gets used by ir_triop_vector_insert doesn't yield the correct results. Instead, just emit a sequence of conditional assignments. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Cc: mesa-stable@lists.freedesktop.org	2019-03-13 02:10:31 +00:00
Jason Ekstrand	20c4578c55	glsl/list: Add a list variant of insert_after Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-03-13 02:10:31 +00:00
Jason Ekstrand	83fdefc062	nir/loop_unroll: Fix out-of-bounds access handling The previous code was completely broken when it came to constructing the undef values. I'm not sure how it ever worked. For the case of a copy that reads an undefined value, we can just delete the copy because the destination is a valid undefined value. This saves us the effort of trying to construct a value for an arbitrary copy_deref intrinsic. Fixes: `e8a8937a04` "nir: add partial loop unrolling support" Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-03-12 21:06:39 -05:00
Jason Ekstrand	c056609c43	anv: Ignore VkRenderPassInputAttachmentAspectCreateInfo We don't care about the information but there's no sense in throwing a debug warning about it. It's harmless but annoying to users. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109984 Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>	2019-03-12 21:06:39 -05:00
Eric Anholt	486b181fd7	v3d: Fix leak of the renderonly struct on screen destruction. This makes v3d match vc4's destroy path. Fixes: `e113b21cb7` ("v3d: Add renderonly support.")	2019-03-12 16:15:40 -07:00
Eric Anholt	0c874c18cd	v3d: Fix leak of the mem_ctx after the DAG refactor. Noticed while trying to get a CTS run again. Fixes: `33886474d6` ("v3d: Use the DAG datastructure for QPU instruction scheduling.")	2019-03-12 16:15:40 -07:00
Grigori Goronzy	acfd88204e	glx: add support for GLX_ARB_create_context_no_error (v3) v2: Only reject no-error contexts for too-old GL if we're actually trying to create a no-error context (Adam Jackson) v3: Fix share contexts (Adam Jackson) Reviewed-by: Adam Jackson <ajax@redhat.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-03-12 19:12:21 -04:00
Samuel Pitoiset	ae77f12368	radv: set the maximum number of IBs per submit to 192 This fixes random SteamVR corruption, see https://github.com/ValveSoftware/SteamVR-for-Linux/issues/181 Fixes: `4d30f2c6f4` ("radv/winsys: remove the max IBs per submit limit for the fallback path") Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-03-12 22:15:45 +01:00
Danylo Piliaiev	9c80be956f	anv: Fix destroying descriptor sets when pool gets reset pool->next and pool->free_list were reset before their usage in anv_descriptor_pool_free_set Fixes: 775aabdd "anv: destroy descriptor sets when pool gets reset" Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-03-12 17:09:37 +00:00
Eric Anholt	ccce940947	v3d: Disable PIPE_CAP_BLIT_BASED_TEXTURE_TRANSFER. This reduces the runtime of dEQP-GLES3.functional.shaders.precision.* from 11.5s to 3.3s. This brings CTS runs down to 4 hours on one of my target devices.	2019-03-12 09:04:25 -07:00
Jason Ekstrand	6d5d89d25a	intel/nir: Vectorize all IO The IO scalarization pass that we run to help with linking ends up turning some shader I/O such as that for tessellation and geometry shaders into many scalar URB operations rather than one vector one. To alleviate this, we now vectorize the I/O once again. This fixes a 10% performance regression in the GfxBench tessellation test that was caused by scalarizing. Shader-db results on Kaby Lake: total instructions in shared programs: 15224023 -> 15220871 (-0.02%) instructions in affected programs: 342009 -> 338857 (-0.92%) helped: 1236 HURT: 443 total spills in shared programs: 23471 -> 23465 (-0.03%) spills in affected programs: 6 -> 0 helped: 1 HURT: 0 total fills in shared programs: 31770 -> 31766 (-0.01%) fills in affected programs: 4 -> 0 helped: 1 HURT: 0 Cycles was just a lot of churn due to moves being in different places. Most of the pure churn in instructions was +/- one or two instructions in fragment shaders. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107510 Fixes: `4434591bf5` "intel/nir: Call nir_lower_io_to_scalar_early" Fixes: `8d8222461f` "intel/nir: Enable nir_opt_find_array_copies" Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2019-03-12 15:34:06 +00:00
Jason Ekstrand	5ef2b8f1f2	nir: Add a pass for lowering IO back to vector when possible This pass tries to turn scalar and array-of-scalar IO variables into vector IO variables whenever possible. Reviewed-by: Connor Abbott <cwabbott0@gmail.com> Cc: "19.0" <mesa-stable@lists.freedesktop.org>	2019-03-12 15:34:06 +00:00
Rhys Perry	0f025bbccc	ac/nir: fix 16-bit ssbo stores Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-03-12 15:51:52 +01:00
pal1000	7f89fd17ed	scons: Compatibility with Scons development version string This ensures Mesa3D build doesn't fail in this case as encountered when bisecting Scons source code while regression testing https://bugs.freedesktop.org/show_bug.cgi?id=109443 and when testing 3.0.5.a.2 Technical details: Scons version string has consistently been in this format: MajorVersion.MinorVersion.Patch[.alpha/beta.yyyymmdd] so these formulas should strip alpha/beta flags and return Scons version: - as string - `'.'.join(SCons.__version__.split('.')[:3])` - as tuple of integers - `tuple(map(int, SCons.__version__.split('.')[:3]))` - v2: Fixed Scons version retrieval formulas as string and tuple of integers. - v3: Fixed Scons version string format description. Cc: "19.0" <mesa-stable@lists.freedesktop.org> Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2019-03-12 14:22:34 +00:00
Tapani Pälli	bef354321b	anv: revert "anv: release memory allocated by glsl types during spirv_to_nir" This reverts commit `47fc359822`. Reason is that patch did not take in to account situation where we might have both OpenGL and Vulkan using glsl_types at the same time. Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-03-12 14:12:36 +02:00
Connor Abbott	1bbe58c214	radeonsi/nir: Use nir stripping pass This reduces compilation time for my shader-db collection from around 40 seconds to 30, vs. 19 seconds for TGSI. There are still some shaders that TGSI caches but NIR doesn't, partly because of more aggressive cross-stage optimizations with NIR. Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-03-12 10:49:48 +01:00
Connor Abbott	5b2ec9c81e	nir: Add a stripping pass for improved cacheability Oftentimes various nir shaders after lowering will be the same, or almost the same. For example, this can happen when the same shader is linked with different shaders to form different pipelines and cross-stage optimizations don't kick in to change it. We want to avoid running the backend twice on these shaders. We were already doing this with radeonsi, but we were storing a few extra pieces of information that made this much less effective compared to TGSI. The worst offender by far was the program name, which caused most of the cache misses. This pass strips out these pieces of information, controlled by the NIR_STRIP debug env variable. Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-03-12 10:49:48 +01:00
Samuel Pitoiset	6403171843	radv: fix pointSizeRange limits The values should match the ones that are emitted. This fixes new CTS dEQP-VK.rasterization.primitive_size.points.*. Fixes: `f4e499ec79` ("radv: add initial non-conformant radv vulkan driver") Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-03-12 09:00:32 +01:00
Sagar Ghuge	bbef6c2d5f	iris: Flag fewer dirty bits in BLORP v2: 1) Skip flagging IRIS_DIRTY_DEPTH_BUFFER if BLORP_BATCH_NO_EMIT_DEPTH_STENCIL is set (Kenneth Graunke) 2) Add missing flags (Kenneth Graunke) Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-03-11 22:46:39 -07:00
Timothy Arceri	cb2898f478	st/glsl_to_nir: fix incorrect array access This fixes a segfault when we try to access the array using an index of -1 when the array wasn't allocated in the first place. Before `7536af670b` we would just access a pre-allocated array that was also load/stored to/from the shader cache. But now the cache will no longer allocate these arrays if they are empty. The change resulted in tests such as the following segfaulting when run with a warm shader cache. tests/spec/arb_arrays_of_arrays/execution/sampler/fs-struct-const-index.shader_test	2019-03-12 14:47:21 +11:00
Brian Paul	02c2863df5	nir: silence a couple new compiler warnings [33/630] Compiling C object 'src/compiler/nir/nir@sta/nir_loop_analyze.c.o'. ../src/compiler/nir/nir_loop_analyze.c: In function ‘try_find_trip_count_vars_in_iand’: ../src/compiler/nir/nir_loop_analyze.c:846:29: warning: suggest parentheses around ‘&&’ within ‘||’ [-Wparentheses] if (ind == NULL || ind && (ind)->type != basic_induction || ^ [85/630] Compiling C object 'src/compiler/nir/nir@sta/nir_opt_loop_unroll.c.o'. ../src/compiler/nir/nir_opt_loop_unroll.c: In function ‘complex_unroll_single_terminator’: ../src/compiler/nir/nir_opt_loop_unroll.c:494:17: warning: unused variable ‘unroll_loc’ [-Wunused-variable] nir_cf_node unroll_loc = ^ Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-03-12 14:34:51 +11:00
Alyssa Rosenzweig	587ad37e72	panfrost: Identify fragment_extra flags The fragment_extra structure contains additional fields extending the MRT framebuffer descriptor, snuck in between the main framebuffer descriptor and the render targets. Its fields include those related to transaction elimination and depth/stencil buffers. This patch identifies the flags field (previously just "unk" with some magic values) as well as identifying some (but not all) flags set by the driver. The process of identifying flags brought a bug to light where transaction elimination (checksumming) could not be enabled unless AFBC was in-use. This issue is now resolved. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>	2019-03-12 02:37:42 +00:00
Alyssa Rosenzweig	e57ea53acf	panfrost: Document "depth-buffer writeback" bit This bit, if set, causes the depth buffer to be copied from GPU tile memory to the provided depth buffer in main memory. If not set, the GPU will not access the main memory (saving considerable memory bandwidth if depth results are not actually used). Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-03-12 02:37:42 +00:00
Alyssa Rosenzweig	2df4537f91	panfrost: Support linear depth textures This combination has not yet been seen "in the wild" in traces, but to support linear depth FBOs, ~bruteforce reveals this bit pattern is necessary. It's not yet clear why the meanings of 0x1 and 0x2 are essentially flipped (tiled vs linear for colour, linear vs some sort of tiled for depth). Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>	2019-03-12 02:37:41 +00:00
Alyssa Rosenzweig	9f25a4e65c	panfrost: Allocate dedicated slab for linear BOs Previously, linear BOs shared memory with each other to minimize kernel round-trips / latency, as well as to work around a bug in the free_slab function. These concerns are invalid now, but continuing to use the slab allocator for BOs resulted in memory allocation errors. This issue was aggravated, though not introduced (so not a real regression) in the previous commit. v2 (unreviewed): Fix bug in v1 preventing munmaps from working Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>	2019-03-12 02:37:41 +00:00
Alyssa Rosenzweig	f9dc1ebc0d	panfrost: Determine framebuffer format bits late Again, these formats are only properly known at the time of fragment job emit. Rather than hardcoding the format, at least for MFBD we begin to construct the format bits on-demand. This cleans up the code, futureproofs for ES3 framebuffer formats, and should fix bugs regarding FBO colour swizzles. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>	2019-03-12 02:37:41 +00:00
Alyssa Rosenzweig	7ba18cdfa9	panfrost: Delay color buffer setup In an effort to clean up framebuffer management code, we delay colour buffer setup until the FRAGMENT job is actually emitted, allowing the AFBC and linear codepaths to be unified. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>	2019-03-12 02:37:41 +00:00
Alyssa Rosenzweig	536bcaa68f	panfrost: Combine has_afbc/tiled in layout enum AFBC, tiled, and linear BO layouts are mutually exclusive; they should be coupled via a single enum rather than ad hoc checks of booleans. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>	2019-03-12 02:37:41 +00:00
Alyssa Rosenzweig	d93c5c3148	panfrost: Cleanup needless if in create_bo I'm not sure why we were checking for these additional criteria (likely inherited from some other driver); remove the needless checks to clean up the code and perhaps fix some bugs down the line. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>	2019-03-12 02:37:41 +00:00
Kenneth Graunke	1467deb543	i965: Reimplement all the PIPE_CONTROL rules. This implements virtually all documented PIPE_CONTROL restrictions in a centralized helper. You now simply ask for the operations you want, and the pipe control "brain" will figure out exactly what pipe controls to emit to make that happen without tanking your system. The hope is that this will fix some intermittent flushing issues as well as GPU hangs. However, it also has a high risk of causing GPU hangs and other regressions, as this is a particularly sensitive area and poking the bear isn't always advisable. Mark Janes noted that this patch helps with some GPU hangs on Icelake. This does re-enable the VF Invalidate => Write Immediate workaround on Gen8, which had been disabled (bug 103787) due to GPU hangs. The old code did this workaround after another which would have added CS stall bits, so it missed a workaround. The new code orders them properly and appears to work. v4: Don't pass "bo, offset, imm" to a recursive CS stall (caught by Topi Pohjolainen), drop Gen10 workarounds that are unnecessary for production hardware. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2019-03-11 19:32:40 -07:00
Kenneth Graunke	c6af96d1bc	i965: Use genxml for emitting PIPE_CONTROL. While this does add a bunch of boilerplate, it also protects us against the hardware moving bits, or changing their meaning. For something as finicky as PIPE_CONTROL, the extra safety seems worth it. We turn PIPE_CONTROL_* into a bitfield of arbitrary flags, and then pack them appropriately. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2019-03-11 19:32:40 -07:00
Kenneth Graunke	2c6f712408	i965: Rename ISP_DIS to INDIRECT_STATE_POINTERS_DISABLE. Clearer name. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2019-03-11 19:32:40 -07:00
Kenneth Graunke	aa139f0980	i965: Move some genX infrastructure to genX_boilerplate.h. This will let us make multiple genX_*.c files, without copy and pasting all this boilerplate. Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2019-03-11 19:32:40 -07:00
Brian Paul	ecb708fada	gallium/winsys/kms: fix incomplete type compilation failure Fixes: ../src/gallium/winsys/sw/kms-dri/kms_dri_sw_winsys.c: In function ‘kms_sw_displaytarget_from_handle’: ../src/gallium/winsys/sw/kms-dri/kms_dri_sw_winsys.c:402:60: error: dereferencing pointer to incomplete type ‘const struct pipe_resource’ templ->format, ^ Reviewed-by: Mathias Fröhlich <Mathias.Froehlich@web.de>	2019-03-11 20:08:16 -06:00
Brian Paul	04544d852c	drisw: fix incomplete type compilation failure Fixes: ../src/gallium/winsys/sw/dri/dri_sw_winsys.c: In function ‘dri_sw_displaytarget_display’: ../src/gallium/winsys/sw/dri/dri_sw_winsys.c:255:39: error: dereferencing pointer to incomplete type ‘struct pipe_box’ offset = dri_sw_dt->stride * box->y; ^ Reviewed-by: Mathias Fröhlich <Mathias.Froehlich@web.de>	2019-03-11 20:08:16 -06:00
Brian Paul	45c6da5a48	docs: try to improve the Meson documentation (v2) Add new Introduction and Advanced Usage sections. Spell out a few more details, like "ninja install". Improve the layout around example commands. Fix grammatical errors and tighten up the text. Explain the --prefix option. v2: Remove language about 'ninja clean' and move link to Meson information about separate build directories earlier in the page. Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-03-11 20:08:16 -06:00
Brian Paul	187a527ed7	st/mesa: minor refactoring of texture/sampler delete code Rename st_texture_free_sampler_views() to st_delete_texture_sampler_views() to align with st_DeleteTextureObject(), its only caller. Move the call to st_texture_release_all_sampler_views() from st_DeleteTextureObject() to st_delete_texture_sampler_views() so all the sampler view clean-up code is in one place. Reviewed-by: Neha Bhende <bhenden@vmware.com>	2019-03-11 20:08:16 -06:00
Brian Paul	70a2ede112	st/mesa: rename st_texture_release_sampler_view() To st_texture_release_context_sampler_view() to be more clear that it's context-specific. Reviewed-by: Neha Bhende <bhenden@vmware.com>	2019-03-11 20:08:16 -06:00
Brian Paul	41adb3d6df	st/mesa: add/improve sampler view comments Reviewed-by: Neha Bhende <bhenden@vmware.com>	2019-03-11 20:08:16 -06:00
Brian Paul	c7d2504625	st/mesa: move around some code in st_context.c st_init_driver_functions() is only called in st_context.c so there's no need for the prototype in st_context.h To avoid a forward declaration of st_init_driver_functions() in st_context.c, we need to move around several other functions. No functional change. Reviewed-by: Neha Bhende <bhenden@vmware.com>	2019-03-11 20:08:16 -06:00
Brian Paul	b29d827f09	st/mesa: move utility functions, macros into new st_util.h file To de-clutter st_context.h. Clean up remaining function prototypes in st_context.h. The st_vp_uses_current_values() helper is only used in st_context.c so move it there. The st_get_active_states() function is only used in st_context.c so remove its prototype in st_context.h Reviewed-by: Neha Bhende <bhenden@vmware.com>	2019-03-11 20:08:16 -06:00
Juan A. Suarez Romero	775aabdd01	anv: destroy descriptor sets when pool gets reset As stated in Vulkan spec: "Resetting a descriptor pool recycles all of the resources from all of the descriptor sets allocated from the descriptor pool back to the descriptor pool, and the descriptor sets are implicitly freed." This fixes dEQP-VK.api.descriptor_pool.* Fixes: `14f6275c92` "anv/descriptor_set: add reference counting for..." Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Tested-by: Clayton Craft <clayton.a.craft@intel.com>	2019-03-11 20:40:31 -05:00
Timothy Arceri	3235a942c1	nir: find induction/limit vars in iand instructions This will be used to help find the trip count of loops that look like the following: while (a < x && i < 8) { ... i++; } Where the NIR will end up looking something like this: vec1 32 ssa_1 = load_const (0x00000004 /* 0.000000 */) loop { ... vec1 1 ssa_12 = ilt ssa_225, ssa_11 vec1 1 ssa_17 = ilt ssa_226, ssa_1 vec1 1 ssa_18 = iand ssa_12, ssa_17 vec1 1 ssa_19 = inot ssa_18 if ssa_19 { ... break } else { ... } } On RADV this unrolls a bunch of loops in F1-2017 shaders. Totals from affected shaders: SGPRS: 4112 -> 4136 (0.58 %) VGPRS: 4132 -> 4052 (-1.94 %) Spilled SGPRs: 0 -> 0 (0.00 %) Spilled VGPRs: 0 -> 0 (0.00 %) Private memory VGPRs: 0 -> 0 (0.00 %) Scratch size: 0 -> 0 (0.00 %) dwords per thread Code Size: 515444 -> 587720 (14.02 %) bytes LDS: 2 -> 2 (0.00 %) blocks Max Waves: 194 -> 196 (1.03 %) Wait states: 0 -> 0 (0.00 %) It also unrolls a couple of loops in shader-db on radeonsi. Totals from affected shaders: SGPRS: 128 -> 128 (0.00 %) VGPRS: 64 -> 64 (0.00 %) Spilled SGPRs: 0 -> 0 (0.00 %) Spilled VGPRs: 0 -> 0 (0.00 %) Private memory VGPRs: 0 -> 0 (0.00 %) Scratch size: 0 -> 0 (0.00 %) dwords per thread Code Size: 6880 -> 9504 (38.14 %) bytes LDS: 0 -> 0 (0.00 %) blocks Max Waves: 16 -> 16 (0.00 %) Wait states: 0 -> 0 (0.00 %) Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2019-03-12 00:52:30 +00:00
Timothy Arceri	67c3478482	nir: pass nir_op to calculate_iterations() Rather than getting this from the alu instruction this allows us some flexibility. In the following pass we instead pass the inverse op. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2019-03-12 00:52:30 +00:00
Timothy Arceri	11e8f8a166	nir: add get_induction_and_limit_vars() helper to loop analysis This helps make find_trip_count() a little easier to follow but will also be used by a following patch. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2019-03-12 00:52:30 +00:00
Timothy Arceri	f219f6114d	nir: add helper to return inversion op of a comparison This will be used to help find the trip count of loops that look like the following: while (a < x && i < 8) { ... i++; } Where the NIR will end up looking something like this: vec1 32 ssa_1 = load_const (0x00000004 /* 0.000000 */) loop { ... vec1 1 ssa_12 = ilt ssa_225, ssa_11 vec1 1 ssa_17 = ilt ssa_226, ssa_1 vec1 1 ssa_18 = iand ssa_12, ssa_17 vec1 1 ssa_19 = inot ssa_18 if ssa_19 { ... break } else { ... } } So in order to find the trip count we need to find the inverse of ilt. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2019-03-12 00:52:30 +00:00
Timothy Arceri	090feaacdc	nir: simplify the loop analysis trip count code a little Here we create a helper is_supported_terminator_condition() and use that rather than embedding all the trip count code inside a switch. The new helper will also be used in a following patch. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2019-03-12 00:52:30 +00:00
Timothy Arceri	7571de8eaa	nir: unroll some loops with a variable limit Some loops can have a single terminator but the exact trip count is still unknown. For example: for (int i = 0; i < imin(x, 4); i++) ... Shader-db results radeonsi (all affected are from Tropico 5): Totals from affected shaders: SGPRS: 144 -> 152 (5.56 %) VGPRS: 124 -> 108 (-12.90 %) Spilled SGPRs: 0 -> 0 (0.00 %) Spilled VGPRs: 0 -> 0 (0.00 %) Private memory VGPRs: 0 -> 0 (0.00 %) Scratch size: 0 -> 0 (0.00 %) dwords per thread Code Size: 5180 -> 6640 (28.19 %) bytes LDS: 0 -> 0 (0.00 %) blocks Max Waves: 17 -> 21 (23.53 %) Wait states: 0 -> 0 (0.00 %) Shader-db results i965 (SKL): total loops in shared programs: 3808 -> 3802 (-0.16%) loops in affected programs: 6 -> 0 helped: 6 HURT: 0 vkpipeline-db results RADV (Unrolls some Skyrim VR shaders): Totals from affected shaders: SGPRS: 304 -> 304 (0.00 %) VGPRS: 296 -> 292 (-1.35 %) Spilled SGPRs: 0 -> 0 (0.00 %) Spilled VGPRs: 0 -> 0 (0.00 %) Private memory VGPRs: 0 -> 0 (0.00 %) Scratch size: 0 -> 0 (0.00 %) dwords per thread Code Size: 15756 -> 25884 (64.28 %) bytes LDS: 0 -> 0 (0.00 %) blocks Max Waves: 29 -> 29 (0.00 %) Wait states: 0 -> 0 (0.00 %) v2: fix bug where last iteration would get optimised away by mistake. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2019-03-12 00:52:30 +00:00
Timothy Arceri	68ce0ec222	nir: calculate trip count for more loops This adds support to loop analysis for loops where the induction variable is compared to the result of min(variable, constant). For example: for (int i = 0; i < imin(x, 4); i++) ... We add a new bool to the loop terminator struct in order to differentiate terminators with this exit condition. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2019-03-12 00:52:30 +00:00
Timothy Arceri	e8a8937a04	nir: add partial loop unrolling support This adds partial loop unrolling support and makes use of a guessed trip count based on array access. The code is written so that we could use partial unrolling more generally, but for now it's only used when we have guessed the trip count. We use partial unrolling for this guessed trip count because it's possible any out of bounds array access doesn't otherwise affect the shader, e.g. the stores/loads to/from the array are unused. So we insert a copy of the loop in the innermost continue branch of the unrolled loop. Later on it's possible for nir_opt_dead_cf() to then remove the loop in some cases. A Renderdoc capture from the Rise of the Tomb Raider benchmark, reports the following change in an affected compute shader: GPU duration: 350 -> 325 microseconds shader-db results radeonsi VEGA (NIR backend): SGPRS: 1008 -> 816 (-19.05 %) VGPRS: 684 -> 432 (-36.84 %) Spilled SGPRs: 539 -> 0 (-100.00 %) Spilled VGPRs: 0 -> 0 (0.00 %) Private memory VGPRs: 0 -> 0 (0.00 %) Scratch size: 0 -> 0 (0.00 %) dwords per thread Code Size: 39708 -> 45812 (15.37 %) bytes LDS: 0 -> 0 (0.00 %) blocks Max Waves: 105 -> 144 (37.14 %) Wait states: 0 -> 0 (0.00 %) shader-db results i965 SKL: total instructions in shared programs: 13098265 -> 13103359 (0.04%) instructions in affected programs: 5126 -> 10220 (99.38%) helped: 0 HURT: 21 total cycles in shared programs: 332039949 -> 331985622 (-0.02%) cycles in affected programs: 289252 -> 234925 (-18.78%) helped: 12 HURT: 9 vkpipeline-db results VEGA: Totals from affected shaders: SGPRS: 184 -> 184 (0.00 %) VGPRS: 448 -> 448 (0.00 %) Spilled SGPRs: 0 -> 0 (0.00 %) Spilled VGPRs: 0 -> 0 (0.00 %) Private memory VGPRs: 0 -> 0 (0.00 %) Scratch size: 0 -> 0 (0.00 %) dwords per thread Code Size: 26076 -> 24428 (-6.32 %) bytes LDS: 6 -> 6 (0.00 %) blocks Max Waves: 5 -> 5 (0.00 %) Wait states: 0 -> 0 (0.00 %) Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2019-03-12 00:52:30 +00:00
Timothy Arceri	fba5d275db	nir: add new partially_unrolled bool to nir_loop In order to stop continuously partially unrolling the same loop we add the bool partially_unrolled to nir_loop. We add it here rather than in nir_loop_info because nir_loop_info is only set via loop analysis and is intended to be cleared before each analysis. Also nir_loop_info is never cloned. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2019-03-12 00:52:30 +00:00
Timothy Arceri	03a452b7d0	nir: add guess trip count support to loop analysis This detects an induction variable used as an array index to guess the trip count of the loop. This enables us to do a partial unroll of the loop, which can eventually result in the loop being eliminated. v2: check if the induction var is used to index more than a single array and if so get the size of the smallest array. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2019-03-12 00:52:30 +00:00
Tomeu Vizoso	97f2d04d5e	panfrost: Add support for PAN_MESA_DEBUG Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com> Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-03-12 00:30:27 +00:00
Tomeu Vizoso	f0b1bbebdd	panfrost/midgard: Add support for MIDGARD_MESA_DEBUG Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com> Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-03-12 00:30:27 +00:00
Xavier Bouchoux	c5236fc6e2	nir/spirv: Fix assert when unsampled OpTypeImage has unknown 'Depth' 'dxc' hlsl-to-spirv compiler appears to emit 2 (Unknown) in the depth field, when the image is not sampled and the value is not needed. Previously, shaders failed with: SPIR-V parsing FAILED: In file ../src/compiler/spirv/spirv_to_nir.c:1412 !is_shadow 632 bytes into the SPIR-V binary Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-03-11 23:28:39 +01:00
Kenneth Graunke	d75f84cb65	iris: Fix write enable in pinning of depth/stencil resources We may bind new Z/S buffers (which come via the framebuffer CSO, triggering IRIS_DIRTY_DEPTH_BUFFER), but with writes disabled. The next draw may enable Z or S writes (which come via the ZSA CSO, triggering IRIS_DIRTY_WM_DEPTH_STENCIL), which requires us to update our pin to have the write flag. So, update pinning if either dirty flag changes. To clarify, pass cso_zsa to the pinning function rather than pulling the random values out of ice->state, which unfortunately have to exist for the resolve code since iris_depth_stencil_alpha_state only exists in iris_state.c.	2019-03-11 15:04:08 -07:00
Kenneth Graunke	863e810a19	iris: Refactor depth/stencil buffer pinning into a helper. This avoids the code duplication that caused me to put things in the wrong place in the previous commit. One used to have extra flushes, but we moved those out so now these are identical and can be easily shared.	2019-03-11 15:04:08 -07:00
Kenneth Graunke	9302414f8b	iris: Move depth/stencil flushes so they actually do something Commit `d6dd57d43c` (iris: Add missing depth cache flushes) added the depth/stencil flushes to the wrong place. I meant to add them to the iris_upload_dirty_render_state code that emits the packets, but I accidentally added them to the nearly identical looking code in iris_restore_render_saved_bos. This meant we missed the actual flushing at draw time, but instead did pointless flushing on the first draw in a batch where things are already flushed anyway. This commit moves them to iris_resolve.c, next to the depth prepares, similar to what we do for color buffers. i965 does them elsewhere, but I'm not sure why - this seems like the most consistent place.	2019-03-11 15:04:08 -07:00
Christian Gmeiner	076a7095bb	st/dri: allow direct UYVY import Push this format to the pipe driver unchanged. Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-03-11 22:19:11 +01:00
Kenneth Graunke	04ff2e3fbb	iris: Fix TES gl_PatchVerticesIn handling. 1. If we switch the TCS for one with a different number of output vertices, then the TES's gl_PatchVerticesIn value will change. We need to re-upload in this case. For now, re-emit constants whenever the TCS/TES are swapped out. 2. If there is no TCS, then we can't grab gl_PatchVerticesIn from the TCS info. Since it's a passthrough, we can just use the primitive's patch count (like the TCS gl_PatchVerticesIn does). Fixes KHR-GL45.tessellation_shader.single.max_patch_vertices and KHR-GL45.tessellation_shader.tessellation_control_to_tessellation_evaluation.gl_PatchVerticesIn. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-03-11 14:07:16 -07:00
Kenneth Graunke	2f51cb5e67	iris: Rework default tessellation level uploads Now that we've added a system value uploading mechanism, we may as well reuse the same system for default tessellation levels. This simplifies the state upload code a bit. Also fixes: KHR-GL45.tessellation_shader.tessellation_control_to_tessellation_evaluation.gl_tessLevel Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-03-11 14:07:12 -07:00
Timur Kristóf	fd5075e059	iris: Face should be a system value. This patch adds PIPE_CAP_TGSI_FS_FACE_IS_INTEGER_SYSVAL which despite its name is not a TGSI-specific capability, just lets the state tracker know that it should generate a system value for FACE. This is needed if we want to run tgsi_to_nir on iris. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-03-11 14:02:40 -07:00
Eric Anholt	3a9e2d6085	vc4: Switch the post-RA scheduler over to the DAG datastructure. Just a small code reduction from shared infrastructure.	2019-03-11 13:14:37 -07:00
Eric Anholt	33886474d6	v3d: Use the DAG datastructure for QPU instruction scheduling. Just a small code reduction from shared infrastructure.	2019-03-11 13:14:32 -07:00
Eric Anholt	d6d83b34ee	vc4: Reuse list_for_each_entry_rev().	2019-03-11 13:14:32 -07:00
Eric Anholt	7c01ddbf7f	v3d: Reuse list_for_each_entry_rev().	2019-03-11 13:14:32 -07:00
Eric Anholt	7a727c1a12	vc4: Switch over to using the DAG datastructure for QIR scheduling. Just a small code reduction from shared infrastructure.	2019-03-11 13:14:18 -07:00
Eric Anholt	0533d2d95c	util: Add a DAG datastructure. I keep writing this for various schedulers. Acked-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-03-11 13:13:52 -07:00
Kristian H. Kristensen	5f0a922c27	freedreno/a6xx: Remove extra parens There's a warning about this now. Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>	2019-03-11 11:37:53 -07:00
Kristian H. Kristensen	08c452bef7	freedreno: Use c_vis_args and no_override_init_args Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>	2019-03-11 11:37:53 -07:00
Chia-I Wu	24af64baa5	turnip: preliminary support for Wayland WSI	2019-03-11 10:02:13 -07:00
Chia-I Wu	ae82b5df88	turnip: preliminary support for tu_GetImageSubresourceLayout	2019-03-11 10:02:13 -07:00
Chad Versace	6cb5fd0d71	turnip: Use Vulkan 1.1 names instead of KHR That is, drop KHR from all tokens that were promoted to Vulkan 1.1. The consistency makes ctags more useful (it now jumps directly to the real definitions in vulkan_core.h instead of the typedefs); and it makes the code slightly less verbose.	2019-03-11 10:02:13 -07:00
Chia-I Wu	4f863dc0f7	turnip: guard -Dvulkan-driver=freedreno Require -DI-love-half-baked-turnips=true as well to enable freedreno vulkan driver.	2019-03-11 10:02:13 -07:00
Chia-I Wu	949ce2745d	turnip: preliminary support for tu_CmdDraw	2019-03-11 10:02:13 -07:00
Chia-I Wu	f9b34622cd	turnip: preliminary support for draw state binding This adds support for tu_CmdBindPipeline, tu_CmdBindVertexBuffers, etc.	2019-03-11 10:02:13 -07:00
Chia-I Wu	54b7a57c22	turnip: add draw_cs to tu_cmd_buffer It will hold draw commands.	2019-03-11 10:02:13 -07:00
Chia-I Wu	1cdbab016e	turnip: parse VkPipelineVertexInputStateCreateInfo	2019-03-11 10:02:13 -07:00
Chia-I Wu	d17096b9b1	turnip: parse VkPipelineShaderStageCreateInfo	2019-03-11 10:02:13 -07:00
Chia-I Wu	a7d842c97c	turnip: compile VkPipelineShaderStageCreateInfo Compile all shaders and upload the binaries to a BO.	2019-03-11 10:02:13 -07:00
Chia-I Wu	970a8fec96	turnip: preliminary support for shader modules Save SPIR-V in tu_shader_module. Translation to NIR happens in tu_shader_create, and compilation to binary code happens in tu_shader_compile. Both will be called during pipeline creation.	2019-03-11 10:02:13 -07:00
Chia-I Wu	9e0d878787	turnip: parse VkPipeline{Multisample,ColorBlend}StateCreateInfo	2019-03-11 10:02:13 -07:00
Chia-I Wu	bec0abf294	turnip: parse VkPipelineDepthStencilStateCreateInfo	2019-03-11 10:02:13 -07:00
Chia-I Wu	9496b377ff	turnip: parse VkPipelineRasterizationStateCreateInfo	2019-03-11 10:02:13 -07:00
Chia-I Wu	b4884761e8	turnip: parse VkPipelineViewportStateCreateInfo	2019-03-11 10:02:13 -07:00
Chia-I Wu	1bea6a91cb	turnip: parse VkPipelineInputAssemblyStateCreateInfo	2019-03-11 10:02:13 -07:00
Chia-I Wu	c584c2e86c	turnip: parse VkPipelineDynamicStateCreateInfo	2019-03-11 10:02:13 -07:00
Chia-I Wu	df48cb7b3e	turnip: create a less dummy pipeline Still dummy, but at least it is created from tu_pipeline_builder.	2019-03-11 10:02:13 -07:00
Chia-I Wu	57327626dc	turnip: simplify tu_cs sub-streams usage Let tu_cs_begin_sub_stream imply tu_cs_reserve_space, and tu_cs_end_sub_stream imply tu_cs_sanity_check. Callers are no longer required to call them (but can still do so if they choose to).	2019-03-11 10:02:13 -07:00
Chia-I Wu	59419bb691	turnip: fix tu_cs sub-streams Update cs->start in tu_cs_end_sub_stream. Otherwise, the entry would include commands from all prior sub-streams.	2019-03-11 10:02:13 -07:00
Chia-I Wu	c0567e84db	turnip: tu_cs_emit_array Array version of tu_cs_emit. Useful for updating multiple consecutive array-like registers, or loading a shader binary with SS6_DIRECT.	2019-03-11 10:02:13 -07:00
Chia-I Wu	fffaa9b4b3	turnip: add tu_cs_discard_entries We will start a draw IB at the beginning of a subpass and consume it at the end of the subpass. With tu_cs_discard_entries, we can reuse the same tu_cs for all subpasses.	2019-03-11 10:02:13 -07:00
Chia-I Wu	10c5013442	turnip: more/better asserts for tu_cs Asserting (cur < end) in tu_cs_emit catches far fewer programming errors than asserting (cur < reserved_end). We should never write more commands than what we have reserved. Assert IB is non-empty and sane in tu_cs_emit_ib.	2019-03-11 10:02:13 -07:00
Chia-I Wu	aa7dd6cb7f	turnip: use 32-bit offset in tu_cs_entry We don't support nor expect BOs to be that big in tu_cs.	2019-03-11 10:02:13 -07:00
Chia-I Wu	b8a5e10d0d	turnip: mark IBs for dumping Includes IBs in kernel cmdbuf dumps.	2019-03-11 10:02:13 -07:00
Eric Engestrom	4a48dd9fb8	turnip: use the platform defines in vk.xml instead of hard-coding them Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>	2019-03-11 10:02:13 -07:00
Bas Nieuwenhuizen	0d12bcbfa7	turnip: Add todo for copies.	2019-03-11 10:02:13 -07:00
Bas Nieuwenhuizen	51115e7201	turnip: Add buffer->image DMA copies. Passes dEQP-VK.api.copy_and_blit.core.buffer_to_image.*	2019-03-11 10:02:13 -07:00
Bas Nieuwenhuizen	6616563472	turnip: Add image->buffer DMA copies. Passes dEQP-VK.api.copy_and_blit.core.image_to_buffer.*	2019-03-11 10:02:13 -07:00
Bas Nieuwenhuizen	d76a1e2aa1	turnip: Implement buffer->buffer DMA copies. Passes dEQP-VK.api.copy_and_blit.core.buffer_to_buffer.*	2019-03-11 10:02:13 -07:00
Bas Nieuwenhuizen	bafbf3bafe	turnip: Add tu6_rb_fmt_to_ifmt.	2019-03-11 10:02:13 -07:00
Bas Nieuwenhuizen	148876d424	turnip: Make tu6_emit_event_write shared.	2019-03-11 10:02:13 -07:00
Bas Nieuwenhuizen	7238471587	turnip: Add buffer memory binding.	2019-03-11 10:02:13 -07:00
Chia-I Wu	08b1c3fc7f	turnip: respect color attachment formats Make tu6_get_native_format available to tu_cmd_buffer and start using it.	2019-03-11 10:02:13 -07:00
Chia-I Wu	68c27ea92b	turnip: preliminary support for fences This should be quite complete feature-wise. External fences are still missing. We probably also want to add a simpler path to tu_WaitForFences for when fenceCount == 1.	2019-03-11 10:02:13 -07:00
Chia-I Wu	15319963fa	turnip: fix VkClearValue packing Add tu_pack_clear_value to correctly pack VkClearValue according to VkFormat. It ignores the component order defined by VkFormat, and always packs to WZYX order.	2019-03-11 10:02:13 -07:00
Chia-I Wu	6545461041	turnip: add support for VK_KHR_external_memory_{fd,dma_buf}	2019-03-11 10:02:13 -07:00
Chia-I Wu	6d1c4049de	turnip: advertise VK_KHR_external_memory AFAICT, it is supported. We don't need to handle any of the new structs because our BOs can always be exported.	2019-03-11 10:02:13 -07:00
Chia-I Wu	0253845272	turnip: advertise VK_KHR_external_memory_capabilities AFAICT, it is supported.	2019-03-11 10:02:13 -07:00
Chia-I Wu	de89436216	turnip: add functions to import/export prime fd Add tu_bo_init_dmabuf, tu_bo_export_dmabuf, tu_gem_import_dmabuf, and tu_gem_export_dmabuf.	2019-03-11 10:02:13 -07:00
Chad Versace	d5239bc59c	turnip: Fix error behavior for VkPhysicalDeviceExternalImageFormatInfo If the handle type is unsupported, then the spec requires us to return VK_ERROR_FORMAT_NOT_SUPPORTED. Reviewed-by: Chia-I Wu <olvaffe@gmail.com> Closes: https://gitlab.freedesktop.org/bnieuwenhuizen/mesa/merge_requests/17	2019-03-11 10:02:13 -07:00
Chia-I Wu	4b9f967cd1	turnip: add a more complete format table A format table is an array of tu_native_format. Table lookup is done through array indexing. This commit defines a single format table for core VkFormat. It is derived from the table in the gallium driver. There might be errors introduced in the process of the conversion. When an extension that defines new VkFormat is supported, we need to add a new table for the extension.	2019-03-11 10:02:13 -07:00
Chia-I Wu	f3bf779184	turnip: preliminary support for loadOp and storeOp - create tile_load_ib and tile_store_ib at the beginning of each subpass - execute the IBs at the end of each subpass - no DONT_CARE support - no subpass dependency analysis and subpass merging - no zs support - no true VkImageView support - assume VK_FORMAT_B8G8R8A8_UNORM - no tiling - no MSAA This also removes cur_cs from tu_cmd_buffer.	2019-03-11 10:02:13 -07:00
Chia-I Wu	0aeef7c8bd	turnip: add TU_CS_MODE_SUB_STREAM When in TU_CS_MODE_SUB_STREAM, tu_cs_begin_sub_stream (or tu_cs_end_sub_stream) should be called instead of tu_cs_begin (or tu_cs_end). It gives the caller a TU_CS_MODE_EXTERNAL cs to emit commands to.	2019-03-11 10:02:13 -07:00
Chia-I Wu	f59c381423	turnip: add tu_cs_mode Add tu_cs_mode and TU_CS_MODE_EXTERNAL. When in TU_CS_MODE_EXTERNAL, tu_cs wraps an external buffer and can not grow. This also moves tu_cs* up in tu_private.h, such that other structs can embed tu_cs_entry.	2019-03-11 10:02:13 -07:00
Chia-I Wu	5c63fc626f	turnip: provide both emit_ib and emit_call tu_cs_emit_ib emits a CP_INDIRECT_BUFFER for a BO. tu_cs_emit_call emits a CP_INDIRECT_BUFFER for each entry of a target cs.	2019-03-11 10:02:13 -07:00
Chia-I Wu	741a4325df	turnip: add tu_cs_sanity_check It replaces tu_cs_reserve_space_assert and can be called at any time to sanity check tu_cs.	2019-03-11 10:02:13 -07:00
Chia-I Wu	29f1110003	turnip: never fail tu_cs_begin/tu_cs_end Error checking tu_cs_begin/tu_cs_end is too tedious for the callers. Move tu_cs_add_bo and tu_cs_reserve_entry to tu_cs_reserve_space such that tu_cs_begin/tu_cs_end never fails.	2019-03-11 10:02:13 -07:00
Chia-I Wu	0d81be3959	turnip: specify initial size in tu_cs_init We will drop size parameter from tu_cs_begin shortly, such that tu_cs_begin never fails.	2019-03-11 10:02:13 -07:00
Chia-I Wu	2774a1b97d	turnip: add tu_cs_{reserve,add}_entry We will stop calling tu_cs_reserve_entry in tu_cs_end shortly, such that tu_cs_end never fails.	2019-03-11 10:02:13 -07:00
Chia-I Wu	c11580373f	turnip: add internal helpers for tu_cs Add tu_cs_get_offset, tu_cs_get_size, tu_cs_get_space, and tu_cs_is_empty.	2019-03-11 10:02:13 -07:00
Chia-I Wu	429e2d5755	turnip: add tu_tiling_config We need the current color/depth/stencil attachments and the current render area to compute the tiling config. We compute the tiling config at the beginning of each subpass for the moment. We should change that when the driver can reorder/merge subpasses. It is very common that the render area is the entire framebuffer. We might want to optimize for the case and compute the tiling config in tu_framebuffer ctor.	2019-03-11 10:02:13 -07:00
Chia-I Wu	7c4483de0e	turnip: preliminary support for tu_GetRenderAreaGranularity Set it to tile alignments, 32x32 on 6xx.	2019-03-11 10:02:13 -07:00
Chia-I Wu	9c83a7572b	turnip: emit HW init in tu_BeginCommandBuffer Being the first commit that emits meaningful command packets, there are many things included in this commit - tu6_emit_xxx are low-level helpers that emit command packets without boundary checks - tu6_xxx are high-level helpers that emit command packets with boundary checks - cmdbuf->cs is a pointer to the current CS, so that we can use the helpers above to emit to other CS - use cmd as the variable name of tu_cmd_buffer - there is a per-cmdbuf scratch bo for CP_EVENT_WRITE writeback - there is a per-cmdbuf debug marker, using scratch reg 7 or 6 depending on whether the cmdbuf is primary or secondary (olv, after rebase) REG_A6XX_SP_UNKNOWN_AB20 is renamed	2019-03-11 10:01:49 -07:00
Chia-I Wu	3b3af6321b	turnip: add tu_cs_reserve_space(_assert) They are used like tu_cs_reserve_space(...); tu_cs_emit(...); ...; tu_cs_reserve_space_assert(); to make sure we reserved enough space at the beginning.	2019-03-11 10:01:41 -07:00
Chad Versace	aaa59ef70c	turnip: Annotate vkGetImageSubresourceLayout with tu_stub Reviewed-by: Chia-I Wu <olvaffe@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-03-11 10:01:41 -07:00
Chia-I Wu	ba6bcb387c	turnip: preliminary support for tu_CmdBeginRenderPass	2019-03-11 10:01:41 -07:00
Chia-I Wu	1085df8176	turnip: preliminary support for tu_image_view_init	2019-03-11 10:01:41 -07:00
Chia-I Wu	992ecdd40e	turnip: preliminary support for tu_BindImageMemory2	2019-03-11 10:01:41 -07:00
Chia-I Wu	ef49b07b83	turnip: add cmdbuf->bo_list to bo_list in queue submit	2019-03-11 10:01:41 -07:00
Chia-I Wu	6c4df43db5	turnip: add tu_bo_list_merge tu_bo_list_merge adds an entire list to the current list.	2019-03-11 10:01:41 -07:00
Chia-I Wu	7ad01913bd	turnip: build drm_msm_gem_submit_bo array directly Build drm_msm_gem_submit_bo array directly in tu_bo_list. We might change this again, but this is good enough for now. There are other issues as well, such as not using VkAllocationCallbacks and sloppy error checking. We should revisit this in the near future. Same to tu_cs.	2019-03-11 10:01:41 -07:00
Chia-I Wu	c969d8b975	turnip: add more tu_cs helpers	2019-03-11 10:01:41 -07:00
Chia-I Wu	39ba2b20d1	turnip: inline tu_cs_check_space This allows the fast path (size check) to be inlined.	2019-03-11 10:01:41 -07:00
Chia-I Wu	2bcaa78236	turnip: update cs->start in tu_cs_end This allows us to assert that there is no dangling command in tu_cs_begin, rather than discarding them silently.	2019-03-11 10:01:41 -07:00
Chia-I Wu	b01d1618a4	turnip: minor cleanup to tu_cs_end Add comments and error checking.	2019-03-11 10:01:41 -07:00
Chia-I Wu	af4eb20891	turnip: add tu_cs_add_bo Refactor BO allocation code out of tu_cs_begin. Add error checking.	2019-03-11 10:01:41 -07:00
Chia-I Wu	ae9a72b48b	turnip: document tu_cs	2019-03-11 10:01:41 -07:00
Chia-I Wu	45120127ea	turnip: run sed and clang-format on tu_cs	2019-03-11 10:01:41 -07:00
Kristian H. Kristensen	0801019d33	turnip: Only get bo offset when we need to mmap The offset we get from MSM_INFO_GET_OFFSET is an offset into the drm fd for the purpose of mmaping the buffer.	2019-03-11 10:01:41 -07:00
Bas Nieuwenhuizen	23d6f0f970	turnip: Move stream functions to tu_cs.c	2019-03-11 10:01:41 -07:00
Bas Nieuwenhuizen	ac2a845abf	turnip: Add emit functions in a header. This adds radv-style check_space functions + emit functions. Also puts them in a header as a bunch of inlines, so (1) we can use them from meta code. (2) they are inline for performance as these are common and small. Did not put them in tu_private.h as a bunch of inlines only clutters up that huge header file. Precise error propagation for memory allocation failures is still todo.	2019-03-11 10:01:41 -07:00
Chia-I Wu	2e684cb800	turnip: preliminary support for tu_QueueWaitIdle This creates a new fd on each queue submit. I do not go with DRM_IOCTL_MSM_WAIT_FENCE solely because the path is marked legacy. Otherwise, we can use the fence id rather than requesting a fence fd until external fences are supported and enabled.	2019-03-11 10:01:41 -07:00
Chia-I Wu	b7a6a80e6c	turnip: constify tu_device in tu_gem_*	2019-03-11 10:01:41 -07:00
Chia-I Wu	3809e6cf63	turnip: add wrappers around DRM_MSM_SUBMITQUEUE_* Add tu_drm_submitqueue_new and tu_drm_submitqueue_close.	2019-03-11 10:01:41 -07:00
Chia-I Wu	fcf24f47aa	turnip: add wrappers around DRM_MSM_GET_PARAM Add tu_drm_get_gpu_id and tu_drm_get_gmem_size.	2019-03-11 10:01:41 -07:00
Chia-I Wu	a25a803127	turnip: remove unnecessary libfreedreno_drm dep Remove libfreedreno_drm dep and unused fd_device.	2019-03-11 10:01:41 -07:00
Chia-I Wu	91232c52fe	turnip: use msm_drm.h from inc_freedreno The recent change to msm_drm.h changed the APIs in an incompatible way.	2019-03-11 10:01:41 -07:00
Bas Nieuwenhuizen	4f32869e3d	turnip: Shorten primary_cmd_stream name. It really is too long.	2019-03-11 10:01:41 -07:00
Bas Nieuwenhuizen	26261847cf	turnip: Fill command buffer	2019-03-11 10:01:41 -07:00
Bas Nieuwenhuizen	abe352525d	turnip: Implement submission.	2019-03-11 10:01:41 -07:00
Bas Nieuwenhuizen	abf0792bbe	turnip: Make bo_list functions not static	2019-03-11 10:01:41 -07:00
Bas Nieuwenhuizen	65e0e79054	turnip: Add msm queue support.	2019-03-11 10:01:41 -07:00
Bas Nieuwenhuizen	8713499657	turnip: Add a command stream.	2019-03-11 10:01:41 -07:00
Bas Nieuwenhuizen	e3a9b07923	turnip: Implement a slow bo list	2019-03-11 10:01:41 -07:00
Bas Nieuwenhuizen	48b65201a6	turnip: Implement some UUIDs.	2019-03-11 10:01:41 -07:00
Bas Nieuwenhuizen	7ae005f037	turnip: clean up TODO. ./deqp-vk -n dEQP-VK.info.* Writing test log into TestResults.qpa dEQP Core unknown (0xcafebabe) starting.. target implementation = 'Surfaceless' WARNING: tu is not a conformant vulkan implementation, testing use only. WARNING: tu is not a conformant vulkan implementation, testing use only. Test case 'dEQP-VK.info.build'.. Pass (Not validated) Test case 'dEQP-VK.info.device'.. Pass (Not validated) Test case 'dEQP-VK.info.platform'.. Pass (Not validated) Test case 'dEQP-VK.info.memory_limits'.. Pass (Pass) DONE! Test run totals: Passed: 4/4 (100.0%) Failed: 0/4 (0.0%) Not supported: 0/4 (0.0%) Warnings: 0/4 (0.0%)	2019-03-11 10:01:41 -07:00
Bas Nieuwenhuizen	06602bf77f	turnip: Remove some radv leftovers.	2019-03-11 10:01:41 -07:00
Bas Nieuwenhuizen	c72e6085e7	turnip: Implement some format properties for RGBA8. Just to get some tests to not skip. This is neither complete nor completely correct.	2019-03-11 10:01:41 -07:00
Chia-I Wu	d30baaaba6	turnip: add .clang-format Add and apply .clang-format.	2019-03-11 10:01:41 -07:00
Bas Nieuwenhuizen	6401ad389e	turnip: Implement pipe-less param query.	2019-03-11 10:01:41 -07:00
Bas Nieuwenhuizen	b0562e272f	turnip: move tu_gem.c to tu_drm.c	2019-03-11 10:01:41 -07:00
Bas Nieuwenhuizen	3d99dd55a0	turnip: Stop hardcoding the msm version check.	2019-03-11 10:01:41 -07:00
Bas Nieuwenhuizen	d9c3dc8ec8	turnip: Add image layout calculations.	2019-03-11 10:01:41 -07:00
Bas Nieuwenhuizen	603354cffa	turnip: Fix memory mapping.	2019-03-11 10:01:41 -07:00
Bas Nieuwenhuizen	426f6e46a8	turnip: Fix bo allocation after we stopped using libdrm_freedreno ... All this figuring out new errors is why I don't like reinventing the wheel.	2019-03-11 10:01:41 -07:00
Bas Nieuwenhuizen	f0a24e123f	turnip: Add 630 to the list.	2019-03-11 10:01:41 -07:00
Chad Versace	c3b5eea2cc	turnip: Don't return from tu_stub funcs Since the macros are lowercase and look like normal functions, that they change control flow with a hidden return is surprising.	2019-03-11 10:01:41 -07:00
Chad Versace	bf709dfe3f	turnip: Fix 'unused' warnings Now turnip builds without warnings on my machine.	2019-03-11 10:01:41 -07:00
Chad Versace	471f2d8409	turnip: Add TODO file	2019-03-11 10:01:41 -07:00
Chad Versace	359e9016c5	turnip: Replace fd_bo with tu_bo (olv, after rebase) remove inc_drm_uapi	2019-03-11 10:01:33 -07:00
Chad Versace	eb16ec715f	turnip: Use vk_errorf() for initialization error messages This small cleanup better prepares turnip for VK_EXT_debug_report.	2019-03-11 10:01:33 -07:00
Chad Versace	1372c95ad2	turnip: Add TODO for Android logging	2019-03-11 10:01:33 -07:00
Chad Versace	cca208a033	turnip: Require DRM device version >= 1.3 Because the driver will require support for iova.	2019-03-11 10:01:33 -07:00
Chad Versace	5486943ed9	turnip: Fix indentation	2019-03-11 10:01:33 -07:00
Chad Versace	99a5de14cb	turnip: Fix a real -Wmaybe-uninitialized	2019-03-11 10:01:33 -07:00
Chad Versace	75f2c8458b	turnip: Use vk_outarray in all relevant public functions	2019-03-11 10:01:33 -07:00
Chad Versace	3ec87d56bd	turnip: Fix result of vkEnumerate*ExtensionProperties Given an unsupported layer name, the functions must return VK_ERROR_LAYER_NOT_PRESENT.	2019-03-11 10:01:33 -07:00
Chad Versace	ee835c7790	turnip: Fix result of vkEnumerateLayerProperties The functions must not return VK_ERROR_LAYER_NOT_PRESENT. The spec reserves that error for vkEnumerateExtensionProperties.	2019-03-11 10:01:33 -07:00
Chad Versace	daffb01704	turnip: Fix indentation in function signatures Due to s/anv/tu/, in many function signatures the indentation of parameters was off-by-one.	2019-03-11 10:01:33 -07:00
Bas Nieuwenhuizen	b4f3e0d549	turnip: Disable more features.	2019-03-11 10:01:33 -07:00
Bas Nieuwenhuizen	a01edd9c86	turnip: Initialize memory type in requirements.	2019-03-11 10:01:33 -07:00
Bas Nieuwenhuizen	7be2e1fc37	turnip: Cargo cult the Intel heap size functionality.	2019-03-11 10:01:33 -07:00
Bas Nieuwenhuizen	462b693d94	turnip: Report a memory type and heap.	2019-03-11 10:01:33 -07:00
Bas Nieuwenhuizen	8e52e8183c	turnip: Add buffer allocation & mapping support.	2019-03-11 10:01:33 -07:00
Bas Nieuwenhuizen	a0d62e4337	turnip: Fix newly introduced warning.	2019-03-11 10:01:33 -07:00
Bas Nieuwenhuizen	bcd15ab34e	turnip: Remove abort.	2019-03-11 10:01:33 -07:00
Bas Nieuwenhuizen	13ff7ffbcb	turnip: Gather some device info.	2019-03-11 10:01:33 -07:00
Bas Nieuwenhuizen	7922d50bd4	turnip: Fix up detection of device.	2019-03-11 10:01:33 -07:00
Chad Versace	c63cb15745	turnip: Drop Makefile.am and Android.mk The Makefile.am doesn't work. I tried fixing it but gave up because I don't understand Autotools. I strongly suspect the Android.mk also doesn't work. Rather than maintain the broken build files, let's delete them and re-add working build files if-and-when we need them. (Maybe we'll be lucky and turnip will never need to support Autotools!).	2019-03-11 10:01:33 -07:00
Bas Nieuwenhuizen	26380b3a9f	turnip: Add driver skeleton (v2) meson files have been updated, autotools and android still need updating. Only build tested. v2 (chadv): - Rebase onto master. - Fix build breakage in Python scripts. - Drop the WSI code. The internal WSI apis have changed recently, and will likely change again before the driver goes upstream. To avoid unnecessary rebase work, let's drop the WSI code and re-add it when we're ready to really use WSI. (olv, after rebase) do not enable freedreno by default on ARM	2019-03-11 10:01:15 -07:00
Connor Abbott	d086d16b81	nir/serialize: Prevent writing uninitialized state_slot data The nir_state_slot struct had some padding that was never initialized. Serializing the individual parts of the struct is more robust and avoids the overhead of zeroing it at creation, so just do that. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-03-11 15:17:41 +01:00
Tapani Pälli	47fc359822	anv: release memory allocated by glsl types during spirv_to_nir Fixes leaks for each glsl_type generated: ==32470== 384 bytes in 3 blocks are possibly lost in loss record 18 of 18 ==32470== at 0x483880B: malloc (vg_replace_malloc.c:309) ==32470== by 0x4C43F4A: ralloc_size (ralloc.c:119) ==32470== by 0x4C44014: rzalloc_size (ralloc.c:151) ==32470== by 0x4C44258: rzalloc_array_size (ralloc.c:215) ==32470== by 0x4D38957: glsl_type::glsl_type(glsl_struct_field const*, unsigned int, char const*) (glsl_types.cpp:114) ==32470== by 0x4D3BEED: glsl_type::get_struct_instance(glsl_struct_field const*, unsigned int, char const*) (glsl_types.cpp:1146) ==32470== by 0x4D42ECC: glsl_struct_type (nir_types.cpp:501) ==32470== by 0x4CDB5A1: vtn_handle_type (spirv_to_nir.c:1269) ==32470== by 0x4CE53DD: vtn_handle_variable_or_type_instruction (spirv_to_nir.c:4018) ==32470== by 0x4CD8CFF: vtn_foreach_instruction (spirv_to_nir.c:365) ==32470== by 0x4CE5E6B: spirv_to_nir (spirv_to_nir.c:4490) ==32470== by 0x497AF10: anv_shader_compile_to_nir (anv_pipeline.c:173) v2: move release call to vkDestroyInstance Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-03-11 13:14:45 +02:00
Eric Engestrom	f9a6460bbf	wsi/x11: use WSI_FROM_HANDLE() instead of pointer casts Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2019-03-11 10:49:36 +00:00
Eric Engestrom	f2e24dd81d	wsi/wayland: fix pointer casting warning on 32bit Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2019-03-11 10:49:36 +00:00
Eric Engestrom	687babc045	wsi/display: s/#if/#ifdef/ to fix -Wundef Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2019-03-11 10:49:36 +00:00
Eric Engestrom	1ee01d91c7	wsi: deduplicate get_current_time() functions between display and x11 Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2019-03-11 10:49:36 +00:00
Tapani Pälli	7bb34ecff9	anv: release memory allocated by bo_heap when descriptor pool is destroyed Fixes following leak: ==21853== 32 bytes in 1 blocks are definitely lost in loss record 2 of 20 ==21853== at 0x483AB1A: calloc (vg_replace_malloc.c:762) ==21853== by 0x4C4DD7F: util_vma_heap_free (vma.c:221) ==21853== by 0x4C4D647: util_vma_heap_init (vma.c:46) ==21853== by 0x4957B9F: anv_CreateDescriptorPool (anv_descriptor_set.c:578) Fixes: `c520f4dec9` ("anv: Add a concept of a descriptor buffer") Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-03-11 08:13:27 +02:00
Tapani Pälli	105002bd2d	anv: destroy descriptor sets when pool gets destroyed Patch maintains a list of sets in the pool and destroys possible remaining sets when pool is destroyed. As stated in Vulkan spec: "When a pool is destroyed, all descriptor sets allocated from the pool are implicitly freed and become invalid." This fixes memory leaks spotted with valgrind: ==19622== 96 bytes in 1 blocks are definitely lost in loss record 2 of 3 ==19622== at 0x483880B: malloc (vg_replace_malloc.c:309) ==19622== by 0x495B67E: default_alloc_func (anv_device.c:547) ==19622== by 0x4955E05: vk_alloc (vk_alloc.h:36) ==19622== by 0x4956A8F: anv_multialloc_alloc (anv_private.h:538) ==19622== by 0x4956A8F: anv_CreateDescriptorSetLayout (anv_descriptor_set.c:217) Fixes: `14f6275c92` ("anv/descriptor_set: add reference counting for descriptor set layouts") Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-03-11 08:13:01 +02:00
Timothy Arceri	051b4064da	anv: add support for dumping shader info via VK_EXT_debug_report This information will be used by the vkpipeline-db tool. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-03-11 16:16:04 +11:00
Kenneth Graunke	f36794d1f0	iris: Fix backface stencil write condition A bit too much search and replace here.	2019-03-10 14:52:53 -07:00
Alyssa Rosenzweig	ea2cd73625	panfrost/drm: Cast pointer to u64 to fix warning Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-03-10 19:16:56 +00:00
Tomeu Vizoso	756f7b9989	panfrost: Add backend targeting the DRM driver This backend interacts with the new DRM driver for Midgard GPUs which is currently in development. When using this backend, Panfrost has roughly on-par functionality as when using the non-DRM driver from Arm. Alyssa Rosenzweig: To do so, we implement additional routines for runtime GPU version detection and fencing. We cleanup some duplicate code interfering with the new driver. We fix a long-standing memory leak which is aggravated on the new driver. Finally, we implement BO import/export in a way compatible with the new driver. These changes are squashed to preserve bisectability given the hard-to-track ABI shifts in the nondrm module Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com> Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-03-10 19:09:23 +00:00
Tomeu Vizoso	d4dc79df72	panfrost: Add gem_handle to panfrost_memory and panfrost_bo It will be used by the DRM backend to store GEM handles from the kernel. Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com> Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-03-10 18:56:56 +00:00
Rob Clark	941adcef03	freedreno/a6xx: more bcolor fixes Non-zero offset wasn't working, which breaks a bunch of dEQP-GLES31.functional.texture.border_clamp.formats.* when doing sharded deqp runs (because order of tests changes, resulting in different texture state bound; deqp doesn't really clean up its gl state between tests very well) Previously, if additional textures were bound, due to using too small of a bcolor_entry size, the last 32 bytes of the bcolor_entry would be overwritten. Signed-off-by: Rob Clark <robdclark@gmail.com>	2019-03-10 11:40:06 -04:00
Eric Engestrom	db944999a1	gitlab-ci: add panfrost to the gallium drivers build Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>	2019-03-09 23:25:12 +00:00
Eric Engestrom	e6ba67dd65	panfrost: move #include to fix compilation In standalone.h, the struct gl_context type is not declared by #includ'ing mtypes.h: In file included from src/gallium/drivers/panfrost/midgard/cmdline.c:24: src/compiler/glsl/standalone.h:46:14: warning: ‘struct gl_context’ declared inside parameter list will not be visible outside of this definition or declaration struct gl_context ctx); ^~~~~~~~~~ This causes the following compilation failure: src/gallium/drivers/panfrost/midgard/cmdline.c: In function ‘compile_shader’: src/gallium/drivers/panfrost/midgard/cmdline.c:58:61: error: passing argument 4 of ‘standalone_compile_shader’ from incompatible pointer type [-Werror=incompatible-pointer-types] prog = standalone_compile_shader(&options, 2, argv, &local_ctx); ^~~~~~~~~~ In file included from src/gallium/drivers/panfrost/midgard/cmdline.c:24: src/compiler/glsl/standalone.h:43:28: note: expected ‘struct gl_context ’ but argument is of type ‘struct gl_context ’ struct gl_shader_program standalone_compile_shader( ^~~~~~~~~~~~~~~~~~~~~~~~~ Fixes: `e67e072637` "panfrost: Implement Midgard shader toolchain" Cc: Alyssa Rosenzweig <alyssa@rosenzweig.io> Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>	2019-03-09 22:37:40 +00:00
Eric Engestrom	d4d29c0455	panfrost: fix tgsi_to_nir() call Bug: https://bugs.freedesktop.org/show_bug.cgi?id=109945 Fixes: `7da251fc72` "panfrost: Check in sources for command stream" Cc: Alyssa Rosenzweig <alyssa@rosenzweig.io> Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>	2019-03-09 22:06:19 +00:00
Axel Davy	5475434fa6	Revert "d3dadapter9: Support software renderer on any DRI device" This reverts commit `0d08476593`. It makes gitlab's travis fail. Revert until patch is fixed. Signed-off-by: Axel Davy <davyaxel0@gmail.com>	2019-03-09 14:29:43 +01:00
Axel Davy	597b5e27fa	st/nine: Change a few advertised caps Most hw on the native platform advertise these caps this way. D3DCAPS_READ_SCANLINE: We don't really have hardware support for that, but many games don't even check the flag, and expect GetRasterStatus to work, which is why we emulated it with a timer (like wine). So we may as well advertise the cap. D3DCURSORCAPS_LOWRES: I don't know what is the status of this on X11, but I don't know of any dx9 game running at height < 400 either. D3DPTEXTURECAPS_TEXREPEATNOTSCALEDBYSIZE: The cap should correspond to what the current generation of hw is doing. Signed-off-by: Axel Davy <davyaxel0@gmail.com> Reviewed-by: Patrick Rudolph <siro@das-labor.org>	2019-03-09 13:57:49 +01:00
Axel Davy	0d3c37e2f9	st/nine: Do not advertise CANMANAGERESOURCE It doesn't seem the main vendors advertise it. Signed-off-by: Axel Davy <davyaxel0@gmail.com> Reviewed-by: Patrick Rudolph <siro@das-labor.org>	2019-03-09 13:57:49 +01:00
Axel Davy	a8583e75d6	st/nine: Do not advertise support for D15S1 and D24X4S4 The former is supported on Matrox cards but no other hw. The latter isn't supported anywhere. It is fine to not advertise them as supported, and it could prevent apps from triggering weird rendering paths. Signed-off-by: Axel Davy <davyaxel0@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-03-09 13:57:49 +01:00
Patrick Rudolph	0d08476593	d3dadapter9: Support software renderer on any DRI device If D3D_ALWAYS_SOFTWARE is set for debugging purposes, run on any DRI enabled platform. Instead of probing for a compatible gallium driver (which might fail if there's none) always use the KMS DRI software renderer. Allows to run nine on i915 when D3D_ALWAYS_SOFTWARE=1. Signed-off-by: Patrick Rudolph <siro@das-labor.org> Reviewed-by: Axel Davy <davyaxel0@gmail.com>	2019-03-09 13:57:49 +01:00
Axel Davy	f7b9c09c7c	st/nine: Disable depth write when nothing gets updated I do not see any perf impact on radeonsi, but it seems iris needs this. It seems something sensible to do. Signed-off-by: Axel Davy <davyaxel0@gmail.com> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Tested-by: Andre Heider <a.heider@gmail.com>	2019-03-09 13:57:49 +01:00
Elie Tournier	d7b3196976	virgl: Return an error if we use fp64 on top of GLES Signed-off-by: Elie Tournier <elie.tournier@collabora.com> Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>	2019-03-09 11:33:20 +01:00
Elie Tournier	1f1514e1aa	virgl: Set PIPE_CAP_DOUBLES when running on GLES This is a lie but no known app uses fp64. Signed-off-by: Elie Tournier <elie.tournier@collabora.com> Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>	2019-03-09 11:33:14 +01:00
Elie Tournier	8ad1e86bb0	virgl: Add a caps to advertise GLES backend Signed-off-by: Elie Tournier <elie.tournier@collabora.com> Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>	2019-03-09 11:32:30 +01:00
Kenneth Graunke	da51e3f1b0	Revert MR 369 (Fix extract_i8 and extract_u8 for 64-bit integers) This broke piles of image load store tests (179 failures on CI, mesa_master build #15546, previous build right before this landed was green). I'd rather not leave the tree on fire over the weekend, so let's revert for now, and we can figure out what happened next week.	2019-03-09 01:42:16 -08:00
Ian Romanick	18e4bf65de	nir/algebraic: Add missing 16-bit extract_[iu]8 patterns No shader-db changes on any Intel platform. v2: Use a loop to generate patterns. Suggested by Jason. Reviewed-by: Matt Turner <mattst88@gmail.com> [v1] Reviewed-by: Dylan Baker <dylan@pnwbakers.com> Acked-by: Jason Ekstrand <jason@jlekstrand.net>	2019-03-08 22:24:19 -08:00
Ian Romanick	55c1ac4b75	nir/algebraic: Add missing 64-bit extract_[iu]8 patterns No shader-db changes on any Intel platform. v2: Use a loop to generate patterns. Suggested by Jason. Reviewed-by: Matt Turner <mattst88@gmail.com> [v1] Reviewed-by: Dylan Baker <dylan@pnwbakers.com> Acked-by: Jason Ekstrand <jason@jlekstrand.net>	2019-03-08 22:24:19 -08:00
Ian Romanick	9aaaac6080	nir/algebraic: Remove redundant extract_[iu]8 patterns No shader-db changes on any Intel platform. Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com> Acked-by: Jason Ekstrand <jason@jlekstrand.net>	2019-03-08 22:24:19 -08:00
Ian Romanick	37ee462e03	nir/algebraic: Fix up extract_[iu]8 after loop unrolling Skylake, Broadwell, and Haswell had similar results. (Skylake shown) total instructions in shared programs: 15256840 -> 15256837 (<.01%) instructions in affected programs: 4713 -> 4710 (-0.06%) helped: 3 HURT: 0 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 0.06% max: 0.08% x̄: 0.06% x̃: 0.06% total cycles in shared programs: 372286583 -> 372286583 (0.00%) cycles in affected programs: 198516 -> 198516 (0.00%) helped: 1 HURT: 1 helped stats (abs) min: 10 max: 10 x̄: 10.00 x̃: 10 helped stats (rel) min: <.01% max: <.01% x̄: <.01% x̃: <.01% HURT stats (abs) min: 10 max: 10 x̄: 10.00 x̃: 10 HURT stats (rel) min: 0.01% max: 0.01% x̄: 0.01% x̃: 0.01% No changes on any other Intel platform. v2: Use a loop to generate patterns. Suggested by Jason. Reviewed-by: Matt Turner <mattst88@gmail.com> [v1] Reviewed-by: Dylan Baker <dylan@pnwbakers.com> Acked-by: Jason Ekstrand <jason@jlekstrand.net>	2019-03-08 22:24:19 -08:00
Jason Ekstrand	8fdee457a4	anv/pipeline: Move lower_explicit_io much later Now that nir_opt_copy_prop_vars can properly handle array derefs on vectors, it's safe to move UBO and SSBO lowering to late in the pipeline. This should allow NIR to actually start optimizing SSBO access. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-03-08 22:03:34 -06:00
Jason Ekstrand	179d254cba	intel/nir: Move lower_mem_access_bit_sizes to postprocess_nir It doesn't really matter where this pass goes as long as it's after we call nir_lower_explicit_io and before we go into the back-end. Putting it brw_postprocess_nir lets us move nir_lower_explicit_io significantly later in the pipeline. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-03-08 22:03:14 -06:00
Rob Clark	ad25948261	freedreno/ir3: turn on [iu]mul_high Which also requires uadd_carry lowering Until recently this was lowered in glsl ir so it went unnoticed that we weren't lowering it. Fixes: `1d8994a63b` glsl: [u/i]mulExtended optimization for GLSL Signed-off-by: Rob Clark <robdclark@gmail.com>	2019-03-08 18:44:57 -05:00
Rob Clark	53083e4fbc	freedreno/ir3: fix ir3_cmdline harder Fixes: `45271702ec` freedreno: fix ir3_cmdline build Fixes: `7530d4abfc` glsl/freedreno/panfrost: pass gl_context to the standalone compiler Signed-off-by: Rob Clark <robdclark@gmail.com>	2019-03-08 18:44:57 -05:00
Eric Anholt	fafead7b62	st/dri: Set the PIPE_BIND_SHARED flag on create_image_with_modifiers. With createImage(), the caller was expected to set a SHARED flag if they needed the ability to get a GEM handle. DRI3, wayland, and gbm all set it, EGL_MESA_drm_image passes it through, and surfaceless doesn't need it because there's no way to request a handle. With the new createImageWithModifiers() DRI method to replace it, the expectation is that you'll always be able to share the buffer, so the flag is unnecessary in its arguments. However, we do need to tell gallium about this expectation. Without this, kmscube's modifiers path using gbm_bo_create_with_modifiers(&modifier, 1) instead of gbm_bo_create(SCANOUT \| SHARED) will call the driver's resource_create() function with PIPE_BIND_SHARED unset, so the driver (particularly renderonly drivers) may allocate in such a way that it can't return an answer from gbm_bo_get_handle(). I used to have a hack in v3d using count==1 && modifier==LINEAR to indicate that you wanted SHARED anyway, but that was dropped recently. Fixes: `59527a36e9` ("v3d: Restructure RO allocations using resource_from_handle.") Reviewed-by: Kristian H. Kristensen <hoegsberg@chromium.org>	2019-03-08 15:33:35 -08:00
Kenneth Graunke	9d1334d2a0	iris: Use copy_region and staging resources to avoid transfer stalls This is similar to intel_miptree_map_blit and intel_buffer_object.c's temporary blits in i965. Improves performance of DiRT Rally by 20-25% by eliminating stalls. Breaks piglit's spec/arb_shader_image_load_store/host-mem-barrier, by using the GPU to do uploads, exposing a st/mesa issue where it doesn't give us memory_barrier() calls. This is a pre-existing issue and will be fixed by a later patch (currently out for review).	2019-03-08 13:29:39 -08:00
Eric Engestrom	f67c870179	android: fix missing backslash for line continuation Reported-by: Clayton Craft <clayton.a.craft@intel.com> Bug: https://bugs.freedesktop.org/show_bug.cgi?id=109944 Fixes: `e1d81decf7` "build: make passing an incorrect pointer type a hard error" Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>	2019-03-08 21:14:24 +00:00
Karol Herbst	8a8742d327	prog_to_nir: fix write from vps to FOG for fragment programs we already treat fog as a single component value, but for vp we didn't. Fixes fog related piglit tests with my out of tree Nouveau nir patches. Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-03-08 10:35:12 -08:00
Sagar Ghuge	bca28deb46	iris: Track last VS URB entry size Return immediately if last VS URB entry size is good enough for BLORP operation v2: Fix comments (Caio) Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Suggested-by: Kenneth Graunke<kenneth@whitecape.org> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-03-08 10:01:39 -08:00
Sagar Ghuge	d0a8fba69a	iris: Refactor code to share 3DSTATE_URB_* packet v2: 1) Set IRIS_DIRTY_URB bit (Caio) 2) Get rid of unnecessary function (Caio) Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Suggested-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-03-08 10:01:39 -08:00
Eric Engestrom	6e3d3f5b2c	glx/meson: use full include path for dri_interface.h Everything else uses `#include "GL/internal/dri_interface.h"` instead, and this full path was even already used in other parts of GLX. While at it, nothing uses `inc_gl_internal` anymore so let's remove it as well. Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com> Tested-by: Clayton Craft <clayton.a.craft@intel.com>	2019-03-08 18:00:19 +00:00
Eric Engestrom	b1218d8cf7	hgl/meson: drop unused include directory Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com> Tested-by: Clayton Craft <clayton.a.craft@intel.com>	2019-03-08 18:00:19 +00:00
Brian Paul	0de83bacf0	intel/compiler: silence uninitialized variable warning in opt_vector_float() Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-03-08 10:23:11 -07:00
Brian Paul	b5ea56e411	intel/decoders: silence uninitialized variable warnings in gen_print_batch() Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-03-08 10:23:11 -07:00
Brian Paul	e5e2be3c73	st/mesa: init hash keys with memset(), not designated initializers Since the compiler may not zero-out padding in the object. Add a couple comments about this to prevent misunderstandings in the future. Fixes: `67d96816ff` ("st/mesa: move, clean-up shader variant key decls/inits") Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2019-03-08 10:23:11 -07:00
Eric Engestrom	d2cff164cd	gitlab-ci: fix llvm version (7 doesn't have a ".0") Fixes: `85ee157283` "gitlab-ci: autotools needs to be told which llvm version to use" Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>	2019-03-08 17:03:06 +00:00
Eric Engestrom	e1d81decf7	build: make passing an incorrect pointer type a hard error More or less any of this issue pointed out by the compiler is a coding error. Make sure we flag it and bail loudly. v2: - apply the change to autotools and scons as well (Emil) - C++ doesn't need this, it's already an error and the flag doesn't exist (Gert) v3: - drop scons, flags are not checked so until someone adds that functionality we can't have this. Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com> # v1 Reviewed-by: Emil Velikov <emil.velikov@collabora.com> # v1 [Emil: apply the same change to autotools and scons] Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-03-08 16:24:06 +00:00
Eric Engestrom	598f10eacc	r600: cast pointer to expected type Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-By: Gert Wollny <gert.wollny@collabora.com>	2019-03-08 16:24:06 +00:00
Eric Engestrom	85ee157283	gitlab-ci: autotools needs to be told which llvm version to use Fixes: 45d58cd91567b39f51af "gitlab-ci: only build the default (=latest) and oldest llvm versions" Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>	2019-03-08 16:03:04 +00:00
Eric Engestrom	3006f9d8c0	gitlab-ci: only build the default (=latest) and oldest llvm versions Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-03-08 15:59:27 +00:00
Eric Engestrom	08b70e1c2b	travis: clean up Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>	2019-03-08 15:33:39 +00:00
Eric Engestrom	e2f528bf21	travis: drop unused vars Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>	2019-03-08 15:17:42 +00:00
Eric Engestrom	44c420aa1b	travis: fix meson build by letting `auto` do its job Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>	2019-03-08 15:17:42 +00:00
Eric Engestrom	9cf85d3b78	autotools: don't build libGLES*.so with GLVND GLVND already provides these, so distro packagers have been deleting them all along. Let's save ourselves the trouble and not build them in the first place. Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2019-03-08 15:13:36 +00:00
Eric Engestrom	b01524fff0	meson: don't build libGLES*.so with GLVND GLVND already provides these, so distro packagers have been deleting them all along. Let's save ourselves the trouble and not build them in the first place. Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2019-03-08 15:13:36 +00:00
Brian Paul	2c387819f4	pipebuffer: s/PB_ALL_USAGE_FLAGS/PB_USAGE_ALL/ To fix build failure. I guess my meson configuration has assertions disabled for some reason. Trivial fix.	2019-03-08 08:07:24 -07:00
Brian Paul	d4381cf593	svga: remove SVGA_RELOC_READ flag in SVGA3D_BindGBSurface() This fixes a rendering issue where UBO updates aren't always picked up by drawing calls. This issue affected the Webots robotics simulator. VMware bug 2175527. Testing Done: Webots replay, piglit, misc Linux games Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com>	2019-03-08 07:40:35 -07:00
Brian Paul	07e8a31e49	svga: refactor draw_vgpu10() function The draw_vgpu10() function was huge. Move the code for preparing the vertex buffers and the index buffer into separate functions. Reviewed-by: Neha Bhende <bhenden@vmware.com>	2019-03-08 07:40:35 -07:00
Brian Paul	53acd4c688	st/mesa: whitespace, formatting fixes in st_cb_flush.c Trivial.	2019-03-08 07:40:35 -07:00
Brian Paul	67d96816ff	st/mesa: move, clean-up shader variant key decls/inits Move the variant key declarations inside the scope they're used. Use designated initializers instead of memset() calls. Reviewed-by: Neha Bhende <bhenden@vmware.com>	2019-03-08 07:40:35 -07:00
Brian Paul	76a10fc89e	winsys/svga: use new pb_usage_flags enum type And add a comment that we're implicitly converting PIPE_TRANSFER_ flags to PB_USAGE_ flags in one place. And statically assert that the enum values match. Reviewed-by: Neha Bhende <bhenden@vmware.com> Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com>	2019-03-08 07:40:35 -07:00
Brian Paul	b5f2b0d6b6	pipebuffer: whitespace fixes in pb_buffer.h Reviewed-by: Neha Bhende <bhenden@vmware.com> Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com>	2019-03-08 07:40:35 -07:00
Brian Paul	b286e74df6	pipebuffer: use new pb_usage_flags enum type Use a new enum type instead of 'unsigned' to make things a bit more understandable. Reviewed-by: Neha Bhende <bhenden@vmware.com> Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com>	2019-03-08 07:40:35 -07:00
Charmaine Lee	daf567f797	svga: add svga shader type in the shader variant With this patch, the svga shader type will be saved in the shader variant, and there is no need to pass in the shader type to the define/destroy variant functions. Reviewed-by: Brian Paul <brianp@vmware.com>	2019-03-08 07:40:34 -07:00
Brian Paul	ac6b33a50d	gallium/util: add some const qualifiers in u_bitmask.c And add/update comments. Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2019-03-08 07:40:34 -07:00
Brian Paul	b5a3a90c0c	gallium/util: whitespace cleanups in u_bitmask.[ch] Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2019-03-08 07:40:34 -07:00
Alejandro Piñeiro	686b7b1d48	nir/linker: fix ARRAY_SIZE query with xfb varyings For a non-array varying, it is expecting ARRAY_SIZE as 1, instead of 0. Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-03-08 15:00:50 +01:00
Antia Puentes	de31fb2f4f	nir/linker: Fix TRANSFORM_FEEDBACK_BUFFER_INDEX From the ARB_enhanced_layouts specification: "For the property TRANSFORM_FEEDBACK_BUFFER_INDEX, a single integer identifying the index of the active transform feedback buffer associated with an active variable is written to <params>. For variables corresponding to the special names "gl_NextBuffer", "gl_SkipComponents1", "gl_SkipComponents2", "gl_SkipComponents3", and "gl_SkipComponents4", -1 is written to <params>." We were storing the xfb_buffer value, instead of the value corresponding to GL_TRANSFORM_FEEDBACK_BUFFER_INDEX. Note that the implementation assumes that varyings would be sorted by offset and buffer. Signed-off-by: Antia Puentes <apuentes@igalia.com> Signed-off-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-03-08 15:00:50 +01:00
Alejandro Piñeiro	7c0f411c27	nir/linker: use nir_gather_xfb_info Instead of a custom ARB_gl_spirv xfb gather info pass. In fact, this is not only about reusing code, but the current custom code was not handling properly how many varyings are enumerated from some complex types. So this change is also about fixing some corner cases. v2: Use util_bitcount, simplify current stage check (Kenneth) Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-03-08 15:00:50 +01:00
Alejandro Piñeiro	b2a212ac2e	nir/xfb: handle arrays and AoA of basic types On OpenGL, an array of a simple type adds just one varying. So gl_transform_feedback_varying_info struct defined at mtypes.h includes the parameters Type (base_type) and Size (number of elements). This commit checks this when the recursive add_var_xfb_outputs call handles arrays, to ensure that just one is added. We also need to take into account AoA here. v2: use glsl_type_is_leaf from nir_types (Timothy Arceri) v3: simplified aoa check, without the need of using glsl_type_is_leaf, using glsl_type_is_struct (Timothy Arceri) Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-03-08 15:00:50 +01:00
Alejandro Piñeiro	2b65fecd85	nir_types: add glsl_type_is_struct helper Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-03-08 15:00:50 +01:00
Alejandro Piñeiro	8d693746e9	nir/xfb: sort varyings too Right now we are only re-sorting outputs. But it is better to sort varyings too, as the linker expects them to be sorted (as was done on GLSL). For varyings, and to make it easier to compute buffer_index, we also sort by buffer. We could do the same for outputs, but we lack a reason for that, so we leave it as it is (just offset). Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-03-08 15:00:50 +01:00
Alejandro Piñeiro	cf0b2ad486	nir/xfb: adding varyings on nir_xfb_info and gather_info In order to be used for OpenGL (right now for ARB_gl_spirv). This commit adds two new structures: * nir_xfb_varying_info: that identifies each individual varying. For each one, we need to know the type, buffer and xfb_offset * nir_xfb_buffer_info: as now for each buffer, in addition to the stride, we need to know how many varyings are assigned to it. For this patch, the only case where num_outputs != num_varyings is with the case of doubles, that for dvec3/4 could require more than one output. There are more cases though (like aoa), that will be handled on following patches. v2: updated after new nir general XFB support introduced for "anv: Add support for VK_EXT_transform_feedback" v3: compute num_varyings beforehand for allocating, instead of relying on num_outputs as approximate value (Timothy Arceri) Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-03-08 15:00:50 +01:00
Alejandro Piñeiro	9f68b9ac71	nir_types: add glsl_varying_count helper Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-03-08 15:00:50 +01:00
Alejandro Piñeiro	b62a8149ab	nir/xfb: add component_offset at nir_xfb_info Where component_offset here is the offset when accessing components of a packed variable. Or in other words, location_frac on nir.h. Different places of mesa use different names for it. Technically a nir_xfb_info consumer can get the same from the component_mask, but it seems somewhat forced to make it compute it, instead of providing it. v2: rename local location_frac for comp_offset, more similar to the intended use (Timothy Arceri) Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-03-08 15:00:50 +01:00
Samuel Pitoiset	e72daf3e70	Revert "radv: execute external subpass barriers after ending subpasses" This change is actually wrong because we have to sync before doing image layout transitions. This fixes rendering issues in Batman, Path of Exile and probably more titles. This reverts commit `76c17cfd8d`. Fixes: `76c17cfd8d` ("radv: execute external subpass barriers after ending subpasses") Cc: 19.0 <mesa-stable@lists.freedesktop.org> Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-03-08 14:59:26 +01:00
Lionel Landwerlin	7271808df8	intel/error2aub: support older style engine names Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>	2019-03-08 11:01:14 +00:00
Lionel Landwerlin	a036eac029	intel/error2aub: deal with GuC log buffer When GuC is enabled, the error state will contain a "global" buffer for the GuC log buffer. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>	2019-03-08 11:01:14 +00:00
Lionel Landwerlin	c619ea945d	intel/error2aub: add a verbose option Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>	2019-03-08 11:01:14 +00:00
Lionel Landwerlin	ca0161f890	intel/error2aub: write GGTT buffers into the aub file Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>	2019-03-08 11:01:14 +00:00
Lionel Landwerlin	9b5dc2124f	intel/error2aub: store engine last ring buffer head/tail pointers Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>	2019-03-08 11:01:14 +00:00
Lionel Landwerlin	cdab19fa57	intel/error2aub: annotate buffer with their address space Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>	2019-03-08 11:01:14 +00:00
Lionel Landwerlin	630a72827a	intel/error2aub: parse other buffer types We don't write them in the aub file yet. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>	2019-03-08 11:01:14 +00:00
Lionel Landwerlin	c0ea043888	intel/error2aub: strengthen batchbuffer identifier marker Found out that some base64 data matched the '---' identifier. We can avoid this by adding the surrounding spaces. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>	2019-03-08 11:01:14 +00:00
Lionel Landwerlin	650e6e5d33	intel/error2aub: identify buffers by engine Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>	2019-03-08 11:01:14 +00:00
Lionel Landwerlin	a07f5262f0	intel/error2aub: build a list of BOs before writing them The error state contains several kinds of BOs, including the context image which we will want to write in a later commit. Because it can come later in the error state than the user buffers and because we need to write it first in the aub file, we have to first build a list of BOs and then write them in the appropriate order. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>	2019-03-08 11:01:14 +00:00
Chris Wilson	04ddff1aa4	iris: Wire up EGL_IMG_context_priority Add the missing PIPE_CAP_CONTEXT_PRIORITY_MASK and parsing of the context construction flags. Testcase: piglit/egl-context-priority Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-03-07 20:27:10 -08:00
Kenneth Graunke	2993088500	iris: Export a copy_region helper that doesn't flush I'll want to use this for transfer maps, which already do their own flushing. This lets us avoid a double flush, and also gives us more control over the batch which is selected.	2019-03-07 17:08:19 -08:00
Kenneth Graunke	335726fdac	iris: Spruce up "are we using this engine?" checks for flushing We were using batch->contains_draw as a proxy for "are we even using this engine?" That isn't quite right, because it only counts regular draws. BLORP operations may have also rendered to a resource, which needs to trigger flushing. To check for this, we also see if the render and sometimes depth caches are non-empty. We can also drop the "but there might already be stale data in the cache even if we haven't emitted any commands yet" concern in the comments. The kernel flushes caches between batches. This may not be great but it's at least better than what was there.	2019-03-07 17:08:07 -08:00
Timur Kristóf	b0c214ccee	radeonsi/nir: Only set window_space_position for vertex shaders. By mistake, this was previously set for all shaders. It is a vertex shader property so only makes sense to set it for vertex shaders. Signed-Off-By: Timur Kristóf <timur.kristof@gmail.com> Reviewed-By: Timothy Arceri <tarceri@itsqueeze.com> Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>	2019-03-08 00:39:45 +00:00
Jason Ekstrand	1664de5924	nir/builder: Add a build_deref_array_imm helper Unlike most of the cases in which we do this by hand, the new helper properly handles non-32-bit pointers. Reviewed-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-03-07 21:20:30 +00:00
Jason Ekstrand	fcf2a0122e	nir/builder: Cast array indices in build_deref_follower There's no guarantee when build_deref_follower is called that the two derefs have the same bit size destination. Insert a cast on the array index in case we have differing bit sizes. While we're here, insert some asserts in build_deref_array and build_deref_ptr_as_array. The validator will catch violations here but they're easier to debug if we catch them while building. Reviewed-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-03-07 21:20:30 +00:00
Jason Ekstrand	cd4c1458ba	nir/builder: Emit better code for iadd/imul_imm Because we already know the immediate right-hand parameter, we can potentially save the optimizer a bit of work. Reviewed-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-03-07 21:20:30 +00:00
Rob Clark	ebbb6b8eaa	freedreno/a6xx: perfcntrs Signed-off-by: Rob Clark <robdclark@gmail.com>	2019-03-07 15:33:42 -05:00
Rob Clark	40d8ed5ef3	freedreno/a6xx: fix border-color swizzles Fixes nearly all of the remaining dEQP-GLES31.functional.texture.border_clamp.formats.* fails Signed-off-by: Rob Clark <robdclark@gmail.com> Reviewed-by: Kristian H. Kristensen <hoegsberg@chromium.org>	2019-03-07 15:33:42 -05:00
Rob Clark	f5d80ff2db	freedreno/a6xx: refactor fd6_tex_swiz() We need a version of fd6_tex_swiz() that just returns the composed swizzle without building part of the TEX_CONST_0 state. So just refactor the existing function to build more of the TEX_CONST_0 state, and leave fd6_tex_swiz() simply composing swizzles. The small IBO state change (to use LINEAR for smaller sizes/levels) is to match the state in fd6_tex_const_0(). It seems like maybe tiled actually works at the smaller sizes but not if minification is in play, so best just to make images match what we do for textures. Signed-off-by: Rob Clark <robdclark@gmail.com> Reviewed-by: Kristian H. Kristensen <hoegsberg@chromium.org>	2019-03-07 15:33:42 -05:00
Rob Clark	8dc47490c8	freedreno/a6xx: remove astc_srgb workaround Not used on a6xx, so remove some of the related plumbing that was copied over from older gens. Signed-off-by: Rob Clark <robdclark@gmail.com>	2019-03-07 15:33:42 -05:00
Rob Clark	45271702ec	freedreno: fix ir3_cmdline build Fixes: `7530d4abfc` glsl/freedreno/panfrost: pass gl_context to the standalone compiler Signed-off-by: Rob Clark <robdclark@gmail.com>	2019-03-07 15:33:20 -05:00
Kenneth Graunke	d53b1b6215	iris: Drop PIPE_CAP_BUFFER_SAMPLER_VIEW_RGBA_ONLY This cap is mainly for working around a r600 texture swizzle issue, but it also controls whether ARB_texture_buffer_object (with legacy formats) is enabled. I suspect the missing I/L/A/LA faking is why I had it set in the first place. Thanks to Ilia for pointing out that I shouldn't be setting this. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-03-07 11:39:27 -08:00
Kenneth Graunke	809a81ec3a	iris: Properly support alpha and luminance-alpha formats For texturing, we map alpha formats to the corresponding red format, as many alpha formats are outright missing, and red is more efficient when sampling anyway. When rendering to A8_UNORM, we use that format directly, so the image gets the shader output's .a/.w channel, rather than the .r/.x channel. All other A* formats are non-renderable, so we can't do much and just mark them as unsupported for rendering. Fortunately, GL only requires rendering to A8_UNORM, so that works out. According to Andre Heider and Timur Kristóf, this fixes font rendering in Witcher 1 (via nine). Andre also reported that it fixes Unigine Heaven (presumably via nine). v2: Use the same swizzle for both sampler views and "render targets". BLORP expects the read swizzle, and will take the inverse when setting up the destination swizzle (and actually applying it in the shaders). We ignore the format swizzle when setting up normal rendering SURFACE_STATEs, which is necessary because it would be an illegal shader channel select combination. Thanks to Jason Ekstrand for pointing out that BLORP took an inverse swizzle. Tested-by: Timur Kristóf <timur.kristof@gmail.com> Tested-by: Andre Heider <a.heider@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-03-07 11:39:27 -08:00
Kenneth Graunke	fbc51c4c95	iris: Defer uploading sampler state tables until draw time Gallium might call us multiple times to bind subsets of the samplers, at which point we'd recreate the table a bunch of times. It doesn't really buy us anything to do it here - even if we defer to draw time, the dirty tracking ensures we'll only do it on the first draw after a bind_sampler_states() call. We now use the number of samplers specified by the shader instead of the binding count. If this number changes, we flag sampler state as dirty so we re-upload a table with the right number of entries. This also fixes a bug where ice->state.need_border_colors was never unset, so once something needed border colors, the pool would always be pinned in all future batches. v2: Explicitly flag sampler states as dirty, rather than assuming that bind_sampler_states() will be called if the program texture count changes. While this may be true for st/mesa, it isn't the case for Gallium HUD. Tested-by: Timur Kristóf <timur.kristof@gmail.com> Tested-by: Andre Heider <a.heider@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-03-07 11:39:27 -08:00
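The deferred-upload pattern described in the commit above can be sketched in plain C. This is a minimal model, not the iris code: `sampler_ctx`, `bind_samplers`, `set_shader_sampler_count` and `draw` are hypothetical names, and the `uploads` counter exists only to make the behaviour observable.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical context: tracks whether the sampler table must be rebuilt. */
struct sampler_ctx {
   unsigned bound_count;          /* samplers bound by the state tracker */
   unsigned shader_sampler_count; /* samplers the current shader uses */
   bool samplers_dirty;           /* table needs re-upload before drawing */
   unsigned uploads;              /* number of table uploads, for illustration */
};

/* Binding only records state and flags it dirty; nothing is uploaded here. */
static void bind_samplers(struct sampler_ctx *ctx, unsigned count)
{
   assert(ctx);
   ctx->bound_count = count;
   ctx->samplers_dirty = true;
}

/* A change in the shader's sampler count is flagged explicitly, rather than
 * relying on the caller to rebind samplers. */
static void set_shader_sampler_count(struct sampler_ctx *ctx, unsigned count)
{
   assert(ctx);
   if (ctx->shader_sampler_count != count) {
      ctx->shader_sampler_count = count;
      ctx->samplers_dirty = true;
   }
}

/* The table is uploaded at most once per dirty period, at draw time. */
static void draw(struct sampler_ctx *ctx)
{
   assert(ctx);
   if (ctx->samplers_dirty) {
      ctx->uploads++;
      ctx->samplers_dirty = false;
   }
}
```

With this shape, several partial binds followed by several draws cost a single upload, which is the saving the commit message describes.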
Kenneth Graunke	9caabd6c5f	iris: Plumb through ISL_SWIZZLE_IDENTITY in buffer surface emitters Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-03-07 11:39:27 -08:00
Kenneth Graunke	4787bc944a	isl: Add a swizzle parameter to isl_buffer_fill_state() This is necessary for legacy texture buffer object formats, where we'll need to use a swizzle to fake e.g. luminance. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-03-07 11:39:27 -08:00
Lionel Landwerlin	575f8e8b60	iris: fix decode_get_bo callback Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `acb50d6b1f` ("intel/decoders: handle decoding MI_BBS from ring") Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-03-07 17:39:07 +00:00
Erik Faye-Lund	55e4759c8d	virgl: remove unused variable This variable is now unused, so let's remove it. Fixes: `9c4930946a` (virgl: add encoder functions for new protocol) Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>	2019-03-07 17:24:54 +00:00
Erik Faye-Lund	44620d4ef7	virgl: remove unused variable This variable is now unused, so let's remove it. Fixes: `db77573d7b` (virgl: modify how we handle GL_MAP_FLUSH_EXPLICIT_BIT) Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>	2019-03-07 17:24:54 +00:00
Erik Faye-Lund	524934586b	virgl: remove unused variable This variable is now unused, so let's remove it. Fixes: `c19aedcf1a` (virgl: don't mark unclean after a flush) Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>	2019-03-07 17:24:54 +00:00
Erik Faye-Lund	af29c93f22	virgl: remove unused variables These variables are now unused, let's remove them to get rid of a few warnings. Fixes: `f0e71b1088` (virgl: use transfer queue) Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>	2019-03-07 17:24:54 +00:00
Lionel Landwerlin	0e269c0ac2	iris: fix decoder call Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `acb50d6b1f` ("intel/decoders: handle decoding MI_BBS from ring")	2019-03-07 16:15:03 +00:00
Lionel Landwerlin	0b3871bc7f	intel/aub_write: factorize context image/pphwsp/ring creation We allocate GGTT entries and physical addresses as we create engines rather than having a fixed layout. Context images now receive a parameter argument which is used to setup pml4 & ring buffer addresses. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>	2019-03-07 15:08:32 +00:00
Lionel Landwerlin	c1a2c72e76	intel/aub_write: turn context images arrays into functions We'll make them more parameterized in a later commit. As this is just a transitional commit, we allow ourselves to leak the context images allocated in get_context_init(). We'll fix this in the next commit. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>	2019-03-07 15:08:32 +00:00
Lionel Landwerlin	8e14c9b7db	intel/aub_write: store the physical page allocator in struct We want to use this allocator in the next commit for GGTT pages. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>	2019-03-07 15:08:32 +00:00
Lionel Landwerlin	0343a3b42b	intel/aub_write: log mmio writes Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>	2019-03-07 15:08:32 +00:00
Lionel Landwerlin	6ef46972d9	intel/aub_write: switch to use i915_drm engine classes Prepare aub write to deal with multiple engine instances. We don't pass the instance number yet; this could be done in the future by having a two-dimensional array of struct engine. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Acked-by: Rafael Antognolli <rafael.antognolli@intel.com>	2019-03-07 15:08:32 +00:00
Lionel Landwerlin	8a81f5c255	intel/aub_write: break execlist write in 2 We want to reuse the execlist submission, but won't need the ring buffer update. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>	2019-03-07 15:08:32 +00:00
Lionel Landwerlin	69ee5bde4e	intel/aub_write: write header in init Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>	2019-03-07 15:08:31 +00:00
Lionel Landwerlin	01443f34b4	intel/aub_write: split comment section from HW setup In the future we'll want error2aub to reuse the context image saved by i915 instead of the default one we write in intel_dump_gpu. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>	2019-03-07 15:08:31 +00:00
Lionel Landwerlin	2b42adff14	intel/aub_read: reuse defines from gen_context Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>	2019-03-07 15:08:31 +00:00
Lionel Landwerlin	bf93084f44	intel/decoders: limit number of decoded batchbuffers IGT has a test to hang the GPU that works by having a batch buffer jump back into itself, triggering an infinite loop on the command stream. As our implementation of the decoding is "perfectly" mimicking the hardware, our decoder also "hangs". This change limits the number of batch buffers we'll decode before we bail to 100. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>	2019-03-07 15:08:31 +00:00
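The bail-out described in the commit above can be sketched with a toy command stream. This is not the Intel decoder: `cmd`, `cmd_type`, `decode_batches` and the array-index jump targets are hypothetical stand-ins, and only the limit of 100 comes from the commit message.

```c
#include <assert.h>
#include <stddef.h>

/* Toy command stream: a batch start jumps to another index in the array. */
enum cmd_type { CMD_NOOP, CMD_BATCH_START, CMD_BATCH_END };

struct cmd {
   enum cmd_type type;
   size_t target; /* for CMD_BATCH_START: index to continue decoding at */
};

#define MAX_DECODED_BATCHES 100

/* Walks the stream from `start` and returns how many batch-buffer jumps
 * were followed, or -1 if the limit was reached and decoding bailed. */
static int decode_batches(const struct cmd *cmds, size_t count, size_t start)
{
   int batches = 0;
   size_t pc = start;

   assert(cmds != NULL || count == 0);

   while (pc < count) {
      const struct cmd *c = &cmds[pc];

      if (c->type == CMD_BATCH_END)
         break;

      if (c->type == CMD_BATCH_START) {
         /* A batch that jumps back into itself would loop forever here,
          * so stop following jumps once the limit is hit. */
         if (batches == MAX_DECODED_BATCHES)
            return -1;
         batches++;
         pc = c->target;
         continue;
      }

      pc++;
   }
   return batches;
}
```

A well-formed stream never gets near the limit, while a self-referencing batch is cut off after 100 jumps instead of hanging the tool.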
Lionel Landwerlin	acb50d6b1f	intel/decoders: handle decoding MI_BBS from ring An MI_BATCH_BUFFER_START in the ring buffer acts as a second level batchbuffer (aka jump back to ring buffer when running into a MI_BATCH_BUFFER_END). Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>	2019-03-07 15:08:31 +00:00
Lionel Landwerlin	ec526d6ba0	intel/decoders: add address space indicator to get BOs Some commands like MI_BATCH_BUFFER_START have this indicator. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>	2019-03-07 15:08:31 +00:00
Eric Engestrom	3e8d5b5ed4	vulkan/overlay: fix missing var rename in previous commit Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-03-07 13:45:14 +00:00
Eric Engestrom	d141472d0e	vulkan/util: use the platform defines in vk.xml instead of hard-coding them See also: `3d4238d26c` "anv: use the platform defines in vk.xml instead of hard-coding them" Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-03-07 11:49:44 +00:00
Andre Heider	a4324dcefb	iris: add support for tgsi_to_nir The Gallium Nine state tracker now works on iris. Also tested with GALLIUM_HUD and Star Wars: Knights of the Old Republic on WINE (GL_ATI_fragment_shader). Signed-off-by: Andre Heider <a.heider@gmail.com> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-03-07 00:38:13 -08:00
Tapani Pälli	8b010f3557	nir: free dead_ctx in case of no progress Fixes a leak: ==7576== 320 (48 direct, 272 indirect) bytes in 1 blocks are definitely lost in loss record 26 of 26 ==7576== at 0x4C2EE3B: malloc (vg_replace_malloc.c:309) ==7576== by 0x53EF0E4: ralloc_size (ralloc.c:119) ==7576== by 0x53EF0C2: ralloc_context (ralloc.c:113) ==7576== by 0x5471F64: nir_split_per_member_structs (nir_split_per_member_structs.c:176) ==7576== by 0x51288CF: anv_shader_compile_to_nir (anv_pipeline.c:216) Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-03-07 07:40:19 +02:00
Tapani Pälli	4900c0cff4	anv: call blob_finish when done with it Fixes leaks from anv_device_upload_nir: ==7345== 8,192 bytes in 2 blocks are definitely lost in loss record 24 of 24 ==7345== at 0x4C2ED78: malloc (vg_replace_malloc.c:308) ==7345== by 0x4C31393: realloc (vg_replace_malloc.c:836) ==7345== by 0x54E0848: grow_to_fit (blob.c:67) ==7345== by 0x54E0BE5: blob_reserve_bytes (blob.c:166) ==7345== by 0x54E0C7C: blob_reserve_intptr (blob.c:186) ==7345== by 0x54704A7: nir_serialize (nir_serialize.c:1091) ==7345== by 0x512F97D: anv_device_upload_nir (anv_pipeline_cache.c:756) Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-03-07 07:39:48 +02:00
Tapani Pälli	a9555f37d5	anv: use anv_gem_munmap in block pool cleanup Use anv_gem_munmap for unmap when softpin is in use, this corresponds to anv_gem_mmap used in anv_block_pool_expand_range. This fixes valgrind errors seen for each pool when softpin is in use: ==25581== 262,144 bytes in 1 blocks are definitely lost in loss record 31 of 31 ==25581== at 0x50E77E8: anv_gem_mmap (anv_gem.c:96) ==25581== by 0x50EEE2B: anv_block_pool_expand_range (anv_allocator.c:543) ==25581== by 0x50EEB51: anv_block_pool_init (anv_allocator.c:477) ==25581== by 0x50EF7EF: anv_state_pool_init (anv_allocator.c:920) ==25581== by 0x510B8EB: anv_CreateDevice (anv_device.c:2031) Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-03-07 07:36:28 +02:00
Kenneth Graunke	744b8e1c12	iris: Fix MOCS for blits and clears I915_MOCS_CACHED is the wrong value. Expose mocs() and use that.	2019-03-06 18:04:53 -08:00
Timothy Arceri	ecceb076e5	st/glsl: start spilling out common st glsl conversion code The NIR and TGSI paths are currently intertwined which makes it not only hard to follow but also makes it hard to take advantage of the differences in IR. Here we take the first step to splitting that path apart. With this we take the opportunity to no longer call the GLSL IR optimisation passes after the final lowering calls for NIR. We can instead just use the NIR passes which can produce better code and should also result in faster compile times. The speed-up can be measured in some dolphin uber shaders due to no longer calling lower_if_to_cond_assign(), for example dolphin/ubershaders/120.shader_test goes from ~1.63 -> ~1.53 seconds on my machine. There are some code changes as a result of not calling lower_if_to_cond_assign(), this is because it flattens ifs that contain UBOs whereas NIR's peephole select doesn't. This is where most of the regressions in Max Waves happen with shader-db. shader-db results (VEGA): Totals from affected shaders: SGPRS: 2349056 -> 2349640 (0.02 %) VGPRS: 1322160 -> 1323300 (0.09 %) Spilled SGPRs: 21190 -> 21527 (1.59 %) Spilled VGPRs: 99 -> 99 (0.00 %) Private memory VGPRs: 0 -> 0 (0.00 %) Scratch size: 72 -> 72 (0.00 %) dwords per thread Code Size: 57260904 -> 57270932 (0.02 %) bytes Compile Time: 1107186 -> 1022942 (-7.61 %) milliseconds LDS: 786 -> 786 (0.00 %) blocks Max Waves: 391932 -> 391619 (-0.08 %) Wait states: 0 -> 0 (0.00 %) Reviewed-by: Eric Anholt <eric@anholt.net>	2019-03-06 23:05:20 +00:00
Timothy Arceri	e2fd96a563	radeonsi/nir: stop calling nir_lower_returns() We now call this for all drivers in glsl_to_nir() instead. Reviewed-by: Eric Anholt <eric@anholt.net>	2019-03-06 23:05:20 +00:00
Timothy Arceri	673f4f69a8	i965: stop calling nir_lower_returns() We now call this for all drivers in glsl_to_nir() instead. Reviewed-by: Eric Anholt <eric@anholt.net>	2019-03-06 23:05:20 +00:00
Timothy Arceri	7e60d5a501	glsl: use NIR function inlining for drivers that use glsl_to_nir() glsl_to_nir() is still missing support for converting certain functions to NIR, so for those we use the GLSL IR optimisations to remove the functions. Reviewed-by: Eric Anholt <eric@anholt.net>	2019-03-06 23:05:20 +00:00
Timothy Arceri	7530d4abfc	glsl/freedreno/panfrost: pass gl_context to the standalone compiler This allows us to use the ctx with glsl_to_nir() in a following patch. Reviewed-by: Eric Anholt <eric@anholt.net>	2019-03-06 23:05:20 +00:00
Lionel Landwerlin	15b83b3af9	vulkan/overlay: drop dependency on validation layer headers v2: reimplement layer chain info getters (Eric) v3: make it compile.. (Lionel) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-03-06 22:46:37 +00:00
Lionel Landwerlin	530927d3f6	vulkan/util: generate instance/device dispatch tables This will be used by the overlay instead of system installed validation layers helpers. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Acked-by: Eric Engestrom <eric.engestrom@intel.com>	2019-03-06 22:46:37 +00:00
Lionel Landwerlin	ee491a4987	vulkan/util: make header available from c++ Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-03-06 22:46:37 +00:00
Jose Maria Casanova Crespo	ffa9082c40	iris: setup EdgeFlag Vertex Element when needed. If the Vertex Shader uses EdgeFlag, the hardware requires that it be set up as the last VERTEX_ELEMENT_STATE. If SGVS are added at draw time we also need to reconfigure the last 3DSTATE_VF_INSTANCING so its VertexElementIndex points to the new Vertex Element that contains the EdgeFlag. So if draw parameters or edgeflag are not used, the CSO generated at iris_create_vertex_element is sent directly in the batches. But if edge flag is used, we adjust the last VERTEX_ELEMENT_STATE and last 3DSTATE_VF_INSTANCING using their alternative edge flag versions, which we generate at iris_create_vertex_element and store in the CSO. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-03-06 22:19:08 +00:00
Eric Anholt	c4d2da1f14	v3d: Include a count of register pressure in the RA failure dumps. You usually want to go find the highest pressure and figure out why you couldn't spill or what pattern led to a bunch of pressure leading to that point.	2019-03-06 14:13:45 -08:00
Samuel Pitoiset	71ffa00fc6	radv: enable lower_mul_2x32_64 Fixes: `58bcebd987` ("spirv: Allow [i/u]mulExtended to use new nir opcode") Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-03-06 22:41:20 +01:00
Jason Ekstrand	9ab1b1d022	st/nir: Move 64-bit lowering later Now that we have a loop unrolling cost function and loop unrolling isn't going to kill us the moment we have a 64-bit op in a loop, we can go ahead and move 64-bit lowering later. This gives us the opportunity to do more optimizations and actually let the full optimizer run even on 64-bit ops rather than hoping one round of opt_algebraic will fix everything. This substantially reduces both fp64 shader compile times and the resulting code size. Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-03-06 17:24:57 +00:00
Jason Ekstrand	656ace3dd8	intel/nir: Move 64-bit lowering later Now that we have a loop unrolling cost function and loop unrolling isn't going to kill us the moment we have a 64-bit op in a loop, we can go ahead and move 64-bit lowering later. This gives us the opportunity to do more optimizations and actually let the full optimizer run even on 64-bit ops rather than hoping one round of opt_algebraic will fix everything. This substantially reduces both fp64 shader compile times and the resulting code size. On the vs-isnan-dvec test from piglit: Before this commit: 1684.63s user 17.29s system 99% cpu 28:28.24 total 101479 instructions. 0 loops. 802452 cycles. 79:369 spills:fills. Peak memory usage (according to massif): 1.435 GB After this commit: 179.64s user 7.75s system 99% cpu 3:07.92 total 57316 instructions. 0 loops. 459287 cycles. 0:0 spills:fills. Peak memory usage (according to massif): 531.0 MB Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-03-06 17:24:57 +00:00
Jason Ekstrand	e02959f442	nir/lower_doubles: Inline functions directly in lower_doubles Instead of trusting the caller to already have created a softfp64 function shader and added all its functions to our shader, we simply take the softfp64 shader as an argument and do the function inlining ourselves. This means that there's no more nasty functions lying around that the caller needs to worry about cleaning up. Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-03-06 17:24:57 +00:00
Jason Ekstrand	f25ca337b4	nir/deref: Expose nir_opt_deref_impl Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-03-06 17:24:57 +00:00
Jason Ekstrand	de8d80f9cc	nir/inline_functions: Break inlining into a builder helper This pulls the guts of function inlining into a builder helper so that it can be used elsewhere. The rest of the infrastructure is still needed for most inlining cases to ensure that everything gets inlined and only ever once. However, there are use-cases where you just want to inline one little thing. This new helper also has a neat trick where it can seamlessly inline a function from one nir_shader into another. Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-03-06 17:24:57 +00:00
Jason Ekstrand	0a6b1d0580	glsl/nir: Inline functions in float64_funcs_to_nir This doesn't really change anything as the functions will all get inlined anyway. However it does let us do a bit of the work earlier and in a common place. Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-03-06 17:24:57 +00:00
Jason Ekstrand	82d9a37a59	glsl/nir: Add a shared helper for building float64 shaders Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-03-06 17:24:57 +00:00
Jason Ekstrand	8993e0973f	intel/nir: Drop an unneeded lower_constant_initializers call Even though this is technically a step in the function inlining process as laid out in nir_inline_functions.c, it's not really needed. We already have constant initializers lowered here and no new ones are added by appending the softfp64 functions. Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-03-06 17:24:57 +00:00
Jason Ekstrand	fa4824c1db	intel/debug: Add a debug flag to force software fp64 Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-03-06 17:24:57 +00:00
Jason Ekstrand	0ce1aea88b	i965: Compile the fp64 program based on nir options Instead of looking the devinfo directly, look at the lowering options we provided to NIR. This is more accurate as it's now checking for "do we need full software lowering" rather than a hardware bit. Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-03-06 17:24:57 +00:00
Jason Ekstrand	9314084237	nir: Teach loop unrolling about 64-bit instruction lowering The lowering we do for 64-bit instructions can cause a single NIR ALU instruction to blow up into hundreds or thousands of instructions potentially with control flow. If loop unrolling isn't aware of this, it can unroll a loop 20 times which contains a nir_op_fsqrt which we then lower to a full software implementation based on integer math. Those 20 invocations suddenly get a lot more expensive than NIR loop unrolling currently expects. By giving it an approximate estimate function, we can prevent loop unrolling from going to town when it shouldn't. Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-03-06 17:24:57 +00:00
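The cost-estimate idea in the commit above can be sketched as follows. This is an illustration only, not the NIR loop-unrolling pass: `alu_op`, `estimate_op_cost`, `should_unroll` and every numeric constant are hypothetical values chosen to show the effect.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical ALU instruction: only the properties the estimate needs. */
struct alu_op {
   unsigned bit_size;
   bool needs_sw_lowering; /* will be expanded into a software sequence */
};

/* Illustrative numbers, not taken from Mesa. */
#define COST_PLAIN_OP          1
#define COST_LOWERED_64BIT_OP  200
#define UNROLL_COST_LIMIT      400

/* A 64-bit op that will be lowered counts as the many instructions it
 * eventually becomes, not as a single instruction. */
static unsigned estimate_op_cost(const struct alu_op *op)
{
   if (op->bit_size == 64 && op->needs_sw_lowering)
      return COST_LOWERED_64BIT_OP;
   return COST_PLAIN_OP;
}

/* Unroll only if the estimated size of the unrolled body stays in budget. */
static bool should_unroll(const struct alu_op *body, size_t num_ops,
                          unsigned trip_count)
{
   unsigned body_cost = 0;

   assert(body != NULL || num_ops == 0);

   for (size_t i = 0; i < num_ops; i++)
      body_cost += estimate_op_cost(&body[i]);

   return body_cost * trip_count <= UNROLL_COST_LIMIT;
}
```

With these numbers, a loop of five plain 32-bit ops unrolls 20 times within budget, while the same trip count with a single lowered 64-bit op does not.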
Jason Ekstrand	ebb3695376	nir: Expose double and int64 op_to_options_mask helpers We already have one internally for int64 but we don't have a similar one for doubles so we'll have to make one. Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-03-06 17:24:57 +00:00
Iago Toral Quiroga	ca2b5e9069	compiler/nir: add an is_conversion field to nir_op_info This is set to True only for numeric conversion opcodes. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-03-06 17:24:57 +00:00
Ian Romanick	55e6454d5e	intel/fs: Fix extract_u8 of an odd byte from a 64-bit integer In the old code, we would generate the exact same instruction for extract_u8(some_u64, 0) and extract_u8(some_u64, 1). The mask-a-word trick only works for even numbered bytes. This fixes the (new) piglit test tests/spec/arb_gpu_shader_int64/execution/fs-ushr-and-mask.shader_test. v2: Use a SHR instead of an AND. This saves an instruction compared to using two moves. Suggested by Jason. Fixes: `6ac2d16901` ("i965/fs: Fix extract_i8/u8 to a 64-bit destination") Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-03-06 08:35:45 -08:00
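The bug described in the commit above can be reproduced in plain C. This sketch models the value as four 16-bit words and is not the backend code: `word_containing_byte`, `extract_u8_mask_only` and `extract_u8` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* View a 64-bit value as four 16-bit words and return the word that
 * holds the requested byte (byte 0 is the least significant). */
static uint16_t word_containing_byte(uint64_t value, unsigned byte)
{
   assert(byte < 8);
   return (uint16_t)(value >> (16 * (byte / 2)));
}

/* Mask-a-word trick: bytes 2n and 2n+1 select the same word and the same
 * mask, so odd bytes wrongly return their even neighbour. */
static uint64_t extract_u8_mask_only(uint64_t value, unsigned byte)
{
   return word_containing_byte(value, byte) & 0xff;
}

/* Fixed variant: even bytes are masked, odd bytes are shifted right by 8
 * so the high half of the word lands in the low byte. */
static uint64_t extract_u8(uint64_t value, unsigned byte)
{
   uint16_t word = word_containing_byte(value, byte);

   if (byte & 1)
      return (uint64_t)(word >> 8);
   return (uint64_t)(word & 0xff);
}
```

For `0x8877665544332211`, the mask-only version returns `0x11` for both byte 0 and byte 1, while the shifted version returns `0x22` for byte 1.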
Ian Romanick	4aaf139ea4	intel/fs: nir_op_extract_i8 extracts a byte, not a word Fixes: `6ac2d16901` ("i965/fs: Fix extract_i8/u8 to a 64-bit destination") Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-03-06 08:35:42 -08:00
Ian Romanick	bbf20a1ca3	intel/compiler: Silence unused parameter warning in brw_interpolation_map.c The parameter is never used, and it's not part of a common interface idiom. Remove it. src/intel/compiler/brw_interpolation_map.c: In function ‘brw_setup_vue_interpolation’: src/intel/compiler/brw_interpolation_map.c:62:59: warning: unused parameter ‘devinfo’ [-Wunused-parameter] const struct gen_device_info *devinfo) ^~~~~~~ Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-03-06 08:35:36 -08:00
Ian Romanick	dea19138dd	intel/compiler: Silence many unused parameter warnings in brw_eu.h In file included from src/intel/compiler/brw_eu_util.c:34:0: src/intel/compiler/brw_eu.h: In function ‘brw_message_desc_header_present’: src/intel/compiler/brw_eu.h:288:63: warning: unused parameter ‘devinfo’ [-Wunused-parameter] brw_message_desc_header_present(const struct gen_device_info *devinfo, ^~~~~~~ src/intel/compiler/brw_eu.h: In function ‘brw_message_ex_desc’: src/intel/compiler/brw_eu.h:296:51: warning: unused parameter ‘devinfo’ [-Wunused-parameter] brw_message_ex_desc(const struct gen_device_info *devinfo, ^~~~~~~ src/intel/compiler/brw_eu.h: In function ‘brw_message_ex_desc_ex_mlen’: src/intel/compiler/brw_eu.h:303:59: warning: unused parameter ‘devinfo’ [-Wunused-parameter] brw_message_ex_desc_ex_mlen(const struct gen_device_info *devinfo, ^~~~~~~ src/intel/compiler/brw_eu.h: In function ‘brw_sampler_desc_binding_table_index’: src/intel/compiler/brw_eu.h:337:68: warning: unused parameter ‘devinfo’ [-Wunused-parameter] brw_sampler_desc_binding_table_index(const struct gen_device_info *devinfo, ^~~~~~~ src/intel/compiler/brw_eu.h: In function ‘brw_sampler_desc_sampler’: src/intel/compiler/brw_eu.h:344:56: warning: unused parameter ‘devinfo’ [-Wunused-parameter] brw_sampler_desc_sampler(const struct gen_device_info *devinfo, uint32_t desc) ^~~~~~~ src/intel/compiler/brw_eu.h: In function ‘brw_sampler_desc_return_format’: src/intel/compiler/brw_eu.h:371:62: warning: unused parameter ‘devinfo’ [-Wunused-parameter] brw_sampler_desc_return_format(const struct gen_device_info *devinfo, ^~~~~~~ src/intel/compiler/brw_eu.h: In function ‘brw_dp_desc_binding_table_index’: src/intel/compiler/brw_eu.h:405:63: warning: unused parameter ‘devinfo’ [-Wunused-parameter] brw_dp_desc_binding_table_index(const struct gen_device_info *devinfo, ^~~~~~~ src/intel/compiler/brw_eu.h: In function ‘brw_dp_a64_untyped_atomic_desc’: src/intel/compiler/brw_eu.h:754:41: warning: unused parameter ‘exec_size’ [-Wunused-parameter] unsigned exec_size, /**< 0 for SIMD4x2 */ ^~~~~~~~~ src/intel/compiler/brw_eu.h: In function ‘brw_dp_a64_untyped_atomic_float_desc’: src/intel/compiler/brw_eu.h:775:47: warning: unused parameter ‘exec_size’ [-Wunused-parameter] unsigned exec_size, ^~~~~~~~~ Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-03-06 08:35:31 -08:00
Eric Engestrom	89241eeafc	meson: remove unused include_directories(vulkan) The correct include path is "vulkan/…". Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2019-03-06 12:46:11 +00:00
Eric Engestrom	ad862c36e5	meson: fix with_dri2 definition for GNU Hurd Suggested-by: Dylan Baker <dylan@pnwbakers.com> Cc: Timo Aaltonen <tjaalton@debian.org> Cc: James Clarke <jrtc27@debian.org> Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2019-03-06 12:40:06 +00:00
Lionel Landwerlin	b49726afd4	radv: set num_components on vulkan_resource_index intrinsic In `61e009d2c4` we changed the number of components in the vulkan_resource_index intrinsic and forgot to update Radv's code for it. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `61e009d2c4` ("spirv: Use the same types for resource indices as pointers") Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-03-06 11:56:21 +00:00
Timothy Arceri	54522d0506	nir: rename glsl_type_is_struct() -> glsl_type_is_struct_or_ifc() Replace done using: find ./src -type f -exec sed -i -- \ 's/glsl_type_is_struct(/glsl_type_is_struct_or_ifc(/g' {} \; Acked-by: Karol Herbst <kherbst@redhat.com> Acked-by: Jason Ekstrand <jason@jlekstrand.net> Acked-by: Kenneth Graunke <kenneth@whitecape.org>	2019-03-06 13:10:02 +11:00
Timothy Arceri	e16a27fcf8	glsl: rename record_types -> struct_types Acked-by: Karol Herbst <kherbst@redhat.com> Acked-by: Jason Ekstrand <jason@jlekstrand.net> Acked-by: Kenneth Graunke <kenneth@whitecape.org>	2019-03-06 13:10:02 +11:00
Timothy Arceri	8294295dbd	glsl: rename record_location_offset() -> struct_location_offset() Replace done using: find ./src -type f -exec sed -i -- \ 's/record_location_offset(/struct_location_offset(/g' {} \; Acked-by: Karol Herbst <kherbst@redhat.com> Acked-by: Jason Ekstrand <jason@jlekstrand.net> Acked-by: Kenneth Graunke <kenneth@whitecape.org>	2019-03-06 13:10:02 +11:00
Timothy Arceri	88d8c4e290	glsl: rename get_record_instance() -> get_struct_instance() Replace done using: find ./src -type f -exec sed -i -- \ 's/get_record_instance(/get_struct_instance(/g' {} \; Acked-by: Karol Herbst <kherbst@redhat.com> Acked-by: Jason Ekstrand <jason@jlekstrand.net> Acked-by: Kenneth Graunke <kenneth@whitecape.org>	2019-03-06 13:10:02 +11:00
Timothy Arceri	81ee2cd8ba	glsl: rename is_record() -> is_struct() Replace was done using: find ./src -type f -exec sed -i -- \ 's/is_record(/is_struct(/g' {} \; Acked-by: Karol Herbst <kherbst@redhat.com> Acked-by: Jason Ekstrand <jason@jlekstrand.net> Acked-by: Kenneth Graunke <kenneth@whitecape.org>	2019-03-06 13:10:02 +11:00
Karol Herbst	272e927d0e	nir/spirv: initial handling of OpenCL.std extension opcodes Not complete, mostly just adding things as I encounter them in CTS. But not getting far enough yet to hit most of the OpenCL.std instructions. Anyway, this is better than nothing and covers the most common builtins. v2: add hadd proof from Jason move some of the lowering into opt_algebraic and create new nir opcodes simplify nextafter lowering fix normalize lowering for inf rework upsample to use nir_pack_bits add missing files to build systems v3: split lines of iadd/sub_sat expressions Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-03-05 22:28:29 +01:00
Karol Herbst	d0b47ec4df	nir/vtn: add support for SpvBuiltInGlobalLinearId v2: use formula with fewer operations Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-03-05 22:28:29 +01:00
Karol Herbst	f48c672965	nir: add support for address bit sized system values v2: add assert in else clause make local group intrinsics 32 bit wide v3: always use 32 bit constant for local_size v4: add comment by Jason Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-03-05 22:28:29 +01:00
Karol Herbst	5f8257fb0b	nir/spirv: improve parsing of the memory model v2: add some vtn_fail_ifs Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-03-05 22:28:29 +01:00
Karol Herbst	5d48359a2c	nir: replace magic numbers with M_PI we define it inside 'include/c99_math.h' so it is safe to use. Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-03-05 22:28:29 +01:00
Caio Marcelo de Oliveira Filho	69cc6272fb	anv: Implement VK_EXT_external_memory_host v2: Ignore the import if handleType == 0. (Jason) Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-03-05 12:59:50 -08:00
Eric Anholt	5c655c47db	v3d: Drop the V3D 3.x vpm read dead code elimination. We now have NIR dead code eliminating our VPM reads, so this shouldn't be necessary.	2019-03-05 12:57:39 -08:00
Eric Anholt	e8ee1f8eaf	v3d: Eliminate the TLB and TLBU files. We can just use the magic register file like we do for other magic waddrs.	2019-03-05 12:57:39 -08:00
Eric Anholt	110f14d4b4	v3d: Use ldunif instructions for uniforms. The idea is that for repeated use of the same uniform, we could avoid loading it on each consumer. The results look pretty good. total instructions in shared programs: 6413571 -> 6521464 (1.68%) total threads in shared programs: 154214 -> 154000 (-0.14%) total uniforms in shared programs: 2393604 -> 2119629 (-11.45%) total spills in shared programs: 4960 -> 4984 (0.48%) total fills in shared programs: 6350 -> 6418 (1.07%) Once we do scheduling at the NIR level, the register pressure (and thus also instructions) issues we see here will drop back down.	2019-03-05 12:57:39 -08:00
Eric Anholt	4036fce8fd	v3d: Add support for register-allocating a ldunif to a QFILE_TEMP. On V3D 4.x, we can use ldunifrf to load uniforms to any register, and this will let us schedule the ldunif wherever we want in the program.	2019-03-05 12:57:39 -08:00
Eric Anholt	70df388219	v3d: Drop the old class bits splitting up the accumulators. This seems to be left over from vc4, and I don't use them any more.	2019-03-05 12:57:39 -08:00
Eric Anholt	dff1fc04e0	v3d: Add support for vir-to-qpu of ldunif instructions to a temp. We can load a uniform to any register, so add support for non-ALU instructions with sig.ldunif to a temp.	2019-03-05 12:57:39 -08:00
Eric Anholt	4739181a16	v3d: Switch implicit uniforms over to being any qinst->uniform != ~0. I'm not sure why I didn't do this before -- it's clearly much simpler to add dumping of the extra thing than to have it as another implicit source.	2019-03-05 12:57:39 -08:00
Eric Anholt	1e98f02d88	v3d: Do uniform rematerialization spilling before dropping threadcount This feels like the right tradeoff for threads vs uniforms, particularly given that we often have very short thread segments right now: total instructions in shared programs: 6411504 -> 6413571 (0.03%) total threads in shared programs: 153946 -> 154214 (0.17%) total uniforms in shared programs: 2387665 -> 2393604 (0.25%)	2019-03-05 12:57:39 -08:00
Eric Anholt	060979a380	v3d: Fix temporary leaks of temp_registers and when spilling. On each iteration of successfully spilling a reg, we'd allocate another copy of temp_registers, and when decrementing thread count we'd allocate another copy of the graph. These all got cleaned up on freeing the compile.	2019-03-05 12:57:39 -08:00
Eric Engestrom	faf9e40f35	gitlab-ci: drop job prefixes It is already obvious whether the job is building a container or running a mesa build, so let's drop that prefix so that we can see more information on the screen (eg. in the jobs list on a pipeline page). Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Michel Dänzer <michel.daenzer@amd.com> Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>	2019-03-05 20:49:42 +00:00
Timur Kristóf	45809bcb33	tgsi_to_nir: Set correct location for uniforms. Previously, only the driver_location was set for all variables, but constants need to use the location field instead. This change is necessary because the nine state tracker can produce non-packed constants whose location needs to be explicitly set. Signed-Off-By: Timur Kristóf <timur.kristof@gmail.com> Tested-by: Andre Heider <a.heider@gmail.com> Tested-by: Rob Clark <robdclark@gmail.com> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-03-05 19:13:27 +00:00
Timur Kristóf	770faf546d	tgsi_to_nir: Improve interpolation modes. This patch extracts the interpolation mode translation into a separate function called ttn_translate_interp_mode, adds support for TGSI_INTERPOLATE_COLOR which was missing, and also sets the proper interpolation mode to output variables, which were not set previously. Signed-Off-By: Timur Kristóf <timur.kristof@gmail.com> Tested-by: Andre Heider <a.heider@gmail.com> Tested-by: Rob Clark <robdclark@gmail.com> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-03-05 19:13:27 +00:00
Kenneth Graunke	2fb800fd1d	tgsi_to_nir: use sampler variables and derefs v2: fix is_shadow, is_array and txq Some drivers (eg. iris) need the presence of sampler variables and derefs so that they can count them to determine the number of samplers used. This change also makes the output NIR closer to what glsl_to_nir outputs. Signed-Off-By: Timur Kristóf <timur.kristof@gmail.com> Tested-by: Andre Heider <a.heider@gmail.com> Tested-by: Rob Clark <robdclark@gmail.com> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-03-05 19:13:27 +00:00
Timur Kristóf	674045d04b	tgsi_to_nir: Support FACE and POSITION properly. Previously, FACE was hard-coded as a sysval, but TTN emulated it incorrectly. Also, POSITION was not supported when it was a sysval. This patch fixes these by allowing both of them to be sysvals or inputs, based on driver capabilities. It also fixes the TGSI FACE emulation based on the TGSI spec. Signed-Off-By: Timur Kristóf <timur.kristof@gmail.com> Tested-by: Andre Heider <a.heider@gmail.com> Tested-by: Rob Clark <robdclark@gmail.com> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-03-05 19:13:27 +00:00
Timur Kristóf	f748fa47f8	tgsi_to_nir: Extract ttn_emulate_tgsi_front_face into its own function. We'll need to use the same logic in other places, so it makes sense to have a separate function for this. Signed-Off-By: Timur Kristóf <timur.kristof@gmail.com> Tested-by: Andre Heider <a.heider@gmail.com> Tested-by: Rob Clark <robdclark@gmail.com> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-03-05 19:13:27 +00:00
Timur Kristóf	840c7d1ebd	tgsi_to_nir: Restructure system value loads. Minor cleanup to the way system value loads work in tgsi_to_nir. Signed-Off-By: Timur Kristóf <timur.kristof@gmail.com> Tested-by: Andre Heider <a.heider@gmail.com> Tested-by: Rob Clark <robdclark@gmail.com> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-03-05 19:13:27 +00:00
Timur Kristóf	9a834447d6	tgsi_to_nir: Produce optimized NIR for a given pipe_screen. With this patch, tgsi_to_nir will output NIR that is tailored to the given pipe, by reading its capabilities and adjusting the NIR code to those capabilities similarly to how glsl_to_nir works. It also adds an optimization loop that brings the output NIR in line with what glsl_to_nir outputs. This is necessary for the same reason why glsl_to_nir has its own optimization loop: currently not every driver does these optimizations yet. For uses which cannot pass a pipe_screen we also keep a variant called tgsi_to_nir_noscreen which keeps the old behavior. Signed-Off-By: Timur Kristóf <timur.kristof@gmail.com> Tested-by: Andre Heider <a.heider@gmail.com> Tested-by: Rob Clark <robdclark@gmail.com> Acked-By: Eric Anholt <eric@anholt.net>	2019-03-05 19:13:27 +00:00
Timur Kristóf	e582e761b7	freedreno: Plumb pipe_screen through to irX_tgsi_to_nir. This patch makes it possible for freedreno to pass a pipe_screen to tgsi_to_nir. This will be needed when tgsi_to_nir supports reading pipe capabilities. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Tested-by: Rob Clark <robdclark@gmail.com> Reviewed-by: Rob Clark <robdclark@gmail.com>	2019-03-05 19:13:27 +00:00
Timur Kristóf	6684e039eb	nir: Add multiplier argument to nir_lower_uniforms_to_ubo. Note that locations can be set in different units, and the multiplier argument caters to supporting these different units. For example, st_glsl_to_nir uses dwords (4 bytes) so the multiplier should be 4, while tgsi_to_nir uses vec4s (16 bytes), so the multiplier should be 16. Signed-Off-By: Timur Kristóf <timur.kristof@gmail.com> Tested-by: Andre Heider <a.heider@gmail.com> Tested-by: Rob Clark <robdclark@gmail.com> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-03-05 19:13:27 +00:00
Timur Kristóf	909d1f50f3	nir: Move nir_lower_uniforms_to_ubo to compiler/nir. The nir_lower_uniforms_to_ubo function is useful outside of mesa/state_tracker, and in fact is needed to produce NIR for drivers that have the PIPE_CAP_PACKED_UNIFORMS capability. Signed-Off-By: Timur Kristóf <timur.kristof@gmail.com> Tested-by: Andre Heider <a.heider@gmail.com> Tested-by: Rob Clark <robdclark@gmail.com> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-03-05 19:13:27 +00:00
Timur Kristóf	4dba72c4b3	tgsi_to_nir: Split to smaller functions. Previously, tgsi_to_nir was a single big function, and this patch intends to make the code easier to understand by splitting it up to multiple smaller pieces. Signed-Off-By: Timur Kristóf <timur.kristof@gmail.com> Tested-by: Andre Heider <a.heider@gmail.com> Tested-by: Rob Clark <robdclark@gmail.com> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> Acked-By: Tested-by: Rob Clark <robdclark@gmail.com>	2019-03-05 19:13:27 +00:00
Timur Kristóf	950aebbc53	tgsi_to_nir: Make the TGSI IF translation code more readable. This patch is a minor cleanup that only intends to make the TGSI IF translation a bit easier to read. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Tested-by: Andre Heider <a.heider@gmail.com> Tested-by: Rob Clark <robdclark@gmail.com> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-03-05 19:13:27 +00:00
Timur Kristóf	fa076acbc0	tgsi_to_nir: Fix TGSI LIT translation by using flt. TGSI spec says LIT needs a "greater than" comparison. NIR doesn't have that, so let's use "less than" and swap the arguments. Previously "greater than or equal" was used by tgsi_to_nir which is incorrect. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Tested-by: Andre Heider <a.heider@gmail.com> Tested-by: Rob Clark <robdclark@gmail.com> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-03-05 19:13:27 +00:00
Timur Kristóf	28be7b33b9	tgsi_to_nir: Fix the TGSI ARR translation by converting the result to int. According to the TGSI spec, ARR needs to do a rounding and then a float-to-integer conversion which was missing. This patch also makes the rounding a bit more efficient by using nir_fround_even instead of the previous nir_ffloor+nir_fadd trick. Signed-Off-By: Timur Kristóf <timur.kristof@gmail.com> Tested-by: Andre Heider <a.heider@gmail.com> Tested-by: Rob Clark <robdclark@gmail.com> Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-03-05 19:13:27 +00:00
Timur Kristóf	317f10bf40	nir: Add ability for shaders to use window space coordinates. This patch adds a shader_info field that tells the driver to use window space coordinates for a given vertex shader. It also enables this feature in radeonsi (the only NIR-capable driver that supported it in TGSI), and makes tgsi_to_nir aware of it. Signed-Off-By: Timur Kristóf <timur.kristof@gmail.com> Tested-by: Andre Heider <a.heider@gmail.com> Tested-by: Rob Clark <robdclark@gmail.com> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-03-05 19:13:27 +00:00
Eric Anholt	2780a99ff8	v3d: Move the stores for fixed function VS output reads into NIR. This lets us emit the VPM_WRITEs directly from nir_intrinsic_store_output() (useful once NIR scheduling is in place so that we can reduce register pressure), and lets future NIR scheduling schedule the math to generate them. Even in the meantime, it looks like this lets NIR DCE some more code and make better decisions. total instructions in shared programs: 6429246 -> 6412976 (-0.25%) total threads in shared programs: 153924 -> 153934 (<.01%) total loops in shared programs: 486 -> 483 (-0.62%) total uniforms in shared programs: 2385436 -> 2388195 (0.12%) Acked-by: Ian Romanick <ian.d.romanick@intel.com> (nir)	2019-03-05 10:59:40 -08:00
Eric Anholt	a9dd227a47	v3d: Translate f2i(fround_even) as FTOIN. This appears to be just what the opcode does. Needed for equivalence when moving FF VPM stores into NIR.	2019-03-05 10:59:40 -08:00
Eric Anholt	a4f612b4cf	nir: Improve printing of load_input/store_output variable names. We were printing only when the channel was exactly the start channel, so scalarized loads/stores would be missing the name on the rest. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2019-03-05 10:59:40 -08:00
Jason Ekstrand	43f40dc7cb	anv: Implement VK_EXT_inline_uniform_block Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-03-05 10:06:50 -06:00
Jason Ekstrand	61e009d2c4	spirv: Use the same types for resource indices as pointers We need more space than just a 32-bit scalar and we have to burn all that space anyway so we may as well expose it to the driver. This also fixes a subtle bug when UBOs and SSBOs have different pointer types. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-03-05 10:06:50 -06:00
Jason Ekstrand	9f7ee4f8e5	spirv: Use the generic dereference function for OpArrayLength With the new deref changes, the old pointer_offset version may not be the right one to call. Just call the generic one and let it sort it out. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-03-05 10:06:50 -06:00
Jason Ekstrand	f1dbc7e97d	spirv: Pull offset/stride from the pointer for OpArrayLength We can't pull it from the variable type because it might be an array of blocks and not just the one block. While we're here, throw in some error checking. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: mesa-stable@lists.freedesktop.org	2019-03-05 10:06:50 -06:00
Jason Ekstrand	c520f4dec9	anv: Add a concept of a descriptor buffer This buffer goes along side the CPU data structure and may contain pointers, bindless handles, or any other descriptor information. Currently, all descriptors are size zero and nothing goes in the buffer but this commit sets up the framework we will need later. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-03-05 10:06:50 -06:00
Jason Ekstrand	5c30fffeec	anv: Take references to push descriptor set layouts Technically, descriptor set layouts aren't required to survive past the function they're passed into so we need to reference them. Cc: "19.0" <mesa-stable@lists.freedesktop.org> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-03-05 10:06:50 -06:00
Jason Ekstrand	8ab95b849e	anv: Refactor descriptor pushing a bit Pull the common code out of the two entrypoints into the helper which fetches the push descriptor set for us. Now that it does more than just get a thing, call it anv_cmd_buffer_push_descriptor_set. Cc: "19.0" <mesa-stable@lists.freedesktop.org> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-03-05 10:06:50 -06:00
Jason Ekstrand	cab064bc10	anv: drop add_var_binding from anv_nir_apply_pipeline_layout.c It has exactly one caller. Just inline it. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-03-05 10:06:50 -06:00
Jason Ekstrand	49cf61c6aa	anv: Clean up descriptor set layouts The descriptor set layout code in our driver has undergone many changes over the years. Some of the fields which were once essential are now useless or nearly so. The has_dynamic_offsets field was completely unused except for the code to set and hash it. The per-stage indices were only being used to determine if a particular binding had images, samplers, etc. The fact that it's per-stage also doesn't matter because that binding should never be accessed by a shader of the wrong stage. This commit deletes a pile of cruft and replaces it all with a descriptive bitfield which states what a particular descriptor contains. This merely describes the data available and doesn't necessarily dictate how it will be lowered in anv_nir_apply_pipeline_layout. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-03-05 10:06:50 -06:00
Jason Ekstrand	4c50b7c92c	anv: Count image param entries rather than images This is what we're actually storing in the descriptor set and consuming when we bind surface states. This commit renames image_count to image_param_count a few places and moves the decision to not count image params on gen9+ into anv_descriptor_set.c when we build the layout. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-03-05 10:06:50 -06:00
Jason Ekstrand	3822c7495a	anv: Stop allocating buffer views for dynamic buffers We emit the surface states for those on-the-fly so we don't need the buffer view. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-03-05 10:06:50 -06:00
Jason Ekstrand	8c6d410a50	anv: Rework arguments to anv_descriptor_set_write_* Make them all take a device followed by a set. This is consistent with how the actual Vulkan entrypoint parameters are laid out. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-03-05 10:06:50 -06:00
Jason Ekstrand	5b7a9e7398	anv/descriptor_set: Refactor alloc/free of descriptor sets This commit just puts the free list code together as part of the pool instead of having it inlined into the descriptor set create code. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-03-05 10:06:50 -06:00
Eric Anholt	fd1d22b92e	v3d: Stop treating exec masking specially. In our backend, the successor edges from the blocks only point to where QPU control flow goes, not where the notional control flow goes from a "break" or "continue" modifying the execution mask to resume writing to some channels later. As a result, this attempt at restricting live ranges ended up missing the live range of a value where a conditional break/continue was present in a loop before the later def of a variable. The previous commit ended up fixing the problem that the flag tried to solve. Fixes glsl-vs-loop-continue.shader_test and/or glsl-vs-loop-redundant-condition.shader_test based on register allocation results.	2019-03-05 07:36:24 -08:00
Eric Anholt	c6ae666cf5	v3d: Restrict live intervals to the blocks reachable from any def. In the backend, we often have condition codes on writes to variables, such that there's no screening def anywhere and the previous live ranges algorithm would conclude that the start of the range extends to the start of the program. However, we do know that the live range can only extend as early as you can reach from all blocks writing to the variable. The motivation was that, while we have a couple of hacks to try to promote conditional writes up to being a def within the block, the exec_mask one was broken and needed a replacement. Based on `c3c1aa5aeb` ("intel/fs: Restrict live intervals to the subset possibly reachable from any definition.").	2019-03-05 07:36:24 -08:00
Andres Gomez	cf79d62f90	gitlab-ci: install distro's ninja Ubuntu Bionic is shipping ninja 1.8.2. Therefore, we do not need to download v1.6.0 manually any more. Signed-off-by: Andres Gomez <agomez@igalia.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-03-05 14:05:24 +00:00
Samuel Pitoiset	c2a148692b	radv: properly align the fence and EOP bug VA on GFX9 If alignment is 0, offsets returned by radv_cmd_buffer_upload_alloc() are always 0. These two virtual addresses were pointing at the same location. Cc: 18.3 19.0 <mesa-stable@lists.freedesktop.org> Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-03-05 15:00:20 +01:00
Samuel Pitoiset	2eb0905ffa	radv: allocate enough space in cmdbuf when starting a subpass This fixes some CTS crashes with: dEQP-VK.renderpass2.suballocation.attachment_write_mask.attachment_count_8.start_index_* Ideally, we should check cmd_buffer->cs->max_dw because there is likely enough space (the internal clear draws allocate space), but keep that way for consistency. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-03-05 15:00:10 +01:00
Eric Engestrom	31d302ae51	vulkan: import vk_layer.h from Khronos Instead of relying on the system having it (and the right version). Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-03-05 13:24:14 +00:00
Eric Engestrom	bcc4bfc8e8	egl: fix libdrm-less builds This function was never used, and isn't properly guarded by HAVE_LIBDRM, breaking the build on systems that don't have libdrm. Let's just remove it. Fixes: `7552fcb7b9` "egl: add base EGL_EXT_device_base implementation" Reported-by: Timo Aaltonen <tjaalton@debian.org> Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Acked-by: Emil Velikov <emil.velikov@collabora.com>	2019-03-05 13:04:06 +00:00
Eric Engestrom	e37ea1e0d3	vulkan: import missing file from Khronos Fixes: `114c4aa0c8` "vulkan: update headers/registry to 1.1.102" Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-03-05 12:52:31 +00:00
Eric Engestrom	91cc6fcbb0	util: #define PATH_MAX when undefined (eg. Hurd) Cc: Timo Aaltonen <tjaalton@debian.org> Cc: James Clarke <jrtc27@debian.org> Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2019-03-05 12:27:35 +00:00
Eric Engestrom	fe205818c2	radv: use the platform defines in vk.xml instead of hard-coding them Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-03-05 11:57:10 +00:00
Eric Engestrom	3d4238d26c	anv: use the platform defines in vk.xml instead of hard-coding them Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-03-05 11:57:10 +00:00
Lionel Landwerlin	e21c201c96	anv: update supported patch version Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2019-03-05 10:39:17 +00:00
Tapani Pälli	3bb8768b9d	anv: toggle on support for VK_EXT_ycbcr_image_arrays We already propagate coord_components correctly and did not have layer restrictions for ycbcr formats. Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-03-05 10:39:17 +00:00
Lionel Landwerlin	114c4aa0c8	vulkan: update headers/registry to 1.1.102 Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2019-03-05 10:39:11 +00:00
Tapani Pälli	33bf3d510c	anv: retain the is_array state in create_plane_tex_instr_implicit This does not seem to fix anything ATM but is the right thing to do. Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Fixes: `f3e91e78a3` ("anv: add nir lowering pass for ycbcr textures") Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-03-05 10:38:31 +00:00
Eric Engestrom	e1ee4ab3dc	meson: avoid going back up the tree with include_directories() Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2019-03-05 10:02:47 +00:00
Kenneth Graunke	dca36d5516	i965: Implement threaded GL support. Now i965 supports mesa_glthread=true like Gallium drivers do. According to Markus (degasus), the Citra emulator now runs ~30% faster. Emmanuel (linkmauve) also reported that the Dolphin emulator improved by 2.8x on one game. (Both of those still need to be added to drirc.) An Intel Mesa CI run with mesa_glthread=true appears to be happy. Bioshock Infinite's benchmark mode seems to be around 15-20% faster on my Skylake GT4 at 1920x1080. Tested-by: Markus Wick <markus@selfnet.de> Tested-by: Emmanuel Gil Peyrot <linkmauve@linkmauve.fr> Tested-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2019-03-05 00:49:05 -08:00
Jason Ekstrand	0010d0348a	anv/pipeline: Drop anv_fill_binding_table We zero out the prog data anyway and, now that bias is always zero, this function is accomplishing nothing. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-03-04 23:56:40 +00:00
Jason Ekstrand	65ee5cc0da	anv: Use an actual binding for gl_NumWorkgroups This commit moves our handling of gl_NumWorkgroups over to work like our handling of other special bindings in the Vulkan driver. We give it a magic descriptor set number and teach emit_binding_tables to handle it. This is better than the bias mechanism we were using because it allows us to do proper accounting through the bind map mechanism. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-03-04 23:56:40 +00:00
Jason Ekstrand	5c96120b5c	intel,nir: Lower TXD with min_lod when the sampler index is not < 16 When we have a larger sampler index, we get into the "high sampler" scenario and need an instruction header. Even in SIMD8, this pushes the instruction over the sampler message size maximum of 11 registers. Instead, we have to lower TXD to TXL. Fixes: `cb98e0755f` "intel/fs: Support min_lod parameters on texture..." Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2019-03-04 23:56:39 +00:00
Jason Ekstrand	ca295ddbfb	spirv: OpImageQueryLod requires a sampler No idea how this fell through the cracks besides the fact that the sampler bound at 0 almost always works and the CTS isn't amazing. In any case, this appears to have been broken for almost forever. Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: mesa-stable@lists.freedesktop.org	2019-03-04 23:56:39 +00:00
Jason Ekstrand	5049fbddb4	anv: Count surfaces for non-YCbCr images in GetDescriptorSetLayoutSupport We were accidentally not counting those surfaces Fixes: `ddc4069122` "anv: Implement VK_KHR_maintenance3" Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-03-04 23:56:39 +00:00
Sagar Ghuge	58bcebd987	spirv: Allow [i/u]mulExtended to use new nir opcode Use new nir opcode nir_[i/u]mul_2x32_64 and extract lower and higher 32 bits as needed instead of emitting mul and mul_high. v2: Surround the switch case with curly braces (Jason Ekstrand) Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-03-04 15:50:25 -08:00
Sagar Ghuge	47ec9bdc60	nir/algebraic: Optimize low 32 bit extraction Optimize a situation where we only need lower 32 bits from 64 bit result. Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Suggested-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-03-04 15:50:25 -08:00
Sagar Ghuge	1d8994a63b	glsl: [u/i]mulExtended optimization for GLSL Optimize mulExtended to use 32x32->64 multiplication. Drivers which are not based on NIR can set the MUL64_TO_MUL_AND_MUL_HIGH lowering flag in order to have the same old behavior. v2: Add missing condition check (Jason Ekstrand) Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Suggested-by: Matt Turner <mattst88@gmail.com> Suggested-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-03-04 15:50:25 -08:00
Sagar Ghuge	e551040c60	nir/glsl: Add another way of doing lower_imul64 for gen8+ On Gen 8 and 9, "mul" instruction supports 64 bit destination type. We can reduce our 64x64 int multiplication from 4 instructions to 3. Also instead of emitting two mul instructions, we can emit single mul instruction and extract low/high 32 bits from 64 bit result for [i/u]mulExtended v2: 1) Allow lower_mul_high64 to use new opcode (Jason Ekstrand) 2) Add lower_mul_2x32_64 flag (Matt Turner) 3) Remove associative property as bit size is different (Connor Abbott) v3: Fix indentation and variable naming convention (Jason Ekstrand) Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-03-04 15:50:25 -08:00
Axel Davy	1d363d440f	st/nine: Ignore multisample quality level if no ms Apparently instead of returning error when passing a quality level different than 0 for D3DMULTISAMPLE_NONE, we should pass. Fixes: https://github.com/iXit/Mesa-3D/issues/340 Cc: mesa-stable@lists.freedesktop.org Signed-off-by: Axel Davy <davyaxel0@gmail.com>	2019-03-04 21:52:15 +01:00
Axel Davy	86666f051e	st/nine: Ignore window size if error Check GetWindowInfo and ignore the computed sizes if there is an error. Fixes a regression caused by earlier commit when using old wine gallium nine patches. Should also address a crash at window destruction. Related issues: https://github.com/iXit/Mesa-3D/issues/331 https://github.com/iXit/Mesa-3D/issues/332 Cc: mesa-stable@lists.freedesktop.org Fixes: `2318ca68bb` ("st/nine: Handle window resize when a presentation buffer is used") Signed-off-by: Axel Davy <davyaxel0@gmail.com>	2019-03-04 21:52:15 +01:00
Mauro Rossi	ec0f465bc5	android: anv: fix libexpat shared dependency Fixes undefined reference building errors for XML_* functions Signed-off-by: Mauro Rossi <issor.oruam@gmail.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Cc: "19.0" <mesa-stable@lists.freedesktop.org>	2019-03-04 20:53:59 +01:00
Mauro Rossi	14e7e26a09	android: anv: fix generated files dependencies (v2) Fix anv_entrypoints.{c,h} and anv_extensions.{c,h} missing dependencies Rename the variable labels according to targets and python scripts Align the building rules as per Automake for simplification Fixes building errors during rebuilds due to missing dependencies (v2) Fixed a missing $(VULKAN_API_XML) reference Fixes: `9a508b7` ("android: anv/extensions: fix generated sources build") Fixes: `dd088d4bec` ("anv/extensions: Generate a header file with extension tables") Signed-off-by: Mauro Rossi <issor.oruam@gmail.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> Cc: "19.0" <mesa-stable@lists.freedesktop.org>	2019-03-04 20:53:51 +01:00
Brian Paul	e2369e133c	st/wgl: init a variable to silence MinGW warning MinGW release build says 'value' may be used before being initialized. Reviewed-by: Neha Bhende <bhenden@vmware.com>	2019-03-04 11:48:48 -07:00
Brian Paul	66ba12973b	svga: silence array out of bounds warning MinGW release build complains about a possible out-of-bounds array access. Test i < 4 to silence it. Reviewed-by: Neha Bhende <bhenden@vmware.com> Reviewed-by: Mathias Fröhlich <Mathias.Froehlich@web.de>	2019-03-04 11:48:47 -07:00
Brian Paul	999db9ac51	svga: init fill variable to avoid compiler warning MinGW release build warns about use of a possibly uninitialized variable here. Reviewed-by: Neha Bhende <bhenden@vmware.com> Reviewed-by: Mathias Fröhlich <Mathias.Froehlich@web.de>	2019-03-04 11:48:47 -07:00
Brian Paul	9b07a221a4	st/mesa: whitespace fixes in st_texture.h Trivial.	2019-03-04 11:48:47 -07:00
Brian Paul	d74932dfea	st/mesa: line wrapping, whitespace fixes in st_cb_texture.c Trivial.	2019-03-04 11:48:36 -07:00
Brian Paul	fc91c2698e	st/mesa: whitespace fixes in st_sampler_view.c Replace tabs w/ spaces. 80-column wrapping. Trivial.	2019-03-04 11:42:49 -07:00
Gurchetan Singh	610758d3e5	egl/sl: also allow virtgpu to fall back to kms_swrast virtio-gpu falls back to software rendering when 3D features are unavailable since 6c5ab, and kms_swrast is more feature complete than swrast. v2: Add comment (Emil) Reviewed-by: Eric Engestrom <eric@engestrom.ch> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2019-03-04 17:33:17 +00:00
Mathias Fröhlich	904a0552aa	st/mesa: Invalidate the gallium array atom only if needed. Now that the buffer object usage history tracks if it is being used as vertex buffer object, we can restrict setting the ST_NEW_VERTEX_ARRAYS bit to dirty on glBufferData calls to buffers that are potentially used as vertex buffer object. Also put a note that the same could be done for index arrays used in indexed draws. Reviewed-by: Brian Paul <brianp@vmware.com> Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>	2019-03-04 17:03:06 +01:00
Mathias Fröhlich	e727f8c8b8	mesa: Track buffer object use also for VAO usage. We already track the usage history for buffer objects in a lot of aspects. Add GL_ARRAY_BUFFER and GL_ELEMENT_ARRAY_BUFFER to gl_buffer_object::UsageHistory. Reviewed-by: Brian Paul <brianp@vmware.com> Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>	2019-03-04 17:03:06 +01:00
Samuel Pitoiset	9e787904d0	radv: use 32_AR instead of 32_ABGR when alpha coverage is required This export format is faster. Seems to improve performance in Wreckfest. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-03-04 12:02:01 +01:00
Alyssa Rosenzweig	72981c92ce	panfrost: List primitive restart enable bit Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-03-04 05:04:14 +00:00
Alyssa Rosenzweig	2b5cda137f	panfrost/midgard: Preview for data hazards If a selected unit causes a data hazard, the whole block gets cut short. So, we preview for data hazards _while_ selecting units. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Tested-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>	2019-03-04 05:03:48 +00:00
Alyssa Rosenzweig	93eeba623b	panfrost/midgard: Promote smul to vmul smul comes first in the pipeline, before vmul. Until we have a full instruction scheduler, it's better to have vmul prioritized to maximize bundle size. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Tested-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>	2019-03-04 05:02:58 +00:00
Alyssa Rosenzweig	25bbb44dce	panfrost: Flush with offscreen rendering This special-case was needlessly added and breaks purely offscreen rendering (when there is no scanout involved) Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>	2019-03-04 05:01:45 +00:00
Alyssa Rosenzweig	4f7460297b	panfrost/midgard: Don't force constant on VLUT Previously, we forced a #0 inline constant tacked on for the lut instructions to mirror the blob's behaviour, which caused some suboptimal codegen due to our constant inlining implementation. Instead, just don't force a constant at all. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Tested-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>	2019-03-04 04:59:58 +00:00
Alyssa Rosenzweig	c351cc4e94	panfrost: Cleanup cruft related to clears Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>	2019-03-04 04:59:12 +00:00
Alyssa Rosenzweig	40ffee4448	panfrost: Decouple Gallium clear from FBD clear The operations of gallium->clear() and the hardware callbacks are fundamentally independent. This routine decouples them by routing shared information via panfrost_job, allowing the hardware half to be deferred to the fragment job generation. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>	2019-03-04 04:58:55 +00:00
Alyssa Rosenzweig	59c9623d0a	panfrost: Import job data structures from v3d At the moment, Panfrost state is ad hoc, which creates issues for FBOs. This commit imports the skeleton of the v3d_job structure as panfrost_job, in preparation for refactors to organize this state. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>	2019-03-04 04:58:15 +00:00
Ilia Mirkin	4eec3a2a36	glsl: fix recording of variables for XFB in TCS shaders This is purely for conformance, since it's not actually possible to do XFB on TCS output varyings. However we do have to make sure we record the names correctly, and this removes an extra level of array-ness from the names in question. Fixes KHR-GL45.tessellation_shader.single.xfb_captures_data_from_correct_stage v2: Add comment to the new program_resource_visitor::process function. (Ilia Mirkin) Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108457 Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Cc: 19.0 <mesa-stable@lists.freedesktop.org> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-03-04 01:55:00 +01:00
Jose Maria Casanova Crespo	bf1f49482d	glsl: TCS outputs can not be transform feedback candidates on GLES Avoids regression on: KHR-GLES*.core.tessellation_shader.single.xfb_captures_data_from_correct_stage that is uncovered by the following patch. "glsl: fix recording of variables for XFB in TCS shaders" v2: Rebased over glsl: fix recording of variables for XFB in TCS shaders v3: Move this patch before "glsl: fix recording of variables for XFB in TCS shaders" to avoid temporary regressions. (Ilia Mirkin) Cc: 19.0 <mesa-stable@lists.freedesktop.org> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-03-04 01:55:00 +01:00
Jose Maria Casanova Crespo	cc7173b438	glsl: fix typos in comments "transfor" -> "transform" Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-03-04 01:55:00 +01:00
Gert Wollny	3214f20914	mesa: Expose EXT_texture_query_lod and add support for its use in shaders EXT_texture_query_lod provides the same functionality for GLES as the ARB extension of the same name does for GL. v2: Set ES 3.0 as minimum GLES version as required by the extension Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-03-03 21:50:42 +01:00
Greg V	7dc2f47882	util: emulate futex on FreeBSD using umtx Obtained from: FreeBSD ports Acked-by: Emil Velikov <emil.velikov@collabora.com> Acked-by: Eric Engestrom <eric.engestrom@intel.com>	2019-03-03 19:48:49 +00:00
Rob Clark	00f838fa73	freedreno/ir3: track register pressure in sched Not a perfect solution, and the "pressure" target is hard-coded. But it doesn't really seem to much in the common case, and avoids exploding register usage in dEQP ssbo tests. So this should serve as a stop-gap solution until I have time to rewrite the scheduler. Hurts slightly in instruction count, but gains (reduces) slightly the register usage in shader-db. Fixes ~150 dEQP-GLES31.functional.ssbo.* that were failing due to RA fail. Signed-off-by: Rob Clark <robdclark@gmail.com>	2019-03-03 13:27:50 -05:00
Rob Clark	8a5f2d9444	freedreno/ir3: add Sethi–Ullman numbering pass Signed-off-by: Rob Clark <robdclark@gmail.com>	2019-03-03 13:27:50 -05:00
Rob Clark	c8e351ee3a	freedreno/ir3: include nopN in expanded instruction count Signed-off-by: Rob Clark <robdclark@gmail.com>	2019-03-03 13:27:50 -05:00
Dave Airlie	cb4e3e3ef6	st/mesa: add support for lowering fp64/int64 for nir drivers This might be enough for iris and possibly r600 (when it gets NIR). This appears to work for iris. v2: * change cap return so DOUBLES == 2 means sw emu v3: * Refactor using int64/doubles lowering options which were added into nir options * Remove DOUBLES == 2 added in v2 [jordan: Remove "2" value on PIPE_CAP_DOUBLES] [jordan: Use lowering options added to nir options] Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Acked-by: Jason Ekstrand <jason@jlekstrand.net>	2019-03-02 14:33:44 -08:00
Jordan Justen	7de056e1a9	scons: Generate float64_glsl.h for glsl_to_nir fp64 lowering Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-03-02 14:33:44 -08:00
Jordan Justen	10c5579921	intel/compiler: Move int64/doubles lowering options Instead of calculating the int64 and doubles lowering options each time a shader is preprocessed, save and use the values in nir_shader_compiler_options. Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-03-02 14:33:44 -08:00
Jordan Justen	31b35916dd	nir: Add int64/doubles options into nir_shader_compiler_options This will allow the options to be visible under nir_shader->options, which will allow the gallium state_tracker to use the driver preferred settings during glsl_to_nir. Suggested-by: Kenneth Graunke <kenneth@whitecape.org> Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-03-02 14:33:41 -08:00
Ian Romanick	bae0c36751	nir/algebraic: Optimize away an fsat of a b2f The b2f can only produce 0.0 or 1.0, so the fsat does nothing. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-03-02 13:58:56 -08:00
Ian Romanick	d1d56f5f9a	intel/fs: Don't assert on b2f with a saturate modifier This ran afoul of Iris's use of nir_lower_clamp_color_outputs which applies fsat() before writes to vertex shader color outputs. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Fixes: `7725d60938` ("intel/fs: Emit better code for b2f(inot(a)) and b2i(inot(a))")	2019-03-02 13:58:50 -08:00
Lionel Landwerlin	32ffd90002	anv: add support for INTEL_DEBUG=bat As requested by Ken ;) v2: Also decode simple batches (Caio) Fix u_vector usage issues (Lionel) v3: Make binding/instruction/state/surface available (Lionel) v4: Going through device pools for simple batches (Lionel) Centralize search BO callbacks into anv_device.c (Lionel) v5: Clear decoded batch buffer var after use (Caio) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-03-02 12:53:21 +00:00
Eric Anholt	f1122f78b7	v3d: Fix build of NEON code with Mesa's cflags not targeting NEON. v3d may be built as part of a set of drivers in a system not requiring NEON, but we know V3D devices will be paired with CPUs with NEON so we should be able to use this asm. Fixes: `0c05198d6b` ("v3d: Always enable the NEON utile load/store code.")	2019-03-01 14:21:49 -08:00
Matt Turner	e0148bbcfd	intel/compiler: Add commas on final values of compaction table arrays Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>	2019-03-01 13:56:25 -08:00
Ian Romanick	ecc9ffa778	nir/algebraic: Replace a-fract(a) with floor(a) I noticed this while looking at a shader that was affected by Tim's "more loop unrolling" series. In review, Tim Arceri asked: > Why the hurt on Gen6+ is this something that should be in the late > optimisations pass? As far as I can tell, it's just because our scheduler is terrible. In all the fragment shaders that I looked at (some hurt shaders were from other stages), only one of the SIMD8 or SIMD16 versions would be hurt. In many of those cases, the other SIMD width is improved (e.g., shaders/closed/steam/brutal-legend/3990.shader_test). Often it looks like the scheduler decides to differently schedule a SEND that occurs somewhere early in the shader. Once that happens, everything is different. I looked at one vertex shader that was hurt (from Goat Simulator). In that case, both the floor and fract are used. The optimization eliminates the add, and it should allow better scheduling. In the area of the FRC and RNDD instructions, the scheduler does the right thing. However, later in the shader a MAD and an ADD get scheduled differently, and that makes it slightly worse. In light of this, I tried adding some "is_used_once" mark-up, and that did not fix all the cycles regressions. It also did a lot more harm than good on SKL (helped 82 vs. hurt 241). All Gen6+ platforms had similar results. (Skylake shown) total instructions in shared programs: 15437001 -> 15435259 (-0.01%) instructions in affected programs: 213651 -> 211909 (-0.82%) helped: 988 HURT: 0 helped stats (abs) min: 1 max: 27 x̄: 1.76 x̃: 1 helped stats (rel) min: 0.15% max: 11.54% x̄: 1.14% x̃: 0.59% 95% mean confidence interval for instructions value: -1.89 -1.63 95% mean confidence interval for instructions %-change: -1.23% -1.05% Instructions are helped. 
total cycles in shared programs: 383007378 -> 382997063 (<.01%) cycles in affected programs: 1650825 -> 1640510 (-0.62%) helped: 679 HURT: 302 helped stats (abs) min: 1 max: 348 x̄: 23.39 x̃: 14 helped stats (rel) min: 0.04% max: 28.77% x̄: 1.61% x̃: 0.98% HURT stats (abs) min: 1 max: 250 x̄: 18.43 x̃: 7 HURT stats (rel) min: 0.04% max: 25.86% x̄: 1.41% x̃: 0.53% 95% mean confidence interval for cycles value: -13.05 -7.98 95% mean confidence interval for cycles %-change: -0.86% -0.50% Cycles are helped. Iron Lake and GM45 had similar results. (GM45 shown) total instructions in shared programs: 5043616 -> 5043010 (-0.01%) instructions in affected programs: 119691 -> 119085 (-0.51%) helped: 432 HURT: 0 helped stats (abs) min: 1 max: 27 x̄: 1.40 x̃: 1 helped stats (rel) min: 0.10% max: 8.11% x̄: 0.66% x̃: 0.39% 95% mean confidence interval for instructions value: -1.58 -1.23 95% mean confidence interval for instructions %-change: -0.72% -0.59% Instructions are helped. total cycles in shared programs: 128139812 -> 128135762 (<.01%) cycles in affected programs: 3829724 -> 3825674 (-0.11%) helped: 602 HURT: 0 helped stats (abs) min: 2 max: 486 x̄: 6.73 x̃: 6 helped stats (rel) min: 0.02% max: 4.85% x̄: 0.19% x̃: 0.10% 95% mean confidence interval for cycles value: -8.40 -5.05 95% mean confidence interval for cycles %-change: -0.22% -0.16% Cycles are helped. Reviewed-by: Elie Tournier <tournier.elie@gmail.com>	2019-03-01 12:43:25 -08:00
Ian Romanick	1edf67fc3f	intel/fs: Generate if instructions with inverted conditions Per-platform results were all over the place, so I have included all the results here. There is an important note at the bottom of the commit message. Skylake total instructions in shared programs: 15184683 -> 15184679 (<.01%) instructions in affected programs: 2786 -> 2782 (-0.14%) helped: 4 HURT: 0 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 0.05% max: 0.84% x̄: 0.44% x̃: 0.44% 95% mean confidence interval for instructions value: -1.00 -1.00 95% mean confidence interval for instructions %-change: -0.96% 0.07% Inconclusive result (%-change mean confidence interval includes 0). total cycles in shared programs: 370961367 -> 370961173 (<.01%) cycles in affected programs: 205867 -> 205673 (-0.09%) helped: 5 HURT: 1 helped stats (abs) min: 1 max: 149 x̄: 39.60 x̃: 16 helped stats (rel) min: 0.02% max: 1.05% x̄: 0.45% x̃: 0.55% HURT stats (abs) min: 4 max: 4 x̄: 4.00 x̃: 4 HURT stats (rel) min: 0.03% max: 0.03% x̄: 0.03% x̃: 0.03% 95% mean confidence interval for cycles value: -93.01 28.34 95% mean confidence interval for cycles %-change: -0.82% 0.08% Inconclusive result (value mean confidence interval includes 0). Broadwell total instructions in shared programs: 15465366 -> 15465362 (<.01%) instructions in affected programs: 2799 -> 2795 (-0.14%) helped: 4 HURT: 0 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 0.04% max: 0.84% x̄: 0.44% x̃: 0.44% 95% mean confidence interval for instructions value: -1.00 -1.00 95% mean confidence interval for instructions %-change: -0.96% 0.07% Inconclusive result (%-change mean confidence interval includes 0). 
total cycles in shared programs: 410938419 -> 410938531 (<.01%) cycles in affected programs: 566028 -> 566140 (0.02%) helped: 18 HURT: 17 helped stats (abs) min: 1 max: 16 x̄: 3.50 x̃: 1 helped stats (rel) min: <.01% max: 1.05% x̄: 0.13% x̃: <.01% HURT stats (abs) min: 1 max: 12 x̄: 10.29 x̃: 12 HURT stats (rel) min: <.01% max: 0.16% x̄: 0.08% x̃: 0.09% 95% mean confidence interval for cycles value: 0.31 6.09 95% mean confidence interval for cycles %-change: -0.10% 0.05% Inconclusive result (%-change mean confidence interval includes 0). Haswell total instructions in shared programs: 13749760 -> 13749759 (<.01%) instructions in affected programs: 2241 -> 2240 (-0.04%) helped: 1 HURT: 0 total cycles in shared programs: 385398913 -> 385398363 (<.01%) cycles in affected programs: 554914 -> 554364 (-0.10%) helped: 31 HURT: 1 helped stats (abs) min: 1 max: 453 x̄: 18.00 x̃: 6 helped stats (rel) min: <.01% max: 0.25% x̄: 0.03% x̃: 0.05% HURT stats (abs) min: 8 max: 8 x̄: 8.00 x̃: 8 HURT stats (rel) min: 0.06% max: 0.06% x̄: 0.06% x̃: 0.06% 95% mean confidence interval for cycles value: -45.88 11.51 95% mean confidence interval for cycles %-change: -0.05% -0.02% Inconclusive result (value mean confidence interval includes 0). Ivy Bridge total cycles in shared programs: 180663626 -> 180663881 (<.01%) cycles in affected programs: 472350 -> 472605 (0.05%) helped: 15 HURT: 30 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: <.01% max: <.01% x̄: <.01% x̃: <.01% HURT stats (abs) min: 8 max: 10 x̄: 9.00 x̃: 9 HURT stats (rel) min: 0.06% max: 0.14% x̄: 0.10% x̃: 0.10% 95% mean confidence interval for cycles value: 4.21 7.12 95% mean confidence interval for cycles %-change: 0.05% 0.08% Cycles are HURT. 
Sandy Bridge total cycles in shared programs: 154568664 -> 154569225 (<.01%) cycles in affected programs: 356486 -> 357047 (0.16%) helped: 1 HURT: 31 helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2 helped stats (rel) min: 0.02% max: 0.02% x̄: 0.02% x̃: 0.02% HURT stats (abs) min: 4 max: 33 x̄: 18.16 x̃: 8 HURT stats (rel) min: 0.05% max: 0.23% x̄: 0.14% x̃: 0.10% 95% mean confidence interval for cycles value: 12.19 22.87 95% mean confidence interval for cycles %-change: 0.10% 0.16% Cycles are HURT. Iron Lake total instructions in shared programs: 8206589 -> 8206565 (<.01%) instructions in affected programs: 3024 -> 3000 (-0.79%) helped: 12 HURT: 0 helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2 helped stats (rel) min: 0.75% max: 0.83% x̄: 0.80% x̃: 0.80% 95% mean confidence interval for instructions value: -2.00 -2.00 95% mean confidence interval for instructions %-change: -0.82% -0.77% Instructions are helped. total cycles in shared programs: 187657428 -> 187656228 (<.01%) cycles in affected programs: 95748 -> 94548 (-1.25%) helped: 12 HURT: 0 helped stats (abs) min: 80 max: 120 x̄: 100.00 x̃: 100 helped stats (rel) min: 1.00% max: 1.66% x̄: 1.27% x̃: 1.21% 95% mean confidence interval for cycles value: -113.27 -86.73 95% mean confidence interval for cycles %-change: -1.43% -1.11% Cycles are helped. GM45 total instructions in shared programs: 5037569 -> 5037557 (<.01%) instructions in affected programs: 1521 -> 1509 (-0.79%) helped: 6 HURT: 0 helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2 helped stats (rel) min: 0.75% max: 0.83% x̄: 0.79% x̃: 0.79% 95% mean confidence interval for instructions value: -2.00 -2.00 95% mean confidence interval for instructions %-change: -0.83% -0.75% Instructions are helped. 
total cycles in shared programs: 128101478 -> 128100758 (<.01%) cycles in affected programs: 52746 -> 52026 (-1.37%) helped: 6 HURT: 0 helped stats (abs) min: 120 max: 120 x̄: 120.00 x̃: 120 helped stats (rel) min: 1.16% max: 1.66% x̄: 1.41% x̃: 1.41% 95% mean confidence interval for cycles value: -120.00 -120.00 95% mean confidence interval for cycles %-change: -1.70% -1.12% Cycles are helped. This change has almost no effect right now. However, removing this patch (but leaving the patch "nir/algebraic: Replace a bcsel of a b2f with a b2f(!(a || b))") after adding a patch that removes !(a < b) -> (a >= b) optimizations (like https://patchwork.freedesktop.org/patch/264787/) has the following results on Skylake: Skylake total instructions in shared programs: 15071022 -> 15089710 (0.12%) instructions in affected programs: 1022219 -> 1040907 (1.83%) helped: 1 HURT: 3937 helped stats (abs) min: 41 max: 41 x̄: 41.00 x̃: 41 helped stats (rel) min: 1.01% max: 1.01% x̄: 1.01% x̃: 1.01% HURT stats (abs) min: 1 max: 256 x̄: 4.76 x̃: 4 HURT stats (rel) min: 0.05% max: 11.18% x̄: 2.59% x̃: 2.60% 95% mean confidence interval for instructions value: 4.56 4.93 95% mean confidence interval for instructions %-change: 2.54% 2.64% Instructions are HURT. total cycles in shared programs: 369777134 -> 370092923 (0.09%) cycles in affected programs: 17516573 -> 17832362 (1.80%) helped: 115 HURT: 3624 helped stats (abs) min: 1 max: 1721 x̄: 81.18 x̃: 28 helped stats (rel) min: <.01% max: 10.74% x̄: 1.24% x̃: 0.65% HURT stats (abs) min: 1 max: 12640 x̄: 89.71 x̃: 54 HURT stats (rel) min: <.01% max: 28.24% x̄: 4.72% x̃: 4.52% 95% mean confidence interval for cycles value: 75.21 93.71 95% mean confidence interval for cycles %-change: 4.43% 4.64% Cycles are HURT. 
total spills in shared programs: 9450 -> 9442 (-0.08%) spills in affected programs: 166 -> 158 (-4.82%) helped: 2 HURT: 0 total fills in shared programs: 21115 -> 21094 (-0.10%) fills in affected programs: 438 -> 417 (-4.79%) helped: 2 HURT: 0 LOST: 1 GAINED: 0 Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-03-01 12:42:14 -08:00
Ian Romanick	d40640efe8	nir/algebraic: Replace a bcsel of a b2f sources with a b2f(!(a || b)) I have not investigated the result of doing this during code generation. That should be possible, but it would be a bit more effort. All Gen6+ platforms had nearly identical results. (Skylake shown) total cycles in shared programs: 370961508 -> 370961367 (<.01%) cycles in affected programs: 5174 -> 5033 (-2.73%) helped: 2 HURT: 0 Iron Lake and GM45 had similar results. (Iron Lake shown) total instructions in shared programs: 8206587 -> 8206589 (<.01%) instructions in affected programs: 1325 -> 1327 (0.15%) helped: 0 HURT: 2 total cycles in shared programs: 187657422 -> 187657428 (<.01%) cycles in affected programs: 11566 -> 11572 (0.05%) helped: 0 HURT: 2 This change has almost no effect right now. However, removing this patch (but leaving the patch "intel/fs: Generate if instructions with inverted conditions") after adding a patch that removes !(a < b) -> (a >= b) optimizations (like https://patchwork.freedesktop.org/patch/264787/) has the following results on Skylake: Skylake total instructions in shared programs: 15071804 -> 15071806 (<.01%) instructions in affected programs: 640 -> 642 (0.31%) helped: 0 HURT: 2 total cycles in shared programs: 369914348 -> 369916569 (<.01%) cycles in affected programs: 27900 -> 30121 (7.96%) helped: 4 HURT: 15 helped stats (abs) min: 2 max: 112 x̄: 30.00 x̃: 3 helped stats (rel) min: 0.28% max: 12.28% x̄: 3.34% x̃: 0.40% HURT stats (abs) min: 2 max: 758 x̄: 156.07 x̃: 81 HURT stats (rel) min: 0.20% max: 74.30% x̄: 16.29% x̃: 16.91% 95% mean confidence interval for cycles value: 12.68 221.11 95% mean confidence interval for cycles %-change: 3.09% 21.23% Cycles are HURT. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-03-01 12:42:14 -08:00
Ian Romanick	7725d60938	intel/fs: Emit better code for b2f(inot(a)) and b2i(inot(a)) Since Boolean values are either -1 (true) or 0 (false), b2f(inot(a)) maps -1 => 0.0 and 0 => 1.0. This is equivalent to 1.0 + float(boolBitsToInt(a)). On Intel GPUs, ADD is one of the few instructions that can type-convert during write to destination, so we can achieve this in a single instruction: add g47F, g26D, 1D v2: Fix swizzles. v3: Fix typos in comments. Noticed by Ken. All Gen6+ platforms had similar results. (Skylake shown) Skylake total instructions in shared programs: 15185583 -> 15184683 (<.01%) instructions in affected programs: 239389 -> 238489 (-0.38%) helped: 899 HURT: 1 helped stats (abs) min: 1 max: 2 x̄: 1.00 x̃: 1 helped stats (rel) min: 0.15% max: 1.85% x̄: 0.49% x̃: 0.44% HURT stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2 HURT stats (rel) min: 0.09% max: 0.09% x̄: 0.09% x̃: 0.09% 95% mean confidence interval for instructions value: -1.01 -0.99 95% mean confidence interval for instructions %-change: -0.51% -0.48% Instructions are helped. total cycles in shared programs: 370964249 -> 370961508 (<.01%) cycles in affected programs: 1487586 -> 1484845 (-0.18%) helped: 420 HURT: 268 helped stats (abs) min: 1 max: 232 x̄: 22.41 x̃: 6 helped stats (rel) min: 0.05% max: 22.60% x̄: 1.30% x̃: 0.41% HURT stats (abs) min: 1 max: 230 x̄: 24.90 x̃: 10 HURT stats (rel) min: <.01% max: 21.60% x̄: 1.45% x̃: 0.52% 95% mean confidence interval for cycles value: -7.61 -0.36 95% mean confidence interval for cycles %-change: -0.44% -0.02% Cycles are helped. No changes on Iron Lake or GM45. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-03-01 12:42:14 -08:00
Ian Romanick	cb3e21cd19	intel/fs: Use De Morgan's laws to avoid logical-not of a logic result on Gen8+ Instead of emitting ~(a & b), emit (~a | ~b) since logical-not of operands is free on Gen8+. v2: Fix swizzles. Fix types for cmod propagation. v3: Simplify logic for inverting source of inot(ixor(a, b)). Suggested by Ken. Skylake and Broadwell had similar results. (Skylake shown) Skylake total instructions in shared programs: 15185593 -> 15185583 (<.01%) instructions in affected programs: 5673 -> 5663 (-0.18%) helped: 12 HURT: 1 helped stats (abs) min: 1 max: 2 x̄: 1.17 x̃: 1 helped stats (rel) min: 0.30% max: 5.88% x̄: 1.50% x̃: 0.70% HURT stats (abs) min: 4 max: 4 x̄: 4.00 x̃: 4 HURT stats (rel) min: 0.12% max: 0.12% x̄: 0.12% x̃: 0.12% 95% mean confidence interval for instructions value: -1.66 0.13 95% mean confidence interval for instructions %-change: -2.60% -0.15% Inconclusive result (value mean confidence interval includes 0). total cycles in shared programs: 370977726 -> 370964249 (<.01%) cycles in affected programs: 869987 -> 856510 (-1.55%) helped: 15 HURT: 2 helped stats (abs) min: 2 max: 6640 x̄: 902.20 x̃: 16 helped stats (rel) min: <.01% max: 4.92% x̄: 1.71% x̃: 1.53% HURT stats (abs) min: 14 max: 42 x̄: 28.00 x̃: 28 HURT stats (rel) min: 1.08% max: 3.18% x̄: 2.13% x̃: 2.13% 95% mean confidence interval for cycles value: -1654.87 69.34 95% mean confidence interval for cycles %-change: -2.29% -0.23% Inconclusive result (value mean confidence interval includes 0). Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-03-01 12:42:14 -08:00
Ian Romanick	8eb36c9129	intel/fs: Emit logical-not of operands on Gen8+ On Gen8+ specifying negation of a logical operation such as AND actually performs a logical-not. Take advantage of this to generate fewer instructions. v2: Major rebase. Use nir_src_as_alu_instr. Fix swizzle handling. No changes on any pre-Gen8 platform. Skylake and Broadwell had similar results. (Broadwell shown) total instructions in shared programs: 15466902 -> 15466274 (<.01%) instructions in affected programs: 1262953 -> 1262325 (-0.05%) helped: 682 HURT: 4 helped stats (abs) min: 1 max: 5 x̄: 1.02 x̃: 1 helped stats (rel) min: 0.03% max: 2.40% x̄: 0.18% x̃: 0.04% HURT stats (abs) min: 1 max: 62 x̄: 17.50 x̃: 3 HURT stats (rel) min: 0.03% max: 1.89% x̄: 0.53% x̃: 0.10% 95% mean confidence interval for instructions value: -1.10 -0.73 95% mean confidence interval for instructions %-change: -0.19% -0.15% Instructions are helped. total cycles in shared programs: 410996093 -> 410950440 (-0.01%) cycles in affected programs: 144389048 -> 144343395 (-0.03%) helped: 519 HURT: 51 helped stats (abs) min: 1 max: 1060 x̄: 104.46 x̃: 140 helped stats (rel) min: 0.01% max: 10.98% x̄: 0.34% x̃: 0.03% HURT stats (abs) min: 1 max: 4060 x̄: 167.90 x̃: 22 HURT stats (rel) min: <.01% max: 8.20% x̄: 0.96% x̃: 0.25% 95% mean confidence interval for cycles value: -97.16 -63.02 95% mean confidence interval for cycles %-change: -0.32% -0.13% Cycles are helped. total spills in shared programs: 95311 -> 95329 (0.02%) spills in affected programs: 881 -> 899 (2.04%) helped: 0 HURT: 4 total fills in shared programs: 93629 -> 93634 (<.01%) fills in affected programs: 794 -> 799 (0.63%) helped: 1 HURT: 2 Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-03-01 12:42:14 -08:00
Ian Romanick	06eaaf2de9	intel/fs: Refactor ALU source and destination handling to a separate function Other places will need to do this soon to properly handle source swizzles. The patch looks a little odd, but the change is pretty straightforward. All of the swizzle and mask handling is moved out, but the code for handling move instructions and vecN instructions remains in nir_emit_alu. I'm not terribly pleased with the "need_dest" parameter, but get_nir_dest is (somewhat surprisingly) destructive. I am open to suggestions of alternatives. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-03-01 12:42:14 -08:00
Ian Romanick	fb3ca9109c	intel/fs: Handle OR source modifiers in algebraic optimization Found by inspection. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-03-01 12:42:14 -08:00
Ian Romanick	c9d5bd050c	intel/fs: Relax type matching rules in cmod propagation from MOV instructions To allow cmod propagation from a MOV in a sequence like: and(16) g31<1>UD g20<8,8,1>UD g22<8,8,1>UD mov.nz.f0(16) null<1>F g31<8,8,1>D A similar change to the vec4 backend had no effect. Somewhere between `c1ec582059` and `40fc4b5acd` (1,094 commits) the effectiveness of this patch diminished, and as of commit `d7e0d47b9d` (nir: Add a bunch of b2[if] optimizations) this optimization no longer has any effect on any platform. A later patch "intel/fs: Use De Morgan's laws to avoid logical-not of a logic result on Gen8+," generates some instruction sequences that require this change in order for cmod propagation to make progress. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-03-01 12:42:14 -08:00
Ian Romanick	eae19f5f19	nir/algebraic: Replace i2b used by bcsel or if-statement with comparison All of the helped shaders are in Deus Ex. I looked at a couple shaders, and they have a pattern like: vec1 32 ssa_373 = i2b32 ssa_345.w vec1 32 ssa_374 = bcsel ssa_373, ssa_20, ssa_0 ... vec1 32 ssa_377 = ine ssa_345.w, ssa_0 if ssa_377 { ... vec1 32 ssa_416 = i2b32 ssa_385.w vec1 32 ssa_417 = bcsel ssa_416, ssa_386, ssa_374 ... } The massive help occurs because the i2b32 is removed, then other passes determine that ssa_374 must be ssa_20 inside the if-statement allowing the first bcsel to also be deleted. v2: Rebase on 1-bit Boolean changes. v3: Fix i2b32 vs ine problem in if-statement replacement. Noticed by Bas. Skylake total instructions in shared programs: 15241394 -> 15186287 (-0.36%) instructions in affected programs: 890583 -> 835476 (-6.19%) helped: 355 HURT: 0 helped stats (abs) min: 1 max: 497 x̄: 155.23 x̃: 149 helped stats (rel) min: 0.09% max: 16.49% x̄: 6.10% x̃: 6.59% 95% mean confidence interval for instructions value: -165.07 -145.39 95% mean confidence interval for instructions %-change: -6.42% -5.77% Instructions are helped. total cycles in shared programs: 373846583 -> 371023357 (-0.76%) cycles in affected programs: 118972102 -> 116148876 (-2.37%) helped: 343 HURT: 14 helped stats (abs) min: 45 max: 118284 x̄: 8332.32 x̃: 6089 helped stats (rel) min: 0.03% max: 38.19% x̄: 2.48% x̃: 1.77% HURT stats (abs) min: 120 max: 4126 x̄: 2482.79 x̃: 3019 HURT stats (rel) min: 0.16% max: 17.37% x̄: 2.13% x̃: 1.11% 95% mean confidence interval for cycles value: -8723.28 -7093.12 95% mean confidence interval for cycles %-change: -2.57% -2.02% Cycles are helped. total spills in shared programs: 32401 -> 23465 (-27.58%) spills in affected programs: 24457 -> 15521 (-36.54%) helped: 343 HURT: 0 total fills in shared programs: 37866 -> 31765 (-16.11%) fills in affected programs: 18889 -> 12788 (-32.30%) helped: 343 HURT: 0 Broadwell and Haswell had similar results. 
(Haswell shown) Haswell total instructions in shared programs: 13764783 -> 13750679 (-0.10%) instructions in affected programs: 1176256 -> 1162152 (-1.20%) helped: 334 HURT: 21 helped stats (abs) min: 1 max: 358 x̄: 42.59 x̃: 47 helped stats (rel) min: 0.09% max: 11.81% x̄: 1.30% x̃: 1.37% HURT stats (abs) min: 1 max: 61 x̄: 5.76 x̃: 1 HURT stats (rel) min: 0.03% max: 1.84% x̄: 0.17% x̃: 0.03% 95% mean confidence interval for instructions value: -43.99 -35.47 95% mean confidence interval for instructions %-change: -1.35% -1.08% Instructions are helped. total cycles in shared programs: 386511910 -> 385402528 (-0.29%) cycles in affected programs: 143831110 -> 142721728 (-0.77%) helped: 327 HURT: 39 helped stats (abs) min: 16 max: 25219 x̄: 3519.74 x̃: 3570 helped stats (rel) min: <.01% max: 10.26% x̄: 0.95% x̃: 0.96% HURT stats (abs) min: 16 max: 4881 x̄: 1065.95 x̃: 997 HURT stats (rel) min: <.01% max: 16.67% x̄: 0.70% x̃: 0.24% 95% mean confidence interval for cycles value: -3375.59 -2686.60 95% mean confidence interval for cycles %-change: -0.92% -0.64% Cycles are helped. total spills in shared programs: 100480 -> 97846 (-2.62%) spills in affected programs: 84702 -> 82068 (-3.11%) helped: 316 HURT: 21 total fills in shared programs: 96877 -> 94369 (-2.59%) fills in affected programs: 69167 -> 66659 (-3.63%) helped: 316 HURT: 9 No changes on Ivy Bridge or earlier platforms. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-03-01 12:42:14 -08:00
Ian Romanick	d2056ab993	intel/vec4: Emit constants for some ALU sources as immediate values In some cases of flow control, the constant propagation is not able to determine that the source of an instruction must be a constant value. When we still have NIR SSA values, we can easily determine this. Emit the immediate value during code generation to possibly avoid spurious loads of constants into registers. I wrote this patch to prevent a couple trivial regressions in vec4 shaders caused by "nir/algebraic: Replace i2b used by bcsel or if-statement with comparison". The final result was quite a bit better than that... No shader-db changes on any Gen8+ platform. v2: Assert that we never get a negation source modifier on Gen8+. Suggested by Ken. This should never happen because we don't normally use vec4 for Gen8+ (requires an environment variable to force it), and there's no code to generate these negations. Still, erring on the side of caution is better. Haswell total instructions in shared programs: 13776218 -> 13764783 (-0.08%) instructions in affected programs: 663931 -> 652496 (-1.72%) helped: 3495 HURT: 1 helped stats (abs) min: 1 max: 30 x̄: 3.28 x̃: 2 helped stats (rel) min: 0.21% max: 10.00% x̄: 1.79% x̃: 1.49% HURT stats (abs) min: 24 max: 24 x̄: 24.00 x̃: 24 HURT stats (rel) min: 12.24% max: 12.24% x̄: 12.24% x̃: 12.24% 95% mean confidence interval for instructions value: -3.39 -3.15 95% mean confidence interval for instructions %-change: -1.84% -1.75% Instructions are helped. 
total cycles in shared programs: 386818984 -> 386511910 (-0.08%) cycles in affected programs: 20379636 -> 20072562 (-1.51%) helped: 3052 HURT: 476 helped stats (abs) min: 2 max: 12516 x̄: 110.40 x̃: 6 helped stats (rel) min: 0.05% max: 24.68% x̄: 1.58% x̃: 0.69% HURT stats (abs) min: 2 max: 416 x̄: 62.76 x̃: 24 HURT stats (rel) min: 0.10% max: 10.75% x̄: 4.03% x̃: 2.18% 95% mean confidence interval for cycles value: -115.57 -58.51 95% mean confidence interval for cycles %-change: -0.93% -0.73% Cycles are helped. total spills in shared programs: 100482 -> 100480 (<.01%) spills in affected programs: 79 -> 77 (-2.53%) helped: 3 HURT: 1 total fills in shared programs: 96883 -> 96877 (<.01%) fills in affected programs: 85 -> 79 (-7.06%) helped: 4 HURT: 0 Ivy Bridge total instructions in shared programs: 12000562 -> 11990113 (-0.09%) instructions in affected programs: 572581 -> 562132 (-1.82%) helped: 3106 HURT: 0 helped stats (abs) min: 1 max: 30 x̄: 3.36 x̃: 2 helped stats (rel) min: 0.21% max: 10.00% x̄: 1.86% x̃: 1.49% 95% mean confidence interval for instructions value: -3.49 -3.23 95% mean confidence interval for instructions %-change: -1.91% -1.81% Instructions are helped. total cycles in shared programs: 180958504 -> 180664500 (-0.16%) cycles in affected programs: 19991810 -> 19697806 (-1.47%) helped: 2654 HURT: 486 helped stats (abs) min: 2 max: 12516 x̄: 121.61 x̃: 6 helped stats (rel) min: 0.05% max: 20.66% x̄: 1.48% x̃: 0.68% HURT stats (abs) min: 2 max: 396 x̄: 59.18 x̃: 24 HURT stats (rel) min: 0.05% max: 9.62% x̄: 3.82% x̃: 2.16% 95% mean confidence interval for cycles value: -125.62 -61.64 95% mean confidence interval for cycles %-change: -0.76% -0.56% Cycles are helped. 
Sandy Bridge total instructions in shared programs: 10842336 -> 10835438 (-0.06%) instructions in affected programs: 395340 -> 388442 (-1.74%) helped: 1926 HURT: 0 helped stats (abs) min: 1 max: 22 x̄: 3.58 x̃: 2 helped stats (rel) min: 0.10% max: 9.68% x̄: 1.78% x̃: 1.42% 95% mean confidence interval for instructions value: -3.73 -3.43 95% mean confidence interval for instructions %-change: -1.84% -1.72% Instructions are helped. total cycles in shared programs: 154590074 -> 154569050 (-0.01%) cycles in affected programs: 8159932 -> 8138908 (-0.26%) helped: 1670 HURT: 228 helped stats (abs) min: 2 max: 260 x̄: 18.13 x̃: 6 helped stats (rel) min: 0.02% max: 8.70% x̄: 0.74% x̃: 0.28% HURT stats (abs) min: 2 max: 1798 x̄: 40.58 x̃: 14 HURT stats (rel) min: 0.03% max: 12.97% x̄: 1.04% x̃: 0.31% 95% mean confidence interval for cycles value: -13.51 -8.64 95% mean confidence interval for cycles %-change: -0.60% -0.46% Cycles are helped. Iron Lake and GM45 had similar results. (Iron Lake shown) total instructions in shared programs: 8212357 -> 8206587 (-0.07%) instructions in affected programs: 323664 -> 317894 (-1.78%) helped: 1457 HURT: 0 helped stats (abs) min: 1 max: 12 x̄: 3.96 x̃: 3 helped stats (rel) min: 0.33% max: 11.49% x̄: 1.86% x̃: 1.44% 95% mean confidence interval for instructions value: -4.14 -3.78 95% mean confidence interval for instructions %-change: -1.93% -1.78% Instructions are helped. total cycles in shared programs: 187668016 -> 187657422 (<.01%) cycles in affected programs: 14856234 -> 14845640 (-0.07%) helped: 1372 HURT: 83 helped stats (abs) min: 2 max: 24 x̄: 7.92 x̃: 6 helped stats (rel) min: 0.02% max: 1.14% x̄: 0.12% x̃: 0.08% HURT stats (abs) min: 2 max: 14 x̄: 3.20 x̃: 2 HURT stats (rel) min: 0.03% max: 0.60% x̄: 0.12% x̃: 0.12% 95% mean confidence interval for cycles value: -7.65 -6.91 95% mean confidence interval for cycles %-change: -0.11% -0.10% Cycles are helped. 
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-03-01 12:41:46 -08:00
Eric Engestrom	fc82ea1350	Revert "swr/rast: Archrast codegen updates" This reverts the following commits: `71a76a47cc` "swr/codegen: fix autotools build" `7763e664ce` "meson/swr: replace hard-coded path with current_build_dir()" `773b3ceaca` "swr/rast: Fix autotools and scons codegen" `16e10b8c30` "swr/rast: Add general SWTag statistics" `b45a15a39f` "swr/rast: Add string handling to AR event framework" `8608a747aa` "swr/rast: Add initial SWTag proto definitions" `93cd9905c8` "swr/rast: Cleanup and generalize gen_archrast" The last one in this list broke all the build systems that can build this (meson, autotools & scons). See MR !304 for more details: https://gitlab.freedesktop.org/mesa/mesa/merge_requests/304 Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>	2019-03-01 16:46:32 +00:00
Fritz Koenig	12af6b30a3	freedreno/a6xx: Enable UBWC modifier Adding the supported_modifiers allows buffers to be created with UBWC	2019-03-01 15:51:16 +00:00
Fritz Koenig	4715e7a98a	freedreno: UBWC allocator UBWC requires space for a metadata or flag buffer that contains compression data. Each 16x4 tile of image data corresponds to a byte of compression data. This buffer needs to be stored before (at a lower address than) the image buffer in order to match up with what the display driver expects. This allows the display driver to directly scan out a UBWC buffer.	2019-03-01 15:51:16 +00:00
Fritz Koenig	3e6758a4e7	freedreno/a6xx: UBWC support Universal Bandwidth Compression (UBWC) reduces memory bandwidth by compressing buffers. This compression takes the form of a full-sized image buffer as well as a smaller metadata buffer.	2019-03-01 15:51:16 +00:00
Fritz Koenig	41082446db	freedreno: pass count to query_dmabuf_modifiers query_dmabuf_modifiers needs to know the max number of modifiers that the list will hold.	2019-03-01 15:51:16 +00:00
Eric Engestrom	2793417ec6	anv: fix typo Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-03-01 11:20:28 +00:00
Eric Engestrom	258e463db5	anv: remove spaces around kwargs assignment pylint complains: > C0326: No space allowed around keyword argument assignment Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-03-01 11:20:28 +00:00
Eric Engestrom	7b704fd2fd	anv: drop unused parameter I'm guessing a previous version of this script used an index-based map of entrypoints, but that's not the case anymore. Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-03-01 11:20:28 +00:00
Eric Engestrom	b503d4e458	anv: simplify chained comparison Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-03-01 11:20:28 +00:00
Caio Marcelo de Oliveira Filho	1458aa1f78	nir/copy_prop_vars: handle indirect vector elements Unlike the direct case, the indirect array derefs of vectors are handled like regular derefs, with the exception that we ignore any vector entry that has SSA values when performing a load. Such SSA values don't help loading of the indirect unless we emit an if-ladder. Copy_derefs are supported for indirects. Also enable two tests that now pass. v2: Remove unnecessary temporaries. Be clearer when identifying the case where copy_entry doesn't help when we are dealing with an indirect array_deref (of a vector). (Jason) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-02-28 23:55:31 -08:00
Caio Marcelo de Oliveira Filho	6c0de78cc2	nir/copy_prop_vars: prefer using entries from equal derefs When looking up an entry to use, always prefer an equal match, as it is more likely to contain reusable SSA or derefs to propagate. This will be necessary when adding entries with array derefs of vectors, because we don't want the vector if the equal entry (an array deref of that vector) is present. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-02-28 23:55:31 -08:00
Caio Marcelo de Oliveira Filho	61965afd00	nir/copy_prop_vars: add tests for indirect array deref Both on an actual array and on a vector, and an extra test on a vector mixing direct and indirect access. The vector tests are disabled and will be enabled by a later commit. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-02-28 23:55:31 -08:00
Caio Marcelo de Oliveira Filho	96c32d7776	nir/copy_prop_vars: handle load/store of vector elements When direct array deref is used on a vector type (for loads and stores), copy_prop_vars is now smart enough to propagate values it knows about. Given a 'vec4 v', storing to v[3] will update the copy entry for v and it is equivalent to a write to v.w. Loading from v[1] will try first to see if there's a known value for v.y -- and drop the load in that case. The copy entries still always refer to the entire vectors, so the operations happen on the parent deref (the 'vector') and the values are fixed accordingly. It might be the case now that certain entries have not only different SSA defs in each element but also those come from different components than they are set to, because stores to individual elements always come from an SSA definition with a single component. Tests related to these cases are now enabled. v2: Instead of asserting on invalid indices, "load" an undef and remove the store. (Jason) v3: Merge code path for the cases of is_array_deref_of_vector into the regular code path. Add a base_index parameter to value_set_from_value. (code changes by Jason) v4: Removed the get_entry_for_deref helper, now being used only once. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-02-28 23:50:05 -08:00
Caio Marcelo de Oliveira Filho	33dafdc024	nir/copy_prop_vars: use NIR_MAX_VEC_COMPONENTS Also replace uses of 0xf with the appropriate full mask created from the number of components. Note that an increase of MAX might make us change how the data is stored later on, but for now at least we make sure the pass is not hardcoded. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-02-28 23:50:05 -08:00
Caio Marcelo de Oliveira Filho	e84c841fb0	nir/copy_prop_vars: rename/refactor store_to_entry helper The name reflected this function role back when the pass also did dead write elimination. So rename it to what it does now, which is setting a value using another value; and narrow the argument list. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-02-28 23:50:05 -08:00
Christian Gmeiner	6c61449251	etnaviv: fix compile warnings Fixes the following compile warnings: [591/629] Compiling C object 'src/gallium/drivers/etnaviv/df32d18@@etnaviv@sta/etnaviv_context.c.o'. ../../src/ac_mesa/src/gallium/drivers/etnaviv/etnaviv_context.c: In function 'etna_cmd_stream_reset_notify': ../../src/ac_mesa/src/gallium/drivers/etnaviv/etnaviv_context.c:334:22: warning: unused variable 'entry' [-Wunused-variable] struct set_entry entry; ^~~~~ [604/629] Compiling C object 'src/gallium/drivers/etnaviv/df32d18@@etnaviv@sta/etnaviv_resource.c.o'. ../../src/ac_mesa/src/gallium/drivers/etnaviv/etnaviv_resource.c: In function 'etna_resource_used': ../../src/ac_mesa/src/gallium/drivers/etnaviv/etnaviv_resource.c:649:22: warning: unused variable 'entry' [-Wunused-variable] struct set_entry entry; ^~~~~ Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>	2019-03-01 08:45:05 +01:00
Christian Gmeiner	64813541d5	etnaviv: fix resource usage tracking across different pipe_context's A pipe_resource can be shared by all the pipe_context's hanging off the same pipe_screen. Changes from v2 -> v3: - add locking with mtx_*() to resource and screen (Marek) Changes from v3 -> v4: - drop rsc->lock, just use screen->lock for the entire serialization (Marek) - simplify etna_resource_used() flush condition, which also prevents potentially flushing resources twice (Marek) - don't remove resources from screen->used_resources in etna_cmd_stream_reset_notify(), they may still be used in other contexts and may need flushing there later on (Marek) Changes from v4 -> v5: - Fix coding style issues reported by Guido Changes from v5 -> v6: - Add missing locking in etna_transfer_map(..) (Boris) Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com> Signed-off-by: Marek Vasut <marex@denx.de> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com> Tested-by: Marek Vasut <marex@denx.de> Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> Tested-by: Boris Brezillon <boris.brezillon@collabora.com>	2019-03-01 08:08:56 +01:00
Christian Gmeiner	f1061fa577	etnaviv: enable ETC2 texture compression support for HALTI0 GPUs Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com> Reviewed-by: Lucas Stach <l.stach@pengutronix.de>	2019-03-01 08:02:17 +01:00
Christian Gmeiner	5d09325c1c	etnaviv: hook-up etc2 patching Changes v1 -> v2: - Avoid the GPU sampling from the resource that gets mutated by the transfer map by setting DRM_ETNA_PREP_WRITE. Changes v2 -> v3: - make use of likely(..) - drop minor optimization regarding rsc->layout == ETNA_LAYOUT_LINEAR - better documentation why DRM_ETNA_PREP_WRITE is needed Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com> Reviewed-by: Lucas Stach <l.stach@pengutronix.de>	2019-03-01 08:02:17 +01:00
Christian Gmeiner	d8177f6233	etnaviv: keep track of mapped bo address Saves us from calling etna_bo_map(..) and saves us from doing the same offset calcs for map() and unmap() operations. Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com> Reviewed-by: Lucas Stach <l.stach@pengutronix.de>	2019-03-01 08:02:17 +01:00
Christian Gmeiner	5bb4e6956d	etnaviv: implement ETC2 block patching for HALTI0 ETC2 is supported with HALTI0, however that implementation is buggy in hardware. The blob driver does per-block patching to work around this. We need to swap colors for t-mode etc2 blocks. Changes v2 -> v3: - Drop redundant format check Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com> Acked-by: Lucas Stach <l.stach@pengutronix.de>	2019-03-01 08:02:17 +01:00
Jason Ekstrand	e8f863e718	intel/compiler: Re-prefix non-logical surface opcodes with VEC4 The scalar back-end uses SHADER_OPCODE_SEND for all surface messages so we no longer need the non-logical opcodes there. Prefix them VEC4 so it's clear that they're only used by the vec4 back-end. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-02-28 16:58:20 -06:00
Jason Ekstrand	95ae400abc	intel/schedule_instructions: Move some comments Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-02-28 16:58:20 -06:00
Jason Ekstrand	aeaba24fcb	intel/compiler: Drop unused surface opcodes The unused typed surface read/write support in the vec4 back-end has been dropped and the fs back-end now uses SHADER_OPCODE_SEND for all image and buffer ops. There's no reason to keep these opcodes around anymore. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-02-28 16:58:20 -06:00
Jason Ekstrand	a04c737215	intel/fs: Get rid of the IMAGE_SIZE opcode Since switching to SHADER_OPCODE_SEND for image operations, we no longer need the non-logical opcode. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-02-28 16:58:20 -06:00
Jason Ekstrand	10b7d14c31	intel/vec4: Drop dead code for handling typed surface messages Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-02-28 16:58:20 -06:00
Jason Ekstrand	9d437f9482	intel/fs: Drop the fs_surface_builder All of the actual abstraction (except possibly setting size_written) happens as part of the logical opcodes. The only thing that the surface builder is providing at this point is extra levels of functions to call through. I'm going to be adding bindless image support soon and all the extra abstraction here is just getting in the way. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-02-28 16:58:20 -06:00
Jason Ekstrand	494a0543e6	intel/fs: Re-order logical surface arguments It makes more sense to start at the surface then move on to the address and then the data. Also, this is a really good test of whether or not we got all the places that use the sources by explicit integer number. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-02-28 16:58:20 -06:00
Jason Ekstrand	94f8fd9a0c	intel/fs: Add an enum type for logical sampler inst sources Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-02-28 16:58:20 -06:00
Jose Fonseca	838c0485e0	scons: Workaround failures with MSVC when using SCons 3.0.[2-4]. This change applies the workaround suggested by Bill Deegan on the affected SCons versions. It also adds a comment with the URL explaining why we were customizing the decider and max_drift in the first place, as I had forgotten all about it. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109443 Tested-by: liviuprodea@yahoo.com Reviewed-by: Roland Scheidegger <sroland@vmware.com> Reviewed-by: Brian Paul <brianp@vmware.com>	2019-02-28 21:26:15 +00:00
Kristian H. Kristensen	87c2e8cbc9	freedreno: Fix a couple of warnings Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>	2019-02-28 10:43:53 -08:00
Kristian H. Kristensen	a5a19d1bc8	freedreno/a6xx: Don't zero SO buffer addresses Just disable SO in VPC_SO_BUF_CNTL. Less noise in dumps. Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>	2019-02-28 10:43:53 -08:00
Kristian H. Kristensen	7dee916105	freedreno/a6xx: Only output MRT control for used framebuffers Not much of an optimization, but makes for less noise in the command buffer dumps. Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>	2019-02-28 10:43:53 -08:00
Eric Engestrom	df5cd51259	gitlab-ci: install xmllint to validate 00-mesa-defaults.conf Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2019-02-28 17:30:48 +00:00
Eric Engestrom	bb6b691c57	driconf: add DTD to allow the drirc xml (00-mesa-defaults.conf) to be validated This DTD can be used to validate the drirc xml: $ xmllint --noout --valid 00-mesa-defaults.conf Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2019-02-28 17:30:44 +00:00
Eric Engestrom	4c3b293242	vulkan: use VkBase{In,Out}Structure instead of a custom struct VkBaseInStructure and VkBaseOutStructure are part of vulkan_core.h (which is part of vulkan.h) Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-02-28 16:25:59 +00:00
Lionel Landwerlin	add4b8930a	vulkan/overlay: add support for fps output in file Also make the sampling period configurable. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-02-28 12:40:57 +00:00
Lionel Landwerlin	b6b275212d	vulkan/overlay: rework option parsing Makes adding new options easier. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-02-28 12:40:57 +00:00
Lionel Landwerlin	4e29a1d36a	vulkan/overlay: fix min/max computations This shouldn't be conditional on the acquire time being visible. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Acked-by: Eric Engestrom <eric.engestrom@intel.com>	2019-02-28 12:40:57 +00:00
Emil Velikov	7ad1a05c83	egl/sl: use kms_swrast with vgem instead of a random GPU VGEM and kms_swrast were introduced to work with one another. All we do is CPU rendering to dumb buffers. There is no reason to carve out GPU memory, increasing the memory pressure on a device that could make a better use of it. Note: - The original code did not work out of the box, since the dumb buffer ioctls are not exposed to render nodes. - This requires libdrm commit 3df8a7f0 ("xf86drm: fallback to MODALIAS for OF less platform devices") - The non-kms, swrast is unaffected by this change. v2: - elaborate what and how is/isn't working (Eric) - simplify driver_name handling (Eric) v3: - move node_type outside of the loop (Eric) - kill no longer needed DRM_RENDER_DEV_NAME define Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>	2019-02-28 12:05:03 +00:00
Emil Velikov	218c7b5aca	egl/sl: use drmDevice API to enumerate available devices This provides for a more comprehensive iteration and slightly more straight-forward codebase. v2: - s/dpy/disp/ - keep original 64 devices (Eric) Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Mathias Fröhlich <Mathias.Froehlich@web.de> Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>	2019-02-28 12:02:38 +00:00
Emil Velikov	893421f315	egl/sl: split out swrast probe into separate function Make the code a bit easier to read. As a bonus point this makes it obvious that we forgot to call _eglAddDevice() for the device - do so. v2: - s/dpy/disp/ (Eric) - free(driver_name) on dri2_load_driver_swrast() failure (Eric) Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Mathias Fröhlich <Mathias.Froehlich@web.de> (v1) Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>	2019-02-28 12:02:19 +00:00
Juan A. Suarez Romero	b43b55d461	nir/spirv: return after emitting a branch in block When emitting a branch in a block, it does not make sense to continue processing further instructions, as they will not be reachable. This fixes a nasty case with a loop with a branch whose then-part and else-part both exit the loop: %1 = OpLabel OpLoopMerge %2 %3 None OpBranchConditional %false %2 %2 %3 = OpLabel OpBranch %1 %2 = OpLabel [...] We know that block %1 will always branch to block %2, which is the merge block for the loop. And thus a break is emitted. If we keep processing further instructions, we will be processing the branch conditional and thus emitting the proper NIR conditional, which leads to instructions after the break. This fixes dEQP-VK.graphicsfuzz.continue-and-merge. CC: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-02-28 09:47:06 +01:00
Eric Engestrom	0c3287e94d	egl/android: replace magic 0=CbCr,1=CrCb with simple enum Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2019-02-28 07:44:46 +00:00
Caio Marcelo de Oliveira Filho	6a553bedcc	st/nir: count num_uniforms for FS builtin shader Usually the uniforms will be assigned locations and have their slots counted automatically, but for builtin shaders the location assignment is manual. So count them too, otherwise we get num_uniforms == 0. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-02-27 22:18:24 -08:00
Ray Zhang	b344e32cdf	glx: fix shared memory leak in X11 call XShmDetach to allow X server to free shared memory Fixes: `bcd80be49a` "drisw/glx: use XShm if possible" Signed-off-by: Ray Zhang <zhanglei002@gmail.com> Reviewed-by: Dave Airlie <airlied@redhat.com>	2019-02-28 14:23:02 +10:00
Timothy Arceri	e907337fad	radeonsi/nir: move si_lower_nir() call into compiler thread This helps improve compile times. For example the shader-db dolphin shader shaders/dolphin/ubershaders/120.shader_test goes from ~1.69 -> ~1.57 seconds on my machine with this change. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-02-28 11:54:06 +11:00
Timothy Arceri	7536af670b	glsl: fix shader cache for packed param list Some types of params such as some builtins are always padded. We need to keep track of this so we can restore the list correctly. Here we also remove a couple of cache entries that are not actually required as they get rebuilt by the _mesa_add_parameter() calls. This patch fixes a bunch of arb_texture_multisample and arb_sample_shading piglit tests for the radeonsi NIR backend. Fixes: `edded12376` ("mesa: rework ParameterList to allow packing") Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-02-28 11:47:37 +11:00
Yevhenii Kolesnikov	07f4b4e403	i965: Fix allow_higher_compat_version workaround limited by OpenGL 3.0 Added check for higher compat profile being allowed before assigning certain extensions. Fixes: `272fe94942` (mesa: enable ARB_texture_buffer_* extensions in the Compatibility profile) Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com> Signed-off-by: Yevhenii Kolesnikov <yevhenii.kolesnikov@globallogic.com> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107052	2019-02-28 10:25:16 +11:00
Lionel Landwerlin	6e184147dd	intel/compiler: use correct swizzle for replacement The optimization in `4cd1a0be76` introduced a replacement of : cmp(8).z.f0.0 vgrf11.y:D, vgrf10.xxxx:D, vgrf2.xyyy:D ... cmp(8).nz.f0.0 null.x:D, vgrf11.yyyy:D, 0D By : cmp(8).z.f0.0 vgrf15.x:D, vgrf10.xxxx:D, vgrf2.yyyy:D ... mov(8) vgrf11.y:D, vgrf15.yyyy:D The first cmp instruction is storing in x while the second mov is sourcing from y. We need to take into account where the replacement on the scan_inst destination is going to store things so that the replacement mov can source things from the correct location. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `4cd1a0be76` ("i965/vec4: Propagate conditional modifiers from more compares to other compares") Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109759 Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2019-02-27 20:06:42 +00:00
Jonathan Marek	61e3188633	freedreno: catch failing fd_blit and fallback to software blit Fixes cases where the fd_blit fails and never happens (ex: blit to etc1) Signed-off-by: Jonathan Marek <jonathan@marek.ca>	2019-02-27 18:46:28 +00:00
Jonathan Marek	e3591b0339	freedreno: use renderonly path for buffers allocated with modifiers Now that freedreno has create_with_modifiers(), this "hack" is needed to make some cases work. Copied from vc4. Fixes: `41ddf1d1` Signed-off-by: Jonathan Marek <jonathan@marek.ca>	2019-02-27 18:46:28 +00:00
Jonathan Marek	6c0fefb448	freedreno: a2xx: fix mipmapping for NPOT textures Fixes: `3a273a4a` Signed-off-by: Jonathan Marek <jonathan@marek.ca>	2019-02-27 18:46:28 +00:00
Jonathan Marek	4f23767590	freedreno: a2xx: fix fast clear for some gmem configurations In freedreno_gmem.c, gmem_align of 0x8000 is used. Alignment used here should be the same. Fixes: `912a9c8d` Signed-off-by: Jonathan Marek <jonathan@marek.ca>	2019-02-27 18:46:28 +00:00
Jonathan Marek	8eca6df5ed	freedreno: a2xx: add use_hw_binning function Fixes: `cb2322c7` Signed-off-by: Jonathan Marek <jonathan@marek.ca>	2019-02-27 18:46:28 +00:00
Jonathan Marek	357313ab0f	freedreno: a2xx: don't write 4th vertex in mem2gmem There is only room for 3 vertices now (RECT has 3 vertices). Fixes: `6ef7700a` Signed-off-by: Jonathan Marek <jonathan@marek.ca>	2019-02-27 18:46:28 +00:00
Erik Faye-Lund	71a76a47cc	swr/codegen: fix autotools build When the output directory was changed, the BUILT_SOURCES and build-rule target-path was no longer correct, leading to races to generate the sources and compiling them. Fix this by updating both sets of paths, so automake see what's going on here. Fixes: `773b3ceaca` ("swr/rast: Fix autotools and scons codegen") Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Alok Hota <alok.hota@intel.com>	2019-02-27 17:59:06 +00:00
Timo Aaltonen	738626daca	util/os_misc: Add check for PIPE_OS_HURD Fix build on Hurd. Signed-off-by: Timo Aaltonen <tjaalton@debian.org> Acked-by: Eric Engestrom <eric.engestrom@intel.com> Acked-by: Emil Velikov <emil.velikov@collabora.com>	2019-02-27 14:56:48 +00:00
Lionel Landwerlin	2fff5966d6	vulkan/overlay: install layer binary in libdir This will allow multilib. v2: Drop path from json file, dlopen should be able to locate the lib in libdir v3: Switch from configure_file to install_data (Dylan) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109788 Tested-by: Mike Lothian <mike@fireburn.co.uk> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-02-27 11:45:42 +00:00
Eric Engestrom	7763e664ce	meson/swr: replace hard-coded path with current_build_dir() Fixes: `93cd9905c8` "swr/rast: Cleanup and generalize gen_archrast" Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Alok Hota <alok.hota@intel.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2019-02-27 11:13:05 +00:00
Gert Wollny	b7201a468d	nir: Add possibility to not lower to source mod 'abs' for ops with three sources This is useful for r600 since there the abs source modifier is not supported for ops with three sources v2: Use correct logic to enable lowering to abs source mod (Eric Anholt) Signed-off-by: Gert Wollny <gw.fossdev@gmail.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-02-27 11:04:06 +00:00
Gurchetan Singh	ce112fcc87	virgl/vtest: deprecate protocol version 1 This is a partial revert of 9d81cd ("virgl: Pass resource size and transfer offsets"). The adjustments made in the client code mean there are various mismatches when transferring data. Let's fall back to protocol version 0 and deprecate protocol version 1. We can still use the protocol version 1 slots for a shared memory transfer mechanism later. Fixes: dEQP-GLES31.functional.copy_image.mixed.viewclass_128_bits_mixed.*_renderbuffer Reviewed-By: Gert Wollny <gert.wollny@collabora.com>	2019-02-27 11:02:29 +00:00
Tapani Pälli	b9acfef337	util: fix a warning when building against clang7 headers Header xmmintrin.h conditionally includes emmintrin.h that defines _MM_DENORMALS_ZERO_MASK, add ifndef to fix this warning. Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>	2019-02-27 08:57:41 +02:00
Tapani Pälli	d1af8115f8	iris: add libmesa_iris_gen8 library to the build Patch fixes iris build on Android. Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>	2019-02-27 08:57:41 +02:00
Tapani Pälli	5e52184f72	android: make libbacktrace optional on USE_LIBBACKTRACE Otherwise with VNDK enabled we fail linking: src/gallium/targets/dri/Android.mk: error: gallium_dri (native:vendor) should not link to libbacktrace.vendor (native:vndk_private) Option makes it possible to use libbacktrace only when VNDK is not enabled. Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>	2019-02-27 08:56:46 +02:00
Tapani Pälli	a3c366c4b2	android: add liblog to libmesa_intel_common build Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>	2019-02-27 08:53:09 +02:00
Alyssa Rosenzweig	b7a5b81d14	panfrost/midgard: Allow flt to run on most units Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-02-27 03:56:56 +00:00
Alyssa Rosenzweig	4c82abb9b6	panfrost: Expose perf counters in environment Previously, we were guarded by an #ifdef, which is generally a bad form. This patch instead guards them behind an environmental variable. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-02-27 03:56:38 +00:00
Alyssa Rosenzweig	60270c83b5	panfrost: Identify 4-bit channel texture formats Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-02-27 03:56:17 +00:00
Alyssa Rosenzweig	90fd82c540	panfrost: Add RGB565, RGB5A1 texture formats Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-02-27 03:55:19 +00:00
Jose Maria Casanova Crespo	4122665dd9	iris: Enable ARB_shader_draw_parameters support Additional VERTEX_ELEMENT_STATE are used to store basevertex and baseinstance and drawid updating the DWordLength of the 3DSTATE_VERTEX_ELEMENTS command. This passes all piglit tests for spec.draw_parameters. tests and VK-GL-CTS KHR-GL45.shader_draw_parameters_tests.* tests. Now we only mark a dirty_update when parameters are changed or when we have an indirect draw. We enable PIPE_CAP_DRAW_PARAMETERS on Iris. There is no edge flag support in the Vertex Elements setup. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-02-26 13:28:38 -08:00
Pierre Moreau	1c9fdcefd4	clover: Fix indentation issues Reviewed-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2019-02-26 21:02:07 +01:00
Pierre Moreau	5285fff5f9	clover: Only use devices supporting IR_NATIVE Currently clover will advertise any device that advertises PIPE_CAP_COMPUTE, even if they do not support PIPE_SHADER_IR_NATIVE, which is the IR used internally by clover. This avoids clover advertising devices as available even though they actually are not supported. Reviewed-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2019-02-26 21:02:07 +01:00
Pierre Moreau	8f9b4a2be6	clover: Move platform extensions definitions to clover/platform.cpp Reviewed-by: Francisco Jerez <currojerez@riseup.net> Reviewed-by: Aaron Watry <awatry@gmail.com>	2019-02-26 21:02:07 +01:00
Pierre Moreau	b033620abf	clover: Move device extensions definitions to core/device.cpp Reviewed-by: Francisco Jerez <currojerez@riseup.net> Reviewed-by: Aaron Watry <awatry@gmail.com>	2019-02-26 21:02:07 +01:00
Pierre Moreau	d42f5896c5	clover: Validate program and library linking options Program linking options are only valid if the library was created with the `-enable-link-options` option, which itself is only valid when creating a library, and only when creating an executable. Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2019-02-26 21:02:07 +01:00
Pierre Moreau	fccc6ecb52	clover: Disallow creating libraries from other libraries If creating a library, do not allow non-compiled object in it, as executables are not allowed, and libraries would make it really hard to enforce the "-enable-link-options" flag. Reviewed-by: Francisco Jerez <currojerez@riseup.net> Reviewed-by: Aaron Watry <awatry@gmail.com>	2019-02-26 21:02:07 +01:00
Pierre Moreau	bad161c894	clover/api: Fail if trying to build a non-executable binary From the OpenCL 1.2 Specification, Section 5.6.2 (about clBuildProgram): > If program is created with clCreateProgramWithBinary, then the > program binary must be an executable binary (not a compiled binary or > library). Reviewed-by: Aaron Watry <awatry@gmail.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2019-02-26 21:02:07 +01:00
Pierre Moreau	25d4e65eb7	clover/api: Rework the validation of devices for building Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2019-02-26 21:02:07 +01:00
Pierre Moreau	505ec3a530	clover: Add a helper for checking if an IR is supported Reviewed-by: Aaron Watry <awatry@gmail.com> Reviewed-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2019-02-26 21:02:07 +01:00
Pierre Moreau	67769c913f	clover: Remove the TGSI backend as unused Reviewed-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2019-02-26 21:02:07 +01:00
Pierre Moreau	669d00ba4c	clover: Avoid warnings from new OpenCL headers * Avoid warnings from references to deprecated CL 1.0, 1.2, 2.0 and 2.1 APIs. * Avoid warnings from not defining CL_TARGET_OPENCL_VERSION. Reviewed-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2019-02-26 21:02:07 +01:00
Karol Herbst	ba8d21a8d3	clover: update ICD table to support everything up to 2.2 Signed-off-by: Karol Herbst <kherbst@redhat.com> Acked-by: Francisco Jerez <currojerez@riseup.net> Reviewed-by: Pierre Moreau <pierre.morrow@free.fr>	2019-02-26 21:02:07 +01:00
Pierre Moreau	dddc5649bf	include/CL: Update to the latest OpenCL 2.2 headers Acked-by: Francisco Jerez <currojerez@riseup.net> Reviewed-by: Karol Herbst <kherbst@redhat.com>	2019-02-26 21:02:07 +01:00
Marek Olšák	2ae07830e7	gallium/u_tests: use a compute-only context to test GCN compute ring	2019-02-26 14:58:55 -05:00
Marek Olšák	a1378639ab	radeonsi: always use compute rings for clover on CI and newer (v2) initialize all non-compute context functions to NULL. v2: fix SI	2019-02-26 14:58:55 -05:00
Bas Nieuwenhuizen	c0110477b5	radv: Interpolate less aggressively. Seems like dxvk used integer builtins without setting the flat interpolation decoration. I believe in the current spec the app is required to set these, but in the meantime to avoid breaking things in stable releases (and so close to release for 19.0), only expand the interpolation to float16 and struct (which cannot be builtins as our spirv parser lowers the builtin block). Fixes: `f324784104` "radv: Allow interpolation on non-float types." Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-02-26 18:51:35 +00:00
Drew Davenport	1fd79b4b6d	util: Don't block SIGSYS for new threads SIGSYS is needed for programs using seccomp for sandboxing. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-02-26 19:39:14 +01:00
Rob Clark	64206102fc	freedreno/ir3: gsampler2DMSArray fixes Array index should come before sample-id. And exclude all isam variants (which take integer texel coords) from adding of offset. Fixes dEQP-GLES31.functional.texture.multisample.samples_1.use_texture_*_2d_array Signed-off-by: Rob Clark <robdclark@gmail.com>	2019-02-26 13:19:44 -05:00
Rob Clark	a06bb486b0	freedreno/ir3/a6xx: fix atomic shader outputs We also need to put in the output mov. Possibly we could just fixup the output register to read it directly from the dummy, but that is more work and I guess dEQP is probably the only time you encounter this. Fixes dEQP-GLES31.functional.shaders.opaque_type_indexing.atomic_counter.const_literal_fragment Signed-off-by: Rob Clark <robdclark@gmail.com>	2019-02-26 13:19:44 -05:00
Rob Clark	db1fa21374	freedreno/a6xx: vertex_id is not _zero_based Fixes dEQP-GLES31.functional.draw_base_vertex.draw_elements_base_vertex.builtin_variable.vertex_id Signed-off-by: Rob Clark <robdclark@gmail.com>	2019-02-26 13:19:44 -05:00
Rob Clark	79180a0566	freedreno/a6xx: fix DRAW_IDX_INDIRECT max_indicies The indirect offset does not affect the index buffer size. Fixes all of dEQP-GLES31.functional.draw_indirect.compute_interop.large.drawelements_combined_grid_100x100_drawcount_* with drawcount > 1. Signed-off-by: Rob Clark <robdclark@gmail.com>	2019-02-26 13:19:44 -05:00
Rob Clark	cabe55a2e7	freedreno/ir3/a6xx: fix non-ssa atomic dst We weren't propagating the array info for cases where result of atomic is array/reg. This can happen, for example, if result is part of a phi web lowered to regs. Fixes dEQP-GLES31.functional.ssbo.atomic.compswap.* Signed-off-by: Rob Clark <robdclark@gmail.com>	2019-02-26 13:19:44 -05:00
Rob Clark	edd5b3126d	freedreno/a6xx: fix ssbo alignment Fixes a bunch of deqp ssbo tests that use multiple ssbo blocks packed into a single buffer. Note the a5xx value seems suspicious, but this is what blob seems to advertise. Signed-off-by: Rob Clark <robdclark@gmail.com>	2019-02-26 13:19:44 -05:00
Rob Clark	cb884d8ab2	freedreno/ir3: use nopN encoding when possible Use the (nopN) encoding for slightly denser shaders.. this lets us fold nop instructions into the previous alu instruction in certain cases. Shouldn't change the # of cycles a shader takes to execute, but reduces the size. (ex: glmark2 refract goes from 168 to 116 instructions) Currently only enabled for a6xx, but I think we could enable this for a5xx and possibly a4xx. Signed-off-by: Rob Clark <robdclark@gmail.com>	2019-02-26 13:19:44 -05:00
Rob Clark	04c2520d91	freedreno/a6xx: fix hangs with large shaders We were overflowing instrlen (which is # of groups of 16 instructions) in a couple dEQP tests, causing gpu hangs: dEQP-GLES31.functional.ubo.random.all_per_block_buffers.13 dEQP-GLES31.functional.ubo.random.all_per_block_buffers.20 Signed-off-by: Rob Clark <robdclark@gmail.com>	2019-02-26 13:19:44 -05:00
Brian Paul	6dabcb5bcf	mesa: fix display list corner case assertion This fixes a failed assertion in glDeleteLists() for the following case: list = glGenLists(1); glDeleteLists(list, 1); when those are the first display list commands issued by the application. When we generate display lists, we plug in empty lists created with the make_list() helper. This function uses the OPCODE_END_OF_LIST opcode but does not call dlist_alloc() which would set the InstSize[OPCODE_END_OF_LIST] element to non-zero. When the empty list was deleted, we failed the InstSize[opcode] > 0 assertion. Typically, display lists are created with glNewList/glEndList so we set InstSize[OPCODE_END_OF_LIST] = 1 in dlist_alloc(). That's why this bug wasn't found before. To fix this failure, simply initialize the InstSize[OPCODE_END_OF_LIST] element in make_list(). The game oolite was hitting this. Fixes: https://github.com/OoliteProject/oolite/issues/325 Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-02-26 09:56:45 -07:00
Brian Paul	cb52d4482d	svga: fix dma.pending > 0 test The dma.pending field is boolean, so testing for > 0 isn't right. Reviewed-by: Neha Bhende <bhenden@vmware.com>	2019-02-26 09:56:45 -07:00
Brian Paul	96ea977c79	svga: assorted whitespace and formatting fixes Remove trailing whitespace, etc. Trivial.	2019-02-26 09:56:45 -07:00
Brian Paul	a81eebf9bc	st/mesa: whitespace/formatting fixes in st_cb_texture.c Remove trailing whitespace, replace tabs w/ spaces, etc. Trivial.	2019-02-26 09:56:45 -07:00
Eleni Maria Stea	fd37a19ac4	i965: fixed clamping in set_scissor_bits when the y is flipped Calculating the scissor rectangle fields with the y flipped (0 on top) can generate negative values that will cause assertion failure later on as the scissor fields are all unsigned. We must clamp the bbox values again to make sure they don't exceed the fb_height. Also fixed a calculation error. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108999 https://bugs.freedesktop.org/show_bug.cgi?id=109594 v2: - I initially clamped the values inside the if (Y is flipped) case and I made a mistake in the calculation: the clamp of the bbox[2] should be a check if (bbox[2] >= fbheight) bbox[2] = fbheight - 1 instead and I shouldn't have changed the ScissorRectangleYMax calculation. As the fixed code is equivalent with using CLAMP instead of MAX2 at the top of the function when bbox[2] and bbox[3] are calculated, and the 2nd is more clear, I replaced it. (Nanley Chery) v3: - Reversed the CLAMP change in bbox[3] as the API guarantees that the viewport height is positive. (Nanley Chery) v4: - Added nomination for the mesa-stable branch and the link to the second bugzilla bug (Nanley Chery) CC: <mesa-stable@lists.freedesktop.org> Tested-by: Paul Chelombitko <qamonstergl@gmail.com> Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>	2019-02-26 08:23:26 -08:00
Eduardo Lima Mitev	0bf667984b	freedreno/a6xx: Silence compiler warnings util_format_compose_swizzles() expects 'const unsigned char' and we are feeding it 'char'. Reviewed-by: Rob Clark <robdclark@gmail.com>	2019-02-26 14:15:33 +01:00
Kasireddy, Vivek	7cab8d3661	i965: Add support for sampling from XYUV images Add support to the i965 DRI driver to sample from XYUV8888 buffers. Signed-off-by: Vivek Kasireddy <vivek.kasireddy@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-02-26 13:08:52 +00:00
Kasireddy, Vivek	65600d0946	dri: Add XYUV8888 format In addition to adding this format to the dri_interface header, add an entry in the android and wayland backends as well. Signed-off-by: Vivek Kasireddy <vivek.kasireddy@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-02-26 13:08:52 +00:00
Vivek Kasireddy	ff14d06be5	drm-uapi: Update headers from drm-next Pull new updates from drm-next as of the following commit: commit a5f2fafece141ef3509e686cea576366d55cabb6 Merge: 71f4e45a4ed3 860433ed2a55 Author: Dave Airlie <airlied@redhat.com> Date: Wed Feb 20 12:16:30 2019 +1000 Merge https://gitlab.freedesktop.org/drm/msm into drm-next Signed-off-by: Vivek Kasireddy <vivek.kasireddy@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-02-26 13:08:51 +00:00
Kasireddy, Vivek	78fb3fd17e	nir/lower_tex: Add support for XYUV lowering The memory layout associated with this format would be: Byte: 0 1 2 3 Component: V U Y X Signed-off-by: Vivek Kasireddy <vivek.kasireddy@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-02-26 13:08:51 +00:00
Lionel Landwerlin	913d711e0f	imgui: update memory editor Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Acked-by: Tapani Pälli <tapani.palli@intel.com>	2019-02-26 12:49:07 +00:00
Lionel Landwerlin	ab9ae080ec	imgui: update commit In commit `3950e7c11e` ("imgui: bump copy") I forgot to update the README about what copy of imgui we carry. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Acked-by: Tapani Pälli <tapani.palli@intel.com>	2019-02-26 12:49:04 +00:00
Eric Engestrom	a213b927f2	driinfo: add DTD to allow the xml to be validated This DTD can be used to validate the output and make sure any parsers out there can handle it: $ xmllint --noout --valid driinfo.xml Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2019-02-26 12:48:28 +00:00
Lionel Landwerlin	9646750822	vulkan/overlay: fix includes The Loader/Validation-Layers repository allows the user to choose where header files are installed. On my system I chose /usr/include thinking it was the obvious "base" location, but it turns out the headers end up being installed right there rather than in a vulkan subdirectory. On Debian/Ubuntu the selected installation path is /usr/include/vulkan, so just go with that. Hopefully other distros don't choose another path. Note that the validation layer doesn't provide a .pc file so we have no way of querying where the headers are installed. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109739 Acked-by: Eric Engestrom <eric.engestrom@intel.com>	2019-02-26 12:29:54 +00:00
Lionel Landwerlin	47ef52d333	vulkan/overlay: fix missing installation of layer Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109739 Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-02-26 12:29:46 +00:00
Eric Engestrom	318e550549	dri_interface: add missing #include Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2019-02-26 12:03:20 +00:00
Eric Engestrom	7f5d9c2757	gitlab-ci: always run the containers build If the first time a fork was created, the job creating the containers was manually cancelled, this would have left the fork unable to use the CI (until the next automatic regeneration of the container). Avoid this by always running the container-generation job, even though 99% of the time it will spin up, see that the container exists and shut down. Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2019-02-26 12:02:14 +00:00
Emil Velikov	40a82e6463	docs: mention "Allow commits from members who can merge..." Mention the tick-box otherwise only the MR author can rebase the series. Cc: Jordan Justen <jordan.l.justen@intel.com> Cc: Dylan Baker <dylan@pnwbakers.com> Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>	2019-02-26 11:27:10 +00:00
Emil Velikov	d9d1cb43d7	egl/android: bump the number of drmDevices to 64 It's the current maximum supported by the kernel. Stay consistent with the rest of Mesa and use the same number. Signed-off-by: Emil Velikov <emil.velikov@collabora.com>	2019-02-26 11:07:23 +00:00
Emil Velikov	02344fe80b	loader: use loader_open_device() to handle O_CLOEXEC Some platforms lack O_CLOEXEC. The loader_open_device() handles those appropriately, so use the helper. Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-02-26 11:07:23 +00:00
Emil Velikov	f0a7b463b5	meson: egl: correctly manage loader/xmlconfig Earlier commit introduced support for haiku yet did not properly annotate the loader/xmlconfig dependencies. Thus we ended up adding inc_loader for each !haiku platform - see `659910eda0` `9a96bf0ecd` `c731508b98` `ec6cb01e21`. One piece remained though - the wayland platform. Hence the following would fail: meson -Dgallium-drivers=etnaviv -Ddri-drivers=''\ -Dtools=etnaviv -Dplatforms=wayland -Dglx=disabled \ build/ Cc: Alexander von Gluck IV <kallisti5@unixzen.com> Reported-by: Boris Brezillon <boris.brezillon@collabora.com> Fixes: `834d221512` ("meson: Add Haiku platform support v4") Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Tested-by: Boris Brezillon <boris.brezillon@collabora.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2019-02-26 11:07:23 +00:00
Emil Velikov	9d84a922b8	egl/dri: de-duplicate dri2_load_driver* The difference between the three functions is the list of mandatory driver extensions. Pass that as an argument to the common helper. Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Frank Binns <frank.binns@imgtec.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-02-26 11:07:23 +00:00
Samuel Pitoiset	4924dfc851	radv: don't copy buffer descriptors list for samplers Sampler descriptors don't have a buffer list. This fixes some crashes with new CTS dEQP-VK.binding_model.descriptor_copy..sampler_. Cc: 18.3 19.0 <mesa-stable@lists.freedesktop.org> Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-02-26 11:22:28 +01:00
Samuel Pitoiset	9256e0a09d	radv: fix out-of-bounds access when copying descriptors BO list We shouldn't increment the buffer list pointers twice. This fixes some crashes with new CTS dEQP-VK.binding_model.descriptor_copy.*. Cc: 18.3 19.0 <mesa-stable@lists.freedesktop.org> Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-02-26 11:22:22 +01:00
Tapani Pälli	1d5e5ec30a	nir: use nir_variable_create instead of open-coding the logic Fixes: `3d7611e9` "st/nir: use NIR for asm programs" Reported-by: Matthias Lorenz <oschowa@web.de> Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-02-26 09:00:36 +02:00
Tapani Pälli	22267feff1	nir: initialize value in copy_prop_vars_block Fixes following valgrind warning: ==27561== Conditional jump or move depends on uninitialised value(s) ==27561== at 0x667856B: value_set_ssa_components (nir_opt_copy_prop_vars.c:78) ==27561== by 0x667A1C4: copy_prop_vars_block (nir_opt_copy_prop_vars.c:797) Fixes: `62332d139c` "nir: Add a local variable-based copy propagation pass" Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-02-26 08:56:25 +02:00
Eric Anholt	97566efe5c	v3d: Rematerialize MOVs of uniforms instead of spilling them. If we have a MOV of a uniform value available to spill, that's one of our best choices. We can just not spill the value, and emit a new load of the uniform as the fill. This saves bothering the TMU and the thrsw, and is the same cost in uniforms (since the spill offset is a uniform anyway). This doesn't have a huge impact on shader-db, since there aren't a whole lot of spills and we usually copy-prop the uniforms at the VIR level such that the only uniform MOVs are from vir_lower_uniforms: total instructions in shared programs: 6430292 -> 6430279 (<.01%) total uniforms in shared programs: 2386023 -> 2385787 (<.01%) total spills in shared programs: 4961 -> 4960 (-0.02%) total fills in shared programs: 6352 -> 6350 (-0.03%) However, I'm interested in dropping the uniforms copy-prop in the backend, since it would be cheaper to not load repeated uniforms if we have the registers to spare. This also saves many spills on dEQP-GLES31.functional.ubo.random.all_per_block_buffers.20, which is what motivated a bunch of my recent backend work in the first place: before: 46 spills, 106 fills, 3062 instructions after: 0 spills, 0 fills, 2611 instructions	2019-02-25 21:33:47 -08:00
Eric Anholt	e0fada983d	v3d: Dump the VIR after register spilling if we were forced to. Spilling is unusual, but one often has to debug it when it happens, so dump it.	2019-02-25 21:26:24 -08:00
Eric Anholt	2786d2161a	v3d: Fix vir_is_raw_mov() for input unpacks. There are no users at the moment, but I wanted to start using this in register spilling.	2019-02-25 21:26:24 -08:00
Mathias Fröhlich	1ab2159249	st/mesa: Reduce array updates due to current changes. Since using bitmasks we can easily check if we have any current value that is potentially uploaded on array setup. So check for any potential vertex program input that is not already a vao enabled array. Only flag array update if there is a potential overlap. Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Brian Paul <brianp@vmware.com> Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>	2019-02-26 05:42:04 +01:00
Dylan Baker	6f42303646	meson/iris: Use current coding style Just a few minor style things. Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>	2019-02-25 23:37:27 +00:00
Timothy Arceri	603206d0a6	radeonsi: fix query buffer allocation Fix the logic for buffer full check on alloc. This patch just takes the fix Nicolai attached to the bug report and updates it to work on master. Fixes: `e0f0d3675d` ("radeonsi: factor si_query_buffer logic out of si_query_hw") Reviewed-by: Marek Olšák <marek.olsak@amd.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109561	2019-02-26 09:55:41 +11:00
Eric Anholt	7c1bf075f3	nir: Just return when asked to rewrite uses of an SSA def to itself. The nir_builder swizzling improvement to not emit extra MOVs resulted in nir_lower_tex() trying to rewrite an SSA def to itself, triggering the assert on all texturing in v3d. There's no work to be done in this case, so just stop asserting. Fixes: `743700be1f` ("nir/builder: Don't emit no-op swizzles") Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-02-25 21:25:24 +00:00
Samuel Pitoiset	5671f38085	radv: fix clearing attachments in secondary command buffers If no framebuffer is bound, get the number of samples and the image format from the render pass. This fixes new CTS dEQP-VK.geometry.layered.*.secondary_cmd_buffer. Cc: 18.3 19.0 <mesa-stable@lists.freedesktop.org> Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-02-25 21:42:50 +01:00
Alok Hota	773b3ceaca	swr/rast: Fix autotools and scons codegen Use new input flags for gen_archrast.py Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>	2019-02-25 13:05:39 -06:00
Alok Hota	16e10b8c30	swr/rast: Add general SWTag statistics Update Archrast parser to use stats, used with an internal tool Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>	2019-02-25 13:05:36 -06:00
Alok Hota	b45a15a39f	swr/rast: Add string handling to AR event framework For use by an internal tool Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>	2019-02-25 13:05:31 -06:00
Alok Hota	8608a747aa	swr/rast: Add initial SWTag proto definitions Update gen_archrast.py to properly generate event IDs Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>	2019-02-25 13:05:17 -06:00
Alok Hota	93cd9905c8	swr/rast: Cleanup and generalize gen_archrast Update meson.build to accommodate Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>	2019-02-25 13:05:07 -06:00
Daniel Schürmann	0bd45f96b9	nir: Use SM5 properties to optimize shift(a@32, iand(31, b)) This is a common pattern from HLSL->SPIRV translation and supported in HW by all current NIR backends. vkpipeline-db results anv (SKL): total instructions in shared programs: 6403130 -> 6402380 (-0.01%) instructions in affected programs: 204084 -> 203334 (-0.37%) helped: 208 HURT: 0 total cycles in shared programs: 1915629582 -> 1918198408 (0.13%) cycles in affected programs: 1158892682 -> 1161461508 (0.22%) helped: 107 HURT: 86 shader-db results on i965 (KBL): total instructions in shared programs: 15284592 -> 15284568 (<.01%) instructions in affected programs: 81683 -> 81659 (-0.03%) helped: 24 HURT: 0 total cycles in shared programs: 375013622 -> 375013932 (<.01%) cycles in affected programs: 40169618 -> 40169928 (<.01%) helped: 13 HURT: 9 Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-02-25 12:59:44 -06:00
Daniel Schürmann	0525bdc225	nir: Define shifts according to SM5 specification. SPIR-V shifts are undefined for values >= bitsize, but SM5 shifts are defined to only use the least significant bits. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-02-25 12:59:43 -06:00
Jason Ekstrand	c4fb6b0c81	intel/eu: Add an EOT parameter to send_indirect_[split]_message For split indirect sends we have to put the EOT parameter in the extended descriptor as well as the instruction itself so just calling brw_inst_set_eot is insufficient. Moving the EOT handling into the send_indirect_[split]_message helper lets us handle it properly.	2019-02-25 11:35:12 -06:00
Sergii Romantsov	dcc4866419	d3d: meson: do not prefix user provided d3d-drivers-path The user can select the location where the d3d drivers are installed by the d3d-drivers-path meson option. By default path will be $prefix/$libdir/d3d. Currently we add $prefix to the user provided path, resulting in an incorrect or even missing path. Based on logic of Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109698 CC: Kenneth Graunke <kenneth@whitecape.org> CC: Emil Velikov <emil.l.velikov@gmail.com> Signed-off-by: Sergii Romantsov <sergii.romantsov@globallogic.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2019-02-25 16:07:02 +00:00
Sergii Romantsov	f6556ec7d1	dri: meson: do not prefix user provided dri-drivers-path The user can select the location where the dri drivers are installed by the dri-drivers-path meson option. By default path will be $prefix/$libdir/dri. Currently we add $prefix to the user provided path, resulting in an incorrect or even missing path. v2: fixed dri_search_path by default, rebased to master v3: new commit-message (Emil Velikov), cc mesa-stable Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109698 CC: Rafael Antognolli <rafael.antognolli@intel.com> CC: Dylan Baker <dylan@pnwbakers.com> Cc: 18.3 19.0 <mesa-stable@lists.freedesktop.org> Fixes: `306914db92` (meson: Add dridriverdir variable to dri.pc.) Signed-off-by: Sergii Romantsov <sergii.romantsov@globallogic.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2019-02-25 16:07:02 +00:00
Lionel Landwerlin	30828f4646	intel/aub_viewer: silence more compiler warnings format not a string literal and no format arguments. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-02-25 13:11:16 +00:00
Lionel Landwerlin	91df8b1780	intel/aub_viewer: silence compiler warning buffer_addr may be used uninitialized. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-02-25 13:11:13 +00:00
Lionel Landwerlin	f1da10e0c5	intel/aub_viewer: printout 48bits addresses Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-02-25 13:11:05 +00:00
Gert Wollny	875942c059	mesa/core: Enable EXT_depth_clamp for GLES >= 2.0 The extension NV_depth_clamp is written against OpenGL 1.2.1, and since GLES 2.0 is based on GL 2.0 there is no reason not to enable this extension also for GLES >= 2.0. v2: Use EXT_depth_clamp that has been proposed to Khronos v3: - Fix check for extension availability (Erik Faye-Lund) - Also fix the test in is_enabled v4: - Test both, ARB and EXT extension (Erik) v5: - Fix white space errors (Erik) Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>	2019-02-25 09:44:27 +00:00
Kenneth Graunke	b45186a6cd	iris: Properly allow rendering to RGBX formats. I was converting them at pipe_surface creation time, but not when answering queries about whether formats support rendering. This caused a lot of FBO incomplete errors for formats that ought to be supported. Fixes "Child of Light", which uses PIPE_FORMAT_R8G8B8X8_UNORM_SRGB. Also fixes Witcher 1 using wined3d (GL) according to Timur Kristóf. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109738	2019-02-25 01:11:27 -08:00
Kenneth Graunke	fce089c8a2	iris: Drop RGBX -> RGBA for storage image usages GLSL doesn't expose RGB/RGBX image formats, so this isn't needed.	2019-02-25 00:57:50 -08:00
Kenneth Graunke	6921588d54	mesa: Fix RGBBuffers for renderbuffers with sized internal formats For texture attachments, 'f' is texImg->_BaseFormat, but for renderbuffer attachments, 'f' is att->Renderbuffer->InternalFormat. InternalFormat may be something like GL_RGB8, which causes our (f == GL_RGB) check to fail. Switch to using a proper _BaseFormat, which drops the size. Fixes dEQP-GLES31.functional.draw_buffers_indexed.random.max_required_draw_buffers.15 on iris when combined with a driver fix. Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>	2019-02-25 00:57:42 -08:00
Oscar Blumberg	da9c030763	glsl: Fix function return typechecking apply_implicit_conversion only converts and checks base types but we need actual type equality for function returns, otherwise you can return a vec2 from a function declared as returning a float. Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2019-02-25 08:49:06 +02:00
Jordan Justen	bd0ad651e0	iris: Always use in-tree i915_drm.h Ref: `f1374805a8` "drm-uapi: use local files, not system libdrm" Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-02-24 21:06:40 -08:00
Alyssa Rosenzweig	f943047e48	panfrost: Decode render target swizzle/channels On MRT-capable systems, the framebuffer format is encoded as a 64-bit word in the render target descriptor. Previously, the two 32-bit words were exposed as opaque hex values. This commit identifies a 12-bit Mali swizzle and a 2-bit channel counter, removing some of the magic. It also adds decoding support for the AFBC and MSAA enable bits, which were already known but otherwise ignored in pandecode. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-02-25 04:49:50 +00:00
Alyssa Rosenzweig	c6be9969d2	panfrost/midgard: Add fround(_even), ftrunc, ffma These ops were discovered by invoking the correspondingly named GLSL functions. The rounding ops here behave exactly as expected and are mapped to their corresponding NIR ops where applicable. The ffma behaves as a LUT instruction and requires some special argument packing (since Midgard normally only allows for 2 arguments); this quirk will be addressed in the future, but for now FMA is still lowered. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-02-25 02:36:26 +00:00
Alyssa Rosenzweig	4a4726af3c	panfrost/nondrm: Split out dump_counters Previously, this function was implicitly a part of the job submit. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-02-25 02:34:16 +00:00
Alyssa Rosenzweig	cdca103d43	panfrost/nondrm: Make COHERENT_LOCAL explicit This flag corresponds to what was MEM_COHERENT_LOCAL in the vendor driver, which seems to influence the cache policy, necessary for the varying temporary storage but nothing else. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-02-25 02:32:45 +00:00
Alyssa Rosenzweig	f44d4653a9	panfrost/nondrm: Flag CPU-invisible regions Potentially, the kernel could optimize these allocations, or perhaps we can save on mapping costs. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-02-25 02:31:09 +00:00
Alyssa Rosenzweig	10cc251842	panfrost/meson: Remove subdir for nondrm This change fixes cross builds with the (temporary) non-DRM overlay. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-02-25 02:27:26 +00:00
Alyssa Rosenzweig	77fea552f6	panfrost: Use tiler fast path (performance boost) For reasons that are still unclear (speculation included in the comment added in this patch), the tiler? metadata has a fast path that we were not enabling; there looks to be a possible time/memory tradeoff, but the details remain unclear. Regardless, this patch improves performance dramatically. Particular wins are for geometry-heavy scenes. For instance, glmark2-es2's Phong-shaded bunny, rendering at fullscreen (2400x1600) via GBM, jumped from ~20fps to hitting vsync cap at 60fps. Gains are even more obvious when vsync is disabled, as in glmark2-es2-wayland. With this patch, on GLES 2.0 samples not involving FBOs, it appears performance is converging with (and sometimes surpassing) the blob. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-02-25 02:25:50 +00:00
Jason Ekstrand	743700be1f	nir/builder: Don't emit no-op swizzles The nir_swizzle helper is used some on its own but it's also called by nir_channel and nir_channels which are used everywhere. It's pretty quick to check while we're walking the swizzle anyway whether or not it's an identity swizzle. If it is, we now don't bother emitting the instruction. Sure, copy-prop will clean it up for us but there's no sense making more work for the optimizer than we have to. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2019-02-24 20:01:27 -06:00
Jason Ekstrand	724371c6b9	nir/split_vars: Don't compact vectors unnecessarily Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>	2019-02-24 20:01:18 -06:00
Erik Faye-Lund	7a6a5d4bfa	st/mesa: remove unused header-file This header has been unused since `f8f2520e88` ("st/mesa: Remove unnecessary headers"). And in the more than 8 years since, this hasn't been useful. So let's just get rid of it. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-02-24 20:53:37 +01:00
Maya Rashish	021c496135	configure: fix test portability From the bash manual: string1 == string2 string1 = string2 True if the strings are equal. = should be used with the test command for POSIX conformance.	2019-02-24 19:26:15 +00:00
David Shao	6fa923a65d	meson: ensure that xmlpool_options.h is generated for gallium targets that need it Fixes: `68076b8747` "meson: build gallium vdpau state tracker" Fixes: `22a817af8a` "meson: build gallium xvmc state tracker" Fixes: `5a785d51a6` "meson: build gallium va state tracker" Fixes: `0ba909f0f1` "meson: build gallium xa state tracker" Fixes: `1d36dc674d` "meson: build gallium omx state tracker" Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-02-24 09:00:39 +00:00
Matthias Lorenz	f91654120b	vulkan/overlay: Add fps counter Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109747	2019-02-24 01:07:26 +00:00
Lionel Landwerlin	239b0d8570	Revert "anv: add support for INTEL_DEBUG=bat" This reverts commit `e4d88396d2`. Apologies, I pushed the wrong commit.	2019-02-24 01:06:39 +00:00
Lionel Landwerlin	e4d88396d2	anv: add support for INTEL_DEBUG=bat As requested by Ken ;) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-02-23 23:29:04 +00:00
Christian Gmeiner	c56e734496	etnaviv: blt: mark used src resource as read from Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com> Reviewed-by: Lucas Stach <l.stach@pengutronix.de> Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>	2019-02-23 16:00:50 +01:00
Christian Gmeiner	7244e76804	etnaviv: rs: mark used src resource as read from Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com> Reviewed-by: Lucas Stach <l.stach@pengutronix.de> Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>	2019-02-23 16:00:25 +01:00
Vinson Lee	2bd08b8b9d	gallium/auxiliary/vl: Fix duplicate symbol build errors. CXXLD gallium_dri.la duplicate symbol _compute_shader_video_buffer in: ../../../../src/gallium/auxiliary/.libs/libgalliumvl.a(libgalliumvl_la-vl_compositor.o) ../../../../src/gallium/auxiliary/.libs/libgalliumvl.a(libgalliumvl_la-vl_compositor_cs.o) duplicate symbol _compute_shader_weave in: ../../../../src/gallium/auxiliary/.libs/libgalliumvl.a(libgalliumvl_la-vl_compositor.o) ../../../../src/gallium/auxiliary/.libs/libgalliumvl.a(libgalliumvl_la-vl_compositor_cs.o) duplicate symbol _compute_shader_rgba in: ../../../../src/gallium/auxiliary/.libs/libgalliumvl.a(libgalliumvl_la-vl_compositor.o) ../../../../src/gallium/auxiliary/.libs/libgalliumvl.a(libgalliumvl_la-vl_compositor_cs.o) Fixes: `9364d66cb7` ("gallium/auxiliary/vl: Add video compositor compute shader render") Signed-off-by: Vinson Lee <vlee@freedesktop.org> Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: James Zhu <James.Zhu@amd.com>	2019-02-22 23:07:26 -08:00
Caio Marcelo de Oliveira Filho	4c160b6bd8	nir: fix MSVC build Zero initialize struct with {0} instead of {}.	2019-02-22 22:38:05 -08:00
Caio Marcelo de Oliveira Filho	eb13211997	nir/copy_prop_vars: add tests for load/store elements of vectors Test using array deref on vectors in loads and stores. These are marked DISABLED_ as this optimization is currently not done. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-02-22 21:00:50 -08:00
Caio Marcelo de Oliveira Filho	4f3809d389	nir: nir_build_deref_follower accept array derefs of vectors Code itself already supports it, just make sure we can use it for those cases. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-02-22 21:00:50 -08:00
Caio Marcelo de Oliveira Filho	c4beadd28e	nir/copy_prop_vars: change test helper to get intrinsics Replace find_next_intrinsic(intrinsic, after) with get_intrinsic(intrinsic, index). This makes it slightly more convenient to check the resulting loads/stores/copies, since in most tests we know which one we care about. The cost is to perform more traversals, but for such tests this is not a problem. Added the ASSERT_EQ() on count to some tests missing it, so the indices queried are always expected to find something. Also, drop two nir_print_shader leftover calls in a test. v2: Remove redundant assertions. nir_src_comp_as_uint already asserts what we need. (Jason) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-02-22 21:00:50 -08:00
Caio Marcelo de Oliveira Filho	fdcb9779d9	nir/copy_prop_vars: keep track of components in copy_entry When a copy_entry is SSA, store not only the nir_ssa_def* for each component, but also the source component they come from. At the moment this is always a match (i.e. 'component[i] == i'), because all the operations for a copy_entry happen using definitions with the same size. This prepares the code for array_derefs of vectors, in which 'component[i] != i'. Also, extract setting all SSA components into a function of its own. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-02-22 21:00:50 -08:00
Caio Marcelo de Oliveira Filho	6624decbb5	nir/copy_prop_vars: add debug helpers Disabled by default, to be used during development. Adding those so I don't rewrite some ad-hoc version of them every time I'm working with this pass. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-02-22 21:00:50 -08:00
Caio Marcelo de Oliveira Filho	60d9bb9ff5	nir/copy_prop_vars: don't get confused by array_deref of vectors For now these derefs are not handled, so don't let these get into the copies list -- which would cause wrong propagations. For load_derefs, do nothing. For store_derefs, invalidate whatever the store is writing to. For copy_derefs, invalidate whatever the copy is writing to. These cases will happen once derefs to SSBOs/UBOs are kept around long enough to get optimized by copy_prop_vars. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-02-22 21:00:50 -08:00
Timothy Arceri	f48527e51a	nir: allow nir_lower_phis_to_scalar() on more src types Rather than only lowering if all srcs are scalarizable we instead check that at least one src is scalarizable. We change undef type to return false otherwise it will cause regressions when it is the only scalarizable src. total instructions in shared programs: 13219105 -> 13024547 (-1.47%) instructions in affected programs: 1153797 -> 959239 (-16.86%) helped: 581 HURT: 74 total cycles in shared programs: 333968972 -> 324807922 (-2.74%) cycles in affected programs: 129809402 -> 120648352 (-7.06%) helped: 571 HURT: 131 total spills in shared programs: 57947 -> 29130 (-49.73%) spills in affected programs: 53364 -> 24547 (-54.00%) helped: 351 HURT: 0 total fills in shared programs: 51310 -> 25468 (-50.36%) fills in affected programs: 44882 -> 19040 (-57.58%) helped: 351 HURT: 0 Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-02-23 11:11:51 +11:00
Alok Hota	6053499f2e	swr/rast: bypass size limit for non-sampled textures This fixes a bug where SWR will fail to render in cases with large buffer allocations, e.g. very large meshes whose vertex buffers exceed 2GB CC: <mesa-stable@lists.freedesktop.org> Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>	2019-02-22 23:35:11 +00:00
Marek Olšák	b326a15eda	tgsi: don't set tgsi_info::uses_bindless_images for constbufs and hw atomics This might have decreased performance for radeonsi/tgsi, because most shaders claimed they used bindless. Cc: 18.3 19.0 <mesa-stable@lists.freedesktop.org> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>	2019-02-22 18:00:54 -05:00
Jordan Justen	cf652205cf	iris: Add gitlab-ci build testing Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-02-22 14:08:21 -08:00
Rob Clark	fd360c82f0	freedreno/a6xx: cube image fix Note that emit_intrinsic_load_image() already swaps a .3d flag with an .a flag. I tried doing things the other way around (going back to .3d) but that didn't work. And treating cube images as 2d array is also what blob does, so let's just go with that. Fixes dEQP-GLES31.functional.image_load_store.cube.load_store.* Signed-off-by: Rob Clark <robdclark@gmail.com>	2019-02-22 14:05:32 -05:00
Rob Clark	f90c3b4485	freedreno/a6xx: fix border-color offset Fixes nearly all of dEQP-GLES31.functional.texture.border_clamp.* when run after a test that binds textures used in vertex shader. Signed-off-by: Rob Clark <robdclark@gmail.com>	2019-02-22 14:05:32 -05:00
Rob Clark	bdedb8277a	freedreno/ir3: don't hardcode wrmask Fixes dEQP-GLES31.functional.shaders.opaque_type_indexing.sampler.const_literal.vertex.samplercubeshadow and few other similar tests that do multiple texture fetches into individual components of a packet output. Mostly works around the issue mentioned in ra_block_find_definers(). Signed-off-by: Rob Clark <robdclark@gmail.com>	2019-02-22 14:05:32 -05:00
Rob Clark	5d4fa194b8	freedreno: fix race condition rsc->write_batch can be cleared behind our back, so we need to acquire the lock before deref'ing. Signed-off-by: Rob Clark <robdclark@gmail.com>	2019-02-22 14:05:32 -05:00
Kenneth Graunke	3090c6b9e9	vulkan: Fix 32-bit build for the new overlay layer vulkan_core.h defines non-dispatchable handles as (struct object *) on 64-bit systems, but uint64_t on 32-bit systems. The former can be implicitly cast to void *, but the latter requires an explicit cast. While here, %lu is the wrong format specifier for uint64_t on 32-bit systems, so use PRIu64, fixing a warning. Reported-by: Mike Lothian <mike@fireburn.co.uk> Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-02-22 08:56:54 -08:00
Juan A. Suarez Romero	4f917e6a61	anv: advertise 8 subpixel precision bits On one side, when emitting 3DSTATE_SF, VertexSubPixelPrecisionSelect is used to select between 8 bit subpixel precision (value 0) or 4 bit subpixel precision (value 1). As this value is not set, it takes the value 0, so 8 bits are used. On the other side, in the Vulkan CTS tests, if the reference rasterizer, which uses 8 bit precision and is used to determine the expected values for the tests, is changed to use 4 bits as ANV was advertising so far, some of the tests will fail. So it seems ANV is actually using 8 bits. v2: explicitly set 3DSTATE_SF::VertexSubPixelPrecisionSelect (Jason) v3: use _8Bit definition as value (Jason) v4: (by Jason) anv: Explicitly set 3DSTATE_CLIP::VertexSubPixelPrecisionSelect This field was added on gen8 even though there's an identically defined one in 3DSTATE_SF. CC: Jason Ekstrand <jason@jlekstrand.net> CC: Kenneth Graunke <kenneth@whitecape.org> CC: 18.3 19.0 <mesa-stable@lists.freedesktop.org> Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-02-22 17:53:55 +01:00
Juan A. Suarez Romero	3b423eeb2d	genxml: add missing field values for 3DSTATE_SF Fill out "Vertex Sub Pixel Precision Select" possible values. CC: 18.3 19.0 <mesa-stable@lists.freedesktop.org> Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-02-22 17:53:45 +01:00
Bas Nieuwenhuizen	f324784104	radv: Allow interpolation on non-float types. In particular structs containing floats and 16-bit floating point types. Fixes: `62024fa775` "radv: enable VK_KHR_16bit_storage extension / 16bit storage features" Fixes: `da29594636` "spirv: Only split blocks" Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109735 Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-02-22 17:06:55 +01:00
Bas Nieuwenhuizen	a1fdd4a4a7	radv: Fix float16 interpolation set up. float16 types can have non-flat interpolation so set up the HW correctly for that. Fixes: `62024fa775` "radv: enable VK_KHR_16bit_storage extension / 16bit storage features" Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-02-22 17:06:55 +01:00
Ilia Mirkin	ae2cb72804	nv50: disable compute It causes more trouble than it's worth. Now vl tries to create compute shaders without all the proper checking. Since there's really no (current) way to use compute on nv50, just mark it disabled. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109742 Fixes: `f6ac0b5d71` ("gallium/auxiliary/vl: Add compute shader to support video compositor render") Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2019-02-22 09:42:41 -05:00
Lionel Landwerlin	1d626fc028	intel: fix urb size for CFL GT1 Same 192Kb amount as SKL/KBL GT1 applies. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com> Fixes: `de7ed0ba55` ("i965/CFL: Add PCI Ids for Coffee Lake.")	2019-02-22 11:53:49 +00:00
Samuel Iglesias Gonsálvez	bd2c5a8203	isl: the display engine requires 64B alignment for linear surfaces v2: Add PRM quote (Lionel) Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-02-22 11:45:45 +00:00
Gert Wollny	2ee197d6e8	virgl: Enable mixed color FBO attachments only when the host supports it Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Reviewed-by: Elie Tournier <elie.tournier@collabora.com>	2019-02-22 10:44:08 +01:00
Mauro Rossi	338dacc341	android: intel/isl: remove redundant building rules Fixes the following building error: including ./external/mesa/Android.mk ... build/core/base_rules.mk:183: *** external/mesa/src/intel: MODULE.TARGET.STATIC_LIBRARIES.libmesa_isl_tiled_memcpy already defined by external/mesa/src/intel. make: *** [build/core/ninja.mk:164: out/build-android_x86_64.ninja] Error 1 ISL_TILED_MEMCPY_FILES is isl/isl_tiled_memcpy_normal.c and that source file includes isl_tiled_memcpy.c source Fixes: `96bb328` ("iris: add Android build") Signed-off-by: Mauro Rossi <issor.oruam@gmail.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2019-02-22 07:56:11 +02:00
Kenneth Graunke	b21de090d6	Revert "iris: Enable auxiliary buffer support" This reverts commit `cd0ced49e7`. It breaks glxgears rendering.	2019-02-21 15:50:46 -08:00
Kenneth Graunke	e2cb0c5e0e	iris: Enable -msse2 and -mstackrealign This is needed for gen_clflush.h intrinsics to work on 32-bit builds. i965 and anv both set these, and iris needs to as well. Tested-by: Mark Janes <mark.a.janes@intel.com>	2019-02-21 14:51:15 -08:00
Francisco Jerez	7272fe9c08	intel/fs: Rely on undocumented unrestricted regioning for 32x16-bit integer multiply. Even though the hardware spec claims that any "integer DWord multiply" operation is affected by the regioning restrictions of CHV/BXT/GLK, this is inconsistent with the behavior of the simulator and with empirical evidence -- Return false from has_dst_aligned_region_restriction() for such instructions as a micro-optimization. Tested-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-02-21 14:07:25 -08:00
Francisco Jerez	e03be78252	intel/fs: Implement extended strides greater than 4 for IR source regions. Strides up to 32B can be implemented for the source regions of most instructions by leveraging either the vertical or the horizontal stride of the hardware Align1 region. The main motivation for this is that currently the lower_integer_multiplication() pass will happily double the stride of one of the 32-bit sources, which can blow up if the stride of the original source was already the maximum value allowed by the hardware. An alternative would be to use the regioning legalization pass in order to lower such strides into the composition of multiple legal strides, but that would be somewhat less efficient. This showed up as a regression from my commit `cbea91eb57` in Vulkan 1.1 CTS tests on CHV/BXT platforms, however it was really a pre-existing problem that had affected conformance on other platforms without native support for integer multiplication. CHV/BXT were getting around it because the code I removed in that commit had the "fortunate" side effect of emitting narrower regions that didn't hit the hardware stride limit after lowering. Beyond fixing the regression this fixes ~90 additional Vulkan 1.1 subgroup CTS tests on ICL (that's why this patch is marked for inclusion in mesa-stable even though the original regressing patch was not). According to Jason, a nearly equivalent change had been committed previously as `e8c9e65185` and then (mistakenly?) reverted as `a31d038208`. Cc: mesa-stable@lists.freedesktop.org Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109328 Reported-by: Mark Janes <mark.a.janes@intel.com> Tested-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-02-21 14:07:25 -08:00
Francisco Jerez	7f9f6263c1	intel/fs: Cap dst-aligned region stride to maximum representable hstride value. This is required in combination with the following commit, because otherwise if a source region with an extended 8+ stride is present in the instruction (which we're about to declare legal) we'll end up emitting code that attempts to write to such a region, even though strides greater than four are still illegal for the destination. Tested-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-02-21 14:07:25 -08:00
Francisco Jerez	e2f475ddff	intel/fs: Lower integer multiply correctly when destination stride equals 4. Because the "low" temporary needs to be accessed with word type and twice the original stride, attempting to preserve the alignment of the original destination can potentially lead to instructions with illegal destination stride greater than four. Because the CHV/BXT alignment restrictions are now being enforced by the regioning lowering pass run after lower_integer_multiplication(), there is no real need to preserve the original strides anymore. Note that this bug can be reproduced on stable branches, but back-porting would be non-trivial, because the fix relies on the regioning lowering pass recently introduced. Tested-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-02-21 14:07:25 -08:00
Francisco Jerez	c3c27762f7	intel/fs: Exclude control sources from execution type and region alignment calculations. Currently the execution type calculation will return a bogus value in cases like: mov_indirect(8) vgrf0:w, vgrf1:w, vgrf2:ud, 32u Which will be considered to have a 32-bit integer execution type even though the actual indirect move operation will be carried out with 16-bit precision. Similarly there's no need to apply the CHV/BXT double-precision region alignment restrictions to such control sources, since they aren't directly involved in the double-precision arithmetic operations emitted by these virtual instructions. Applying the CHV/BXT restrictions to control sources was expected to be harmless if mildly inefficient, but unfortunately it exposed problems at codegen level for virtual instructions (namely the SHUFFLE instruction used for the Vulkan 1.1 subgroup feature) that weren't prepared to accept control sources with an arbitrary strided region. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109328 Reported-by: Mark Janes <mark.a.janes@intel.com> Fixes: `efa4e4bc5f` "intel/fs: Introduce regioning lowering pass." Tested-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-02-21 14:07:25 -08:00
Timothy Arceri	d9e08e753b	nir: clone instruction set rather than removing individual entries This reduces the time spent in nir_opt_cse() by almost a half. The massif tool from valgrind reported no change in peak memory use with the large dolphin uber shaders I used for testing. Reviewed-by: Thomas Helland <thomashelland90@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-02-22 08:36:36 +11:00
Jordan Justen	cd0ac3a6af	genxml: Remove extra space in gen4/45/5 field name Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-02-21 13:17:10 -08:00
Jordan Justen	a9b0b72a78	genxml/gen_bits_header.py: Use regex to strip non-alphanumeric chars Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-02-21 13:15:59 -08:00
Kenneth Graunke	cd0ced49e7	iris: Enable auxiliary buffer support This currently regresses KHR-GL4x.compute_shader.resource-texture, but that's a pre-existing bug (https://bugs.freedesktop.org/109113) which should be fixed up once we have fast clear support.	2019-02-21 10:26:12 -08:00
Rafael Antognolli	db81445837	iris: Flag ALL_DIRTY_BINDINGS on aux state change. If we change the aux state for a given resource, we need to re-emit the binding table pointers for any stage that has such resource bound. Since we don't track that, flag IRIS_ALL_DIRTY_BINDINGS and emit all of them.	2019-02-21 10:26:12 -08:00
Rafael Antognolli	95589652a1	iris: Skip resolve if there's no context. If iris_resource_get_handle() gets called without a context, we can't resolve the resource. Hopefully it shouldn't be compressed anyway, so let's just add an assert to ensure it's correct.	2019-02-21 10:26:12 -08:00
Rafael Antognolli	36138bb7fc	iris/clear: Pass on render_condition_enabled.	2019-02-21 10:26:12 -08:00
Rafael Antognolli	8190165d13	iris: Avoid leaking if we fail to allocate the aux buffer. Otherwise we could leak the aux state map or the aux BO.	2019-02-21 10:26:12 -08:00
Kenneth Graunke	7da53d7188	iris: Only resolve compute resources for compute shaders	2019-02-21 10:26:12 -08:00
Kenneth Graunke	95a36bd55c	iris: Fix aux usage in render resolve code	2019-02-21 10:26:12 -08:00
Rafael Antognolli	4f191feb0c	iris: Pin HiZ buffers when rendering.	2019-02-21 10:26:12 -08:00
Rafael Antognolli	dfd54f9954	iris: Flush before hiz_exec.	2019-02-21 10:26:12 -08:00
Kenneth Graunke	f3f7d45a63	iris: Allow disabling aux via INTEL_DEBUG options	2019-02-21 10:26:12 -08:00
Kenneth Graunke	4634b754f4	iris: do flush for buffers still	2019-02-21 10:26:12 -08:00
Kenneth Graunke	15822f33ad	iris: make surface states for CCS_D too CCS_E can fall back to CCS_D with incompatible format views CCS_D is pretty useless without fast clears and we may as well use NONE, but we're surely going to hook those up at some point, so may as well just go ahead and do it now...	2019-02-21 10:26:12 -08:00
Rafael Antognolli	689b590069	iris: Skip msaa16 on gen < 9. Also needed to add gen information to KEY_INIT.	2019-02-21 10:26:12 -08:00
Kenneth Graunke	fd2038b22a	iris: Set program key fields for MCS	2019-02-21 10:26:12 -08:00
Kenneth Graunke	92c310fd3f	iris: don't use hiz for MSAA buffers	2019-02-21 10:26:12 -08:00
Kenneth Graunke	2cddc953cd	iris: some initial HiZ bits	2019-02-21 10:26:12 -08:00
Kenneth Graunke	9b1126c990	iris: disable aux for external things	2019-02-21 10:26:12 -08:00
Kenneth Graunke	45f4dab62b	iris: Resolves for compute	2019-02-21 10:26:12 -08:00
Kenneth Graunke	ecc897b8ad	iris: consider framebuffer parameter for aux usages	2019-02-21 10:26:12 -08:00
Kenneth Graunke	b77d2dc71b	iris: Make blit code use actual aux usages	2019-02-21 10:26:12 -08:00
Kenneth Graunke	bfc76d3525	iris: store modifier info in res	2019-02-21 10:26:12 -08:00
Kenneth Graunke	56f1fe3eac	iris: pin the buffers	2019-02-21 10:26:12 -08:00
Kenneth Graunke	f8aa9aa353	iris: resolve before transfer maps	2019-02-21 10:26:12 -08:00
Kenneth Graunke	c53a67d469	iris: be sure to skip buffers in resolve code Buffers don't have ISL surfaces, and this can get us into trouble.	2019-02-21 10:26:12 -08:00
Kenneth Graunke	5eb75345b8	iris: try to fix copyimage vs copybuffers	2019-02-21 10:26:12 -08:00
Kenneth Graunke	d8f3bc1c4c	iris: actually use the multiple surf states for aux modes	2019-02-21 10:26:12 -08:00
Kenneth Graunke	3c979b0e6d	iris: add some draw resolve hooks	2019-02-21 10:26:12 -08:00
Kenneth Graunke	53c484ba8a	iris: blorp using resolve hooks	2019-02-21 10:26:12 -08:00
Kenneth Graunke	77a1070d36	iris: Initial import of resolve code	2019-02-21 10:26:12 -08:00
Kenneth Graunke	f879349398	iris: create aux surface if needed	2019-02-21 10:26:12 -08:00
Kenneth Graunke	3efd5299af	iris: Fill out SURFACE_STATE entries for each possible aux usage	2019-02-21 10:26:12 -08:00
Kenneth Graunke	3cfc6a207b	iris: Fill out res->aux.possible_usages	2019-02-21 10:26:12 -08:00
Kenneth Graunke	a7bc4d6074	iris: Add iris_resource fields for aux surfaces But without fast clears or HiZ per-level tracking just yet.	2019-02-21 10:26:12 -08:00
Jordan Justen	d0996d5fab	iris: Emit default L3 config for the render pipeline Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>	2019-02-21 10:26:12 -08:00
Kenneth Graunke	51ddc40084	iris: Always emit at least one BLEND_STATE	2019-02-21 10:26:12 -08:00
Kenneth Graunke	d6dd57d43c	iris: Add missing depth cache flushes	2019-02-21 10:26:12 -08:00
Kenneth Graunke	1b5c342f33	iris: Simplify iris_get_depth_stencil_resources We can safely assume that the given resource is depth, depth/stencil, or stencil already. The stencil-only case is easily detectable with a single format check, and all other cases are handled identically. This saves some CPU overhead.	2019-02-21 10:26:12 -08:00
Kenneth Graunke	07ec1f0b25	iris: Make an IRIS_MAX_MIPLEVELS define	2019-02-21 10:26:12 -08:00
Rafael Antognolli	455c959689	iris: Store internal_format when getting resource from handle.	2019-02-21 10:26:12 -08:00
Kenneth Graunke	973f01d55a	iris: Move create and bind driver hooks to the end of iris_program.c This just moves the code for dealing with pipe_shader_state / pipe_compute_state / iris_uncompiled_shader to the end of the file. Now that those do precompiles, they want to call the actual compile functions. Putting them at the end eliminates the need for a bunch of prototypes.	2019-02-21 10:26:12 -08:00
Timur Kristóf	cacf84ed5f	iris: implement clearing render target and depth stencil v2 (Kenneth Graunke): split color/depthstencil cases, fix iris_clear	2019-02-21 10:26:12 -08:00
Kenneth Graunke	8ab82bd1fd	iris: Drop XXX about checking for swizzling Caio noted that this is not necessary on Gen8+: "Before Gen8, there was a historical configuration control field to swizzle address bit[6] for in X/Y tiling modes. This was set in three different places: TILECTL[1:0], ARB_MODE[5:4], and DISP_ARB_CTL[14:13]. For Gen8 and subsequent generations, the swizzle fields are all reserved, and the CPU's memory controller performs all address swizzling modifications." Since we don't support earlier hardware, we can skip it entirely.	2019-02-21 10:26:12 -08:00
Kenneth Graunke	bf23e79629	iris: Set HasWriteableRT correctly A bit of irritating state cross dependency here, but nothing too hard	2019-02-21 10:26:12 -08:00
Kenneth Graunke	d612cd1bf8	iris: Set 3DSTATE_WM::ForceThreadDispatchEnable The Vulkan driver only sets this if color writes are disabled, which is more conservative - but would require us to inspect blend state. (If color writes are enabled, we don't need to force anything, because the internal signal is already correct. But it shouldn't hurt to do so.)	2019-02-21 10:26:12 -08:00
Kenneth Graunke	27d751cdd8	iris: Drop XXX about alpha testing I was misreading i965 - the 3DSTATE_WM::PixelShaderKillsPixel bit from Gen < 8 needed all of this, but the 3DSTATE_PS_EXTRA bit only needs prog_data->uses_kill.	2019-02-21 10:26:12 -08:00
Andre Heider	bffb65d28e	iris: improve PIPE_CAP_VIDEO_MEMORY bogus value -1 is a little too bogus for most games ;) Signed-off-by: Andre Heider <a.heider@gmail.com>	2019-02-21 10:26:12 -08:00
Andre Heider	f89a578818	iris: fix build with gallium nine Signed-off-by: Andre Heider <a.heider@gmail.com>	2019-02-21 10:26:12 -08:00
Kenneth Graunke	be49fb051d	iris: Stop chopping off the first nine characters of the renderer string	2019-02-21 10:26:12 -08:00
Kenneth Graunke	15341778ba	iris: rework num textures to util_lastbit	2019-02-21 10:26:12 -08:00
Kenneth Graunke	974229df46	iris: Add PIPE_CAP_MAX_VARYINGS	2019-02-21 10:26:11 -08:00
Kenneth Graunke	1cd001aa63	iris: Make an iris_batch_reference_signal_syncpt helper function. Suggested by Chris Wilson. More obvious what's going on.	2019-02-21 10:26:11 -08:00
Kenneth Graunke	9376799bd6	iris: Use READ_ONCE and WRITE_ONCE for snapshots_landed Suggested by Chris Wilson, if only to make it obvious to the human readers that these are volatile reads. It may also be necessary for the compiler in a few cases.	2019-02-21 10:26:11 -08:00
Kenneth Graunke	18e31a9b31	iris: Fix accidental busy-looping in query waits When switching from bo_wait to sync-points, I missed that we turned an if (not landed) bo_wait into a while (not landed) check_syncpt(), which has a timeout of 0. This meant, rather than sleeping until the batch is complete, we'd busy-loop, continually asking the kernel "is the batch done yet???". This is not what we want at all - if we wanted a busy loop, we'd just loop on !snapshots_landed. We want to sleep. Add an effectively infinite timeout so that we sleep.	2019-02-21 10:26:11 -08:00
Kenneth Graunke	3b1ac8244e	iris: Add a timeout_nsec parameter, rename check_syncpt to wait_syncpt I want to be able to wait with a non-zero timeout from elsewhere.	2019-02-21 10:26:11 -08:00
Sagar Ghuge	c24a574e6c	iris: Don't allocate a BO per query object Instead of allocating 4K BO per query object, we can create a large blob of memory and split it into pieces as required. Having one BO for multiple query objects, we don't want to wait on all of them, instead when we write last snapshot, we create a sync point, and check syncpoints while waiting on particular object. Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>	2019-02-21 10:26:11 -08:00
Kenneth Graunke	a1ebac3750	iris: Implement ALT mode for ARB_{vertex,fragment}_shader Fixes gl-1.0-spot-light	2019-02-21 10:26:11 -08:00
Kenneth Graunke	732c3a90a4	iris: Fix bug in bound vertex buffer tracking res might be NULL, at which point this is an unbind.	2019-02-21 10:26:11 -08:00
Kenneth Graunke	4bfd12bbf7	iris: minor tidying	2019-02-21 10:26:11 -08:00
Kenneth Graunke	b1bacbf038	iris: Unreference some more things on state module teardown	2019-02-21 10:26:11 -08:00
Kenneth Graunke	e092ed9213	iris: Drop dead state_size hash table I inherited this from i965. It would be nice to track the state size so INTEL_DEBUG=color,bat decoding can print the right number of e.g. binding table entries or blend states, but...without a single point of entry for state, it's a little tricky to get right. Punt for now, and drop the dead code in the meantime.	2019-02-21 10:26:11 -08:00
Kenneth Graunke	6e41f1b459	iris: Drop comment about ISP_DIS i965 re-emits 3DSTATE_CONSTANT_* on every batch, so there's no point in restoring the constants from the context. Iris actually re-pins the constant buffers properly across the batch, and avoids re-emitting the constant packets unless it's necessary. So, we don't want ISP_DIS.	2019-02-21 10:26:11 -08:00
Kenneth Graunke	edd3ce5a63	iris: Enable PIPE_CAP_COMPACT_ARRAYS	2019-02-21 10:26:11 -08:00
Kenneth Graunke	1db394f46b	iris: Remap stream output indexes back to VARYING_SLOT_. Previously I had a hack in st/mesa to make it stop remapping VARYING_SLOT_ into the naively compacted slots, which aren't what we want. But that wasn't very feasible, as we'd have to update all drivers, or add capability bits, and it gets messy fast. It turns out that I can map back to VARYING_SLOT_* in about 5 LOC, so let's just do that. It removes the need for hacks, and is easy. This also fixes KHR-GL46.enhanced_layouts.xfb_capture_struct, which apparently with my hack was still getting the wrong slot info.	2019-02-21 10:26:11 -08:00
Kenneth Graunke	5d3d757178	iris: Zero the compute predicate when changing the render condition 1. Set a render condition. We emit it immediately on the render engine, and stash q->bo as ice->state.compute_predicate in case the compute engine needs it. 2. Clear the render condition. We were incorrectly leaving a stale compute_predicate kicking around... 3. Dispatch compute. We would then read the stale compute predicate, and try to load it into MI_PREDICATE_DATA. But q->bo may have been freed altogether, causing us to try and use garbage memory as a BO, adding it to the validation list, failing asserts, and tripping EINVALs in execbuf. Huge thanks to Mark Janes for narrowing this sporadic GL CTS failure down to a list of 48 tests I could easily run to reproduce it. Huge thanks to the Valgrind authors for the memcheck tool that immediately pinpointed the problem.	2019-02-21 10:26:11 -08:00
Caio Marcelo de Oliveira Filho	4fd1f70e62	iris: always include an extra constbuf0 if using UBOs In st_nir_lower_uniforms_to_ubo() all UBO accesses in the shader have their index incremented to make room for uniforms in constbuf0. So if we use UBOs, we always need to include the extra binding entry in the table. To avoid doing this check both when compiling the shader and when assigning binding tables, store the num_cbufs in iris_compiled_shader. Fixes a bunch of tests from Piglit and CTS that use UBOs but don't use uniforms or system values. Note that some tests fitting these criteria were passing because the UBOs were moved to be push constants (avoiding the problem). Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-02-21 10:26:11 -08:00
Kenneth Graunke	4801af2f26	iris: Do binder address allocations per-context, not globally. iris_bufmgr allocates addresses across the entire screen, since buffers may be shared between multiple contexts. There used to be a single special address, IRIS_BINDER_ADDRESS, that was per-context - and all contexts used the same address. When I moved to the multi-binder system, I made a separate memory zone for them. I wanted there to be 2-3 binders per context, so we could cycle them to avoid the stalls inherent in pinning two buffers to the same address in back-to-back batches. But I figured I'd allow 100 binders just to be wildly excessive/cautious. What I didn't realize was that we need 2-3 binders per context, and what I did was allocate 100 binders per screen. Web browsers, for example, might have 1-2 contexts per tab, leading to hundreds of contexts, and thus binders. To fix this, we stop allocating VMA for binders in bufmgr, and let the binder handle it itself. Binders are per-context, and they can assign context-local addresses for the buffers by simply doing a ringbuffer style approach. We only hold on to one binder BO at a time, so we won't ever have a conflicting address. This fixes dEQP-EGL.functional.multicontext.non_shared_clear. Huge thanks to Tapani Pälli for debugging this whole mess and figuring out what was going wrong. Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2019-02-21 10:26:11 -08:00
Kenneth Graunke	0f33204f05	iris: Fix memzone_for_address for the surface and binder zones We use > for IRIS_MEMZONE_DYNAMIC because IRIS_BORDER_COLOR_POOL_ADDRESS lives at the very start of that zone. However, IRIS_MEMZONE_SURFACE and IRIS_MEMZONE_BINDER are normal zones. They used to be a single zone (surface) with a single binder BO at the beginning, similar to the border color pool. But when I moved us to multiple binders, I made them have a real zone (if a small one). So both zones should use >=. Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2019-02-21 10:26:11 -08:00
Kenneth Graunke	3bcb1a7fcd	iris: Don't whack SO dirty bits when finishing a BLORP op Re-emitting 3DSTATE_SO_BUFFERS can be hazardous, as it could zero offsets. Plus, it's just not necessary - BLORP doesn't change these.	2019-02-21 10:26:11 -08:00
Kenneth Graunke	b9697dd820	iris: Fix SO issue with INTEL_DEBUG=reemit, set fewer bits INTEL_DEBUG=reemit was breaking streamout tests, by re-emitting 3DSTATE_SO_BUFFER commands that tell the HW to zero the SO write offsets. We would need to alter them to use 0xFFFFFFFF for the offset. Also, have each upload function only flag bits relevant to its own pipeline.	2019-02-21 10:26:11 -08:00
Kenneth Graunke	61798e3c88	iris: CS stall on VF cache invalidate workarounds See commit `31e4c9ce40` in i965.	2019-02-21 10:26:11 -08:00
Kenneth Graunke	c81941f1e7	iris: Pay attention to blit masks For combined depth/stencil formats, we may want to only blit one half. If PIPE_BLIT_Z is set, blit depth; if PIPE_BLIT_S is set, blit stencil.	2019-02-21 10:26:11 -08:00
Kenneth Graunke	7837fec740	iris: Assert about blits with color masking st/mesa never asks for this today, but in theory someone might, and we don't support it.	2019-02-21 10:26:11 -08:00
Kenneth Graunke	0f677b0d87	iris: Don't enable smooth points when point sprites are enabled dEQP-GLES3.functional.rasterization.fbo.rbo_multisample_*.primitives.points	2019-02-21 10:26:11 -08:00
Kenneth Graunke	3b336a1513	iris: Allow sample mask of 0 I think this was an attempt to work around various sample mask bugs I had early on. It's not correct. A sample mask of 0 is legal and means to disable all samples. Fixes dEQP-GLES31.functional.texture.multisample..sample_mask*	2019-02-21 10:26:11 -08:00
Kenneth Graunke	e17333ea1e	iris: fail to create screen for older unsupported HW loader shouldn't try, but let's be paranoid	2019-02-21 10:26:11 -08:00
Kenneth Graunke	1f91f688e8	iris: Switch to the new PIPELINE_STATISTICS_QUERY_SINGLE capability I had a hack in place earlier to pass the query type as q->index for the regular statistics query, but we ended up adjusting the interface and adding a new query type. Use that instead, fixing pipeline statistics queries since the rebase.	2019-02-21 10:26:11 -08:00
Kenneth Graunke	a23c06cabc	iris: Use new PIPE_STAT_QUERY enums rather than hardcoded numbers.	2019-02-21 10:26:11 -08:00
Kenneth Graunke	5aef30b886	iris: Fix Broadwell WaDividePSInvocationCountBy4 We were dividing by 4 in calculate_result_on_gpu(), and also in iris_get_query_result(). We should stop doing the latter, and instead divide by 4 in calculate_result_on_cpu() as well. Otherwise, if snapshots were available, and you hit the calculate_result_on_cpu() path, but requested it be written to a QBO, you'd fail to get a divide.	2019-02-21 10:26:11 -08:00
Kenneth Graunke	7f318bf2ac	iris: Delete genx->bound_vertex_buffers This is actually stored in ice->state, as it isn't gen-specific	2019-02-21 10:26:11 -08:00
Kenneth Graunke	02991e2878	iris: Drop a dead comment	2019-02-21 10:26:11 -08:00
Kenneth Graunke	572fad1e84	iris: Don't check other batches for our batch BO This is an awkward corner case. We create batches in order, each of which creates and pins a BO. The other batches may not be set up yet, so it may not be safe to ask whether they reference a BO. Just avoid this for now. We could avoid it for other context-local BOs too, but we currently don't have a flag for that (and I'm not certain whether it's worth it).	2019-02-21 10:26:11 -08:00
Kenneth Graunke	8eda6f2288	iris: Handle PIPE_TRANSFER_DISCARD_WHOLE_RESOURCE somewhat Various places in the transfer code need to know whether they must read the existing resource's values. Rather than checking both flags everywhere, just make PIPE_TRANSFER_DISCARD_WHOLE_RESOURCE also flag PIPE_TRANSFER_DISCARD_RANGE - if we can discard everything, we can discard a subrange, too. Obviously, we can do better for PIPE_TRANSFER_DISCARD_WHOLE_RESOURCE, but eventually u_threaded_context should handle swapping out buffers for new idle buffers, anyway. In the meantime, this is at least better.	2019-02-21 10:26:11 -08:00
Kenneth Graunke	bacc722d13	iris: Flush the render cache in flush_and_dirty_for_history BLORP uses the render engine to write to buffers, and we need to flush that data out to the actual surface (finishing the write). Then, the rest of this function invalidates any caches that might have stale data which needs to be refetched.	2019-02-21 10:26:11 -08:00
Kenneth Graunke	7a9e87c224	iris: Implement multi-slice copy_region I don't know if this is required - surprisingly, I haven't seen it matter - but I'd like to use it for multi-slice transfer maps. We may as well do the right thing.	2019-02-21 10:26:11 -08:00
Kenneth Graunke	307f3f9924	iris: Leave a comment about why Broadwell images are broken There are a variety of ways to fix this, many of which are simple, but I could use some advice on which ones other people prefer, and so we'll punt until after the holidays.	2019-02-21 10:26:11 -08:00
Kenneth Graunke	7ed1383c0a	iris: Fix surface states for Gen8 lowered-to-untyped images We have to use SURFTYPE_BUFFER and ISL_FORMAT_RAW for these.	2019-02-21 10:26:11 -08:00
Kenneth Graunke	477e7d575b	iris: Fill out brw_image_params for storage images on Broadwell	2019-02-21 10:26:11 -08:00
Kenneth Graunke	7e35333c73	iris: Don't make duplicate system values We were relying on CSE/GVN/etc to coalesce all intrinsics that load the same value, but that's a bad idea. We might have a couple intrinsics that reload the same value. If so, we only want to set up the uniform on the first one we see.	2019-02-21 10:26:11 -08:00
Kenneth Graunke	bc3bb28645	iris: Don't enable push constants just because there are system values System values are built-in uniforms. We set them up as UBO values, and might pull or push them. UBO push analysis will take care of that. We only want to enable push constants if there's an actual range being pushed. Otherwise, we might get into a scenario where 3DSTATE_PS enables push constants but 3DSTATE_CONSTANT_PS isn't pushing anything. This fixes GPU hangs in Broadwell image load store tests which have unused image param system values but no other uniforms. (We shouldn't be making those anyway, but that's a separate fix...)	2019-02-21 10:26:11 -08:00
Kenneth Graunke	2ca0d913ea	iris: Fix framebuffer layer count cso_fb->layers is only valid for no-attachment framebuffers. Use the helper function to get the real value, then stash it so we don't have to call the helper function on the old value for comparison, or at draw time for Force Zero RTA Index setting. This fixes Force Zero RTA Index being set even when attempting layered rendering.	2019-02-21 10:26:11 -08:00
Dave Airlie	df60241ff7	iris: handle qbo fragment shader invocation workaround	2019-02-21 10:26:11 -08:00
Dave Airlie	5ae2e5aa94	iris: add fs invocations query workaround for broadwell	2019-02-21 10:26:11 -08:00
Dave Airlie	8806b29e16	iris: setup gen8 caps	2019-02-21 10:26:11 -08:00
Dave Airlie	1bbf095473	iris: limit gen8 to 8 samples	2019-02-21 10:26:11 -08:00
Dave Airlie	823609b1a3	iris/WIP: add broadwell support This adds all the state changes, MOCS changes,	2019-02-21 10:26:11 -08:00
Kenneth Graunke	5be72d9a20	iris: Delete bogus comment about cube array counting. Both 'z' and 'depth' are counted in slices, according to the Gallium docs (context.rst). In our temporary memory, we allocate `box.depth` slices, so we need to rebase the starting slice (box.z) down to 0, and back again when writing on unmap. There's nothing strange about cubes here.	2019-02-21 10:26:11 -08:00
Kenneth Graunke	73709be0c3	iris: Fix compute scratch pinning Thanks to Eero Tamminen for helping catch this.	2019-02-21 10:26:11 -08:00
Kenneth Graunke	3ab3aa23c2	iris: Add a more long term TODO about timebase scaling	2019-02-21 10:26:11 -08:00
Kenneth Graunke	7ddc1f8ded	iris: Only resolve inputs for actual shader stages We don't need to consider compute at render time, and don't need to consider disabled stages. 4% on drawoverhead.	2019-02-21 10:26:11 -08:00
Rhys Kidd	6c17e7d95f	iris: Fix assertion in iris_resource_from_handle() tiling usage Assertion error: iris_resource_from_handle: Assertion `res->bo->tiling_mode == isl_tiling_to_i915_tiling(res->surf.tiling)' failed. This patch fixes 16 piglit tests on KBL: glx/glx-multithread-texture glx/glx-query-drawable-glx_fbconfig_id-glxpbuffer glx/glx-query-drawable-glx_fbconfig_id-glxpixmap glx/glx-query-drawable-glx_preserved_contents glx/glx-query-drawable-glxpbuffer-glx_height glx/glx-query-drawable-glxpbuffer-glx_width glx/glx-query-drawable-glxpixmap-glx_height glx/glx-query-drawable-glxpixmap-glx_width glx/glx-swap-pixmap glx/glx-swap-pixmap-bad glx/glx-tfp glx/glx-visuals-depth -pixmap glx/glx-visuals-stencil -pixmap spec/egl 1.4/eglcreatepbuffersurface and then glclear spec/egl 1.4/largest possible eglcreatepbuffersurface and then glclear spec/egl_nok_texture_from_pixmap/basic Cc: Kenneth Graunke <kenneth@whitecape.org> Cc: Jason Ekstrand <jason.ekstrand@intel.com> Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>	2019-02-21 10:26:11 -08:00
Kenneth Graunke	73d525f188	iris: Fix scratch space allocation on Icelake. Gen9-10 have fewer than 4 subslices per slice, so they need this to be rounded up. Gen11 isn't documented as needing this hack, and it can also have more than 4 subslices, so the hack actually can break things. Fixes tests/spec/arb_enhanced_layouts/execution/component-layout/ sso-vs-gs-fs-array-interleave	2019-02-21 10:26:11 -08:00
Kenneth Graunke	154e3e45bb	iris: better MOCS	2019-02-21 10:26:11 -08:00
Dave Airlie	aaaf611130	iris: fix gpu calcs for timestamp queries	2019-02-21 10:26:11 -08:00
Kenneth Graunke	3c45d03049	iris: only mark depth/stencil as writable if writes are actually enabled	2019-02-21 10:26:11 -08:00
Kenneth Graunke	3a938a4b23	iris: more dead comments	2019-02-21 10:26:11 -08:00
Kenneth Graunke	e169cb09c3	iris: pin and re-pin the scratch BO	2019-02-21 10:26:11 -08:00
Kenneth Graunke	dd0d47a5d2	iris: delete finished comments	2019-02-21 10:26:11 -08:00
Kenneth Graunke	32ee2e4c27	iris: always pin the binder...in the compute context, too. not sure why this hasn't tripped things up	2019-02-21 10:26:11 -08:00
Kenneth Graunke	fbfe07c4f3	iris: Track blend enables, save outbound for resolve code	2019-02-21 10:26:11 -08:00
Kenneth Graunke	5481887ca8	iris: whitespace fixes	2019-02-21 10:26:11 -08:00
Kenneth Graunke	b2fa90706e	iris: Make an alloc_surface_state helper This does the gtt_offset addition for us	2019-02-21 10:26:11 -08:00
Kenneth Graunke	b358c4b92b	iris: Use a surface state fill helper This will check aux_usage eventually	2019-02-21 10:26:11 -08:00
Kenneth Graunke	b92ca4d0f6	iris: don't print the pointer in INTEL_DEBUG=submit lots of noise in diff, hope was it would be useful for gdb, but the GEM handle is good enough	2019-02-21 10:26:11 -08:00
Kenneth Graunke	ad969a00c0	iris: Fix the prototype for iris_bo_alloc_tiled This now matches the actual function in iris_bufmgr.c, as well as the equivalent brw_bufmgr.c function...	2019-02-21 10:26:11 -08:00
Kenneth Graunke	598a78849e	iris: Fix for PIPE_CAP_SIGNED_VERTEX_BUFFER_OFFSET This fixes ext_transform_feedback-builtin-varyings gl_Position after the combination of my transform feedback reworks and my vertex buffer reworks (?)	2019-02-21 10:26:11 -08:00
Kenneth Graunke	392fba5f31	iris: drop unnecessary genx->streamout field	2019-02-21 10:26:11 -08:00
Kenneth Graunke	5307ff6a5f	iris: Implement DrawTransformFeedback() We get the count by dividing the offset by the stride.	2019-02-21 10:26:11 -08:00
Jason Ekstrand	2e103fff63	iris: Copy anv's MI_MATH helpers for multiplication and division (import done by Ken but with author set to Jason because it's his code that's being imported, so he deserves the credit)	2019-02-21 10:26:11 -08:00
Kenneth Graunke	52baba80f3	iris: only get space for one offset in stream output targets Target corresponds to a buffer, buffer only records one offset, not multiple.	2019-02-21 10:26:10 -08:00
Kenneth Graunke	31357bae4b	iris: Move iris_stream_output_target def to iris_context.h now that it doesn't have genxml	2019-02-21 10:26:10 -08:00
Kenneth Graunke	cf4931e586	iris: Don't bother packing 3DSTATE_SO_BUFFER at create time We have to do half the packet late anyway, we may as well just do it all at set time. This also lets us move the struct def out of genxml	2019-02-21 10:26:10 -08:00
Kenneth Graunke	754d678b0a	iris: Add _MI_ALU helpers that don't paste This lets you pass arguments as function parameters	2019-02-21 10:26:10 -08:00
Kenneth Graunke	5094062bbe	iris: Reorder LRR parameters to have dst first. LRI and LRM both put dst first, be consistent.	2019-02-21 10:26:10 -08:00
Kenneth Graunke	2f5d85661f	iris: rewrite set_vertex_buffer and VB handling I was using the Gallium API wrong. set_* functions with start_slot and count parameters are supposed to update a subrange of the items. I had been trashing all bound vertex buffers and starting over. This should hopefully also make it easier to slot in additional VERTEX_BUFFER_STATEs at draw time, say, for shader draw parameters.	2019-02-21 10:26:10 -08:00
Kenneth Graunke	286b8b8f99	iris: handle PatchVerticesIn as a system value.	2019-02-21 10:26:10 -08:00
Tapani Pälli	96bb328e9b	iris: add Android build Note that at least following additional libs/components require changes since they refer to BOARD_GPU_DRIVERS variable which is used to select the driver: - mixins - minigbm - libdrm - drm_gralloc v2: (feedback by Gustaw Smolarczyk) Fix trailing \ in a few cases Signed-off-by: Tapani Pälli <tapani.palli@intel.com>	2019-02-21 10:26:10 -08:00
Kenneth Graunke	97e82e80f9	iris: override alpha to one src1 blend factors No idea why this used to pass and doesn't after updating...seems like we should have been handling it all along...	2019-02-21 10:26:10 -08:00
Kenneth Graunke	90b2745148	iris: Always do rasterizer discard in clipper but continue doing it in SOL if possible because it's faster Fixes ./bin/ext_transform_feedback-discard-drawarrays - simpler too	2019-02-21 10:26:10 -08:00
Kenneth Graunke	5f511798d0	iris: Fix primitive generated query active flag	2019-02-21 10:26:10 -08:00
Kenneth Graunke	99cab4d381	iris: Enable guardband clipping	2019-02-21 10:26:10 -08:00
Kenneth Graunke	f062dcdfbb	iris: Clamp viewport extents to the framebuffer dimensions Fixes arb_framebuffer_no_attachments-query's resize subtest.	2019-02-21 10:26:10 -08:00
Kenneth Graunke	fb2df1b5d5	iris: Fix clear dimensions Fixes depthstencil-render-miplevels 1024 s=z24_s8	2019-02-21 10:26:10 -08:00
Kenneth Graunke	2e79e46d23	iris: Drop continues in resolve Now that we u_bit_scan we know it exists	2019-02-21 10:26:10 -08:00
Kenneth Graunke	5fde1fa988	iris: Replace num_textures etc with a bitmask we can scan More accurate bounds, plus can skip dead ones	2019-02-21 10:26:10 -08:00
Kenneth Graunke	7ad7d0beea	iris: Fix set_sampler_views with start > 0	2019-02-21 10:26:10 -08:00
Kenneth Graunke	1c6fea8e7b	iris: fix set_sampler_views to not unbind, be better about bounds	2019-02-21 10:26:10 -08:00
Kenneth Graunke	598ce8e88e	iris: fix overhead regression from flushing for storage images st calls us with count = 32 but a NULL pointer...we only really care about the highest non-NULL image...	2019-02-21 10:26:10 -08:00
Kenneth Graunke	4749f6cc4f	iris: Fix NOS mechanism Set bits, not values	2019-02-21 10:26:10 -08:00
Kenneth Graunke	a24734a2d7	iris: re-pin inherited streamout buffers	2019-02-21 10:26:10 -08:00
Kenneth Graunke	19803d0aa7	iris: reemit SBE when sprite coord origin changes fixes arb_point_sprite-checkerboard	2019-02-21 10:26:10 -08:00
Kenneth Graunke	480c62bc7e	iris: omask can kill	2019-02-21 10:26:10 -08:00
Kenneth Graunke	bd031eb2e8	iris: reject all clipping when we can't use streamout render disabled	2019-02-21 10:26:10 -08:00
Kenneth Graunke	72cf2185c8	iris: make clipper statistics dynamic	2019-02-21 10:26:10 -08:00
Kenneth Graunke	1114f0c1ce	iris: CS stall for stream out -> VB i965 doesn't do this, but I suspect it just stalls a lot and doesn't hit this. Fixes ext_transform_feedback-position render among others.	2019-02-21 10:26:10 -08:00
Kenneth Graunke	c03fbb41aa	iris: fix dma buf import strides	2019-02-21 10:26:10 -08:00
Kenneth Graunke	90274bd48f	iris: fix alpha channel for RGB BC1 formats	2019-02-21 10:26:10 -08:00
Jason Ekstrand	47d4ea1a16	iris: Allocate buffer resources separately (cleaned up by Ken - make sure a bunch of things were more obviously not using res->surf, do allow checking res->surf.tiling == LINEAR, drop format cpp checks that aren't needed, drop memzone handling for images, assume buffers / non-buffers in a few places...)	2019-02-21 10:26:10 -08:00
Kenneth Graunke	585c95f8cc	iris: Don't bother considering if the underlying surface is a cube Dave fixed it to consider whether the sampler view is a cube. With that, there's no point (possibly harm) in looking if the original resource was a cube...if it's an array view, we don't want to treat it as a cube anymore...	2019-02-21 10:26:10 -08:00
Kenneth Graunke	773adeb9e9	iris: move some non-buffer case code in a bit	2019-02-21 10:26:10 -08:00
Kenneth Graunke	2c0f001295	iris: Stop leaking iris_uncompiled_shaders like mad Now shader-db actually executes. We still need a plan for culling dead iris_compiled_shaders...	2019-02-21 10:26:10 -08:00
Kenneth Graunke	68d531d7d7	iris: Destroy the bufmgr Plugs a 12360 byte leak	2019-02-21 10:26:10 -08:00
Kenneth Graunke	7c29c3d01e	iris: Fix IRIS_MEMZONE_COUNT to exclude the border color pool This is supposed to exclude single address zones. We were getting too many VMA allocators but failing to set them up, which worked out because we also forgot to destroy them...	2019-02-21 10:26:10 -08:00
Kenneth Graunke	6cb211121b	iris: Unref unbound_tex resource Plugs a 12536 byte leak	2019-02-21 10:26:10 -08:00
Kenneth Graunke	f73fdb4001	iris: Destroy the border color pool This plugs a 12224 byte leak	2019-02-21 10:26:10 -08:00
Kenneth Graunke	3d55e9a2aa	iris: Destroy transfer helper on screen teardown Plugs a 16 byte leak	2019-02-21 10:26:10 -08:00
Kenneth Graunke	bdc1269eb2	iris: Fix failed to compile TCS message	2019-02-21 10:26:10 -08:00
Kenneth Graunke	fbf3124771	iris: Rework tiling/modifiers handling We were being very picky about things being Y tiled. But, not everything can be - for example, > 16382 surfaces on SKL GT1-3 have to fall back to linear. Instead, give ISL options and let it pick.	2019-02-21 10:26:10 -08:00
Kenneth Graunke	761a5fb36a	iris: fix conditional compute, don't stomp predicate for pipelined queries	2019-02-21 10:26:10 -08:00
Kenneth Graunke	40b12c103c	iris: check query first this lets us avoid the predicate bit in more cases, which is nice	2019-02-21 10:26:10 -08:00
Kenneth Graunke	0c3ea03e4b	iris: for BLORP, only use the predicate enable bit when USE_BIT	2019-02-21 10:26:10 -08:00
Dave Airlie	7bbf3ff4a9	iris: add conditional render support	2019-02-21 10:26:10 -08:00
Kenneth Graunke	dbe198d6ba	iris: drop key_size_for_cache dead since my program cache API rework. we could still use it for one function, but it's so trivial to pass the size, that it's probably not worth the extra code	2019-02-21 10:26:10 -08:00
Dave Airlie	e4115eaca0	iris: iris add load register reg32/64 These will be needed for broadwell and conditional render	2019-02-21 10:26:10 -08:00
Dave Airlie	311a1b3198	iris: execute compute related query on compute batch. This only happens for the compute invocations query.	2019-02-21 10:26:10 -08:00
Dave Airlie	00645ea01c	iris: fix cube texture view	2019-02-21 10:26:10 -08:00
Kenneth Graunke	39d1056d10	iris: fix some SO overflow query bugs and tidy the code a bit	2019-02-21 10:26:10 -08:00
Dave Airlie	527e5bcdc7	iris: add initial transform feedback overflow query paths (V3) v2: fix cpu overflow calc v3: use a struct	2019-02-21 10:26:10 -08:00
Kenneth Graunke	0ded23a552	iris: actually flush for storage images	2019-02-21 10:26:10 -08:00
Kenneth Graunke	69e97670bc	iris: add an extra BT assert from Chris Wilson	2019-02-21 10:26:10 -08:00
Kenneth Graunke	4312784674	iris: add assertions about binding table starts	2019-02-21 10:26:10 -08:00
Kenneth Graunke	240615695d	iris: drop pull constant binding table entry nothing uses this	2019-02-21 10:26:10 -08:00
Kenneth Graunke	10d04cdaa4	iris: Use program's num textures not the state tracker's bound the state tracker might bind more textures than the program is using.	2019-02-21 10:26:10 -08:00
Kenneth Graunke	855ff47d36	iris: Enable precompiles	2019-02-21 10:26:10 -08:00
Kenneth Graunke	ed4ffb9715	iris: rework program cache interface This exposes iris_upload_shader() without having to bind it, which will be useful for precompiles. It also lets us examine the old programs and flag dirty bits at a higher level, rather than cramming all that knowledge into the cache layer.	2019-02-21 10:26:10 -08:00
Kenneth Graunke	701a6b6006	iris: Use wrappers for create_xs_state rather than a switch statement	2019-02-21 10:26:10 -08:00
Kenneth Graunke	e628095b9a	iris: fix comment location	2019-02-21 10:26:10 -08:00
Kenneth Graunke	e5df8913e1	iris: export iris_upload_shader	2019-02-21 10:26:10 -08:00
Kenneth Graunke	d525b3dfad	iris: fix prototype warning	2019-02-21 10:26:10 -08:00
Kenneth Graunke	84a8c63527	iris: Re-pin even if nothing is dirty	2019-02-21 10:26:10 -08:00
Kenneth Graunke	415ede346d	iris: Flush for history at various moments When we blit, transfer, or copy_resource to a buffer, we need to flush to ensure any stale data for that buffer is invalidated in the caches. bind_history will inform us which caches need to be flushed. Also, for any push constant buffers, we need to flag those dirty so that we re-emit 3DSTATE_CONSTANT_*, causing the data to be re-pushed.	2019-02-21 10:26:10 -08:00
Kenneth Graunke	c8579e708e	iris: add iris_flush_and_dirty_for_history	2019-02-21 10:26:10 -08:00
Kenneth Graunke	d169747a3e	iris: Track a binding history for buffer resources This will let us know what caches to flush / state to dirty when altering the contents of a buffer.	2019-02-21 10:26:10 -08:00
Kenneth Graunke	f49f506b13	iris: drop long dead XXX comment	2019-02-21 10:26:10 -08:00
Kenneth Graunke	5dbd6df9f7	iris: Do the 48-bit vertex buffer address invalidation workaround	2019-02-21 10:26:10 -08:00
Kenneth Graunke	1b1ea23766	iris: Fix VIEWPORT/LAYER in stream output info Fixes glsl-1.50-transform-feedback-builtins and ext_transform_feedback-builtin-varyings gl_PointSize	2019-02-21 10:26:10 -08:00
Kenneth Graunke	c5b22441f1	iris: Fix buffer -> buffer copy_region Size can be too large for a surf, blorp_buffer_copy chops things up into segments we can actually handle Fixes map_buffer_range_test and copy_buffer_coherency	2019-02-21 10:26:10 -08:00
Kenneth Graunke	beb2d5e065	iris: Lie about indirects fixes interpolateAt tests	2019-02-21 10:26:10 -08:00
Kenneth Graunke	b9ccb00e2c	iris: Enable ctx->Const.UseSTD430AsDefaultPacking hooray for obscurely named pipe caps with bizarre descriptions!	2019-02-21 10:26:10 -08:00
Kenneth Graunke	39cb10613c	iris: update comment	2019-02-21 10:26:10 -08:00
Kenneth Graunke	f9612e7682	iris: RT flush for memorybarrier with texture bit PIXEL_BUFFER_BARRIER_BIT turns into PIPE_BARRIER_TEXTURE and it ought to trigger an RT flush, according to brw_memory_barrier	2019-02-21 10:26:10 -08:00
Kenneth Graunke	2c23721397	iris: PIPE_CONTROL workarounds for GPGPU mode	2019-02-21 10:26:10 -08:00
Kenneth Graunke	f1a7392be1	iris: Put batches in an array We keep re-making this array all over the place	2019-02-21 10:26:10 -08:00
Kenneth Graunke	c2a77efa71	iris: put render batch first in fence code this shouldn't matter, but it will make the next refactor easier	2019-02-21 10:26:10 -08:00
Kenneth Graunke	d918c09975	iris: flush the compute batch too if border pool is redone	2019-02-21 10:26:10 -08:00
Kenneth Graunke	017b556609	iris: leave a TODO	2019-02-21 10:26:10 -08:00
Chris Wilson	f459c56be6	iris: Add fence support using drm_syncobj	2019-02-21 10:26:10 -08:00
Kenneth Graunke	db199d9d07	iris: Add wait fences to properly sync between render/compute When flushing a batch due to a data dependency, we need to not only kick off the other batch's work, but stall our execution until it completes. Just wait on last_syncpt after flushing it.	2019-02-21 10:26:10 -08:00
Kenneth Graunke	d69bc4ac12	iris: Hang on to the last batch's sync-point, so we can wait on it	2019-02-21 10:26:10 -08:00
Chris Wilson	fae74234d9	iris: Tag each submitted batch with a syncobj (adjusted by Ken to make the signalling sync object immediately on batch reset, rather than batch finish time. this will work better with deferred flushes...)	2019-02-21 10:26:10 -08:00
Kenneth Graunke	3e332af611	iris: Drop vestiges of throttling code	2019-02-21 10:26:10 -08:00
Chris Wilson	54347c078e	iris: Merge two walks of the exec_bos list	2019-02-21 10:26:10 -08:00
Kenneth Graunke	3455f57575	iris: replace vestiges of fence fds with newer exec_fence API patch by me and Chris Wilson	2019-02-21 10:26:10 -08:00
Kenneth Graunke	11da219be9	iris: Avoid synchronizing due to the workaround BO	2019-02-21 10:26:10 -08:00
Kenneth Graunke	30d7bebc8a	iris: Avoid cross-batch synchronization on read/reads This avoids flushing batches just because e.g. both are reading the same dynamic state streaming buffer, or shader assembly buffer.	2019-02-21 10:26:10 -08:00
Kenneth Graunke	b21e916a62	iris: Combine iris_use_pinned_bo and add_exec_bo	2019-02-21 10:26:10 -08:00
Kenneth Graunke	fb4c898842	iris: Use iris_use_pinned_bo rather than add_exec_bo directly less special this way	2019-02-21 10:26:10 -08:00
Chris Wilson	e5528151a7	iris: Fix assigning the output handle for exporting for KMS Fixes gbm_bo_get_handle() used for KMS in glamor.	2019-02-21 10:26:10 -08:00
Chris Wilson	01e729f883	iris: Tidy exporting the flink handle	2019-02-21 10:26:10 -08:00
Kenneth Graunke	1b69b14c2a	iris: Fix SLM Now that Jason has set up the L3 we can do this. Also, my assert was useless because we hadn't set up the field in the first place. Oops.	2019-02-21 10:26:10 -08:00
Jason Ekstrand	f9c5e277ac	iris: Don't set constant read lengths at upload time They're set in derived_data as part of store_cs_state	2019-02-21 10:26:10 -08:00
Jason Ekstrand	a90a0e22cb	iris: Configure the L3$ on the compute context	2019-02-21 10:26:10 -08:00
Kenneth Graunke	25a41b1aef	iris: properly pin stencil buffers	2019-02-21 10:26:10 -08:00
Kenneth Graunke	8545e39808	iris: Fix TCS/TES slot unification TCS outputs, TES inputs...not TCS inputs Fixes some barrier tests	2019-02-21 10:26:10 -08:00
Kenneth Graunke	da5590496e	iris: more todo notes	2019-02-21 10:26:10 -08:00
Kenneth Graunke	9878ea842f	iris: scissored and mirrored blits	2019-02-21 10:26:10 -08:00
Kenneth Graunke	25f194d5ac	iris: more TODO	2019-02-21 10:26:10 -08:00
Kenneth Graunke	5207a5f5d5	iris: Fix independent alpha blending. independent_blend_enable means per-RT blending, not RGB != A	2019-02-21 10:26:10 -08:00
Kenneth Graunke	c06f6d12a5	iris: "Fix" transfer maps of buffers x should be in bytes, not cpp units This generally worked out because PIPE_BUFFER is supposedly required to be R8_UINT or R8_UNORM. I hear some state trackers pass PIPE_FORMAT_NONE instead, however, which would make this break. Just do the right thing directly, to be defensive and clear.	2019-02-21 10:26:10 -08:00
Kenneth Graunke	b2c04aa3a0	iris: Fix SourceAlphaBlendFactor	2019-02-21 10:26:10 -08:00
Kenneth Graunke	89833eddab	iris: leave another TODO	2019-02-21 10:26:10 -08:00
Kenneth Graunke	983e2ae7d2	iris: only clip lower if there's something to clip against	2019-02-21 10:26:10 -08:00
Kenneth Graunke	e11c497fc6	iris: fix sysval only binding tables	2019-02-21 10:26:10 -08:00
Kenneth Graunke	2ddbc1025e	iris: don't forget to upload CS consts	2019-02-21 10:26:10 -08:00
Kenneth Graunke	f1f84a1ae7	iris: drop param stuffs	2019-02-21 10:26:10 -08:00
Kenneth Graunke	1b5d35319e	iris: don't trip on param asserts I'd rather not rewrite i965's compute system value handling right now :(	2019-02-21 10:26:10 -08:00
Kenneth Graunke	f4829a2fe1	iris: don't support pull constants. I don't think it matters, we won't have any params anyway, but let's be sure it doesn't try	2019-02-21 10:26:10 -08:00
Kenneth Graunke	911f9e8f3f	iris: regather info so we get CLIP_DIST slots, not CLIP_VERTEX	2019-02-21 10:26:09 -08:00
Kenneth Graunke	6d19fe376d	iris: enable push constants if we have sysvals but no uniforms	2019-02-21 10:26:09 -08:00
Kenneth Graunke	1ef68d77c0	iris: drop iris_setup_push_uniform_range it doesn't do anything, we have no params. I guess I thought there would be some, but they all get dead code eliminated even if we try to make them exist in the first place.	2019-02-21 10:26:09 -08:00
Kenneth Graunke	7eeb124c02	iris: fix more uniform setup	2019-02-21 10:26:09 -08:00
Kenneth Graunke	50743eb748	iris: fix num clip plane consts	2019-02-21 10:26:09 -08:00
Kenneth Graunke	a98634a28f	iris: actually upload clip planes.	2019-02-21 10:26:09 -08:00
Kenneth Graunke	c60ce3f4fd	iris: bypass params and do it ourselves the backend keeps dead code eliminating them all, so we can't do that, plus we don't want to because params[] is lame	2019-02-21 10:26:09 -08:00
Kenneth Graunke	78fc760bab	iris: dodge backend UCP lowering	2019-02-21 10:26:09 -08:00
Kenneth Graunke	deb6d588a6	iris: fix system value remapping	2019-02-21 10:26:09 -08:00
Kenneth Graunke	2b0a2915dc	iris: hook up key stuff for clip plane lowering	2019-02-21 10:26:09 -08:00
Kenneth Graunke	2876dd1a37	iris: lower user clip planes	2019-02-21 10:26:09 -08:00
Kenneth Graunke	80c856cbee	iris: only bother with params if there are any...	2019-02-21 10:26:09 -08:00
Kenneth Graunke	2186d83185	iris: fill out params array with built-ins, like clip planes	2019-02-21 10:26:09 -08:00
Kenneth Graunke	d3e8ff143d	iris: add param domain defines	2019-02-21 10:26:09 -08:00
Kenneth Graunke	ecb28b2802	iris: drop unnecessary param[] setup from iris_setup_uniforms the backend just considers these dead anyway	2019-02-21 10:26:09 -08:00
Kenneth Graunke	ed08f022f0	iris: Defer cbuf0 upload to draw time	2019-02-21 10:26:09 -08:00
Kenneth Graunke	e98cf9c24b	iris: Clone the NIR The backend compiler used to do this for us, but after a rebase, it's now the driver's responsibility. This lets us alter it for say, clip vertex lowering, at the global level rather than the per-variant level.	2019-02-21 10:26:09 -08:00
Kenneth Graunke	587e438128	iris: Print the batch name when decoding	2019-02-21 10:26:09 -08:00
Kenneth Graunke	2727a942a4	iris: partial set_query_active_state used to avoid OQ during clears for example fixes occlusion_query_meta_no_fragments	2019-02-21 10:26:09 -08:00
Kenneth Graunke	64af1d9248	iris: Fix multiple RTs with non-independent blending rt[i] isn't filled out in this case, so we have to use rt[0]	2019-02-21 10:26:09 -08:00
Kenneth Graunke	58507c02ce	iris: Fix TextureBarrier I don't know how I came up with the old one, this is now what i965 does Also we now do compute batches too	2019-02-21 10:26:09 -08:00
Kenneth Graunke	e5d84bbd36	iris: Fix MSAA smooth points Fixes bin/ext_framebuffer_multisample-point-smooth 2 -auto -fbo	2019-02-21 10:26:09 -08:00
Kenneth Graunke	4d219b0eb3	iris: implement scratch space! we borrow the approach from anv rather than i965, as it works better with pre-baked state that needs to contain scratch BO addresses fixes a bunch of varying packing tests	2019-02-21 10:26:09 -08:00
Kenneth Graunke	9511b89ef9	iris: tidy more warnings	2019-02-21 10:26:09 -08:00
Kenneth Graunke	846316b258	iris: Enable msaa_map transfer helpers This does the downsampling for us. It'll use BLORP anyway because it uses blit(), and that uses BLORP.	2019-02-21 10:26:09 -08:00
Kenneth Graunke	9ec927497e	iris: Actually create/destroy HW contexts The intention is that render and compute use their own contexts, and each is PIPELINE_SELECT'd to the right pipeline. But we hadn't actually made them, so we got the fd-default context. Thanks to Chris Wilson for catching this!	2019-02-21 10:26:09 -08:00
Kenneth Graunke	cb5f47f585	iris: Don't leak the compute batch	2019-02-21 10:26:09 -08:00
Kenneth Graunke	fbe5d75f11	iris: cross batch flushing	2019-02-21 10:26:09 -08:00
Kenneth Graunke	c3cc525c7a	iris: Cross-link iris_batches so they can potentially flush each other This makes e.g. the render batch aware of the compute batch, so it can ask questions like "is this BO referenced by some other batch?" and do something about that.	2019-02-21 10:26:09 -08:00
Dave Airlie	ed016b2a0b	iris: fix crash in sparse vertex array this fixes crash in array-stride piglit.	2019-02-21 10:26:09 -08:00
Kenneth Graunke	bcac11c8f1	iris: Use at least 1x1 size for null FB surface state. Otherwise we get 0 - 1 = 0xffffffff and fail to pack SURFACE_STATE. Fixes some object namespace pollution gltexsubimage2d tests	2019-02-21 10:26:09 -08:00
Kenneth Graunke	9c8fdf8133	iris: Drop B5G5R5X1 support This is oddly renderable but not supported for sampling, which is the opposite of other X formats. Just skip it and fall back to BGRA.	2019-02-21 10:26:09 -08:00
Kenneth Graunke	4b31f506f8	iris: Enable A8/A16_UNORM in an inefficient manner These currently just use the 'A' hardware formats, rather than the faster 'R' formats. glBitmap handling needs these, it seems. :(	2019-02-21 10:26:09 -08:00
Kenneth Graunke	80497af192	iris: Enable ARB_shader_stencil_export	2019-02-21 10:26:09 -08:00
Kenneth Graunke	3e6aaa1ba5	iris: Disable a PIPE_CONTROL workaround on Icelake	2019-02-21 10:26:09 -08:00
Kenneth Graunke	84a419432d	iris: Flag constants dirty on program changes 3DSTATE_CONSTANT_* looks at prog_data->ubo_ranges. We were getting saved by iris_set_constant_buffers() usually happening when changing programs (as they usually change uniforms too), but with the clear shader that doesn't use uniforms, we weren't getting one and were leaving push constants enabled, screwing things up. Also clean up a bit of a mess left by the hacks - we were missing bindings in the VS/FS/CS case, among other issues...	2019-02-21 10:26:09 -08:00
Kenneth Graunke	317ba8796f	iris: allow binding a null vertex buffer PBO upload apparently does this...	2019-02-21 10:26:09 -08:00
Kenneth Graunke	aef1ba5ce4	iris: fix overhead regression from "don't stomp each other's dirty bits" The change from dirty = 0ull to dirty &= ~NOT_MY_BITS broke the "nothing to do? skip it!" optimization. thanks to Chris for noticing this!	2019-02-21 10:26:09 -08:00
Kenneth Graunke	525d89cafc	iris: delete dead code	2019-02-21 10:26:09 -08:00
Kenneth Graunke	8a98e90415	iris: Fix refcounting of grid surface	2019-02-21 10:26:09 -08:00
Jason Ekstrand	8e8868d5ad	iris/compute: Zero out the last grid size on indirect dispatches	2019-02-21 10:26:09 -08:00
Jason Ekstrand	c16e711ff2	iris/compute: Don't increment the grid size offset It may be in the dynamic state buffer but the fact that we have a resource takes care of that. We don't need to add in the address of the dynamic state buffer again.	2019-02-21 10:26:09 -08:00
Kenneth Graunke	a3e813c5af	iris: SO_DECL_LIST fix	2019-02-21 10:26:09 -08:00
Kenneth Graunke	927c4a21bd	iris: Fall back to 1x1x1 null surface if no framebuffer supplied If the state tracker never gave us the framebuffer dimensions via a set_framebuffer_state() call, just fall back to the unbound texture null surface, which is 1x1x1. Otherwise we'd use a NULL resource (no pun intended).	2019-02-21 10:26:09 -08:00
Kenneth Graunke	5d1a9db720	iris: Fix off by one in scissoring, empty scissors, default scissors	2019-02-21 10:26:09 -08:00
Kenneth Graunke	938d63b2e8	iris: Move snapshots_landed to the front. Transform feedback overflow queries need to write additional data, and it would be nice to have this field remain at a consistent offset.	2019-02-21 10:26:09 -08:00
Kenneth Graunke	ba2a4207f9	iris: Clamp UBO and SSBO access to the actual BO size, for safety	2019-02-21 10:26:09 -08:00
Kenneth Graunke	a9b32f2bbf	iris: Fix texture buffer / image buffer sizes. Also fix image buffers with offsets.	2019-02-21 10:26:09 -08:00
Kenneth Graunke	d1f8947792	iris: fix SF_CLIP_VIEWPORT array indexing with multiple VPs fixes bunches of viewport stuffs	2019-02-21 10:26:09 -08:00
Kenneth Graunke	5bd49a47b6	iris: flag CC_VIEWPORT when changing num viewports this also has a loop over num_viewports	2019-02-21 10:26:09 -08:00
Kenneth Graunke	d98967d936	iris: fix UBOs with bindings that have an offset	2019-02-21 10:26:09 -08:00
Kenneth Graunke	3f70956a4e	iris: try and avoid pointless compute submissions if apps don't use compute shaders, we don't even want to kick off the compute initialization batch	2019-02-21 10:26:09 -08:00
Kenneth Graunke	97125e9bb3	iris: fix SBA flushing by refactoring code	2019-02-21 10:26:09 -08:00
Kenneth Graunke	8fa99481e7	iris: do PIPELINE_SELECT for render engine, add flushes, GLK hacks	2019-02-21 10:26:09 -08:00
Kenneth Graunke	b2d223b6bf	iris: hack to avoid memorybarriers out the wazoo we don't want to emit piles of pipe controls to a compute batch if it isn't necessary... prevents double-batch-wraps in cs-op-selection-bool-bvec4-bvec4 (but it's still kinda a big ol' hack...)	2019-02-21 10:26:09 -08:00
Kenneth Graunke	b3a40c27a2	iris: don't let render/compute contexts stomp each other's dirty bits only clear what you process	2019-02-21 10:26:09 -08:00
Kenneth Graunke	f8796079da	iris: better dirty checking	2019-02-21 10:26:09 -08:00
Kenneth Graunke	06a993dac2	iris: rewrite grid surface handling now we only upload a new grid when it's actually changed, which saves us from having to emit a new binding table every time it changes. this also moves a bunch of non-gen-specific stuff out of iris_state.c	2019-02-21 10:26:09 -08:00
Kenneth Graunke	155e1a63d5	iris: XXX for compute state tracking :/ Maybe we should just move dirty to batch, it would help with the reset stuff too	2019-02-21 10:26:09 -08:00
Kenneth Graunke	643030f4fb	iris: fix whitespace	2019-02-21 10:26:09 -08:00
Kenneth Graunke	b0dc11993e	iris: bail if SLM is needed	2019-02-21 10:26:09 -08:00
Kenneth Graunke	973b937cac	iris: leave XXX about unnecessary binding table uploads	2019-02-21 10:26:09 -08:00
Kenneth Graunke	7fb8c20d7b	iris: drop unnecessary #ifdefs	2019-02-21 10:26:09 -08:00
Kenneth Graunke	549db5b90e	iris: drop XXX that Jordan handled	2019-02-21 10:26:09 -08:00
Jordan Justen	942bdb2906	iris/compute: Support indirect compute dispatch Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>	2019-02-21 10:26:09 -08:00
Jordan Justen	b35c8f2182	iris/compute: Push subgroup-id Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>	2019-02-21 10:26:09 -08:00
Jordan Justen	229450a2a6	iris/compute: Flush compute batch on memory-barriers Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>	2019-02-21 10:26:09 -08:00
Jordan Justen	fb4637797e	iris/compute: Provide binding table entry for gl_NumWorkGroups Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>	2019-02-21 10:26:09 -08:00
Jordan Justen	fcd0364857	iris/compute: Wait on compute batch when mapping Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>	2019-02-21 10:26:09 -08:00
Jordan Justen	ea416d0b5d	iris/program: Don't try to push ubo ranges for compute We only can push constants for compute shaders from one range. Gallium glsl-to-nir (src/mesa/state_tracker/st_glsl_to_nir.cpp) lowers all uniform accesses to a ubo. Unfortunately we also load the subgroup-id as a uniform in the compiler. Since we use the 1 push range for this subgroup-id, we then lose the ability to actually push the ubo with all the normal user uniform values. In other words, there is lots of room for performance improvement, but at least retrieving the uniforms as pull-constants is functional for now. Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>	2019-02-21 10:26:09 -08:00
Jordan Justen	c7cfa4000f	iris/compute: Get group counts from grid->grid Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>	2019-02-21 10:26:09 -08:00
Jordan Justen	fd9ccd8b5d	iris/compute: Flush compute batches Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>	2019-02-21 10:26:09 -08:00
Jordan Justen	9b5cda95aa	iris/compute: Add MEDIA_STATE_FLUSH following WALKER Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>	2019-02-21 10:26:09 -08:00
Jordan Justen	6ebd04ac8f	iris: Add iris_restore_compute_saved_bos Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>	2019-02-21 10:26:09 -08:00
Jordan Justen	622aaa290f	iris: Add IRIS_DIRTY_CONSTANTS_CS Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>	2019-02-21 10:26:09 -08:00
Jordan Justen	25f1625edf	iris/compute: Set mask bits on PIPELINE_SELECT Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>	2019-02-21 10:26:09 -08:00
Kenneth Graunke	9fc672428d	iris: little bits of compute basics	2019-02-21 10:26:09 -08:00
Kenneth Graunke	860ce6af3f	iris: drop XXX's about swizzling pretty sure this is unnecessary on modern HW	2019-02-21 10:26:09 -08:00
Kenneth Graunke	12de56f53d	iris: drop dead format //'s these just aren't supported	2019-02-21 10:26:09 -08:00
Kenneth Graunke	f6c68066a6	iris: yes	2019-02-21 10:26:09 -08:00
Kenneth Graunke	752abeb690	iris: initial compute caps RET macro borrowed from freedreno	2019-02-21 10:26:09 -08:00
Kenneth Graunke	4da28c2c22	iris: Enable fb fetch needed for ES 3.2	2019-02-21 10:26:09 -08:00
Kenneth Graunke	be905bd461	iris: advertise GL_ARB_shader_texture_image_samples	2019-02-21 10:26:09 -08:00
Jordan Justen	6441e906e8	iris: Set num_uniforms in bytes Ref: brw_nir_lower_uniforms, type_size_scalar_bytes Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>	2019-02-21 10:26:09 -08:00
Kenneth Graunke	c29fd34259	iris: move images next to textures in binding table	2019-02-21 10:26:09 -08:00
Kenneth Graunke	0d9c5b4e7e	iris: null for non-existent cbufs prevents BTs from being shifted down incorrectly	2019-02-21 10:26:09 -08:00
Kenneth Graunke	98e8f80e7d	iris: actually set image access	2019-02-21 10:26:09 -08:00
Jason Ekstrand	d9aee25a46	iris: Don't lower image formats for write-only images	2019-02-21 10:26:09 -08:00
Kenneth Graunke	a06f0fe517	iris: set image access correctly	2019-02-21 10:26:09 -08:00
Kenneth Graunke	5d1dadfc38	iris: bother with BTIs	2019-02-21 10:26:09 -08:00
Kenneth Graunke	f5b887da6c	iris: implement set_shader_images hook	2019-02-21 10:26:09 -08:00
Kenneth Graunke	26a54ae4b2	iris: lower storage image derefs	2019-02-21 10:26:09 -08:00
Kenneth Graunke	e97a24da89	iris: set the binding table size we weren't doing mark_surface_used on images (i965 does it while uploading the unnecessary image uniforms), so our binding tables were too small...	2019-02-21 10:26:09 -08:00
Kenneth Graunke	28b41992c8	iris: X32_S8X24 :/ This can happen when faking Z32_S8X24 and setting StencilSampling = true I guess we'll just turn it into S8_UINT... Fixes KHR-GL45.texture_swizzle.functional	2019-02-21 10:26:09 -08:00
Kenneth Graunke	6e7957a22d	iris: enable I/L formats	2019-02-21 10:26:09 -08:00
Kenneth Graunke	bfbebbaa36	iris: Use R/RG instead of I/L/A when sampling	2019-02-21 10:26:09 -08:00
Kenneth Graunke	94569a6458	iris: rework format translation apis	2019-02-21 10:26:09 -08:00
Kenneth Graunke	b9eeed3e8f	iris: Allow PIPE_CONTROL with Stall at Scoreboard and RT flush It's nonsensical, but not illegal, and mandatory on Icelake	2019-02-21 10:26:09 -08:00
Kenneth Graunke	65d1cda995	iris: add gen11 to genX_call	2019-02-21 10:26:09 -08:00
Kenneth Graunke	0fdcb20803	iris: inline stage_from_pipe to avoid unused warnings	2019-02-21 10:26:09 -08:00
Kenneth Graunke	6fbb6ba290	iris: pipe to scs -> iris_pipe.h	2019-02-21 10:26:09 -08:00
Kenneth Graunke	87351b8dfe	iris: force persample interp cap	2019-02-21 10:26:09 -08:00
Kenneth Graunke	90b9efc1f9	iris: stencil texturing	2019-02-21 10:26:09 -08:00
Kenneth Graunke	9b229d266d	iris: fix Z32_S8 depth sampling We were accidentally using the ISL_FORMAT_R32_FLOAT_X8X24_TYPELESS format, which is NOT what we use. We just store R32_FLOAT depth. fixes Piglit's texwrap GL_ARB_depth_buffer_float	2019-02-21 10:26:09 -08:00
Kenneth Graunke	822f91508e	iris: don't mark contains_draw = false when chaining batches chaining to a new batch reuses create_batch(), but we don't need to do the work of pinning BOs we inherit from a previous batch...when that is actually part of the same execbuf invocation. instead, just flag it when setting primary_batch_size = 0, in iris_batch_reset	2019-02-21 10:26:09 -08:00
Kenneth Graunke	294ce58a30	iris: vma_free bo->size, not bo_size this is more obviously correct. I think the two end up being the same in practice, since this is in the alloc_from_cache case, and presumably bo from the bucket has bo->size == bucket->size, and bo_size also is bucket->size... still. better to do the obvious thing. brw_bufmgr already does it this way.	2019-02-21 10:26:09 -08:00
Kenneth Graunke	2f24000662	iris: drop a bunch of pipe_sampler_state stuff we don't need	2019-02-21 10:26:09 -08:00
Kenneth Graunke	c6016d3761	iris: just mark snapshots_landed from the CPU otherwise, get results may check q->map->snapshots_landed...before our commands to initialize it to false have actually executed...so it'd get some random garbage from the BO...	2019-02-21 10:26:09 -08:00
Kenneth Graunke	3c0ef22edb	iris: Enable ARB_shader_vote The easiest get out the vote campaign ever	2019-02-21 10:26:08 -08:00
Kenneth Graunke	0395eba20f	iris: magic number 36 -> #define	2019-02-21 10:26:08 -08:00
Kenneth Graunke	57f8a623c5	iris: better query file comment	2019-02-21 10:26:08 -08:00
Kenneth Graunke	d3a5d87219	iris: early return properly	2019-02-21 10:26:08 -08:00
Kenneth Graunke	07ff8c752f	iris: 36-bit overflow fixes	2019-02-21 10:26:08 -08:00
Kenneth Graunke	dff174c103	iris: Need to | 1 when asking for timestamps	2019-02-21 10:26:08 -08:00
Kenneth Graunke	1d91eba7dc	iris: glGet timestamps, more correct timestamps	2019-02-21 10:26:08 -08:00
Kenneth Graunke	36fbcfb06c	iris: ...and SO prims emitted queries looks like we have queries some fails still due to races between snapshots_written and start/end not being garbage...not sure what that's about	2019-02-21 10:26:08 -08:00
Kenneth Graunke	ec82be57e8	iris: timestamps	2019-02-21 10:26:08 -08:00
Kenneth Graunke	23572cdd07	iris: drop explicit pinning writes will already rw_bo or ro_bo that	2019-02-21 10:26:08 -08:00
Kenneth Graunke	d8875fe406	iris: primitives generated query support	2019-02-21 10:26:08 -08:00
Kenneth Graunke	ffae6e3105	iris: pipeline stats	2019-02-21 10:26:08 -08:00
Kenneth Graunke	7840d0e091	iris: play chicken with timer queries for now they have been crashy in the past and I don't want to risk tanking my laptop right before my XDC talk	2019-02-21 10:26:08 -08:00
Kenneth Graunke	0b095c665d	iris: gpr0 to bool I think OQ is basically working now.	2019-02-21 10:26:08 -08:00
Kenneth Graunke	f5a8908bd1	iris: fix random failures via CS stall...but why?	2019-02-21 10:26:08 -08:00
Kenneth Graunke	ad14795805	iris: flush batch when asking for result via QBO	2019-02-21 10:26:08 -08:00
Kenneth Graunke	cf261caad9	iris: results write	2019-02-21 10:26:08 -08:00
Kenneth Graunke	d4e4517569	iris: gen10+ workarounds and break fix	2019-02-21 10:26:08 -08:00
Kenneth Graunke	dca5632de1	iris: initial query code	2019-02-21 10:26:08 -08:00
Kenneth Graunke	dd478913d5	iris: LRM/SRM/SDI hooks	2019-02-21 10:26:08 -08:00
Kenneth Graunke	af9fe0d472	iris: rw_bo for pipe controls this is used for WRITE IMMEDIATE... but maybe we don't want to for the workaround BO?	2019-02-21 10:26:08 -08:00
Kenneth Graunke	30c370ed4b	iris: use 0 for TCS passthrough program string ID the passthrough shader doesn't need a real program string ID - that's basically used for ARB programs indicating total program source code changes, or other pre-baked uniform changes, etc...none of which a passthrough shader has...so we don't need a unique identifier to distinguish them. We want to use a consistent value so we find existing passthrough shaders in the cache.	2019-02-21 10:26:08 -08:00
Caio Marcelo de Oliveira Filho	54e23442e2	iris: Add support for TCS passthrough If no TCS is provided, create a "passthrough" TCS that will take the default values set in the API as constants and pass to the TES, along with any other inputs it expects. The code to create the NIR shader is the same as in i965. Tested with ./piglit run -t 'tess' quick_shader r and fixed a dozen crashes from that list.	2019-02-21 10:26:08 -08:00
Kenneth Graunke	5395658c61	iris: inherit the index buffer properly	2019-02-21 10:26:08 -08:00
Kenneth Graunke	a858b69880	iris: delete bogus comment Caio asked what was wrong. There is nothing wrong. :)	2019-02-21 10:26:08 -08:00
Kenneth Graunke	f2f506fa43	iris: properly re-pin stencil buffers	2019-02-21 10:26:08 -08:00
Kenneth Graunke	aaced066e8	iris: fix context restore of 3DSTATE_CONSTANT ranges if clean we want to DO the pinning...not SKIP the pinning. thanks to Jordan Justen for catching this!	2019-02-21 10:26:08 -08:00
Kenneth Graunke	58a6c99ebe	iris: silence const warning not sure why this is labeled const, I'm pretty sure we are taking the reference and owning this, so there's no particular reason we can't change it. it certainly seems to be working for non-compute. and, freedreno's ir3_shader.c seems to do this as well. still...gross :/	2019-02-21 10:26:08 -08:00
Kenneth Graunke	897f8d9232	iris: refactor program CSO stuff	2019-02-21 10:26:08 -08:00
Caio Marcelo de Oliveira Filho	fb4a3e2736	iris: Fix uses of gl_TessLevel* The backend compiler expects the gl_TessLevel* variables to be mapped as inputs instead of system values. Use the new PIPE_CAP to get this behavior from GLSL compiler. Tested with: tests/spec/arb_tessellation_shader/execution/vs-tcs-tes-tessinner-tessouter-inputs-quads.shader_test	2019-02-21 10:26:08 -08:00
Kenneth Graunke	2b956a093a	iris: totally untested icelake support	2019-02-21 10:26:08 -08:00
Kenneth Graunke	921790b080	iris: initialize "don't suck" bits, as Ben likes to call them	2019-02-21 10:26:08 -08:00
Kenneth Graunke	73a4cef220	iris: refactor LRIs in context setup we're going to have more of them, so reduce the boilerplate	2019-02-21 10:26:08 -08:00
Kenneth Graunke	2d1db44e8e	iris: enable ARB_enhanced_layouts	2019-02-21 10:26:08 -08:00
Kenneth Graunke	c0422d623c	iris: re-pin binding table contents if we didn't re-emit them fixes glsl-vs-loop and other regressions from multibinder.	2019-02-21 10:26:08 -08:00
Kenneth Graunke	2963276a58	iris: move binder pinning outside the dirty == 0 check This might be a new batch with back to back non-dirty calls, if so we need to inherit the old binder...	2019-02-21 10:26:08 -08:00
Chris Wilson	1a61a211f0	iris: fix memzone_for_address since multibinder changes	2019-02-21 10:26:08 -08:00
Kenneth Graunke	f6924e2379	iris: update comments for multibinder	2019-02-21 10:26:08 -08:00
Kenneth Graunke	5cb0527c4f	iris: fix SO offset writes for multiple streams	2019-02-21 10:26:08 -08:00
Kenneth Graunke	eff081cdd9	iris: Support multiple binder BOs, update Surface State Base Address	2019-02-21 10:26:08 -08:00
Kenneth Graunke	148e315d96	iris: fix null FB and unbound tex surface state addresses	2019-02-21 10:26:08 -08:00
Kenneth Graunke	f838400a59	iris: set EXEC_OBJECT_CAPTURE on all driver internal buffers	2019-02-21 10:26:08 -08:00
Kenneth Graunke	938afd484a	iris: fix constant buffer 0 to be absolute thanks to Jason for catching this. Fixes some va64 tests. Surprisingly not much else, as apparently getting to UBO range 4 is uncommon!	2019-02-21 10:26:08 -08:00
Kenneth Graunke	5a2257bb2f	iris: don't unconditionally emit 3DSTATE_VF / 3DSTATE_VF_TOPOLOGY this was just laziness on my part	2019-02-21 10:26:08 -08:00
Kenneth Graunke	4c27cb031c	iris: skip over whole function if dirty == 0 kinda pointless in non-pathological cases, but does boost our score in the drawarrays case by 50%...	2019-02-21 10:26:08 -08:00
Kenneth Graunke	888efcd192	iris: Allow inlining of require/get_command_space eliminates so many callqs for ptr++	2019-02-21 10:26:08 -08:00
Kenneth Graunke	2ebce6f8c8	iris: use Eric's new caps helper this does change a couple caps...PRIMITIVE_RESTART_FOR_PATCHES...	2019-02-21 10:26:08 -08:00
Kenneth Graunke	3e7a41f228	iris: new caps	2019-02-21 10:26:08 -08:00
Kenneth Graunke	52eb8d5593	iris: fix blend state memcpy thanks to Jason for noticing grumpy valgrind	2019-02-21 10:26:08 -08:00
Kenneth Graunke	9ce92fa036	iris: Skip primitive ID overrides if the shader wrote a custom value Fixes glsl-1.50/execution/geometry/primitive-id-out	2019-02-21 10:26:08 -08:00
Kenneth Graunke	47d3019c4a	iris: fix crash when binding optional shader for the first time	2019-02-21 10:26:08 -08:00
Kenneth Graunke	6331b754df	iris: handle level/layer in direct maps needed now that we do 1D linear	2019-02-21 10:26:08 -08:00
Kenneth Graunke	9f7654139b	iris: use linear for 1D textures This gets us the gen9 compact linear storage	2019-02-21 10:26:08 -08:00
Kenneth Graunke	b2a5e1ebb3	iris: big old hack for tex-miplevel-selection copied from ilo. I don't understand this at all..	2019-02-21 10:26:08 -08:00
Kenneth Graunke	e4d22b16c8	iris: fix sampler state setting	2019-02-21 10:26:08 -08:00
Kenneth Graunke	b3bb33c4c1	iris: try to hack around binder issue	2019-02-21 10:26:08 -08:00
Kenneth Graunke	d2516358f9	iris: fix line-aa-width we should probably move the roundf to st_atom_raster	2019-02-21 10:26:08 -08:00
Kenneth Graunke	701b47a197	iris: implement get_sample_position Fixes arb_sample_shading/builtin-gl-sample-position	2019-02-21 10:26:08 -08:00
Kenneth Graunke	7ed4b80233	iris: z_res -> s_res fixes crashes introduced a few commits ago	2019-02-21 10:26:08 -08:00
Kenneth Graunke	d1cb4b330a	iris: reenable R32G32B32 texture buffers This dropped us from GL 4.2 to GL 3.3 by mistake. Thanks to Dave for catching this!	2019-02-21 10:26:08 -08:00
Chris Wilson	367f6bbd01	iris: Record reusability of bo on construction We know that if the bufmgr->reuse is set to false or if the bo is too large for a bucket, the same will be true when we come to free the bo.	2019-02-21 10:26:08 -08:00
Kenneth Graunke	abe7dbfa4a	iris: Reduce binder alignment from 64 to 32 3DSTATE_BINDING_TABLE_POINTER_XS's alignment requirement is only 32B. Makes us waste less precious binder space.	2019-02-21 10:26:08 -08:00
Kenneth Graunke	04e8c5bb43	iris: precompute hashes for cache tracking saves a touch of cpu overhead in the new resolve tracking	2019-02-21 10:26:08 -08:00
Chris Wilson	d209cc5170	iris: AMD_pinned_memory (rebased by Ken, mainly set res->internal_format)	2019-02-21 10:26:08 -08:00
Kenneth Graunke	93c1921ce2	iris: proper cache tracking this is copied from the i965 aux resolve stuff...minus the aux resolves	2019-02-21 10:26:08 -08:00
Kenneth Graunke	5e30b1083b	iris: Move cache tracking to iris_resolve.c	2019-02-21 10:26:08 -08:00
Kenneth Graunke	42dccb1233	iris: use consistent copyright formatting some of them had typos, didn't say 'authors or copyright holders', or other mistakes. This is now https://opensource.org/licenses/MIT text, formatted consistently.	2019-02-21 10:26:08 -08:00
Kenneth Graunke	1d33982e9b	iris: track depth/stencil writes enabled	2019-02-21 10:26:08 -08:00
Kenneth Graunke	3fecb1c44d	iris: Move iris_sampler_view declaration to iris_resource.h We'll need this for resolve tracking. There's also no genxml stuff here	2019-02-21 10:26:08 -08:00
Kenneth Graunke	b75b52530a	iris: Move things to iris_shader_state We didn't originally have this struct, so we had lots of ad-hoc arrays. Now that we have it, it makes sense to group things there.	2019-02-21 10:26:08 -08:00
Kenneth Graunke	410a555bfb	iris: move iris_shader_state from ice->shaders.state to ice->state.shaders it's more state related...	2019-02-21 10:26:08 -08:00
Kenneth Graunke	33701d5341	iris: Drop bogus sampler state saving We do this in an earlier loop. This was just reading things out of the array, and saving them back over the same array...but in the wrong slots	2019-02-21 10:26:08 -08:00
Kenneth Graunke	aba2cee711	iris: rename pipe to base	2019-02-21 10:26:08 -08:00
Kenneth Graunke	7705f62cb6	iris: don't emit SBE all the time	2019-02-21 10:26:08 -08:00
Kenneth Graunke	630d602900	iris: port non-bucket alignment bugfix Sergii's `24839663a4`	2019-02-21 10:26:08 -08:00
Kenneth Graunke	ad6ba5a712	iris: drop pwrite nobody uses it	2019-02-21 10:26:08 -08:00
Kenneth Graunke	aad70ad8a1	iris: drop dead assignments Eric's commit `9a6a631762`	2019-02-21 10:26:08 -08:00
Kenneth Graunke	2bd7d6fa71	iris: last VUE map NOS, handle > 16 FS inputs not sure if the UNCOMPILED_FS flagging is still needed, should reevaluate those hacks at some point	2019-02-21 10:26:08 -08:00
Kenneth Graunke	ee8cb7e0ee	iris: implement ARB_clear_texture	2019-02-21 10:26:08 -08:00
Kenneth Graunke	84b30a2900	iris: call maybe_flush for each blorp operation otherwise with high layer counts we may exceed two batches worth of commands... (!)	2019-02-21 10:26:08 -08:00
Kenneth Graunke	0e059e4829	iris: assert depth is 1 in resource_copy_region given the dstz parameter I don't think it does multiple slices..	2019-02-21 10:26:08 -08:00
Kenneth Graunke	03933a2d1b	iris: blorp blit multiple slices fixes getteximage-depth	2019-02-21 10:26:08 -08:00
Kenneth Graunke	84832ab7d4	iris: Fix tiled memcpy for cubes...and for array slices tiled_memcpy_map was not offsetting map->ptr based on the slice, while unmap was. also, we were doing offsetting wrong for cubes.	2019-02-21 10:26:08 -08:00
Kenneth Graunke	bce7398646	iris: disallow RGB32 formats too	2019-02-21 10:26:08 -08:00
Kenneth Graunke	ea19d359cc	iris: Convert RGBX to RGBA for rendering. Fixes a bunch of RGB bugs.	2019-02-21 10:26:08 -08:00
Kenneth Graunke	906becec70	iris: we can do multisample Z resolves	2019-02-21 10:26:08 -08:00
Kenneth Graunke	1f156f004b	iris: deal with Marek's new MSAA caps storage sample count is equal to sample count for us, for now, so 0 the pipe cap and ignore the new parameter	2019-02-21 10:26:08 -08:00
Kenneth Graunke	532cf23d25	iris: say no to more formats copied from brw_surface_formats.c	2019-02-21 10:26:08 -08:00
Kenneth Graunke	d5146ba670	iris: actually do stencil blits	2019-02-21 10:26:08 -08:00
Kenneth Graunke	ad76389f88	iris: refcounting, who needs it? that's right, we do!	2019-02-21 10:26:08 -08:00
Kenneth Graunke	be60e3247c	iris: drop stencil handling now that u_transfer_helper does it	2019-02-21 10:26:08 -08:00
Kenneth Graunke	b932938d01	iris: use u_transfer_helper for depth stencil packing/unpacking	2019-02-21 10:26:08 -08:00
Kenneth Graunke	853230b5e6	iris: WTF transfers stencil unfortunately is stored in the Weird Tile Format (WTF or Tile-W) which needs special CPU detiling code.	2019-02-21 10:26:08 -08:00
Kenneth Graunke	d93a20e258	iris: allow S8 as a stencil format	2019-02-21 10:26:08 -08:00
Kenneth Graunke	7972599eab	iris: actually emit stencil packets	2019-02-21 10:26:08 -08:00
Kenneth Graunke	753646dd6b	iris: clear stencil	2019-02-21 10:26:08 -08:00
Kenneth Graunke	9ec2d3640e	iris: depth or stencil fixes	2019-02-21 10:26:08 -08:00
Kenneth Graunke	763f9095ea	iris: fill out more caps	2019-02-21 10:26:08 -08:00
Kenneth Graunke	2d578e71d5	iris: get angry about execbuf failures want this to be easy to detect for now	2019-02-21 10:26:08 -08:00
Kenneth Graunke	a378ee3607	iris: simplify batch len qword alignment Split from a patch by Chris Wilson so I can test it independently	2019-02-21 10:26:08 -08:00
Kenneth Graunke	621cb43f41	iris: rename ring to engine makes more sense these days. split from a patch by Chris Wilson	2019-02-21 10:26:08 -08:00
Kenneth Graunke	1a9651f29a	iris: remember to set bo->userptr	2019-02-21 10:26:08 -08:00
Chris Wilson	796ad6fe97	iris: Wrap userptr for creating bo	2019-02-21 10:26:08 -08:00
Kenneth Graunke	5911fb8801	iris: sync bugfixes from brw_bufmgr I wrote softpin support here first, then debugged and landed it in brw; some of those fixes need to get brought back.	2019-02-21 10:26:08 -08:00
Kenneth Graunke	dfe1ee4f6f	iris: comment everything 1. Write the code 2. Add comments 3. PROFIT (or just avoid cost of explaining or relearning things...)	2019-02-21 10:26:08 -08:00
Kenneth Graunke	387a414f2c	iris: add minor comments	2019-02-21 10:26:08 -08:00
Dave Airlie	9d39e69219	iris: fix some hangs around null framebuffers This fixes some cases in fbo-none* and framebuffer_no_attachments. I'm not sure this is correct otherwise; the tests don't all pass yet. No idea if this is in any way the correct answer.	2019-02-21 10:26:08 -08:00
Chris Wilson	02b82fe80a	iris: Set resource modifier on handle Required for gbm_bo_create_with_modifiers	2019-02-21 10:26:08 -08:00
Kenneth Graunke	682aeff8d0	iris: we don't support textureGatherOffsets, need it lowered	2019-02-21 10:26:08 -08:00
Kenneth Graunke	03dc99475d	iris: cube arrays are cubes too	2019-02-21 10:26:08 -08:00
Kenneth Graunke	80c7096672	iris: fix sample mask 0xffffffff does not mean 1, it means enable as many as there actually are. we don't get set_sample_mask() calls until some masking is actually applied...i.e. it doesn't get updated based on # of samples in the FBO changing.	2019-02-21 10:26:08 -08:00
Kenneth Graunke	e990558152	iris: drop pipe_shader_state looking at the freedreno code, this is totally unnecessary! we can just store the NIR and be happy, and not have any vestiges of TGSI. plus we can reuse this structure for compute shaders, without needing a pipe_compute_state base.	2019-02-21 10:26:08 -08:00
Kenneth Graunke	834b97c34b	iris: fix GS output component limit this is total, so should be 1024, not 128	2019-02-21 10:26:08 -08:00
Kenneth Graunke	c9f9a6f61b	iris: Avoid croaking when trying to create FBO surfaces with bad formats create_surface happens before st_validate_attachment, which actually does the "hey, this is a render target now, is that OK?" check Fixes asserts in ./bin/arb_texture_view-rendering-formats, allowing the rest of the tests to run.	2019-02-21 10:26:08 -08:00
Kenneth Graunke	8da91ebb68	iris: enable texture gather	2019-02-21 10:26:08 -08:00
Kenneth Graunke	f3dd70182d	iris: BIG OL' HACK for UBO updates We need to re-push data when UBO changes. This will need to be replaced with a usage history based flushing system later.	2019-02-21 10:26:08 -08:00
Kenneth Graunke	a7311ef068	iris: update a todo comment	2019-02-21 10:26:07 -08:00
Kenneth Graunke	8e7b0deee2	iris: Don't reserve new binding table section unless things are dirty	2019-02-21 10:26:07 -08:00
Kenneth Graunke	870f2e8434	iris: implement texture/memory barriers	2019-02-21 10:26:07 -08:00
Kenneth Graunke	82ee971497	iris: drop unused bo parameter	2019-02-21 10:26:07 -08:00
Kenneth Graunke	f0159d5ca3	iris: update bindings when changing programs the binding table layout depends on program info. not known to fix anything yet.	2019-02-21 10:26:07 -08:00
Kenneth Graunke	b0e9c5797b	iris: fix for disabling ssbos	2019-02-21 10:26:07 -08:00
Kenneth Graunke	b7b061c4e2	iris: fix SSBO indexing st/nir offsets SSBO indexes by MaxABOs. This is not what we want, as it bloats the binding tables. We'll need to adjust it to use info->num_abos as the offset and buffer base instead. For now, just use the inefficient format to get us rolling. We can add a PIPE_CAP later.	2019-02-21 10:26:07 -08:00
Kenneth Graunke	376c7253f8	iris: enable SSBOs	2019-02-21 10:26:07 -08:00
Kenneth Graunke	75709d982b	iris: fix TBO alignment to match 965	2019-02-21 10:26:07 -08:00
Kenneth Graunke	77b9219818	iris: unbind compiled shaders if none are present avoids the case where you have a stale compiled shader bound, but no uncompiled shader bound, which is not just boats, but an entire marina	2019-02-21 10:26:07 -08:00
Kenneth Graunke	fd5ed7b46b	iris: shorten loop num_ubos doesn't include Tim's magic UBO for regular uniforms, so +1	2019-02-21 10:26:07 -08:00
Kenneth Graunke	bf795b0244	iris: emit binding table for atomic counters and SSBOs	2019-02-21 10:26:07 -08:00
Kenneth Graunke	2d5f545464	iris: implement set_shader_buffers for SSBOs/ABOs. We just stream out SURFACE_STATE for now...since it's a set_* API...and the buffer offset may change...not sure where else we'd do it.	2019-02-21 10:26:07 -08:00
Kenneth Graunke	541cb60e7e	iris: export get_shader_info	2019-02-21 10:26:07 -08:00
Kenneth Graunke	f0558ca22c	iris: fix msaa flipping filters	2019-02-21 10:26:07 -08:00
Kenneth Graunke	2c73d7e3f1	iris: expose more things that we already support	2019-02-21 10:26:07 -08:00
Kenneth Graunke	5b8dd5f303	iris: fix blorp filters we have to switch to blorp enums after the rebase, but also we were probably doing it wrong for MSAA before this.	2019-02-21 10:26:07 -08:00
Kenneth Graunke	3aa1fcc65a	iris: hack around samples confusion	2019-02-21 10:26:07 -08:00
Kenneth Graunke	2c15f38a29	iris: point sprite enables	2019-02-21 10:26:07 -08:00
Kenneth Graunke	c60a4de1f5	iris: reemit blend state for alpha test function changes fixes bin/fbo-alphatest-formats GL_EXT_texture_snorm	2019-02-21 10:26:07 -08:00
Kenneth Graunke	a4036635b1	iris: fix Z24 This was backwards. thanks to Jason Ekstrand for realizing that I was seeing the wrong bits.	2019-02-21 10:26:07 -08:00
Kenneth Graunke	a12a370d7b	iris: fix EmitNoIndirect we were using pipe stages, which are ordered dumbly for historical reasons. we want gl_shader_stage here. this got us the wrong options	2019-02-21 10:26:07 -08:00
Kenneth Graunke	5bd861de8b	iris: assert about passthrough shaders to make this easier to detect otherwise it just silently fails and looks like some obscure problem	2019-02-21 10:26:07 -08:00
Kenneth Graunke	5e19885d5a	iris: fill out MAX_PATCH_VERTICES	2019-02-21 10:26:07 -08:00
Kenneth Graunke	3e9e3121e5	iris: fix SGVS when there are no valid vertex elements tessellation nop.shader_test now passes	2019-02-21 10:26:07 -08:00
Kenneth Graunke	5520a54bc5	iris: vertex ID, instance ID	2019-02-21 10:26:07 -08:00
Kenneth Graunke	a9083bdb71	iris: don't emit SO_BUFFERS and SO_DECL_LIST unless streamout is enabled Otherwise on the first draw, if XFB isn't enabled, we get a pile of MI_NOOPS where SO_BUFFERS should be	2019-02-21 10:26:07 -08:00
Kenneth Graunke	ebb960c6d3	iris: compile a TCS...don't bother with passthrough yet	2019-02-21 10:26:07 -08:00
Kenneth Graunke	9aa8be3d8e	iris: TES program key inputs	2019-02-21 10:26:07 -08:00
Kenneth Graunke	fcee21da6b	iris: fix texture buffer stride	2019-02-21 10:26:07 -08:00
Kenneth Graunke	3c41d4cf3f	iris: fix sampler views of TBOs we can't read levels/layers, they're invalid for PIPE_BUFFER	2019-02-21 10:26:07 -08:00
Kenneth Graunke	6e7e49cc4f	iris: fix crash	2019-02-21 10:26:07 -08:00
Kenneth Graunke	841fc3e3ca	iris: record FS NOS	2019-02-21 10:26:07 -08:00
Kenneth Graunke	d223b316ad	iris: NOS mechanics	2019-02-21 10:26:07 -08:00
Kenneth Graunke	a6d480f892	iris: bind state helper function	2019-02-21 10:26:07 -08:00
Kenneth Graunke	48b826cdaf	iris: s/hwcso/state/g	2019-02-21 10:26:07 -08:00
Kenneth Graunke	aeb6fc8782	iris: bits of multisample program key	2019-02-21 10:26:07 -08:00
Kenneth Graunke	e6b1cc2106	iris: save query type	2019-02-21 10:26:07 -08:00
Kenneth Graunke	44ba48eba7	iris: draw indirect support?	2019-02-21 10:26:07 -08:00
Kenneth Graunke	b030671298	iris: fix CC_VIEWPORT I was confusing depth bounds test with depth clamping	2019-02-21 10:26:07 -08:00
Kenneth Graunke	fdbc205552	iris: multislice transfer maps	2019-02-21 10:26:07 -08:00
Kenneth Graunke	44248d16d2	iris: disable 6x MSAA support	2019-02-21 10:26:07 -08:00
Kenneth Graunke	bc1b4db3b3	iris: fix sample mask for MSAA-off	2019-02-21 10:26:07 -08:00
Kenneth Graunke	7b8c0f058e	iris: actually pin the buffers	2019-02-21 10:26:07 -08:00
Kenneth Graunke	5635abadef	iris: fix SO_DECL_LIST	2019-02-21 10:26:07 -08:00
Kenneth Graunke	dc3b927e97	iris: bother setting program_string_id... not sure how useful this really is... ./bin/ext_transform_feedback-tessellation triangles flat_first is hitting a case where we rebind the same VS program, but with different streamout info...which isn't in the key...but is in the cache...so we don't rebuild it...	2019-02-21 10:26:07 -08:00
Kenneth Graunke	9c1cefff52	iris: set even if no outputs	2019-02-21 10:26:07 -08:00
Kenneth Graunke	cef0b8b13b	iris: streamout	2019-02-21 10:26:07 -08:00
Kenneth Graunke	059c096eff	iris: SO buffers	2019-02-21 10:26:07 -08:00
Kenneth Graunke	5c00f5fdca	iris: Implement 3DSTATE_SO_DECL_LIST	2019-02-21 10:26:07 -08:00
Kenneth Graunke	6794f1ffb9	iris: rearrange iris_resource.h	2019-02-21 10:26:07 -08:00
Kenneth Graunke	a3f77eceb4	iris: slab allocate transfers apparently we need this for u_threaded_context	2019-02-21 10:26:07 -08:00
Kenneth Graunke	5165308169	iris: don't crash on shader perf logs	2019-02-21 10:26:07 -08:00
Kenneth Graunke	f20fc950a7	iris: fix depth bounds clamp enables fixes depthrange-clear among others	2019-02-21 10:26:07 -08:00
Kenneth Graunke	eb274a31bc	iris: fix clip flagging on fb changes	2019-02-21 10:26:07 -08:00
Kenneth Graunke	0232fbc2c4	iris: comment out l/a/i/la in hopes of r/rg fallbacks	2019-02-21 10:26:07 -08:00
Kenneth Graunke	cf34dd7a61	iris: actually handle array layers in blits	2019-02-21 10:26:07 -08:00
Kenneth Graunke	33a17d566f	iris: keep DISCARD_RANGE this isn't really an iris_bo_map flag, but the various resource mappers want to check for it to avoid making temp copies.	2019-02-21 10:26:07 -08:00
Kenneth Graunke	c0ab9c9890	iris: actually set cube bit properly	2019-02-21 10:26:07 -08:00
Kenneth Graunke	d849501f4c	iris: rename map->stride	2019-02-21 10:26:07 -08:00
Kenneth Graunke	36301bbe40	iris: fix zoffset asserts with 2DArray/Cube	2019-02-21 10:26:07 -08:00
Kenneth Graunke	7f39f4843f	iris: SBE change stash not used yet, but want to flag it so I don't forget	2019-02-21 10:26:07 -08:00
Kenneth Graunke	8a080223e6	iris: just malloc one iris_genx_state instead of a bunch of oddball pieces Things that are gen-specific can go in iris_genx_state. Things that are gen-agnostic can go directly in ice->state.	2019-02-21 10:26:07 -08:00
Kenneth Graunke	a7e0edffb6	iris: dead pointer	2019-02-21 10:26:07 -08:00
Kenneth Graunke	ccec5bab5b	iris: implement border color, fix other sampler nonsense	2019-02-21 10:26:07 -08:00
Kenneth Graunke	8a16249285	iris: border color memory zone :( They took away our pointer bits, so now we need a pile of special code to handle this instead of just using u_upload_mgr. :(	2019-02-21 10:26:07 -08:00
Kenneth Graunke	1c19e3b21f	iris: don't include binder in surface VMA range	2019-02-21 10:26:07 -08:00
Kenneth Graunke	1cea195a95	iris: state ref tuple	2019-02-21 10:26:07 -08:00
Kenneth Graunke	c0e80a8d0a	iris: null surface for unbound textures avoids crashes...may not be really right	2019-02-21 10:26:07 -08:00
Kenneth Graunke	d358a4a040	iris: depth clears	2019-02-21 10:26:07 -08:00
Kenneth Graunke	470fb01a7a	iris: fix GS dispatch mode	2019-02-21 10:26:07 -08:00
Kenneth Graunke	01483c7933	iris: fix 3DSTATE_VERTEX_ELEMENTS / VF_INSTANCING for 0 elements	2019-02-21 10:26:07 -08:00
Kenneth Graunke	4c9067ae1d	iris: don't emit garbage 3DSTATE_VERTEX_BUFFERS when there aren't any	2019-02-21 10:26:07 -08:00
Kenneth Graunke	adf0c20461	iris: geometry shader support	2019-02-21 10:26:07 -08:00
Kenneth Graunke	de08ac9b0f	iris: TES uniform fixes not that we have a TES, but...	2019-02-21 10:26:07 -08:00
Kenneth Graunke	d207f97840	iris: larger polygon offset	2019-02-21 10:26:07 -08:00
Kenneth Graunke	5188e54e97	iris: fix provoking vertex ordering had this backwards	2019-02-21 10:26:07 -08:00
Kenneth Graunke	cbbd6a61c4	iris: maybe-flush before blorp operations otherwise if we have a lot of back-to-back blorp operations we can potentially overflow even the chained batch	2019-02-21 10:26:07 -08:00
Kenneth Graunke	e0f3971280	iris: lightmodel flat	2019-02-21 10:26:07 -08:00
Kenneth Graunke	4d04111bfb	iris: implement copy image	2019-02-21 10:26:07 -08:00
Kenneth Graunke	40fd2fd603	iris: fall back to u_generate_mipmap It just does blits between layers, which is all we'd do anyway, and it already should use BLORP because of iris_blit(). Plus it handles 3D, which our code in i965 doesn't.	2019-02-21 10:26:07 -08:00
Kenneth Graunke	6cf04c6ded	iris: clear fix	2019-02-21 10:26:07 -08:00
Kenneth Graunke	d416b81779	iris: shader dirty bits	2019-02-21 10:26:07 -08:00
Kenneth Graunke	b7cd3a083a	iris: rework DEBUG_REEMIT don't want to have to special case this everywhere	2019-02-21 10:26:07 -08:00
Kenneth Graunke	72416a2d0d	iris: clears	2019-02-21 10:26:07 -08:00
Kenneth Graunke	eef0d33cee	iris: better boxing on maps	2019-02-21 10:26:07 -08:00
Kenneth Graunke	419fac2fc6	iris: fix fragcoord ytransform the TGSI in the name is a misnomer, it actually controls wpos_ytransform lowering in NIR these days.	2019-02-21 10:26:07 -08:00
Kenneth Graunke	e67951227d	iris: Disable unsupported mirror clamp modes	2019-02-21 10:26:07 -08:00
Kenneth Graunke	234cf647a4	iris: tidy comments about mirroring modes	2019-02-21 10:26:07 -08:00
Kenneth Graunke	a3a998f19a	iris: iris - fix QWord aligned endings after batch chaining rework I need to save the primary batch size after expanding it to include MI_BATCH_BUFFER_END and the QWord padding NOP	2019-02-21 10:26:07 -08:00
Kenneth Graunke	aacbcbbf47	iris: colorize batchbuffer failures to make them stand out	2019-02-21 10:26:07 -08:00
Kenneth Graunke	8e2b71b190	iris: bad inherited comments	2019-02-21 10:26:07 -08:00
Kenneth Graunke	8c54433275	iris: Handle batch submission failure "better" We used to not reset the batch, and just keep appending to it, so you'd get the same invalid contents over and over. I'd also really like to know about this, so aborting seems wise for now, if not for the long term	2019-02-21 10:26:07 -08:00
Kenneth Graunke	d0b55ca782	iris: don't always flush	2019-02-21 10:26:07 -08:00
Kenneth Graunke	9226ebfa85	iris: print second batch size separately	2019-02-21 10:26:07 -08:00
Kenneth Graunke	f12b079c0e	iris: actually init num_viewports fixes regressions	2019-02-21 10:26:07 -08:00
Kenneth Graunke	81f899c148	iris: scissor count fixes	2019-02-21 10:26:07 -08:00
Kenneth Graunke	92d6a70853	iris: fix VP iteration	2019-02-21 10:26:07 -08:00
Kenneth Graunke	4a94628513	iris: fix num viewports to be based on programs	2019-02-21 10:26:07 -08:00
Kenneth Graunke	b17215800c	iris: fix viewport counts and settings seeing set_viewport_state 0 1, then set_viewport_state 1 15, which gives us a total of 16 viewports, updated incrementally, so keep old values around and update them...	2019-02-21 10:26:07 -08:00
Kenneth Graunke	636cf8971e	iris: max VP index	2019-02-21 10:26:07 -08:00
Kenneth Graunke	7cdc6b1173	iris: emit 3DSTATE_SBE_SWIZ	2019-02-21 10:26:07 -08:00
Kenneth Graunke	26db2ea782	iris: avoid crashing on unbound constant resources instead, read from the workaround BO	2019-02-21 10:26:07 -08:00
Kenneth Graunke	a7770501a7	iris: fix caps so tests run again	2019-02-21 10:26:07 -08:00
Kenneth Graunke	a6aeca9727	iris: fix major refcounting bug with resources DONTBLOCK -> NULL was happening after taking a reference, causing those to live forever This resolves the OOM problems	2019-02-21 10:26:07 -08:00
Kenneth Graunke	49f9c88801	iris: support signed vertex buffer offsets	2019-02-21 10:26:07 -08:00
Kenneth Graunke	0a43c9defa	iris: print refcounts in INTEL_DEBUG=submit	2019-02-21 10:26:07 -08:00
Kenneth Graunke	7d1e6f1fa1	iris: redo VB CSO a bit	2019-02-21 10:26:07 -08:00
Kenneth Graunke	432790bacd	iris: print binder utilization in INTEL_DEBUG=submit	2019-02-21 10:26:07 -08:00
Kenneth Graunke	f8179dc760	iris: clean up some warnings so I can see through the noise	2019-02-21 10:26:07 -08:00
Kenneth Graunke	5f3a7ee701	iris: use pipe resources not direct BOs	2019-02-21 10:26:07 -08:00
Kenneth Graunke	5619c15ecc	iris: indentation	2019-02-21 10:26:07 -08:00
Kenneth Graunke	27d45eb2f2	iris: don't leak keyboxes when searching for an existing program	2019-02-21 10:26:07 -08:00
Kenneth Graunke	7d504f3d52	iris: don't leak sampler state table resources	2019-02-21 10:26:07 -08:00
Kenneth Graunke	8e186cef2c	iris: rzalloc iris_compiled_shader so memcmp works even if padding creeps in	2019-02-21 10:26:07 -08:00
Kenneth Graunke	5f722bf7c4	iris: remove 4 bytes of padding in iris_compiled_shader	2019-02-21 10:26:07 -08:00
Kenneth Graunke	0db86016f7	iris: pc fixes	2019-02-21 10:26:07 -08:00
Kenneth Graunke	f9f8ea7070	iris: more leak fixes	2019-02-21 10:26:07 -08:00
Kenneth Graunke	c763ecaa65	iris: plug leaks	2019-02-21 10:26:07 -08:00
Kenneth Graunke	477ea6c39a	iris: clear dirty	2019-02-21 10:26:07 -08:00
Kenneth Graunke	23987df412	iris: some dirty fixes two scissor bits, constants not being flagged, ZeroRTA, clip not being flagged	2019-02-21 10:26:07 -08:00
Kenneth Graunke	ccf37c7da9	iris: bindings dirty tracking	2019-02-21 10:26:07 -08:00
Kenneth Graunke	bbc6d15b59	iris: flag DIRTY_WM properly	2019-02-21 10:26:06 -08:00
Kenneth Graunke	3f863cf680	iris: fix the validation list on new batches	2019-02-21 10:26:06 -08:00
Kenneth Graunke	80dee31846	iris: save pointers to streamed state resources will be used for cross-batch validation list fixing	2019-02-21 10:26:06 -08:00
Kenneth Graunke	daceb04bc0	iris: put back the always flush - fixes some things :(	2019-02-21 10:26:06 -08:00
Kenneth Graunke	149408a360	iris: untested SAMPLER_STATE pin BO fix	2019-02-21 10:26:06 -08:00
Kenneth Graunke	de782e5b39	iris: delete some pointless STATIC_ASSERTS these were useful when I was patching relocs	2019-02-21 10:26:06 -08:00
Kenneth Graunke	3eebea88dc	iris: untested index buffer upload	2019-02-21 10:26:06 -08:00
Kenneth Graunke	9247546181	iris: state cleaning	2019-02-21 10:26:06 -08:00
Kenneth Graunke	7c40cdc12f	iris: comment about reemitting and flushing	2019-02-21 10:26:06 -08:00
Kenneth Graunke	d46c5b7c6c	iris: allow mapped buffers during execution (faster)	2019-02-21 10:26:06 -08:00
Kenneth Graunke	92de0f5aa6	iris: disable __gen_validate_value in release mode	2019-02-21 10:26:06 -08:00
Kenneth Graunke	08d1f13818	iris: drop assert for now	2019-02-21 10:26:06 -08:00
Kenneth Graunke	a9e357caac	iris: fix release builds	2019-02-21 10:26:06 -08:00
Kenneth Graunke	73f3c2cad0	iris: better VFI	2019-02-21 10:26:06 -08:00
Chris Wilson	2cbd42cddd	iris: IndexFormat = size/2 brw uses: IndexFormat = index_size >> 1; anv uses: IndexFormat = index_type[index_size]	2019-02-21 10:26:06 -08:00
Kenneth Graunke	5dcf62bb43	iris: use u_transfer helpers for now	2019-02-21 10:26:06 -08:00
Kenneth Graunke	48dc8bd4b0	iris: fix pull bufs that aren't the first user upload	2019-02-21 10:26:06 -08:00
Kenneth Graunke	eed7f7253e	iris: fill out pull constant buffers	2019-02-21 10:26:06 -08:00
Kenneth Graunke	90046b43cc	iris: make surface states for cbufs	2019-02-21 10:26:06 -08:00
Kenneth Graunke	4e007dbb30	iris: have more than one const_offset	2019-02-21 10:26:06 -08:00
Kenneth Graunke	9ea05ccf1f	iris: completely rewrite binder now we get a new one per batch, and flush if it fills up	2019-02-21 10:26:06 -08:00
Kenneth Graunke	26cc609927	iris: better ubo handling	2019-02-21 10:26:06 -08:00
Chris Wilson	a504b98e72	iris: fix import from dri2/3	2019-02-21 10:26:06 -08:00
Kenneth Graunke	badefe50a0	iris: fix constant packet length to match i965	2019-02-21 10:26:06 -08:00
Kenneth Graunke	201a4d923c	iris: maybe slightly less boats uniforms	2019-02-21 10:26:06 -08:00
Kenneth Graunke	a6dd9caf0d	iris: flush always	2019-02-21 10:26:06 -08:00
Kenneth Graunke	04d1a3a7de	iris: transfers	2019-02-21 10:26:06 -08:00
Kenneth Graunke	7437c28c0d	iris: util_copy_framebuffer_state (ported from Rob's v3d patches)	2019-02-21 10:26:06 -08:00
Kenneth Graunke	f6017da83f	iris: fix VF INSTANCING length	2019-02-21 10:26:06 -08:00
Kenneth Graunke	7fb7704b2e	iris: more depth stuffs... still missing stencil	2019-02-21 10:26:06 -08:00
Kenneth Graunke	02890c75b5	iris: fix 3DSTATE_VERTEX_ELEMENTS length	2019-02-21 10:26:06 -08:00
Kenneth Graunke	601ee4c189	iris: fix whitespace	2019-02-21 10:26:06 -08:00
Kenneth Graunke	4d24874236	iris: Lower the max number of decoded VBO lines saint foo, vbo lines!	2019-02-21 10:26:06 -08:00
Kenneth Graunke	48ddd7212d	iris: fix decoding and undo testing code	2019-02-21 10:26:06 -08:00
Kenneth Graunke	f31eea1f00	iris: fix batch chaining... don't chain a batch just for the end	2019-02-21 10:26:06 -08:00
Kenneth Graunke	5b914a6d58	iris: caps	2019-02-21 10:26:06 -08:00
Kenneth Graunke	604a1a1614	iris: chaining not growing	2019-02-21 10:26:06 -08:00
Kenneth Graunke	053fb51125	iris: just turn batch reset_and_clear_caches into reset	2019-02-21 10:26:06 -08:00
Kenneth Graunke	ca735c5e0c	iris: delete growing code and just die for now we need proper batch chaining. without relocations, we can't grow, since we've only allocated so much VMA for the batch, and the mechanism only works if we can pin it at the old address	2019-02-21 10:26:06 -08:00
Kenneth Graunke	7167c6d508	iris: blorp bug fixes I wrote this earlier, but it got lost somehow...	2019-02-21 10:26:06 -08:00
Kenneth Graunke	3650f8dfa1	iris: properly reject formats, fixes RGB32 rendering with texture float	2019-02-21 10:26:06 -08:00
Kenneth Graunke	4510098b9c	iris: proper # of uniforms or at least closer...we were using bytes, we want 256-bit units...	2019-02-21 10:26:06 -08:00
Kenneth Graunke	6091dc470f	iris: proper length for VE packet?	2019-02-21 10:26:06 -08:00
Kenneth Graunke	64a3f7423a	iris: uniforms for VS	2019-02-21 10:26:06 -08:00
Kenneth Graunke	d4a64e0a64	iris: bump GL version to 4.2	2019-02-21 10:26:06 -08:00
Kenneth Graunke	44993d451c	iris: some depth stuff :(	2019-02-21 10:26:06 -08:00
Kenneth Graunke	eb12cc70f0	iris: assert surf init	2019-02-21 10:26:06 -08:00
Kenneth Graunke	a4a426008b	iris: no more drawing rectangle in blorp there's some bug here as Jason's patches for only emitting 3DS_DR once got reverted by Mark later on, apparently they regressed MSAA tests. need to sort that out.	2019-02-21 10:26:06 -08:00
Kenneth Graunke	0e3870b9de	iris: blorp URB	2019-02-21 10:26:06 -08:00
Kenneth Graunke	01fe6df0ed	iris: make blorp pin the binder	2019-02-21 10:26:06 -08:00
Kenneth Graunke	063fc7bbb0	iris: linear staging buffers - fast CPU access...	2019-02-21 10:26:06 -08:00
Kenneth Graunke	84abf77c67	iris: hacky flushing for now	2019-02-21 10:26:06 -08:00
Kenneth Graunke	75a1639262	iris: drop the 48b printout, we never use anything else	2019-02-21 10:26:06 -08:00
Kenneth Graunke	86d7fd71f4	iris: add INTEL_DEBUG=reemit	2019-02-21 10:26:06 -08:00
Kenneth Graunke	b8a11ad256	iris: fix blorp prog data crashes	2019-02-21 10:26:06 -08:00
Kenneth Graunke	e2ba98ba39	iris: more blorp	2019-02-21 10:26:06 -08:00
Kenneth Graunke	1bba60a4bf	iris: fix sampler view crashes	2019-02-21 10:26:06 -08:00
Kenneth Graunke	e22da1e7b1	iris: drop bogus binder free I was malloc'ing it but then I changed my mind and embedded it directly	2019-02-21 10:26:06 -08:00
Kenneth Graunke	698d45b725	iris: more blitting code to make readpixels work	2019-02-21 10:26:06 -08:00
Kenneth Graunke	c9d9e44720	iris: bits of blorp code	2019-02-21 10:26:06 -08:00
Kenneth Graunke	79466c1313	iris: move bo_offset_from_sba for wider use	2019-02-21 10:26:06 -08:00
Kenneth Graunke	60d708bb80	iris: copy over i965's cache tracking needed to split out vtbl so I can pipe control without ice	2019-02-21 10:26:06 -08:00
Kenneth Graunke	dbd4770397	iris: pull in newer comments	2019-02-21 10:26:06 -08:00
Kenneth Graunke	841b3b9003	iris: Defines for base addresses rather than numbers everywhere	2019-02-21 10:26:06 -08:00
Kenneth Graunke	c75a1254a4	iris: Move get_command_space to iris_batch.c for reuse in blorp. it's a better interface anyway.	2019-02-21 10:26:06 -08:00
Kenneth Graunke	39e795d473	iris: fix texturing!	2019-02-21 10:26:06 -08:00
Kenneth Graunke	4929f020c3	iris: better SBE	2019-02-21 10:26:06 -08:00
Kenneth Graunke	8bf167c9e9	iris: vma - fix assert	2019-02-21 10:26:06 -08:00
Kenneth Graunke	10e4f1e68c	iris: vma fixes - don't free binder address	2019-02-21 10:26:06 -08:00
Kenneth Graunke	5a101e6434	iris: bo reuse	2019-02-21 10:26:06 -08:00
Kenneth Graunke	21acc00490	iris: crazy pipe control code imported from ~kwg/mesa pcx-2, gen < 8 code dropped	2019-02-21 10:26:06 -08:00
Kenneth Graunke	87aa880795	iris: fixes	2019-02-21 10:26:06 -08:00
Kenneth Graunke	3fbf7294b1	iris: fixes from i965	2019-02-21 10:26:06 -08:00
Kenneth Graunke	999ed6e213	iris: port bug fix from i965	2019-02-21 10:26:05 -08:00
Kenneth Graunke	19d11a6df3	iris: fix index	2019-02-21 10:26:05 -08:00
Kenneth Graunke	010e845af7	iris: increase allocator alignment	2019-02-21 10:26:05 -08:00
Kenneth Graunke	35afa8c8f3	iris: better BT asserts Probably nothing is working because texture upload isn't implemented	2019-02-21 10:26:05 -08:00
Kenneth Graunke	0148bd6839	iris: decoder fixes	2019-02-21 10:26:05 -08:00
Kenneth Graunke	5d2673ba7e	iris: set sampler views	2019-02-21 10:26:05 -08:00
Kenneth Graunke	34164ce622	iris: isv freeing fixes	2019-02-21 10:26:05 -08:00
Kenneth Graunke	012154c20f	iris: TES stash TODO: key setup	2019-02-21 10:26:05 -08:00
Kenneth Graunke	d890aee15d	iris: SBA once at context creation, not per batch hooray!	2019-02-21 10:26:05 -08:00
Kenneth Graunke	e0eac28bd4	iris: fix a scissor bug	2019-02-21 10:26:05 -08:00
Kenneth Graunke	0707ff3f2f	iris: assemble SAMPLER_STATE table at bind time It's useless to allocate SAMPLER_STATEs in GPU memory on creation like we do for SURFACE_STATES, because they need to be organized into a contiguous block of memory. But we can do that at bind time, rather than draw time.	2019-02-21 10:26:05 -08:00
Kenneth Graunke	199c080926	iris: same treatment for sampler views	2019-02-21 10:26:05 -08:00
Kenneth Graunke	f51204a160	iris: allocate SURFACE_STATEs up front and stop streaming them	2019-02-21 10:26:05 -08:00
Kenneth Graunke	bf90d8a125	iris: delete more trash	2019-02-21 10:26:05 -08:00
Kenneth Graunke	1398c99aff	iris: canonicalize addresses. Back to working! Woo!	2019-02-21 10:26:05 -08:00
Kenneth Graunke	b69a85bc4d	iris: validation dumping improvements backported from i965. don't bother with (pinned) because everything is.	2019-02-21 10:26:05 -08:00
Kenneth Graunke	24bcf1054b	iris: update vb BO handling now that we have softpin	2019-02-21 10:26:05 -08:00
Kenneth Graunke	9ac81f1890	iris: decoder fixes	2019-02-21 10:26:05 -08:00
Kenneth Graunke	9955e8334b	iris: binder fixes	2019-02-21 10:26:05 -08:00
Kenneth Graunke	65073c2217	iris: hook up batch decoder	2019-02-21 10:26:05 -08:00
Kenneth Graunke	6cbd1d1692	iris: binders	2019-02-21 10:26:05 -08:00
Kenneth Graunke	209692c716	iris: include p_defines.h in iris_bufmgr.h for PIPE_TRANSFER_WRITE and friends	2019-02-21 10:26:05 -08:00
Kenneth Graunke	1af84d345a	iris: set EXEC_OBJECT_WRITE	2019-02-21 10:26:05 -08:00
Kenneth Graunke	651be7cf3d	iris: rewrite to use memzones and not relocs	2019-02-21 10:26:05 -08:00
Kenneth Graunke	68229caa38	iris: more uploaders	2019-02-21 10:26:05 -08:00
Kenneth Graunke	3861d24e23	iris: Also set SUPPORTS_48B? Not sure if necessary.	2019-02-21 10:26:05 -08:00
Kenneth Graunke	e95ad5994a	iris: dump gtt offset in dump_validation_list	2019-02-21 10:26:05 -08:00
Kenneth Graunke	d78be0188e	iris: fix icache memzone	2019-02-21 10:26:05 -08:00
Kenneth Graunke	e4aa8338c3	iris: Soft-pin the universe Breaks everything, woo!	2019-02-21 10:26:05 -08:00
Kenneth Graunke	3693307670	iris: some thinking about binding tables	2019-02-21 10:26:05 -08:00
Kenneth Graunke	f6be3d4f3a	iris: bufmgr updates. Drop BO_ALLOC_BUSY (best not to hand people a loaded gun...) Drop vestiges of alignment	2019-02-21 10:26:05 -08:00
Kenneth Graunke	902a122404	iris: stop adding 9 to our varyings	2019-02-21 10:26:05 -08:00
Kenneth Graunke	a235da3e68	iris: set strides on transfers	2019-02-21 10:26:05 -08:00
Kenneth Graunke	6891f70d87	iris: enable a few more formats	2019-02-21 10:26:05 -08:00
Kenneth Graunke	7130c43d96	iris: decode batches if they fail to submit	2019-02-21 10:26:05 -08:00
Kenneth Graunke	23367688e9	iris: NOOP pad batches correctly	2019-02-21 10:26:05 -08:00
Kenneth Graunke	f3150e9ecd	iris: warn if execbuf fails	2019-02-21 10:26:05 -08:00
Kenneth Graunke	a50a3a8edf	iris: uniform bits...badly	2019-02-21 10:26:05 -08:00
Kenneth Graunke	213b70a222	iris: sample mask...not 0. We now have a first triangle!	2019-02-21 10:26:05 -08:00
Kenneth Graunke	1a6bb266cf	iris: write DISABLES are not write ENABLES...whoops	2019-02-21 10:26:05 -08:00
Kenneth Graunke	50a2596f46	iris: fix extents	2019-02-21 10:26:05 -08:00
Kenneth Graunke	ffcd84f55a	iris: catastrophic state pointer mistake	2019-02-21 10:26:05 -08:00
Kenneth Graunke	1739dc0d5e	iris: more SF CL VPs	2019-02-21 10:26:05 -08:00
Kenneth Graunke	ade381fb9c	iris: fix dmabuf retval comparisons 0 means success	2019-02-21 10:26:05 -08:00
Kenneth Graunke	ed42ae2f9b	iris: more sketchy SBE	2019-02-21 10:26:05 -08:00
Kenneth Graunke	9be4b3baaf	iris: compctrl oh, also run things	2019-02-21 10:26:05 -08:00
Kenneth Graunke	db15993cfd	iris: actually pin the instruction cache buffers	2019-02-21 10:26:05 -08:00
Kenneth Graunke	bda9a77b47	iris: smaller blend state	2019-02-21 10:26:05 -08:00
Kenneth Graunke	f9d834d588	iris: don't do samplers for disabled stages	2019-02-21 10:26:05 -08:00
Kenneth Graunke	e21bddeb4f	iris: render targets!	2019-02-21 10:26:05 -08:00
Kenneth Graunke	8503578e82	iris: fix silly unused batch with addr macro	2019-02-21 10:26:05 -08:00
Kenneth Graunke	352ec1f378	iris: warning fixes	2019-02-21 10:26:05 -08:00
Kenneth Graunke	54ba8a60d5	iris: basic SBE code	2019-02-21 10:26:05 -08:00
Kenneth Graunke	5af16f5e20	iris: alpha testing in PSB	2019-02-21 10:26:05 -08:00
Kenneth Graunke	c96132d5fd	iris: blend state	2019-02-21 10:26:05 -08:00
Kenneth Graunke	bb3c0be7a8	iris: dummy constants	2019-02-21 10:26:05 -08:00
Kenneth Graunke	538decc0de	iris: URB configs.	2019-02-21 10:26:05 -08:00
Kenneth Graunke	b1115799e6	iris: actually set KSP offsets	2019-02-21 10:26:05 -08:00
Kenneth Graunke	6f1c07d7dd	iris: actually softpin at an address	2019-02-21 10:26:05 -08:00
Kenneth Graunke	acdff2f9a6	iris: actually destroy the cache	2019-02-21 10:26:05 -08:00
Kenneth Graunke	9437e135ed	iris: rewrite program cache to use u_upload_mgr	2019-02-21 10:26:05 -08:00
Kenneth Graunke	67ca2be992	iris: no NEW_SBA	2019-02-21 10:26:05 -08:00
Kenneth Graunke	e7a729ba34	iris: shuffle comments	2019-02-21 10:26:05 -08:00
Kenneth Graunke	6ecc93f764	iris: bits of WM key	2019-02-21 10:26:05 -08:00
Kenneth Graunke	bba13b1501	iris: move key pop to state module shader key population needs to read state	2019-02-21 10:26:05 -08:00
Kenneth Graunke	5864c9414a	iris: fix SBA	2019-02-21 10:26:05 -08:00
Kenneth Graunke	5ae278da18	iris: use vtbl to avoid multiple symbols, fix state base address	2019-02-21 10:26:05 -08:00
Kenneth Graunke	876417f9e8	iris: softpin some things	2019-02-21 10:26:05 -08:00
Kenneth Graunke	c493fee73f	iris: drop const from prog data parameters we ralloc steal things, which makes it a little bogus	2019-02-21 10:26:05 -08:00
Kenneth Graunke	cf7ba838ad	iris: more comes from bits filled in tomorrow, fix the build system to avoid symbol clashes somehow... we're getting gen9 functions because they happen to be listed before 10 in the link list.	2019-02-21 10:26:05 -08:00
Kenneth Graunke	8dffc9b195	iris: index buffer BO	2019-02-21 10:26:05 -08:00
Kenneth Graunke	8665dfd602	iris: WM. I could have added a dirty bit for this, but it doesn't seem worth it	2019-02-21 10:26:05 -08:00
Kenneth Graunke	bae5414594	iris: initial gpu state	2019-02-21 10:26:05 -08:00
Kenneth Graunke	0477591355	iris: reorganize commands to match brw	2019-02-21 10:26:05 -08:00
Kenneth Graunke	3e684d0eb7	iris: don't forget about TE	2019-02-21 10:26:05 -08:00
Kenneth Graunke	d71d2028ef	iris: convert IRIS_DIRTY_* to #defines enums are SIGNED. so IRIS_DIRTY_VS << 4 gets sign extended, making it not equal to IRIS_DIRTY_FS. Surprising!	2019-02-21 10:26:05 -08:00
Kenneth Graunke	cfd5fcc256	iris: emit shader packets	2019-02-21 10:26:05 -08:00
Kenneth Graunke	1cf21cc813	iris: actually save derived state	2019-02-21 10:26:05 -08:00
Kenneth Graunke	57c1b71418	iris: promote iris_program_cache_item to iris_compiled_shader	2019-02-21 10:26:05 -08:00
Kenneth Graunke	581459a9fe	iris: some shader bits	2019-02-21 10:26:05 -08:00
Kenneth Graunke	df401aaa11	iris: scissor slots	2019-02-21 10:26:05 -08:00
Kenneth Graunke	dc4453d886	iris: bind_state -> compute state	2019-02-21 10:26:05 -08:00
Kenneth Graunke	2f100c6e31	iris: 3DPRIMITIVE fields	2019-02-21 10:26:05 -08:00
Kenneth Graunke	b3646e2b48	iris: fix VF instancing length so we don't get garbage in batch	2019-02-21 10:26:05 -08:00
Kenneth Graunke	317263ab11	iris: vertex packet fixes	2019-02-21 10:26:05 -08:00
Kenneth Graunke	129fae5a90	iris: fix VBs	2019-02-21 10:26:05 -08:00
Kenneth Graunke	fc5ddc64f9	iris: fix assert	2019-02-21 10:26:05 -08:00
Kenneth Graunke	e91289908a	iris: fix indentation	2019-02-21 10:26:05 -08:00
Kenneth Graunke	41b32a4eda	iris: hack to stop crashing on samplers for now	2019-02-21 10:26:05 -08:00
Kenneth Graunke	dcfb06375a	iris: initialize dirty bits to ~0ull	2019-02-21 10:26:05 -08:00
Kenneth Graunke	0a513d63a1	iris: actually advance forward when emitting commands	2019-02-21 10:26:05 -08:00
Kenneth Graunke	24cc627612	iris: actually flush the commands	2019-02-21 10:26:05 -08:00
Kenneth Graunke	082911409e	iris: actually APPEND commands, not stomp over the top and never incr	2019-02-21 10:26:05 -08:00
Kenneth Graunke	b332ff489c	iris: VB fixes	2019-02-21 10:26:05 -08:00
Kenneth Graunke	50b1e01996	iris: DEBUG=bat Deleted in the interest of making the branch compile at each step	2019-02-21 10:26:05 -08:00
Kenneth Graunke	6e01bc0637	iris: VB addresses	2019-02-21 10:26:05 -08:00
Kenneth Graunke	b574b56325	iris: reference VB BOs	2019-02-21 10:26:05 -08:00
Kenneth Graunke	4dc683f64b	iris: so, sba then.	2019-02-21 10:26:05 -08:00
Kenneth Graunke	d900a235b1	iris: try and have an iris address	2019-02-21 10:26:05 -08:00
Kenneth Graunke	f31ae76216	iris: flag SBA updates when instruction BO changes	2019-02-21 10:26:05 -08:00
Kenneth Graunke	7d90cc8da4	iris: bit of SBA code genxml MOCS is stupid, addresses are hard news at 11	2019-02-21 10:26:05 -08:00
Kenneth Graunke	ff5c886fb3	iris: move MAX defines to iris_batch.h for SBA	2019-02-21 10:26:05 -08:00
Kenneth Graunke	7bfc8f7d7d	iris: kill iris_new_batch reset and new are too similar, and this had exactly one caller	2019-02-21 10:26:05 -08:00
Kenneth Graunke	b701096ab9	iris: make iris_batch target a particular ring	2019-02-21 10:26:05 -08:00
Kenneth Graunke	64f043570d	iris: lower io	2019-02-21 10:26:05 -08:00
Kenneth Graunke	695bd55d1a	iris: do the FS...asserts because we don't lower uniforms yet	2019-02-21 10:26:05 -08:00
Kenneth Graunke	6aa15cadf3	iris: import program cache code	2019-02-21 10:26:05 -08:00
Kenneth Graunke	4525dda75f	iris: reworks, FS compile pieces	2019-02-21 10:26:05 -08:00
Kenneth Graunke	628a71c2e3	iris: parse INTEL_DEBUG	2019-02-21 10:26:05 -08:00
Kenneth Graunke	d62b0b9ee8	iris: draw->restart_index is uninitialized if PR is not enabled	2019-02-21 10:26:05 -08:00
Kenneth Graunke	5fad62cef1	iris: fix bogus index buffer reference	2019-02-21 10:26:05 -08:00
Kenneth Graunke	95fe254cf2	iris: fix prim type	2019-02-21 10:26:05 -08:00
Kenneth Graunke	793276cd8b	iris: msaa sample count packing problems 0 -> ffffffffffffffffffffffffffff	2019-02-21 10:26:05 -08:00
Kenneth Graunke	0252fb36e9	iris: actually save VBs	2019-02-21 10:26:05 -08:00
Kenneth Graunke	ed6ee3e270	iris: fix/rework line stipple	2019-02-21 10:26:05 -08:00
Kenneth Graunke	231935efa2	iris: init the batch!	2019-02-21 10:26:05 -08:00
Kenneth Graunke	9ca58ca517	iris: delete iris_pipe.c, shuffle code around	2019-02-21 10:26:05 -08:00
Kenneth Graunke	455e2d6dce	iris: disable execbuf for now	2019-02-21 10:26:05 -08:00
Kenneth Graunke	86e0c08b14	iris: make an ice->render_batch field we may want a second one for transfers	2019-02-21 10:26:05 -08:00
Kenneth Graunke	ffd7f13b4d	iris: drop unused field	2019-02-21 10:26:05 -08:00
Kenneth Graunke	8097dc9dd9	iris: shader debug log	2019-02-21 10:26:05 -08:00
Kenneth Graunke	6c7a276470	iris: maps	2019-02-21 10:26:05 -08:00
Kenneth Graunke	49896861ce	iris: linear resources	2019-02-21 10:26:05 -08:00
Kenneth Graunke	c820f5a4bd	iris: some program code	2019-02-21 10:26:04 -08:00
Kenneth Graunke	d48dc416fa	iris: basic push constant alloc	2019-02-21 10:26:04 -08:00
Kenneth Graunke	21c016b496	iris: emit 3DSTATE_SAMPLER_STATE_POINTERS	2019-02-21 10:26:04 -08:00
Kenneth Graunke	7b80f4587d	iris: sampler states	2019-02-21 10:26:04 -08:00
Kenneth Graunke	60208d12b4	iris: COLOR_CALC_STATE	2019-02-21 10:26:04 -08:00
Kenneth Graunke	9367c44639	iris: fix crash - CSO binding can be NULL (when destroying context)	2019-02-21 10:26:04 -08:00
Kenneth Graunke	efea4d96d9	iris: some draw info, vbs, sample mask	2019-02-21 10:26:04 -08:00
Kenneth Graunke	d6ad9f4732	iris: a bit of depth still need to allocate separate stencil	2019-02-21 10:26:04 -08:00
Kenneth Graunke	7abe5aefd3	iris: fix SF_CL length	2019-02-21 10:26:04 -08:00
Kenneth Graunke	c1c6c3a18a	iris: don't segfault on !old_cso	2019-02-21 10:26:04 -08:00
Kenneth Graunke	3eadb1b3a1	iris: framebuffers	2019-02-21 10:26:04 -08:00
Kenneth Graunke	e7c9bddda7	iris: stipples and vertex elements	2019-02-21 10:26:04 -08:00
Kenneth Graunke	d0aab78dc3	iris: sampler views	2019-02-21 10:26:04 -08:00
Kenneth Graunke	831d630b8b	iris: Surfaces!	2019-02-21 10:26:04 -08:00
Kenneth Graunke	4ec5f8be3e	iris: SF_CLIP_VIEWPORT	2019-02-21 10:26:04 -08:00
Kenneth Graunke	970836c34e	iris: scissors	2019-02-21 10:26:04 -08:00
Kenneth Graunke	7c875deaf0	iris: RASTER + SF + some CLIP, fix DIRTY vs. NEW	2019-02-21 10:26:04 -08:00
Kenneth Graunke	02f583b0a0	iris: initial gpu state, merges	2019-02-21 10:26:04 -08:00
Kenneth Graunke	a13d417ac1	iris: merge pack this lets us merge dynamic and pre-baked state, also like anv	2019-02-21 10:26:04 -08:00
Kenneth Graunke	aee39df710	iris: packing with valgrind. borrowed macros from anv!	2019-02-21 10:26:04 -08:00
Kenneth Graunke	d3d6ef37f6	iris: initial render state upload	2019-02-21 10:26:04 -08:00
Kenneth Graunke	26fb5a8ae2	iris: port over batchbuffer updates	2019-02-21 10:26:04 -08:00
Kenneth Graunke	14ca30507f	iris: viewport state, sort of	2019-02-21 10:26:04 -08:00
Kenneth Graunke	2dce0e94a3	iris: Initial commit of a new 'iris' driver for Intel Gen8+ GPUs. This commit introduces a new Gallium driver for Intel Gen8+ GPUs, named 'iris_dri.so' after the hardware. Developed by: - Kenneth Graunke (overall driver) - Dave Airlie (shaders, conditional render, overflow query, Gen8 port) - Chris Wilson (fencing, pinned memory, ...) - Jordan Justen (compute shaders) - Jason Ekstrand (image load store) - Caio Marcelo de Oliveira Filho (tessellation control passthrough) - Rafael Antognolli (auxiliary buffer fixes) - The rest of the i965 contributors and the Mesa community	2019-02-21 10:26:04 -08:00
James Zhu	eac822eac1	gallium/auxiliary/vl: Fix transparent issue on compute shader with rgba Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109646 Problem 1,4: they are caused by incomplete blend compute shader implementation. So revert rgba back to fragment shader. Fixes: `9364d66cb7` (Add video compositor compute shader render) Signed-off-by: James Zhu <James.Zhu@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Tested-by: Bruno Milreu <bmilreu@gmail.com>	2019-02-21 13:11:53 -05:00
Lionel Landwerlin	20c370c6b1	vulkan: add an overlay layer Just a starting point to display frame timings & drawcalls/submissions per frame. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Acked-by: Jason Ekstrand <jason@jlekstrand.net> +1-by: Mike Lothian <mike@fireburn.co.uk> +1-by: Tapani Pälli <tapani.palli@intel.com> +1-by: Eric Engestrom <eric.engestrom@intel.com> +1-by: Yurii Kolesnykov <root@yurikoles.com> +1-by: myfreeweb <greg@unrelenting.technology> +1-by: Kenneth Graunke <kenneth@whitecape.org>	2019-02-21 18:06:05 +00:00
Lionel Landwerlin	89f03d1872	imgui: make sure our copy of imgui doesn't clash with others in the same process Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Acked-by: Jason Ekstrand <jason@jlekstrand.net> +1-by: Mike Lothian <mike@fireburn.co.uk> +1-by: Tapani Pälli <tapani.palli@intel.com> +1-by: Eric Engestrom <eric.engestrom@intel.com> +1-by: Yurii Kolesnykov <root@yurikoles.com> +1-by: myfreeweb <greg@unrelenting.technology> +1-by: Kenneth Graunke <kenneth@whitecape.org>	2019-02-21 18:06:05 +00:00
Lionel Landwerlin	3950e7c11e	imgui: bump copy Updated at : commit f977871854af941289f2a9090dcc90f7aa3449a8 Author: omar <omarcornut@gmail.com> Date: Fri Feb 15 13:10:22 2019 +0100 ImFont: Minor adjustment to the structure. Examples: Removed unused variable. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Acked-by: Jason Ekstrand <jason@jlekstrand.net> +1-by: Mike Lothian <mike@fireburn.co.uk> +1-by: Tapani Pälli <tapani.palli@intel.com> +1-by: Eric Engestrom <eric.engestrom@intel.com> +1-by: Yurii Kolesnykov <root@yurikoles.com> +1-by: myfreeweb <greg@unrelenting.technology> +1-by: Kenneth Graunke <kenneth@whitecape.org>	2019-02-21 18:06:05 +00:00
Lionel Landwerlin	51047cd2e8	build: move imgui out of src/intel/tools to be reused Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Acked-by: Jason Ekstrand <jason@jlekstrand.net> +1-by: Mike Lothian <mike@fireburn.co.uk> +1-by: Tapani Pälli <tapani.palli@intel.com> +1-by: Eric Engestrom <eric.engestrom@intel.com> +1-by: Yurii Kolesnykov <root@yurikoles.com> +1-by: myfreeweb <greg@unrelenting.technology> +1-by: Kenneth Graunke <kenneth@whitecape.org>	2019-02-21 18:06:05 +00:00
Jason Ekstrand	f98fd9d15a	nir/lower_clip_cull: Fix an incorrect assert Copy+paste error. It was supposed to test cull and not clip. Fixes: `4e69fba534` "nir: Rewrite lower_clip_cull_distance_arrays..." Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109717 Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-02-21 12:05:12 -06:00
Jason Ekstrand	f9b2f10a41	nir: Fix a compile warning	2019-02-21 09:44:42 -06:00
Rob Clark	908d5ee9eb	freedreno/a6xx: enable tiled images Turns out we can write to tiled images as well as read. This avoids having to linearize or do the tiling in the shader. Signed-off-by: Rob Clark <robdclark@gmail.com>	2019-02-21 09:06:06 -05:00
Alejandro Piñeiro	0629b2a462	nir, glsl: move pixel_center_integer/origin_upper_left to shader_info.fs On GLSL that info is set as a layout qualifier when redeclaring gl_FragCoord, so somehow tied to a specific variable. But in practice, they behave as a global of the shader. On ARB programs they are set using a global OPTION (defined at ARB_fragment_coord_conventions), and on SPIR-V using ExecutionModes, that are also not tied specifically to the builtin. This patch moves that info from nir variable and ir variable to nir shader and gl_program shader_info respectively, so the map is more similar to SPIR-V, and ARB programs, instead of more similar to GLSL. FWIW, shader_info.fs already had pixel_center_integer, so this change also removes some redundancy. Also, as struct gl_program also includes a shader_info, we removed gl_program::OriginUpperLeft and PixelCenterInteger, as they would be superfluous. This change was needed because recently spirv_to_nir changed the order in which execution modes and variables are handled, so the variables didn't get the correct values. Now the info is set on the shader itself, and we don't need to go back to the builtin variable to set it. Fixes: `e68871f6a` ("spirv: Handle constants and types before execution modes") v2: (Jason) * glsl_to_nir: get the info before glsl_to_nir, while all the rest of the info gathering is happening * prog_to_nir: gather the info on a general info-gathering pass, not on variable setup. v3: (Jason) * Squash with the patch that removes that info from ir variable * anv: assert that OriginUpperLeft is true. It should be already set by spirv_to_nir. * blorp: set origin_upper_left on its core "compile fragment shader", not just on some specific places (for this we added a helper on a previous patch). * prog_to_nir: no need to gather specifically these fragcoord modes as the full gl_program shader_info is copied. * spirv_to_nir: assert that we are a fragment shader when handling these execution modes. v4: (reported by failing gitlab pipeline #18750) * state_tracker: update too due to changes on ir.h/gl_program v5: * blorp: minor change after change on previous patch * radeonsi: update due to this change. v6: (Timothy Arceri) * prog_to_nir: remove extra whitespace * shader_info: don't use :1 on origin_upper_left * glsl: program.fs.origin_upper_left/pixel_center_integer can be moved out of the shader list loop	2019-02-21 11:47:59 +01:00
Alejandro Piñeiro	675eabb560	blorp: introduce helper method blorp_nir_init_shader This initializes the nir shader that will be used by blorp. Right now it doesn't do too much beyond calling nir_builder_init_simple_shader, and setting a name. More stuff will be added on following patches. v2: there is a case where a VERTEX_SHADER is used (Alejandro)	2019-02-21 11:47:51 +01:00
Alyssa Rosenzweig	705723e6be	panfrost: Verify and print brx condition in disasm The condition code in extended branches is repeated 8 times for unclear reasons; accordingly, the code would be disassembled as "unknown5555", "unknownAAAA", etc. This patch correctly masks off the lower two bits to find the true code to print, verifying that the code is repeated as believed to be necessary (providing some assurance for compiler quality and an assert trip in case we encounter a shader in the wild that breaks the convention). Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-02-21 07:09:06 +00:00
Alyssa Rosenzweig	779e140b1a	panfrost: Dynamically set discard branch targets discard and discard_if are both implemented with the branching pipeline on Midgard; essentially, we branch to the end of the fragment shader in a special "discard" mode, setting the condition as necessary. Previously, we hardcoded the form of this instruction, which worked for very simple shaders but was incorrect for anything remotely interesting. This patch instead emits logical branches in the IR, which are flattened to real discard ops the same way other branches are, allowing targets to be computed correctly. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-02-21 07:08:59 +00:00
Alyssa Rosenzweig	5abb7b559e	panfrost/midgard: Emit extended branches Previously, we only emitted compact branches; however, the offset range of these branches is too small for many real world shaders. This patch implements support for emitting extended branches and switches to always using them for control flow. This incurs a code size and possibly performance penalty, but expands the range of working shaders and provides opportunity for further optimization. Support for emitting compact branches is retained but this code path is presently unused. In the future, we'll want to heuristically determine which type of branch should be emitted for optimal codegen. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-02-21 07:08:47 +00:00
Alyssa Rosenzweig	813bb34fd8	panfrost: Rectify doubleplusungood extended branch Midgard features "compact branches" and "extended branches", corresponding to short jumps and far jumps respectively. The form of the extended branch was previously incorrect in the ISA headers; this patch corrects it and updates the disassembler (simultaneous to preserve bisectability). Additionally, we fix a corner case in the disassembly of extended branches, and we now prefix extended branches with "brx", to visually differentiate from compact branches prefixed with "br". Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-02-21 07:07:39 +00:00
Alyssa Rosenzweig	2c74709517	panfrost/midgard: Fix nested/chained if-else An if-else statement is compiled to a conditional branch (from the start to the second block) and an unconditional branch (from the end of the first block to the end of the else). We previously incorrectly computed the block index of the unconditional branch to be exactly one after that of the conditional branch, valid for a single if-else statement but nothing fancier. This patch correctly computes the unconditional branch target, fixing more complex if-else chains. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-02-21 07:06:26 +00:00
Alyssa Rosenzweig	5e55c11a1b	panfrost/midgard: Refactor tag lookahead code Each Midgard instruction is scheduled to a particular instruction type ("tag"). Presumably the hardware prefetches memory based on tag, so it is required to report out the first tag to the command stream and the next tag of a branch target. This procedure was implemented in two separate parts of the compiler (one time with a slight bug relating to empty blocks); this patch refactors to unite the two routines and solve the bug when branching to empty blocks. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-02-21 07:05:59 +00:00
Alyssa Rosenzweig	396eb1440a	panfrost: Implement pantrace (command stream dump) Historically, Panfrost debugging entailed the use of the LD_PRELOADable `panwrap` tool. This setup is a tad fragile; Panfrost can be traced directly without the intermediate layer. pantrace implements the equivalent functionality of panwrap in Panfrost proper, allowing dumps to work regardless of the kernel layer in use. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-02-21 07:03:21 +00:00
Alyssa Rosenzweig	f611782045	panfrost: Add pandecode (command stream debugger) The `panwrap` utility can be LD_PRELOAD'd into a GLES app, intercepting communication between the driver and the kernel. Modern panwrap versions do no processing of their own; instead, they create a trace directory. This directory contains the following files: - control.log: a line-by-line plain text file, denoting important syscalls (mmaps and job submits) along with their arguments - memory_.bin, shader_.bin: binary dumps of mapped memory Together, these files contain enough information to reconstruct the command stream and shaders of (at minimum) a single frame. The `pandecode` utility takes this directory structure as input, reconstructing the mapped memory and using the job submit command as an entrypoint. It then walks the descriptors as the hardware would, parsing and pretty-printing. Its final output is the pretty-printed command stream interleaved with the disassembled shaders, suitable for driver debugging. For instance, the behaviour of two driver versions (one working, one broken) can be compared by diff'ing their decoded logs. pandecode/decode.c was originally a part of `panwrap`; it is the oldest living code in the project. Its history is generally not worth preserving. panwrap itself will continue to live downstream for the foreseeable future, as it is specifically written for the vendor kernel. It is possible, however, to produce equivalent traces directly from Panfrost, bypassing the intermediate wrapping layer for well-behaved drivers. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-02-21 07:01:48 +00:00
Alyssa Rosenzweig	fb3bbd0c1c	panfrost: Stub out separate stencil functions This is not yet functional, but it resolves a crash in various apps and provides a framework for further work. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-02-21 06:58:50 +00:00
Marek Olšák	edbd2c1ff5	radeonsi: use SDMA for uploading data through const_uploader v2: use tc.stream_uploader in si buffer_transfer_map if not called from the driver thread Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> (v1) Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>	2019-02-20 21:04:29 -05:00
Marek Olšák	54f7545cd7	gallium/u_upload_mgr: allow use of FLUSH_EXPLICIT with persistent mappings for radeonsi Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>	2019-02-20 21:04:29 -05:00
Marek Olšák	dc8a2c139d	gallium/u_threaded: always unmap const_uploader radeonsi will require this. It's a no-op for drivers supporting persistent mappings. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>	2019-02-20 21:04:29 -05:00
Marek Olšák	8ef6f68fa5	st/mesa: always unmap the uploader in st_atom_array.c This is a no-op for drivers supporting persistent mappings. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>	2019-02-20 21:04:29 -05:00
Jason Ekstrand	1a93fc382b	nir/xfb: Handle compact arrays in gather_xfb_info This makes us properly handle gl_ClipDistance and gl_CullDistance. Fixes: `19064b8c` "nir: Add a pass for gathering transform feedback info" Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>	2019-02-21 00:08:42 +00:00
Jason Ekstrand	558c314504	nir/xfb: Work in terms of components rather than slots We needed to better handle cases where a chunk of a variable starts at some non-zero location_frac and rolls over into the next slot but may not be more than 4 dwords. For example, if gl_CullDistance is an array of 3 things and has location_frac = 2, it will span across two vec4s but is not, itself, bigger than a vec4. If you ignore the clip/cull special case, it's not allowed to happen for anything else because the only things that can span more than one slot are dvec3 and dvec4 and they're both bigger than a vec4. The current code uses this attrib_slot thing where we count attribute slots and iterate over them. However, that doesn't work in the case above because gl_CullDistance will have an attrib_slot count of 1 even though it does span two slots. We could fix this by adjusting attrib_slot but we already have comp_mask and it's easier to just handle it that way. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>	2019-02-21 00:08:42 +00:00
Jason Ekstrand	4e69fba534	nir: Rewrite lower_clip_cull_distance_arrays to do a lot less lowering Instead of going to all the work of combining them into one array, just make two arrays and use location_frac to colocate them within CLIP0. Then the back-end can sort things out and stack them on top of each other. Thanks to `ef99f4c8`, we also don't need to set compact anymore. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-02-21 00:08:42 +00:00
Jason Ekstrand	8f0fe71cc5	nir/xfb: Properly align 64-bit values Fixes: `19064b8c` "nir: Add a pass for gathering transform feedback info" Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>	2019-02-21 00:08:42 +00:00
Jason Ekstrand	30b548fc62	compiler/types: Add a contains_64bit helper Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>	2019-02-21 00:08:42 +00:00
Rob Clark	323958908e	freedreno/a6xx: samplerBuffer fixes Use the 'UNK31' bit (which should probably be called 'BUFFER') for samplerBuffer case, which increases the size of supported buffer texture beyond 2^15 elements. Also need to fix the 2nd coord injected to handle the tex instructions that take integer coords. Fixes dEQP-GLES31.functional.texture.texture_buffer.render.as_fragment_texture.buffer_size_131071 and similar Signed-off-by: Rob Clark <robdclark@gmail.com>	2019-02-20 18:50:08 -05:00
Rob Clark	50dd773a2d	freedreno/ir3/a6xx: use ldib for ssbo reads ... instead of isam. It seems like when using isam, plus atomics, we can have the problem of old data being in the texture cache. Plus this way we don't have to load a component at a time. Note that blob still seems to use isam in some cases. I suppose it might be preferable in the case of loading a single component, when atomics are not in the picture (or that the ssbo does not need to otherwise be coherent). Signed-off-by: Rob Clark <robdclark@gmail.com>	2019-02-20 18:50:08 -05:00
Rob Clark	c543a2cf6f	freedreno/ir3: sync instr/disasm and add ldib encoding Resync disasm and instr header from envytools, and add ldib encoding. This replaces an opcode from a3xx which was never seen in practice, since that seemed easier than dealing with the same opc # meaning a different thing on a6xx. (Not really sure if 'sti' was actually a real thing, I think it was only seen in fuzzing.) Signed-off-by: Rob Clark <robdclark@gmail.com>	2019-02-20 18:50:08 -05:00
Rob Clark	cadf6def0c	freedreno/ir3/a6xx: fix load_ssbo barrier type. Silly copy/pasta bug, since load_image is actually the same instruction but different barrier class. Signed-off-by: Rob Clark <robdclark@gmail.com>	2019-02-20 18:50:08 -05:00
Rob Clark	0df0fc28a5	freedreno/ir3: rename put_dst() This was overlooked when it moved to ir3_context.c and ceased to be static.. Signed-off-by: Rob Clark <robdclark@gmail.com>	2019-02-20 18:50:08 -05:00
Rob Clark	7fe9e790e7	freedreno: fix crash w/ masked non-SSA dst Fixes dEQP-GLES3.functional.shaders.indexing.varying_array.vec3_dynamic_write_dynamic_loop_read regression. Fixes: `c1a27ba9ba` freedreno/ir3: HIGH reg w/a for a6xx Signed-off-by: Rob Clark <robdclark@gmail.com>	2019-02-20 18:50:08 -05:00
Rob Clark	8c486083d0	freedreno/a6xx: 3d and cube image fixes Fixes dEQP-GLES31.functional.image_load_store.{3d,cube}.store.* and a bunch more Signed-off-by: Rob Clark <robdclark@gmail.com>	2019-02-20 18:50:08 -05:00
Rob Clark	97479df8aa	freedreno/ir3: fix crash in compile fail case The variant will be NULL if RA failed. Which isn't ideal, but at least let's not segfault and bring down the rest of the dEQP run with us. Signed-off-by: Rob Clark <robdclark@gmail.com>	2019-02-20 18:50:08 -05:00
Rob Clark	f5ee8c54ed	freedreno/ir3: fix legalize for vecN inputs The wrmask is handled in regmask_get()/regmask_set(), but it wasn't being propagated from SSA src to dst. So for example, an SSBO read value that is passed in as src2.y component to atomic op, wasn't getting the (sy) flag set. Causing lots of fail. Signed-off-by: Rob Clark <robdclark@gmail.com>	2019-02-20 18:50:08 -05:00
Bas Nieuwenhuizen	688f5e456a	radv: Disable depth clamping even without EXT_depth_range_unrestricted. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-02-20 23:24:31 +00:00
Bas Nieuwenhuizen	9f7e0523ce	radv: Implement VK_EXT_depth_clip_enable. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-02-20 23:24:31 +00:00
Timothy Arceri	03783253b1	nir: remove non-ssa support from nir_copy_prop() Even in a very basic shader this reduces the time spent in nir_copy_prop() by ~17%. No shader-db changes for radeonsi NIR or i965. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-02-21 10:18:24 +11:00
Bas Nieuwenhuizen	1ef2855692	radv: Handle clip+cull distances more generally as compact arrays. Needed for https://gitlab.freedesktop.org/mesa/mesa/merge_requests/248 . That MR keeps the clip and cull arrays split. So we have to handle - compact arrays with location_frac != 0 - VARYING_SLOT_CLIP_DIST1 Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-02-20 22:49:52 +00:00
Eric Anholt	8cfc17bdda	kmsro: Add the rest of the current set of tinydrm drivers. While I haven't tested them all, given that they're all using the same allocation paths and modifiers in the kernel they should be fine to use in the same way. v2: Rebase on other kmsro changes. v3: Skip repeated '[with_gallium_kmsro,' in the meson build. Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-02-20 21:49:41 +00:00
Andrii Simiklit	f4f4ec941e	i965: re-emit index buffer state on a reset option change. It seems we forget to update the index buffer (ib) status, so the IndexedDrawCutIndexEnable or CutIndexEnable flag is left unchanged. This leads to glEnable/glDisable calls for GL_PRIMITIVE_RESTART being ignored in some cases. The index buffer (ib) status should be re-emitted after the reset option change to avoid unexpected behavior. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109451 Cc: <mesa-stable@lists.freedesktop.org> Signed-off-by: Andrii Simiklit <andrii.simiklit@globallogic.com> Signed-off-by: Andrii Simiklit <asimiklit.work@gmail.com>	2019-02-20 20:27:56 +02:00
Kenneth Graunke	d6337b59f6	nir: Don't forget if-uses in new nir_opt_dead_cf liveness check Commit `08bfd710a2` (nir/dead_cf: Stop relying on liveness analysis) introduced a new check that iterated through a SSA def's uses, to see if it's used. But it only checked normal uses, and not uses which are part of an 'if' condition. This led to it thinking more nodes were dead than possible. Fixes Piglit's variable-indexing/tcs-output-array-float-index-wr test (and related tests) with the out-of-tree Iris driver. Fixes: `08bfd710a2` nir/dead_cf: Stop relying on liveness analysis Reviewed-by: Connor Abbott <cwabbott0@gmail.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-02-20 09:44:06 -08:00
Kristian H. Kristensen	b9eed05e7f	freedreno/a6xx: Support MSAA resolve blits on blitter This gets stencil and depth resolves working properly. Fixes: dEQP-GLES3.functional.fbo.msaa.2_samples.depth32f_stencil8 dEQP-GLES3.functional.fbo.msaa.2_samples.depth24_stencil8 dEQP-GLES3.functional.fbo.msaa.4_samples.depth32f_stencil8 dEQP-GLES3.functional.fbo.msaa.4_samples.depth24_stencil8 dEQP-GLES3.functional.fbo.invalidate.whole.unbind_blit_msaa_color dEQP-GLES3.functional.fbo.invalidate.sub.unbind_blit_msaa_color Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>	2019-02-20 08:56:21 -08:00
Kristian H. Kristensen	686211f4c9	freedreno/a6xx: Copy stencil as R8_UINT Blitter does support it after all. Previous attempt to use R8_UINT failed because we overwrote the a6xx format in emit_blit_texture(), but some of the later setup still looked at the gallium format. If we overwrite it in the pipe_blit_info before we even call into emit_blit_texture() it works properly. Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>	2019-02-20 08:56:21 -08:00
Kristian H. Kristensen	e827ea8c83	freedreno: Update headers Add support for multisampled sources for the blitter. Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>	2019-02-20 08:56:21 -08:00
Eric Engestrom	a16c398668	anv: use anv_shader_bin_write_to_blob()'s return value Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-02-20 16:40:13 +00:00
Eric Engestrom	d3115f34a6	anv: drop unused imports Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-02-20 14:28:55 +00:00
Eric Engestrom	8cbfcab425	anv: make sure the extensions stay sorted Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-02-20 14:28:55 +00:00
Eric Engestrom	bc76ce1033	anv: sort vendors extensions after KHR and EXT Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-02-20 14:28:55 +00:00
Eric Engestrom	427aa9d154	anv: sort extensions alphabetically Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-02-20 14:28:55 +00:00
Tapani Pälli	886cee1f96	anv: refactor error handling in anv_shader_bin_write_to_blob() v2: blob manages error state internally, just return true if errors did not occur (Jason) CID: 1442546 Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-02-20 15:39:19 +02:00
Carlos Garnacho	30a01cd923	wayland/egl: Ensure EGL surface is resized on DRI update_buffers() Fullscreening and unfullscreening a totem window while playing a video sometimes results in the video subsurface not changing size along with it. This is also reproducible with epiphany. If a surface gets resized while we have an active back buffer for it, the resized dimensions will be neither immediately applied in the resize callback nor correctly synchronized in update_buffers(), as the (now stale) surface size and currently attached buffer size still match. There are actually two things to synchronize here: first, the surface query size might not be updated yet to the wl_egl_window's (i.e. resize_callback happened while there is a back buffer), and second, the wayland buffers need dropping if the new surface size differs from the currently attached buffer. These are done in separate steps now. https://bugzilla.redhat.com/show_bug.cgi?id=1650929 https://bugs.freedesktop.org/show_bug.cgi?id=109594 Fixes: `a9fb331ea7` ("wayland/egl: update surface size on window resize") Signed-off-by: Carlos Garnacho <carlosg@gnome.org> Reviewed-by: Juan A. Suarez <jasuarez@igalia.com> Reviewed-by: Daniel Stone <daniels@collabora.com> Tested-by: Bastien Nocera <hadess@hadess.net> Tested-by: Denys Kostin <denys.kostin@globallogic.com>	2019-02-20 12:04:33 +01:00
Lionel Landwerlin	f509213675	anv: implement VK_EXT_depth_clip_enable A new extension allowing the user to explicitly specify the clipping behavior. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-02-20 09:57:58 +00:00
Lionel Landwerlin	fa4e103c32	vulkan: Update the XML and headers to 1.1.101	2019-02-20 09:57:58 +00:00
Samuel Iglesias Gonsálvez	63a919a3ce	isl: remove the cache line size alignment requirement The cacheline size was a requirement for using the BLT engine, which we don't use anymore except for a few things on old HW, so we drop it. Fixes CTS's CL#3500 test: dEQP-VK.api.image_clearing.core.clear_color_image.2d.linear.single_layer.r8g8b8_unorm Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-02-20 08:28:31 +01:00
Bas Nieuwenhuizen	572854e706	radv: Clean up a bunch of compiler warnings. Random unused vars. Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-02-20 03:21:09 +01:00
Bas Nieuwenhuizen	7631feaa00	radv: Sync ETC2 whitelisted devices. Fixes: `4bb6c49375` "radv: Allow ETC2 on RAVEN and VEGA10 instead of all GFX9." Reviewed-by: Dave Airlie <airlied@redhat.com>	2019-02-20 02:55:41 +01:00
Timothy Arceri	3d7611e9a6	st/nir: use NIR for asm programs This uses prog_to_nir to translate ARB assembly programs to NIR. Co-authored by Tim Arceri, Dave Airlie, and Ken Graunke: - [Tim Arceri]: original patch - [Dave Airlie]: fix crashes with parameter names - [Ken Graunke]: - Rebase on SCALAR_ISA cap, lower wpos_ytransform too. - Rebase on streamout fixes. - Lower system values for fragcoord support. - Don't try to use prog_to_nir for ATI_fragment_shader programs. - Create TGSI for fixed-function or ARB vertex shaders even if the driver prefers NIR, so we can create draw module shaders for feedback/select emulation, which rely on TGSI. Tested on: - iris (Intel Skylake/Kabylake): Piglit & GL CTS - Ken Graunke - radeonsi (AMD Vega 64): Piglit - Ken Graunke - vc4/v3d - Piglit - Eric Anholt - freedreno - dEQP - Kristian Høgsberg Fixes lit_degenerate_case on vc4 and v3d, and vp-address-01, vp-arl-constant-array-huge-offset-neg, and vp-arl-neg-array on v3d. No Piglit regressions on radeonsi; no dEQP regressions on freedreno. Acked-by: Eric Anholt <eric@anholt.net> Tested-by: Eric Anholt <eric@anholt.net> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-02-19 15:56:26 -08:00
Kenneth Graunke	3b4929ec6e	st/mesa: Copy VP TGSI tokens if they exist, even for NIR shaders. Even if the driver wants to use NIR shaders, we may need to have TGSI tokens for creating draw module vertex shaders for the feedback/select render modes. So...if the st_vertex_program has any TGSI...copy it to the variant. Acked-by: Eric Anholt <eric@anholt.net> Tested-by: Eric Anholt <eric@anholt.net> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-02-19 15:56:19 -08:00
Kenneth Graunke	ba7519ca36	radeonsi: Go back to using llvm.pow intrinsic for nir_op_fpow ARB_vertex_program and ARB_fragment_program define 0^0 = 1 (while GLSL leaves it undefined). Performing fpow lowering in NIR would break this behavior, preventing us from using prog_to_nir. According to llvm/lib/Target/AMDGPU/SIInstructions.td, POW_common expands to <V_LOG_F32_e32, V_EXP_F32_e32, V_MUL_LEGACY_F32_e32>, which presumably does a zero-wins multiply. Lowering in NIR results in a non-legacy multiply, where: pow(0, 0) = 2^(log2(0) * 0) = 2^(-INF * 0) = 2^(-NaN) = -NaN which isn't the desired result. This reverts: - commit `d6b7539206` (ac/nir: remove emission of nir_op_fpow) - commit `22430224fe` (radeonsi/nir: enable lowering of fpow) and prevents a regression in gl-1.0-spot-light with AMD_DEBUG=nir after enabling prog_to_nir in st/mesa later in this series. Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-02-19 15:56:19 -08:00
Timothy Arceri	9c4d5926aa	radeonsi/nir: set shader_buffers_declared properly Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-02-20 10:46:19 +11:00
Timothy Arceri	94a3df62d7	radeonsi/nir: set colors_read properly shader-db results for VEGA64: Totals from affected shaders: SGPRS: 1976 -> 1976 (0.00 %) VGPRS: 1240 -> 1144 (-7.74 %) Spilled SGPRs: 145 -> 145 (0.00 %) Spilled VGPRs: 0 -> 0 (0.00 %) Private memory VGPRs: 0 -> 0 (0.00 %) Scratch size: 0 -> 0 (0.00 %) dwords per thread Code Size: 34632 -> 34604 (-0.08 %) bytes LDS: 0 -> 0 (0.00 %) blocks Max Waves: 261 -> 285 (9.20 %) Wait states: 0 -> 0 (0.00 %) Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-02-20 10:46:19 +11:00
Timothy Arceri	05cc1dd764	radeonsi/nir: set input_usage_mask properly shader-db results for VEGA64: Totals from affected shaders: SGPRS: 791528 -> 792616 (0.14 %) VGPRS: 421624 -> 410784 (-2.57 %) Spilled SGPRs: 1639 -> 1674 (2.14 %) Spilled VGPRs: 0 -> 0 (0.00 %) Private memory VGPRs: 0 -> 0 (0.00 %) Scratch size: 0 -> 0 (0.00 %) dwords per thread Code Size: 16103516 -> 16063696 (-0.25 %) bytes LDS: 0 -> 0 (0.00 %) blocks Max Waves: 136307 -> 137830 (1.12 %) Wait states: 0 -> 0 (0.00 %) Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-02-20 10:46:19 +11:00
Timur Kristóf	9429bcc4b0	radeonsi/nir: Use uniform location when calculating const_file_max. The nine state tracker can produce NIR uniform variables whose location is explicitly set. radeonsi did not take that into account when calculating const_file_max, resulting in rendering glitches. This patch fixes that. Signed-Off-By: Timur Kristóf <timur.kristof@gmail.com> Tested-by: Andre Heider <a.heider@gmail.com> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-02-20 10:37:47 +11:00
Mario Kleiner	afb15d14ca	drirc: Add sddm-greeter to adaptive_sync blacklist. This is the sddm login screen. Fixes: `a9c36dbf9c` ("drirc: Initial blacklist for adaptive sync") Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com> Cc: 19.0 <mesa-stable@lists.freedesktop.org> Signed-off-by: Marek Olšák <marek.olsak@amd.com>	2019-02-19 18:03:05 -05:00
Marek Olšák	bff8da6c59	driconf: add Civ6Sub executable for Civilization 6 I'm getting Civ6Sub instead of Civ6. Cc: 18.3 19.0 <mesa-stable@lists.freedesktop.org> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-02-19 17:59:17 -05:00
Marek Olšák	ae21bdf47c	radeonsi: always enable NIR for Civilization 6 to fix corruption Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104602 Cc: 18.3 19.0 <mesa-stable@lists.freedesktop.org> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-02-19 17:59:17 -05:00
Marek Olšák	ccbfe44e5f	radeonsi: add driconf option radeonsi_enable_nir Cc: 18.3 19.0 <mesa-stable@lists.freedesktop.org> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-02-19 17:59:17 -05:00
Kenneth Graunke	f9c835eb56	mesa: Align doubles to a 64-bit starting boundary, even if packing. In the new Intel Iris driver, I am using Tim's new packed uniform storage system. It works great, with one caveat: our scalar compiler backend assumes that uniform offsets will be aligned to the underlying data type. For example, doubles must be 64-bit aligned, floats 32-bit, half-floats 16-bit, and so on. It does not need any other padding. Currently, _mesa_add_parameter aligns everything to 32-bit offsets, creating doubles that have an unaligned offset. This patch alters that code to align doubles to 64-bit offsets. This may be slightly less optimal for drivers which can support full packing, and allow reads from unaligned offsets at no penalty. We could make this extra alignment optional. However, it only comes into play when intermixing double and single precision uniforms. Doubles are already not too common, and intermixed values (floats then doubles) is probably even less common. At most, we burn a single 32-bit slot to the alignment, which is not that expensive. So, it doesn't seem worthwhile to add the extra complexity. Eventually, we'll likely want to update this code to allow half-float values to be packed tighter than 32-bit offsets. At that point, we'll probably want to revisit what drivers ultimately want, and add options. Acked-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-02-19 13:26:58 -08:00
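The alignment rule from `f9c835eb56` can be sketched as follows. This is an illustration under stated assumptions, not the `_mesa_add_parameter` code: offsets are counted in 32-bit slots, and `align_slot()` and `place_uniform()` are hypothetical helper names.

```c
#include <assert.h>

/* Round offset up to a multiple of `slots`; `slots` must be a power of two. */
unsigned align_slot(unsigned offset, unsigned slots)
{
    assert(slots != 0 && (slots & (slots - 1)) == 0);
    return (offset + slots - 1) & ~(slots - 1);
}

/* Place one uniform in packed storage and return its starting slot.
 * 32-bit values go at the next free slot; 64-bit values are first
 * aligned to an even slot, i.e. a 64-bit boundary. */
unsigned place_uniform(unsigned *next_slot, int is_64bit, unsigned components)
{
    unsigned slots_per_component = is_64bit ? 2 : 1;
    unsigned start = is_64bit ? align_slot(*next_slot, 2) : *next_slot;

    *next_slot = start + components * slots_per_component;
    return start;
}
```

Placing a float and then a double puts the float in slot 0 and the double in slot 2, so a single 32-bit slot is spent on padding, as the commit message notes.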
Kenneth Graunke	3c2c6bd1c7	compiler: Make is_64bit(GL_*) helper more broadly available I'd like to use this in the prog_parameter.c code, so I need to move it into C, make it non-static, and so on. This probably isn't the ideal place for it, but I couldn't think of a better one. Acked-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-02-19 13:26:58 -08:00
Eric Engestrom	daf8ada08d	gitlab-ci: automatically run the CI on pushes to `ci/` branches Last commit limited the CI to master and MRs, but to avoid having to manually trigger CI runs, let's add a 3rd, automatic way: by pushing to a branch named `ci/` (or `ci-*` or just `ci`) (which you can delete afterwards, the pipeline results will remain). Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>	2019-02-19 16:57:32 +00:00
Eric Engestrom	861ade7042	gitlab-ci: limit the automatic CI to master and MRs Runs on random other branches (stables RCs, personal forks) can still be triggered manually via the web interface, or an app using the API. This should massively help with the current voracious state of our CI. Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>	2019-02-19 16:57:28 +00:00
Eric Engestrom	f84f833981	tegra/autotools: add missing libdrm cflags Fixes: `f1374805a8` "drm-uapi: use local files, not system libdrm" Bug: https://bugs.freedesktop.org/show_bug.cgi?id=109647 Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2019-02-19 13:29:05 +00:00
Eric Engestrom	b787403a21	tegra/meson: add missing dep_libdrm Fixes: `f1374805a8` "drm-uapi: use local files, not system libdrm" Bug: https://bugs.freedesktop.org/show_bug.cgi?id=109645 Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2019-02-19 13:29:00 +00:00
Rhys Perry	238730daef	ac/nir: implement half-float nir_op_ldexp Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-02-19 11:04:46 +00:00
Rhys Perry	6971e8d342	ac/nir: implement half-float nir_op_frsq v2: don't use ac_get_onef() Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-02-19 11:04:41 +00:00
Rhys Perry	2038aec22a	ac/nir: implement half-float nir_op_frcp v2: don't use ac_get_onef() Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-02-19 11:04:35 +00:00
Rhys Perry	4261edc067	ac/nir: make ac_build_fdiv support 16-bit floats v2: don't use ac_get_onef() Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-02-19 11:04:29 +00:00
Rhys Perry	6790b3a8db	ac/nir: make ac_build_isign work on all bit sizes v2: don't use ac_get_zero(), ac_get_one() and ac_int_of_size() Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-02-19 11:04:20 +00:00
Rhys Perry	bbbfdef683	ac/nir: make ac_build_clamp work on all bit sizes v2: don't use ac_get_zerof() and ac_get_onef() v3: rename "intr" to "name" Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-02-19 11:03:58 +00:00
Rhys Perry	7e5004e30a	ac/nir: fix 64-bit nir_op_f2f16_rtz Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-02-19 11:03:44 +00:00
Rhys Perry	c4ea20c0a0	ac/nir: implement 8-bit nir_load_const_instr Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-02-19 11:03:33 +00:00
Rhys Perry	0ca550e01a	radv: ensure export arguments are always float So that the signature is correct and consistent, the inputs to an export intrinsic should always be 32-bit floats. This and the previous commit fix a large number of crashes in dEQP-VK.spirv_assembly.instruction.graphics.16bit_storage.input_output_int_* tests Fixes: `b722b29f10` ('radv: add support for 16bit input/output') Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-02-19 11:03:22 +00:00
Rhys Perry	64065aa504	radv: bitcast 16-bit outputs to integers 16-bit outputs are stored as 16-bit floats in the outputs array, so they have to be bitcast. Fixes: `b722b29f10` ('radv: add support for 16bit input/output') Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-02-19 11:03:18 +00:00
Eric Engestrom	23b485c920	gitlab-ci: use ccache to speed up builds Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>	2019-02-19 10:09:51 +00:00
Eric Anholt	dbe3af67a4	v3d: Move i2b and f2b support into emit_comparison. This lets us save a resolve to NIR true/false for ifs and discard_if. No change in shader-db.	2019-02-18 18:18:37 -08:00
Eric Anholt	0bba9c8489	v3d: Emit a simpler negate for the iabs implementation. One program affected in my shader-db. instructions in affected programs: 110 -> 108 (-1.82%)	2019-02-18 18:13:09 -08:00
Eric Anholt	1a775d43c9	v3d: Delay emitting ldvpm on V3D 4.x until it's actually used. For V3D 3.x, we emitted the ldvpms all at the top so that we didn't need to do VPM setup when the load_inputs are out of order. For V3D 4.x, we can reduce register pressure by delaying our loads until they're actually needed. This also avoids a bunch of silly MOVs in the pre-opt VIR dump. total instructions in shared programs: 6421415 -> 6419933 (-0.02%) total uniforms in shared programs: 2393139 -> 2393140 (<.01%) total threads in shared programs: 153864 -> 153906 (0.03%)	2019-02-18 18:09:07 -08:00
Eric Anholt	5a84d46896	v3d: Stop tracking num_inputs for VPM loads. It's unused in the VS (since we need vattr_sizes[] anyway), so move it to FS prog data.	2019-02-18 18:09:07 -08:00
Eric Anholt	581eba072d	v3d: Add a function to describe what the c->execute.file check means. This is what pointed out that we were misusing the check for last_thrsw in the previous commit.	2019-02-18 18:09:07 -08:00
Eric Anholt	441294962c	v3d: Fix the check for "is the last thrsw inside control flow" The execute.file check used to be good enough, until I stopped setting up the execute mask for uniform ifs. No known tests fixed, noticed while doing a refactor. Fixes: `0805060573` ("v3d: Handle dynamically uniform IF statements with uniform control flow.")	2019-02-18 18:09:07 -08:00
Eric Anholt	07d5b5a972	v3d: Fix f2b32 behavior. Now that we don't have the vir_PF() magic, it's obvious that we were doing the wrong thing for f2b32 by allowing -0.0 to produce true instead of false.	2019-02-18 18:09:07 -08:00
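The -0.0 case in `07d5b5a972` is easy to get wrong. The sketch below shows one way such a bug can arise in general; it is an assumption for illustration and does not claim to mirror what the v3d backend actually emitted. Both function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* Integer-style test on the raw bits: -0.0f has the sign bit set,
 * so its bit pattern is non-zero and this wrongly reports true. */
bool f2b_bitwise(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits != 0;
}

/* Float comparison: -0.0f compares equal to 0.0f, so this reports false. */
bool f2b_float_compare(float f)
{
    return f != 0.0f;
}
```

The two versions agree on every input except -0.0f, which is why the difference tends to go unnoticed.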
Eric Anholt	3022b4bd82	v3d: Kill off vir_PF(), which is hard to use right. You were allowed to pass in any old temp so that you could hopefully fold the PF up into the def of the temp. If we couldn't find one, it implicitly generated a MOV(nop, reg). However, that PF could have different behavior depending on whether the def being folded into was a float or int opcode, which the caller doesn't necessarily control. Due to the fragility of the function, just switch all callers over to vir_set_pf(). This also encourages the callers to use a _dest call for the inst they're putting the PF on, eliminating a bunch of temps in the pre-optimization VIR. shader-db says the change is in the noise: total instructions in shared programs: 6226247 -> 6227184 (0.02%) instructions in affected programs: 851068 -> 852005 (0.11%)	2019-02-18 18:09:06 -08:00
Eric Anholt	6186a8d44e	v3d: Do bool-to-cond for discard_if as well. Turns this minimal conditional discard (glsl-fs-discard-01.shader_test): 0x3de0b086c5fe9000 fcmp.pushn -, r1, r5; mov r2, 0 0x3dec3086bbfc001f nop ; mov.ifa r2, -1 0x3c047186bbe80000 nop ; mov.pushz -, r2 0x3dea3186ba837000 setmsf.ifna -, 0 ; nop into: 0x3c00b186c582a000 fcmp.pushn -, r2, r5; nop 0x3de83186ba837000 setmsf.ifa -, 0 ; nop total instructions in shared programs: 6229820 -> 6226247 (-0.06%)	2019-02-18 18:09:06 -08:00
Eric Anholt	718eef62cb	v3d: Refactor bcsel and if condition handling. Both were doing the same thing to try to get a condition to predicate on. Noticed when I wanted to do this for discard_if as well. No change in shader-db.	2019-02-18 18:09:06 -08:00
Eric Anholt	4586f9f902	v3d: Add a helper function for getting a nop register. Just a little refactor to explain what's going on with QFILE_NULL.	2019-02-18 18:09:06 -08:00
Eric Anholt	339155122b	v3d: Drop our hand-lowered nir_op_ffract. The NIR lowering works fine, though it causes some slight noise due to what looks like choices about propagating constants up multiply chains changing. total instructions in shared programs: 6229671 -> 6229820 (<.01%) total uniforms in shared programs: 2312171 -> 2312324 (<.01%)	2019-02-18 18:09:06 -08:00
Eric Anholt	16f5085490	v3d: Drop a perf note about merging unpack_half_*, which has been implemented. This is handled with copy-propagation now.	2019-02-18 18:09:06 -08:00
Eric Anholt	146e432b49	v3d: Fix incorrect flagging of ldtmu as writing r4 on v3d 4.x. Fixes some stalls in 3DMMES's main vertex shader. total instructions in shared programs: 6280751 -> 6211270 (-1.11%) instructions in affected programs: 2935050 -> 2865569 (-2.37%)	2019-02-18 18:09:06 -08:00
Eric Anholt	cd5e0b2729	v3d: Use the early_fragment_tests flag for the shader's disable-EZ field. Apparently we need disable-EZ flagged, not just "does Z writes". Fixes dEQP-GLES31.functional.image_load_store.early_fragment_tests.no_early_fragment_tests_depth_fbo on 7278, even though it passed in simulation. Signed-off-by: Eric Anholt <eric@anholt.net> Fixes: `051a41d3d5` ("v3d: Add support for the early_fragment_tests flag.")	2019-02-18 18:09:06 -08:00
Eric Anholt	332b969c4e	v3d: Sync indirect draws on the last rendering. Fixes intermittent fails in dEQP-GLES31.functional.draw_indirect.compute_interop.separate.drawelements_compute_cmd_and_data_and_indices and others (particularly when run as part of a CTS run)	2019-02-18 18:09:06 -08:00
Eric Anholt	32f16b0b1e	v3d: Clear the GMP on initialization of the simulator. Otherwise, we might have pages accessible that shouldn't be and miss out on errors. This is unlikely for most tests since v3d_hw_get_mem() is big enough that it'll be a freshly zeroed mmap, but if screens are destroyed and recreated then we'd be reusing the old v3d_hw_get_mem() contents.	2019-02-18 18:09:06 -08:00
Emil Velikov	ba652394a3	docs: update calendar, add news item and link release notes for 18.3.4 Signed-off-by: Emil Velikov <emil.velikov@collabora.com>	2019-02-18 18:38:14 +00:00
Emil Velikov	d7108dac73	docs: add sha256 checksums for 18.3.4 Signed-off-by: Emil Velikov <emil.velikov@collabora.com> (cherry picked from commit `bfb5bdaa97`)	2019-02-18 18:36:23 +00:00
Emil Velikov	a1ccff4aaf	docs: add release notes for 18.3.4 Signed-off-by: Emil Velikov <emil.velikov@collabora.com> (cherry picked from commit `b26488dead`)	2019-02-18 18:36:21 +00:00
Ilia Mirkin	57441af8bf	i965: always enable EXT_float_blend From the table in isl_format.c, it appears that all generations support blending on 32-bit float surfaces. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-02-18 12:13:54 -05:00
Ilia Mirkin	9fec653093	st/mesa: enable GL_EXT_float_blend when possible If the driver supports PIPE_BIND_BLENDABLE on RGBA32F, flip EXT_float_blend on (which will affect ES3 contexts). Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>	2019-02-18 12:13:54 -05:00
Ilia Mirkin	070a5e5d92	mesa: add explicit enable for EXT_float_blend, and error condition If EXT_float_blend is not supported, error out on blending of FP32 attachments in an ES2 context. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-02-18 12:13:54 -05:00
Samuel Pitoiset	47616810ed	radv: fix writing the alpha channel of MRT0 when alpha coverage is enabled This version is better and safer. Cc: 18.3 19.0 <mesa-stable@lists.freedesktop.org> Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-02-18 18:06:07 +01:00
Rob Clark	d6c43cceff	freedreno/ir3: handle quirky atomic dst for a6xx The new encoding returns a value via the 2nd src. The legalize pass needs to be aware of this to set the correct needs_sy flag, otherwise we can, in cases where the atomic dst is not used, overwrite the register that hardware will asynchronously load result into without (sy) flag, so it gets clobbered by the atomic result. This fixes a whole lot of rando ssbo+atomic fails, like dEQP-GLES31.functional.ssbo.layout.single_basic_type.packed.highp_vec4. Signed-off-by: Rob Clark <robdclark@gmail.com>	2019-02-18 12:01:36 -05:00
Rob Clark	28fc6733cd	freedreno/a6xx: fix helper_invocation (sampler mask/id) Since gl_HelperInvocation is lowered to: !((1 << sample_id) & sample_mask_in) Not setting these enable bits was causing it to be broken. (And probably a bunch of other stuff too.) Fixes dEQP-GLES31.functional.shaders.helper_invocation.* Signed-off-by: Rob Clark <robdclark@gmail.com>	2019-02-18 10:37:54 -05:00
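The gl_HelperInvocation lowering quoted in `28fc6733cd` can be written out as a plain C function. This is a sketch for illustration only; `is_helper_invocation()` is a hypothetical name, and the 32-sample limit is an assumption that follows from using a 32-bit mask.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* An invocation is a helper when its own sample is not set in the
 * incoming coverage mask: !((1 << sample_id) & sample_mask_in). */
bool is_helper_invocation(uint32_t sample_id, uint32_t sample_mask_in)
{
    assert(sample_id < 32); /* shift must stay within the 32-bit mask */
    return !((UINT32_C(1) << sample_id) & sample_mask_in);
}
```

If the hardware never supplies `sample_id` or `sample_mask_in` because their enable bits are unset, the expression evaluates on garbage inputs, which is consistent with the failures the commit describes.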
Samuel Pitoiset	32ab7a59bb	radv: remove unused variable in gather_push_constant_info() Trivial. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-02-18 13:30:16 +01:00
Lionel Landwerlin	8c87d029bc	i965: scale factor changes should trigger recompile Found by inspection. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `3da858a6b9` ("intel/compiler: add scale_factors to sampler_prog_key_data") Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2019-02-18 12:18:13 +00:00
Samuel Pitoiset	0d8f096293	radv: write the alpha channel of MRT0 when alpha coverage is enabled Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109597 Cc: 18.3 19.0 <mesa-stable@lists.freedesktop.org> Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-02-18 12:14:22 +01:00
Samuel Pitoiset	2cf5433b99	ac: use new LLVM 8 intrinsic when loading 16-bit values Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-02-18 12:14:20 +01:00
Samuel Pitoiset	f0223143a8	ac: add ac_build_llvm8_tbuffer_load() helper It uses the new LLVM intrinsics. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-02-18 12:14:17 +01:00
Tapani Pälli	9762a9f893	mesa: return NULL if we exceed MaxColorAttachments in get_fb_attachment This fixes invalid access to Attachment array which would occur if caller would exceed MaxColorAttachments. In practice this should not ever happen because DiscardFramebufferEXT specifies only GL_COLOR_ATTACHMENT0 to be valid and InvalidateFramebuffer will error out before but this should make coverity happy. v2: const, remove _EXT (Ian) CID: 1442559 Fixes: `0c42b5f3cb` "mesa: wire up InvalidateFramebuffer" Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2019-02-18 07:51:55 +02:00
Alyssa Rosenzweig	2c6a7fbeb7	panfrost: Fix clipping region Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-02-18 05:13:50 +00:00
Alyssa Rosenzweig	fa1b36ddc2	panfrost: Preserve w sign in perspective division This fixes issues where polygons that should be culled (due to negative w, for instance) may not be. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-02-18 05:13:34 +00:00
Alyssa Rosenzweig	49985cebea	panfrost: Cleanup mali_viewport (clipping) code Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-02-18 05:13:03 +00:00
Alyssa Rosenzweig	a94463732a	panfrost: Swap order of tiled texture (de)alloc Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-02-18 05:10:33 +00:00
Alyssa Rosenzweig	4a4ed53c01	panfrost: Free imported BOs Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-02-18 05:10:06 +00:00
Alyssa Rosenzweig	b5a01296f4	panfrost: Fix various leaks unmapping resources v2: Don't check for NULL before free() Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-02-18 05:09:41 +00:00
Kenneth Graunke	535251487b	nir: Don't reassociate add/mul chains containing only constants The idea here is to reassociate a * (b * c) into (a * c) * b, when b is a non-constant value, but a and c are constants, allowing them to be combined. But nothing was enforcing that 'b' must be non-constant, which meant that running opt_algebraic in a loop would never terminate if the IR contained non-folded constant expressions like 256 * 0.5 * 2. Normally, we call constant folding in such a loop too, but IMO it's better for nir_opt_algebraic to be robust and not rely on that. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109581 Fixes: `32e266a9a5` i965: Compile fp64 funcs only if we do not have 64-bit hardware support Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2019-02-16 23:36:14 -08:00
Chris Wilson	e9882b879b	i965: Assert the execobject handles match for this device Object handles are local to the device fd, so double check we are not mixing together objects from multiple screens on execbuf submission. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-02-16 23:35:29 -08:00
Rob Clark	99b90ecd35	freedreno/a6xx: cache flush harder Signed-off-by: Rob Clark <robdclark@gmail.com>	2019-02-16 16:28:00 -05:00
Rob Clark	1af0c5d320	freedreno/a6xx: compute support Signed-off-by: Rob Clark <robdclark@gmail.com>	2019-02-16 16:28:00 -05:00
Rob Clark	5118dcf8c3	freedreno/a6xx: image/ssbo state emit Signed-off-by: Rob Clark <robdclark@gmail.com>	2019-02-16 16:28:00 -05:00
Rob Clark	2183d9cff7	freedreno/a6xx: border-color offset helper Soon we'll need this logic to deal w/ image/SSBO case, so split out a helper rather than duplicate the logic. Signed-off-by: Rob Clark <robdclark@gmail.com>	2019-02-16 16:28:00 -05:00
Rob Clark	c1a27ba9ba	freedreno/ir3: HIGH reg w/a for a6xx It seems like some instructions (noticed this w/ cat3) cannot read HIGH regs. cat1 (mov/cov) can, and possibly some/all of cat2. The blob seems to stick w/ an extra mov into low regs. So let's do the same. This fixes WGID on a6xx, which unsurprisingly is related to a lot of deqp compute fails. Signed-off-by: Rob Clark <robdclark@gmail.com>	2019-02-16 16:28:00 -05:00
Rob Clark	947848524d	freedreno/ir3: add a6xx+ SSBO/image support Signed-off-by: Rob Clark <robdclark@gmail.com>	2019-02-16 16:28:00 -05:00
Rob Clark	b46d5b8a84	freedreno/ir3: add a6xx instruction encoding For the handful of instructions that use a new encoding. Signed-off-by: Rob Clark <robdclark@gmail.com>	2019-02-16 16:27:59 -05:00
Rob Clark	2e0ea3f09c	freedreno/ir3: add image/ssbo <-> ibo/tex mapping Images and SSBOs don't map directly to the hw. They end up being part texture and part something else. Starting with a6xx, the hack used for a5xx to smash the image tex state into hw texture state starting from MAX counting down won't work, because we start using tex state also for SSBO read. Signed-off-by: Rob Clark <robdclark@gmail.com>	2019-02-16 16:27:59 -05:00
Rob Clark	75f3a5245e	freedreno/ir3: fix ncomp for _store_image() src Signed-off-by: Rob Clark <robdclark@gmail.com>	2019-02-16 16:27:59 -05:00
Rob Clark	feee3050d3	freedreno/ir3: split out a4xx+ instructions Note that image/ssbo support is currently only implemented for a5xx. But the instruction encoding is the same for a4xx. Signed-off-by: Rob Clark <robdclark@gmail.com>	2019-02-16 16:27:59 -05:00
Rob Clark	42af0640f6	freedreno/ir3: split out image helpers Signed-off-by: Rob Clark <robdclark@gmail.com>	2019-02-16 16:27:59 -05:00
Rob Clark	aefdb9bed2	freedreno/a6xx: clean up some open-coded bits Signed-off-by: Rob Clark <robdclark@gmail.com>	2019-02-16 16:27:59 -05:00
Rob Clark	b51de44dea	freedreno/a6xx: move stream-out emit to helper Split out of the main fd6_emit() code, since it was already getting to be a pretty giant function. Signed-off-by: Rob Clark <robdclark@gmail.com>	2019-02-16 16:26:14 -05:00
Rob Clark	c0d6be11d6	freedreno/ir3: fix varying packing vs. tex sharp edge We probably need to rethink how we detect which instruction first defines higher register classes. But for now, this at least fixes the symptom. Signed-off-by: Rob Clark <robdclark@gmail.com>	2019-02-16 16:26:14 -05:00
Samuel Pitoiset	52bdb043af	radv: fix invalid element type when filling vertex input default values The elements added into a vector should have the same type as the first one, otherwise this hits an assertion in LLVM. Fixes: `4b3549c084` ("radv: reduce the number of loaded channels for vertex input fetches") reported-by: Philip Rebohle <philip.rebohle@tu-dortmund.de> Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-02-16 15:33:18 +01:00
Eleni Maria Stea	7188e2ba15	i965: Removed the field etc_format from the struct intel_mipmap_tree After the previous changes to emulate the ETC/EAC formats using the secondary shadow miptree, the etc_format field of the intel_mipmap_tree struct became redundant and the remaining check that used it has been replaced. (Nanley Chery) Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>	2019-02-15 15:54:41 -08:00
Eleni Maria Stea	248f2e7888	i965: Enabled the OES_copy_image extension on Gen 7 GPUs OES_copy_image extension was disabled on Gen7 due to the lack of support for ETC2 images. Enabled it back. (Kenneth Graunke) v2: - Removed the blank lines in the comments above OES_copy_image and OES_texture_view extensions in intel_extensions.c (Nanley Chery) Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>	2019-02-15 15:54:41 -08:00
Eleni Maria Stea	db0c379c06	i965: Fixed the CopyImageSubData for ETC2 on Gen < 8 For CopyImageSubData to copy the data during the 1st draw call, we need to update the shadow tree right before the rendering. v2: - Added assertion that the miptree doesn't need update at the time we update the texture surface. (Nanley Chery) v3: - As we now update the tree before the rendering we don't need to copy the data during the unmap anymore. Removed the unnecessary update from the intel_miptree_unmap in intel_mipmap_tree.c (Nanley Chery) v4: - Fixed unrelated empty line removal (Nanley Chery) - As now the intel_upate_etc_shadow of intel_mipmap_tree.c is only called inside its following function, we don't need to declare it at the top of the file anymore. (Nanley Chery) Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>	2019-02-15 15:54:41 -08:00
Eleni Maria Stea	d8eb7287fe	i965: Faking the ETC2 compression on Gen < 8 GPUs using two miptrees. GPUs Gen < 8 cannot sample ETC2 formats. So far, they converted the compressed EAC/ETC2 images to non-compressed RGBA images. When GetCompressed* functions were called, the pixels were returned in this RGBA format and not the compressed format that was expected. Trying to fix this problem, we use a secondary shadow miptree to store the decompressed data for the rendering and the main miptree to store the compressed for the Get functions to work. Each time that the main miptree is written with compressed data, we decompress them to RGB and update the shadow. Then we use the shadow for rendering. v2: - Fixes in the commit message (Nanley Chery) - Reversed the changes in brw_get_texture_swizzle and swapped the b, g values at the time that we decompress the data in the function: intel_miptree_update_etc_shadow of intel_mipmap_tree.c (Nanley Chery) - Simplified the format checks in the miptree_create function of the intel_mipmap_tree.c and reserved the call of the intel_lower_compressed_format for the case that we are faking the ETC support (Nanley Chery) - Removed the check for the auxiliary usage for the shadow miptree at creation (miptree_create of intel_mipmap_tree.c) as we won't use auxiliary buffers with these types of trees (Nanley Chery) - Set the etc_format of the non-ETC miptrees to MESA_FORMAT_NONE and removed the unnecessary checks (Nanley Chery) - Fixed an unrelated indentation change (Nanley Chery) - Modified the function intel_miptree_finish_write to set the mt->shadow_needs_update to true to catch all the cases when we need to update the miptree (Nanley Chery) - In order to update the shadow miptree during the unmap of the main and always map the main (Nanley Chery) the following change was necessary: Split the previous update function that was updating all the mipmap levels and use two functions instead: one that updates one level and one that updates all
of them. Used the first during unmap and the second before the rendering. - Removed the BRW_MAP_ETC_BIT flag and the mechanism to decide which miptree should be mapped each time and reversed all the changes in the higher level texture functions that upload data to textures as they aren't needed anymore. - Replaced the boolean needs_fake_etc with an inline function that checks when we need to fake the ETC compression (Nanley Chery) - Removed the initialization of the strides in the update function as the values will be overwritten by the intel_miptree_map call (Nanley Chery) - Used minify instead of division in the new update function intel_miptree_update_etc_shadow_levels in intel_mipmap_tree.c (Nanley Chery) - Removed the depth from the calculation of the number of slices in the new update function (intel_miptree_update_etc_shadow_levels of intel_mipmap_tree.c) as we don't need to support 3D ETC images. (Nanley Chery) v3: - Renamed the rgba_fmt in function miptree_create (intel_mipmap_tree.c) to decomp_format as the format is not always in rgba order. (Nanley Chery) - Documented the new usage for the shadow miptree in the comment above the field in the intel_miptree struct in intel_mipmap_tree.h (Nanley Chery) - Removed the redundant flags from the mapping of the miptrees in intel_miptree_update_etc_shadow of intel_mipmap_tree.c (Nanley Chery) - Fixed the switch from surface's logical level to physical level in the intel_miptree_update_etc_shadow_levels of intel_mipmap_tree.c (Nanley Chery) - Excluded the Baytrail GPUs from the check for the ETC emulation as they support the ETC formats natively. (Nanley Chery) - Simplified the check if the format is BGRA in intel_miptree_update_etc_shadow of intel_mipmap_tree.c (Nanley Chery) v4: - Removed the functions intel_miptree_(map\|unmap)_etc and the check if we need to call them as with the new changes, they became unreachable. 
(Nanley Chery) - We'd rather calculate the level width and height using the shadow miptree instead of the main in intel_miptree_update_etc_shadow_levels of intel_mipmap_tree.c (Nanley Chery) - Fixed the format in the mt_surface_usage, set at the miptree creation, in miptree_create of intel_mipmap_tree.c (Nanley Chery) v5: - Fixed the levels calculations in intel_mipmap_tree.c (Nanley Chery) - Update the flag shadow_needs_update outside the function intel_miptree_update_etc_shadow (Nanley Chery) - Fixed indentation error (Nanley Chery) v6: - Fixed typo in commit message (Nanley Chery) - Simplified the assignment of the mt_fmt in the miptree_create of the intel_mipmap_tree.c (Nanley Chery) - Combined declarations and assignments where it was possible in the intel_miptree_update_etc_shadow and intel_miptree_update_etc_shadow_levels of the intel_mipmap_tree.c (Nanley Chery) Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=81843 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104272 Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>	2019-02-15 15:54:41 -08:00
Nanley Chery	c6dada70f0	i965: Rename intel_mipmap_tree::r8stencil_* -> ::shadow_* Use more generic field names. We'll reuse these fields for a workaround with ASTC miptrees. Reviewed-by: Eleni Maria Stea <estea@igalia.com>	2019-02-15 15:54:41 -08:00
Timothy Arceri	a801196ec9	nir: remove simple dead if detection from nir_opt_dead_cf() This was probably useful when it was first written, however it looks to be no longer necessary. As far as I can tell these days dce is smart enough to remove useless instructions from if branches. Once this is done nir_opt_peephole_select() will end up removing the empty if. Removing this support reduces the dolphin uber shader compilation time spent in nir_opt_dead_cf() by a little over 7x. No shader-db changes on i965 or radeonsi. Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2019-02-16 10:45:31 +11:00
Alok Hota	f695e43354	swr/rast: Add translation support to streamout Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>	2019-02-15 14:54:29 -06:00
Alok Hota	a7fa0cc0a5	swr/rast: simdlib cleanup, clipper stack space fixes Reduce stack space used by clipper, which had led to crashes in some versions of MSVC Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>	2019-02-15 14:54:23 -06:00
Alok Hota	f9c29a301a	swr/rast: convert DWORD->uint32_t, QWORD->uint64_t Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>	2019-02-15 14:54:19 -06:00
Alok Hota	c503b58878	swr/rast: Refactor scratch space variable names Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>	2019-02-15 14:54:14 -06:00
Alok Hota	0b4db43705	swr/rast: FP consistency between POSH/RENDER pipes - Ensure all threads have optimal floating-point control state - Disable auto-generation of fused FP ops for VERTEX shader stage - Disable "fast" FP ops for VERTEX shader stage Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>	2019-02-15 14:54:09 -06:00
Alok Hota	dc7b3c95a4	swr/rast: Move knob defaults to generated cpp file Reduces amount of compile churn when testing different default values Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>	2019-02-15 14:54:04 -06:00
Alok Hota	05e4ff33f5	swr/rast: Flip BitScanReverse index calculation The intrinsic returns the number of leading zeros, not the bit number of the first nonzero, so just flip it based on the mask size Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>	2019-02-15 14:53:58 -06:00
Alok Hota	ae400a9b11	swr/rast: Correctly align 64-byte spills/fills Fixes crashes on some compute shaders when running on AVX512 Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>	2019-02-15 14:53:54 -06:00
Alok Hota	78bab66479	swr/rast: Disable use of __forceinline by default - Was not useful to inline in release builds - FORCEINLINE can be used if absolutely necessary Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>	2019-02-15 14:52:51 -06:00
Alok Hota	20d5c88760	swr/rast: Convert system memory pointers to gfxptr_t Fulfills an unused internal interface Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>	2019-02-15 14:52:32 -06:00
Bas Nieuwenhuizen	4b03a19a0b	radv: Use correct num formats to detect whether we should use 1.0 or 1. Normalized and scaled formats also return floats. Fixes: `4b3549c084` ("radv: reduce the number of loaded channels for vertex input fetches") Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-02-15 20:24:16 +00:00
Ian Romanick	979b43b347	nir/algebraic: Simplify comparison with sequential integers starting with 0 All of the affected shaders are Unreal4 demos. All Gen6+ platforms had similar results. (Skylake shown) total instructions in shared programs: 15437170 -> 15437001 (<.01%) instructions in affected programs: 21536 -> 21367 (-0.78%) helped: 43 HURT: 0 helped stats (abs) min: 1 max: 4 x̄: 3.93 x̃: 4 helped stats (rel) min: 0.68% max: 1.01% x̄: 0.80% x̃: 0.80% 95% mean confidence interval for instructions value: -4.07 -3.79 95% mean confidence interval for instructions %-change: -0.83% -0.77% Instructions are helped. total cycles in shared programs: 383007896 -> 383007378 (<.01%) cycles in affected programs: 158640 -> 158122 (-0.33%) helped: 38 HURT: 4 helped stats (abs) min: 1 max: 48 x̄: 13.89 x̃: 6 helped stats (rel) min: 0.03% max: 1.01% x̄: 0.33% x̃: 0.19% HURT stats (abs) min: 2 max: 3 x̄: 2.50 x̃: 2 HURT stats (rel) min: 0.06% max: 0.09% x̄: 0.08% x̃: 0.08% 95% mean confidence interval for cycles value: -16.90 -7.77 95% mean confidence interval for cycles %-change: -0.39% -0.19% Cycles are helped. Iron Lake and GM45 had similar results. (Iron Lake shown) total instructions in shared programs: 8213746 -> 8213745 (<.01%) instructions in affected programs: 127 -> 126 (-0.79%) helped: 1 HURT: 0 total cycles in shared programs: 187734146 -> 187734144 (<.01%) cycles in affected programs: 2132 -> 2130 (-0.09%) helped: 1 HURT: 0 Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-02-15 11:11:02 -08:00
Ian Romanick	ad05920258	nir/algebraic: Convert some f2u to f2i Section 5.4.1 (Conversion and Scalar Constructors) of the GLSL 4.60 spec says: It is undefined to convert a negative floating-point value to an uint. Assuming that (uint)some_float behaves like (uint)(int)some_float allows some optimizations in the i965 backend to proceed. This basically undoes the small amount of damage done by "intel/compiler: Avoid propagating inequality cmods if types are different". v2: Replicate part of the commit message as a comment in the code. Suggested by Jason. shader-db results comparing before "intel/compiler: Avoid propagating inequality cmods if types are different" and after this commit: Skylake total cycles in shared programs: 383007996 -> 383007896 (<.01%) cycles in affected programs: 85208 -> 85108 (-0.12%) helped: 13 HURT: 8 helped stats (abs) min: 2 max: 26 x̄: 10.77 x̃: 6 helped stats (rel) min: 0.09% max: 0.65% x̄: 0.28% x̃: 0.14% HURT stats (abs) min: 2 max: 12 x̄: 5.00 x̃: 3 HURT stats (rel) min: 0.04% max: 0.32% x̄: 0.12% x̃: 0.07% 95% mean confidence interval for cycles value: -9.31 -0.21 95% mean confidence interval for cycles %-change: -0.24% <.01% Cycles are helped. Broadwell total cycles in shared programs: 415251194 -> 415251370 (<.01%) cycles in affected programs: 83750 -> 83926 (0.21%) helped: 7 HURT: 13 helped stats (abs) min: 10 max: 12 x̄: 11.43 x̃: 12 helped stats (rel) min: 0.30% max: 0.30% x̄: 0.30% x̃: 0.30% HURT stats (abs) min: 2 max: 36 x̄: 19.69 x̃: 22 HURT stats (rel) min: 0.05% max: 0.89% x̄: 0.44% x̃: 0.47% 95% mean confidence interval for cycles value: 0.76 16.84 95% mean confidence interval for cycles %-change: <.01% 0.37% Inconclusive result (%-change mean confidence interval includes 0).
Haswell total instructions in shared programs: 13823885 -> 13823886 (<.01%) instructions in affected programs: 2249 -> 2250 (0.04%) helped: 0 HURT: 1 total cycles in shared programs: 390094243 -> 390094001 (<.01%) cycles in affected programs: 85640 -> 85398 (-0.28%) helped: 15 HURT: 6 helped stats (abs) min: 4 max: 26 x̄: 18.53 x̃: 18 helped stats (rel) min: 0.09% max: 0.66% x̄: 0.47% x̃: 0.42% HURT stats (abs) min: 2 max: 14 x̄: 6.00 x̃: 2 HURT stats (rel) min: 0.04% max: 0.37% x̄: 0.15% x̃: 0.04% 95% mean confidence interval for cycles value: -17.36 -5.69 95% mean confidence interval for cycles %-change: -0.44% -0.14% Cycles are helped. Ivy Bridge total cycles in shared programs: 180986448 -> 180986552 (<.01%) cycles in affected programs: 34835 -> 34939 (0.30%) helped: 0 HURT: 10 HURT stats (abs) min: 2 max: 18 x̄: 10.40 x̃: 10 HURT stats (rel) min: 0.06% max: 0.36% x̄: 0.28% x̃: 0.30% 95% mean confidence interval for cycles value: 4.67 16.13 95% mean confidence interval for cycles %-change: 0.20% 0.35% Cycles are HURT. Sandy Bridge total cycles in shared programs: 154603969 -> 154603970 (<.01%) cycles in affected programs: 171514 -> 171515 (<.01%) helped: 25 HURT: 14 helped stats (abs) min: 1 max: 4 x̄: 1.80 x̃: 1 helped stats (rel) min: 0.02% max: 0.10% x̄: 0.04% x̃: 0.04% HURT stats (abs) min: 1 max: 8 x̄: 3.29 x̃: 3 HURT stats (rel) min: 0.03% max: 0.28% x̄: 0.10% x̃: 0.11% 95% mean confidence interval for cycles value: -0.91 0.96 95% mean confidence interval for cycles %-change: -0.02% 0.04% Inconclusive result (value mean confidence interval includes 0). No changes on Iron Lake or GM45. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-02-15 11:11:02 -08:00
Matt Turner	ac21dd4aee	intel/compiler/test: Add unit test for mismatched signedness comparison v2 (idr): Move adding the test to after adding the fix. Reordering the two commits prevents possible headaches for git-bisect with scripts that always do 'ninja check'. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109404 Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2019-02-15 11:11:02 -08:00
Matt Turner	2dff9a66b6	intel/compiler: Avoid propagating inequality cmods if types are different v2: Fix silly bug in logic. s/\|\|/&&/ All but one of the affected shaders is in an Unreal4 demo. The other is in Tomb Raider. All of the cases that Ian investigated appear to be sequences like the following if (int(uint(some_float)) < 0) /* other relations too */ ... At least in Tomb Raider, it's not obvious that this sequence came from the original shader. In some of the Unreal demos, the shader contains code like if (int(uint(textureLod(...))) > 0) ... which explicitly generates the offending sequence. All Gen6+ platforms had similar results (Skylake shown): total instructions in shared programs: 15437170 -> 15437187 (<.01%) instructions in affected programs: 4492 -> 4509 (0.38%) helped: 0 HURT: 17 HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 HURT stats (rel) min: 0.05% max: 0.73% x̄: 0.66% x̃: 0.73% 95% mean confidence interval for instructions value: 1.00 1.00 95% mean confidence interval for instructions %-change: 0.57% 0.75% Instructions are HURT. total cycles in shared programs: 383007996 -> 383007992 (<.01%) cycles in affected programs: 20542 -> 20538 (-0.02%) helped: 6 HURT: 7 helped stats (abs) min: 2 max: 6 x̄: 5.33 x̃: 6 helped stats (rel) min: 0.11% max: 0.36% x̄: 0.32% x̃: 0.36% HURT stats (abs) min: 4 max: 4 x̄: 4.00 x̃: 4 HURT stats (rel) min: 0.27% max: 0.27% x̄: 0.27% x̃: 0.27% 95% mean confidence interval for cycles value: -3.30 2.69 95% mean confidence interval for cycles %-change: -0.19% 0.19% Inconclusive result (value mean confidence interval includes 0). No changes on Iron Lake or GM45. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109404 Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Tested-by: nagrigoriadis@gmail.com Tested-by: Danylo Piliaiev <danylo.piliaiev@gmail.com>	2019-02-15 11:11:02 -08:00
Matt Turner	e50db60d16	intel/compiler/test: Set devinfo->gen = 7 We emit an FBL instruction which only exists since Gen7. This prevents the test from segfaulting when run with TEST_DEBUG=1. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2019-02-15 11:11:02 -08:00
James Zhu	9364d66cb7	gallium/auxiliary/vl: Add video compositor compute shader render Add compute shader initialization, assign and cleanup in vl_compositor API. Set video compositor compute shader render as default when the pipe supports it. Signed-off-by: James Zhu <James.Zhu@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com>	2019-02-15 10:07:03 -05:00
James Zhu	f6ac0b5d71	gallium/auxiliary/vl: Add compute shader to support video compositor render Add compute shader to support video compositor render. Signed-off-by: James Zhu <James.Zhu@amd.com> Acked-by: Christian König <christian.koenig@amd.com>	2019-02-15 10:07:03 -05:00
James Zhu	299e2bc046	gallium/auxiliary/vl: Rename csc_matrix and increase its size. Rename csc_matrix to shader_params, and increase shader_params size to store more constants for compute shader. Signed-off-by: James Zhu <James.Zhu@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com>	2019-02-15 10:07:03 -05:00
James Zhu	7b7b5f2029	gallium/auxiliary/vl: Split vl_compositor graphic shaders from vl_compositor API Split vl_compositor graphic shaders from vl_compositor API in order to share vl_compositor API with vl_compositor compute shader later. Signed-off-by: James Zhu <James.Zhu@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com>	2019-02-15 10:07:03 -05:00
James Zhu	b34d7c5daa	gallium/auxiliary/vl: Move dirty define to header file Move dirty define to header file to share with compute shader. Signed-off-by: James Zhu <James.Zhu@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com>	2019-02-15 10:07:03 -05:00
Juan A. Suarez Romero	1fb24080b7	nir: remove jump from two merging jump-ending blocks In opt_peel_initial_if optimization, when moving the continue list to the end of the continue block, before the jump, it could happen that the continue list itself also ends with a jump. This would mean that we would have two jump instructions in a row: the first one from the continue list and the second one from the continue block. As inserting an instruction after a jump is not allowed (and it does not make sense, as it will not be executed), remove the jump from the continue block and keep the one from continue list, as it will be executed first. CC: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-02-15 15:16:24 +01:00
Juan A. Suarez Romero	69be9934a7	nir: move ALU instruction before the jump instruction opt_split_alu_of_phi moves ALU instruction to the end of continue block. But if the continue block ends with a jump instruction (an explicit "continue" instruction) then the ALU must be inserted before the jump, as it is illegal to add instructions after the jump. CC: Ian Romanick <ian.d.romanick@intel.com> Fixes: `0881e90c09` ("nir: Split ALU instructions in loops that read phis") Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2019-02-15 15:14:36 +01:00
Andres Gomez	a43596df62	mesa: INVALID_VALUE for wrong type or format in ClearBufferData Instead of generating a GL_INVALID_ENUM error when the type or format is incorrect while using glClear{Named}Buffer{Sub}Data, generate GL_INVALID_VALUE. From page 72 (page 94 of the PDF) of the OpenGL 4.6 spec: " An INVALID_VALUE error is generated if type is not one of the types in table 8.2. An INVALID_VALUE error is generated if format is not one of the formats in table 8.3." Fixes the following test: KHR-GL45.direct_state_access.buffers_errors v2: correct the doxygen documentation. Cc: Pi Tabred <servuswiegehtz@yahoo.de> Cc: Brian Paul <brianp@vmware.com> Signed-off-by: Andres Gomez <agomez@igalia.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2019-02-15 14:28:06 +02:00
Gurchetan Singh	67426ccd42	virgl: use virgl_transfer_inline_write even less We've noticed the Team Fortress 2 engine seems to do many small calls to glSubData(..). Let's pick our heuristic based on the resource base width, not the size of a particular upload. This will cause transfers to be batched together in the transfer queue. Relevant glbench microbenchmark -- Before: buffer_upload_dynamic_element_array_131072 = 131.17 mbytes_sec After: buffer_upload_dynamic_element_array_131072 = 6828.24 mbytes_sec Reviewed-by: Gert Wollny <gert.wollny@collabora.com>	2019-02-15 11:19:05 +01:00
Gurchetan Singh	f0e71b1088	virgl: use transfer queue This improves Unigine Valley benchmark by 3 to 10 fps (depending on the scene). It also improves the Team Fortress 2 benchmark from 6 fps to 13 fps (host: 20 fps). Reviewed-by: Gert Wollny <gert.wollny@collabora.com>	2019-02-15 11:19:05 +01:00
Gurchetan Singh	4a7857b377	virgl: introduce transfer queue Transfers will be placed here at unmap time instead of incurring a VM exit. There's an attempt to deduplicate intersecting 1D transfers, which are surprisingly common. This can also help with mipmapped texture upload and smaller textures, where the majority of the time is spent in the guest kernel / QEMU -- not virglrenderer. This is shown by the GLbench texture upload benchmark: Before: texture_upload_rgba_teximage2d_32 = 64.23 mtexel_sec After: texture_upload_rgba_teximage2d_32 = 367.44 mtexel_sec v2: Split up list iteration functions (@gerddie) v3: Support for optimizing glBufferSubData Reviewed-by: Gert Wollny <gert.wollny@collabora.com>	2019-02-15 11:19:05 +01:00
Gurchetan Singh	9c4930946a	virgl: add encoder functions for new protocol Let's encode the new protocol with new helper functions. Reviewed-by: Gert Wollny <gert.wollny@collabora.com>	2019-02-15 11:19:05 +01:00
Gurchetan Singh	5510cc67e0	virgl: make winsys modifications for encoded transfers The idea is to have two command buffers: 1) One for transfers 2) One for commands, which can include transfers At flush time, (2) will be filled. Otherwise, (1) will be used to submit transfers if there are enough of them. v2: Pass size directly to cmd_buf_create (@gerddie) Reviewed-by: Gert Wollny <gert.wollny@collabora.com>	2019-02-15 11:19:05 +01:00
Gurchetan Singh	90e9650585	virgl: add extra checks in virgl_res_needs_flush_wait This is motivated by the following scenario: glSubBufferData(GL_ARRAY_BUFFER, ...) glFlush(..) glSubBufferData(GL_ARRAY_BUFFER, ...) glSubBufferData(GL_ARRAY_BUFFER, ...) glSubBufferData(GL_ARRAY_BUFFER, ...) This increases @davidriley's Team Fortress 2 apitrace from 1 fps to 6 fps and helps with the Chromium glbench microbenchmarks: Before: texture_update_rgba_texsubimage2d_2048 = 554.96 mtexel_sec buffer_upload_dynamic_array_12 = 0.02 mbytes_sec buffer_upload_dynamic_array_576 = 1.07 mbytes_sec After: texture_update_rgba_texsubimage2d_2048 = 612.29 mtexel_sec buffer_upload_dynamic_array_12 = 2.22 mbytes_sec buffer_upload_dynamic_array_576 = 164.89 mbytes_sec Reviewed-by: Gert Wollny <gert.wollny@collabora.com>	2019-02-15 11:19:05 +01:00
Gurchetan Singh	ab6ea6e9ce	virgl: pass virgl transfer to virgl_res_needs_flush_wait Reviewed-by: Gert Wollny <gert.wollny@collabora.com>	2019-02-15 11:19:05 +01:00
Gurchetan Singh	d98fbd9c92	virgl: keep track of number of computations It's good to keep track of these things. Reviewed-by: Gert Wollny <gert.wollny@collabora.com>	2019-02-15 11:19:05 +01:00
Gurchetan Singh	35515985a9	virgl: limit command length to 16 bits Much of our logic is based around the idea the upper 16 bits of a command dword can encode the length of the command. Now that the command buffer >= 2^16 - 1, we should check for this. v2: alignment, and only check VIRGL_ENCODE_MAX_DWORDS Reviewed-by: Gert Wollny <gert.wollny@collabora.com>	2019-02-15 11:19:05 +01:00
Gurchetan Singh	503ffe46bb	virgl: use virgl_transfer in inline write Let's define a helper function and use it. This commit also allows resources to be emitted into different command buffers. Like the ioctls, send 0 for layer_stride and stride. If we actually send the real values, there are various assumptions in virglrenderer for non-1D buffers that may need to be modified. Reviewed-by: Gert Wollny <gert.wollny@collabora.com>	2019-02-15 11:19:05 +01:00
Gurchetan Singh	0fcd48bac5	virgl: add protocol for resource transfers Mostly similar to VIRGL_CCMD_RESOURCE_INLINE_WRITE. However, this uses the resource's already attached iovecs rather than the command buffer to transfer the data. v2: Used (1 << 16) not (1 << 15) [@gerddie] Reviewed-by: Gert Wollny <gert.wollny@collabora.com>	2019-02-15 11:19:05 +01:00
Gurchetan Singh	168c3ffce3	virgl: when creating / freeing transfers, pass slab pool directly This will allow us to destroy transfers w/o having a pointer to the context. Reviewed-by: Gert Wollny <gert.wollny@collabora.com>	2019-02-15 11:19:04 +01:00
Gurchetan Singh	d5c2dacc15	virgl: unmap uploader at flush time This should save some memory when allocating and freeing transfers. Reviewed-by: Gert Wollny <gert.wollny@collabora.com>	2019-02-15 11:19:04 +01:00
Gurchetan Singh	14f265b533	virgl: make alignment smaller when uploading index user buffers Since we're just uploading to guest memory, let's just align to dword size. Fixes: e0f932 ("u_upload_mgr: pass alignment to u_upload_data manually") Reviewed-by: Gert Wollny <gert.wollny@collabora.com>	2019-02-15 11:19:04 +01:00
Gurchetan Singh	7626e6e189	virgl: track level cleanliness rather than resource cleanliness This allows a minor optimization for texture upload. Reviewed-by: Gert Wollny <gert.wollny@collabora.com>	2019-02-15 11:19:04 +01:00
Gurchetan Singh	c19aedcf1a	virgl: don't mark unclean after a flush The guest memory is still clean until host GL touches it, which we should track elsewhere. Reviewed-by: Gert Wollny <gert.wollny@collabora.com>	2019-02-15 11:19:04 +01:00
Gurchetan Singh	5b6a2ae987	virgl: use virgl_resource_dirty helper Reviewed-by: Gert Wollny <gert.wollny@collabora.com>	2019-02-15 11:19:04 +01:00
Gurchetan Singh	1d294ad264	virgl: add ability to do finer grain dirty tracking There are levels to cleanliness. Reviewed-by: Gert Wollny <gert.wollny@collabora.com>	2019-02-15 11:19:04 +01:00
Alyssa Rosenzweig	acc52fff20	panfrost: Improve logging and patch memory leaks Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-02-15 07:47:54 +00:00
Alyssa Rosenzweig	c70ed4ca18	panfrost: Don't align framebuffer dims Fixes regressions with EGL clients Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-02-15 07:46:30 +00:00
Alyssa Rosenzweig	5155bcf099	panfrost: Implement PIPE_QUERY_OCCLUSION_COUNTER Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-02-15 07:46:02 +00:00
Alyssa Rosenzweig	2d22b5380c	panfrost: Identify MALI_OCCLUSION_PRECISE bit Setting this is required for desktop-style occlusion queries. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-02-15 07:45:56 +00:00
Tapani Pälli	595af46f0f	drirc/i965: add option to disable 565 configs and visuals We have cases where we would not like to expose these. v2: call the option allow_rgb565_configs for consistency with existing allow_rgb10_configs (Eric, Jason) Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-02-15 09:38:36 +02:00
Alyssa Rosenzweig	97aa05470a	panfrost: Backport driver to Mali T600/T700 There are a few differences between Mali T860 (Panfrost's primary reference target) and the older Midgard generations (T600/T700): - Miscellaneous different magic numbers. It's not clear what these numbers mean on either the old or new configurations yet. - Errata fixes. T800 is the final Midgard generation and presumably the least buggy. Older Midgard has some extra hardware errata we have to work around. - SFBD vs MFBD split. Essentially, older Midgard use a Single FrameBuffer Descriptor (SFBD), which corresponds to single render-target rendering. Newer Midgard (T760+) use a Multiple FrameBuffer Descriptor (MFBD), allowing multiple RTs. On ES 2.0, these descriptors serve the same function, but we implement both, depending on the version of the hardware. - CPU bitness. 32-bit systems generally use 32-bit GPU descriptors, and vice versa for 64-bit. Our target T760 systems are 32-bit whereas our target T860 systems are 64-bit. More work is needed in this area. This patch fixes support in these areas for supporting older Midgard hardware. It is tested on Mali T760 and Mali T860. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-02-15 07:22:42 +00:00
Alyssa Rosenzweig	f96e871c26	panfrost: Fix build; depend on libdrm Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-02-15 07:19:43 +00:00
Jason Ekstrand	08bfd710a2	nir/dead_cf: Stop relying on liveness analysis The liveness analysis pass is fairly expensive because it has to build large bit-sets and run a fix-point algorithm on them. Instead of requiring liveness for detecting if values escape a CF node, just take advantage of the structured nature of NIR and use block indices instead. This only requires the block index metadata, which is the fastest metadata we have to generate. No shader-db changes on Kaby Lake Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-02-14 23:06:29 -06:00
Jason Ekstrand	b50465d197	nir/dead_cf: Inline cf_node_has_side_effects We want to handle live SSA values differently and it's going to involve walking the instructions. We can make it a single instruction walk if we combine it with cf_node_has_side_effects. Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-02-14 23:05:28 -06:00
Jason Ekstrand	367b0ede4d	intel/fs: Bail in optimize_extract_to_float if we have modifiers This fixes a bug in RuneScape where we were optimizing x >> 16 to an extract and then negating and converting to float. The NIR to fs pass was dropping the negate on the floor, breaking a geometry shader and causing it to render nothing. Fixes: `1f862e923c` "i965/fs: Optimize float conversions of byte/word..." Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109601 Tested-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-02-14 23:02:44 -06:00
Ilia Mirkin	8c859367df	swr: set PIPE_CAP_MAX_VARYINGS correctly Unfortunately swr was missed in the original commit. The number of varyings should generally match up to what's reported as the shader caps for fragment inputs. Fixes: `6010d7b8e8` (gallium: add PIPE_CAP_MAX_VARYINGS) Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Alok Hota <alok.hota@intel.com> Cc: 19.0 <mesa-stable@lists.freedesktop.org>	2019-02-14 20:29:36 -05:00
Jason Ekstrand	5064464931	intel/fs: Silence a compiler warning Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-02-14 16:04:47 -06:00
Jason Ekstrand	9b202239ba	anv: Silence some compiler warnings in release builds Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-02-14 16:04:45 -06:00
Jason Ekstrand	cd60c995a6	anv/blorp: Delete a pointless assert Just a little higher up in the function we assert that the aspect masks are actually equal so there's no reason for the weaker check. Also, the temporary variables were causing compiler warnings in release builds. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-02-14 16:04:42 -06:00
Jason Ekstrand	b14d7a6b60	nir: Silence a couple of warnings in release builds [28/716] Compiling C object 'src/compiler/nir/068b2c8@@nir@sta/nir_gather_xfb_info.c.o'. ../src/compiler/nir/nir_gather_xfb_info.c: In function ‘nir_gather_xfb_info’: ../src/compiler/nir/nir_gather_xfb_info.c:171:13: warning: variable ‘max_offset’ set but not used [-Wunused-but-set-variable] unsigned max_offset[NIR_MAX_XFB_BUFFERS] = {0}; ^~~~~~~~~~ [36/716] Compiling C object 'src/compiler/nir/068b2c8@@nir@sta/nir_instr_set.c.o'. ../src/compiler/nir/nir_instr_set.c:502:1: warning: ‘instr_each_src_and_dest_is_ssa’ defined but not used [-Wunused-function] instr_each_src_and_dest_is_ssa(nir_instr *instr) ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-02-14 16:04:35 -06:00
Kenneth Graunke	6775665e5e	spirv: Eliminate dead input/output variables after translation. spirv_to_nir can generate input/output variables which are illegal for the current shader stage, which would cause nir_validate_shader to balk. After my recent commit to start decorating arrays as compact, dEQP-VK.spirv_assembly.instruction.graphics.module.same_module started hitting validation errors due to outputs in a TCS (not intended for the TCS at all) not being per-vertex arrays. Thanks to Jason Ekstrand for suggesting this approach. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109573 Fixes: `ef99f4c8d1` compiler: Mark clip/cull distance arrays as compact before lowering. Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>	2019-02-14 11:03:56 -08:00
Kenneth Graunke	39aee57523	anv: Put MOCS in the correct location My patch to switch from struct-based MOCS to numeric MOCS accidentally divided all MOCS entries by 2 in the Vulkan driver. MOCS on Gen9+ is just an array index into a table. But in the hardware packets, the index starts at bit 1. So we need to shift it. Fixes: `0b44644ca6` (genxml: Consistently use a numeric "MOCS" field) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-02-14 11:03:28 -08:00
Ian Romanick	9a918050e0	spirv: Add missing break Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Fixes: `c6465fec0c` ("spirv: add SpvCapabilityInt64Atomics") CID: 1442555	2019-02-14 08:35:59 -08:00
Eric Engestrom	c2b4b46fa9	util/tests: compile to something sensible in release builds assert()-based tests make no sense without asserts, so make sure asserts are compiled in, even if the rest of the code has asserts turned off. Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-02-14 12:52:34 +00:00
Eric Engestrom	f7c56475d2	anv/tests: compile to something sensible in release builds assert()-based tests make no sense without asserts, so make sure asserts are compiled in, even if the rest of the code has asserts turned off. Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-02-14 12:52:34 +00:00
Eric Engestrom	4c1ca5b074	etnaviv: drop duplicate #define Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Kristian H. Kristensen <hoegsberg@chromium.org>	2019-02-14 11:20:00 +00:00
Eric Engestrom	7f68b38439	st/dri: drop duplicate #define Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Kristian H. Kristensen <hoegsberg@chromium.org>	2019-02-14 11:20:00 +00:00
Eric Engestrom	2fa165e757	gbm: drop duplicate #defines Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Kristian H. Kristensen <hoegsberg@chromium.org>	2019-02-14 11:20:00 +00:00
Eric Engestrom	f1374805a8	drm-uapi: use local files, not system libdrm There was an issue recently caused by the system header being included by mistake, so let's just get rid of this include path and always explicitly #include "drm-uapi/FOO.h" Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Kristian H. Kristensen <hoegsberg@chromium.org>	2019-02-14 11:20:00 +00:00
Eric Engestrom	69e4c273c4	drm-uapi/README: remove explicit list of driver names These headers are used by a lot more than just the intel drivers nowadays. Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Kristian H. Kristensen <hoegsberg@chromium.org>	2019-02-14 11:20:00 +00:00
Samuel Pitoiset	227df98fa6	radv: fix radv_fixup_vertex_input_fetches() We should check that num_channels is 4, otherwise that breaks the world. Sorry for the short breakage. Fixes: `4b3549c084` ("radv: reduce the number of loaded channels for vertex input fetches") Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-02-14 09:44:35 +01:00
Samuel Pitoiset	4b3549c084	radv: reduce the number of loaded channels for vertex input fetches It's unnecessary to load more channels than the vertex attribute format. The remaining channels are filled with 0 for y and z, and 1 for w. 29077 shaders in 15096 tests Totals: SGPRS: 1321605 -> 1318869 (-0.21 %) VGPRS: 935236 -> 932252 (-0.32 %) Spilled SGPRs: 24860 -> 24776 (-0.34 %) Code Size: 49832348 -> 49819464 (-0.03 %) bytes Max Waves: 242101 -> 242611 (0.21 %) Totals from affected shaders: SGPRS: 93675 -> 90939 (-2.92 %) VGPRS: 58016 -> 55032 (-5.14 %) Spilled SGPRs: 172 -> 88 (-48.84 %) Code Size: 2862740 -> 2849856 (-0.45 %) bytes Max Waves: 15474 -> 15984 (3.30 %) This mostly helps Croteam games (Talos/Sam2017). Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-02-14 09:10:56 +01:00
Samuel Pitoiset	210aec3612	radv: store vertex attribute formats as pipeline keys The formats will be used for reducing the number of loaded channels. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-02-14 09:10:09 +01:00
Samuel Pitoiset	45382baef6	radv: use MAX_{VBS,VERTEX_ATTRIBS} when defining max vertex input limits Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-02-14 09:09:51 +01:00
Samuel Pitoiset	2154fac6f3	ac: make use of ac_build_expand_to_vec4() in visit_image_store() And make ac_build_expand() a static function. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-02-14 09:09:48 +01:00
Eric Anholt	338d399fd0	freedreno: Use the NIR lowering for isign. I think this will save an instruction and hopefully not increase any other costs (possibly the immediate -1 and 1?), but I haven't actually tested. Reviewed-by: Kristian H. Kristensen <hoegsberg@chromium.org>	2019-02-14 00:32:30 +00:00
Eric Anholt	8f3694e1ab	intel: Use the NIR lowering for isign. Drops one instruction from fs-sign-int.shader_test. No change in shader-db due to it having 0 instances of sign(genIType). This may hurt isign64 if algebraic runs before int64 lowering, but I wasn't sure how to mark the algebraic opt as "every bit size but 64". v2: Update commit message about shader-db. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> (v1)	2019-02-14 00:32:30 +00:00
Eric Anholt	3f22b35a43	v3d: Use the NIR lowering for isign instead of rolling our own. min/max instead of comparisons saves 2 instructions on fs-sign-int.shader_test.	2019-02-14 00:32:30 +00:00
Eric Anholt	42d2cae907	nir: Move panfrost's isign lowering to nir_opt_algebraic. I wanted to reuse this from v3d. Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2019-02-14 00:32:30 +00:00
Timothy Arceri	68baf96824	nir: turn an ssa check in nir_search into an assert Everything should be in ssa form when we call this. This is a hotpath so replace the check with an assert. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2019-02-14 09:35:32 +11:00
Timothy Arceri	46a4d2c867	nir: turn ssa check into an assert Everything should be in ssa form when this is called. Checking for it here is expensive so turn this into an assert instead. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2019-02-14 09:35:32 +11:00
Timothy Arceri	0a89c9779a	nir: prehash instruction in nir_instr_set_add_or_rewrite() There is no need to hash the instruction twice, especially as we end up adding it in the majority of cases. Reviewed-by: Connor Abbott <cwabbott0@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-02-14 09:35:32 +11:00
Dylan Baker	279060cd32	meson: Add dependency on genxml to anvil Currently the Intel "anvil" driver races with the generation of genxml files, while i965 has an explicit dependency. This patch adds the same dependency to anvil. Fixes: `d1992255bb` ("meson: Add build Intel "anv" vulkan driver") Acked-by: Jason Ekstrand <jason@jlekstrand.net> Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-02-13 22:01:00 +00:00
Samuel Pitoiset	334da034d8	radv: always export gl_SampleMask when the fragment shader uses it For some reason, this breaks trees rendering in Project Cars. Fixes: `85010585cd` ("radv: only enable gl_SampleMask if MSAA is enabled too") Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109401 Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-02-13 23:01:30 +01:00
Alok Hota	736241892f	gallium/aux: add PIPE_CAP_MAX_VARYINGS to u_screen Allows drivers using `u_pipe_screen_get_param_defaults` to use a fallback value for the new pipe cap. Default value of 8 based on GL 2.1 MAX_VARYING_FLOATS Reviewed-by: Eric Anholt <eric@anholt.net>	2019-02-13 15:08:14 -06:00
Kristian H. Kristensen	e8566d7098	.mailmap: Add a few more aliases for myself Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>	2019-02-13 12:03:41 -08:00
Samuel Pitoiset	5e18000d1b	radv/winsys: fix BO list creation when RADV_DEBUG=allbos is set Fixes: `50fd253bd6` ("radv/winsys: Add priority handling during submit.") Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-02-13 20:51:40 +01:00
Kristian H. Kristensen	0a41ddbd4e	freedreno/a6xx: Fix point coord Use ir3_next_varying() for iterating through varyings and unset the global point coord invert bit. Fixes: dEQP-GLES3.functional.shaders.builtin_variable.pointcoord Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>	2019-02-13 11:14:06 -08:00
Kristian H. Kristensen	2fbd2d5f58	freedreno/a6xx: Front facing needs UNK3 bit We need to set UNK3 in GRAS_CNTL and RB_RENDER_CONTROL0 for the value to be reliably delivered. Fixes: dEQP-GLES3.functional.shaders.builtin_variable.frontfacing Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>	2019-02-13 11:14:06 -08:00
Kristian H. Kristensen	1831238c8e	freedreno/a6xx: Update headers This pulls in changes for compute shaders and a6xx ssbo/image support. FACENESS bit moved from position 1 to 2 and there's a global invert bit for point coord. Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>	2019-02-13 11:14:06 -08:00
Kristian H. Kristensen	182e5c011f	freedreno/a6xx: Clean up mixed use of swap and swizzle for texture state Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>	2019-02-13 11:03:29 -08:00
Rob Clark	61094629cb	freedreno/a6xx: small compiler warning fix Signed-off-by: Rob Clark <robdclark@gmail.com>	2019-02-13 13:54:05 -05:00
Dylan Baker	aff52dd2c6	get-pick-list: Add --pretty=medium to the arguments for Cc patches Because none of them have been picked up for 19.0 due to this bug being reintroduced. v2: - Fix fixes tags Fixes: `e6b3a3b201` ("bin/get-pick-list.sh: handle "typod" usecase.") Fixes: `fac10169bb` ("bin/get-pick-list.sh: prefix output with "[stable] "") Reviewed-by: Andres Gomez <agomez@igalia.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2019-02-13 08:59:30 -08:00
Eric Engestrom	68a9383c6f	gitlab-ci: limit ninja to 4 threads max I tried bumping the limit on make and scons instead, but that just thrashed the runners, so let's not do that (sorry @daniels :]). Instead, remove the automatic thread management from ninja and limit it to 4 instead, in line with make and scons. Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>	2019-02-13 16:15:43 +00:00
Konstantin Kharlamov	fccc9d3de6	mapi: work around GCC LTO dropping assembly-defined functions Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109391 Signed-off-by: Konstantin Kharlamov <Hi-Angel@yandex.ru> Acked-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-02-13 14:20:51 +00:00
Caio Marcelo de Oliveira Filho	017349997f	nir: fix example in opt_peel_loop_initial_if description Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-02-12 20:33:20 -08:00
Karol Herbst	7e08f22a72	nir/opt_if: don't mark progress if nothing changes If we have something like this: loop { ... if x { break; } else { continue; } } opt_if_loop_last_continue returns true, marking progress although nothing changes. Fixes: `5921a19d4b` "nir: add if opt opt_if_loop_last_continue()" Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-02-13 00:21:35 +01:00
Oscar Blumberg	3c540e0a74	radeonsi: Fix guardband computation for large render targets Stop using 12.12 quantization for viewports that are not contained in the lower 4k corner of the render target as the hardware needs to keep both absolute and relative coordinates representable. Signed-off-by: Marek Olšák <marek.olsak@amd.com> Cc: 18.3 19.0 <mesa-stable@lists.freedesktop.org>	2019-02-12 17:21:46 -05:00
Chia-I Wu	2f8734e13b	egl: fix KHR_partial_update without EXT_buffer_age EGL_BUFFER_AGE_EXT can be queried without EXT_buffer_age. Signed-off-by: Chia-I Wu <olvaffe@gmail.com> Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-02-12 19:14:34 +00:00
Kenneth Graunke	5a006b026d	mesa: Advertise EXT_float_blend in ES 3.0+ contexts. This extension simply drops a draw time restriction: "Furthermore, an INVALID_OPERATION error is generated by DrawArrays and the other drawing commands defined in section 2.8.3 (10.5 in ES 3.1) if blending is enabled (see below) and any draw buffer has 32-bit floating-point format components." We never correctly enforced this restriction anyway, so we were basically already implementing it. We just need to advertise it for our behavior to be correct. The extension requires EXT_color_buffer_float, but we already enable that via dummy_true. So we can dummy_true this one as well. Found while debugging WebGL conformance tests. Does not fix any. Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2019-02-12 10:57:25 -08:00
Alok Hota	d3dfa86a30	gallium/swr: Param defaults for unhandled PIPE_CAPs Without using this function, we fail the -Wswitch flag when compiling the default debugoptimized mode in Meson Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>	2019-02-12 18:55:14 +00:00
Juan A. Suarez Romero	1ad26f9417	anv/cmd_buffer: check for NULL framebuffer This can happen when we record a VkCmdDraw in a secondary buffer that was created inheriting from the primary buffer, but with the framebuffer set to NULL in the VkCommandBufferInheritanceInfo. Vulkan 1.1.81 spec says that "the application must ensure (using scissor if necessary) that all rendering is contained in the render area [...] [which] must be contained within the framebuffer dimensions". While this should be done by the application, commit `465e5a86` added the clamp to the framebuffer size, in case the application does not do it. But this requires knowing the framebuffer dimensions. If we do not have a framebuffer at that moment, the best compromise we can make is to just apply the scissor as it is, and let the application ensure the rendering is contained in the render area. v2: do not clamp to framebuffer if there isn't a framebuffer v3 (Jason): - clamp earlier in the conditional - clamp to render area if command buffer is primary v4: clamp also x and y to render area (Jason) v5: rename used variables (Jason) Fixes: `465e5a86` ("anv: Clamp scissors to the framebuffer boundary") CC: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-02-12 19:19:13 +01:00
Marek Olšák	6c64413b6f	radeonsi: use MEM instead of MEM_GRBM in COPY_DATA.DST_SEL Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2019-02-12 13:08:54 -05:00
Marek Olšák	f8e4c9df47	radeonsi: add AMD_DEBUG env var as an alternative to R600_DEBUG Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2019-02-12 13:08:54 -05:00
Samuel Pitoiset	1b8983c25b	radv: fix using LOAD_CONTEXT_REG with old GFX ME firmwares on GFX8 This fixes a critical issue. Cc: <mesa-stable@lists.freedesktop.org> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109575 Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-02-12 17:39:30 +01:00
Samuel Pitoiset	bd1186572f	radv: add support for push constants inlining when possible This removes some scalar loads from shaders, but it increases the number of SET_SH_REG packets. This is currently basic but it could be improved if needed. Inlining dynamic offsets might also help. Original idea from Dave Airlie. 29077 shaders in 15096 tests Totals: SGPRS: 1321325 -> 1357101 (2.71 %) VGPRS: 936000 -> 932576 (-0.37 %) Spilled SGPRs: 24804 -> 24791 (-0.05 %) Code Size: 49827960 -> 49642232 (-0.37 %) bytes Max Waves: 242007 -> 242700 (0.29 %) Totals from affected shaders: SGPRS: 290989 -> 326765 (12.29 %) VGPRS: 244680 -> 241256 (-1.40 %) Spilled SGPRs: 1442 -> 1429 (-0.90 %) Code Size: 8126688 -> 7940960 (-2.29 %) bytes Max Waves: 80952 -> 81645 (0.86 %) Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-02-12 17:25:54 +01:00
Samuel Pitoiset	8364ffe823	radv: keep track of the number of remaining user SGPRs Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-02-12 17:25:52 +01:00
Samuel Pitoiset	5f9379ca35	radv: gather if shaders load dynamic offsets separately Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-02-12 17:25:49 +01:00
Samuel Pitoiset	5806d99984	radv: gather more info about push constants This is needed in order to inline some push constants when possible. This also adds a new helper for initializing the pass. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-02-12 17:25:34 +01:00
Samuel Pitoiset	129a9f4937	radv: fix compiler issues with GCC 9 "The C standard says that compound literals which occur inside of the body of a function have automatic storage duration associated with the enclosing block. Older GCC releases were putting such compound literals into the scope of the whole function, so their lifetime actually ended at the end of containing function. This has been fixed in GCC 9. Code that relied on this extended lifetime needs to be fixed, move the compound literals to whatever scope they need to accessible in." Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109543 Cc: <mesa-stable@lists.freedesktop.org> Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Gustaw Smolarczyk <wielkiegie@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-02-12 14:48:08 +01:00
Tapani Pälli	2a2e69f975	i965: add P0x formats and propagate required scaling factors Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Signed-off-by: Lin Johnson <johnson.lin@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-02-12 08:43:04 +02:00
Tapani Pälli	3da858a6b9	intel/compiler: add scale_factors to sampler_prog_key_data Patch propagates given scale_factors to lowering options. Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-02-12 08:42:25 +02:00
Tapani Pälli	722f96bfc8	dri: add P010, P012, P016 for 10bit/12bit/16bit YUV420 formats Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Signed-off-by: Lin Johnson <johnson.lin@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-02-12 08:42:02 +02:00
Tapani Pälli	19a85a704b	nir: add option to use scaling factor when sampling planes YUV lowering Patch adds nir_lower_tex_options as parameter to sample_plane so that we don't need to extend nir_tex_instr for this. Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-02-12 08:41:20 +02:00
Kenneth Graunke	3eedc8f7b1	i965: Use info->textures_used instead of prog->SamplersUsed. prog->SamplersUsed is set by the linker when validating resource limits, while info->textures_used is gathered after NIR optimizations, which may have eliminated some unused surfaces. This may let us skip some work. Reviewed-by: Eric Anholt <eric@anholt.net>	2019-02-11 21:34:50 -08:00
Kenneth Graunke	59ae985631	i965: Drop unnecessary 'and' with prog->SamplerUnits textures_used_by_txf is a subset of textures_used which is a subset of prog->SamplerUnits. This should do nothing. Reviewed-by: Eric Anholt <eric@anholt.net>	2019-02-11 21:34:48 -08:00
Kenneth Graunke	f5c7df4dc9	nir: Gather texture bitmasks in gl_nir_lower_samplers_as_deref. Eric and I would like a bitmask of which samplers are used, similar to prog->SamplersUsed, but available in NIR. The linker uses SamplersUsed for resource limit checking, but later optimizations may eliminate more samplers. So instead of propagating it through, we gather a new one. While there, we also gather the existing textures_used_by_txf bitmask. Gathering these bitfields in nir_shader_gather_info is awkward at best. The main reason is that it introduces an ordering dependency between the two passes. If gathering runs before lower_samplers_as_deref, it can't look at var->data.binding. If the driver doesn't use the full lowering to texture_index/texture_array_size (like radeonsi), then the gathering can't use those fields. Gathering might be run early /and/ late, first to get varying info, and later to update it after variant lowering. At this point, should gathering work on pre-lowered or post-lowered code? Pre-lowered is also harder due to the presence of structure types. Just doing the gathering when we do the lowering alleviates these ordering problems. This fixes ordering issues in i965 and makes the txf info gathering work for radeonsi (though they don't use it). Reviewed-by: Eric Anholt <eric@anholt.net>	2019-02-11 21:34:45 -08:00
Kenneth Graunke	120f9b8362	nir: Use sampler derefs in drawpixels and bitmap lowering. Reviewed-by: Eric Anholt <eric@anholt.net>	2019-02-11 21:34:44 -08:00
Kenneth Graunke	04bdc56872	program: Make prog_to_nir create texture/sampler derefs. Until now, prog_to_nir has been setting texture_index and sampler_index directly. This is different than GLSL shaders, which create variable dereferences and rely on lowering passes to reach this final form. radeonsi uses variable dereferences for samplers rather than texture_index and sampler_index, so it doesn't even make sense to set them there. By moving to derefs, we ensure that both GLSL and ARB programs produce the same final form that the driver desires. Reviewed-by: Eric Anholt <eric@anholt.net>	2019-02-11 21:34:40 -08:00
Kenneth Graunke	6a4be25a90	st/nir: Use sampler derefs in built-in shaders. Reviewed-by: Eric Anholt <eric@anholt.net>	2019-02-11 21:34:38 -08:00
Kenneth Graunke	ba9c1c8217	st/nir: Lower sampler derefs for builtin shaders. Reviewed-by: Eric Anholt <eric@anholt.net>	2019-02-11 21:34:36 -08:00
Kenneth Graunke	8d1646e0e1	st/nir: Pull sampler lowering into a helper function. This will make it easier to reuse across GLSL / ARB / built-ins. Reviewed-by: Eric Anholt <eric@anholt.net>	2019-02-11 21:34:35 -08:00
Kenneth Graunke	243c11dc16	i965: Call nir_lower_samplers for ARB programs. An upcoming patch will start building derefs in prog_to_nir, at which point we'll need to lower them to indexes. This gets both GLSL and non-GLSL shaders using the same paths. Reviewed-by: Eric Anholt <eric@anholt.net>	2019-02-11 21:34:30 -08:00
Kenneth Graunke	529a0711c1	glsl: Don't look at sampler uniform storage for internal vars Passes like nir_lower_drawpixels add additional sampler variables, and set an explicit binding which never changes. These extra samplers don't have proper uniform storage associated with them, and there is no way to update bindings via the API. So, for any 'hidden' variables, just trust that there's an explicit binding set. Reviewed-by: Eric Anholt <eric@anholt.net>	2019-02-11 21:34:28 -08:00
Kenneth Graunke	d34e434989	glsl: Allow gl_nir_lower_samplers*() without a gl_shader_program I would like to be able to run gl_nir_lower_samplers() to turn texture and sampler variable dereferences into indexes and offsets, even for ARB programs, and built-in shaders. This would make sampler handling more consistent across the various types of shaders. For GLSL programs, the gl_nir_lower_samplers_as_deref() pass looks up the variable bindings in the shader program's uniform storage. But ARB programs and built-in shaders don't have a gl_shader_program, and uniform storage doesn't exist. In this case, we simply skip that lookup, and trust var->data.binding to be set correctly by whoever created the shader. Reviewed-by: Eric Anholt <eric@anholt.net>	2019-02-11 21:34:22 -08:00
Kenneth Graunke	f45dd6d31b	st/mesa: Limit GL_MAX_[NATIVE_]PROGRAM_PARAMETERS_ARB to 2048 Piglit's vp-max-array test creates a vertex program containing a uniform array sized to the value of GL_MAX_NATIVE_PROGRAM_PARAMETERS_ARB. Mesa will then add additional state-var parameters for things like the MVP matrix. radeonsi currently exposes a value of 4096, derived from constant buffer upload size. This means the array will have 4096 elements, and the extra MVP state-vars would get a prog_src_register::Index of over 4096. Unfortunately, prog_src_register::Index is a signed 13-bit integer, so values beyond 4096 end up turning into negative numbers. Negative source indexes are only valid for relative addressing, so this ends up generating illegal IR. In prog_to_nir, this would cause an out of bounds array access. st_mesa_to_tgsi checks for a negative value, assumes it's bogus, and remaps it to parameter 0 in order to get something in-range. This isn't right - instead of reading the MVP matrix, it would read the first element of the vertex program's large array. But the test only checks that the program compiles, so we never noticed that it was broken. This patch limits the size of the program limits, with the understanding that we may need to generate additional state-vars internally. i965 has exposed 1024 for this limit for years, so I don't expect lowering it to 2048 will cause any practical problems for radeonsi or other drivers. Fixes vp-max-array with prog_to_nir.c. Cc: "19.0" <mesa-stable@lists.freedesktop.org> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-02-11 21:09:51 -08:00
Francisco Jerez	374eb3cd6f	intel/dump_gpu: Disambiguate between BOs from different GEM handle spaces. This fixes a rather astonishing problem that came up while debugging an issue in the Vulkan CTS. Apparently the Vulkan CTS framework has the tendency to create multiple VkDevices, each one with a separate DRM device FD and therefore a disjoint GEM buffer object handle space. Because the intel_dump_gpu tool wasn't making any distinction between buffers from the different handle spaces, it was confusing the instruction state pools from both devices, which happened to have the exact same GEM handle and PPGTT virtual address, but completely different shader contents. This was causing the simulator to believe that the vertex pipeline was executing a fragment shader, which didn't end up well. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-02-11 12:27:22 -08:00
Kristian H. Kristensen	e404c6879d	freedreno/a6xx: Fall back to masked RGBA blits for depth/stencil The blitter doesn't seem to have a write mask, so for depth only and stencil only blits to Z24S8 we cast the Z24S8 buffer to an RGBA UNORM8 buffer and fall back to pipeline blits with corresponding write mask. Fixes dEQP-GLES3.functional.fbo.blit.depth_stencil.depth24_stencil8_stencil_only dEQP-GLES3.functional.fbo.invalidate.sub.unbind_blit_depth dEQP-GLES3.functional.fbo.invalidate.sub.unbind_blit_msaa_depth dEQP-GLES3.functional.fbo.invalidate.whole.unbind_blit_depth dEQP-GLES3.functional.fbo.invalidate.whole.unbind_blit_msaa_depth dEQP-GLES3.functional.fbo.msaa.2_samples.stencil_index8 dEQP-GLES3.functional.fbo.msaa.4_samples.stencil_index8 Reviewed-by: Rob Clark <robdclark@gmail.com> Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>	2019-02-11 12:26:21 -08:00
Kristian H. Kristensen	f03ba155d5	freedreno/a6xx: Add format argument to fd6_tex_swiz() We need to allow overriding the format with that of the image or sampler view, so we can't take it from the resource in fd6_tex_swiz(). Reviewed-by: Rob Clark <robdclark@gmail.com> Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>	2019-02-11 12:26:21 -08:00
Kristian H. Kristensen	bc8c813d5a	freedreno/a6xx: Support y-inverted blits The src coordinates are s24.8. For an inverted blit that ends at y=0 we need to program -1 for sy2, so we need to handle negative values correctly. Fixes dEQP-GLES3.functional.fbo.blit.rect.nearest_consistency_mag_reverse_dst_y dEQP-GLES3.functional.fbo.blit.rect.nearest_consistency_min_reverse_dst_y dEQP-GLES3.functional.fbo.blit.rect.nearest_consistency_min_reverse_src_y dEQP-GLES3.functional.fbo.invalidate.sub.unbind_blit_color dEQP-GLES3.functional.fbo.invalidate.whole.unbind_blit_color Reviewed-by: Rob Clark <robdclark@gmail.com> Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>	2019-02-11 12:26:21 -08:00
Kristian H. Kristensen	03a01e5d23	freedreno/a6xx: Support some depth/stencil blits on blitter We can rewrite almost all depth stencil blits to various red-only blits. The exception is depth-only or stencil-only blits into z24s8 combined depth stencil buffer. We can fall back for depth-only, but stencil-only remains broken. Fixes dEQP-GLES3.functional.fbo.blit.depth_stencil.depth24_stencil8_basic dEQP-GLES3.functional.fbo.blit.depth_stencil.depth24_stencil8_scale dEQP-GLES3.functional.fbo.blit.depth_stencil.depth32f_stencil8_basic dEQP-GLES3.functional.fbo.blit.depth_stencil.depth32f_stencil8_scale dEQP-GLES3.functional.fbo.blit.depth_stencil.depth32f_stencil8_stencil_only Reviewed-by: Rob Clark <robdclark@gmail.com> Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>	2019-02-11 12:26:21 -08:00
Kristian H. Kristensen	e9592da2b4	freedreno/a6xx: Move blit check so as to restore comment The explanation for the compressed format check is broken across two comments: /* We can blit if both or neither formats are compressed formats... */ /* ... but only if they're the same compression format. */ but the ok_format() checks were inserted between, breaking up the flow of the sentence. Reviewed-by: Rob Clark <robdclark@gmail.com> Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>	2019-02-11 12:26:21 -08:00
Kristian H. Kristensen	d2639f2eac	freedreno: Don't tell the blitter what it can't do Call ctx->blit() and let it reject blits it can't do instead of giving up on stencil blits and blits u_blitter can't do. Reviewed-by: Rob Clark <robdclark@gmail.com> Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>	2019-02-11 12:26:21 -08:00
Kristian H. Kristensen	8cf1303698	freedreno: Consolidate u_blitter functions in freedreno_blitter.c Reviewed-by: Rob Clark <robdclark@gmail.com> Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>	2019-02-11 12:26:21 -08:00
Kristian H. Kristensen	701d30dda8	freedreno/a6xx: Combine emit_blit and fd6_blit Reviewed-by: Rob Clark <robdclark@gmail.com> Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>	2019-02-11 12:26:21 -08:00
Kristian H. Kristensen	6d1a7bdba3	freedreno/a6xx: Use the right resource for separate stencil stride Reviewed-by: Rob Clark <robdclark@gmail.com> Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>	2019-02-11 12:26:21 -08:00
Kristian H. Kristensen	24b4172375	freedreno: Log number of draws for sysmem passes Reviewed-by: Rob Clark <robdclark@gmail.com> Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>	2019-02-11 12:26:21 -08:00
Kristian H. Kristensen	a201cb157d	freedreno/a6xx: Drop render condition check in blitter We already check earlier in the call chain in fd_blit(). glBlitFramebuffer always sets render_condition_enable and thus we would never try the blitter path for that. Now that we get all of dEQP-GLES3.functional.fbo.blit.conversion.* down this path, it turns out that the fail_if(info->mask != util_format_get_mask(info->src.format)); fail_if(info->mask != util_format_get_mask(info->dst.format)); conditions weren't accurate. util_format_get_mask() returns PIPE_MASK_RGBA for any format with any color channels, while info->mask is the exact set of channels to blit. So we reject things we could blit - for example, PIPE_FORMAT_R16G16_FLOAT where info->mask is RG while util_format_get_mask() returns RGBA - and accept things we can't. It turns out that the blitter is happy to blit different numbers of channels, but fails to blit formats with different numerical formats and srgb formats. Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org> Reviewed-by: Rob Clark <robdclark@gmail.com>	2019-02-11 12:26:21 -08:00
Kristian H. Kristensen	4f7a9c23ed	freedreno/a6xx: regen headers Update for a6xx.xml.h to incorporate a few new bits and changes to blit src rect coordinate types. Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org> Reviewed-by: Rob Clark <robdclark@gmail.com>	2019-02-11 12:26:21 -08:00
Leo Liu	a0a52a0367	st/va/vp9: set max reference as default of VP9 reference number If there is no information about number of render targets Signed-off-by: Leo Liu <leo.liu@amd.com> Reviewed-by: Boyuan Zhang <boyuan.zhang@amd.com> Cc: 19.0 <mesa-stable@lists.freedesktop.org>	2019-02-11 14:44:16 -05:00
Leo Liu	21cdb828a3	st/va: fix the incorrect max profiles report Add "PIPE_VIDEO_PROFILE_MAX" to enum, so it will make sure here will be correct when adding more profiles in the future. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109107 Signed-off-by: Leo Liu <leo.liu@amd.com> Reviewed-by: Boyuan Zhang <boyuan.zhang@amd.com> Cc: 19.0 <mesa-stable@lists.freedesktop.org>	2019-02-11 14:44:16 -05:00
Guttula, Suresh	2cf2a56739	st/va: Add support for indirect manner by returning VA_STATUS_ERROR_OPERATION_FAILED Based on the VA spec, DeriveImage() returns VA_STATUS_ERROR_OPERATION_FAILED if the driver doesn't support the internal surface format. Currently vaDeriveImage() fails for non-contiguous planes, and the operation-failed error string is required to support the indirect manner, i.e. vaCreateImage()+vaPutImage(), in case vaDeriveImage() fails with VA_STATUS_ERROR_OPERATION_FAILED. This patch notifies the client that the operation failed with the proper error string, so that the client falls back to vaCreateImage()+vaPutImage(). v2: updated commit message based on VA spec. Signed-off-by: suresh guttula <suresh.guttula@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com>	2019-02-11 14:44:16 -05:00
Marek Olšák	114a899cc8	winsys/amdgpu: cs_check_space sets the minimum IB size for future IBs Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2019-02-11 12:35:48 -05:00
Marek Olšák	766e920cdb	winsys/amdgpu: clean up IB buffer size computation Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2019-02-11 12:35:48 -05:00
Marek Olšák	8c1cb393fc	winsys/amdgpu: remove occurrence of INDIRECT_BUFFER_CONST Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2019-02-11 12:35:48 -05:00
Marek Olšák	881ef14b32	winsys/amdgpu: use a separate fence list for syncobjs Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2019-02-11 12:35:48 -05:00
Marek Olšák	9f00123d51	winsys/amdgpu: unify fence list code Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2019-02-11 12:35:48 -05:00
Marek Olšák	ddfe209a0d	winsys/amdgpu: don't drop manually added fence dependencies wow, it's hard to believe that fence and syncobjs dependencies were ignored. Cc: 18.3 19.0 <mesa-stable@lists.freedesktop.org> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2019-02-11 12:35:48 -05:00
Marek Olšák	61c678d4bc	radeonsi: fix EXPLICIT_FLUSH for flush offsets > 0 Cc: 18.3 19.0 <mesa-stable@lists.freedesktop.org> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2019-02-11 12:35:06 -05:00
Marek Olšák	4522f01d4e	gallium/u_threaded: fix EXPLICIT_FLUSH for flush offsets > 0 Cc: 18.3 19.0 <mesa-stable@lists.freedesktop.org> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2019-02-11 12:35:04 -05:00
Jason Ekstrand	9e6a6ef0d4	nir/deref: Rematerialize parents in rematerialize_derefs_in_use_blocks When nir_rematerialize_derefs_in_use_blocks_impl was first written, I attempted to optimize things a bit by not bothering to re-materialize the sources of deref instructions, figuring that the final caller would take care of that. However, in the case of more complex deref chains where the first link or two lives in block A and then another link and the load/store_deref intrinsic live in block B, it doesn't work. The code in rematerialize_deref_in_block looks at the tail of the chain, sees that it's already in block B and skips it, not realizing that part of the chain also lives in block A. The easy solution here is to just rematerialize deref sources of deref instructions as well. This may potentially lead to a few more deref instructions being created, but the conditions required for that to actually happen are fairly unlikely and, thanks to the caching, it's all linear time regardless. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109603 Fixes: `7d1d1208c2` "nir: Add a small pass to rematerialize derefs per-block" Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>	2019-02-11 10:57:23 -06:00
Jason Ekstrand	fd77606b5b	intel/fs: Use enumerated array assignments in fb read TXF setup It's more clear and means we don't have to update the array every time we add an optional texture instruction argument Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-02-11 10:57:09 -06:00
Michel Dänzer	d6c55f6c62	gitlab-ci: Re-use docker image from the main repo in forked repos Instead of generating it from scratch in each forked repo. This should save time, energy and storage. (The xserver & xf86-video-amdgpu CI scripts do basically the same) v2: * Hardcode "mesa" instead of using $CI_PROJECT_NAME, to avoid breakage if the project name is changed after forking (Eric Engestrom) Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-02-11 12:24:31 +01:00
Ilia Mirkin	cc79a1483f	nvc0: we have 16k-sized framebuffers, fix default scissors For some reason we don't use view volume clipping by default, and use scissors instead. These scissors were set to an 8k max fb size, while the driver advertises 16k-sized framebuffers. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Cc: <mesa-stable@lists.freedesktop.org>	2019-02-10 23:36:23 -05:00
Alyssa Rosenzweig	85e2bb58ca	panfrost: Specify supported draw modes per-context Midgard has native support for QUADS and POLYGONS; Bifrost seemingly does not. Thus, Midgard generally skips prim_convert whereas Bifrost needs the pass; this patch allows the setting of allowed primitives to occur on a per-context basis (for runtime hardware selection). v2: Use (POLYGONS + 1) instead of LINES_ADJACENCY. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Robert Foss <robert.foss@collabora.com>	2019-02-11 03:23:00 +00:00
Dave Airlie	90c6880df7	radv: remove alloc parameter from pipeline init clang points out this isn't used. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-02-11 10:04:40 +10:00
Dave Airlie	a523ae0cac	radv/llvm: initialise passes member. Fixes coverity warning Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-02-11 08:59:02 +10:00
Dave Airlie	d2e82c2682	glsl: glsl to nir fix uninit class member. The constructor should init this to NULL Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>	2019-02-11 08:55:07 +10:00
Alyssa Rosenzweig	2458797256	panfrost: Elucidate texture op scheduling comment Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-02-10 00:51:57 +00:00
Alyssa Rosenzweig	658961aec3	panfrost: Remove speculative if 0'd format bit code Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-02-10 00:51:51 +00:00
Alyssa Rosenzweig	b1213a3947	panfrost: Remove if 0'd dead code Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-02-10 00:50:35 +00:00
Alyssa Rosenzweig	e91e1786c5	panfrost: Add kernel-agnostic resource management Various methods relating to resource management were previously marked as kernel-specific, forcing them to stay downstream in the vendor overlay and eventually be duplicated for DRM code. This patch adds back this code in kernel-neutral space, allowing for code sharing and minimising the diff to downstream. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-02-10 00:44:32 +00:00
Alyssa Rosenzweig	4ed23b193a	panfrost: Don't hardcode number of nir_ssa_defs Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-02-10 00:42:52 +00:00
Alyssa Rosenzweig	97dcad8d3e	panfrost: Clean-up one-argument passing quirk Most Midgard instructions take two-arguments logically; there are always two arguments at the assembly level. For the few instructions that take only a single argument, generally the second argument slot is unused, with a zero inline constant occupying the space. fmov/imov are the exception, where the first argument is filled with r24 and the logical argument is in the second slot. Previously, these constraints were handled by a delicate, buggy series of hacks. This commit removes these hacks. Instead, we look at the logical number of arguments (from NIR), switching between two argument and one-argument-one-zero style. We then introduce a quirk for the flipped style, which applies to fmov/imov. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-02-10 00:41:25 +00:00
Karol Herbst	49397a3c84	glsl_type: initialize offset and location to -1 for glsl_struct_field Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-02-09 13:52:15 +01:00
Kenneth Graunke	55e00a2ea8	nouveau: Silence unhandled cap warnings Nouveau apparently uses the u_screen helper but prints a warning in the default case, so running any GL program would start grumbling. Fixes: `8fa54bc549` gallium: Add a PIPE_CAP_NIR_COMPACT_ARRAYS capability bit. Reviewed-by: Karol Herbst <kherbst@redhat.com> Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>	2019-02-08 16:26:00 -08:00
Caio Marcelo de Oliveira Filho	ee670d09af	intel/compiler: use 0 as sampler in emit_mcs_fetch The sampler will be ignored since the underlying 'ld_mcs' operation won't use it, so just fill the field with 0 instead of the texture to make it clearer that's the case. This will also avoid is_high_sampler() to kick in unnecessarily, in case we are using the operation for a texture with index >= 16. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-02-08 14:51:56 -08:00
Eric Engestrom	e8e544436c	wsi: query the ICD's max dimensions instead of hard-coding them anv and radv both happened to already return 2^14 for these, but querying the ICD is safer and will help if vdreno (or whatever it's called) doesn't have the same max. Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-02-08 18:54:57 +00:00
Ian Romanick	b031c64349	nir: Convert a bcsel with only phi node sources to a phi node v2: Remove the original ALU instruction after all of its readers are modified to read the new ALU instruction. v3: Fix an issue where a bcsel that may not be executed on a loop iteration due to a break statement is converted to a phi (and therefore incorrectly "executed"). Noticed by Tim. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109216 Fixes: `8fb8ebfbb0` ("intel/compiler: More peephole select") Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-02-08 10:37:06 -08:00
Ian Romanick	0881e90c09	nir: Split ALU instructions in loops that read phis A single shader in Unigine Superposition is affected by this change. A single iadd is moved to the end of a loop. This iadd is involved in a complex set of logic to terminate the loop, and an extra mov instruction is inserted. This shader really needs the optimization suggested by bugzilla #94747, and I expect that to make this tiny regression go away. All Gen7+ platforms had similar results. (Skylake shown) total instructions in shared programs: 15047543 -> 15047545 (<.01%) instructions in affected programs: 565 -> 567 (0.35%) helped: 0 HURT: 2 total cycles in shared programs: 369977253 -> 369978253 (<.01%) cycles in affected programs: 127910 -> 128910 (0.78%) helped: 0 HURT: 2 v2: Skip nir_op_vec{2,3,4} and nir_op_[fi]mov instructions to avoid infinite optimization loops. Remove the original ALU instruction after all of its readers are modified to read the new ALU instruction. v3: Extend to the more general case. If the prev-block value from the phi is not undef, this means the ALU instruction has to be duplicated in both the prev-block and the continue-block. Fixes: `8fb8ebfbb0` ("intel/compiler: More peephole select") Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-02-08 10:37:06 -08:00
Ian Romanick	0c0c69729b	nir: Select phi nodes using prev_block instead of continue_block This simplifies some changes coming later. Fixes: `8fb8ebfbb0` ("intel/compiler: More peephole select") Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-02-08 10:37:06 -08:00
Ian Romanick	8d8f80af3a	nir: Refactor code that checks phi nodes in opt_peel_loop_initial_if This will be used in a couple more places soon. The function name is... horribly long. Neither Matt nor I could think of any thing that was shorter and still more descriptive than "is_phi_foo". I'm willing to entertain suggestions. Fixes: `8fb8ebfbb0` ("intel/compiler: More peephole select") Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-02-08 10:37:06 -08:00
Ian Romanick	4d65d2b12e	nir: Document some fields of nir_loop_terminator Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-02-08 10:37:06 -08:00
Ian Romanick	28ef5bb74c	intel/compiler: Silence warning about value that may be used uninitialized For some reason, this warning only occurs for me in release builds. In file included from src/intel/compiler/brw_nir_lower_mem_access_bit_sizes.c:25:0: src/intel/compiler/brw_nir_lower_mem_access_bit_sizes.c: In function ‘brw_nir_lower_mem_access_bit_sizes’: src/compiler/nir/nir_builder.h:501:26: warning: ‘src_swiz[2]’ may be used uninitialized in this function [-Wmaybe-uninitialized] alu_src.swizzle[i] = swiz[i]; ~~~~~~~~~~~~~~~~~~~^~~~~~~~~ src/intel/compiler/brw_nir_lower_mem_access_bit_sizes.c:225:16: note: ‘src_swiz[2]’ was declared here unsigned src_swiz[4]; ^~~~~~~~ Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-02-08 10:37:06 -08:00
Ian Romanick	78169870e4	nir: Silence zillions of unused parameter warnings in release builds Fixes: `cd56d79b59` "nir: check NIR_SKIP to skip passes by name" Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-02-08 10:37:06 -08:00
Eric Engestrom	3dc5faf523	gitlab-ci: workaround docker bug for users with uppercase characters CI_REGISTRY_IMAGE == lower($CI_REGISTRY/$CI_PROJECT_PATH) Suggested-by: Daniel Stone <daniels@collabora.com> Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>	2019-02-08 17:45:57 +00:00
Andrii Simiklit	2b7d5c3217	i965: consider a 'base level' when calculating width0, height0, depth0 I guess that when calculating the width0, height0, depth0 to use for the function 'intel_miptree_create' we need to consider the 'base level', as is done in the 'intel_miptree_create_for_teximage' function. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107987 Signed-off-by: Andrii Simiklit <andrii.simiklit@globallogic.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-02-07 21:40:50 -08:00
Timothy Arceri	26aa460940	nir: rewrite varying component packing There are a number of reasons for the rewrite. 1. Adding support for packing tess patch varyings in a sane way. 2. Making use of qsort allowing the code to be much easier to follow. 3. Fixes a bug where different interp types caused component packing to be skipped for all varyings in some scenarios. 4. Allows us to add a crude live range analysis for deciding which components should be packed together. This support can optionally be added in a future patch. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-02-08 02:54:56 +00:00
Timothy Arceri	2f53260417	nir: add is_packing_supported_for_type() helper This will be used in the following patches to determine if we support packing the components of a varying. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-02-08 02:54:56 +00:00
Timothy Arceri	e041123841	nir: add glsl_type_is_32bit() helper Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-02-08 02:54:56 +00:00
Timothy Arceri	7b01d5c354	nir: add support for marking used patches when packing varyings This adds support needed for marking the varyings as used but we don't actually support packing patches in this patch. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-02-08 02:54:56 +00:00
Timothy Arceri	d0af13cfb4	st/glsl_to_nir: call nir_remove_dead_variables() after lowering local indirects Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-02-08 02:54:56 +00:00
Timothy Arceri	d0abbaa528	util: move BITFIELD macros to util/macros.h Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-02-08 02:54:56 +00:00
Karol Herbst	cbd1ad6165	st/mesa: require RGBA2, RGB4, and RGBA4 to be renderable If the driver does not support rendering to these formats but does support texturing, we can end up in incompatibilities between textures and renderbuffers that are then copied to. Fixes KHR-GL45.copy_image.functional on nvc0 Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Cc: 19.0 <mesa-stable@lists.freedesktop.org>	2019-02-07 21:51:45 -05:00
Karol Herbst	6010d7b8e8	gallium: add PIPE_CAP_MAX_VARYINGS Some NVIDIA hardware can accept 128 fragment shader input components, but only have up to 124 varying-interpolated input components. We add a new cap to express this cleanly. For most drivers, this will have the same value as PIPE_SHADER_CAP_MAX_INPUTS for the fragment shader. Fixes KHR-GL45.limits.max_fragment_input_components Signed-off-by: Karol Herbst <karolherbst@gmail.com> [imirkin: rebased, improved docs/commit message] Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Acked-by: Rob Clark <robdclark@gmail.com> Acked-by: Eric Anholt <eric@anholt.net> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Cc: 19.0 <mesa-stable@lists.freedesktop.org>	2019-02-07 21:51:45 -05:00
Alyssa Rosenzweig	738346fa23	kmsro: Silence warning if missing Regardless of whether the build uses kmsro, kmsro is the default driver descriptor when the static loader is used. Thus, in an edge case where the static loader is used, no static targets are loaded, and kmsro is not compiled, a spurious warning is printed. There's no harm in executing the stub function in this case, but it's not "an error" to not have kmsro in the build; the driver-missing warning should not be printed for kmsro. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Karol Herbst <kherbst@redhat.com>	2019-02-08 01:48:37 +00:00
Lionel Landwerlin	f1bcb9be46	radv: assert that colorAttachment is valid for CmdClearAttachment This partially reverts a change from `b7a93cbded` ("radv: Handle VK_ATTACHMENT_UNUSED in CmdClearAttachment") which fixed actual issues but also started to accept invalid values for the colorAttachment field. This change asserts that the field is valid for the current pass. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `b7a93cbded` ("radv: Handle VK_ATTACHMENT_UNUSED in CmdClearAttachment") Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-02-08 00:18:16 +00:00
Lionel Landwerlin	a934a3d124	anv: assert that color attachment are valid This reverts commit `d76e777988`. Let's make this obvious that there is an application issue if it tries to access an attachment that doesn't exist in the current pass. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `d76e777988` ("anv: Handle VK_ATTACHMENT_UNUSED in colorAttachment") Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-02-08 00:18:16 +00:00
Dave Airlie	3c153b3982	docs: update qbo support for virgl Signed-off-by: Dave Airlie <airlied@redhat.com>	2019-02-08 09:06:36 +10:00
Eric Engestrom	6e0effbd34	travis: fix osx make build This variable was removed in commit `087af992a2` "travis: remove unused linux code path" because it looked like it was only used by the Linux build. Turns out I was wrong, so let's restore it. Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Kristian H. Kristensen <hoegsberg@chromium.org> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2019-02-07 20:14:14 +00:00
Jason Ekstrand	eaf5e4a24d	README: Drop the badges from the readme They have been added as badges directly to the GitLab project. Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-02-07 12:46:17 -06:00
Eric Engestrom	358d0cfab2	driconf: drop unused macro Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2019-02-07 13:40:26 +00:00
Eric Engestrom	00be88aab8	meson: add script to print the options before configuring a builddir Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>	2019-02-07 13:22:41 +00:00
Alyssa Rosenzweig	d43ec104b7	panfrost: Include glue for out-of-tree legacy code In addition to the DRM interface in active development, for legacy kernels Panfrost has a small, optional, out-of-tree glue repository. For various reasons, this legacy code should not be included in Mesa proper, but this commit allows it to coexist peacefully with upstream Panfrost. If the nondrm repo is cloned/symlinked to the directory `src/gallium/drivers/panfrost/nondrm`, legacy functionality will be built. Otherwise, the driver will build normally, though a runtime error message will be printed if a legacy kernel is detected. This workaround is icky, but it allows a nearly-upstream Panfrost to work on real hardware, today. Ideally, this patch will be reverted when the Panfrost kernel module is mature and we drop legacy support. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-02-07 01:58:32 +00:00
Alyssa Rosenzweig	7da251fc72	panfrost: Check in sources for command stream This patch includes the command stream portion of the driver, complementing the earlier compiler. It provides a base for future work, though it does not integrate with any particular winsys. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2019-02-07 01:57:50 +00:00
Alyssa Rosenzweig	8f4485ef1a	panfrost: Use u_pipe_screen_get_param_defaults Switching to the defaults function cleans up pan_screen.h markedly and futureproofs for when new PIPE_CAPs are added. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Suggested-by: Eric Anholt <eric@anholt.net>	2019-02-07 01:57:19 +00:00
Alyssa Rosenzweig	8f9f99d84d	kmsro: Move DRM entrypoints to shared block As kmsro allows an essentially mix-and-match hodgepodge of display drivers and renderonly GPUs, it doesn't make sense to couple the display driver entrypoint definition with the driver. Instead, we move all kmsro entrypoints to a shared kmsro block at the end (avoiding clutter and distraction since this list may snowball in the future). v2: Alphabetize driver list. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-02-07 01:50:16 +00:00
Rhys Perry	5b6f522fc2	nvc0: add compute invocation counter The strategy is to keep a CPU-side counter of the direct invocations, and a GPU-side counter of the indirect invocations, and then add them together for queries. The specific technique is a macro which multiplies a list of integers together and accumulates the product into SCRATCH registers held inside of the context. Another macro will read those values out and add them to the passed-in cpu-side counter to be stored in a query buffer the same way that all the other statistics are stored. Original implementation by Rhys Perry, redone by Ilia Mirkin to use the SCRATCH temporaries. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2019-02-06 19:35:57 -05:00
Karol Herbst	cce4955721	gm107/ir: add fp64 rsq Acked-by: Ilia Mirkin <imirkin@alum.mit.edu> Cc: 19.0 <mesa-stable@lists.freedesktop.org>	2019-02-06 19:35:57 -05:00
Karol Herbst	815a8e59c6	gm107/ir: add fp64 rcp Acked-by: Ilia Mirkin <imirkin@alum.mit.edu> Cc: 19.0 <mesa-stable@lists.freedesktop.org>	2019-02-06 19:35:57 -05:00
Karol Herbst	12669d2970	gk104/ir: Use the new rcp/rsq in library [imirkin: add a few more "long" prefixes to safen things up] Acked-by: Ilia Mirkin <imirkin@alum.mit.edu> Cc: 19.0 <mesa-stable@lists.freedesktop.org>	2019-02-06 19:35:57 -05:00
Boyan Ding	656ad06051	gk110/ir: Use the new rcp/rsq in library v2: (Karol Herbst <kherbst@redhat.com>) * fix Value setup for the builtins Signed-off-by: Boyan Ding <boyan.j.ding@gmail.com> [imirkin: track the fp64 flag when switching ops to calls] Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Cc: 19.0 <mesa-stable@lists.freedesktop.org>	2019-02-06 19:35:57 -05:00
Boyan Ding	7937408052	gk110/ir: Add rsq f64 implementation Signed-off-by: Boyan Ding <boyan.j.ding@gmail.com> Acked-by: Ilia Mirkin <imirkin@alum.mit.edu> Cc: 19.0 <mesa-stable@lists.freedesktop.org>	2019-02-06 19:35:57 -05:00
Boyan Ding	04593d9a73	gk110/ir: Add rcp f64 implementation Signed-off-by: Boyan Ding <boyan.j.ding@gmail.com> Acked-by: Ilia Mirkin <imirkin@alum.mit.edu> Cc: 19.0 <mesa-stable@lists.freedesktop.org>	2019-02-06 19:35:57 -05:00
Ilia Mirkin	6adb9b38bf	nvc0: stick zero values for the compute invocation counts Not quite perfect, but at least we don't end up with random values in the query buffer. Fixes KHR-GL45.pipeline_statistics_query_tests_ARB.functional_default_qo_values Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Cc: 19.0 <mesa-stable@lists.freedesktop.org>	2019-02-06 19:35:57 -05:00
Ilia Mirkin	e00799d3dc	nv50,nvc0: use condition for occlusion queries when already complete For the NO_WAIT variants, we would jump into the ALWAYS case for both nested and inverted occlusion queries. However if the query had previously completed, the application could reasonably expect that the render condition would follow that result. To resolve this, we remove the nesting distinction which unnecessarily created an imbalance between the regular and inverted cases (since there's no "zero" condition mode). We also use the proper comparison if we know that the query has completed (which could happen as a result of an earlier get_query_result call). Fixes KHR-GL45.conditional_render_inverted.functional Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Cc: 19.0 <mesa-stable@lists.freedesktop.org>	2019-02-06 19:35:57 -05:00
Ilia Mirkin	162352e671	nvc0: fix 3d images on kepler Looks like SUBFM.3D and SUEAU are perfectly capable of dealing with 3d tiling, they just need the correct inputs. Supply them. We also have to deal with the case where a 2d "layer" of a 3d image is bound. In this case, we supply the z coordinate separately to the shader, which has to optionally treat every 2d case as if it could be a slice of a 3d texture. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Cc: 19.0 <mesa-stable@lists.freedesktop.org>	2019-02-06 19:35:57 -05:00
Ilia Mirkin	5de5beedf2	nvc0/ir: fix second tex argument after levelZero optimization We used to pre-set a bunch of extra arguments to a texture instruction in order to force the RA to allocate a register at the boundary of 4. However with the levelZero optimization, which removes a LOD argument when it's uniformly equal to zero, we undid that logic by removing an extra argument. As a result, we could end up with insufficient alignment on the second wide texture argument. Instead we switch to a different method of achieving the same result. The logic runs during the constraint analysis of the RA, and adds unset sources as necessary right before being merged into a wide argument. Fixes MISALIGNED_REG errors in Hitman when run with bindless textures enabled on a GK208. Fixes: `9145873b15` ("nvc0/ir: use levelZero flag when the lod is set to 0") Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Cc: 19.0 <mesa-stable@lists.freedesktop.org>	2019-02-06 19:35:57 -05:00
Ilia Mirkin	4443b6ddf2	nvc0/ir: always use CG mode for loads from atomic-only buffers Atomic operations don't update the local cache, which means that we would have to issue CCTL operations in order to get the updated values. When we know that a buffer is primarily used for atomic operations, it's easier to just avoid the caching at that level entirely. The same issue persists for non-atomic buffers, which will have to be fixed separately. Fixes the failing dEQP-GLES31.functional.atomic_counter.* tests. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Karol Herbst <kherbst@redhat.com> Cc: 19.0 <mesa-stable@lists.freedesktop.org>	2019-02-06 19:35:57 -05:00
Ilia Mirkin	399215eb7a	nvc0: add support for handling indirect draws with attrib conversion The hardware does not natively support FIXED and DOUBLE formats. If those are used in an indirect draw, they have to be converted. Our conversion tries to be clever about only converting the data that's needed. However for indirect, that won't work. Given that DOUBLE or FIXED are highly unlikely to ever be used with indirect draws, read the indirect buffer on the CPU and issue draws directly. Fixes the failing dEQP-GLES31.functional.draw_indirect.random.* tests. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Cc: 19.0 <mesa-stable@lists.freedesktop.org>	2019-02-06 19:35:57 -05:00
Kristian H. Kristensen	0f7a20e91e	freedreno/a6xx: Use tiling for all resources We used to restrict this to just PIPE_BIND_SAMPLER_VIEW resources, but most resources benefit from being tiled. Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org> Reviewed-by: Rob Clark <robdclark@gmail.com>	2019-02-06 15:28:48 -08:00
Kristian H. Kristensen	357ea7da51	freedreno/a6xx: Emit blitter dst with OUT_RELOCW We're writing to the bo and the kernel needs to know for fd_bo_cpu_prep() to work. Fixes: `f93e431272` ("freedreno/a6xx: Enable blitter") Reviewed-by: Rob Clark <robdclark@gmail.com> Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>	2019-02-06 15:22:25 -08:00
Bas Nieuwenhuizen	13ab63bb62	radv: Implement VK_EXT_buffer_device_address. v2: Also update the release notes. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-02-06 22:37:38 +01:00
Bas Nieuwenhuizen	3259e7b036	radv: Do not use the bo list for local buffers. The kernel already does it for us. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-02-06 22:36:19 +01:00
Bas Nieuwenhuizen	8a15950211	amd/common: Implement global memory accesses. Needed for VK_EXT_buffer_device_address. The pointers are implemented as i8*, since I could not figure out how to emulate setting struct offsets in LLVM based on the SPIR-V offsets (and more weird stuff like row major matrices). Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-02-06 22:36:11 +01:00
Bas Nieuwenhuizen	5703ecf651	amd/common: Do not use 32-bit loads for shared memory. We use a straight glsl->llvm type conversion so types should already be right. Also even though the writemasks were changed we were not actually doing 32-bit things, so this fails miserably. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-02-06 22:36:06 +01:00
Bas Nieuwenhuizen	8d1718590b	amd/common: handle nir_deref_cast for shared memory from integers. Can happen e.g. after a phi. Fixes: `a2b5cc3c39` "radv: enable variable pointers" Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-02-06 22:36:02 +01:00
Bas Nieuwenhuizen	830fd0efc1	amd/common: Handle nir_deref_type_ptr_as_array for shared memory. Fixes: `a2b5cc3c39` "radv: enable variable pointers" Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-02-06 22:35:58 +01:00
Bas Nieuwenhuizen	dbdb44d575	amd/common: Fix stores to derefs with unknown variable. Fixes: `a2b5cc3c39` "radv: enable variable pointers" Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-02-06 22:35:54 +01:00
Bas Nieuwenhuizen	3c24fc64c7	amd/common: Use correct writemask for shared memory stores. The check was for 1 bit being set, which is clearly not what we want. CC: <mesa-stable@lists.freedesktop.org> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-02-06 22:35:49 +01:00
Bas Nieuwenhuizen	00253ab2c4	radv: Fix the shader info pass for not having the variable. For example with VK_EXT_buffer_device_address or VK_KHR_variable_pointers. Fixes: `a2b5cc3c39` "radv: enable variable pointers" Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-02-06 22:35:45 +01:00
Bas Nieuwenhuizen	58c8dadd32	amd/common: Implement ptr->int casts in ac_to_integer. For the implicit casts inherent in nir. This should probably have been done for shared memory for VK_KHR_variable_pointers. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-02-06 22:35:40 +01:00
Bas Nieuwenhuizen	e00d9a9a72	amd/common: Add gep helper for pointer increment. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-02-06 22:35:36 +01:00
Bas Nieuwenhuizen	39ab4e12f7	radv: Only look at pImmutableSamples if the descriptor has a sampler. Equivalent of ANV patch `c7f4a2867c` CC: <mesa-stable@lists.freedesktop.org> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-02-06 22:35:32 +01:00
Eric Engestrom	40b53a7203	xvmc: fix string comparison Fixes: `6fca18696d` "g3dvl: Update XvMC unit tests." Cc: Younes Manton <younes.m@gmail.com> Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>	2019-02-06 18:15:43 +00:00
Eric Engestrom	110a6e1839	xvmc: fix string comparison Fixes: `c7b65dcaff` "xvmc: Define some Xv attribs to allow users to specify color standard and procamp" Cc: Christian König <christian.koenig@amd.com> Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>	2019-02-06 18:15:43 +00:00
Eric Engestrom	ba26bc4ef0	gitlab-ci: add meson glvnd build Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>	2019-02-06 17:56:30 +00:00
Eric Engestrom	5459900f38	travis: remove unused scons code path Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>	2019-02-06 17:56:30 +00:00
Eric Engestrom	087af992a2	travis: remove unused linux code path Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>	2019-02-06 17:56:30 +00:00
Eric Engestrom	73275147fe	gitlab-ci: add make Gallium ST Other build Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>	2019-02-06 17:56:30 +00:00
Eric Engestrom	360a7bfbe9	gitlab-ci: add make Gallium ST Clover LLVM-7 build Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>	2019-02-06 17:56:30 +00:00
Eric Engestrom	39315a747b	gitlab-ci: add make Gallium ST Clover LLVM-6.0 build Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>	2019-02-06 17:56:30 +00:00
Eric Engestrom	e80f88c48a	gitlab-ci: add make Gallium ST Clover LLVM-5.0 build Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>	2019-02-06 17:56:30 +00:00
Eric Engestrom	cc85f50029	gitlab-ci: add make Gallium ST Clover LLVM-4.0 build Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>	2019-02-06 17:56:30 +00:00
Eric Engestrom	984e295500	gitlab-ci: add make Gallium ST Clover LLVM-3.9 build Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>	2019-02-06 17:56:30 +00:00
Eric Engestrom	d0dff24cbb	gitlab-ci: add make Gallium Drivers "Other" build Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>	2019-02-06 17:56:30 +00:00
Eric Engestrom	055cfbc6de	gitlab-ci: add make Gallium Drivers RadeonSI build Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>	2019-02-06 17:56:30 +00:00
Eric Engestrom	7b26a19f31	gitlab-ci: add make Gallium Drivers SWR build Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>	2019-02-06 17:56:30 +00:00
Eric Engestrom	bbdc563c11	gitlab-ci: add make loaders/classic DRI build Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>	2019-02-06 17:56:30 +00:00
Eric Engestrom	f33517bda7	gitlab-ci: add meson gallium ST "Other" build Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>	2019-02-06 17:56:30 +00:00
Eric Engestrom	8dab707ab8	gitlab-ci: add meson gallium ST Clover (LLVM 7.0) build Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>	2019-02-06 17:56:30 +00:00
Eric Engestrom	8744ac0904	gitlab-ci: add meson gallium ST Clover (LLVM 6.0) build Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>	2019-02-06 17:56:30 +00:00
Eric Engestrom	b5a70af062	gitlab-ci: add meson gallium ST Clover (LLVM 5.0) build Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>	2019-02-06 17:56:30 +00:00
Eric Engestrom	d407ead204	gitlab-ci: add meson gallium "other drivers" build Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>	2019-02-06 17:56:30 +00:00
Eric Engestrom	06e8f1961b	gitlab-ci: add meson gallium RadeonSI build Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>	2019-02-06 17:56:30 +00:00
Eric Engestrom	360c814bfe	gitlab-ci: add meson gallium SWR build Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>	2019-02-06 17:56:30 +00:00
Eric Engestrom	d73265e20d	gitlab-ci: add meson loader/classic DRI build Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>	2019-02-06 17:56:30 +00:00
Eric Engestrom	6a19ec9daa	gitlab-ci: add scons SWR build Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>	2019-02-06 17:56:30 +00:00
Eric Engestrom	d4c6d4d5cb	gitlab-ci: add scons llvm 3.5 build Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>	2019-02-06 17:56:30 +00:00
Eric Engestrom	06b245b438	gitlab-ci: add a scons no-llvm build Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>	2019-02-06 17:56:30 +00:00
Eric Engestrom	89a7467899	gitlab-ci: add a make vulkan build Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>	2019-02-06 17:56:30 +00:00
Eric Engestrom	46d23c0a46	gitlab-ci: add a meson vulkan build Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>	2019-02-06 17:56:30 +00:00
Eric Engestrom	329f5cd780	gitlab-ci: add ubuntu container Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>	2019-02-06 17:56:30 +00:00
Marek Olšák	42a1cd034d	radeonsi: use local ws variable in si_need_dma_space Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2019-02-06 11:17:21 -05:00
Marek Olšák	2c4911c652	radeonsi: don't leak an index buffer if draw_vbo fails Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2019-02-06 11:17:21 -05:00
Marek Olšák	d72c319867	radeonsi: make allocator_zeroed_memory unmappable and use bigger buffers Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2019-02-06 11:17:21 -05:00
Marek Olšák	5068dec5de	radeonsi: clear allocator_zeroed_memory with SDMA so that it can be used in parallel IBs. This also removes the SO_FILLED_SIZE hack. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2019-02-06 11:17:21 -05:00
Marek Olšák	7d4c935654	radeonsi: initialize textures using DCC to black when possible Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2019-02-06 11:17:21 -05:00
Jonathan Marek	3361305f57	freedreno: a2xx: fix fast clear Fixes: `912a9c8d` Signed-off-by: Jonathan Marek <jonathan@marek.ca> Cc: 19.0 <mesa-stable@lists.freedesktop.org>	2019-02-06 14:34:57 +00:00
Eric Engestrom	54fa5eceae	egl: use coherent variable names `EGLDisplay` variables (the opaque Khronos type) have mostly been consistently called `dpy`, as this is the name used in the Khronos specs. However, `_EGLDisplay` variables (our internal struct) have been randomly called `dpy` when there was no local variable clash with `EGLDisplay`s, and `disp` otherwise. Let's be consistent and use `dpy` for the Khronos type, and `disp` for our struct. Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Acked-by: Emil Velikov <emil.velikov@collabora.com> Acked-by: Eric Anholt <eric@anholt.net>	2019-02-06 11:53:24 +00:00
Alyssa Rosenzweig	a81d5587d6	meson: Remove panfrost from default driver list Until the kernel side matures and the full driver is upstreamed, to avoid end-user surprises, Panfrost should only be built for the adventurous. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-02-06 02:59:00 +00:00
Eric Anholt	3c08ecf147	v3d: Whitespace consistency fix.	2019-02-05 15:46:42 -08:00
Eric Anholt	940501a446	v3d: Fix copy-propagation of input unpacks. I had a single function for "does this do float input unpacking" with two major flaws: It was missing the most common thing to try to copy propagate a f32 input unpack to (the VFPACK to an FP16 render target) along with several other ALU ops, and also would try to propagate an f32 unpack into a VFMUL which only does f16 unpacks. instructions in affected programs: 659232 -> 655895 (-0.51%) uniforms in affected programs: 132613 -> 135336 (2.05%) and a couple of programs increase their thread counts. The uniforms hit appears to be a pattern in generated code of doing (-a >= a) comparisons, which when a is abs(b) can result in the abs instruction being copy propagated once but not fully DCEed.	2019-02-05 15:46:04 -08:00
Eric Anholt	e5c6938590	v3d: Fix input packing of .l for rounding/fdx/fdy. Avoids a regression in dEQP-GLES3.functional.shaders.derivate.fwidth.texture.* once we start copy-propagating more input packs.	2019-02-05 15:45:23 -08:00
Eric Anholt	1a4170952d	v3d: Fix pack/unpack of VFPACK operand unpacks. We want to be able to copy propagate our texture unpacks into the vfpack.	2019-02-05 15:45:23 -08:00
Eric Anholt	d0fdbd4211	v3d: Fix dumping of shaders with alpha test. We were trying to print a NULL entry from the table.	2019-02-05 15:42:14 -08:00
Eric Anholt	bdef17b052	v3d: Store the actual mask of color buffers present in the key. If you only bound rt 1+, we'd still emit a write to the rt0 that isn't present (noticed while debugging an ext_framebuffer_multisample-alpha-to-coverage-no-draw-buffer-zero regression in another change).	2019-02-05 15:42:04 -08:00
Eric Anholt	17a649af05	v3d: Fix precompile of FRAG_RESULT_DATA1 and higher outputs. I was just leaving the other MRT targets than DATA0 out, by accident.	2019-02-05 15:35:49 -08:00
Kristian H. Kristensen	ba4b22011a	st/nir: Use src/ relative include path for autotools Fixes: `cdc53fa81c` Acked-by: Kenneth Graunke <kenneth@whitecape.org> Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>	2019-02-05 14:19:51 -08:00
Kenneth Graunke	8fa54bc549	gallium: Add a PIPE_CAP_NIR_COMPACT_ARRAYS capability bit. Iris would like to use compact arrays for tesslevels and clip/cull distances. radeonsi will likely want to switch to these at some point, since it'll be necessary for GL_ARB_gl_spirv support, but it's not ready for them just yet. Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-02-05 13:58:46 -08:00
Kenneth Graunke	cf731564e6	st/nir: Call nir_lower_clip_cull_distance_arrays(). Today, st always sets LowerCombinedClipCullDistance, causing the GLSL IR lowering to run, giving us vec4[2] arrays. I would like to disable this and instead run the NIR lowering so that we get compact float[] arrays instead. Calling the new pass is a noop if the GLSL IR pass has already run, so it's safe to call the pass unconditionally. Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-02-05 13:58:46 -08:00
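The two layouts mentioned in this series differ only in how a distance index is addressed. The sketch below shows the index arithmetic between a compact `float[]` array and a `vec4[2]` array; `struct vec4_slot` and `compact_to_vec4` are hypothetical names for illustration, not Mesa functions.

```c
/* Hypothetical illustration: where element i of a compact float[8]
 * clip/cull distance array lands in the vec4[2] layout. */
struct vec4_slot {
    int vec;  /* which vec4: 0 or 1 */
    int comp; /* component within it: 0=x, 1=y, 2=z, 3=w */
};

static struct vec4_slot compact_to_vec4(int i)
{
    struct vec4_slot s;
    s.vec = i / 4;  /* four floats per vec4 */
    s.comp = i % 4; /* position inside that vec4 */
    return s;
}
```

For example, distance 5 would sit in the second vec4, component y.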
Kenneth Graunke	15c6902117	nir: Avoid splitting compact arrays into per-element variables. Compact arrays are used for special variables like clip and cull distances, or tessellation levels. Drivers using compact arrays assume that these values will always be actual arrays. We don't want to turn a float[1] gl_CullDistance into a single float; that would confuse drivers. Today, i965 uses compact arrays, and Gallium drivers use nir_lower_io_arrays_to_elements, so we haven't had any overlap that would demonstrate the issue. Iris will use both. Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-02-05 13:58:46 -08:00
Kenneth Graunke	ba9dcc80fb	nir: Avoid clip/cull distance lowering multiple times. A couple places in st/nir assume that cull distances have been lowered away, so it will need to call this lowering pass for drivers which opt out of the GLSL IR lowering. The Intel backend also calls this pass, for i965 and anv. We need to only do it once. Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-02-05 13:58:46 -08:00
Kenneth Graunke	5730364d69	nir: Bail on clip/cull distance lowering if GLSL IR already did it. We have a GLSL IR pass to convert clip/cull distance float[] arrays into vec4[2] arrays. In `ff281e6204`, we attempted to skip this pass if the GLSL IR lowering had already run. But, that code was not quite right, as we forgot to strip away the per-vertex IO array layer for geometry and tessellation shader varyings. If the GLSL IR pass has run, the variables will not be marked as "compact". So we can simply check that and bail. Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-02-05 13:58:46 -08:00
Kenneth Graunke	ef99f4c8d1	compiler: Mark clip/cull distance arrays as compact before lowering. nir_lower_clip_cull_distance_arrays() marks the combined clip/cull distance array as compact. However, when translating in from GLSL or SPIR-V, we were not marking the original float[] arrays as compact. We should do so. That way, we can detect these corner cases properly. Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-02-05 13:58:46 -08:00
Kenneth Graunke	3327c93510	nir: Record info->fs.pixel_center_integer in lower_system_values radeonsi uses a system value for gl_FragCoord rather than an input var. These get translated into load_frag_coord NIR intrinsics, which lose the pixel_center_integer and origin_upper_left decorations. To cope with this, Tim added a shader_info field for pixel_center_integer, and made glsl_to_nir set it accordingly. prog_to_nir also needs to handle these fragcoord conventions. Instead of duplicating the logic to set the info field, just move it to nir_lower_system_values so it'll happen regardless of who makes the NIR. (For what it's worth, we don't need an info flag for origin_upper_left, because radeonsi lowers origin conventions in nir_lower_wpos_ytransform before nir_lower_system_values destroys the variable and qualifiers.) Reviewed-by: Eric Anholt <eric@anholt.net>	2019-02-05 13:51:52 -08:00
Kenneth Graunke	536abd453b	program: Extend prog_to_nir to handle system values. Some drivers, such as radeonsi, use a system value for gl_FragCoord rather than an input variable. In this case, our Mesa IR will have a PROGRAM_SYSTEM_VALUE register, which we need to translate. This makes prog_to_nir work for Gallium drivers which expose the PIPE_CAP_TGSI_FS_POSITION_IS_SYSVAL capability bit. Reviewed-by: Eric Anholt <eric@anholt.net>	2019-02-05 13:51:51 -08:00
Kenneth Graunke	fa38ca25f6	program: Use u_bit_scan64 in prog_to_nir. We can simply iterate the bits rather than using util_last_bit and checking each one up until that point. Reviewed-by: Eric Anholt <eric@anholt.net>	2019-02-05 13:51:50 -08:00
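The bit-iteration pattern described in the commit above can be sketched in plain C. `bit_scan64` here is a hypothetical stand-in that mimics the usual "return lowest set bit and clear it" behaviour; it is not Mesa's `u_bit_scan64` itself, and `collect_set_bits` is likewise an illustrative name.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a bit-scan helper: return the index of the
 * lowest set bit and clear it from *mask. The mask must be non-zero. */
static int bit_scan64(uint64_t *mask)
{
    assert(*mask != 0);
    int i = 0;
    while (!((*mask >> i) & 1))
        i++;
    *mask &= *mask - 1; /* clear the lowest set bit */
    return i;
}

/* Visit only the set bits instead of testing every position up to the
 * last one. Stores the visited indices in out[] (room for 64 entries)
 * and returns how many were visited. */
static int collect_set_bits(uint64_t mask, int out[64])
{
    int n = 0;
    while (mask)
        out[n++] = bit_scan64(&mask);
    return n;
}
```

The loop body runs once per set bit, so a sparse mask costs fewer iterations than walking every index up to the highest set bit.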
Kenneth Graunke	a01ad3110a	st/mesa: Add NIR versions of the PBO upload/download shaders. Acked-by: Marek Olšák <marek.olsak@amd.com> Tested-by: Rob Clark <robdclark@gmail.com> Tested-by: Eric Anholt <eric@anholt.net>	2019-02-05 13:43:42 -08:00
Kenneth Graunke	a02349b9e7	st/mesa: Add a NIR version of the OES_draw_texture built-in shaders. Reviewed-by: Marek Olšák <marek.olsak@amd.com> Tested-by: Rob Clark <robdclark@gmail.com> Tested-by: Eric Anholt <eric@anholt.net>	2019-02-05 13:43:41 -08:00
Kenneth Graunke	be492affa8	st/mesa: Add NIR versions of the clear shaders. We implement the basic VS and FS, as well as the VS that does layered clears by writing gl_Layer from the vertex shader. Drivers which need a geometry shader for writing layer continue falling back to TGSI, as I didn't need this and so didn't bother implementing it. (We certainly could, however, if people want to add it in the future.) Reviewed-by: Marek Olšák <marek.olsak@amd.com> Tested-by: Rob Clark <robdclark@gmail.com> Tested-by: Eric Anholt <eric@anholt.net>	2019-02-05 13:43:39 -08:00
Kenneth Graunke	3f28b245b5	st/mesa: Add NIR versions of the drawpixels Z/stencil fragment shaders. Reviewed-by: Marek Olšák <marek.olsak@amd.com> Tested-by: Rob Clark <robdclark@gmail.com> Tested-by: Eric Anholt <eric@anholt.net>	2019-02-05 13:43:37 -08:00
Kenneth Graunke	2d45f9fa25	st/mesa: Add a NIR version of the drawpixels/bitmap VS copy shader. This provides a native NIR version of the DrawPixels/Bitmap passthrough vertex shader. Reviewed-by: Marek Olšák <marek.olsak@amd.com> Tested-by: Rob Clark <robdclark@gmail.com> Tested-by: Eric Anholt <eric@anholt.net>	2019-02-05 13:43:36 -08:00
Kenneth Graunke	cdc53fa81c	st/nir: Make new helpers for constructing built-in NIR shaders. The state tracker generates several built-in shaders in order to perform scissored clears, upload/download PBOs, and so on. These are currently constructed using TGSI, using ureg and u_simple_shader. I want to have NIR versions of these shaders, for my Gallium driver that has a NIR backend but no TGSI support. To that end, we'll want a few helpers to help construct simple shaders. This patch adds two new helpers: - st_nir_finish_builtin_shader() takes a manually constructed NIR shader, applies lowering passes (like st_link_nir would do for GLSL), and constructs the pipe_shader_state. - st_nir_make_passthrough_shader() makes a simple passthrough shader, which copies inputs to outputs. This is similar to u_simple_shaders. v2: Set info->fs.untyped_color_outputs for vc4/v3d (thanks Eric!). Reviewed-by: Marek Olšák <marek.olsak@amd.com> Tested-by: Rob Clark <robdclark@gmail.com> Tested-by: Eric Anholt <eric@anholt.net>	2019-02-05 13:43:33 -08:00
Kenneth Graunke	4f799264d1	st/nir: Move varying setup code to a helper function. I want to reuse this for built-in shaders. Reviewed-by: Marek Olšák <marek.olsak@amd.com> Tested-by: Rob Clark <robdclark@gmail.com> Tested-by: Eric Anholt <eric@anholt.net>	2019-02-05 13:43:02 -08:00
Jason Ekstrand	36734987a5	nir/deref: Drop zero ptr_as_array derefs They are effectively (&x)[0] or *&x which does nothing. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-02-05 15:17:19 -06:00
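The identity the commit above relies on has a direct analogue in C. The sketch below checks that `(&x)[0]` and `*&x` both name `x` itself; `zero_index_is_identity` is a hypothetical helper for illustration and says nothing about how the NIR pass is written.

```c
/* Returns 1 if both spellings refer to the very same object as x. */
static int zero_index_is_identity(void)
{
    int x = 42;
    return &(&x)[0] == &x   /* indexing a pointer by zero */
        && &*&x == &x       /* dereferencing an address-of */
        && (&x)[0] == 42;   /* and the value is unchanged */
}
```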
Eric Anholt	aaef12702f	nir: Move V3D's "the shader was TGSI, ignore FS output types" flag to NIR. Ken's rework of mesa/st builtins to NIR means that we'll have more NIR shaders with color output types that are mismatched with the render target types. Since this is behavior that GLSL doesn't require, add it as a shader_info option so the driver can know that it needs to ignore the FS output's base type in favor of the actual render target's. This prevents needing additional variants in several mesa/st paths (clear, pbo upload, pbo download), given that the driver already has to handle the variants for any TGSI being passed to it (from u_blitter, for example). Reviewed-by: Rob Clark <robdclark@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-02-05 12:12:33 -08:00
Emil Velikov	8943eb8f03	anv: wire up the state_pool_padding test Cc: Jason Ekstrand <jason@jlekstrand.net> Fixes: `927ba12b53` ("anv/tests: Adding test for the state_pool padding.") Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2019-02-05 11:39:36 -08:00
Karol Herbst	a61c388d07	nvc0/ir: replace cvt instructions with add to improve shader performance gives me a performance boost of 0.2% in pixmark_piano on my gk106, gm204 and gp107. reduces the amount of generated convert instructions by roughly 30% in shader-db. v2: only for 32 bit operations move some common code out of the switch handle OP_SAT with modifiers v3: only for registers and const memory rework if clauses merge isCvt into this patch v4: merge isCvt into its use Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>	2019-02-05 20:35:38 +01:00
Bart Oldeman	a203eaa4f4	gallium-xlib: query MIT-SHM before using it. When Mesa is compiled for gallium-xlib using e.g. ./configure --enable-glx=gallium-xlib --disable-dri --disable-gbm -disable-egl and is used by an X server (usually remotely via SSH X11 forwarding) that does not support MIT-SHM such as XMing or MobaXterm, OpenGL clients report error messages such as Xlib: extension "MIT-SHM" missing on display "localhost:11.0". ad infinitum. The reason is that the code in src/gallium/winsys/sw/xlib uses MIT-SHM without checking for its existence, unlike the code in src/glx/drisw_glx.c and src/mesa/drivers/x11/xm_api.c. I copied the same check using XQueryExtension, and tested with glxgears on MobaXterm. This issue was reported before here: https://lists.freedesktop.org/archives/mesa-users/2016-July/001183.html Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Cc: <mesa-stable@lists.freedesktop.org>	2019-02-05 17:53:35 +00:00
Alok Hota	6e5eb4ead6	swr/rast: update SWR rasterizer shader stats Primarily refactoring internal stats types Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>	2019-02-05 11:41:25 -06:00
Michel Dänzer	c0a540f320	loader/dri3: Use strlen instead of sizeof for creating VRR property atom sizeof counts the terminating null character as well, so that also contributed to the ID computed for the X11 atom. But the convention is for only the non-null characters to contribute to the atom ID. Fixes: `2e12fe425f` "loader/dri3: Enable adaptive_sync via _VARIABLE_REFRESH property" Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-02-05 17:18:44 +00:00
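The off-by-one described in the commit above is easy to reproduce in isolation. A minimal sketch, assuming the property name is the `_VARIABLE_REFRESH` string quoted in the Fixes: line; the helper names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Property name as quoted in the commit message above. */
static const char name[] = "_VARIABLE_REFRESH";

/* sizeof on a char array counts the terminating NUL as well. */
static size_t len_with_nul(void)
{
    return sizeof(name);
}

/* strlen counts only the characters before the NUL. */
static size_t len_without_nul(void)
{
    return strlen(name);
}
```

A length-taking API such as `xcb_intern_atom` would treat the two values as different names, because the extra byte becomes part of the string it interns.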
Jonathan Marek	4f0a3c9f9e	nir: add missing vec opcodes in lower_bool_to_float Signed-off-by: Jonathan Marek <jonathan@marek.ca> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-02-05 15:34:15 +00:00
Gert Wollny	b0b3de2be7	mesa: release references to image textures when a context is destroyed When a texture is still bound as an image and the context it was bound in is destroyed but not the texture, then the texture will still hold the resource and will not be freed when it is finally destroyed. Hence, release these references when the context is destroyed. This leak was triggered by virglrenderer: https://gitlab.freedesktop.org/virgl/virglrenderer/issues/86 Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-02-05 10:53:41 +00:00
Gert Wollny	f1f3640f6f	radeonsi: release tokens after creating the shader program ureg_get_tokens clears the reference to the tokens, and create_compute_state makes a copy, hence the tokens must be explicitly released. Fixes: Direct leak of 256 byte(s) in 1 object(s) allocated from: #0 0x7ff729cf3c60 in realloc (/usr/lib64/gcc/x86_64-pc-linux-gnu/7.3.0/libasan.so+0xdbc60) #1 0x7ff721b1240c in tokens_expand ../../samba/mesa/src/gallium/auxiliary/tgsi/tgsi_ureg.c:234 #2 0x7ff721b1c9c0 in get_tokens ../../samba/mesa/src/gallium/auxiliary/tgsi/tgsi_ureg.c:257 #3 0x7ff721b1c9c0 in copy_instructions ../../samba/mesa/src/gallium/auxiliary/tgsi/tgsi_ureg.c:2040 #4 0x7ff721b1c9c0 in ureg_finalize ../../samba/mesa/src/gallium/auxiliary/tgsi/tgsi_ureg.c:2090 #5 0x7ff721b1e919 in ureg_get_tokens ../../samba/mesa/src/gallium/auxiliary/tgsi/tgsi_ureg.c:2167 #6 0x7ff721f8b35a in si_create_dma_compute_shader ../../samba/mesa/src/gallium/drivers/radeonsi/si_shaderlib_tgsi.c:219 #7 0x7ff722043ed9 in si_compute_do_clear_or_copy ../../samba/mesa/src/gallium/drivers/radeonsi/si_compute_blit.c:156 #8 0x7ff7220448d3 in si_clear_buffer ../../samba/mesa/src/gallium/drivers/radeonsi/si_compute_blit.c:247 #9 0x7ff7220350e8 in vi_dcc_clear_level ../../samba/mesa/src/gallium/drivers/radeonsi/si_clear.c:274 Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-02-05 11:50:54 +01:00
Caio Marcelo de Oliveira Filho	8c7c543936	isl: assert that Gen8+ don't have bit6_swizzling v2: Rewrite the condition to more clearly match the comment. (Jordan) Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-02-04 20:44:41 -08:00
Caio Marcelo de Oliveira Filho	5299c9cbcc	anv: skip bit6 swizzle detection in Gen8+ It is always false on Gen8+. Also, move the variable definition near its use. Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-02-04 20:44:41 -08:00
Caio Marcelo de Oliveira Filho	60740eade3	i965: skip bit6 swizzle detection in Gen8+ It is always false on Gen8+. Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-02-04 20:44:41 -08:00
Caio Marcelo de Oliveira Filho	51547bbc5a	nir: keep the phi order when splitting blocks All things being equal, it is better to keep the original order. Since the new block is empty, push the phis in order to tail. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Daniel Schürmann <daniel.schuermann@campus.tu-berlin.de>	2019-02-04 20:41:13 -08:00
Ilia Mirkin	38f542783f	nv50,nvc0: add explicit settings for recent caps Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Cc: 19.0 <mesa-stable@lists.freedesktop.org>	2019-02-04 23:36:46 -05:00
Alyssa Rosenzweig	e67e072637	panfrost: Implement Midgard shader toolchain This patch implements the free Midgard shader toolchain: the assembler, the disassembler, and the NIR-based compiler. The assembler is a standalone inaccessible Python script for reference purposes. The disassembler and the compiler are implemented in C, accessible via the standalone `midgard_compiler` binary. Later patches will use these interfaces from the driver for online compilation. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Acked-by: Jason Ekstrand <jason@jlekstrand.net> Acked-by: Rob Clark <robdclark@gmail.com> Acked-by: Eric Anholt <eric@anholt.net> Acked-by: Emil Velikov <emil.velikov@collabora.com>	2019-02-05 01:26:28 +00:00
Alyssa Rosenzweig	61d3ae6e0b	panfrost: Initial stub for Panfrost driver This patch adds an initial stub for the Gallium driver, containing simple screen functions and the majority of the driver headers but no actual functionality. It further adds the winsys glue for linking in this stub driver via kmsro on Rockchip/Amlogic boards. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Acked-by: Jason Ekstrand <jason@jlekstrand.net> Acked-by: Rob Clark <robdclark@gmail.com> Acked-by: Eric Anholt <eric@anholt.net> Acked-by: Emil Velikov <emil.velikov@collabora.com>	2019-02-05 01:19:30 +00:00
Marek Olšák	742d6cdb42	radeonsi: fix crashing performance counters (division by zero) Fixes: `e2b9329f17` "radeonsi: move remaining perfcounter code into si_perfcounter.c"	2019-02-04 18:46:25 -05:00
Marek Olšák	a03ecbaeec	radeonsi: handle render_condition_enable in si_compute_clear_render_target	2019-02-04 18:46:25 -05:00
Sonny Jiang	984fd73515	radeonsi: use compute for clear_render_target when possible Signed-off-by: Sonny Jiang <sonny.jiang@amd.com> Signed-off-by: Marek Olšák <marek.olsak@amd.com>	2019-02-04 18:46:25 -05:00
Kenneth Graunke	dc46317d1a	st/mesa: Set pipe_image_view::shader_access in PBO readpixels. Commit `8b626a22b2` introduced a new pipe_image_view::shader_access field, indicating the access mode specified in the shader. st/mesa's built-in PBO download shader creates a write-only image buffer, so we should flag it as such. Nobody uses this field yet (Iris will), so we don't need to backport this fix to stable branches. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-02-04 11:17:56 -08:00
Rodrigo Vivi	56c3b4971d	intel: Add more PCI Device IDs for Coffee Lake and Ice Lake. Align with kernel commits: 5e0f5a58b167 ("drm/i915/cfl: Adding another PCI Device ID.") 03ca3cf8e9aa ("drm/i915/icl: Adding few more device IDs for Ice Lake") Cc: José Roberto de Souza <jose.souza@intel.com> Cc: Kenneth Graunke <kenneth@whitecape.org> Cc: Anuj Phogat <anuj.phogat@gmail.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-02-04 10:05:25 -08:00
Danylo Piliaiev	64d3b148fe	anv: Fix VK_EXT_transform_feedback working with varyings packed in PSIZ Transform feedback did not set correct SO_DECL.ComponentMask for varyings packed in VARYING_SLOT_PSIZ: gl_Layer - VARYING_SLOT_LAYER in VARYING_SLOT_PSIZ.y gl_ViewportIndex - VARYING_SLOT_VIEWPORT in VARYING_SLOT_PSIZ.z gl_PointSize - VARYING_SLOT_PSIZ in VARYING_SLOT_PSIZ.w Fixes: `36ee2fd61c` "anv: Implement the basic form of VK_EXT_transform_feedback" Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-02-04 15:30:43 +00:00
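The packing listed in the commit above maps each varying to one component of the PSIZ slot. A minimal sketch of the resulting component masks, assuming the conventional encoding of x, y, z, w as bits 0 to 3; the enum and function names are hypothetical and not taken from the anv source.

```c
/* Hypothetical names for the three varyings packed into the PSIZ slot. */
enum packed_varying {
    PACKED_LAYER,      /* gl_Layer         -> PSIZ.y */
    PACKED_VIEWPORT,   /* gl_ViewportIndex -> PSIZ.z */
    PACKED_POINT_SIZE  /* gl_PointSize     -> PSIZ.w */
};

/* Assumed encoding: bit 0 = x, bit 1 = y, bit 2 = z, bit 3 = w. */
static unsigned psiz_component_mask(enum packed_varying v)
{
    switch (v) {
    case PACKED_LAYER:      return 1u << 1; /* y */
    case PACKED_VIEWPORT:   return 1u << 2; /* z */
    case PACKED_POINT_SIZE: return 1u << 3; /* w */
    }
    return 0; /* unknown varying: no components selected */
}
```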
Danylo Piliaiev	b7a93cbded	radv: Handle VK_ATTACHMENT_UNUSED in CmdClearAttachment From the Vulkan 1.0.98 spec for vkCmdClearAttachments: "If any attachment to be cleared in the current subpass is VK_ATTACHMENT_UNUSED, then the clear has no effect on that attachment." "If the aspectMask member of any element of pAttachments contains VK_IMAGE_ASPECT_COLOR_BIT, then the colorAttachment member of that element must either refer to a color attachment which is VK_ATTACHMENT_UNUSED, or must be a valid color attachment." "If the aspectMask member of any element of pAttachments contains VK_IMAGE_ASPECT_DEPTH_BIT, then the current subpass' depth/stencil attachment must either be VK_ATTACHMENT_UNUSED, or must have a depth component" "If the aspectMask member of any element of pAttachments contains VK_IMAGE_ASPECT_STENCIL_BIT, then the current subpass' depth/stencil attachment must either be VK_ATTACHMENT_UNUSED, or must have a stencil component" Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-02-04 14:50:43 +02:00
Danylo Piliaiev	d76e777988	anv: Handle VK_ATTACHMENT_UNUSED in colorAttachment From the Vulkan 1.0.98 spec for vkCmdClearAttachments: "If the aspectMask member of any element of pAttachments contains VK_IMAGE_ASPECT_COLOR_BIT, then the colorAttachment member of that element must either refer to a color attachment which is VK_ATTACHMENT_UNUSED, or must be a valid color attachment." Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-02-04 14:49:50 +02:00
Samuel Pitoiset	0d0affad3c	radv: don't flush src stages when dstStageMask == BOTTOM_OF_PIPE Original patch by Fredrik Höglund. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-02-04 13:19:14 +01:00
Samuel Pitoiset	9efa3405a7	radv: do not set preserveAttachments for internal render passes We don't use that. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-02-04 13:19:14 +01:00
Samuel Pitoiset	80e809d993	radv: drop useless checks when resolving subpass color attachments The Vulkan spec says: "If pResolveAttachments is not NULL, for each resolve attachment that does not have the value VK_ATTACHMENT_UNUSED, the corresponding color attachment must not have the value VK_ATTACHMENT_UNUSED." Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-02-04 13:19:14 +01:00
Samuel Pitoiset	76c17cfd8d	radv: execute external subpass barriers after ending subpasses Outgoing dependencies (ie. external) should happen after the subpass. This doesn't change anything for subpass resolves as we already make sure that attachments are shader readable. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-02-04 13:19:14 +01:00
Samuel Pitoiset	b482c030f5	radv: accumulate all ingoing external dependencies to the first subpass In case two or more subpasses declare ingoing external dependencies. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-02-04 13:19:14 +01:00
Samuel Pitoiset	eaab35e5e3	radv: handle subpass dependencies correctly The different masks should be accumulated. For example if two subpasses declare an outgoing dependency (ie. dst == VK_SUBPASS_EXTERNAL). Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-02-04 13:19:14 +01:00
Samuel Pitoiset	6430616e77	radv: track if subpasses have color attachments Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-02-04 13:19:14 +01:00
Samuel Pitoiset	1e810f1c53	radv: add radv_render_pass_add_subpass_dep() helper To share common code that handles subpass dependencies. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-02-04 13:19:14 +01:00
Samuel Pitoiset	2472907563	radv: move some render pass things to radv_render_pass_compile() radv_render_pass_compile() is common to vkCreateRenderPass() and vkCreateRenderPass2(). Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-02-04 13:19:14 +01:00
Samuel Pitoiset	b509013060	radv: handle final layouts at end of every subpass and render pass That shouldn't change anything as we check if the last subpass id is the final subpass. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-02-04 13:18:38 +01:00
Samuel Pitoiset	5699ac0078	radv: determine the last subpass id for every attachment Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-02-04 13:17:59 +01:00
Samuel Pitoiset	e1a0a268c6	radv: use the new attachments array when starting subpasses Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-02-04 13:17:57 +01:00
Samuel Pitoiset	a20c2e38d8	radv: store the list of attachments for every subpass This reworks how the depth stencil attachment is used for simplicity. This also introduces radv_render_pass_compile() helper that will be used for further optimizations. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-02-04 13:17:54 +01:00
Samuel Pitoiset	a7c7d811f1	radv: move subpass image transitions to radv_cmd_buffer_begin_subpass() Instead of doing them in radv_cmd_buffer_set_subpass(). Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-02-04 13:17:52 +01:00
Samuel Pitoiset	291a933786	radv: add radv_cmd_buffer_begin_subpass() helper To unify some code in BeginRenderPass() and NextSubpass(). Based on Intel ANV driver. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-02-04 13:17:50 +01:00
Samuel Pitoiset	41199e2eeb	radv: remove useless MAYBE_UNUSED in CmdBeginRenderPass() Trivial. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-02-04 13:17:46 +01:00
Samuel Pitoiset	545552c9b9	radv: remove unused radv_render_pass_attachment::view_mask Trivial. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-02-04 13:17:42 +01:00
Samuel Pitoiset	0f932bbede	radv: bail out when no image transitions will be performed Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-02-04 13:17:40 +01:00
Marek Olšák	1e85cfb91a	meson: drop the xcb-xrandr version requirement autotools doesn't have any requirement. This fixes meson on Ubuntu 16.04. Cc: 18.3 19.0 <mesa-stable@lists.freedesktop.org> Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>	2019-02-03 18:39:57 -05:00
Eric Engestrom	808bf59cac	wsi/display: add comment Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Keith Packard <keithp@keithp.com>	2019-02-02 23:08:03 +00:00
Jason Ekstrand	0aa5a97b03	relnotes: Add VK_EXT_buffer_device_address	2019-02-02 08:42:14 -06:00
Jason Ekstrand	48ed2a7bb0	anv: Implement VK_EXT_buffer_device_address Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-02-01 17:09:42 -06:00
Jason Ekstrand	e644ed468f	intel/fs: Implement nir_intrinsic_global_atomic_* Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-02-01 16:11:00 -06:00
Jason Ekstrand	a91f392073	intel/fs: Use SENDS for A64 writes on gen9+ Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-02-01 16:11:00 -06:00
Jason Ekstrand	1c25bf4373	intel/fs: Implement load/store_global with A64 untyped messages Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-02-01 16:11:00 -06:00
Jason Ekstrand	b4f0d062cd	intel/fs: Do the grf127 hack on SIMD8 instructions in SIMD16 mode Previously, we only applied the fix to shaders with a dispatch mode of SIMD8 but the code it relies on for SIMD16 mode only applies to SIMD16 instructions. If you have a SIMD8 instruction in a SIMD16 shader, neither would trigger and the restriction could still be hit. Fixes: `232ed89802` "i965/fs: Register allocator shoudn't use grf127..." Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-02-01 16:11:00 -06:00
Jason Ekstrand	79724a0756	intel/fs: Properly handle 64-bit types in LOAD_PAYLOAD By just assigning dst.type to src[i].type, we ensure that the offset at the end of the loop actually offsets it by the right number of registers. Otherwise, we'll get into a case where we copy with a Q type and then offset with a D type and things get out of sync. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-02-01 16:10:57 -06:00
Jason Ekstrand	f02914a991	intel/fs/cse: Split create_copy_instr into three cases Previously, we tried to combine all cases where the instruction being CSE'd writes to more than one MOV worth of registers into one case with a bit of special casing for LOAD_PAYLOAD. This commit splits things so that LOAD_PAYLOAD is entirely its own case. This makes tweaking the LOAD_PAYLOAD case simpler in the next commit. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-02-01 16:10:40 -06:00
Jason Ekstrand	f409a08e5f	intel/nir: Add global support to lower_mem_access_bit_sizes Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-02-01 16:08:29 -06:00
Oscar Blumberg	fea5b8e5ad	intel/fs: Fix memory corruption when compiling a CS Missing check for shader stage in the fs_visitor would corrupt the cs_prog_data.push information and trigger crashes / corruption later when uploading the CS state. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-02-01 10:53:33 -08:00
Jason Ekstrand	ab940b0d97	spirv: Support LocalSizeId and LocalSizeHintId execution modes Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-02-01 17:34:02 +00:00
Jason Ekstrand	7223590c42	spirv: Handle OpExecutionModeId Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-02-01 17:34:02 +00:00
Jason Ekstrand	e68871f6a4	spirv: Handle constants and types before execution modes We already defer handling the actual execution modes until after we've created the shader. This just moves it a tiny bit further so we actually have constants and types and can handle OpExecutionModeId. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-02-01 17:34:02 +00:00
Jason Ekstrand	7d862ef530	spirv: Rework handling of spec constant workgroup size built-ins Instead of handling it as part of the handling of constant instructions, just stash the vtn_value when we see the decoration and handle it explicitly later. This will let us re-order handling of constant instructions without breaking the Vulkan SPIR-V requirement that decorating a specialization constant as the WorkgroupSize built-in overrides the workgroup size set as an execution mode. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-02-01 17:34:02 +00:00
Jason Ekstrand	9b37e93e42	spirv: Replace vtn_constant_value with vtn_constant_uint The uint version is less typing, supports different bit sizes, and is probably a bit more safe because we're actually verifying that the SPIR-V value is an integer scalar constant. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-02-01 17:34:02 +00:00
Samuel Pitoiset	5e7f800f32	radv: fix build Fixes: `9b9ccee4d6` ("radv: take LDS into account for compute shader occupancy stats") Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-02-01 15:31:55 +01:00
Timothy Arceri	9b9ccee4d6	radv: take LDS into account for compute shader occupancy stats Ported from `d205faeb6c`. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-02-01 22:25:30 +11:00
Timothy Arceri	a53d68d318	ac/radv/radeonsi: add ac_get_num_physical_sgprs() helper Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-02-01 22:25:30 +11:00
Gurchetan Singh	574186f0e8	docs: add GL_EXT_texture_compression_s3tc_srgb to release notes Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2019-02-01 10:01:59 +00:00
Gurchetan Singh	dc9a15aefb	st/mesa: expose EXT_texture_compression_s3tc_srgb Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>	2019-02-01 10:01:59 +00:00
Gurchetan Singh	a2ab400719	i965: Set flag for EXT_texture_compression_s3tc_srgb Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2019-02-01 10:01:59 +00:00
Gurchetan Singh	db24132d80	mesa/main: Expose EXT_texture_compression_s3tc_srgb Required for the following test: bin/compressedteximage GL_COMPRESSED_SRGB_ALPHA_S3TC_DXT1_EXT pass when emulating GL on GLES. Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>	2019-02-01 10:01:59 +00:00
Timothy Arceri	0f3a8e1b64	st/glsl_to_nir: remove dead local variables Without this we do not end up with a deterministic NIR because temporary register variables are added in random order. NIR must be deterministic because we use it to produce a sha for the radeonsi backends disk cache. This fixes the shader cache for a bunch of shaders. Another positive is that this results in a large reduction in the size of the NIR that the state tracker stores to the disk cache. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-02-01 15:56:02 +11:00
Dylan Baker	4052142de7	meson: remove -std=c++11 from intel/tools for meson all C++ code is already compiled as C++11, so it's unnecessary. It's also the wrong way to do this, if we really needed this the correct way is to set: ```meson executable( ... override_options : ['cpp_std=c++11'], ) ``` Which ensures not only that the correct syntax for the current compiler is used, but also that meson doesn't create arguments like `-std=c++14 ... -std=c++11` Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-01-31 21:42:16 +00:00
Dylan Baker	8e49b32f63	meson: fix style in intel/tools The `:` in options should always have one space before and after `foo : bar`, and lists do not get spaces around the braces: `[foo]` not `[ foo ]` Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-01-31 21:42:16 +00:00
Dylan Baker	d93d53fa72	meson: remove build_by_default : true Which is and has always been the default. This is largely an artifact of how the building of these tools was controlled when the meson build was originally created. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-01-31 21:42:16 +00:00
Emil Velikov	1240c3cb10	docs: update calendar, add news item and link release notes for 18.3.3 Signed-off-by: Emil Velikov <emil.velikov@collabora.com>	2019-01-31 21:17:38 +00:00
Emil Velikov	83160c6c05	docs: add sha256 checksums for 18.3.3 Signed-off-by: Emil Velikov <emil.velikov@collabora.com> (cherry picked from commit `7475d7727f`)	2019-01-31 21:15:20 +00:00
Emil Velikov	4d0732dc39	docs: add release notes for 18.3.3 Signed-off-by: Emil Velikov <emil.velikov@collabora.com> (cherry picked from commit `190a79f462`) [Emil: drop VERSION hunk] Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Conflicts: VERSION	2019-01-31 21:14:56 +00:00
Neha Bhende	69d736b17a	st/mesa: Fix topogun-1.06-orc-84k-resize.trace crash We need to initialize all fields in rs->prim explicitly while creating new rastpos stage. Fixes: `bac8534267` ("st/mesa: allow glDrawElements to work with GL_SELECT feedback") v2: Initializing all fields in rs->prim as per Ilia. Reviewed-by: Brian Paul <brianp@vmware.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>	2019-01-31 12:21:59 -07:00
Dylan Baker	c812c740e6	android,autotools,i965: Fix location of float64_glsl.h Android.mk and autotools disagree about where generated files should go, which wasn't a problem until we wanted to build a dist tarball. This corrects the problem by changing the output and include paths to be the same on android and autotools (meson already has the correct include path). Fixes: `7d7b30835c` ("automake: Fix path to generated source") Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2019-01-31 19:04:30 +00:00
Marek Olšák	d49c16a597	gallium: allow more PIPE_RESOURCE_ driver flags radeonsi has 8 and will probably have 9 soon. Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2019-01-31 13:10:42 -05:00
Eric Anholt	ab4d5775b0	v3d: Fix image_load_store clamping of signed integer stores. This was copy-and-paste fail, that oddly showed up in the CTS's reinterprets of r32f, rgba8, and srgba8 to rgba8i, but not r32ui and r32i to rgba8i or reinterprets to other signed int formats. Fixes: `6281f26f06` ("v3d: Add support for shader_image_load_store.")	2019-01-31 08:39:40 -08:00
Eric Anholt	db2ae51121	mesa: Skip partial InvalidateFramebuffer of packed depth/stencil. One of the CTS cases tries to invalidate just stencil of packed depth/stencil, and we incorrectly lost the depth contents. Fixes dEQP-GLES3.functional.fbo.invalidate.whole.unbind_read_stencil Fixes: `0c42b5f3cb` ("mesa: wire up InvalidateFramebuffer") Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-01-31 08:37:46 -08:00
Rob Clark	39cfdf9930	freedreno: more fixing release tarball Fixes: `aa0fed10d3` freedreno: move ir3 to common location Signed-off-by: Rob Clark <robdclark@gmail.com>	2019-01-31 09:59:18 -05:00
Rob Clark	e252656d14	freedreno: fix release tarball Fixes: `b4476138d5` freedreno: move drm to common location Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> Signed-off-by: Rob Clark <robdclark@gmail.com>	2019-01-31 09:59:18 -05:00
Emmanuel Gil Peyrot	0d4dd59ae5	docs: make bugs.html easier to find Thanks to Yann Kervran for the report and suggestions. Signed-off-by: Emmanuel Gil Peyrot <linkmauve@linkmauve.fr> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-01-31 14:31:48 +00:00
Dave Airlie	9279a28f07	virgl: ARB_query_buffer_object support v1.1: fix size define. Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>	2019-01-31 11:23:38 +10:00
Dave Airlie	38658c6d4d	virgl: enable elapsed time queries GL underneath always has GL_TIME_ELAPSED so always enable these. Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>	2019-01-31 11:23:30 +10:00
Dylan Baker	da48cba61e	automake: Add --enable-autotools to distcheck flags Fixes: `e68777c87c` ("autotools: Deprecate the use of autotools") Reviewed-by: Matt Turner <mattst88@gmail.com>	2019-01-30 19:32:44 +00:00
Marek Olšák	ffbd37d8e9	radeonsi: fix a comment typo in si_fine_fence_set	2019-01-30 14:32:05 -05:00
Marek Olšák	f4eb746ef7	r600: add -Wstrict-overflow=0 to meson to silence the warning same as radeonsi	2019-01-30 12:49:45 -05:00
Marek Olšák	d50bef9831	winsys/amdgpu: remove amdgpu_drm.h definitions trivial	2019-01-30 12:38:56 -05:00
Marek Olšák	16672f16da	radeonsi: unify error paths in si_texture_create_object	2019-01-30 12:35:22 -05:00
Marek Olšák	2361558eb7	radeonsi: merge & rename texture BO metadata functions	2019-01-30 12:35:22 -05:00
Marek Olšák	1c12d56e4d	radeonsi: enable dithered alpha-to-coverage for better quality same as AMDVLK. GL_NV_alpha_to_coverage_dither_control allows controlling this behavior. The default is implementation-dependent.	2019-01-30 12:35:22 -05:00
Dylan Baker	b4986d2e0c	gallium: wrap u_screen in extern "C" for c++ Some drivers (notably SWR) are written in C++, and as such they need access to C headers with extern "C". So let's add that.	2019-01-30 15:12:27 +00:00
Gert Wollny	45903cddc3	mesa/core: Enable EXT_texture_sRGB_R8 also for desktop GL As of Nov/30/2018 the extension is also valid for OpenGL >= 1.2, so enable it accordingly and also add the required view class entry. Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-01-30 11:32:40 +00:00
Samuel Pitoiset	9c762c01c8	radv/winsys: fix hash when adding internal buffers This fixes serious stuttering in Shadow Of The Tomb Raider. Fixes: `50fd253bd6` ("radv/winsys: Add priority handling during submit.") Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-01-30 12:29:10 +01:00
Erik Faye-Lund	3b6f95ad66	mesa: expose NV_conditional_render on GLES The extension spec has been updated to include GLES 2 support, so let's enable it there. v2: fixup ABI-check as well Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2019-01-30 09:43:44 +01:00
Ernestas Kulik	90458bef54	v3d: Fix leak in resource setup error path Reported by Coverity: in the case of unsupported modifier request, the code does not jump to the “fail” label to destroy the acquired resource. CID: 1435704 Signed-off-by: Ernestas Kulik <ernestas.kulik@gmail.com> Fixes: `45bb8f2957` ("broadcom: Add V3D 3.3 gallium driver called "vc5", for BCM7268.")	2019-01-29 16:14:13 -08:00
Ernestas Kulik	f6e49d5ad0	vc4: Fix leak in HW queries error path Reported by Coverity: in the case where there exist hardware and non-hardware queries, the code does not jump to err_free_query and leaks the query. CID: 1430194 Signed-off-by: Ernestas Kulik <ernestas.kulik@gmail.com> Fixes: `9ea90ffb98` ("broadcom/vc4: Add support for HW perfmon")	2019-01-29 16:14:13 -08:00
Eric Anholt	6053c7bb43	v3d: Fix a release build set-but-unused compiler warning.	2019-01-29 16:02:51 -08:00
Eric Anholt	0c05198d6b	v3d: Always enable the NEON utile load/store code. I can't imagine the new HW block being paired with a v6 CPU, so don't bother with the CPU detection that vc4 had to do. Improves 1024x1024 TexImage on my 7278 by 47.3229% +/- 0.679632%	2019-01-29 16:00:25 -08:00
Emil Velikov	385843ac3c	vc4: Declare the last cpu pointer as being modified in NEON asm. Earlier commit addressed 7 of the 8 instances available. v2: Rebase patch back to master (by anholt) Cc: Carsten Haitzler (Rasterman) <raster@rasterman.com> Cc: Eric Anholt <eric@anholt.net> Fixes: `300d3ae8b1` ("vc4: Declare the cpu pointers as being modified in NEON asm.") Signed-off-by: Emil Velikov <emil.velikov@collabora.com>	2019-01-29 16:00:25 -08:00
Dylan Baker	75ad254acf	docs: Add relnotes stub for 19.1	2019-01-29 15:32:16 -08:00
Dylan Baker	dba0989ac1	bump version for 19.0 branch	2019-01-29 15:30:25 -08:00
Dylan Baker	90a7a9c973	automake: Add include dir for nir src directory Fixes: `6281f26f06` ("v3d: Add support for shader_image_load_store.") Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>	2019-01-29 23:24:57 +00:00
Dylan Baker	82365595e9	automake: Add float64.glsl to dist tarball Fixes: `b63a1f8e40` ("glsl: Create file to contain software fp64 functions") Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>	2019-01-29 23:24:57 +00:00
Dylan Baker	7d7b30835c	automake: Fix path to generated source Fixes: `b63a1f8e40` ("glsl: Create file to contain software fp64 functions") Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>	2019-01-29 23:24:57 +00:00
Matt Turner	9de90caca8	nir: Optimize double-precision lower_round_even() Use the trick of adding and then subtracting 2**52 (52 is the number of explicit mantissa bits a double-precision floating-point value has) to implement round-to-even. Cuts the number of instructions on SKL of the piglit test fs-roundEven-double.shader_test from 109 to 21. Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2019-01-29 15:02:23 -08:00
Marek Olšák	3e249b853e	ac: use the correct LLVM processor name on Raven2 Reviewed-by: Alex Deucher <alexander.deucher@amd.com>	2019-01-29 17:46:55 -05:00
Eric Anholt	f7769b5121	v3d: Fix the autotools build. Noticed while looking at the gitlab-CI MR.	2019-01-29 14:00:27 -08:00
Jonathan Marek	31a1348a66	freedreno: fix sysmem rendering being used when clear is used This batch->cleared value is only used to decide to use sysmem rendering or not, so it should include any buffers that are affected by a clear. This is required because the a2xx fast clear doesn't work with sysmem rendering. The a2xx "normal" clear path doesn't work with sysmem either. Signed-off-by: Jonathan Marek <jonathan@marek.ca>	2019-01-29 20:22:33 +00:00
Jonathan Marek	c93d77431f	freedreno: fix depth usage logic Depth can be used even when there is no restore/resolve of depth. This happens when the depth buffer is invalidated after rendering to avoid the resolve operation. Signed-off-by: Jonathan Marek <jonathan@marek.ca>	2019-01-29 20:22:33 +00:00
Jonathan Marek	bcefa0f1cb	freedreno: fix invalidate logic Set dirty bits on invalidate to trigger invalidate logic in fd_draw_vbo. Also, resource_written for color needs to be after the invalidate logic. Signed-off-by: Jonathan Marek <jonathan@marek.ca>	2019-01-29 20:22:32 +00:00
Jonathan Marek	786f9639d6	mesa/st: wire up DiscardFramebuffer Signed-off-by: Jonathan Marek <jonathan@marek.ca> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-01-29 20:22:32 +00:00
Rob Clark	0c42b5f3cb	mesa: wire up InvalidateFramebuffer And before someone actually starts implementing DiscardFramebuffer() let's rework the interface to something that is actually usable. Signed-off-by: Rob Clark <robdclark@gmail.com> Signed-off-by: Jonathan Marek <jonathan@marek.ca> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-01-29 20:22:32 +00:00
Jonathan Marek	e685566612	st/dri: invalidate_resource depth/stencil before flush_resource This allows freedreno to be aware of the depth invalidate when flushing batches on flush_resource. AFAIK, the only other driver which might care about this change is vc4, where I think it should help by allowing the depth invalidate to work with GALLIUM_HUD. Signed-off-by: Jonathan Marek <jonathan@marek.ca> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-01-29 20:22:32 +00:00
Mario Kleiner	820dfcea43	egl/wayland-drm: Only announce formats via wl_drm which the driver supports. Check if a pixel format is supported by the Wayland servers gpu driver before exposing it to the client via wl_drm, so we avoid reporting formats to the client which the server gpu can't handle. Restrict this reporting to the new color depth 30 formats for now, as the ARGB/XRGB8888 and RGB565 formats are probably supported by every gpu under the sun. Atm. this is mostly useful to allow proper PRIME renderoffload for depth 30 formats on the typical Intel iGPU + NVidia dGPU "NVidia Optimus" laptop combo. Tested on Intel, AMD, NVidia with single-gpu setup and on a Intel + NVidia Optimus setup. Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com> Reviewed-by: Adam Jackson <ajax@redhat.com> Reviewed-by: Daniel Stone <daniels@collabora.com>	2019-01-29 20:03:20 +00:00
Mario Kleiner	a34b0d68bb	egl/wayland: Allow client->server format conversion for PRIME offload. (v2) Support PRIME render offload between a Wayland server gpu and a Wayland client gpu with different channel ordering for their color formats, e.g., between Intel drivers which currently only support ARGB2101010 and XRGB2101010 import/display and nouveau which only supports ABGR2101010 rendering and display on nv-50 and later. In the wl_visuals table, we also store for each format an alternate sibling format which stores colors at the same precision, but with different channel ordering, e.g., ARGB2101010 <-> ABGR2101010. If a given client-gpu renderable format is not supported by the server for import, but the alternate format is supported by the server, expose the client-gpu renderable format as a valid EGLConfig to the client. At eglSwapBuffers time, during the blitImage() detiling blit from the client backbuffer to the linear buffer, the client format is converted to the server supported format. As we have to do a copy for PRIME anyway, this channel swizzling conversion comes essentially for free. Note that even if a server gpu in principle does support sampling from the clients native format, this conversion will be a performance advantage if it allows to convert to the servers preferred format for direct scanout, as the Wayland compositor may then be able to directly page-flip a fullscreen client wl_buffer onto the primary plane, or onto a hardware overlay plane, avoiding an extra data copy for desktop composition. Tested so far under Weston with: nouveau single-gpu, Intel single-gpu, AMD single-gpu, "Optimus" Intel server iGPU for display + NVidia client dGPU for rendering. v2: Implement minor review comments by Eric Engestrom: Add some comment and assert, and some style fixes for clarity. No functional change. 
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com> Reviewed-by: Adam Jackson <ajax@redhat.com> Reviewed-by: Daniel Stone <daniels@collabora.com>	2019-01-29 20:03:20 +00:00
Jason Ekstrand	a920979d4f	intel/fs: Use split sends for surface writes on gen9+ Surface reads don't need them because they just have the one address payload. With surface writes, on the other hand, we can put the address and the data in the different halves and avoid building the payload all together. The decrease in register pressure and added freedom in register allocation resulting from this change reduces spilling enough to improve the performance of one customer benchmark by about 2x. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2019-01-29 18:43:55 +00:00
Jason Ekstrand	014edff0d2	intel/fs: Add interference between SENDS sources Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2019-01-29 18:43:55 +00:00
Jason Ekstrand	eab1c55590	intel/fs: Support SENDS in SHADER_OPCODE_SEND Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2019-01-29 18:43:55 +00:00
Jason Ekstrand	cca199fd85	intel/disasm: Properly disassemble split sends Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2019-01-29 18:43:55 +00:00
Jason Ekstrand	8babaa84e8	intel/eu: Add support for the SENDS[C] messages Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2019-01-29 18:43:55 +00:00
Jason Ekstrand	d6a6e10390	intel/inst: Indent some code We're about to add some more if cases so let's have the giant re-indent in its own patch to make review easier. Acked-by: Iago Toral Quiroga <itoral@igalia.com>	2019-01-29 18:43:55 +00:00
Jason Ekstrand	d96969120d	intel/inst: Fix the ia16_addr_imm helpers These have clearly never seen any use.... On gen8, the bottom 4 bits are missing so we need to shift them off before we call set_bits and shift again when we get the bits. Found by inspection. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2019-01-29 18:43:55 +00:00
Jason Ekstrand	e46fb33143	intel/disasm: Rework SEND decoding to use descriptors Instead of fetching the information out of the instruction directly, fetch the descriptor and then pluck the information out of the descriptor. The current scheme works ok for SEND but with SENDS, it all falls to pieces because the descriptor is completely shuffled around. This commit doesn't actually convert everything. One notable exception is URB messages which don't even use descriptors in emit_urb_WRITE yet. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2019-01-29 18:43:55 +00:00
Jason Ekstrand	13a6fabc62	intel/eu: Add more message descriptor helpers We want to be able to extract data from descriptors as well as unify a bit of the descriptor construction. One of the unifications we do is to unify the read/write and dataport descriptors. On gen4-5, read/write are substantially different and the read descriptors change between gen4 and gen4.x. On gen6, they unified layouts between read, write, and dataport. Then, on gen8, they added one bit to the message type field but left it reserved MBZ for read/write messages. This commit chooses to treat that as if they expanded the field everywhere and just didn't have enough enum values for read/write to bother with the extra bit. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2019-01-29 18:43:55 +00:00
Jason Ekstrand	c3aa436bfe	intel/eu/validate: SEND restrictions also apply to SENDC Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2019-01-29 18:43:55 +00:00
Jason Ekstrand	fee6bd8d8e	intel/eu: Use GET_BITS in brw_inst_set_send_ex_desc It's a bit more readable Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2019-01-29 18:43:55 +00:00
Jason Ekstrand	b284d222db	intel/fs: Use SHADER_OPCODE_SEND for varying UBO pulls on gen7+ Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2019-01-29 18:43:55 +00:00
Jason Ekstrand	8514eba693	intel/fs: Use SHADER_OPCODE_SEND for texturing on gen7+ Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2019-01-29 18:43:55 +00:00
Jason Ekstrand	f547cebbe0	intel/fs: Use a logical opcode for IMAGE_SIZE Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2019-01-29 18:43:55 +00:00
Jason Ekstrand	d2d3e04501	intel/fs: Use SHADER_OPCODE_SEND for surface messages Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2019-01-29 18:43:55 +00:00
Jason Ekstrand	7f1cf046cd	intel/fs: Add a generic SEND opcode Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2019-01-29 18:43:55 +00:00
Jason Ekstrand	ba3c5300f9	intel/eu: Rework surface descriptor helpers This commit pulls the surface descriptor helpers out into brw_eu.h and makes them no longer depend on the codegen infrastructure. This should allow us to use them directly from the IR code instead of the generator. This change is unfortunately less mechanical than perhaps one would like but it should be fairly straightforward. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2019-01-29 18:43:55 +00:00
Jason Ekstrand	5b17379631	intel/eu: Add has_simd4x2 bools to surface_write functions Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2019-01-29 18:43:55 +00:00
Jason Ekstrand	2ce93b88c0	intel/fs: Take an explicit exec size in brw_surface_payload_size() Instead of magically falling back to SIMD8 for atomics and typed messages on Ivy Bridge, explicitly figure out the exec size and pass that into brw_surface_payload_size. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2019-01-29 18:43:55 +00:00
Jason Ekstrand	cf42b0f9e2	intel/fs: Handle IMAGE_SIZE in size_read() and is_send_from_grf() Like all the other sends, it's just mlen * REG_SIZE. Fixes: `3cbc02e469` "intel: Use TXS for image_size when we have..." Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2019-01-29 18:43:55 +00:00
Jason Ekstrand	009c0bd840	intel/defines: Explicitly cast to uint32_t in SET_FIELD and SET_BITS If you pass a bool in as the value to set, the C standard says that it gets converted to an int prior to shifting. If you try to set a bool to bit 31, this lands you in undefined behavior. It's better just to add the explicit cast and let the compiler delete it for us. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2019-01-29 18:43:55 +00:00
Jason Ekstrand	077b9557a4	intel/fs: Get rid of fs_inst::equals There are piles of fields that it doesn't check so using it is a lie. The only reason why it's not causing problems is because it has exactly one user which only uses it for MOV instructions (which aren't very interesting) and only on Sandy Bridge and earlier hardware. Just get rid of it and inline it in the one place that it's actually used. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2019-01-29 18:43:55 +00:00
Rob Clark	446a14bc0a	freedreno: minor cleanups Signed-off-by: Rob Clark <robdclark@gmail.com>	2019-01-29 12:30:50 -05:00
Rob Clark	c3baa077bf	freedreno: stop frob'ing pipe_resource::nr_samples Previously we tried to normalize nr_samples to MAX2(1, nr_samples) to avoid having to deal with 0 vs 1 everywhere. But this causes problems in mesa/st, for example st_finalize_texture() will think there is a nr_samples mismatch and recreate the texture. Somehow this manifests as corrupt x11 font rendering on generations that do not support MSAA (but apparently works fine on a5xx and a6xx which do support MSAA.) Fixes: `cf0c7258ee` freedreno/a5xx: MSAA Signed-off-by: Rob Clark <robdclark@gmail.com>	2019-01-29 12:30:50 -05:00
Rob Clark	1a6ddfe5ee	freedreno/a6xx: fix blitter nr_samples check nr_samples for non-MSAA case could be either zero or one. Signed-off-by: Rob Clark <robdclark@gmail.com>	2019-01-29 12:22:08 -05:00
Rob Clark	9106a0fe33	freedreno/a5xx: fix blitter nr_samples check nr_samples for non-MSAA case could be either zero or one. Signed-off-by: Rob Clark <robdclark@gmail.com>	2019-01-29 12:21:19 -05:00
Bas Nieuwenhuizen	69edc972fc	radv: Enable VK_EXT_memory_priority. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-01-29 15:56:56 +01:00
Bas Nieuwenhuizen	50fd253bd6	radv/winsys: Add priority handling during submit. Switched to the raw bo list api to avoid having to use 2 arrays for everything. This was introduced in libdrm 2.4.97 which we already depend upon. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-01-29 15:56:52 +01:00
Bas Nieuwenhuizen	ead54d4a42	radv/winsys: Set winsys bo priority on creation. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-01-29 15:56:41 +01:00
Samuel Pitoiset	3a8d6c0880	radv: re-enable fast depth clears for 16-bit surfaces on VI This was disabled some months ago because it introduced rendering issues with Shadow Warrior 2 (DXVK). This game is no longer affected, I wonder if `824cfc1ee5` ("radv: rework the TC-compat HTILE hardware bug with COND_EXEC") fixed the problem. I checked The Forest on my Polaris, and it renders fine too. According to Phillip, this gives +5.5% with Rise Of The Tomb Raider and DXVK. This is because DXVK uses 16-bit depth surfaces while the native port from Feral uses 32-bit depth surfaces. Unfortunately, Shadow Of The Tomb Raider isn't affected because it clears each layer of a D16 array texture individually. So it doesn't hit the fast clear path. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-01-29 15:20:55 +01:00
Eric Anholt	932ed9c00b	vc4: Enable NEON asm on meson cross-builds. The core Mesa with_asm_arch and USE_ARM_ASM flags are disabled for meson cross-builds because of the need to run host binaries on the build system. vc4 doesn't need to do that, so skip with_asm_arch to enable NEON on my cross-builds. Fixes: `ebcb4c2156` ("meson: Enable VC4's NEON assembly support.")	2019-01-28 16:45:48 -08:00
Carsten Haitzler (Rasterman)	300d3ae8b1	vc4: Declare the cpu pointers as being modified in NEON asm. Otherwise, the compiler is free to reuse the register containing the input for another call and assume that the value hasn't been modified. Fixes crashes on texture upload/download with current gcc. We now have to have a temporary for the cpu2 value, since outputs must be lvalues. (commit message by anholt) Fixes: `4d30024238` ("vc4: Use NEON to speed up utile loads on Pi2.")	2019-01-28 16:45:45 -08:00
Carsten Haitzler (Rasterman)	522f688471	vc4: Use named parameters for the NEON inline asm. This makes the asm code more intelligible and clarifies the functional change in the next commit. (commit message and commit squashing by anholt)	2019-01-28 16:40:46 -08:00
Jonathan Marek	f6292c32cc	kmsro: Add freedreno renderonly support Signed-off-by: Jonathan Marek <jonathan@marek.ca>	2019-01-28 18:25:27 -05:00
Jonathan Marek	7d458c0c69	freedreno: a2xx: add perfcntrs Based on a5xx perfcntrs implementation. Signed-off-by: Jonathan Marek <jonathan@marek.ca>	2019-01-28 18:21:16 -05:00
Jonathan Marek	cccec0b457	freedreno: a2xx: minor solid_vertexbuf fixups The big thing here is the 0x60 offset for the mem2gmem copy which I missed in my last patch. Signed-off-by: Jonathan Marek <jonathan@marek.ca>	2019-01-28 18:21:16 -05:00
Jonathan Marek	912a9c8d8c	freedreno: a2xx: clear fixes and fast clear path This fixes the depth/stencil clear on a20x, and adds a fast clear path. The fast clear path is only used for a20x, needs performance tests on a22x. Signed-off-by: Jonathan Marek <jonathan@marek.ca>	2019-01-28 18:21:16 -05:00
Jonathan Marek	cb2322c7c0	freedreno: a2xx: a20x hw binning Signed-off-by: Jonathan Marek <jonathan@marek.ca>	2019-01-28 18:21:16 -05:00
Jonathan Marek	501c6e70d4	freedreno: update a2xx registers Signed-off-by: Jonathan Marek <jonathan@marek.ca>	2019-01-28 18:21:16 -05:00
Timothy Arceri	fb78a6cb72	glsl: use remap location when serialising uniform program resource data This allows us to avoid expensive string compares since we already have a map to the pointers. These compares were taking ~30 seconds for a single shader compile in Godot due to it using 64,000+ uniforms. Fixes: `c4cff5f402` ("glsl: add basic support for resource list to shader cache") Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109229	2019-01-29 09:39:54 +11:00
Vinson Lee	be5b271ea7	meson: Fix typo. meson.build:166:21: ERROR: Unknown method "verson_compare" for a string. Fixes: `c1efa240c9` ("meson: Add warnings and errors when using ICC") Signed-off-by: Vinson Lee <vlee@freedesktop.org> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> Cc: 18.3 <mesa-stable@lists.freedesktop.org>	2019-01-28 10:47:32 -08:00
Jonathan Marek	7c930d99ad	freedreno: a2xx: enable early-Z testing Enable earlyZ when alpha test is disabled. Signed-off-by: Jonathan Marek <jonathan@marek.ca> Reviewed-by: Rob Clark <robdclark@gmail.com>	2019-01-28 13:04:41 -05:00
Jonathan Marek	32b1d2d716	freedreno: a2xx: ir2 cleanup Reviewed-by: Rob Clark <robdclark@gmail.com>	2019-01-28 13:04:41 -05:00
Rob Herring	41a0acd6a1	Switch imx to kmsro and remove the imx winsys The kmsro winsys is equivalent to the imx winsys, so we can switch to it and remove the imx one. Signed-off-by: Rob Herring <robh@kernel.org>	2019-01-28 11:50:08 -06:00
Rob Herring	827e0d6654	kmsro: Add etnaviv renderonly support Enable using etnaviv for KMS renderonly. This still needs KMS driver name mapping to kmsro to be used automatically. Acked-by: Eric Anholt <eric@anholt.net> Signed-off-by: Rob Herring <robh@kernel.org>	2019-01-28 11:45:43 -06:00
Eric Anholt	272b6cf58f	kmsro: Extend to include hx8357d. This allows vc4 to initialize on the Adafruit PiTFT 3.5" touchscreen with the hx8357d tinydrm driver v2: Whitespace fix noted by Eric Engestrom, update commit message for the driver being merged. v3: Rebase on Rob Herring's pipe-loader changes. Acked-by: Eric Engestrom <eric.engestrom@intel.com> (v1) Acked-by: Emil Velikov <emil.velikov@collabora.com> (v1)	2019-01-28 09:35:45 -08:00
Rob Herring	511e7b6f61	pipe-loader: Fallback to kmsro driver when no matching driver name found If we can't find a driver matching by name, then use the kmsro driver. This removes the need for a driver descriptor for every possible KMS driver. Signed-off-by: Rob Herring <robh@kernel.org> Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-01-28 09:35:45 -08:00
Eric Anholt	ed65aeec78	pl111: Rename the pl111 driver to "kmsro". The vc4 driver can do prime sharing to many different KMS-only devices, such as the various tinydrm drivers for SPI-attached displays. Rename the driver away from "pl111" to represent what it will actually support: various sorts of KMS displays with the renderonly layer used to attach a GPU. Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com> Acked-by: Emil Velikov <emil.velikov@collabora.com>	2019-01-28 09:35:45 -08:00
Samuel Pitoiset	afeef3cacf	radv: set noalias/dereferenceable LLVM attributes based on param types Instead of using this useless array_params_mask variable. This should set these two attributes to streamout buffers too. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-01-28 16:30:38 +01:00
Samuel Pitoiset	320b058d32	radv: simplify allocating user SGPRS for descriptor sets Unnecessary to check the current stages if desc_set_used_mask is used. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-01-28 16:30:36 +01:00
Samuel Pitoiset	d1994ed229	radv: remove radv_userdata_info::indirect field Always false. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-01-28 16:30:33 +01:00
Gert Wollny	212c0c630a	mesa/main: Expose EXT_sRGB_write_control Use EXT_framebuffer_sRGB to expose EXT_sRGB_write_control on GLES. Remove the checks for desktop GL in the enable calls, since EXT_framebuffer_sRGB now also indicates support for switching the linear-sRGB color space conversion on GLES. Thanks to Ilia Mirkin for all the helpful discussions that helped to rework this series. v2: Fix alphabetical listing of extensions (Tapani Pälli) Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v1)	2019-01-28 12:18:40 +01:00
Gert Wollny	1013dfece1	mesa/main/version: Lower the requirements for GLES 3.0 GLES 3.0 does not actually require support for EXT_framebuffer_sRGB, it only needs support for sRGB attachments to framebuffers and framebuffer objects as defined in ARB_framebuffer_objects. v2: Clarify that ARB_framebuffer_objects is needed. Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-01-28 12:18:40 +01:00
Gert Wollny	76c3f6fb3f	mesa/main: Use flag for EXT_sRGB instead of EXT_framebuffer_sRGB where possible All drivers that support EXT_framebuffer_sRGB also support EXT_sRGB, but in order to keep this commit minimal, and not to break any drivers both flags are checked. v2: - Use only EXT_sRGB (Ilia Mirkin) - Move adding the flag EXT_sRGB to gl_extensions to a separate patch v3: use _mesa_has_EXT_framebuffer_sRGB instead of extension flag The _mesa_has function also checks for the correct versions and should be preferred over using the flags directly (Erik) Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-01-28 12:18:40 +01:00
Gert Wollny	8f9dfb7d88	mesa/st: rework support for sRGB framebuffer attachments For GLES sRGB framebuffer attachment support is provided in two steps: sRGB attachments like described in EXT_sRGB (and GLES 3.0) that enable linear to sRGB color space transformation automatically, and the ability to switch formats of the render target surface between sRGB and linear that introduces full support for EXT_framebuffer_sRGB. Set the according flags to reflect these two levels of sRGB support. As a difference between desktop GL and GLES, on desktop GL for a sRGB framebuffer attachment the linear-sRGB conversion is turned off by default, and for GLES it is turned on. This needs to be taken into account when initially creating a surface, i.e. on desktop GL creation of a sRGB surface is preferred, but on GLES sRGB surfaces are only created when explicitly requested. v2: - Use the new CAPS name Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-01-28 12:18:40 +01:00
Gert Wollny	385081cd17	i965: Set flag for EXT_sRGB Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org> Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2019-01-28 12:18:40 +01:00
Gert Wollny	7577c82fed	mesa:main: Add flag for EXT_sRGB to gl_extensions EXT_sRGB is an (incomplete) GLES extension that provides support for sRGB framebuffer attachments, hence it can be used to check for this support as an alternative to EXT_framebuffer_sRGB that provides the same functionality but also sRGB write control support. However, since EXT_sRGB is incomplete and superseded by GLES 3.0 it will not be exposed as an extension. Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-01-28 12:18:40 +01:00
Gert Wollny	2845939d6a	virgl: Set sRGB write control CAP based on host capabilities v2: - Use the renamed CAPS - add assertions to make sure that mesa doesn't try to switch destination surface formats when it is not supported. (Ilia Mirkin) Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>	2019-01-28 12:18:40 +01:00
Gert Wollny	8021f1875e	Gallium: Add new CAPS to indicate whether a driver can switch SRGB write Add a new cap that indicates whether the driver supports enabling/disabling the conversion from linear space to sRGB for a framebuffer attachment. In driver terms, this CAP indicates whether the driver can switch between a linear and an sRGB surface format for draw destinations without changing the surface itself. v2: rename CAP to DEST_SURFACE_SRGB_CONTROL to reflect its purpose better (pointed out by Ilia Mirkin) Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-01-28 12:18:40 +01:00
Neil Roberts	75b3719c4f	spirv: Don't use special semantics when counting vertex attribute size Under Vulkan, the double vertex attributes take up the same size regardless of whether they are vertex inputs or any other stage interface. Under OpenGL (ARB_gl_spirv), from GLSL 4.60 spec, section 4.3.9 Interface Blocks: "It is a compile-time error to have an input block in a vertex shader or an output block in a fragment shader. These uses are reserved for future use." So we also don't need to check if it is a vertex input or not, and use false in any case. v2: (changes made by Alejandro Piñeiro) * Update required after "spirv: Handle location decorations on block interface members" own updates (original patch was sent several months ago) * After Neil suggesting it, confirm that this change can be also done for OpenGL (ARB_gl_spirv). Expand commit message. v3: update after changing name of main method on a previous patch Signed-off-by: Neil Roberts <nroberts@igalia.com> Signed-off-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2019-01-28 11:42:46 +01:00
Neil Roberts	5c797f7354	glsl_types: Rename parameter of glsl_count_attribute_slots glsl_count_attribute_slots takes a parameter to specify whether the type is being used as a vertex input because on GL double attributes only take up one slot. Vulkan doesn’t make this distinction so this patch renames the argument to is_gl_vertex_input in order to make it more clear that it should always be false on Vulkan. v2: minor variable renaming (s/member/member_type) (Tapani) Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2019-01-28 11:42:46 +01:00
Neil Roberts	dfc3a7cb3c	spirv/nir: handle location decorations on block interface members Previously the code was taking any location decoration on the block and using that to calculate the member locations for all of the members. I think this was assuming that there would only be one location decoration for the entire block. According to the Vulkan spec it is possible to add location decorations to individual members: “If the structure type is a Block but without a Location, then each of its members must have a Location decoration. If it is a Block with a Location decoration, then its members are assigned consecutive locations in declaration order, starting from the first member which is initially the Block. Any member with its own Location decoration is assigned that location. Each remaining member is assigned the location after the immediately preceding member in declaration order.” This patch makes it instead keep track of which members have been assigned an explicit location. It also has a space to store the location for the struct as a whole. Once all the decorations have been processed it iterates over each member to fill in the missing locations using the rules described above. So, this commit is needed to get working a case like this, on both Vulkan and OpenGL using SPIR-V (ARB_gl_spirv): out block { layout(location = 2) vec4 c; layout(location = 3) vec4 d; layout(location = 0) vec4 a; layout(location = 1) vec4 b; } name; v2: (changes made by Alejandro Piñeiro) * Update after introducing struct member splitting (See commit `b0c643d`) * Update after only exposing interface_type for blocks, not to any struct * Update after last changes done for xfb support v3: use "assign" instead of "add" on the new method added (Tapani) Signed-off-by: Neil Roberts <nroberts@igalia.com> Signed-off-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2019-01-28 11:42:46 +01:00
Christian Gmeiner	34458c1cf6	etnaviv: add linear sampling support Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com> Reviewed-by: Lucas Stach <l.stach@pengutronix.de>	2019-01-28 07:36:12 +01:00
Christian Gmeiner	42ca4dda2d	etnaviv: update headers from rnndb Update to etna_viv commit 4d2f857. Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com> Reviewed-by: Lucas Stach <l.stach@pengutronix.de>	2019-01-28 07:36:09 +01:00
Christian Gmeiner	5b4a155d2b	etnaviv: extend etna_resource with an addressing mode Defines how the sampler (and pixel pipes) need to access the data represented by a resource. The default mode is ETNA_ADDRESSING_MODE_TILED. Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com> Reviewed-by: Lucas Stach <l.stach@pengutronix.de>	2019-01-28 07:36:05 +01:00
Ilia Mirkin	d1d2bb8c07	nvc0: don't put text segment into bufctx The text segment is shared among multiple contexts, while each one has its own bufctx. So when reallocating the text segment, some contexts may end up with stale values in their bufctx's. Instead limit the exposure to the bufctx to within a single draw. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2019-01-27 21:47:09 -05:00
Timothy Arceri	0907ae35ad	radv/ac: fix some fp16 handling Fixes: `b722b29f10` ("radv: add support for 16bit input/output") Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-01-28 10:41:48 +11:00
Eric Anholt	c496b60ed8	v3d: Create separate sampler states for the various blend formats. The sampler border color is encoded in the TMU's blending format (half floats, 32-bit floats, or integers) and must be clamped to the format's unorm/snorm/int range by the driver. Additionally, the TMU doesn't know about how we're abusing the swizzle to support BGRA, A, and LA, so we have to pre-swizzle the border color for those. We don't really want to spend half a kb on sampler states in most cases, so skip generating the variants when the border color is unused or is 0,0,0,0.	2019-01-27 08:30:03 -08:00
Eric Anholt	5fe4250a2c	v3d: Move the sampler state to the long-lived state uploader. Samplers are small (8-24 bytes), so allocating 4k for them is a huge waste.	2019-01-27 08:30:03 -08:00
Eric Anholt	09472006ff	v3d: Use the symbolic names for wrap modes from the XML.	2019-01-27 08:30:03 -08:00
Eric Anholt	c51d125d18	v3d: Fix stencil sampling from a separate-stencil buffer. When the sampler view is in sample-stencil mode, we need to return uint stencil values. To do that, fill in the format table to return R8I, and have the sampler view point at the separate stencil buffer. Fixes dEQP-GLES31.functional.stencil_texturing.format.depth32f_stencil8_2d	2019-01-27 08:30:03 -08:00
Eric Anholt	8a0b0a8f37	v3d: Fix stencil sampling from packed depth/stencil. We need to pick the 8-bit unorm value out, not the depth component.	2019-01-27 08:30:03 -08:00
Eric Anholt	fcdbd441a2	v3d: Fix release-build warning about utile_h.	2019-01-27 08:30:03 -08:00
Eric Anholt	edb1fcd963	v3d: Flush blit jobs immediately after generating them. Fixes OOMs in the CTS's packed_pixels.varied_rectangle.* tests -- the series of texture uploads at the start before texturing occurred would end up all sitting around as cached jobs for reuse. By flushing immediately, peak active BO usage goes from 150M to 40M. We could maybe put some limits on how many jobs we keep around, but blits seem particularly unlikely to get reused for other drawing.	2019-01-27 08:30:03 -08:00
Eric Anholt	ac333ffa59	v3d: Fix BO stats accounting for imported buffers.	2019-01-27 08:30:03 -08:00
Eric Anholt	060575bea8	v3d: Drop maximum number of texture units down to 16. This is the GLES 3.2 minmax, and also what the closed source driver does. Avoids hitting OOMs in the CTS's dEQP-GLES3.functional.texture.units.all_units.only_cube.1.	2019-01-27 08:30:03 -08:00
Eric Anholt	3e743d8cd8	v3d: Avoid duplicating limits defines between gallium and v3d core. We don't want to pull the compiler into every include in the gallium driver, so just make a new little header to store the limits.	2019-01-27 08:30:03 -08:00
Eric Anholt	fe6a21c867	v3d: Fix overly-large vattr_sizes structs. We want one vector size per vector, not per component.	2019-01-27 08:30:03 -08:00
Eric Anholt	533b3f0541	v3d: Rename gallium-local limits defines from VC5 to V3D. The compiler has its limits under V3D_* (like most V3D stuff), so sync up with that.	2019-01-27 08:30:03 -08:00
Bas Nieuwenhuizen	b4870a15ae	radv: Remove unused variable. Trivial.	2019-01-27 13:51:35 +01:00
Niklas Haas	804cc44d09	radv: add device->instance extension dependencies From the vulkan spec 33.3 "Extension Dependencies": "Any device extension that has an instance extension dependency that is not enabled by vkCreateInstance is considered to be unsupported, hence it must not be returned by vkEnumerateDeviceExtensionProperties for any VkPhysicalDevice child of the instance." Therefore we need to check whether the instance-level extensions are actually enabled when deciding to support a device-level extension or not. Furthermore, we need to do this for all instance-level extensions of any (transitive) device-level extension dependency, due to the following paragraph: "If an extension is supported (as queried by vkEnumerateInstanceExtensionProperties or vkEnumerateDeviceExtensionProperties), then required extensions of that extension must also be supported for the same instance or physical device." Finally, because some of these vulkan extensions may be implicitly promoted to future vulkan core API versions, we can also satisfy the dependency if the vulkan API version is high enough. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-01-27 13:50:35 +01:00
Niklas Haas	d12dc39396	radv: correctly use vulkan 1.0 by default From the vulkan spec 3.2 "Instances": "Providing a NULL VkInstanceCreateInfo::pApplicationInfo or providing an apiVersion of 0 is equivalent to providing an apiVersion of VK_MAKE_VERSION(1,0,0)." Fixes: `ffa15861ef` "radv: UseEnumerateInstanceVersion for the default version." Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-01-27 12:49:28 +01:00
Niklas Haas	d9bd3b1cb8	glsl: fix block member alignment validation for vec3 Section 7.6.2.2 (Standard Uniform Block Layout) of the GL spec says: The base offset of the first member of a structure is taken from the aligned offset of the structure itself. The base offset of all other structure members is derived by taking the offset of the last basic machine unit consumed by the previous member and adding one. The current code does not reflect this last sentence - it effectively instead aligns up the next offset up to the alignment of the previous member. This causes an issue in exactly one case: layout(std140) uniform block { layout(offset=0) vec3 var1; layout(offset=12) float var2; }; As per section 7.6.2.1 (Uniform Buffer Object Storage) and elsewhere, a vec3 consumes 3 floats, i.e. 12 basic machine units. Therefore, `var1` in the example above consumes units 0-11, with 12 being the first available offset afterwards. However, before this commit, mesa incorrectly assumes `var2` must start at offset=16 when using explicit offsets, which results in a compile-time error. Without explicit offsets, the shaders actually work fine, indicating that mesa is already correctly aligning these fields internally. (Just not in the code that handles explicit buffer offset parsing) This patch should fix piglit tests: ssbo-explicit-offset-vec3.vert ubo-explicit-offset-vec3.vert Signed-off-by: Niklas Haas <git@haasn.xyz> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>	2019-01-27 03:00:03 -05:00
Jason Ekstrand	86e5f76d3d	spirv: Add support for SPV_EXT_physical_storage_buffer Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-01-26 13:41:50 -06:00
Jason Ekstrand	fb282a68bc	spirv: Implement OpConvertPtrToU and OpConvertUToPtr This only implements the actual opcodes and does not implement support for using them with specialization constants. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-01-26 13:41:50 -06:00
Jason Ekstrand	837ed2ba51	spirv: Handle OpTypeForwardPointer We handle forward declarations by creating the pointer type with its storage type based on storage class and just waiting to fill out the actual deref type until we get the OpTypePointer. Because any composites using the forward declared type only care about the storage type (i.e. uint64_t, uvec2, etc.) when creating their glsl_type, this works fine and we can defer the actual deref_type as far as we need. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Karol Herbst <kherbst@redhat.com>	2019-01-26 13:41:50 -06:00
Jason Ekstrand	4602e705e4	spirv: Drop a bogus assert This was valid back when the only valid types of pointers were uint32 and uvec2. Now that we're allowing more variety, it could be just about anything so we'll just drop the assert. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Karol Herbst <kherbst@redhat.com>	2019-01-26 13:41:50 -06:00
Jason Ekstrand	9e34781aef	nir: Allow SSBOs and global to alias Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-01-26 13:41:50 -06:00
Jason Ekstrand	9839ce8bf9	nir/validate: Allow array derefs of vectors for nir_var_mem_global Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Karol Herbst <kherbst@redhat.com>	2019-01-26 13:39:18 -06:00
Jason Ekstrand	5f5503d498	nir/lower_io: Add support for nir_var_mem_global Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Karol Herbst <kherbst@redhat.com>	2019-01-26 13:39:18 -06:00
Jason Ekstrand	314d2c90c3	nir/lower_io: Add 32 and 64-bit global address formats These are simple scalar addresses. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-01-26 13:39:18 -06:00
Jason Ekstrand	e461926ef2	nir: Add load/store/atomic global intrinsics These correspond roughly to reading/writing OpenCL global pointers. The idea is that they just take a bare address and load/store from it. Of course, exactly what this address means is driver-dependent. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Karol Herbst <kherbst@redhat.com>	2019-01-26 13:39:18 -06:00
Axel Davy	6380fedb60	st/nine: Enable debug info if NDEBUG is not set We want to have debug info as well if using meson's debugoptimized when ndebug is off. v2: use u_debug functions that do something even if DEBUG is not set. Signed-off-by: Axel Davy <davyaxel0@gmail.com>	2019-01-26 19:53:19 +01:00
Axel Davy	d7433c22e6	st/nine: Immediately upload user provided textures Fixes regression caused by `42d672fa6a` st/nine: Bind src not dst in nine_context_box_upload Before that patch, for user provided textures, when the texture was destroyed, the safety check for pending uploads, which according to the code "Following condition cannot happen currently", was flushing the queue and thus triggering the upload. After the patch, the texture destruction was delayed after the upload. However the user frees the texture buffer, as it thinks the texture is released. Instead of reverting the faulty patch, this patch instead flushes the csmt queue right away after queuing the upload for this type of textures. This is more future-proof, as we may want to bind the surface for other reasons in the future. Signed-off-by: Axel Davy <davyaxel0@gmail.com> Cc: 18.3 <mesa-stable@lists.freedesktop.org>	2019-01-26 19:53:00 +01:00
Matt Turner	a7d629a590	i965: Always compile fp64 funcs when needed Compilation of user-specified shaders with software fp64 works by compiling on demand an "fp64-funcs" shader implementing various fp64 operations and then linking it into the "user shader". In commit `64b8c86d37` Author: Timothy Arceri <tarceri@itsqueeze.com> Date: Thu Jan 17 17:16:29 2019 +1100 glsl: be much more aggressive when skipping shader compilation we changed the behavior of the shader cache to skip compilation earlier when we get a cache hit. After the aforementioned commit, compiling a user program using fp64 would store into the cache an entry for the fp64-funcs shader. Subsequent compilations of uncached user shaders using fp64 would fail in compile_fp64_funcs() after finding a cache entry for the fp64-funcs, but being unprepared to read from the cache. It's unclear to me how to retrieve the cached NIR of the fp64-funcs (if it even is cached), so just call _mesa_glsl_compile_shader() with force_recompile=true in order to ensure we generate the fp64-funcs successfully. Tested-by: Mark Janes <mark.a.janes@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-01-26 10:33:22 -08:00
Matt Turner	18b467c066	intel/compiler: Add a file-level description of brw_eu_validate.c Acked-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2019-01-26 10:33:22 -08:00
Jonathan Marek	41ddf1d150	freedreno: add renderonly scanout This allows creating a fd_screen with a renderonly object which will be used to allocate scanout resources. Signed-off-by: Jonathan Marek <jonathan@marek.ca> Reviewed-by: Eric Anholt <eric@anholt.net> [slight tweak to fix uninitialized 'prsc' in debug print] Signed-off-by: Rob Clark <robdclark@gmail.com>	2019-01-26 10:47:21 -05:00
Rob Clark	cd79b5e0c2	freedreno/a2xx: fix unused variable warning Signed-off-by: Rob Clark <robdclark@gmail.com>	2019-01-26 10:44:31 -05:00
Timothy Arceri	8e9ad592c3	tgsi: remove culldist semantic from docs The semantic was removed in `e6d9389366`. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-01-26 12:04:53 +11:00
Timothy Arceri	5d66f7103f	ac/nir_to_llvm: fix clamp shadow reference for more hardware Fixes the following piglit test on my VEGA and matches the behaviour in the tgsi backend. tests/spec/glsl-1.10/execution/samplers/glsl-fs-shadow2D-clamp-z.shader_test Fixes: `625dcbbc45` ("amd/common: pass address components individually to ac_build_image_intrinsic") Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-01-26 12:03:24 +11:00
Eric Anholt	08f4a904b3	gallium: Make sure we return is_unorm/is_snorm for compressed formats. The util helpers were looking for a non-void channel in a non-mixed format and returning its snorm/unorm state. However, compressed formats don't have non-void channels, so they always returned false. V3D wants to use util_format_is_[su]norm for its border color clamping workarounds, so fix the functions to return the right answer for these. This now means that we ignore .is_mixed. I could retain the is_mixed check, but it doesn't seem like a useful feature -- the only code I could find that might care is freedreno's blit, which has some notes about how things are wonky in this area anyway. Reviewed-by: <Roland Scheidegger sroland@vmware.com>	2019-01-25 13:06:50 -08:00
Eric Anholt	104c7883e7	gallium: Fix comment about possible colorspaces. Two typos, and missing one of the colorspaces. Reviewed-by: <Roland Scheidegger sroland@vmware.com>	2019-01-25 13:06:47 -08:00
Eric Anholt	54abd2e084	gallium: Enable unit tests as actual meson unit tests. These tests don't need swrast, so we can always enable them when build_tests is set. Most of them run to successful completion quickly (.9s on my SKL). Reviewed-by: <Roland Scheidegger sroland@vmware.com>	2019-01-25 13:06:45 -08:00
Emil Velikov	3b6aaab7e9	mapi: print function declarations for shared glapi Earlier commit aimed to remove unneeded function declarations. Namely OpenGL entrypoints which are not applicable for OpenGLES* Although it did not consider the shared glapi which needs all, including hidden ones. Resulting in warning/errors like the following ../build/src/mapi/shared-glapi/glapi_mapi_tmp.h:26014:15: error: no previous prototype for ‘shared_dispatch_stub_1414’ [-Werror=missing-prototypes] This patch addressed that. Cc: Erik Faye-Lund <erik.faye-lund@collabora.com> Reported-by: Eric Anholt <eric@anholt.net> Fixes: `6148cce388` ("mapi: drop unneeded gl_dispatch_stub declarations") Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com> Acked-by: Eric Anholt <eric@anholt.net>	2019-01-25 13:04:04 -08:00
Rob Clark	4aa64940c6	freedreno: limit tiling to PIPE_BIND_SAMPLER_VIEW `1ce5d757d0` dropped this limit.. which is probably the right thing to do. But it results in an extra tiled->linear blit for glReadPixels() (ie. dEQP/piglit) which is hitting some intermittent corruption (looks like cache) on a6xx, causing a lot of spurious fails. Since we are getting close to 19.0 branchpoint, re-instate this limit for now, until the blitter problems are resolved. Fixes: `1ce5d757d0` freedreno: core buffer modifier support Signed-off-by: Rob Clark <robdclark@gmail.com>	2019-01-25 10:20:05 -05:00
Samuel Pitoiset	378e2d2414	radv: fix computing number of user SGPRs for streamout buffers Streamout buffers are emitted like push constants. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-01-25 15:36:16 +01:00
Jose Fonseca	65b8d723fd	appveyor: Revert commits adding Cygwin support. This reverts commits `00ad77b9f6` and `5334dafee2`. This avoids Appveyor build breakage due to Cygwin, but more importantly, there are several problems with these patches, as highlighted in my recent mesa-dev mail. So better to revert for now, and pursue Cygwin support after these have been addressed.	2019-01-25 14:13:26 +00:00
Tapani Pälli	540939ecee	android: fix build issues with libmesa_anv_gen* libraries We need this include path to find nir/nir_xfb_info.h. Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-01-25 15:21:06 +02:00
Andrii Simiklit	4759bb2fcf	intel/batch-decoder: fix a vb end address calculation According to the loop implementation (in 'ctx_print_buffer' function), which advances dword by dword over vertex buffer(vb), the vb size should be aligned by 4 bytes too. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109449 Signed-off-by: Andrii Simiklit <andrii.simiklit@globallogic.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-01-25 15:12:30 +02:00
Andrii Simiklit	db39a44f10	intel/batch-decoder: fix vertex buffer size calculation for gen<8 It should be incremented by one according to how it is calculated by 'emit_vertex_buffer_state': "#if GEN_GEN < 8 .BufferAccessType = step_rate ? INSTANCEDATA : VERTEXDATA, .InstanceDataStepRate = step_rate, #if GEN_GEN >= 5 .EndAddress = ro_bo(bo, end_offset - 1), #endif #endif" Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109449 Signed-off-by: Andrii Simiklit <andrii.simiklit@globallogic.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-01-25 15:12:07 +02:00
Eric Engestrom	69e9440367	meson/vdpau: add missing soversion This mirrors what autotools does in src/gallium/state_trackers/vdpau/Makefile.am and src/gallium/targets/vdpau/Makefile.am: VDPAU_MAJOR = 1 VDPAU_MINOR = 0 libvdpau_gallium_la_LDFLAGS = -version-number $(VDPAU_MAJOR):$(VDPAU_MINOR) Reported-by: Igor Gnatenko <i.gnatenko.brain@gmail.com> Fixes: `68076b8747` "meson: build gallium vdpau state tracker" Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2019-01-25 12:10:00 +00:00
Eric Engestrom	9af77fcf98	anv: drop always-successful VkResult Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-01-25 09:45:27 +00:00
Rafael Antognolli	f2ece26601	anv/allocator: Avoid race condition in anv_block_pool_map. Accessing bo->map and then pool->center_bo_offset without a lock is racy. One way of avoiding such race condition is to store the bo->map + center_bo_offset into pool->map at the time the block pool is growing, which happens within a lock. v2: Only set pool->map if not using softpin (Jason). v3: Move things around and only update center_bo_offset if not using softpin too (Jason). Cc: Jason Ekstrand <jason@jlekstrand.net> Reported-by: Ian Romanick <idr@freedesktop.org> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109442 Fixes: `fc3f588320` Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-01-24 17:39:40 -08:00
Dylan Baker	c1efa240c9	meson: Add warnings and errors when using ICC ICC tries to be helpful by not erroring when it sees something that it doesn't understand, which is completely the opposite of helpful. Meson 0.49.0 does much better at handling this by really trying to make ICC error, but there are some things in mesa that still get ignored until 0.49.1 v2: - Fix id check, which is 'intel' not 'icc' Cc: 18.3 <mesa-stable@lists.freedesktop.org> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> (v1)	2019-01-24 19:14:50 +00:00
Dylan Baker	7cb7f35bc7	meson: Fix compiler checks for SWR with ICC This is a bit fragile, as the way this "fixes" the check is to move the one that we know is correct before the one that is incorrectly reported as working. In meson 0.49.1 (which isn't out yet) this is fixed so that the incorrect check is reported as a failure. Fixes: `e0b037d697` ("meson: Build SWR driver") Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109129 Acked-by: Eric Engestrom <eric.engestrom@intel.com>	2019-01-24 19:14:50 +00:00
Dylan Baker	3ba7ab8d2c	meson: fix swr KNL build There's a typo in one of the #defines that breaks compilation. Fixes: `e0b037d697` ("meson: Build SWR driver") Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109023 Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-01-24 19:14:50 +00:00
Matt Turner	70a7ece035	gallivm: Return true from arch_rounding_available() if NEON is available LLVM uses the single instruction "FRINTI" to implement llvm.nearbyint. Fixes the rounding tests of lp_test_arit. Bug: https://bugs.gentoo.org/665570 Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2019-01-24 11:07:24 -08:00
Matt Turner	385ee7c3d0	gallium: Enable ASIMD/NEON on aarch64. NEON (now called ASIMD) is available on all aarch64 CPUs. Our code was missing an aarch64 path, leading to util_cpu_caps.has_neon always being false on aarch64. Reviewed-by: Eric Anholt <eric@anholt.net>	2019-01-24 11:07:24 -08:00
Dave Airlie	1f6b92b476	gallium: use put image shm2 path (v2) This fixes the drisw paths to use the new shm2 interface, so that we don't trigger the X server overflow checks when the x offset is non-zero. This just hides the versioning in drisw, and either passes the src_x or adds the offset fixup for the fallback path. Cc: <mesa-stable@lists.freedesktop.org> Reviewed-by: Adam Jackson <ajax@redhat.com>	2019-01-25 04:27:45 +10:00
Dave Airlie	00af91ca46	glx: add support for putimageshm2 path (v2) v2: pass x,0 in as the offset coords at glx level not earlier Cc: <mesa-stable@lists.freedesktop.org> Reviewed-by: Adam Jackson <ajax@redhat.com>	2019-01-25 04:27:45 +10:00
Dave Airlie	db83a2b40f	dri_interface: add put shm image2 (v2) This adds a new interface to the swrast interface to fix an shm put image bug. The current code adds the x,y src offsets into the offset parameters, however if the x offset is > 0, and the put image copies up to the height of the image, this can trigger an X server validation check to fail and the rendering to get BadMatch. This patch fixes it to pass the x offset coord in as a src x. We cannot pass the Y coordinate due to the horrible code mangling the image w/h vs stride in swrastXPutImage. v2: drop srcx,y from api Cc: <mesa-stable@lists.freedesktop.org> Reviewed-by: Adam Jackson <ajax@redhat.com>	2019-01-25 04:27:45 +10:00
Emil Velikov	281421e1bc	mapi: remove machinery handling CSV files We haven't had one in years, so just drop the code. Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>	2019-01-24 18:13:25 +00:00
Emil Velikov	8a0012692a	mapi: remove old, unused ES* generator code As of earlier commit, everyone has switched to the new script for the ES dispatch. Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>	2019-01-24 18:13:25 +00:00
Emil Velikov	a41214ca3e	mapi/es2api: remove no longer present entrypoints With the previous scripts, API from the following extensions was incorrectly exported. Drop them from the list, since they're no longer around. GL_EXT_blend_func_extended GL_EXT_texture_integer Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com> Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>	2019-01-24 18:13:25 +00:00
Emil Velikov	05f8558b27	mapi/es*api: remove GL_EXT_multi_draw_arrays entrypoints Now we use the upstream XML file and a cleaner generator. Thus the symbols are no longer exported and we can drop them from this list. Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>	2019-01-24 18:13:25 +00:00
Emil Velikov	5661ce6c64	mapi/es*api: remove GL_OES_EGL_image entrypoints At some point in the past we fixed the scripts, so these are no longer exported. Drop them from the list. Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>	2019-01-24 18:13:25 +00:00
Emil Velikov	9f86f1da7c	Revert "mapi/new: sort by slot number" This reverts commit a1f5d9412cf7cacb3534635f6c2409fafbe6574e. We no longer need to sort - it was meant only to ease comparison against the old generated files.	2019-01-24 18:13:25 +00:00
Emil Velikov	3bf08292d2	scons: wire the new generator for es1 and es2 Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com> Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>	2019-01-24 18:13:25 +00:00
Emil Velikov	0842bc879b	meson: wire the new generator for es1 and es2 v2: use ${foo})_py naming (Dylan) v3: use symbolic name for genCommon.py Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com> (v2)	2019-01-24 18:13:25 +00:00
Emil Velikov	656845301d	autotools: wire the new generator for es1 and es2 The output produced functionally identical, with the following changes: - A cosmetic: swapped ABI compatible types [ GLclampf -> GLfloat, etc ] - B cosmetic: renamed parameters [ zNear -> n, etc ] - C dropped extension entrypoints - invalid/incorrect To make things easier to validate, normalise both old/new headers run the sed patterns A, B and C to both sets. A s/\<GLclampf\>/GLfloat/g; s/\<GLclampx\>/GLfixed/g; s/\<GLvoid\>/void/g; B s/\ \* / */g; s/\<texture\>/target/g; s/\<plane\>/p/g; s/\<depth\>/d/g; s/\<modeAlpha\>/modeA/g; s/\<shader\>/program/g; s/\<obj\>/shaders/g; s/\<equation\>/eqn/g; s/\<param\>/data/g; s/\<params\>/data/g; s/\<buffers\>/buffer/g; s/\<src\>/mode/g; s/\<count\>/n/g; s/\<zNear\>/n/g; s/\<zFar\>/f/g; s/\<zfail\>/dpfail/g; s/\<zpass\>/dppass/g; s/\<buf\>/index/g; s/\<value\>/target/g; s/\<cap\>/target/g; s/\<maskNumber\>/index/g; s/\<srcRGB\>/sfactorRGB/g; s/\<dstRGB\>/dfactorRGB/g; s/\<srcAlpha\>/sfactorAlpha/g; s/\<dstAlpha\>/dfactorAlpha/g; s/\<primitiveMode\>/mode/g; s/\<primcount\>/instancecount/g; s/\<top\>/t/g; s/\<bottom\>/b/g; s/\<left\>/l/g; s/\<right\>/r/g; s/\<x\>/v0/g; s/\<y\>/v1/g; s/\<z\>/v2/g; s/\<w\>/v3/g; s/\<sfactor\>/mode/g; s/\<dfactor\>/dst/g; s/\<attribindex\>/bindingindex/g; s/\<internalFormat\>/internalformat/g; s/\<bufSize\>/bufsize/g; C glMultiDrawArraysEXT glMultiDrawElementsEXT glBindFragDataLocationEXT glGetTexParameterIivEXT glGetTexParameterIuivEXT glTexParameterIivEXT glTexParameterIuivEXT v2: - gl_dispatch_stub declarations are addressed with previous patch - the public_entries table is no longer generated Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>	2019-01-24 18:13:25 +00:00
Emil Velikov	389bc2bc6e	mapi/new: remove duplicate GLvoid/void substitution We already do it a few lines above - drop the duplicate. Note that for consistency's sake, we keep the substitution since the GL API is a mixed bag - some use GLvoid while others a normal void. We might want to merge this back in GLVND. Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>	2019-01-24 18:13:25 +00:00
Emil Velikov	5fa6c34949	mapi/new: fixup the GLDEBUGPROCKHR typedef to the non KHR one This way we can reuse the latter, which is already present in the headers that we use. Thus we can drop the manual typedef we generate. We might want to merge this back in GLVND. Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>	2019-01-24 18:13:25 +00:00
Emil Velikov	babec55f7e	mapi/new: don't print info we don't need for ES1/ES2 There is no need for the noop functions, the public_stubs and public_entries table or table size defines. Remove those. Pretty much all of this is applicable to GLVND, although it requires preparatory work. v2: - python style fixes (Dylan) - use "gldispatch" instead of not "glesv1" "glesv2" - remove the public_entries table/array (Erik) v3: - use if == "gldispatch", instead of "in" (Kyle) Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com> (v2)	2019-01-24 18:13:25 +00:00
Emil Velikov	5b1bdce156	mapi/new: split out public_entries handling The only instance that requires the public_entries table is the dispatch library - split that into another function. We have to be careful with when undefining the guard, so split it out. We might want to merge this back in GLVND. Minor GLVND cleanup will be needed first. Signed-off-by: Emil Velikov <emil.velikov@collabora.com>	2019-01-24 18:13:25 +00:00
Emil Velikov	313f977224	mapi/new: reinstate _NO_HIDDEN suffixes in the new generator Strictly speaking we can rework the rest of the code so we do not need those. That said, this will require a series of its own so let's carry this local quirk for now. Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>	2019-01-24 18:13:25 +00:00
Emil Velikov	451805f810	mapi/new: use the static_data offsets in the new generator Otherwise the incorrect ones will be used, effectively breaking the ABI. Note: some entries in static_data.py list a suffixed API, while (for ES* at least) we expect the one w/o suffix. v2: - rework path handling (Dylan) - use else if chain (Erik) Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>	2019-01-24 18:13:25 +00:00
Emil Velikov	bba375c016	mapi/new: sort by slot number Makes it easier to compare the newly generated header against the old one. Will be reverted after the transition.	2019-01-24 18:13:25 +00:00
Emil Velikov	06eb3fe371	mapi/new: import mapi scripts from glvnd Currently we have over 20 scripts that generate the libGL* dispatch and various other functionality. More importantly we're using local XML files instead of the Khronos-provided one(s). This results in increasing complexity of writing, maintaining and bugfixing. One fairly annoying bug is handling of statically exported symbols. Today, if we enable a GL extension for GLES1/2, we add a special tag to the xml. Thus the ES dispatch gets generated, but also since we have no separate notion of GL/ES1/ES2 static functions it also gets exported statically. This commit adds step one towards clearing and simplifying our setup. It imports the mapi generator from GLVND. 012fe39 ("Remove a couple of duplicate typedefs.") v2: use local genCommon.py Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>	2019-01-24 18:13:25 +00:00
Emil Velikov	cd0f11bac5	mapi: move genCommon.py to src/mapi/new The helper will also be used by the new Khronos gl.xml aware generator. v2: Move existing one, instead of duplicating it. v3: Correct genCommon.py references in meson [Erik] v4: Drop the file from the EGL EXTRA_DIST [Erik] Suggested-by: Kyle Brenneman <kbrenneman@nvidia.com> Signed-off-by: Emil Velikov <emil.velikov@collabora.com>	2019-01-24 18:13:25 +00:00
Emil Velikov	a08a793180	genCommon.py: Fix typo in _LIBRARY_FEATURE_NAMES. Port glvnd commit 37fc6caa4b8 ("Fix typo in _LIBRARY_FEATURE_NAMES.") from Michal Srb. Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>	2019-01-24 18:13:25 +00:00
Emil Velikov	cf317bf093	mapi: add all _glapi_table entrypoints to static_data.py Currently various parts of mesa use the glapi_table differently. Some use _glapi_get_proc_offset() to get the offset, while others directly reference the specific offset via _gloffset_Function. Add all static entries, to ensure things don't break as we flip to the upstream XML + new mapi generator. Note: the offsets are also used for the alias remap table, thus we need to ensure we honour the correct offsets range or it will break. Currently this is done via MAX_OFFSETS constant, although a better solution is in the works. v2: add FramebufferTexture2DMultisampleEXT v3: add MAX_OFFSETS guard Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com> (v1) Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>	2019-01-24 18:13:25 +00:00
Emil Velikov	fe9f5c0e21	mapi: sort static entrypoints numerically A few of the entrypoints were incorrectly placed. Sort those to align with the rest of the list. Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>	2019-01-24 18:13:25 +00:00
Emil Velikov	5a81e8d40e	Revert "mesa/main: remove ARB suffix from glGetnTexImage" This reverts commit `f1998e15ff`. This changes the ABI, such that glGetnTexImageARB entry-point from the GLAPI gets removed. Thus accessing many functions by offset (as we do) will result in getting the wrong one. Follow-up work will swap the by-offset handling, but for now revert this patch. Acked-by: Erik Faye-Lund <erik.faye-lund@collabora.com>	2019-01-24 18:13:25 +00:00
Erik Faye-Lund	6148cce388	mapi: drop unneeded gl_dispatch_stub declarations These declarations are not used anywhere - be that generated code or otherwise. [Emil: format the hunk from Erik into a patch] Signed-off-by: Emil Velikov <emil.velikov@collabora.com>	2019-01-24 18:13:24 +00:00
Emil Velikov	ca152234e1	mesa: correctly use os.path.join in our python scripts With Windows in mind, using forward slash isn't the right thing to do. Even if it just works, we might want to fix it. As here, use __file__ instead of argv[0] and sys.path.insert over sys.path.append. With the path tweak being reportedly faster. Suggested-by: Dylan Baker <dylan@pnwbakers.com> Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>	2019-01-24 18:13:24 +00:00
Emil Velikov	9cc8e12505	freedreno: automake: ship ir3_nir_trig.py in the tarball Fixes: `aa0fed10d3` ("freedreno: move ir3 to common location") Signed-off-by: Emil Velikov <emil.velikov@collabora.com>	2019-01-24 18:13:24 +00:00
Eric Engestrom	8ed966b506	egl/glvnd: sync egl.xml from Khronos Fixes: `98984b7cdd` "egl: add glvnd entrypoints for EGL_MESA_query_driver" Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Acked-by: Emil Velikov <emil.velikov@collabora.com>	2019-01-24 16:55:21 +00:00
Eric Engestrom	d2ca270511	travis: bump libdrm to 2.4.97 Fixes: `c02f761bdf` "winsys/amdgpu: use the new BO list API" Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>	2019-01-24 14:50:33 +00:00
Veluri Mithun	85edfc04b8	egl: Implementation of egl dri2 drivers for MESA_query_driver Signed-off-by: Veluri Mithun <velurimithun38@gmail.com> Acked-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2019-01-24 14:37:52 +00:00
Eric Engestrom	98984b7cdd	egl: add glvnd entrypoints for EGL_MESA_query_driver Fixes: fbdd7bde29863935106c "egl: Implement EGL API for MESA_query_driver" Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2019-01-24 14:37:47 +00:00
Veluri Mithun	6afce78128	egl: Implement EGL API for MESA_query_driver Signed-off-by: Veluri Mithun <velurimithun38@gmail.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2019-01-24 14:37:47 +00:00
Eric Engestrom	7d9274388b	egl: update headers from Khronos Cheating a tiny bit as these headers aren't in the Khronos repo yet, but I expect them to be within a couple days. Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2019-01-24 14:37:44 +00:00
Eric Engestrom	381d0e753a	egl: finalize EGL_MESA_query_driver Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2019-01-24 14:37:36 +00:00
Matt Turner	e166003cb7	intel/compiler: Reset default flag register in brw_find_live_channel() emit_uniformize() emits SHADER_OPCODE_FIND_LIVE_CHANNEL with its flag_subreg set, so that the IR knows which flag is accessed. However the flag is only used on Gen7 in Align1 mode. To avoid setting unnecessary bits in the instruction words, get the information we need and reset the default flag register. This allows round-tripping through the assembler/disassembler. Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2019-01-23 22:48:29 -08:00
Kenneth Graunke	74c9c906f9	gallium: Add forgotten docs for PIPE_CAP_GLSL_TESS_LEVELS_AS_INPUTS. Thanks to Ilia for catching this.	2019-01-23 17:16:22 -08:00
Mark Janes	022800a058	Revert "Implement EGL API for MESA_query_driver" This reverts commit `ff621a5055`. with default warnings configuration, this commit generates: ../src/egl/main/eglapi.c:2654:1: error: no previous prototype for ‘eglGetDisplayDriverConfig’ [-Werror=missing-prototypes]	2019-01-23 16:29:13 -08:00
Mark Janes	9e9fa13c81	Revert "Implementation of egl dri2 drivers for MESA_query_driver" This reverts commit `2720f78ef2`.	2019-01-23 16:28:47 -08:00
Veluri Mithun	2720f78ef2	Implementation of egl dri2 drivers for MESA_query_driver Signed-off-by: Veluri Mithun <velurimithun38@gmail.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2019-01-23 22:29:14 +00:00
Veluri Mithun	ff621a5055	Implement EGL API for MESA_query_driver Signed-off-by: Veluri Mithun <velurimithun38@gmail.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Acked-by: Eric Engestrom <eric.engestrom@intel.com>	2019-01-23 22:29:14 +00:00
Veluri Mithun	499869908b	Add extension doc for MESA_query_driver Signed-off-by: Veluri Mithun <velurimithun38@gmail.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Acked-by: Eric Engestrom <eric.engestrom@intel.com>	2019-01-23 22:29:14 +00:00
Sergii Romantsov	cfca5cd958	nir: Length of boolean vtn_value now is 1 During conversion type-length was lost due to math. v2 (Jason Ekstrand): - Use a size/offset of 4 bytes Fixes: `44227453ec` (nir: Switch to using 1-bit Booleans for almost everything) Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109353 Signed-off-by: Sergii Romantsov <sergii.romantsov@globallogic.com> Tested-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-01-23 15:43:06 -06:00
Marek Olšák	42aea4f1a7	st/mesa: fix PRIMITIVES_GENERATED query after the "pipeline stat single" changes When this functionality was added, the PRIMITIVES_GENERATED query was accidentally omitted. This causes issues for drivers that support transform feedback. Fixes: `d644698b44` ("gallium: Add the ability to query a single pipeline statistics counter") Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-01-23 14:32:57 -05:00
Marek Olšák	c89e8470e5	st/mesa: purge framebuffers when unbinding a context This fixes pipe_surface "leaks". Cc: 18.3 <mesa-stable@lists.freedesktop.org> Acked-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-01-23 14:32:57 -05:00
Erik Faye-Lund	5c17c01815	docs: add note about sending merge-requests from forks Sending MRs from the main Mesa repository increases clutter in the repository, and decreases visibility of project-wide branches. So it's better if MRs are sent from forks instead. Let's add a note about this, in case it's not obvious to everyone. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2019-01-23 18:14:06 +01:00
Rob Clark	5a4af871e3	freedreno: set modifier when exporting buffer Fixes an assert we start hitting with kms/gbm: #0 0x0000007fbf3d6e3c in raise () from /lib64/libc.so.6 #1 0x0000007fbf3c4a68 in abort () from /lib64/libc.so.6 #2 0x0000007fbf3d04e8 in __assert_fail_base () from /lib64/libc.so.6 #3 0x0000007fbf3d0550 in __assert_fail () from /lib64/libc.so.6 #4 0x0000007fbf5a73c4 in gbm_dri_bo_create (gbm=0x5820f0, width=2160, height=1440, format=875713112, usage=0, modifiers=0x695e00, count=1) at ../src/gbm/backends/dri/gbm_dri.c:1150 #5 0x0000007fbf5a49c4 in gbm_bo_create_with_modifiers (gbm=0x5820f0, width=2160, height=1440, format=875713112, modifiers=0x695e00, count=1) at ../src/gbm/main/gbm.c:491 #6 0x0000007fbbac3d64 in get_back_bo (dri2_surf=0x6f4cc0) at ../src/egl/drivers/dri2/platform_drm.c:258 #7 0x0000007fbbac4318 in dri2_drm_image_get_buffers (driDrawable=0x704490, format=4098, stamp=0x6fc730, loaderPrivate=0x6f4cc0, buffer_mask=1, buffers=0x7fffffe210) at ../src/egl/drivers/dri2/platform_drm.c:409 #8 0x0000007fbf5a5318 in image_get_buffers (driDrawable=0x704490, format=4098, stamp=0x6fc730, loaderPrivate=0x70e150, buffer_mask=1, buffers=0x7fffffe210) at ../src/gbm/backends/dri/gbm_dri.c:135 #9 0x0000007fbe4308c4 in dri_image_drawable_get_buffers (drawable=0x6fc730, images=0x7fffffe210, statts=0x6f2660, statts_count=1) at ../src/gallium/state_trackers/dri/dri2.c:339 #10 0x0000007fbe430c44 in dri2_allocate_textures (ctx=0x614b30, drawable=0x6fc730, statts=0x6f2660, statts_count=1) at ../src/gallium/state_trackers/dri/dri2.c:466 #11 0x0000007fbe435580 in dri_st_framebuffer_validate (stctx=0x714160, stfbi=0x6fc730, statts=0x6f2660, count=1, out=0x7fffffe3b8) at ../src/gallium/state_trackers/dri/dri_drawable.c:85 #12 0x0000007fbe7b2c84 in st_framebuffer_validate (stfb=0x6f2190, st=0x714160) at ../src/mesa/state_tracker/st_manager.c:222 #13 0x0000007fbe7b4884 in st_api_make_current (stapi=0x7fbf0430d8 <st_gl_api>, stctxi=0x714160, stdrawi=0x6fc730, 
streadi=0x6fc730) at ../src/mesa/state_tracker/st_manager.c:1074 #14 0x0000007fbe434f44 in dri_make_current (cPriv=0x703c20, driDrawPriv=0x704490, driReadPriv=0x704490) at ../src/gallium/state_trackers/dri/dri_context.c:301 #15 0x0000007fbe42c910 in driBindContext (pcp=0x703c20, pdp=0x704490, prp=0x704490) at ../src/mesa/drivers/dri/common/dri_util.c:579 #16 0x0000007fbbabab40 in dri2_make_current (drv=0x69d170, disp=0x69c6e0, dsurf=0x6f4cc0, rsurf=0x6f4cc0, ctx=0x70cb40) at ../src/egl/drivers/dri2/egl_dri2.c:1456 #17 0x0000007fbbaa8ef4 in eglMakeCurrent (dpy=0x69c6e0, draw=0x6f4cc0, read=0x6f4cc0, ctx=0x70cb40) at ../src/egl/main/eglapi.c:862 #18 0x0000007fbf5736ac in InternalMakeCurrentVendor (dpy=dpy@entry=0x614fb0, draw=draw@entry=0x6f4cc0, read=read@entry=0x6f4cc0, context=context@entry=0x70cb40, apiState=apiState@entry=0x6fc940, vendor=0x6975f0) at libegl.c:861 #19 0x0000007fbf573764 in InternalMakeCurrentDispatch (dpy=0x614fb0, draw=0x6f4cc0, read=0x6f4cc0, context=0x70cb40, vendor=0x6975f0) at libegl.c:630 #20 0x0000000000403640 in init_egl (egl=0x5805a8 <gl>, gbm=0x580528 <gbm>, samples=0) at ../common.c:263 #21 0x0000000000403c1c in init_cube_smooth (gbm=0x580528 <gbm>, samples=0) at ../cube-smooth.c:225 #22 0x0000000000408618 in main (argc=1, argv=0x7fffffe8d8) at ../kmscube.c:145 Fixes: `1ce5d757d0` freedreno: core buffer modifier support Signed-off-by: Rob Clark <robdclark@gmail.com>	2019-01-23 10:21:00 -05:00
Samuel Pitoiset	963c044c55	radv: always pass the GFX9 fence data to si_cs_emit_cache_flush() Remove two useless checks. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-01-23 11:31:14 +01:00
Samuel Pitoiset	5f0b17d581	radv: compute the GFX9 fence VA at allocation time Instead of doing it every time we emit cache flushes. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-01-23 11:31:12 +01:00
Samuel Pitoiset	e7ac792400	radv: only allocate the GFX9 fence and EOP BOs for the gfx queue It's invalid to emit a ZPASS_DONE event on the compute queue, and the fence BO is unused on the compute queue (ie. we don't flush CB or DB caches). This saves some space in the upload BO. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-01-23 11:31:09 +01:00
Samuel Pitoiset	bd098884f1	radv: remove old_fence parameter from si_cs_emit_write_event_eop() This parameter is actually useless as the immediate value can always be zero. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-01-23 11:31:07 +01:00
Samuel Pitoiset	698afa177e	radv: improve gathering of load_push_constants with dynamic bindings For example, if a pipeline has two stages VS and FS. And if only the fragment stage needs dynamic bindings, we shouldn't allocate an extra user SGPR for the vertex stage. Of course, if the vertex stage loads constants, it needs an user SGPR. This should reduce the number of SET_SH_REG packets that are emitted. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-01-23 09:43:53 +01:00
Caio Marcelo de Oliveira Filho	e0485a1dd7	gallium: Add PIPE_CAP_GLSL_TESS_LEVELS_AS_INPUTS In the Intel backend, it makes the most sense to treat gl_TessLevelInner and gl_TessLevelOuter as ordinary shader inputs. For Radeon, it makes more sense to treat them as system values which get special handling. We already have a compiler option for this, but the Iris driver will need a capability bit so we can set it appropriately. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>	2019-01-23 00:35:56 -08:00
Ilia Mirkin	8e26d534be	nv50,nvc0: mark textures dirty on fb update We may have to flush the cache if there are any textures presently bound that refer to the outgoing framebuffer. This is only checked at validation time. Fixes a number of dEQP-GLES3.functional.fbo.color.repeated_clear.sample.* tests, which would bind a texture, then clear it while the binding was in effect, and then render to a different texture. This seems legal under the "no feedback loops" rule. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Karol Herbst <kherbst@redhat.com>	2019-01-22 23:16:01 -05:00
Timothy Arceri	678ef2a4a5	ac/nir_to_llvm: fix interpolateAt* for structs This fixes the arb_gpu_shader5 interpolateAt* tests that contain structs. Acked-by: Marek Olšák <marek.olsak@amd.com>	2019-01-23 10:41:37 +11:00
Timothy Arceri	559e5b0408	ac/nir_to_llvm: add bindless support for uniform handles Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-01-23 10:41:37 +11:00
Timothy Arceri	f0ed59076f	radeonsi/nir: add missing piece for bindless image support This fixes some piglit tests and is what TGSI does. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-01-23 10:41:37 +11:00
Rob Clark	1ce5d757d0	freedreno: core buffer modifier support Split out of a patch from Fritz Koenig to decouple from a6xx UBWC enablement, and added fd_resource_create_with_modifiers().	2019-01-22 16:33:27 -05:00
Rob Clark	c56fe4118a	loader: fix the no-modifiers case Normally modifiers take precedence over use flags, as they are more explicit. But if the driver supports modifiers, but the xserver does not, then we should fall back to the old mechanism of allocating a buffer using 'use' flags. Fixes: `069fdd5f9f` Signed-off-by: Rob Clark <robdclark@gmail.com> Reviewed-by: Kristian H. Kristensen <hoegsberg@chromium.org>	2019-01-22 16:33:27 -05:00
Fritz Koenig	7c4b9510d1	freedreno: add query for dmabuf modifiers	2019-01-22 16:33:27 -05:00
Fritz Koenig	ddbe6171e6	freedreno: drm_fourcc.h header include Add Qualcomm modifier for UBWC	2019-01-22 16:33:27 -05:00
Brian Paul	956c219c8f	svga: add new gallium formats to the format conversion table Fixes a static assertion which broke the build. Fixes: `3ee240890` "gallium: add SINT formats to have exact counterparts to SNORM formats" Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Neha Bhende<bhenden@vmware.com>	2019-01-22 12:58:04 -07:00
Marek Olšák	d85917deaf	radeonsi: rename rfence -> sfence Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-01-22 13:34:03 -05:00
Marek Olšák	260ff57647	radeonsi: rename rbo, rbuffer to buf or buffer Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-01-22 13:34:01 -05:00
Marek Olšák	63b91f25bc	radeonsi: rename rsrc -> ssrc, rdst -> sdst Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-01-22 13:33:04 -05:00
Marek Olšák	4666f36c04	radeonsi: rename rquery -> squery Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-01-22 13:32:59 -05:00
Marek Olšák	501ff90a95	radeonsi: rename r600_resource -> si_resource Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-01-22 13:32:18 -05:00
Lionel Landwerlin	a75b12ce66	vulkan: make generated enum to strings helpers available from c++ Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-01-22 18:20:53 +00:00
Marek Olšák	1cfbed7587	radeonsi: remove r600 from comments Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-01-22 12:26:45 -05:00
Marek Olšák	e0a6399eb4	winsys/amdgpu: rename rfence, rsrc, rdst -> afence, asrc, adst Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-01-22 12:26:45 -05:00
Marek Olšák	2792ec2cdd	radeonsi: rename rview -> sview Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-01-22 12:26:45 -05:00
Marek Olšák	96610f625d	radeonsi: rename rscreen -> sscreen Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-01-22 12:25:57 -05:00
Marek Olšák	86e25ed5a3	radeonsi: disable render cond & pipeline stats for internal compute dispatches	2019-01-22 12:24:35 -05:00
Sonny Jiang	1b25d340b7	radeonsi: use compute for resource_copy_region when possible v2: marek: fix snorm8 blits Signed-off-by: Sonny Jiang <sonny.jiang@amd.com> Signed-off-by: Marek Olšák <marek.olsak@amd.com>	2019-01-22 12:24:35 -05:00
Jiang, Sonny	8daf5bb209	radeonsi: add compute_last_block to configure the partial block fields	2019-01-22 12:22:46 -05:00
Marek Olšák	b443465fb9	gallium/util: add util_format_snorm8_to_sint8 (from radeonsi)	2019-01-22 12:21:43 -05:00
Marek Olšák	3ee240890c	gallium: add SINT formats to have exact counterparts to SNORM formats for radeonsi	2019-01-22 12:21:43 -05:00
Marek Olšák	4d5f8f39f3	radeonsi: move PKT3_WRITE_DATA generation into a helper function Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-01-22 12:14:26 -05:00
Marek Olšák	c252273f98	radeonsi: don't use WRITE_DATA.DST_SEL == MEM_GRBM on >= CIK Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-01-22 12:14:26 -05:00
Marek Olšák	a545415eb9	radeonsi: fix the top-of-pipe fence on SI SI doesn't have MEM. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-01-22 12:14:26 -05:00
Marek Olšák	e402961e1d	radeonsi: correct WRITE_DATA.DST_SEL definitions Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-01-22 12:14:26 -05:00
Marek Olšák	c605738113	radeonsi: compile clear and copy buffer compute shaders on demand same as all other shaders	2019-01-22 11:59:27 -05:00
Marek Olšák	f139589069	radeonsi: remove redundant call to emit_cache_flush in compute clear/copy launch_grid calls it.	2019-01-22 11:59:27 -05:00
Marek Olšák	e3d283eaca	radeonsi: use buffer_store_format_x & xy	2019-01-22 11:59:27 -05:00
Marek Olšák	4c4c8bb1f0	radeonsi: fix rendering to tiny viewports where the viewport center is > 8K This fixes an assertion failure with GL CTS when cts-runner is used. (not a specific test) Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108877 Cc: 18.3 <mesa-stable@lists.freedesktop.org>	2019-01-22 11:59:27 -05:00
Marek Olšák	caa2dcd730	radeonsi: fix a u_blitter crash after a shader with FBFETCH This fixes an assertion failure with GL CTS when cts-runner is used. (not a specific test) Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108877 Cc: 18.3 <mesa-stable@lists.freedesktop.org>	2019-01-22 11:59:27 -05:00
Marek Olšák	c02f761bdf	winsys/amdgpu: use the new BO list API	2019-01-22 11:59:27 -05:00
Jason Ekstrand	ac0f8a6ea0	anv: Implement transform feedback queries Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-01-22 10:42:57 -06:00
Jason Ekstrand	7f4d9bb7b8	genxml: Add SO_PRIM_STORAGE_NEEDED and SO_NUM_PRIMS_WRITTEN Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-01-22 10:42:57 -06:00
Jason Ekstrand	673f33c77d	anv: Implement CmdBegin/EndQueryIndexed Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-01-22 10:42:57 -06:00
Jason Ekstrand	2be89cbd82	anv: Implement vkCmdDrawIndirectByteCountEXT Annoyingly, this requires that we implement integer division on the command streamer. Fortunately, we're only ever dividing by constants so we can use the mulh+add+shift trick and it's not as bad as it sounds. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-01-22 10:42:56 -06:00
Jason Ekstrand	36ee2fd61c	anv: Implement the basic form of VK_EXT_transform_feedback Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-01-22 10:42:56 -06:00
Jason Ekstrand	39925d60ec	anv: Add pipeline cache support for xfb_info Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-01-22 10:42:56 -06:00
Jason Ekstrand	e3bd49eaa7	anv: Add but do not enable VK_EXT_transform_feedback Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-01-22 10:42:56 -06:00
Alejandro Piñeiro	6b50b0a4a8	nir/xfb: distinguish array of structs vs array of blocks Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-01-22 10:42:56 -06:00
Jason Ekstrand	ac704e777c	nir/xfb: Properly handle arrays of blocks Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>	2019-01-22 10:42:56 -06:00
Alejandro Piñeiro	5649a0a6e8	nir/xfb: don't assert when xfb_buffer/stride is present but not xfb_offset In order to allow nir_gather_xfb_info to be used on OpenGL, specifically ARB_gl_spirv. So, from OpenGL 4.6 spec, section 11.1.2.1, "Output Variables": "outputs specifying both an XfbBuffer and an Offset are captured, while outputs not specifying both of these are not captured. Values are captured each time the shader writes to such a decorated object." This implies that outputs are captured if both are present, and not if either one is lacking. Technically, the spec doesn't explicitly say that having just one or the other is a mistake. In some cases, glslang adds an extra XfbBuffer without an XfbOffset, and it is mentioned that technically that is not a bug (see issue#1526). And for the case of Vulkan, as the same glslang issue mentions, it is not clear if that should be a mistake or not. But even if it is a mistake, it does not really need to be checked in the driver, and we can let the validation layers check that. v2: simplify explicit_xfb_buffer and explicit_offset checks (Jason). Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-01-22 10:42:56 -06:00
Jason Ekstrand	4f99ac9144	nir/xfb: Fix offset accounting for dvec3/4 Before, we were double-counting the component slots when we had a dvec3 or dvec4. Instead, just add them in once and manually offset the recorded output offset. Fixes: `19064b8c` "nir: Add a pass for gathering transform feedback info" Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>	2019-01-22 10:42:56 -06:00
Jason Ekstrand	96fa23bca5	nir: Preserve offsets in lower_io_to_scalar_early Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>	2019-01-22 10:42:56 -06:00
Samuel Pitoiset	b2bbd978d0	nir: fix lowering arrays to elements for XFB outputs If we have a transform feedback output like: float[2] x2_out (VARYING_SLOT_VAR1.x, 0, 0) which is lowered by nir_lower_io_arrays_to_elements to, float x2_out (VARYING_SLOT_VAR1.x, 0, 0) float x2_out@5 (VARYING_SLOT_VAR2.x, 0, 0) We have to update the destination offset to avoid overwriting the same value. v2 (Jason Ekstrand): - Compute the correct offsets for arrays of vectors and/or doubles Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>	2019-01-22 10:42:56 -06:00
Samuel Pitoiset	9f4e0aa7c1	nir: do not remove varyings used for transform feedback When an xfb buffer is explicitly declared on a varying variable, we shouldn't remove it at link time. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-01-22 10:42:56 -06:00
Jason Ekstrand	9c14440e81	spirv: Only set interface_type on blocks Instead of setting interface_type to whatever the per-vertex type is, we only set it on blocks. This allows later passes to tell the difference between variables that are in blocks and those that aren't. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>	2019-01-22 10:42:56 -06:00
Jason Ekstrand	da29594636	spirv: Only split blocks Instead of splitting every per-vertex struct, just split the ones that are actually blocks. The reason for the split is so that we have separate variables for separate locations, qualifiers, and builtin decorations. The vulkan spec only allows these on members of blocks. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>	2019-01-22 10:42:56 -06:00
Jason Ekstrand	662cfb121b	spirv: Initialize struct member offsets to -1 This is the "no offset specified" value. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-01-22 10:42:56 -06:00
Jason Ekstrand	b4eae8444e	anv: Always emit at least one vertex element This seems to make the simulator happier. The early return wasn't really protecting anything and the code that follows will happily initialize the dummy element to STORE_0 and emit it. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-01-22 10:42:56 -06:00
Eric Engestrom	610f956fde	configure: EGL requirements only apply if EGL is built Issue was hit with this configuration: --disable-{egl,gbm} --with-platform=drm Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Fixes: `3208fd2e46` ("configure: move platform handling further up") Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2019-01-22 16:12:40 +00:00
Jonathan Marek	fc4f6b2f12	freedreno: a2xx: add partial lower_scalar pass for ir2 Some instructions can only be scalar on a2xx, lower these only Signed-off-by: Jonathan Marek <jonathan@marek.ca>	2019-01-22 14:45:03 +00:00
Jonathan Marek	9f614c74b7	freedreno: a2xx: add ir2 copy propagation Two cases: * replacing srcs which refer to MOV instructions * replacing MOVs used to write to exports Signed-off-by: Jonathan Marek <jonathan@marek.ca>	2019-01-22 14:45:03 +00:00
Jonathan Marek	c7dbf0b280	freedreno: a2xx: insert scalar MOV to allow 2 source scalar If we want to use a scalar instruction with two sources, both sources have to be in the same register. This covers a common case by inserting a scalar MOV into a previous instruction with only a vector alu instruction. A better method would be to have the sources end up in the same register in the first place, but when one source is a constant this is the only way. Signed-off-by: Jonathan Marek <jonathan@marek.ca>	2019-01-22 14:45:03 +00:00
Jonathan Marek	67610a0323	freedreno: a2xx: NIR backend This patch replaces the a2xx TGSI compiler with a NIR compiler. It also adds several new features: -gl_FrontFacing, gl_FragCoord, gl_PointCoord, gl_PointSize -control flow (including loops) -texture related features (LOD/bias, cubemaps) -filling scalar ALU slot when possible Signed-off-by: Jonathan Marek <jonathan@marek.ca>	2019-01-22 14:45:03 +00:00
Tapani Pälli	da3ca69afa	nir: cleanup glsl_get_struct_field_offset, glsl_get_explicit_stride Take away const qualifier from return type of these functions as -Wignored-qualifiers points out it is ignored for these cases. Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-01-22 13:09:15 +02:00
Eric Engestrom	41a0c00392	travis: fix autotools build after --enable-autotools switch addition Fixes: `e68777c87c` "autotools: Deprecate the use of autotools" Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Acked-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-01-22 10:29:19 +00:00
Jason Ekstrand	27af1cc2a6	spirv: Update the JSON and headers from Khronos master This corresponds to commit 79b6681aadcb53c27d1052e on GitHub. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-01-21 18:55:05 -06:00
Jason Ekstrand	ca8c6c9781	nir: Mark deref UBO and SSBO access as non-scalar Fixes: `63b9aa2e25` "spirv: Add support for using derefs for..." Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-01-21 18:41:47 -06:00
Karol Herbst	5ee0adfb6e	nir/spirv: handle ContractionOff execution mode Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-01-21 20:36:41 +01:00
Rob Clark	fa737042ad	nir/vtn: add caps for some cl related capabilities vtn supports these, so don't squawk if user is happy with enabling these. v2: add new members sorted Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-01-21 20:36:41 +01:00
Karol Herbst	ce08e5f39c	vtn: handle SpvExecutionModelKernel Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-01-21 20:36:41 +01:00
Karol Herbst	8bb46de08b	mesa: add MESA_SHADER_KERNEL used for CL kernels Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-01-21 20:36:41 +01:00
Jason Ekstrand	2aa78e46e9	anv/pipeline: Add a pdevice helper variable Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2019-01-21 11:57:00 -06:00
Jason Ekstrand	344171b9ee	relnotes: Add newly added Vulkan extensions Both the Intel and RADV people have been really bad about adding things to the release notes. We should start actually paying attention. Acked-by: Tapani Pälli <tapani.palli@intel.com>	2019-01-21 11:46:06 -06:00
Jason Ekstrand	c7f4a2867c	anv: Only parse pImmutableSamplers if the descriptor has samplers Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2019-01-21 11:45:58 -06:00
Rhys Perry	f0ba826054	radv: prevent dirtying of dynamic state when it does not change DXVK often sets dynamic state without actually changing it. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-01-21 14:37:53 +00:00
Rhys Perry	e4c6423c5e	radv: avoid context rolls when binding graphics pipelines It's common in some applications to bind a new graphics pipeline without ending up changing any context registers. This makes a pipeline have two command buffers: one for setting context registers and one for everything else. The context register command buffer is only emitted if it differs from the previous pipeline's. v2: ensure late scissor emission is done when radv_emit_rbplus_state() is called v2: make use of cmd_buffer->state.workaround_scissor_bug v3: rename "workaround_scissor_bug" to "context_roll_without_scissor_emitted" Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-01-21 14:37:53 +00:00
Rhys Perry	5564a797f2	radv: add missed situations for scissor bug workaround v2: rename "workaround_scissor_bug" to "context_roll_without_scissor_emitted" Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-01-21 14:37:53 +00:00
Rhys Perry	5d1a29071a	radv: pass radv_draw_info to radv_emit_draw_registers() Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-01-21 14:37:53 +00:00
Jonathan Marek	5886c5d092	freedreno: a2xx: sysmem rendering Signed-off-by: Jonathan Marek <jonathan@marek.ca> Reviewed-by: Rob Clark <robdclark@gmail.com>	2019-01-21 09:22:34 -05:00
Jonathan Marek	bec6e4b054	freedreno: a2xx: fix non-zero texture base offsets Signed-off-by: Jonathan Marek <jonathan@marek.ca> Reviewed-by: Rob Clark <robdclark@gmail.com>	2019-01-21 09:22:27 -05:00
Jonathan Marek	02ab85afd8	freedreno: a2xx: fix VERTEX_REUSE/DEALLOC on a20x On a20x, set VGT_VERTEX_REUSE_BLOCK_CNTL to 2 and don't change it. Small rearrangement on a220 to reduce the size of draw commands. Only set DEALLOC_CNTL on a20x because the correct a220 value is not known. Signed-off-by: Jonathan Marek <jonathan@marek.ca> Reviewed-by: Rob Clark <robdclark@gmail.com>	2019-01-21 09:22:22 -05:00
Jonathan Marek	0286a11b7e	freedreno: a2xx: fix gmem2mem viewport Fixes cases where previous viewport values might cause gmem2mem to fail. Signed-off-by: Jonathan Marek <jonathan@marek.ca> Reviewed-by: Rob Clark <robdclark@gmail.com>	2019-01-21 09:22:16 -05:00
Jonathan Marek	64b12520a2	freedreno: a2xx: cleanup REG_A2XX_PA_CL_VTE_CNTL Doesn't change much, but reduces the size of fd2_emit_state gmem2mem does not need to change the value: no Z clipping on resolve mem2gmem now needs to restore the common value after rendering Signed-off-by: Jonathan Marek <jonathan@marek.ca> Reviewed-by: Rob Clark <robdclark@gmail.com>	2019-01-21 09:22:10 -05:00
Jonathan Marek	6ef7700ac6	freedreno: a2xx: cleanup init_shader_const Only 3 vertices are used so we can drop the data for vertex 4 It doesn't make sense to have 1.1 for some coordinates, use 1.0 instead Signed-off-by: Jonathan Marek <jonathan@marek.ca> Reviewed-by: Rob Clark <robdclark@gmail.com>	2019-01-21 09:21:51 -05:00
Karol Herbst	0a793c78a3	nir: add bit_size parameter to system values with multiple allowed bit sizes v2: add assert to verify we have at least one valid bit_size v3: fix use of load_front_face in nir_lower_two_sided_color and tgsi_to_nir Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-01-21 00:17:18 +01:00
Karol Herbst	4125211e9c	nir: add legal bit_sizes to intrinsics With OpenCL some system values match the address bits, but in GLSL we also have some system values being 64 bit like subgroup masks. With this it is possible to adjust the builder functions so that depending on the bit_sizes the correct bit_size is used or an additional argument is added in case of multiple possible values. v2: validate dest bit_size v3: generate hex values in python code remove useless imports rename and move bit_sizes v4: add 1 to legal bit_sizes for front_face Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-01-21 00:16:51 +01:00
Karol Herbst	27bd07e230	nir/validate: allow to check against a bitmask of bit_sizes Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-01-21 00:16:51 +01:00
Karol Herbst	b9fec2b38c	nir: replace more nir_load_system_value calls with builder functions Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-01-21 00:16:51 +01:00
Karol Herbst	987744be98	glsl/lower_output_reads: set invariant and precise flags on temporaries fixes a couple of deqp tests (on nvc0 and potential other drivers): dEQP-GLES3.functional.shaders.invariance.highp.common_subexpression_1 dEQP-GLES3.functional.shaders.invariance.highp.common_subexpression_2 dEQP-GLES3.functional.shaders.invariance.highp.common_subexpression_3 dEQP-GLES3.functional.shaders.invariance.mediump.common_subexpression_1 dEQP-GLES3.functional.shaders.invariance.mediump.common_subexpression_2 dEQP-GLES3.functional.shaders.invariance.mediump.common_subexpression_3 dEQP-GLES3.functional.shaders.invariance.lowp.common_subexpression_1 dEQP-GLES3.functional.shaders.invariance.lowp.common_subexpression_2 dEQP-GLES3.functional.shaders.invariance.lowp.common_subexpression_3 CC: <mesa-stable@lists.freedesktop.org> Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>	2019-01-21 00:16:50 +01:00
Rhys Kidd	8002eaab6c	nv50,nvc0: add missing CAPs for unsupported features Signed-off-by: Rhys Kidd <rhyskidd@gmail.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>	2019-01-20 13:51:01 -05:00
Karol Herbst	acdad24585	nir/spirv: handle SpvStorageClassCrossWorkgroup v2: rename nir_var_global to nir_var_mem_global Signed-off-by: Karol Herbst <kherbst@redhat.com> Acked-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-01-19 20:01:42 +01:00
Karol Herbst	36a76b7192	nir: rename nir_var_shared to nir_var_mem_shared Signed-off-by: Karol Herbst <kherbst@redhat.com> Acked-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-01-19 20:01:41 +01:00
Karol Herbst	6fefd69724	nir: rename nir_var_ssbo to nir_var_mem_ssbo Signed-off-by: Karol Herbst <kherbst@redhat.com> Acked-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-01-19 20:01:41 +01:00
Karol Herbst	3afc1e068f	nir: rename nir_var_ubo to nir_var_mem_ubo Signed-off-by: Karol Herbst <kherbst@redhat.com> Acked-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-01-19 20:01:41 +01:00
Karol Herbst	9b24028426	nir: rename nir_var_function to nir_var_function_temp Signed-off-by: Karol Herbst <kherbst@redhat.com> Acked-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-01-19 20:01:41 +01:00
Karol Herbst	e5daef9587	nir: rename nir_var_private to nir_var_shader_temp Signed-off-by: Karol Herbst <kherbst@redhat.com> Acked-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-01-19 20:01:41 +01:00
Lionel Landwerlin	ad99c1670a	intel/genxml: add missing MI_PREDICATE compare operations Doesn't save us a great deal of lines but at least they get decoded in aubinators. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>	2019-01-19 15:47:36 +00:00
Lionel Landwerlin	79514cc5fb	anv: document cache flushes & invalidations A little bit of explanation regarding how vkCmdPipelineBarrier() works. v2: Avoid referring to data port cache when it's actually sampler caches (Jason) Complete explanation for indirect draws (Jason) v3: s/samplers/sampler/ (Jason) s/UBOs/data port/ Add documentation for VK_ACCESS_CONDITIONAL_RENDERING_READ_BIT_EXT (Lionel) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Acked-by: Eric Engestrom <eric.engestrom@intel.com> (v1) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v2)	2019-01-19 15:45:41 +00:00
Lionel Landwerlin	3c4c18341a	anv: narrow flushing of the render target to buffer writes In commit `9a7b319903` ("anv/query: flush render target before copying results") we tracked all the render target writes to apply a flush in vkCopyQueryResults(). But we can narrow this down to only when we write a buffer (which is the only input of vkCopyQueryResults). v2: Drop newer render target write flags introduced by `1952fd8d2c` ("anv: Implement VK_EXT_conditional_rendering for gen 7.5+") Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v1)	2019-01-19 15:45:41 +00:00
Timothy Arceri	6ca652faf3	glsl: be much more aggressive when skipping shader compilation Currently we only add a cache key for a shader once it is linked. However games like Team Fortress 2 compile a whole bunch of shaders which are never actually linked. These compiled shaders can take up a bunch of memory. This patch changes things so that we add the key for the shader to the cache as soon as it is compiled. This means on a warm cache we can avoid the wasted memory from these shaders. Worst case scenario is we need to compile the shaders at link time but this can happen anyway if the shader has been evicted from the cache. Reduces memory use in Team Fortress 2 from 1.3GB -> 770MB on a warm cache from start up to the game menu. V2: only add key to cache when compilation is successful. Acked-by: Marek Olšák <marek.olsak@amd.com>	2019-01-19 13:12:25 +11:00
Francisco Jerez	c84ec70b3a	intel/fs: Promote execution type to 32-bit when any half-float conversion is needed. The docs are fairly incomplete and inconsistent about it, but this seems to be the reason why half-float destinations are required to be DWORD-aligned on BDW+ projects. This way the regioning lowering pass will make sure that the destination components of W to HF and HF to W conversions are aligned like the corresponding conversion operation with 32-bit execution data type. Tested-by: Iago Toral Quiroga <itoral@igalia.com> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-01-18 16:09:39 -08:00
Timothy Arceri	9e669ed22b	ac/nir_to_llvm: fix interpolateAt* for arrays This builds on the recent interpolate fix by Rhys `ee8488ea3b`. This fixes the arb_gpu_shader5 interpolateAt* tests that contain arrays. Fixes: `ee8488ea3b` ("ac/nir,radv,radeonsi/nir: use correct indices for interpolation intrinsics") Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-01-19 10:59:38 +11:00
Timothy Arceri	860a9e4849	Revert "glsl: be much more aggressive when skipping shader compilation" This reverts commit `64b8c86d37`. Reverting for now as it was causing some segfaults.	2019-01-19 10:45:07 +11:00
Kristian H. Kristensen	5486c9d526	freedreno/a6xx: Turn on texture tiling by default The color swap isn't available for tiled formats and it's not needed either. We pick one channel order and use for all non-linear formats. Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org> Reviewed-by: Rob Clark <robdclark@gmail.com>	2019-01-18 14:27:15 -08:00
Kristian H. Kristensen	60c6778dda	freedreno: Synchronize batch and flush for staging resource Staging blit downloads would wait on the src resource instead of the staging resource and didn't make sure to submit the blit batch first. Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org> Reviewed-by: Rob Clark <robdclark@gmail.com>	2019-01-18 14:27:12 -08:00
Timothy Arceri	64b8c86d37	glsl: be much more aggressive when skipping shader compilation Currently we only add a cache key for a shader once it is linked. However games like Team Fortress 2 compile a whole bunch of shaders which are never actually linked. These compiled shaders can take up a bunch of memory. This patch changes things so that we add the key for the shader to the cache as soon as it is compiled. This means on a warm cache we can avoid the wasted memory from these shaders. Worst case scenario is we need to compile the shaders at link time but this can happen anyway if the shader has been evicted from the cache. Reduces memory use in Team Fortress 2 from 1.3GB -> 770MB on a warm cache from start up to the game menu. Acked-by: Marek Olšák <marek.olsak@amd.com>	2019-01-19 08:24:47 +11:00
Timothy Arceri	c9d7b0f184	glsl: don't skip GLSL IR opts on first-time compiles This basically reverts `c2bc0aa7b1`. By running the opts we reduce memory use in Team Fortress 2 from 1.5GB -> 1.3GB from start-up to game menu. This will likely increase Deus Ex start up times as per commit `c2bc0aa7b1`. However currently 32bit games like Team Fortress 2 can run out of memory on low memory systems, so that seems more important. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-01-19 08:24:43 +11:00
Caio Marcelo de Oliveira Filho	cd56d79b59	nir: check NIR_SKIP to skip passes by name Passes' function names, separated by commas, listed in the NIR_SKIP environment variable will be skipped in debug mode. The mechanism is hooked into the _PASS macro, like NIR_PRINT. The extra macro NIR_SKIP is available as a developer convenience, to skip at points other than the passes' entry points. v2: Fix typo in NIR_SKIP macro. (Bas) Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-01-18 12:31:49 -08:00
Danylo Piliaiev	1952fd8d2c	anv: Implement VK_EXT_conditional_rendering for gen 7.5+ Conditional rendering affects next functions: - vkCmdDraw, vkCmdDrawIndexed, vkCmdDrawIndirect, vkCmdDrawIndexedIndirect - vkCmdDrawIndirectCountKHR, vkCmdDrawIndexedIndirectCountKHR - vkCmdDispatch, vkCmdDispatchIndirect, vkCmdDispatchBase - vkCmdClearAttachments Value from conditional buffer is cached into designated register, MI_PREDICATE is emitted every time conditional rendering is enabled and command requires it. v2: by Jason Ekstrand - Use vk_find_struct_const instead of manually looping - Move draw count loading to prepare function - Zero the top 32-bits of MI_ALU_REG15 v3: Apply pipeline flush before accessing conditional buffer (The issue was found by Samuel Iglesias) v4: - Remove support of Haswell due to possible hardware bug - Made TMP_REG_PREDICATE and TMP_REG_DRAW_COUNT defines to define registers in one place. v5: thanks to Jason Ekstrand and Lionel Landwerlin - Workaround the fact that MI_PREDICATE_RESULT is not accessible on Haswell by manually calculating MI_PREDICATE_RESULT and re-emitting MI_PREDICATE when necessary. v6: suggested by Lionel Landwerlin - Instead of calculating the result of predicate once - re-emit MI_PREDICATE to make it easier to investigate error states. v7: suggested by Jason - Make anv_pipe_invalidate_bits_for_access_flag add CS_STALL if VK_ACCESS_CONDITIONAL_RENDERING_READ_BIT is set. v8: suggested by Lionel - Precompute conditional predicate's result to support secondary command buffers. - Make prepare_for_draw_count_predicate more readable. Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-01-18 18:31:44 +00:00
Danylo Piliaiev	ed6e2bf263	anv: Implement VK_KHR_draw_indirect_count for gen 7+ v2: by Jason Ekstrand - Move out of the draw loop population of registers which aren't changed in it. - Remove dependency on ALU registers. - Clarify usage of PIPE_CONTROL - Without usage of ALU registers patch works for gen7+ v3: set pending_pipe_bits |= ANV_PIPE_RENDER_TARGET_WRITES Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-01-18 18:31:44 +00:00
Dylan Baker	9e989b860a	bin/meson-cmd-extract: Also handle cross and native files Native file support in command line serialization isn't present in meson 0.49, but will be for 0.49.1 and 0.50 Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-01-18 09:37:01 -08:00
Jason Ekstrand	b54df1b6df	anv: Re-sort the extensions list I like to keep things in good order so that you can find them. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-01-18 10:32:23 -06:00
Jason Ekstrand	eb32dad07c	intel/fs: Don't touch accumulator destination while applying regioning alignment rule In some shaders, you can end up with a stride in the source of a SHADER_OPCODE_MULH. One way this can happen is if the MULH is acting on the top bits of a 64-bit value due to 64-bit integer lowering. In this case, the compiler will produce something like this: mul(8) acc0<1>UD g5<8,4,2>UD 0x0004UW { align1 1Q }; mach(8) g6<1>UD g5<8,4,2>UD 0x00000004UD { align1 1Q AccWrEnable }; The new region fixup pass looks at the MUL and sees a strided source and unstrided destination and determines that the sequence is illegal. It then attempts to fix the illegal stride by replacing the destination of the MUL with a temporary and emitting a MOV into the accumulator: mul(8) g9<2>UD g5<8,4,2>UD 0x0004UW { align1 1Q }; mov(8) acc0<1>UD g9<8,4,2>UD { align1 1Q }; mach(8) g6<1>UD g5<8,4,2>UD 0x00000004UD { align1 1Q AccWrEnable }; Unfortunately, this new sequence isn't correct because MOV accesses the accumulator with a different precision to MUL and, instead of filling the bottom 32 bits with the source and zeroing the top 32 bits, it leaves the top 32 (or maybe 31) bits alone and full of garbage. When the MACH comes along and tries to complete the multiplication, the result is correct in the bottom 32 bits (which we throw away) and garbage in the top 32 bits which are actually returned by MACH. This commit does two things: First, it adds an assert to ensure that we don't try to rewrite accumulator destinations of MUL instructions so we can avoid this precision issue. Second, it modifies required_dst_byte_stride to require a tightly packed stride so that we fix up the sources instead and the actual code which gets emitted is this: mov(8) g9<1>UD g5<8,4,2>UD { align1 1Q }; mul(8) acc0<1>UD g9<8,8,1>UD 0x0004UW { align1 1Q }; mach(8) g6<1>UD g5<8,4,2>UD 0x00000004UD { align1 1Q AccWrEnable }; Fixes: `efa4e4bc5f` "intel/fs: Introduce regioning lowering pass" Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2019-01-18 10:18:52 -06:00
Jason Ekstrand	0a7ac6d543	intel/eu: Stop overriding exec sizes in send_indirect_message For a long time, we based exec sizes on destination register widths. We've not been doing that since `1ca3a94427` but a few remnants accidentally remained. Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>	2019-01-18 10:18:52 -06:00
Samuel Pitoiset	f682ed11c3	radv: initialize the per-queue descriptor BO only once Totally useless to write the descriptors inside the loop. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-01-18 13:26:32 +01:00
Samuel Pitoiset	72d9745a40	radv: do not write unused descriptors to the per-queue BO Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-01-18 13:26:30 +01:00
Samuel Pitoiset	8c164ea8f5	radv: reduce size of the per-queue descriptor BO Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-01-18 13:26:28 +01:00
Samuel Pitoiset	83cc87ead4	radv: drop unused code related to 16 sample locations The driver only supports up to 8 sample locations. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-01-18 13:26:24 +01:00
Karol Herbst	80dae7022e	gm107/ir: disable TEXS for tex with derivAll set fixes deqp tests: dEQP-GLES3.functional.shaders.texture_functions.texturegrad.samplercube_fixed_vertex dEQP-GLES3.functional.shaders.texture_functions.texturegrad.samplercube_float_vertex dEQP-GLES3.functional.shaders.texture_functions.texturegrad.isamplercube_vertex dEQP-GLES3.functional.shaders.texture_functions.texturegrad.usamplercube_vertex dEQP-GLES3.functional.shaders.texture_functions.texturegrad.sampler3d_fixed_vertex dEQP-GLES3.functional.shaders.texture_functions.texturegrad.sampler3d_float_vertex dEQP-GLES3.functional.shaders.texture_functions.texturegrad.isampler3d_vertex dEQP-GLES3.functional.shaders.texture_functions.texturegrad.usampler3d_vertex dEQP-GLES3.functional.shaders.texture_functions.texturegrad.sampler2dshadow_vertex dEQP-GLES3.functional.shaders.texture_functions.textureprojgrad.sampler3d_fixed_vertex dEQP-GLES3.functional.shaders.texture_functions.textureprojgrad.sampler3d_float_vertex dEQP-GLES3.functional.shaders.texture_functions.textureprojgrad.isampler3d_vertex dEQP-GLES3.functional.shaders.texture_functions.textureprojgrad.usampler3d_vertex dEQP-GLES3.functional.shaders.texture_functions.textureprojgrad.sampler2dshadow_vertex Fixes: `f821e80213` "gm107/ir: use scalar tex instructions where possible" Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>	2019-01-18 03:27:51 +01:00
Karol Herbst	30b5c9eda2	nv50/ir: disable tryCollapseChainedMULs in ConstantFolding for precise instructions fixes dEQP-GLES2.functional.shaders.invariance.mediump.loop_3 CC: <mesa-stable@lists.freedesktop.org> Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>	2019-01-18 02:03:30 +01:00
Bas Nieuwenhuizen	8424cd8fbd	nir: Account for atomics in copy propagation. Otherwise writes get propagated across atomics if no barrier is used. Without barrier writes should still be visible in the same invocation, so an atomic has to be considered a write. CC: <mesa-stable@lists.freedesktop.org> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Fixes: `b3c6146925` "nir: Copy propagation between blocks" Fixes: `62332d139c` "nir: Add a local variable-based copy propagation pass"	2019-01-18 00:55:35 +01:00
Rafael Antognolli	927ba12b53	anv/tests: Adding test for the state_pool padding. Add a test that checks that we can use the extra space allocated for padding while allocating larger anv_states. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-01-17 15:08:26 -08:00
Rafael Antognolli	731c4adcf9	anv/allocator: Add support for non-userptr. If softpin is supported, create new BOs for the required size and add the respective BO maps. The other main change of this commit is that anv_block_pool_map() now returns the map for the BO that the given offset is part of. So there's no block_pool->map access anymore (when softpin is used). v3: - set fd to -1 on softpin case (Jason) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-01-17 15:08:24 -08:00
Rafael Antognolli	643248b66a	anv: Remove state flush. We have all the state buffers snooped, so we don't need to clflush everything anymore. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-01-17 15:08:22 -08:00
Rafael Antognolli	5d61c74f3d	anv/allocator: Enable snooping on block pool and anv_bo_pool BOs. We are not going to use userptr for anv block pool BOs anymore. However, so far we have been relying on the fact that userptr BOs are snooped on non-llc platforms. Let's make sure that the block pool BOs are still snooped, and we can also remove the clflush'ing that we do on all state buffers. And since we plan to remove the flushes, set the anv_bo_pool BOs to cached (snooped on non-LLC platforms) too. For LLC platforms, they are all cached by default, so this becomes a no-op. v5: - Add snooping to anv_bo_pool BOs too (Jason). - Remove anv_gem_set_domain. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-01-17 15:08:20 -08:00
Rafael Antognolli	dfc9ab2ccd	anv/allocator: Add padding information. It's possible that we still have some space left in the block pool, but we try to allocate a state larger than that state. This means such state would start somewhere within the range of the old block_pool, and end after that range, within the range of the new size. That's fine when we use userptr, since the memory in the block pool is CPU mapped continuously. However, by the end of this series, we will have the block_pool split into different BOs, with different CPU mapping ranges that are not necessarily continuous. So we must avoid such case of a given state being part of two different BOs in the block pool. This commit solves the issue by detecting that we are growing the block_pool even though we are not at the end of the range. If that happens, we don't use the space left at the end of the old size, and consider it as "padding" that can't be used in the allocation. We update the size requested from the block pool to take the padding into account, and return the offset after the padding, which happens to be at the start of the new address range. Additionally, we return the amount of padding we used, so the caller knows that this happens and can return that padding back into a list of free states, that can be reused later. This way we hopefully don't waste any space, but also avoid having a state split between two different BOs. v3: - Calculate offset + padding at anv_block_pool_alloc_new (Jason). v4: - Remove extra "leftover". Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-01-17 15:08:19 -08:00
Rafael Antognolli	7ed0898a8d	anv/allocator: Rework chunk return to the state pool. This commit tries to rework the code that splits and returns chunks back to the state pool, while still keeping the same logic. The original code would get a chunk larger than we need and split it into pool->block_size. Then it would return all but the first one, and would split that first one into alloc_size chunks. Then it would keep the first one (for the allocation), and return the others back to the pool. The new anv_state_pool_return_chunk() function will take a chunk (with the alloc_size part removed), and a small_size hint. It then splits that chunk into pool->block_size'd chunks, and if there's some space still left, split that into small_size chunks. small_size in this case is the same size as alloc_size. The idea is to keep the same logic, but make it in a way we can reuse it to return other chunks to the pool when we are growing the buffer. v2: - Include Jason's suggestions to the algorithm that returns chunks. - Update comments. v3: - Disallow returning 0 blocks (Jason). - fix min_size in the loop (Jason). - remove temporary variables (Jason) v4: - return_chunk() should never return blocks larger than pool->block_size. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-01-17 15:08:17 -08:00
Rafael Antognolli	6a1f4c96cc	anv: Remove some asserts. They won't be true anymore once we add support for multiple BOs with non-userptr. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-01-17 15:08:14 -08:00
Rafael Antognolli	f39dad7e4e	anv: Validate the list of BOs from the block pool. We now have multiple BOs in the block pool, but sometimes we still reference only the first one in some instructions, and use relative offsets in others. So we must be sure to add all the BOs from the block pool to the validation list when submitting commands. v2: - Don't add block pool BOs to the dependency list right before execbuf (Jason) - Call anv_execbuf_add_bo() to each BO in the block pools (Jason) - Use anv_execbuf_add_bo_set() to add surface state dependencies to execbuf. v3: - Add comment to the non-softpin case (Jason). Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-01-17 15:08:10 -08:00
Rafael Antognolli	11a5d4620b	anv: Split code to add BO dependencies to execbuf. This part of the anv_execbuf_add_bo() code is totally independent of the BO being added. Let's split it out, so we can reuse it later. v3: rename to anv_execbuf_add_bo_set (Jason). Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-01-17 15:08:08 -08:00
Rafael Antognolli	f874604f45	anv/allocator: Add support for a list of BOs in block pool. So far we use only one BO (the last one created) in the block pool. When we switch to not use the userptr API, we will need multiple BOs. So add code now to store multiple BOs in the block pool. This has several implications, the main one being that we can't use pool->map as before. For that reason we update the getter to find which BO a given offset is part of, and return the respective map. v3: - Simplify anv_block_pool_map (Jason). - Use fixed size array for anv_bo's (Jason) v4: - Respect the order (item, container) in anv_block_pool_foreach_bo (Jason). Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-01-17 15:08:04 -08:00
Rafael Antognolli	e3dc56d731	anv: Update usage of block_pool->bo. Change block_pool->bo to be a pointer, and update its usage everywhere. This makes it simpler to switch it later to a list of BOs. v3: - Use a static "bos" field in the struct, instead of malloc'ing it. This will be later changed to a fixed length array of BOs. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-01-17 15:08:02 -08:00
Rafael Antognolli	fc3f588320	anv/allocator: Remove pool->map. After switching to using anv_state_table, there are very few places left still using pool->map directly. We want to avoid that because it won't be always the right map once we split it into multiple BOs. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-01-17 15:08:00 -08:00
Rafael Antognolli	54e21e145e	anv/allocator: Rename anv_free_list2 to anv_free_list. Now that we removed the original anv_free_list, we can now use its name. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-01-17 15:07:58 -08:00
Rafael Antognolli	234c9d8a40	anv/allocator: Remove anv_free_list. The next commit already renames anv_free_list2 -> anv_free_list since the old one is gone. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-01-17 15:07:56 -08:00
Rafael Antognolli	e2179aceaf	anv/allocator: Use anv_state_table on back_alloc too. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-01-17 15:07:52 -08:00
Rafael Antognolli	d18267fb48	anv/allocator: Use anv_state_table on anv_state_pool_alloc. Use anv_state_pool_return_blocks() to return blocks to the pool, instead of manually pushing them. v3: - return blocks from the end of the chunk (Jason). Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-01-17 15:07:50 -08:00
Rafael Antognolli	6a1dcfe73d	anv/allocator: Add helper to push states back to the state table. The use of anv_state_table_add() combined with anv_state_table_push(), especially when adding a bunch of states to the table, is very verbose. So we add this helper that makes things easier to digest. We also already add the anv_state_table member in this commit, so things can compile properly, even though it's not used. v2: assert that the states are always aligned to their size (Jason) v3: Add "table" member to anv_state_pool in this commit. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-01-17 15:07:47 -08:00
Rafael Antognolli	e8b6e0a5ba	anv/allocator: Add getter for anv_block_pool. We will need the anv_block_pool_map to find the map relative to some BO that is not at the start of the block pool. v2: just return a pointer instead of a struct (Jason) v4: Update comment (Jason) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-01-17 15:07:43 -08:00
Rafael Antognolli	6a2d5ae305	anv/allocator: Add anv_state_table. Add a structure to hold anv_states. This table will initially be used to recycle anv_states, instead of relying on a linked list implemented in GPU memory. Later it could be used so that all anv_states just point to the content of this struct, instead of making copies of anv_states everywhere. One has to call anv_state_table_add(), which returns an index for the state in the table, and then get a pointer to such index, and finally fill in the rest of the struct. TODO: 1) There's a lot of common code between this table backing store memory and the anv_block_pool buffer, due to how we grow it. I think it's possible to refactor this and reuse code in both places. 2) Add unit tests. v3: - Rename state table memfd (Jason) - Return VK_ERROR_OUT_OF_HOST_MEMORY on more places (Jason) - anv_state_table_grow returns VkResult (Jason) - Rename variables to be more informative (Jason) - Return errors on state table grow. - Rename anv_state_table_push/pop to anv_free_list_push2/pop2 This will be renamed again to remove the trailing "2" later. v4: - Remove exit(-1) from anv_state_table (Jason). - Use uint32_t "next" field in anv_free_entry (Jason). Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-01-17 15:07:34 -08:00
Rafael Antognolli	27478ce00e	anv/tests: Fix block_pool_no_free test. There were 2 problems with this test. First it was comparing highest, which was -1, with an uint32_t. So the current value would never be higher than that, and the assert would always be false. It just never reached this point because of the next problem. It was always looking for the highest value of each thread and storing it in thread_max. So a test case like this wouldn't work: [Thread]: [Blocks] [0]: [0, 32, 64, 96] [1]: [128, 160, 192, 224] [2]: [256, 288, 320, 352] Not only that would skip values and iterate only over thread number 2, instead of walking through all of them, but thread_max was also initialized to -1. And then compared to unsigned blocks[i][next[i]]. We fix that by getting the smallest value of each thread, and checking if it is lower than thread_min, which is initialized to INT32_MAX. And then we end up walking through all the blocks of all threads. We also change "blocks" to be int32_t instead of uint32_t, since in some places (alloc_blocks) it was already referenced as int32_t, and that fixes the comparison to -1. v2: - keep highest initialized to -1, and change blocks to be int32_t. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-01-17 15:05:58 -08:00
Lionel Landwerlin	4149d41f2e	anv: fix invalid binding table index computation The ++ operator strikes again. Fixes: `f92c5bc8f3` ("anv/device: fix maximum number of images supported") Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-01-17 11:49:10 -08:00
Eric Engestrom	c4c5c90255	docs: explain how to see what meson options exist Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2019-01-17 17:05:41 +00:00
Emil Velikov	406623f5b1	docs: update calendar, add news item and link release notes for 18.3.2 Signed-off-by: Emil Velikov <emil.velikov@collabora.com>	2019-01-17 11:37:41 +00:00
Emil Velikov	9d58641bf2	docs: add sha256 checksums for 18.3.2 Signed-off-by: Emil Velikov <emil.velikov@collabora.com> (cherry picked from commit `8320a07221`)	2019-01-17 11:32:20 +00:00
Emil Velikov	2dad014496	docs: add release notes for 18.3.2 Signed-off-by: Emil Velikov <emil.velikov@collabora.com> (cherry picked from commit `95a3b709c0`)	2019-01-17 11:32:19 +00:00
Iago Toral Quiroga	f92c5bc8f3	anv/device: fix maximum number of images supported We had defined MAX_IMAGES as 8, which we used to size the array for image push constant data. The comment there stated that this was for gen8, but anv_nir_apply_pipeline_layout runs for all gens and writes that array, asserting that we don't exceed that number of images, which imposes a limit of MAX_IMAGES on all gens. Furthermore, despite this, we are exposing up to 64 images per shader stage on all gens, gen8 included. This patch lowers the number of images we expose in gen8 to 8 and keeps 64 images for gen9+ while making sure that only pre-SKL gens use push constant space to handle images. v2: - <= instead of < in the assert (Eric, Lionel) - Change the way the assertion is written (Eric) v3: - Revert the way the assertion is written to the form it had in v1, the version in v2 was not equivalent and was incorrect. (Lionel) v4: - gen9+ doesn't need push constants for images at all (Jason) Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> (v3)	2019-01-17 07:59:00 +01:00
Tapani Pälli	a311aa631d	anv: do not advertise AHW support if extension not enabled Fixes following failing vk-gl-cts cases on Linux desktop: dEQP-VK.api.external.memory.android_hardware_buffer.suballocated.buffer.info dEQP-VK.api.external.memory.android_hardware_buffer.suballocated.image.info dEQP-VK.api.external.memory.android_hardware_buffer.dedicated.image.info dEQP-VK.api.external.memory.android_hardware_buffer.dedicated.buffer.info Fixes: `517103abf1` "anv/android: add ahardwarebuffer external memory properties" Reported-by: Juan A. Suarez <jasuarez@igalia.com> Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>	2019-01-17 07:22:02 +02:00
Eric Anholt	99ef66c325	vc4: Don't leak the GPU fd for renderonly usage. Noticed while debugging V3D -- the ro->gpu_fd was freshly opened in ro setup, and it needs to stay open until screen close (since it may be used by renderonly) and should be the same one used by the vc4 screen. Fixes: `7029ec05e2` ("gallium: Add renderonly-based support for pl111+vc4.")	2019-01-16 16:28:41 -08:00
Eric Anholt	0605726776	v3d: Don't leak the GPU fd for renderonly usage. The CTS was running out of fds, because of the ro->gpu_fd never being closed. ro->gpu_fd should match the screen (in case the caller of v3d_drm_screen_create_renderonly() has a scanout_for_resource() that uses gpu_fd) and the screen is expected to close its fd at the end, fixing the resource leak. Fixes: `e113b21cb7` ("v3d: Add renderonly support.")	2019-01-16 16:28:41 -08:00
Eric Anholt	59527a36e9	v3d: Restructure RO allocations using resource_from_handle. I had bugs in the old path where I was laying out as tiled (so we'd render tiled) but then only allocating space in the shared object for linear rendering. The resource_from_handle makes it so the same layout choices are made in both the import and export scanout cases. Also, fixes a leak of the fd that was tripping up the CTS. Now that we're checking PIPE_BIND_SHARED to choose to use RO, the DRM_FORMAT_MOD_LINEAR check wasn't needed any more. Fixes visual corruption and MMU faults in X in renderonly mode. Fixes: `bd09bb1629` ("v3d: SHARED but not necessarily SCANOUT buffers on RO must be linear.")	2019-01-16 16:28:41 -08:00
Eric Anholt	d70eb2302b	v3d: If the modifier is not known on BO import, default to linear for RO. Part of fixing DRI3 rendering with RO on X11. Fixes: `e113b21cb7` ("v3d: Add renderonly support.")	2019-01-16 16:28:41 -08:00
Timothy Arceri	cb527d2c4c	ac/nir_to_llvm: add support for structs to get_sampler_desc() Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-01-17 10:35:36 +11:00
Timothy Arceri	b12316cc92	ac/nir_to_llvm: fix regression in bindless support This wasn't ported over when deref support was implemented. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-01-17 10:35:36 +11:00
Timothy Arceri	e106e0f2dd	radeonsi/nir: get correct type for images inside structs Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-01-17 10:35:36 +11:00
Timothy Arceri	292887ac0d	ac/nir_to_llvm: fix type handling in image code The current code only strips off arrays and cannot find the type for images that are struct members. Instead of trying to get the image type from the variable, we just get it directly from the deref instruction. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-01-17 10:35:36 +11:00
Rhys Perry	8a52e4cc4f	radv: use dithered alpha-to-coverage This matches the behaviour of AMDVLK and hides banding. It also seems to be allowed by the Vulkan spec. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-01-16 20:49:23 +00:00
Alok Hota	187a6506a3	swr/rast: Store cached files in multiple subdirs This improves cache filesystem performance, especially during CI tests Also updated jitcache magic number due to codegen parameter changes Removed 2 `if constexpr` to prevent C++17 requirement	2019-01-16 13:53:30 -06:00
Alok Hota	bb98be61f4	swr/rast: New execution engine per JIT Fixes relocation errors with LLVM 7.0.0	2019-01-16 13:53:30 -06:00
Alok Hota	b135db5d58	swr/rast: Scope MEM_CLIENT enum for mem usages Avoids confusion with other defaulted integer parameters - fixed some unspecified usages - removed unnecessary includes - removed unnecessary protected access specifier in buckets framework	2019-01-16 13:53:30 -06:00
Alok Hota	c722ad7379	swr/rast: Unaligned and translations in gathers - added graphics address translation in odd gathers - added support for unaligned gathers in fetch shader - changed how 2+ GB offsets are handled to make them compatible with unaligned offsets	2019-01-16 13:53:30 -06:00
Alok Hota	9459863dfa	swr/rast: partial support for Tiled Resources - updated sample from TRTT surfaces correctly - implemented mapped status return for TRTT surfaces - implemented per-sample instruction minLod clamp - updated bilinear filter weight calculation to be closer to D3D specs - implemented "ReducedTexcoordRange" operation from D3D specs to avoid loss of precision on high-value normalized coordinates	2019-01-16 13:53:30 -06:00
Alok Hota	9cacf9d877	swr/rast: Add annotator to interleave isa text To make debugging simpler	2019-01-16 13:53:30 -06:00
Alok Hota	c9fa2ee343	swr/rast: Use gfxptr_t value in JitGatherVertices Use gfxptr_t type value for stream pointer uses in gather and similar calls	2019-01-16 13:53:30 -06:00
Gert Wollny	e68777c87c	autotools: Deprecate the use of autotools Since Meson will eventually be the only build system deprecate autotools now. It can still be used by invoking configure with the flag --enable-autotools NAKed-by: Ilia Mirkin <imirkin@alum.mit.edu> Acked-by: Eric Engestrom <eric.engestrom@intel.com> Acked-by: Kenneth Graunke <kenneth@whitecape.org> Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Acked-by: Rob Clark <robdclark@gmail.com> Acked-by: Jason Ekstrand <jason@jlekstrand.net> Acked-by: Marek Olšák <marek.olsak@amd.com> Acked-by: Kristian H. Kristensen <hoegsberg@chromium.org> Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com> Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Eric Anholt <eric@anholt.net> Signed-off-by: Gert Wollny <gw.fossdev@gmail.com>	2019-01-16 09:52:42 -08:00
Dylan Baker	431e9abaab	meson: allow building dri driver without window system if osmesa is classic This was already enabled for gallium based osmesa with gallium drivers in `9d10581897`, so do the same for classic driver with classic osmesa. Fixes: `cbbd5bb889` ("meson: build classic osmesa") Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>	2019-01-16 17:49:51 +00:00
Bruce Cherniak	ed7673afd2	gallium/swr: Fix multi-context sync fence deadlock. Various recreation scenarios lead to API thread getting stuck in swr_fence_finish(). This is a multi-context issue, whereby one context overwrites the fence read-value with a previous sync's lesser value. The fence sync value is supposed to be always increasing. In swr_fence_cb(), only update the "read" value if the new value is greater. (This may seem like we're not waiting on the other context to finish, but had we needed for it to finish there would have been a wait prior to submitting a new sync.) cc: mesa-stable@lists.freedesktop.org	2019-01-16 09:26:36 -06:00
Samuel Pitoiset	d5d7b5e950	ac/nir: don't trash L1 caches for store operations with writeonly memory Ported from RadeonSI. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-01-16 13:57:22 +01:00
Kenneth Graunke	5b51d754d0	st/mesa: Optionally override RGB/RGBX dst alpha blend factors Intel's blending hardware does not properly return 1.0 for destination alpha for RGBX formats; it requires the factors to be overridden to either zero or one. Broadcom vc4 and v3d also could use this override. While overriding these factors is safe in general, Nouveau and Radeon would prefer not to. Their blending hardware already returns correct values for RGB/RGBX formats, and would like to avoid the resulting per-buffer blending and independent blend factors (rgb != a) since it can cause additional overhead. I considered simply handling this in the driver, but it's not as nice. pipe_blend_state doesn't have any format information, so we'd need the hardware blend state to depend on both pipe_blend_state and pipe_framebuffer_state. Furthermore, Intel GPUs don't have a native RGBX_SNORM format, so I avoid exposing one, which makes Gallium fall back to RGBA_SNORM. The pipe_surfaces we get in the driver have an RGBA format, making it impossible to tell that there shouldn't be an alpha channel. One could argue that st not handling it in that case is a bug. To work around this, we'd have to expose RGBX pipe formats, mapped to RGBA hardware formats, and add format swizzling special cases. All doable, but it ends up being more code than I'd like. st_atom_blend already has access to the right information and it's trivial to accomplish there, so we just add a cap bit and do that. Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-01-15 20:53:44 -08:00
Marek Olšák	11735d6c9c	winsys/amdgpu: fix whitespace	2019-01-15 19:10:16 -05:00
Pierre Moreau	0b736f7fd4	meson: Fix with_gallium_icd to with_opencl_icd `with_gallium_icd` is never used throughout the different Meson build files, whereas `with_opencl_icd` tracks whether or not `gallium-opencl` was set to "icd". Fixes: `42ea0631f1` ("meson: build clover") Signed-off-by: Pierre Moreau <pierre.morrow@free.fr> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2019-01-15 13:06:50 -08:00
Kenneth Graunke	d644698b44	gallium: Add the ability to query a single pipeline statistics counter Gallium historically has treated pipeline statistics queries as a single query, PIPE_QUERY_PIPELINE_STATISTICS, which returns a block of 11 values. This was originally patterned after the D3D1x API. Much later, Brian introduced an OpenGL extension that exposed these counters - but it exposes 11 separate queries, each of which returns a single value. Today, st/mesa simply queries all 11 values, and returns a single value. While pipeline statistics counters aren't typically performance critical, this is still not a great fit. A D3D1x->GL translator might request all 11 counters by creating 11 separate GL queries...which Gallium would map to reads of all 11 values each time, resulting in a total 121 counter reads. That's not ideal. This patch adds a new cap, PIPE_CAP_QUERY_PIPELINE_STATISTICS_SINGLE, and corresponding query type PIPE_QUERY_PIPELINE_STATISTICS_SINGLE. When calling create_query(), q->index should be set to one of the PIPE_STAT_QUERY_* enums to select a counter. Unlike the block query, this returns the value in pipe_query_result::u64 (as it's a single value) instead of the pipe_query_data_pipeline_statistics group. We update st/mesa to expose ARB_pipeline_statistics_query if either capability is set, preferring the new SINGLE variant when available. Thanks to Roland, Ilia, and Marek for helping me sort this out. Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2019-01-15 11:43:04 -08:00
Kenneth Graunke	f967273fb4	st/mesa: Rearrange PIPE_QUERY_PIPELINE_STATISTICS result fetching. This just changes the order of the switch statements, so we only look at target if the query type is PIPE_QUERY_PIPELINE_STATISTICS. The next commit will introduce a new SINGLE query type which can be used for the same GL query types, and it won't want this processing. Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2019-01-15 11:43:04 -08:00
Kenneth Graunke	e760be08b4	st/mesa: Make an enum for pipeline statistics query result indices. Gallium handles pipeline statistics queries as a single query (PIPE_QUERY_PIPELINE_STATISTICS) which returns a struct with 11 values. Sometimes it's useful to refer to each of those values individually, rather than as a group. To avoid hardcoding numbers, we define a new enum for each value. Here, the name and enum value correspond to the index in the struct pipe_query_data_pipeline_statistics result. Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2019-01-15 11:43:04 -08:00
Dylan Baker	4a131a1330	meson: Add a script to extract the cmd line used for meson Upstream I'm pursuing a more comprehensive solution, but this should prove a suitable stop-gap measure in the meantime. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109325 Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Acked-by: Eric Engestrom <eric@engestrom.ch> Acked-by: Tapani Pälli <tapani.palli@intel.com>	2019-01-15 17:38:47 +00:00
Samuel Pitoiset	7bef192018	radv: add support for VK_EXT_memory_budget A simple Vulkan extension that allows apps to query size and usage of all exposed memory heaps. The different usage values are not really accurate because they are per drm-fd, but they should be close enough. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Alex Smith <asmith@feralinteractive.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-01-15 11:18:37 +01:00
Samuel Pitoiset	9784400a6b	radv: add two small helpers for getting VRAM and visible VRAM sizes Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Alex Smith <asmith@feralinteractive.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-01-15 11:18:35 +01:00
Samuel Pitoiset	a6e5ce5130	radv: remove unnecessary returns in GetPhysicalDevice*Properties() These functions return nothing. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Alex Smith <asmith@feralinteractive.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-01-15 11:18:17 +01:00
Bas Nieuwenhuizen	568e7a2998	radv: Set partial_vs_wave for pipelines with just GS, not tess. Looking at -pro we need to enable it for pipelines with just a GS too. This seems to reduce the hangs from https://bugs.freedesktop.org/show_bug.cgi?id=109242 on a RX 550 to the point where I can't reproduce, after the false start with the wd_switch_on_eop patch due to flakiness. (but people are reporting it does not fix the issue completely for them on polaris 11) CC: <mesa-stable@lists.freedesktop.org> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-01-15 10:22:30 +01:00
Marek Olšák	5183e794af	radeonsi: also apply the GS hang workaround to draws without tessellation ported from AMDVLK. Cc: 18.3 <mesa-stable@lists.freedesktop.org> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-01-14 18:55:58 -05:00
Eric Anholt	bd09bb1629	v3d: SHARED but not necessarily SCANOUT buffers on RO must be linear. We don't have a way to talk to RO about modifiers it can do yet, so assume the minimum.	2019-01-14 15:40:55 -08:00
Eric Anholt	f72820c851	v3d: Add support for CS barrier() intrinsics.	2019-01-14 15:40:55 -08:00
Eric Anholt	9b45b06d7c	v3d: Add support for CS shared variable load/store/atomics. CS shared variables are handled effectively as SSBO access to a temporary buffer that will be allocated at CS dispatch time.	2019-01-14 15:40:55 -08:00
Eric Anholt	01d913cf90	v3d: Add support for CS workgroup/invocation id intrinsics. We get a payload for the ivec3 workgroup and an int local invocation index, and we use the core lowering to turn into the global invocation id and the local invocation id ivec3s.	2019-01-14 15:40:55 -08:00
Eric Anholt	6281f26f06	v3d: Add support for shader_image_load_store. This is only exposed on V3D 4.1+, because we didn't have the TMU write operations for images on 3.3 (To do GLES 3.1 there, you have to lower it to SSBO load/stores, which is a problem to solve later).	2019-01-14 15:40:55 -08:00
Eric Anholt	5932c2f0b9	v3d: Add SSBO/atomic counters support. So far I assume that all the buffers get written. If they weren't, you'd probably be using UBOs instead.	2019-01-14 15:40:55 -08:00
Eric Anholt	6c8edcb89c	v3d: Drop the GLSL version level. This was an arbitrary "we support lots of stuff" value when I started the driver. However, at 400 we expose OES_gpu_shader5, which claims support for dynamically indexing samplers, which the driver doesn't do yet.	2019-01-14 13:18:02 -08:00
Eric Anholt	1a63227ea0	v3d: Add support for matrix inputs to the FS. We've been relying on linking splitting up our varying matrices into separate vectors, but with SSO that doesn't happen. Supporting matrix inputs isn't too hard, though.	2019-01-14 13:18:02 -08:00
Eric Anholt	49b7e26fac	v3d: Add an isr to the simulator to catch GMP violations. Otherwise, the simulator raises the GMP interrupt and waits for it to be handled, and v3d ends up spinning in v3d_hw_tick(). Aborting right when violation happens gives us a chance to look at the backtrace of whatever thread triggered the violation.	2019-01-14 13:18:02 -08:00
Eric Anholt	3790ee07e6	v3d: Fix txf_ms 2D_ARRAY array index. We need to pass the array index through our coordinate transform unchanged. Fixes dEQP-GLES31.functional.texture.multisample.samples_1.*_2d_array	2019-01-14 13:18:02 -08:00
Eric Anholt	619a28b845	v3d: Add support for GL_ARB_framebuffer_no_attachments. Fixes dEQP-GLES31.functional.state_query.integer.max_framebuffer_height_getboolean when GLES3 is enabled.	2019-01-14 13:18:02 -08:00
Eric Anholt	051a41d3d5	v3d: Add support for the early_fragment_tests flag. If this flag hasn't been set by the shader and it has some visible side effects, then we need to disable EZ.	2019-01-14 13:18:02 -08:00
Eric Anholt	b417a9f7b2	v3d: Add support for flushing dirty TMU data at job end. This will be needed for SSBOs and image_load_store.	2019-01-14 13:18:02 -08:00
Samuel Pitoiset	ad6ceb2872	ac: add missing 16-bit types to glsl_base_to_llvm_type() Fix crashes with dEQP-VK.spirv_assembly.instruction.compute.workgroup_memory.*16 Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-01-14 21:18:23 +01:00
Bas Nieuwenhuizen	76b12fa564	radv: Only use 32 KiB per threadgroup on Stoney. Causes hangs on some machines. What works for dEQP-VK.tessellation.shader_input_output.barrier: - running num_patches = 6 (which limits LDS to 32 KiB) - running num_patches = 8, and artificially cutting LDS size at 32 KiB. CC: <mesa-stable@lists.freedesktop.org> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-01-14 19:58:27 +00:00
Marek Olšák	76df5e8f52	st/dri: fix dri2_format_table for argb1555 and rgb565 The bug caused rgb565 framebuffers to use argb1555. Fixes: `433ca3127a` Reviewed-by: Kristian H. Kristensen <hoegsberg@gmail.com> Reviewed-by: Rob Clark <robdclark@gmail.com>	2019-01-14 14:54:19 -05:00
Jason Ekstrand	2d2737dcfe	nir: Add a bool to float32 lowering pass From @jekstrand's nir-1-bit-bool branch, with improved ior/inot lowering. ior: fmax instead of fadd allows removing the fsat. inot: seq(x, 0) can be better than fsub(1, x). On a2xx, it works better with the scalar instruction set. Reviewed-by: Jonathan Marek <jonathan@marek.ca>	2019-01-14 19:27:06 +00:00
Caio Marcelo de Oliveira Filho	09c3ff01df	src/intel: use new hash table and set creation helpers Replace calls to create hash tables and sets that use _mesa_hash_pointer/_mesa_key_pointer_equal with the helpers _mesa_pointer_hash_table_create() and _mesa_pointer_set_create(). Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Acked-by: Eric Engestrom <eric@engestrom.ch>	2019-01-14 10:49:33 -08:00
Caio Marcelo de Oliveira Filho	9fdded0cc3	src/compiler: use new hash table and set creation helpers Replace calls to create hash tables and sets that use _mesa_hash_pointer/_mesa_key_pointer_equal with the helpers _mesa_pointer_hash_table_create() and _mesa_pointer_set_create(). Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Acked-by: Eric Engestrom <eric@engestrom.ch>	2019-01-14 10:49:28 -08:00
Caio Marcelo de Oliveira Filho	ee23e8b17c	util: Helper to create sets and hashes with pointer keys These combinations are common enough and deserve a shortcut. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Acked-by: Eric Engestrom <eric@engestrom.ch>	2019-01-14 10:49:21 -08:00
Samuel Pitoiset	929df7afaf	ac/nir: set cache policy when loading/storing buffer images This was missing. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-01-14 17:59:51 +01:00
Samuel Pitoiset	af2a85df74	ac/nir: add get_cache_policy() helper and use it Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-01-14 17:59:49 +01:00
Jason Ekstrand	5e4f9ea363	anv: Implement VK_KHR_depth_stencil_resolve	2019-01-14 10:16:52 -06:00
Jason Ekstrand	9f44088468	anv: Move resolve_subpass to genX_cmd_buffer.c We may have to do transitions around certain kinds of resolves so it helps to have it genX code.	2019-01-14 10:16:52 -06:00
Jason Ekstrand	930b17161f	anv/blorp: Refactor MSAA resolves into an exportable helper function This function is modeled after the aux_op functions except that it has a lot more parameters because it deals with two images as well as source and destination regions.	2019-01-14 10:16:52 -06:00
Jason Ekstrand	c92c449361	anv: Rename has_resolve to has_color_resolve	2019-01-14 10:16:52 -06:00
Jason Ekstrand	4bd976e3b8	intel/blorp: Add two more filter modes	2019-01-14 10:16:52 -06:00
Andres Gomez	3ec9ab80b8	bin/get-pick-list.sh: fix redirection in sh "&>" is bash specific. Fixes: `e0dbfc9953` ("bin/get-pick-list.sh: warn when commit lists invalid sha") Cc: Juan A. Suarez <jasuarez@igalia.com> Cc: Eric Engestrom <eric.engestrom@intel.com> Cc: Dylan Baker <dylan@pnwbakers.com> Cc: Emil Velikov <emil.velikov@collabora.com> Signed-off-by: Andres Gomez <agomez@igalia.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>	2019-01-14 17:40:15 +02:00
Andres Gomez	716ed41a36	bin/get-pick-list.sh: fix the oneline printing "--summary" will also print extended header information such as creations, renames and mode changes. Let's just use "--no-patch", which suppresses the diff output. v2: Use "--no-patch" instead of the "-s" abbreviation (Eric). Fixes: `559c32d241` ("bin/get-pick-list.sh: simplify git oneline printing") Cc: Juan A. Suarez <jasuarez@igalia.com> Cc: Eric Engestrom <eric.engestrom@intel.com> Cc: Dylan Baker <dylan@pnwbakers.com> Cc: Emil Velikov <emil.velikov@collabora.com> Signed-off-by: Andres Gomez <agomez@igalia.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>	2019-01-14 17:36:56 +02:00
Michel Dänzer	1a20b56798	amd/common: Restore v4i32 suffix for llvm.SI.load.const intrinsic It was accidentally dropped in commit `e4803ab7d2` "amd/common: use llvm.amdgcn.s.buffer.load for LLVM 8.0", breaking the universe with LLVM 7. Trivial.	2019-01-14 12:52:52 +01:00
Nicolai Hähnle	7fbd48fdc0	amd/common/vi+: enable SMEM loads with GLC=1 Only on LLVM 8.0+, which supports the new intrinsic. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-01-14 08:30:15 +01:00
Nicolai Hähnle	e4803ab7d2	amd/common: use llvm.amdgcn.s.buffer.load for LLVM 8.0 llvm.SI.load.const is deprecated. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-01-14 08:30:12 +01:00
Iago Toral Quiroga	1c1ae6376c	anv/pipeline_cache: free NIR shader cache Fixes: `f6aa9f7185` 'anv/pipeline_cache: Add support for caching NIR' Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-01-14 07:59:27 +01:00
Danylo Piliaiev	0862929bf6	glsl: Fix copying function's out to temp if dereferenced by array Function's out variable could be an array dereferenced by an array: func(v[w[i]]); or something more complicated. Copy index in any case. Fixes: `76c27e47b9` ("glsl: Copy function out to temp if we don't directly ref a variable") Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-01-14 12:04:07 +11:00
Kenneth Graunke	04c2f12ab2	i965: Drop mark_surface_used mechanism. The original idea was that the backend compiler could eliminate surfaces, so we would have it mark which ones are actually used, then shrink the binding table accordingly. Unfortunately, it's a pretty blunt mechanism - it can only prune things from the end, not the middle - since we decide the layout before we even start the backend compiler, and only limit the size. It also basically gives up if it sees indirect array access. Besides, we do the vast majority of our surface elimination in NIR anyway, not the backend - and I don't see that trend changing any time soon. Vulkan abandoned this plan a long time ago, and I don't use it in Iris, but it's still been kicking around in i965. I hacked shader-db to print the binding table size in bytes, and observed no changes with this patch. So, this code appears to do nothing useful. Acked-by: Jason Ekstrand <jason@jlekstrand.net>	2019-01-13 09:35:32 -08:00
Eric Engestrom	bdf6a5c1d2	egl: fix python lib deprecation warning DeprecationWarning: the imp module is deprecated in favour of importlib Instead of complicated logic, just import the file directly. Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>	2019-01-13 13:59:08 +00:00
Jason Ekstrand	b938d5fbef	spirv: Emit switch conditions on-the-fly Instead of emitting all of the conditions for the cases of a switch statement up-front, emit them on-the-fly as we emit the code for each case. The original justification for this was that we were going to have to build a default case anyway which would need them all. However, we can just trust CSE to clean up the mess in that case. Emitting each condition right before the if statement that uses it reduces register pressure and, in one customer benchmark, reduces spilling and improves performance by about 2x. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-01-12 17:55:49 -06:00
Jason Ekstrand	821b6861ec	nir/gcm: Support deref instructions Even though no one's been brave enough to ever use this pass, I like to keep it functionally working. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-01-12 17:55:49 -06:00
Jason Ekstrand	24c8108ea6	intel/nir: Call nir_opt_deref in brw_nir_optimize It's an optimization so we should probably be calling it in the optimization loop. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-01-12 17:55:49 -06:00
Jason Ekstrand	e57e26121a	spirv: Contain the GLSLang issue #179 workaround to old GLSLang Instead of applying the workaround universally, detect semi-old GLSLang via the generator ID and only enable the workaround on old GLSLang. This isn't nearly as precise as one would like it to be because the first GLSLang generator id version bump was on October 7, 2017 which is about 1.5 years after the bug was fixed. However, it at least lets us disable it for non-GLSLang and for more modern versions. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-01-12 17:55:49 -06:00
Jason Ekstrand	b57c1ec421	spirv: Whack sampler/image pointers to uniform A long time in a galaxy far far away, there was a GLSLang bug with how it handled samplers passed in as function parameters. (The bug can be found here: https://github.com/KhronosGroup/glslang/issues/179.) Unfortunately, that version was shipped in several apps and has been causing heartburn for our SPIR-V parser ever since. Recent changes to NIR uncovered a moderately old bug in how we work around this issue. In particular, we ended up with a deref_cast from uniform to local which is not a no-op cast so nir_opt_deref wasn't getting rid of the cast. The only reason why it worked before was because someone just happened to call nir_fixup_deref_modes which "fixed" the cast (that shouldn't be happening) and then a later round of copy-prop would get rid of it. The fact that the deref_cast survived that long without causing trouble for other parts of NIR is a bit surprising. Just whacking the mode of the pointer seems to fix it fairly unobtrusively. Currently, only apps with this bug will have a local variable containing an image or sampler. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109304 Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-01-12 17:55:49 -06:00
Kenneth Graunke	2b876bc922	st/nir: Lower TES gl_PatchVerticesIn to a constant if linked with a TCS. If the TCS and TES are linked together, we can simply replace the TES's gl_PatchVerticesIn system value with a constant, possibly allowing extra optimization or letting the driver avoid uploading a special value. Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-01-11 13:07:54 -08:00
Jonathan Marek	3d182601bb	glsl/nir: keep bool types when native_integers=false With the new handling of bool types, the conversion to float in glsl_to_nir should not apply to bool types anymore. Signed-off-by: Jonathan Marek <jonathan@marek.ca> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-01-11 19:16:11 +00:00
Jonathan Marek	b27ad17115	glsl/nir: ftrunc for native_integers=false float to int cast out_type in the default cast case is always GLSL_TYPE_FLOAT, so we get a mov otherwise. Signed-off-by: Jonathan Marek <jonathan@marek.ca> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-01-11 19:16:11 +00:00
Jonathan Marek	d3b47e073e	glsl/nir: int constants as float for native_integers=false All alu instructions emitted with native_integers=false expect float (or bool in some cases) constants, so this change is necessary. This will cause changes with some intrinsics which had integer sources, such as nir_intrinsic_load_uniform. Apparently it might cause issues with some opt passes, but perhaps those don't apply in OpenGL ES 2.0 cases? Signed-off-by: Jonathan Marek <jonathan@marek.ca> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-01-11 19:16:11 +00:00
Jason Ekstrand	1ede463b6e	intel/peephole_ffma: Fix swizzle propagation The num_components value passed into get_mul_for_src is used to only compose the parts of the swizzle that we know will be used so we don't compose invalid swizzle components. However, we had a bug where we passed the number of components of the add all the way through. For the given source, we need the number of components read from that source. In the case where we have a narrow add, say 2 components, that is sourced from a chain of wider instructions, we may not compose all the swizzles. All we really need to do is pass through the right number of components at each level. Fixes: `2231cf0ba3` "nir: Fix output swizzle in get_mul_for_src" Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2019-01-11 10:44:08 -06:00
Kenneth Graunke	ae683ed3bc	nir: Allow a non-existent sampler deref in nir_lower_samplers_as_deref GL_ARB_gl_spirv does not provide a sampler deref for e.g. texelFetch(), so we can't assume that both are present and identical. Simply lower each if it is present. Fixes regressions in GL_ARB_gl_spirv tests since I switched everyone to using this pass. Thanks to Alejandro Piñeiro for catching these. Fixes: `f003859f97` nir: Make gl_nir_lower_samplers use gl_nir_lower_samplers_as_deref Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Tested-by: Alejandro Piñeiro <apinheiro@igalia.com>	2019-01-11 07:54:32 -08:00
Eric Engestrom	e12b0b5c6d	travis: avoid using unset llvm-config Fixes the following errors: usage: which [-as] program ... /Users/travis/.travis/job_stages: line 110: --version: command not found ... caused by the use of an undefined $LLVM_CONFIG Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2019-01-11 14:38:35 +00:00
Eric Engestrom	c8ae891035	egl: remove unused include Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Acked-by: Emil Velikov <emil.velikov@collabora.com>	2019-01-11 14:37:47 +00:00
Eric Engestrom	d75fbff667	egl: add missing includes Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Acked-by: Emil Velikov <emil.velikov@collabora.com>	2019-01-11 14:37:47 +00:00
Iago Toral Quiroga	4b1e436bc9	anv/pipeline_cache: fix incorrect guards for NIR cache Fixes: `f6aa9f7185` 'anv/pipeline_cache: Add support for caching NIR' Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-01-11 12:45:18 +01:00
Kenneth Graunke	ad9832d17b	blorp: Pass the batch to lookup/upload_shader instead of context This will allow drivers to pin shader buffers if necessary. i965 and anv do not need to do this today, but iris will. Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-01-10 20:52:04 -08:00
Kenneth Graunke	084a1cdbb7	blorp: Add blorp_get_surface_address to the driver interface. Currently, BLORP expects drivers to provide two functions for dealing with buffers: blorp_emit_reloc and blorp_surface_reloc. Both record a relocation and combine the BO address and offset into a full 64-bit address. Traditionally, blorp_surface_reloc has written that combined address to an implicitly-known buffer where surface states are stored. (In contrast, blorp_emit_reloc returns the value.) The upcoming Iris driver stores surface states in multiple buffers, which makes it impossible for blorp_surface_reloc to write the combined address - it only takes an offset, not the actual buffer to write to. This commit adds a third function, blorp_get_surface_address, which combines and returns an address, which is then passed to ISL's surface state fill functions. Softpin-only drivers can return a real address here and skip writing it in blorp_surface_reloc. Relocation-based drivers have options. They can simply return 0 from the new function, and continue writing the address from blorp_surface_reloc. Or, they can return a presumed address from blorp_get_surface_address, and have other relocation processing write the real value later. For now, i965 and anv simply return 0. Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-01-10 20:51:53 -08:00
Ilia Mirkin	2165636e9c	docs: fix gallium screen cap docs Make sure that the next line starts with spaces so that bullets are maintained throughout, add `` around a few more special tokens, and fix SAMPLE_COUNT_TEXTURE -> SAMPLE_COUNT. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Dave Airlie <airlied@redhat.com>	2019-01-10 21:44:09 -05:00
Danylo Piliaiev	a2db6b4254	glsl: Make invariant outputs in ES fragment shader not to cause error In all GLSL ES versions output variables in fragment shader are allowed to be invariant. From Section 4.6.1 ("The Invariant Qualifier") GLSL ES 1.00 spec: "Only the following variables may be declared as invariant: ... - Built-in special variables output from the fragment shader." From Section 4.6.1 ("The Invariant Qualifier") GLSL ES 3.00 spec: "Only variables output from a shader can be candidates for invariance." Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107842	2019-01-11 13:01:11 +11:00
Jason Ekstrand	eb4b1477dc	anv/pipeline: Cache the pre-lowered NIR This adds a second level of caching for the pre-lowered NIR that's only based off of the shader module, entrypoint and specialization constants. This is enough for spirv_to_nir as well as our first round of lowering and optimization. Caching at this level should allow for faster shader recompiles due to state changes. The NIR caching does not get serialized to disk via either the VkPipelineCache serialization mechanism or the transparent on-disk cache. We could but it's usually not that expensive to fall back to SPIR-V for the odd cache miss especially if it only happens once for several misses and it simplifies the cache. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-01-10 19:15:27 -06:00
Jason Ekstrand	f6aa9f7185	anv/pipeline_cache: Add support for caching NIR Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-01-10 19:15:27 -06:00
Jason Ekstrand	8dfda5ebbe	anv/pipeline: Hash shader modules and spec constants separately The stuff hashed by anv_pipeline_hash_shader is exactly the inputs to anv_shader_compile_to_nir so it can be used for NIR caching. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-01-10 19:15:27 -06:00
Jason Ekstrand	b90e55a5d5	compiler/types: Serialize/deserialize subpass input types correctly They have glsl_sampler_dim enum values of 8 and 9 which don't work when you & them with 0x7. Fortunately, we have plenty of bits. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-01-10 19:15:27 -06:00
Jason Ekstrand	73ddfbeb85	anv/pipeline: Move wpos and input attachment lowering to lower_nir This lets us make anv_pipeline_compile_to_nir take a device instead of a pipeline. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-01-10 19:15:27 -06:00
Matt Turner	32e266a9a5	i965: Compile fp64 funcs only if we do not have 64-bit hardware support Brown bag fix...	2019-01-10 15:22:17 -08:00
Jason Ekstrand	8ea8727a87	anv/pipeline: Constant fold after apply_pipeline_layout Thanks to the new NIR load_descriptor intrinsic added by the UBO/SSBO lowering series, we weren't getting UBO pushing because the UBO range detection pass couldn't see the constants it needed. This fixes that problem with a quick round of constant folding. Because we're folding we no longer need to go out of our way to generate constants when we lower the vulkan_resource_index intrinsic and we can make it a bit simpler. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2019-01-10 20:34:00 +00:00
Rob Clark	031e94dc72	freedreno/a6xx: fix 3d+tiled layout The last round of fixing 3d layer+level layout skipped the tiled case, since tiled texture support was not in place yet. This finishes the job. Signed-off-by: Rob Clark <robdclark@gmail.com>	2019-01-10 14:21:39 -05:00
Rob Clark	c92c18c70c	freedreno/a6xx: move tile_mode to sampler-view CSO This is known when the CSO is created, so no need to patch it in later. Also, it seems like for smaller textures, where the first level is small enough to be linear, we should set linear tile mode. See: dEQP-GLES3.functional.texture.format.unsized.rgb_unsigned_byte_3d_pot Signed-off-by: Rob Clark <robdclark@gmail.com>	2019-01-10 14:21:39 -05:00
Rob Clark	eb625d30b7	freedreno/a6xx: separate stencil restore/resolve fixes Previously we'd use format/etc from the primary (z32) buffer for the stencil (s8), due to confusion about rsc vs psurf. Rework this to drop extra arg and push down handling of separate stencil case (and make sure we take the fmt from the right place). This doesn't completely fix separate-stencil, but at least it avoids the GPU scribbling over random other cmdstream buffers and causing a bunch of bogus fails in dEQP. Signed-off-by: Rob Clark <robdclark@gmail.com>	2019-01-10 14:21:39 -05:00
Rob Clark	04aff7e42b	freedreno: make cmdstream bo's read-only to GPU If nothing else, this will make problems with cmdstream getting blit over with pixels easier to track down (ie. faults when it first happens rather than strange failures later from corrupted cmdstream when a stateobj is later reused). (NOTE this somewhat depends on the kernel supporting the flag, and the iommu implementation. But the worst case is just that the cmdstream ends up writeable as before.) Signed-off-by: Rob Clark <robdclark@gmail.com>	2019-01-10 14:21:39 -05:00
Guido Günther	286de96af8	etnaviv: fix typo in cflush_all description Signed-off-by: Guido Günther <guido.gunther@puri.sm> Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>	2019-01-10 18:46:10 +01:00
Eric Engestrom	53fbde4df3	radv: remove a few more unnecessary KHR suffixes Cc: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> (v1)	2019-01-10 16:53:44 +00:00
Rhys Perry	0210243923	nir: fix copy-paste error in nir_lower_constant_initializers Fixes: `393b59e077` ('nir: Rework nir_lower_constant_initializers() to handle functions') Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-01-10 10:51:52 -06:00
Andres Gomez	6c3164cd08	docs: complete the calendar and release schedule documentation As suggested by Emil Velikov. Cc: Dylan Baker <dylan.c.baker@intel.com> Cc: Juan A. Suarez <jasuarez@igalia.com> Cc: Emil Velikov <emil.velikov@collabora.com> Signed-off-by: Andres Gomez <agomez@igalia.com> Acked-by: Dylan Baker <dylan@pnwbakers.com> Reviewed-by: Eric Engestrom <eric@engestrom.ch>	2019-01-10 15:53:02 +02:00
Andres Gomez	428164d87f	glsl/linker: specify proper direction in location aliasing error The check for location aliasing was always assuming output variables but this validation is also called for input variables. Fixes: `e2abb75b0e` ("glsl/linker: validate explicit locations for SSO programs") Cc: Iago Toral Quiroga <itoral@igalia.com> Signed-off-by: Andres Gomez <agomez@igalia.com> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2019-01-10 15:51:57 +02:00
Andres Gomez	e2e03f84f9	editorconfig: Add max_line_length property The property is supported by most editors, but not all: https://github.com/editorconfig/editorconfig/wiki/EditorConfig-Properties#max_line_length Cc: Eric Engestrom <eric@engestrom.ch> Cc: Eric Anholt <eric@anholt.net> Signed-off-by: Andres Gomez <agomez@igalia.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2019-01-10 15:50:34 +02:00
Tapani Pälli	864cc419eb	intel/isl: move tiled_memcpy static libs from i965 to isl Patch moves intel_tiled_memcpy[_sse41] libraries to isl, renames some functions and types and makes the required build system changes for meson, automake and Android. No functional changes are introduced. v2: code cleanups, move isl_get_memcpy_type to i965 (Jason) v3: move isl_mem_copy_fn to priv header, cleanups (Jason, Dylan) Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Dylan Baker <dylan@pnwbakers.com> Acked-by: Kenneth Graunke <kenneth@whitecape.org>	2019-01-10 08:02:30 +02:00
Matt Turner	406f603b34	i965: Enable 64-bit GLSL extensions Now that we have software implementations of ARB_gpu_shader_int64 and ARB_gpu_shader_fp64 we can unconditionally enable these extensions. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-01-09 16:42:41 -08:00
Matt Turner	613ac3aaa2	i965: Compile fp64 software routines and lower double-ops Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-01-09 16:42:41 -08:00
Matt Turner	18b4e87370	intel/compiler: Heap-allocate temporary storage Shaders containing software implementations of double-precision operations can be very large such that we cannot stack-allocate an array of grf_count*16. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-01-09 16:42:41 -08:00
Matt Turner	622d429128	intel/compiler: Expand size of the 'nr' field Shaders containing software implementations of double-precision operations can be very large such that we have more than 2^16 virtual registers during optimization. Move the 'nr' field to the union containing the immediate storage and expand it to 32-bits. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-01-09 16:42:41 -08:00
Matt Turner	7e4e9da90d	intel/compiler: Prevent warnings in the following patch The next patch replaces an unsigned bitfield with a plain unsigned, which triggers gcc to begin warning on signed/unsigned comparisons. Keeping this patch separate from the actual move allows bisectability and generates no additional warnings temporarily. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-01-09 16:42:41 -08:00
Matt Turner	2b801b6668	intel/compiler: Rearrange code to avoid future problems A follow on commit will move nr to the same union as the immediate data, so we should assert these invariants before we overwrite the nr field. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-01-09 16:42:41 -08:00
Matt Turner	3b967e1724	intel/compiler: Avoid false positive assertions A follow on patch will move the 'nr' field to the union containing the immediate field, so prepare by checking that we're only testing these assertions if the .file is correct. The assertions with != ARF were kind of silly to begin with because the <128 check is specifically only for things in the GRF. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-01-09 16:42:41 -08:00
Matt Turner	8534742404	intel/compiler: Split 64-bit MOV-indirects if needed Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-01-09 16:42:40 -08:00
Matt Turner	e76772af6c	intel/compiler: Lower 64-bit MOV/SEL operations	2019-01-09 16:42:40 -08:00
Matt Turner	2623653126	nir: Unset metadata debug bit if no progress made NIR metadata validation verifies that the debug bit was unset (by a call to nir_metadata_preserve) if a NIR optimization pass made progress on the shader. With the expectation that the NIR shader consists of only a single main function, it has been safe to call nir_metadata_preserve() iff progress was made. However, most optimization passes calculate progress per-function and then return the union of those calculations. In the case that an optimization pass makes progress only on a subset of the functions in the shader metadata validation will detect the debug bit is still set on any unchanged functions resulting in a failed assertion. This patch offers a quick solution (short of a larger scale refactoring which I do not wish to undertake as part of this series) that simply unsets the debug bit on unchanged functions. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-01-09 16:42:40 -08:00
Matt Turner	e633fae5cb	nir: Add lowering support for 64-bit operations to software Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-01-09 16:42:40 -08:00
Matt Turner	fe2cbcf3ee	nir: Create nir_builder in nir_lower_doubles_impl() We're going to use it more in a future patch, and this avoids a lot of gross code. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-01-09 16:42:40 -08:00
Matt Turner	ecb115eb3f	nir: Add and set info::uses_64bit Will be used to communicate that a shader uses 64-bit operations to the concerned lowering passes. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-01-09 16:42:40 -08:00
Matt Turner	41f3e9e5f5	nir: Implement lowering of 64-bit shift operations Reviewed-by: Elie Tournier <tournier.elie@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-01-09 16:42:40 -08:00
Matt Turner	62d55f1281	nir: Wire up int64 lowering functions Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-01-09 16:42:40 -08:00
Jason Ekstrand	adab27e741	nir: Add some more int64 lowering helpers [mattst88]: Found in an old branch of Jason's. Jason implemented: inot, iand, ior, iadd, isub, ineg, iabs, compare, imin, imax, umin, umax Matt implemented: ixor, bcsel, b2i, i2b, i2i8, i2i16, i2i32, i2i64, u2u8, u2u16, u2u32, u2u64, and fixed ilt Reviewed-by: Elie Tournier <tournier.elie@gmail.com>	2019-01-09 16:42:40 -08:00
Matt Turner	dde73e646f	nir: Tag entrypoint for easy recognition by nir_shader_get_entrypoint() We're going to have multiple functions, so nir_shader_get_entrypoint() needs to do something a little smarter. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-01-09 16:42:40 -08:00
Matt Turner	393b59e077	nir: Rework nir_lower_constant_initializers() to handle functions Previously it assumed that only a single function (the entrypoint) existed and attempted to lower constant initializers of shader outputs for each function, for instance.	2019-01-09 16:42:40 -08:00
Sagar Ghuge	f998ce4111	glsl: Add "built-in" functions to do fp32_to_int64(fp32) Reviewed-by: Elie Tournier <tournier.elie@gmail.com> Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>	2019-01-09 16:42:40 -08:00
Sagar Ghuge	2632c12477	glsl: Add "built-in" functions to do fp32_to_uint64(fp32) Reviewed-by: Elie Tournier <tournier.elie@gmail.com> Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>	2019-01-09 16:42:40 -08:00
Sagar Ghuge	876a4b85fe	glsl: Add "built-in" functions to do fp64_to_int64(fp64) Reviewed-by: Elie Tournier <tournier.elie@gmail.com> Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>	2019-01-09 16:42:40 -08:00
Sagar Ghuge	21e9bb2b3f	glsl: Add utility function to round and pack int64_t value Reviewed-by: Elie Tournier <tournier.elie@gmail.com> Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>	2019-01-09 16:42:40 -08:00
Sagar Ghuge	5a674fd789	glsl: Add "built-in" functions to do fp64_to_uint64(fp64) Reviewed-by: Elie Tournier <tournier.elie@gmail.com> Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>	2019-01-09 16:42:40 -08:00
Sagar Ghuge	5a87441807	glsl: Add utility function to round and pack uint64_t value Reviewed-by: Elie Tournier <tournier.elie@gmail.com> Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>	2019-01-09 16:42:40 -08:00
Sagar Ghuge	c9d333a6b7	glsl: Add "built-in" functions to do int64_to_fp32(int64_t) Reviewed-by: Elie Tournier <tournier.elie@gmail.com> Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>	2019-01-09 16:42:40 -08:00
Sagar Ghuge	d5cf6e92b4	glsl: Add "built-in" functions to do uint64_to_fp32(uint64_t) Reviewed-by: Elie Tournier <tournier.elie@gmail.com> Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>	2019-01-09 16:42:40 -08:00
Sagar Ghuge	b830efb191	glsl: Add "built-in" functions to do int64_to_fp64(int64_t) Reviewed-by: Elie Tournier <tournier.elie@gmail.com> Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>	2019-01-09 16:42:40 -08:00
Sagar Ghuge	7c5b982b89	glsl: Add "built-in" functions to do uint64_to_fp64(uint64_t) Reviewed-by: Elie Tournier <tournier.elie@gmail.com> Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>	2019-01-09 16:42:40 -08:00
Matt Turner	15757bc80b	glsl: Add "built-in" functions to convert bool to double And vice versa. Reviewed-by: Elie Tournier <tournier.elie@gmail.com>	2019-01-09 16:42:40 -08:00
Matt Turner	e213f3871f	glsl: Add "built-in" functions to do ffract(fp64) Reviewed-by: Elie Tournier <tournier.elie@gmail.com>	2019-01-09 16:42:40 -08:00
Matt Turner	5c9a659f50	glsl: Add "built-in" function to do ffloor(fp64) Reviewed-by: Elie Tournier <tournier.elie@gmail.com>	2019-01-09 16:42:40 -08:00
Matt Turner	83762afa66	glsl: Add "built-in" functions to do fmin/fmax(fp64) Reviewed-by: Elie Tournier <tournier.elie@gmail.com>	2019-01-09 16:42:40 -08:00
Matt Turner	92ac2169fb	glsl: Add "built-in" functions to do ffma(fp64) Definitely not actually a fused-multiply add. Reviewed-by: Elie Tournier <tournier.elie@gmail.com>	2019-01-09 16:42:40 -08:00
Elie Tournier	3db81b5d9f	glsl: Add "built-in" functions to do round(fp64) Signed-off-by: Elie Tournier <elie.tournier@collabora.com>	2019-01-09 16:42:40 -08:00
Elie Tournier	48891ab441	glsl: Add "built-in" functions to do trunc(fp64) v2: use mix. Signed-off-by: Elie Tournier <elie.tournier@collabora.com>	2019-01-09 16:42:40 -08:00
Elie Tournier	2119094b1d	glsl: Add "built-in" functions to do sqrt(fp64) Signed-off-by: Elie Tournier <elie.tournier@collabora.com>	2019-01-09 16:42:40 -08:00
Elie Tournier	cad58fc5e7	glsl: Add "built-in" functions to do fp32_to_fp64(fp32) Signed-off-by: Elie Tournier <elie.tournier@collabora.com>	2019-01-09 16:42:40 -08:00
Elie Tournier	407bd1bbf9	glsl: Add "built-in" functions to do fp64_to_fp32(fp64) Signed-off-by: Elie Tournier <elie.tournier@collabora.com>	2019-01-09 16:42:40 -08:00
Elie Tournier	f499942b31	glsl: Add "built-in" functions to do int_to_fp64(int) v2: use mix Signed-off-by: Elie Tournier <elie.tournier@collabora.com>	2019-01-09 16:42:40 -08:00
Elie Tournier	773190f281	glsl: Add "built-in" functions to do fp64_to_int(fp64) v2: use mix Signed-off-by: Elie Tournier <elie.tournier@collabora.com>	2019-01-09 16:42:40 -08:00
Elie Tournier	cbf090b809	glsl: Add "built-in" functions to do uint_to_fp64(uint) Signed-off-by: Elie Tournier <elie.tournier@collabora.com>	2019-01-09 16:42:40 -08:00
Elie Tournier	a3551ee61f	glsl: Add "built-in" functions to do fp64_to_uint(fp64) Signed-off-by: Elie Tournier <elie.tournier@collabora.com>	2019-01-09 16:42:40 -08:00
Elie Tournier	4a93401546	glsl: Add "built-in" functions to do mul(fp64, fp64) v2: use mix Signed-off-by: Elie Tournier <elie.tournier@collabora.com>	2019-01-09 16:42:40 -08:00
Elie Tournier	f111d72596	glsl: Add "built-in" functions to do add(fp64, fp64) v2: use mix and findMSB to optimise. v3: [Sagar] Fix zFrac0 == 0u case in __normalizeRoundAndPackFloat64 Signed-off-by: Elie Tournier <elie.tournier@collabora.com>	2019-01-09 16:42:40 -08:00
Elie Tournier	c036fc97a2	glsl: Add "built-in" functions to do lt(fp64, fp64) Signed-off-by: Elie Tournier <elie.tournier@collabora.com>	2019-01-09 16:42:40 -08:00
Elie Tournier	3e4d5ea7b8	glsl: Add utility function to extract 64-bit sign Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-01-09 16:42:40 -08:00
Elie Tournier	ec6e823a99	glsl: Add "built-in" functions to do eq/ne(fp64, fp64)	2019-01-09 16:42:40 -08:00
Elie Tournier	c802cdde9d	glsl: Add "built-in" function to do sign(fp64) v2: use mix. Signed-off-by: Elie Tournier <elie.tournier@collabora.com>	2019-01-09 16:42:40 -08:00
Elie Tournier	eac66f0248	glsl: Add "built-in" functions to do neg(fp64) v2: use mix. Signed-off-by: Elie Tournier <elie.tournier@collabora.com>	2019-01-09 16:42:40 -08:00
Elie Tournier	0428951b9d	glsl: Add "built-in" function to do abs(fp64) Signed-off-by: Elie Tournier <elie.tournier@collabora.com>	2019-01-09 16:42:40 -08:00
Matt Turner	b63a1f8e40	glsl: Create file to contain software fp64 functions The following patches will add implementations of various double-precision operations to this file. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-01-09 16:42:40 -08:00
Ian Romanick	412472da5c	glsl: Add utility to convert text files to C strings Will be used to convert the .glsl source file containing software fp64 routines to a .h file that can be included while building the compiler. This commit contains two squashed together: the first from Ian adding the utility (with the existing title), and the second from Dylan making the code both python2 and python3 compatible. This is somewhat modeled after the xxd utility that comes with Vim. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> xxd.py: Make python2 and 3 compatible This makes use of unicode_literals, so that undecorated strings are considered text (python2 unicode, python3 str) and not bytes in python2 and text in python3. It makes use of io.open, which provides python2 with python3's open behavior (it's an alias in python3), in particular support for the 't' and 'b' option. Finally, it decorates all of the string literals with the 'b' prefix, so that python interprets them as bytes. I've removed the stdin and stdout options, as python2 always requires these to be bytes, but python3 always treats them as text (there is a way to get at the underlying bytes buffer, but that's even more complexity), and makes the input files required arguments. In the meson we use the '@INPUT@' shorthand instead of listing each input, as meson will expand that to [prog_python, '@INPUT0@', @INPUT1@, ..., @OUTPUT@, ...]	2019-01-09 16:42:40 -08:00
Timothy Arceri	76c27e47b9	glsl: Copy function out to temp if we don't directly ref a variable Otherwise we can end up with IR that looks like this: ( (declare (temporary ) vec4 f@8) (assign (xyzw) (var_ref f@8) (var_ref f) ) (call f16 ((swiz y (var_ref f@8) ))) (assign (xyzw) (var_ref f) (var_ref f@8) ) )) When we really need: (declare (temporary ) float inout_tmp) (assign (x) (var_ref inout_tmp) (swiz y (var_ref f) )) (call f16 ((var_ref inout_tmp) )) (assign (y) (var_ref f) (swiz y (swiz xxxx (var_ref inout_tmp) ))) (declare (temporary ) void void_var) The GLSL IR function inlining code seemed to produce correct code even without this but we need the correct IR for GLSL IR -> NIR to be able to understand what's going on. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-01-09 16:42:40 -08:00
Matt Turner	63f6d7afd6	glsl: Add function support to glsl_to_nir Based on a patch from Tim Arceri, but I had to substantially rewrite it as a result of the NIR derefs rework. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-01-09 16:42:40 -08:00
Francisco Jerez	230a8a541d	intel/fs: Remove FS_OPCODE_UNPACK_HALF_2x16_SPLIT opcodes. These are broken on a future platform, but it turns out we don't need to fix them, since they're just type-converting moves with strided source. Kill them. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2019-01-09 12:03:09 -08:00
Francisco Jerez	cbea91eb57	intel/fs: Remove nasty open-coded CHV/BXT 64-bit workarounds. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2019-01-09 12:03:09 -08:00
Francisco Jerez	2c99c7a56c	intel/fs: Remove existing lower_conversions pass. It's redundant with the functionality provided by lower_regioning now. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2019-01-09 12:03:09 -08:00
Francisco Jerez	efa4e4bc5f	intel/fs: Introduce regioning lowering pass. This legalization pass is meant to handle situations where the source or destination regioning controls of an instruction are unsupported by the hardware and need to be lowered away into separate instructions. This should be more reliable and future-proof than the current approach of handling CHV/BXT restrictions manually all over the visitor. The same mechanism is leveraged to lower unsupported type conversions easily, which obsoletes the lower_conversions pass. v2: Give conditional modifiers the same treatment as predicates for SEL instructions in lower_dst_modifiers() (Iago). Special-case a couple of other instructions with inconsistent conditional mod semantics in lower_dst_modifiers() (Curro). Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2019-01-09 12:03:09 -08:00
Francisco Jerez	b94519971a	intel/fs: Constify fs_inst::can_do_source_mods(). Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2019-01-09 12:03:09 -08:00
Francisco Jerez	c301f447ea	intel/fs: Respect CHV/BXT regioning restrictions in copy propagation pass. Currently the visitor attempts to enforce the regioning restrictions that apply to double-precision instructions on CHV/BXT at NIR-to-i965 translation time. It is possible though for the copy propagation pass to violate this restriction if a strided move is propagated into one of the affected instructions. I've only reproduced this issue on a future platform but it could affect CHV/BXT too under the right conditions. Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2019-01-09 12:03:08 -08:00
Francisco Jerez	464e79144f	intel/eu/gen7: Fix brw_MOV() with DF destination and strided source. I triggered this bug while prototyping code for a future platform on IVB. Could be a problem today though if a strided move is copy-propagated into a type-converting move with DF destination. Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2019-01-09 12:03:08 -08:00
Francisco Jerez	bc781a0323	intel/fs: Fix bug in lower_simd_width while splitting an instruction which was already split. This seems to be a problem in combination with the lower_regioning pass introduced by a future commit, which can modify a SIMD-split instruction causing its execution size to become illegal again. A subsequent call to lower_simd_width() would hit this bug on a future platform. Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2019-01-09 12:03:08 -08:00
Francisco Jerez	812ede088f	intel/fs: Implement quad swizzles on ICL+. Align16 is no longer a thing, so a new implementation is provided using Align1 instead. Not all possible swizzles can be represented as a single Align1 region, but some fast paths are provided for frequently used swizzles that can be represented efficiently in Align1 mode. Fixes ~90 subgroup quad swap Vulkan CTS tests. Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2019-01-09 12:03:08 -08:00
Francisco Jerez	c5f9c0009d	intel/fs: Handle source modifiers in lower_integer_multiplication(). lower_integer_multiplication() implements 32x32-bit multiplication on some platforms by bit-casting one of the 32-bit sources into two 16-bit unsigned integer portions. This can give incorrect results if the original instruction specified a source modifier. Fix it by emitting an additional MOV instruction implementing the source modifiers where necessary. Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2019-01-09 12:03:08 -08:00
Andrii Simiklit	0206ffc28d	anv/pipeline: remove unnecessary null-pointer check It looks like the 'last' variable can never be null, because at least get_vs_prog_data shouldn't return a null pointer. So this check has been unnecessary since commit: `99d497c5b6` "anv/pipeline: Replace get_fs_input_map with ..." This small issue was found by cppcheck. Signed-off-by: Andrii Simiklit <andrii.simiklit@globallogic.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-01-09 12:29:12 -06:00
Indrajit Das	d2c170eb35	st/va: Return correct status from vlVaQuerySurfaceStatus This ensures that during encoding, applications can get the correct status of the surface before submitting more operations on the same. Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Indrajit Das <indrajit-kumar.das@amd.com>	2019-01-09 11:34:22 -05:00
Roland Scheidegger	0c226d40ef	Revert "llvmpipe: Always return some fence in flush (v2)" This reverts commit `f6a6da8131`. With this commit we see massive amounts of asserts triggering in lp_fence_wait(), assert(f->issued), for instance with libgl_xlib state tracker and piglit. Not entirely sure if the assert could just be removed.	2019-01-09 17:28:53 +01:00
Marek Olšák	e986c1ca1d	st/mesa: don't leak pipe_surface if pipe_context is not current We have found some pipe_surface leaks internally. This is the same code as surface_destroy in radeonsi. Ideally, surface_destroy would be in pipe_screen. Cc: 18.3 <mesa-stable@lists.freedesktop.org> Reviewed-by: Brian Paul <brianp@vmware.com>	2019-01-09 11:08:44 -05:00
Marek Olšák	fd82a1d1d6	st/mesa: don't reference pipe_surface locally in PBO code Reviewed-by: Brian Paul <brianp@vmware.com>	2019-01-09 11:08:44 -05:00
Marek Olšák	5da442338b	st/mesa: unify window-system renderbuffer initialization Reviewed-by: Brian Paul <brianp@vmware.com>	2019-01-09 11:08:44 -05:00
Mario Kleiner	5e30e54e05	radeonsi: Fix use of 1- or 2- component GL_DOUBLE vbo's. With Mesa 18.1, commit `be973ed21f`, si_llvm_load_input_vs() changed the number of source 32-bit wide dword components used for fetching vertex attributes into the vertex shader from a constant 4 to a variable num_channels number, depending on input data format, with some special case handling for input data formats like 64-Bit doubles. In the case of a GL_DOUBLE input data format with one or two components though, e.g, submitted via ... a) glTexCoordPointer(1, GL_DOUBLE, 0, buffer); b) glTexCoordPointer(2, GL_DOUBLE, 0, buffer); ... the input format would be SI_FIX_FETCH_RG_64_FLOAT, but no special case handling was implemented for that case, so in the default path the number of 32-bit dwords would be set to the number of float input components derived from info->input_usage_mask. This ends with corrupted input to the vertex shader, because fetching a 64-bit double from the vbo requires fetching two 32-bit dwords instead of 1, and fetching a two double input requires 4 dword fetches instead of 2, so in these cases the vertex shader receives incomplete/truncated input data: a) float v = gl_MultiTexCoord0.x; -> v.x is corrupted. b) vec2 v = gl_MultiTexCoord0.xy; -> v.x is assigned correctly, but v.y is corrupted. This happens with the standard TGSI IR compiled shaders. Under NIR with R600_DEBUG=nir, we got correct behavior because the current radeonsi nir code always assigns info->input_usage_mask = TGSI_WRITEMASK_XYZW, thereby always fetches 4 dwords regardless of what the shader actually needs. Fix this by properly assigning 2 or 4 dword fetches for one or two component GL_DOUBLE input. Fixes: `be973ed21f` ("radeonsi: load the right number of components for VS inputs and TBOs") Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com> Cc: mesa-stable@lists.freedesktop.org Cc: Marek Olšák <marek.olsak@amd.com> Signed-off-by: Marek Olšák <marek.olsak@amd.com>	2019-01-09 11:08:44 -05:00
Rhys Perry	ee8488ea3b	ac/nir,radv,radeonsi/nir: use correct indices for interpolation intrinsics Fixes artifacts in World of Warcraft when Multi-sample Alpha-Test is enabled with DXVK. It also fixes artifacts with Fallout 4's god rays with DXVK. Various piglit interpolateAt*() tests under NIR are also fixed. v2: formatting fix update commit message to include Fallout 4 and the Fixes tag Fixes: `f4e499ec79` ('radv: add initial non-conformant radv vulkan driver') Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106595 Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>	2019-01-09 14:57:07 +00:00
Samuel Pitoiset	b8c4f523b4	radv: skip draws with instance_count == 0 Loosely based on RadeonSI. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-01-09 14:22:38 +01:00
Samuel Pitoiset	a2b5cc3c39	radv: enable variable pointers The Vulkan spec 1.1.97 says: "variablePointers specifies whether the implementation supports the SPIR-V VariablePointers capability. When this feature is not enabled, shader modules must not declare the VariablePointers capability." As the SPIR-V feature is enabled, we should turn on the extension feature as well. All dEQP-VK.spirv_assembly.instruction.compute.variable_pointers.* pass with the khronos internal repo. Note that a bunch of them fail with the public repo, but it's expected as they violate the specification. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-01-09 12:32:18 +01:00
Samuel Pitoiset	d58b11e709	radv: get rid of bunch of KHR suffixes Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Acked-by: Eric Engestrom <eric.engestrom@intel.com> Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-01-09 12:26:48 +01:00
Maya Rashish	a2ddb710fd	radeon: fix printf format specifier. From glibc printf(3): Z A nonstandard synonym for z that predates the appearance of z. Do not use in new code. Z may not exist on non-glibc systems. Prefer the standard symbol. Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-01-09 14:15:06 +11:00
Tomasz Figa	f6a6da8131	llvmpipe: Always return some fence in flush (v2) If there is no last fence, due to no rendering happening yet, just create a new signaled fence and return it, to match the expectations of the EGL sync fence API. Fixes random "Could not create sync fence 0x3003" assertion failures from Skia on Android, coming from the following code: https://android.googlesource.com/platform/frameworks/base/+/master/libs/hwui/pipeline/skia/SkiaOpenGLPipeline.cpp#427 Reproducible especially with thread count >= 4. One could make the driver always keep the reference to the last fence, but: - the driver seems to explicitly destroy the fence whenever a rendering pass completes and changing that would require a significant functional change to the code. (Specifically, in lp_scene_end_rasterization().) - it still wouldn't solve the problem of an EGL sync fence being created and waited on without any rendering happening at all, which is also likely to happen with Android code pointed to in the commit. Therefore, the simple approach of always creating a fence is taken, similarly to other drivers, such as radeonsi. Tested with piglit llvmpipe suite with no regressions and following tests fixed: egl_khr_fence_sync conformance eglclientwaitsynckhr_flag_sync_flush eglclientwaitsynckhr_nonzero_timeout eglclientwaitsynckhr_zero_timeout eglcreatesynckhr_default_attributes eglgetsyncattribkhr_invalid_attrib eglgetsyncattribkhr_sync_status v2: - remove the useless lp_fence_reference() dance (Nicolai), - explain why creating the dummy fence is the right approach. Signed-off-by: Tomasz Figa <tfiga@chromium.org>	2019-01-09 02:06:13 +01:00
Eric Anholt	700aeaf9c8	glsl: Fix buffer overflow with an atomic buffer binding out of range. The binding is checked against the limits later in the function, so we need to make sure we don't overflow before the check here. Fixes this valgrind warning (and sometimes segfault): ==1460== Invalid write of size 4 ==1460== at 0x74C98DD: ast_declarator_list::hir(exec_list, _mesa_glsl_parse_state) (ast_to_hir.cpp:4943) ==1460== by 0x74C054F: _mesa_ast_to_hir(exec_list, _mesa_glsl_parse_state) (ast_to_hir.cpp:159) ==1460== by 0x7435C12: _mesa_glsl_compile_shader (glsl_parser_extras.cpp:2130) in dEQP-GLES31.functional.debug.negative_coverage.get_error.compute. exceed_atomic_counters_limit Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2019-01-08 15:44:58 -08:00
Eric Anholt	211b826790	nir: Make nir_deref_instr_build/get_const_offset actually use size_align. I think this was a copy-and-paste mistake -- nir_opt_large_constants was passing in glsl_get_natural_size_align_bytes() given brw_nir.c's arguments to the opt pass. I wanted to reuse this function for handling constant offsets of arrays of images in V3D. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Rob Clark <robdclark@gmail.com>	2019-01-08 15:40:53 -08:00
Danylo Piliaiev	9f29d90327	glsl/linker: Fix unmatched TCS outputs being reduced to local variable Always match TCS outputs since they are shared by all invocations within the patch and should not be converted to local variables. This is one of the issues found in Downward. Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104297	2019-01-09 10:31:13 +11:00
Eric Anholt	db3b6b6bca	v3d: Enable GL_ARB_texture_gather on V3D 4.x. This is part of GLES 3.1, and with the NIR lowering we're now passing the GLES31 testcases.	2019-01-08 13:03:44 -08:00
Eric Anholt	6051c11d17	nir: Add nir_lower_tex support for Broadcom's swizzled TG4 results. V3D returns the texels in a different order in the resulting vec4 from what GLSL wants, so we need to put in a swizzle. Fixes dEQP-GLES31.functional.texture.gather.basic.2d.rgba8.base_level.level_1 Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-01-08 13:03:41 -08:00
Bas Nieuwenhuizen	3fcec4a550	freedreno: Move register constant files to src/freedreno. This way they can be shared. Build tested with meson, but not too sure on the autotools stuff though. Reviewed-by: Dylan Baker <dylan@pnwbakers.com> Acked-by: Rob Clark <robdclark@gmail.com>	2019-01-08 21:46:14 +01:00
Caio Marcelo de Oliveira Filho	baabfb1959	nir: fix warning in nir_lower_io.c Initialize the variable with NULL. Fixes the following In file included from ../src/compiler/nir/nir_lower_io.c:34: ../src/compiler/nir/nir_lower_io.c: In function ‘nir_lower_explicit_io’: ../src/compiler/nir/nir.h:668:11: warning: ‘addr’ may be used uninitialized in this function [-Wmaybe-uninitialized] return src; ^~~ ../src/compiler/nir/nir_lower_io.c:735:17: note: ‘addr’ was declared here nir_ssa_def *addr; ^~~~ v2: Avoid using a 'default' case so we get help from the compiler when new deref types are added. (Lionel) Acked-by: Jason Ekstrand <jason@jlekstrand.net>	2019-01-08 12:29:56 -08:00
Chia-I Wu	3cb65cf8aa	freedreno/drm: sync uapi again "pad" was missing in Mesa's msm_drm.h. sizeof(drm_msm_gem_info) remains the same, but now the compiler initializes the field to zero. Buffer allocation results in EINVAL without this for me. Cc: Rob Clark <robdclark@gmail.com> Cc: Kristian Høgsberg <hoegsberg@gmail.com> Signed-off-by: Chia-I Wu <olvaffe@gmail.com> Reviewed-by: Kristian H. Kristensen <hoegsberg@gmail.com>	2019-01-08 19:55:28 +00:00
Chia-I Wu	6eeb1fe491	meson: fix EGL/X11 build without GLX dep_xcb and others were not set under this configuration. Signed-off-by: Chia-I Wu <olvaffe@gmail.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2019-01-08 10:58:48 -08:00
Eric Engestrom	b38a48a569	wsi: drop unneeded KHR suffix Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Acked-by: Jason Ekstrand <jason@jlekstrand.net>	2019-01-08 18:48:03 +00:00
Eric Engestrom	4f5a526789	anv: drop unneeded KHR suffix Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Acked-by: Jason Ekstrand <jason@jlekstrand.net>	2019-01-08 18:47:56 +00:00
Karol Herbst	d0c6ef2793	nir: rename global/local to private/function memory The naming is a bit confusing no matter how you look at it. Within SPIR-V "global" memory is memory accessible from all threads. glsl "global" memory normally refers to shader thread private memory declared at global scope. As we already use "shared" for memory shared across all threads of a work group, the solution everybody could be happy with is to rename "global" to "private" and use "global" later for memory usually stored within system accessible memory (be it VRAM or system RAM if keeping SVM in mind). glsl "local" memory is memory only accessible within a function, while SPIR-V "local" memory is memory accessible within the same workgroup. v2: rename local to function as well v3: rename vtn_variable_mode_local as well Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-01-08 18:51:46 +01:00
Dylan Baker	401dca1c73	autotools: Remove tegra vdpau driver This has never functioned and probably won't ever function, due to the way gallium media state trackers are architected and the tegra video decoder is architected. Cc: Thierry Reding <thierry.reding@gmail.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu> Fixes: `1755f608f5` ("tegra: Initial support")	2019-01-08 09:42:56 -08:00
Pierre Moreau	ba55cb2bcd	clover/meson: Ignore 'svn' suffix when computing CLANG_RESOURCE_DIR The version exported by LLVM in its CMake configuration files can include the “svn” suffix when building a development version (for example “8.0.0svn”). However the exported clang headers are still found under “lib/clang/8.0.0/”, without the “svn” suffix. Meson takes care of removing the “svn” suffix from the version when using the dependency’s `version()` method. This processing is already performed in “configure.ac” when using autotools. Signed-off-by: Pierre Moreau <pierre.morrow@free.fr> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2019-01-08 08:53:38 -08:00
Lionel Landwerlin	add5a2ec92	anv: flush fast clear colors into compressed surfaces In the following scenario : 1. Create image format R8G8B8A8_UNORM 2. Create image view format R8G8B8A8_SRGB 3. Clear the view through a sub pass to a particular color 4. Barrier on the image from color attachment to source transfer 5. Copy the image into a linear buffer to check the content The step 4 resolving the clear color is unaware of the SRGB format of the view, because the blorp resolve operations operate on images the color associated with the resolve will not operate on SRGB format but UNORM. Leading to the wrong color being written into surfaces. This change forces a clear color resolve at the end of the render pass so following resolves won't have to deal with the clear color with a format that doesn't match the image's format. On gfxbench vulkan_5_normal 1280x720, this appears to cost us ~0.5fps, from 49.316 down to 48.949. v2: Only fast clear resolve when image & view have different formats (Lionel) v3: Update warning (Jason) Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108911 Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Suggested-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Cc: mesa-stable@lists.freedesktop.org	2019-01-08 16:37:00 +00:00
Lionel Landwerlin	366eb656ac	anv: explicitly specify format for blorp ccs/mcs op Resolve operations can happen when dealing with view (begin/end subpasses) in which case the view's format needs to apply, not the image's format. v2: Relayout arguments of a ccs_op() call (Jason) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Suggested-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108911 Cc: mesa-stable@lists.freedesktop.org	2019-01-08 16:36:56 +00:00
Tapani Pälli	c292414765	dri3: initialize adaptive_sync as false before configQueryb Fixes following errors from valgrind output: ==23388== Conditional jump or move depends on uninitialised value(s) ==23388== at 0x48B4924: loader_dri3_drawable_init (loader_dri3_helper.c:381) ==23388== by 0x48A97D2: dri3_create_drawable (dri3_glx.c:386) ==23388== by 0x489E190: driFetchDrawable (dri_common.c:369) ==23388== by 0x48A9187: dri3_bind_context (dri3_glx.c:195) ==23388== by 0x488B75C: MakeContextCurrent (glxcurrent.c:220) ==23388== by 0x488B8DB: glXMakeCurrent (glxcurrent.c:267) ==23388== by 0x10A987: ??? (in /usr/bin/glxgears) ==23388== by 0x4BEB412: (below main) (in /usr/lib64/libc-2.28.so) ==23388== ==23388== Conditional jump or move depends on uninitialised value(s) ==23388== at 0x48B5A40: loader_dri3_swap_buffers_msc (loader_dri3_helper.c:923) ==23388== by 0x48A9B7E: dri3_swap_buffers (dri3_glx.c:587) ==23388== by 0x4887A81: glXSwapBuffers (glxcmds.c:857) ==23388== by 0x10ADED: ??? (in /usr/bin/glxgears) ==23388== by 0x4BEB412: (below main) (in /usr/lib64/libc-2.28.so) Fixes: `2e12fe425f` "loader/dri3: Enable adaptive_sync via _VARIABLE_REFRESH property" Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Michel Dänzer <michel.daenzer@amd.com> Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>	2019-01-08 08:15:07 +02:00
Dave Airlie	4298a85ae8	virgl: use primconvert provoking vertex properly This stores the raster state and calls the correct primconvert interface using the currently bound raster state. Reviewed-By: Gert Wollny <gert.wollny@collabora.com> Signed-off-by: Dave Airlie <airlied@redhat.com>	2019-01-08 12:06:41 +10:00
Jason Ekstrand	754eff07d2	anv: Sort properties and features switch statements Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-01-07 18:41:15 -06:00
Jason Ekstrand	05d72d6d48	spirv: Sort supported capabilities Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-01-07 18:41:15 -06:00
Jason Ekstrand	34af63fa22	anv: Enable the new deref-based UBO/SSBO path Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-01-08 00:38:30 +00:00
Jason Ekstrand	63b9aa2e25	spirv: Add support for using derefs for UBO/SSBO access For now, it's hidden behind a cap. Hopefully, we can eventually drop that along with all the manual offset code in spirv_to_nir. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-01-08 00:38:30 +00:00
Jason Ekstrand	3a7c5667c8	spirv: Make better use of vtn_pointer_uses_ssa_offset The choice of whether or not we should use block_load/store isn't a choice between external and not so much as a choice between deref instructions and manually calculated offsets. In vtn_pointer_from_ssa, we guard the index+offset case behind vtn_pointer_uses_ssa_offset and then branch out from there. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-01-08 00:38:30 +00:00
Jason Ekstrand	adc155a815	spirv: Add explicit pointer types Instead of baking in uvec2 for UBO and SSBO pointers and uint for push constant and shared memory pointers, make it configurable. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-01-08 00:38:30 +00:00
Jason Ekstrand	be039cb467	spirv: Choose atomic deref type with pointer_uses_ssa_offset Previously, we hard-coded the rule about workgroup variables and the builder lower_workgroup_access_to_offsets flag. Instead base it on the handy helper we have for exactly this sort of thing. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-01-08 00:38:30 +00:00
Jason Ekstrand	5c3cb9c3ce	spirv: Add error checking for Block and BufferBlock decorations Variable pointers being well-defined across the block boundary requires a couple of very specific SPIR-V validation rules. Normally, we'd trust the validator to catch these but since CTS tests have been found in the wild which violate them, we'll carry our own checks. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-01-08 00:38:30 +00:00
Jason Ekstrand	e90b738f20	nir/vulkan: Add a descriptor type to vulkan resource intrinsics Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-01-08 00:38:30 +00:00
Jason Ekstrand	f393b10b3f	nir/lower_io: Add "explicit" IO lowering This new pass is for lowering explicitly laid out memory coming in from SPIR-V or a similar source. It's quite a bit more complicated than the normal lower_io because we have to be able to handle matrices. The way the stride information is stored for matrices is awkward and dealing with row-major matrices is especially painful. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-01-08 00:38:30 +00:00
Jason Ekstrand	52dd43c7ef	nir/validate: Allow array derefs on vectors in more modes Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-01-08 00:38:30 +00:00
Jason Ekstrand	013ee5732b	nir/intrinsics: Add access flags to load/store_deref Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-01-08 00:38:30 +00:00
Jason Ekstrand	7755171e4c	nir/intrinsics: Allow deref sources to consume anything This commit adds a new num_components value for intrinsic sources of -1 which means that it consumes everything and the number of components effectively isn't validated. This is useful for deref sources which just take the result of the deref and we leave it up to the driver to decide what that size should be. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-01-08 00:38:30 +00:00
Jason Ekstrand	d0fe52a456	nir/validate: Allow derefs in phi nodes We added this assert when first moving derefs over to instructions to ensure that deref chains could go all the way back to the variables. Now that we're going to start using derefs for things that we can do variable pointers on such as UBOs and SSBOs, we need to be able to run derefs through phi nodes, selects, and basically anything else. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-01-08 00:38:30 +00:00
Jason Ekstrand	7e85480a67	nir/remove_dead_variables: Properly handle deref casts We already detect any incomplete deref chains (where the deref is used for something other than another deref or a load/store) and flag the variable as used thanks to deref_used_for_not_store. All that's left to do is to properly skip casts when cleaning up. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-01-08 00:38:30 +00:00
Jason Ekstrand	78d80f7db2	nir/deref: Skip over casts in fixup_deref_modes This pass is used when, for instance, we lazily change the mode of variables rather than replacing the variable with a new one. Since we only do this in cases where we know we have full deref chains, it's ok to just skip them in fixup_deref_modes. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-01-08 00:38:30 +00:00
Jason Ekstrand	d8e3edb784	nir/deref: Support casts and ptr_as_array in comparisons The code which constructs deref paths already gives you the path starting at the nearest deref_cast or deref_var. All we need to do for casts is handle the case where the start of the path isn't a deref_var. For ptr_as_array derefs, we just bail if we have any after the divergence point between the two derefs. We may be able to do better in the future but this works for now. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-01-08 00:38:30 +00:00
Jason Ekstrand	a1c688517d	nir/opt_deref: Properly optimize ptr_as_array derefs When handling casts, we can't blindly propagate the parent of a cast into a ptr_as_array deref because doing so might lose the stride information from the cast. Instead, before we can propagate into ptr_as_array derefs, we need to check that the cast is a cast of an array deref and that the stride matches. For other types of derefs, we can continue to propagate casts as normal because they don't need the stride. We also add an optimization which can combine a ptr_as_array deref with its parent if it is also an array deref of some form. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-01-08 00:38:30 +00:00
Jason Ekstrand	427558a717	nir/validate: Don't allow derefs in if conditions Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-01-08 00:38:30 +00:00
Jason Ekstrand	e94a027af8	nir: Add a ptr_as_array deref type These correspond directly to SPIR-V's OpPtrAccessChain. As such, they treat whatever their parent gives them as if it's the first element in some array and dereferences that array. If the parent is, itself, an array deref, then the two indices can just be added together to get the final array deref. However, it can also be used in cases where what you have is a dereference to some random vec2 value somewhere. In this case, we require a cast before the ptr_as_array and use the ptr_stride field in the cast to provide a stride for the ptr_as_array derefs. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-01-08 00:38:30 +00:00
Jason Ekstrand	fc9c4f89b8	nir: Move propagation of cast derefs to a new nir_opt_deref pass We're going to want to do more deref optimizations going forward and this gives us a central place to do them. Also, cast propagation will get a bit more complicated with the addition of ptr_as_array derefs. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-01-08 00:38:30 +00:00
Jason Ekstrand	bf1a1eed88	spirv: Propagate layout decorations to created glsl_types Instead of just storing the decorations in the vtn_type, propagate them all the way through to the glsl_type. For array strides, this means we need to handle them earlier so we break array stride handling into its own function and explicitly call it for both pointer and array types. Due to type deduplication in the SPIR-V, we may have explicit layout decorations on all sorts of types that don't actually want them. In order to prevent these leaking into unfortunate places in NIR, we explicitly strip them off before creating NIR variables and when casting pointers to non-external memory. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>	2019-01-08 00:38:30 +00:00
Jason Ekstrand	6cebeb4f71	glsl_type: Add support for explicitly laid out matrices and arrays SPIR-V allows for matrix and array types to be decorated with explicit byte stride decorations and matrix types to be decorated row- or column-major. This commit adds support to glsl_type to encode this information. Because this doesn't work nicely with std430 and std140 alignments, we add asserts to ensure that we don't use any of the std430 or std140 layout functions with explicitly laid out types. In SPIR-V, the layout information for matrices is applied to the parent struct member instead of to the matrix type itself. However, this gets rather clumsy when you're walking derefs trying to compute offsets because, the moment you hit a matrix, you have to crawl back the deref chain and find the struct. Instead, we take the same path here as we've taken in spirv_to_nir and put the decorations on the matrix type itself. This also subtly adds support for strided vector types. These don't come up in SPIR-V directly but you can get one as the result of taking a column from a row-major matrix or a row from a column-major matrix. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>	2019-01-08 00:38:29 +00:00
Jason Ekstrand	7f70b3e555	glsl_type: Simplify glsl_channel_type This is C++ so we can just poke at the fields of glsl_type if we wish and calling get_instance is way easier and more reliable than handling each instance separately. While we're at it, we re-arrange the base type labels to match the enum order and add 8-bit type support. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-01-08 00:38:29 +00:00
Jason Ekstrand	d8a11bfc08	glsl_type: Add a C wrapper to get struct field offsets Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-01-08 00:38:29 +00:00
Jason Ekstrand	d34f19feba	glsl_type: Drop the glsl_get_array_instance C helper It was added in `bce6f99875` even though it's completely redundant with glsl_array_type(). Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-01-08 00:38:29 +00:00
Jason Ekstrand	a700a82bda	nir: Distinguish between normal uniforms and UBOs Previously, NIR had a single nir_var_uniform mode used for atomic counters, UBOs, samplers, images, and normal uniforms. This commit splits this into nir_var_uniform and nir_var_ubo where nir_var_uniform is still a bit of a catch-all but the nir_var_ubo is specific to UBOs. While we're at it, we also rename shader_storage to ssbo to follow the convention. We need this so that we can distinguish between normal uniforms and UBO access at the deref level without going all the way back to the variable and seeing if it has an interface type. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-01-08 00:38:29 +00:00
Jason Ekstrand	c9a4135e14	nir: Allow storing to shader_storage I have no idea how shader_storage made it into the list of banned variable modes for stores but it clearly should be allowed. This only doesn't cause us a problem today because we never actually use derefs on shader_storage variables. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-01-08 00:38:29 +00:00
Jason Ekstrand	cd93b0a670	nir/validate: Require array indices to match the deref bit size This doesn't currently change anything because array indices are required to be 32 bits and all derefs are also 32 bits. However, we will one day have 64-bit derefs for OpenCL. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-01-08 00:38:29 +00:00
Jason Ekstrand	abfe674c54	spirv: Handle arbitrary bit sizes for deref array indices We already had code in link_as_ssa to handle bit sizes; we just need to use it. While we're at it we clean up link_as_ssa a bit and add an explicit bit_size parameter in preparation for a day when we have derefs that aren't 32 bit. Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-01-08 00:38:29 +00:00
Jason Ekstrand	bfe31c5e46	nir/builder: Add nir_i2i and nir_u2u helpers which take a bit size Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-01-08 00:38:29 +00:00
Jason Ekstrand	639c236e74	spirv: Emit NIR deref instructions on-the-fly This simplifies our deref handling by emitting the actual NIR deref instructions on-the-fly instead of building up a deref chain and then emitting them at the last moment. In order for this to work with the parts of the compiler that assume they can chase deref chains, we have to run nir_rematerialize_derefs_in_use_blocks_impl to put the derefs back in the right places. Otherwise, in cases such as loop continues where the SPIR-V blocks are not in the same order as the NIR blocks, we may end up with a deref chain with a parent that does not dominate its child and nir_repair_ssa_impl will insert phis in the deref chain. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-01-08 00:38:29 +00:00
Jason Ekstrand	c59f07684c	spirv: Sign-extend array indices The SPIR-V spec was recently updated to clarify that array indices are treated as signed integers. Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-01-08 00:38:29 +00:00
Jason Ekstrand	f8992eb5ba	anv/apply_pipeline_layout: Set the cursor in lower_res_reindex_intrinsic The loop through instructions doesn't set the cursor for us so unless we set it somewhere, we may end up emitting instructions in the wrong place. The only reason why we haven't been bitten by this in the past is that it only happens in a few variable pointers cases and the CTS tests for those don't use much control flow so things were getting emitted in the correct order by accident. Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-01-08 00:38:29 +00:00
Jason Ekstrand	42b2f3e91f	spirv: Handle any bit size in vector_insert/extract This crops up both in the actual SPIR-V VectorInsert/Extract opcodes as well as various places where we deal with vector derefs. Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-01-08 00:38:29 +00:00
Jason Ekstrand	a392ddb781	glsl_type: Support serializing 8 and 16-bit types Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-01-08 00:38:29 +00:00
Bas Nieuwenhuizen	70ed049cc6	spirv: Fix matrix parameters in function calls. They can be handled exactly the same as arrays, we just need to handle the base type correctly in the switches. Fixes: `a45b6fb452` "spirv: Pass SSA values through functions" Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109204 Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-01-08 01:30:03 +01:00
Bas Nieuwenhuizen	3cc940277a	radv: Fix rasterization precision bits. Note that these limits are exact, not a "precision is at least x", as texel coords also get snapped to a multiple of this step size before filtering. This fixes CTS tests dEQP-VK.texture.explicit_lod.2d.sizes.31x55_nearest_linear_mipmap_nearest_repeat dEQP-VK.texture.explicit_lod.2d.sizes.57x35_nearest_linear_mipmap_nearest_repeat Fixes: `f4e499ec79` "radv: add initial non-conformant radv vulkan driver" Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109151 Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-01-07 23:27:30 +01:00
Kenneth Graunke	f003859f97	nir: Make gl_nir_lower_samplers use gl_nir_lower_samplers_as_deref These days, we have two sampler lowering passes. The newer one, gl_nir_lower_samplers_as_deref, is used by radeonsi. It rewrites variables to drop structures out of sampler deref chains, to make life simpler. It then sets var->data.binding for non-bindless sampler and image variables based on the GL uniform storage's opaque index values. The older one converts sampler deref chains (nir_tex_src_texture_deref) to a numerical offset (nir_tex_src_texture_offset). It also stores the constant-valued portion of that number in tex->texture_index, making life really simple for drivers that don't support indirects. It too pokes at GL uniform storage's opaque index values. Logically, we can do the first pass (simplify derefs, set bindings) then the second (turn derefs to offsets, set texture_index). This patch does exactly that, eliminating some redundancy (only one pass has to poke at GL uniform storage), and gaining proper var->data.binding values for drivers using the full lowering. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Acked-by: Jason Ekstrand <jason@jlekstrand.net>	2019-01-07 14:25:04 -08:00
Kenneth Graunke	c69f9297cf	nir: Fix gl_nir_lower_samplers_as_deref's structure type handling. We recurse to remove structures, and at each step, re-modify the resulting type for our link in the deref chain. For arrays, the result of recursion is the new underlying type - so we wrap it with the array dimensionality again. For structs, we want to simply use the new underlying type, skipping the struct altogether. The correct way to do this is to do nothing at all. Previously, we had reset type to next->type, which is the /old/ field type, not the new field type we obtained by recursing. This undid our recursive work. Fixes about 338 tests with nested structs, such as: dEQP-GLES2.functional.uniform_api.value.initial.get_uniform.nested_structs_arrays.sampler2D_samplerCube_fragment Note that currently only radeonsi uses this pass, and NIR support is disabled there by default, so the breakage was likely not seen by most people. The next commit uses this pass for more drivers, so this fix prevents regressions from that change. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Acked-by: Jason Ekstrand <jason@jlekstrand.net>	2019-01-07 14:25:04 -08:00
Bas Nieuwenhuizen	be6cee51c0	amd/common: Add some parentheses to silence warning. [1/59] Compiling C object 'src/amd/common/src@amd@common@@amd_common@sta/ac_nir_to_llvm.c.o'. ../mesa/src/amd/common/ac_nir_to_llvm.c: In function ‘get_inst_tessfactor_writemask’: ../mesa/src/amd/common/ac_nir_to_llvm.c:4089:32: warning: suggest parentheses around ‘+’ inside ‘<<’ [-Wparentheses] writemask = ((1 << num_comps + 1) - 1) << first_component; ~~~~~~~~~~^~~ ../mesa/src/amd/common/ac_nir_to_llvm.c:4091:33: warning: suggest parentheses around ‘+’ inside ‘<<’ [-Wparentheses] writemask = (((1 << num_comps + 1) - 1) << first_component) << 4; Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-01-07 23:15:37 +01:00
Bas Nieuwenhuizen	64c83efaee	radv: Remove unused variable. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-01-07 23:15:33 +01:00
Bas Nieuwenhuizen	656c1c488c	radv: Remove device path. unused and gcc complains about strncpy. (from what I can see because strncpy does not leave a 0 byte on truncate. That said we don't use it so this does not fix a real bug). Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-01-07 23:15:14 +01:00
Marek Olšák	492ad9a402	ac: remove unused variable from ac_build_ddxy trivial	2019-01-07 14:51:25 -05:00
Andres Gomez	0cc01f45e7	glsl: correct typo in GLSL compilation error message v2: Add the "fix" tag (Erik). Fixes: `037f68d81e` ("glsl: apply align layout qualifier rules to block offsets") Cc: Timothy Arceri <tarceri@itsqueeze.com> Signed-off-by: Andres Gomez <agomez@igalia.com> Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>	2019-01-07 19:07:33 +02:00
Jason Ekstrand	027835b1da	vulkan: Update the XML and headers to 1.1.97 Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-01-07 10:00:01 -06:00
Andres Gomez	6decc6b1d9	docs: update 18.3 and add 19.x cycles for the release calendar v2: replace incorrect "<td/>" with "<td>" (Eric). Cc: Dylan Baker <dylan.c.baker@intel.com> Cc: Juan A. Suarez <jasuarez@igalia.com> Cc: Emil Velikov <emil.velikov@collabora.com> Signed-off-by: Andres Gomez <agomez@igalia.com> Acked-by: Emil Velikov <emil.velikov@collabora.com> Acked-by: Juan A. Suarez <jasuarez@igalia.com>	2019-01-07 17:19:47 +02:00
Bas Nieuwenhuizen	110564fdec	anv/android: Do not reject storage images. We do the ImageFormatProperties check already, and rejecting a usage flag when both ImageFormatProperties and the WSI (which is Android) support it is not allowed. Intel does support storage for some of the supported WSI formats, such as R8G8B8A8_UNORM, and looking at the ISL_SURF_USAGE_DISABLE_AUX_BIT, the imported images do not have any form of compression that would prevent this fix. v2: Also consider STORAGE bit for Gralloc usage bits. (From Kevin Strasser <kevin.strasser@intel.com>) Fixes: `053d4c328f` "anv: Implement VK_ANDROID_native_buffer (v9)" Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2019-01-07 15:20:55 +01:00
Bas Nieuwenhuizen	9a45a190ad	radv: Implement buffer stores with less than 4 components. We started using it in the btoi paths for r32g32b32, and the LLVM IR checker will complain about it because we end up with intrinsics with the wrong type extension in the name. Fixes: `593996bc02` ("radv: implement buffer to image operations for R32G32B32") Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2019-01-07 14:54:14 +01:00
Jon Turney	00ad77b9f6	appveyor: Add a Cygwin build script	2019-01-07 13:40:58 +00:00
Jon Turney	5334dafee2	appveyor: put build steps in a script, rather than inline in appveyor.yml	2019-01-07 13:40:57 +00:00
Lucas Stach	d015888efb	etnaviv: annotate variables only used in debug build Some of the status variables in the compiler are only used in asserts and thus may be unused in release builds. Annotate them accordingly to avoid 'unused but set' warnings from the compiler. Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>	2019-01-07 11:51:02 +01:00
Lucas Stach	b56d903b5a	etnaviv: enable full overwrite in a few more cases Take into account the render target format when checking if the color mask affects all channels of the RT. This allows to enable full overwrite in a few cases where a non-alpha format is used. Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>	2019-01-07 11:50:23 +01:00
Timothy Arceri	6dade5d534	nir: avoid uninitialized variable warning Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109231	2019-01-07 10:57:00 +11:00
Timothy Arceri	17fac39398	st/glsl: refactor st_link_nir() The functional change here is moving the nir_lower_io_to_scalar_early() calls inside st_nir_link_shaders() and moving the st_nir_opts() call after the call to nir_lower_io_arrays_to_elements(). This fixes a bug with the following piglit test due to the current code not cleaning up dead code after we lower arrays. This was causing an assert in the new duplicate varyings link time opt introduced in `70be9afccb`. tests/spec/glsl-1.10/execution/vsfs-unused-array-member.shader_test Moving the nir_lower_io_to_scalar_early() calls also allows us to tidy up the code a little and merge some loops. Reviewed-by: Eric Anholt <eric@anholt.net>	2019-01-07 10:54:20 +11:00
Eric Anholt	8847370424	v3d: Use the core tex lowering. Even without any clever optimization on the unpack operations, this gives us a useful value for the channels read field, which we can use to avoid ldtmu instructions to the no-op register. instructions in affected programs: 890712 -> 881974 (-0.98%)	2019-01-04 15:59:59 -08:00
Eric Anholt	f217a94542	nir: Add nir_lower_tex options to lower sampler return formats. I've been doing this in the nir-to-vir and nir-to-qir backends of v3d and vc4, but nir could potentially do some useful stuff for us (like avoiding unpack/repacks) if we give it the information. v2: Skip lowering for txs/query_levels v3: Fix a crash on old-style shadow v4: Rename to tex_packing, use nir_format_unpack_sint/uint helpers, pack the enum. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-01-04 15:59:57 -08:00
Eric Anholt	a74f2aeb4f	nir: Allow nir_format_unpack_int/sint to unpack larger values. For V3D, I want to unpack 4-16-bit packed integers for 8 and 16-bit integer samplers. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-01-04 15:59:30 -08:00
Jason Ekstrand	19c608fe43	intel/blorp: Be more conservative about copying clear colors In `92eb5bbc68` we attempted to avoid copying clear colors whenever we weren't doing a resolve. However, this broke MSAA resolves because we need the clear color in the source. This patch makes blorp much more conservative such that it only avoids the clear color copy if either aux_usage == NONE or it's explicitly doing a fast-clear. Fixes: `92eb5bbc68` "intel/blorp: Only copy clear color when doing..." Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107728 Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>	2019-01-04 17:57:43 -06:00
Eric Anholt	81b9361b68	v3d: Stop scalarizing our uniform loads. We can pull a whole vector in a single indirect load. This saves a bunch of round-trips to the TMU, instructions for setting up multiple loads, references to the UBO base in the uniforms, and apparently manages to reduce register pressure as well. instructions in affected programs: 3086665 -> 2454967 (-20.47%) uniforms in affected programs: 919581 -> 721039 (-21.59%) threads in affected programs: 1710 -> 3420 (100.00%) spills in affected programs: 596 -> 522 (-12.42%) fills in affected programs: 680 -> 562 (-17.35%) Improves 3dmmes performance by 2.29312% +/- 0.139825% (n=5)	2019-01-04 15:41:23 -08:00
Eric Anholt	f8a8de8b9a	v3d: Do UBO loads a vector at a time. In the process of adding support for SSBOs and CS shared vars, I ended up needing a helper function for doing TMU general ops. This helper can be that starting point, and saves us a bunch of round-trips to the TMU by loading a vector at a time.	2019-01-04 15:41:23 -08:00
Eric Anholt	b0e0086257	v3d: Remove dead switch cases and comments from v3d_nir_lower_io. Moving things to NIR left this mess around. All we lower now is uniforms.	2019-01-04 15:41:23 -08:00
Eric Anholt	f8e6b364b0	v3d: Fix up VS output setup during precompiles. I noticed that a VS I was debugging was missing all of its output stores -- outputs_written was for POS, VAR0, VAR3, while the shader's variables were POS, VAR9, and VAR12. I'm not sure what outputs_written is supposed to be doing here, but we can just walk the declared variables and avoid both this bug and the emission of extra stvpms for less-than-vec4 varyings.	2019-01-04 15:41:23 -08:00
Eric Anholt	e1385e879d	v3d: Reinstate the new shader-db output after v3d_compile() refactor. I misplaced it in the rebase conflicts.	2019-01-04 15:26:19 -08:00
Caio Marcelo de Oliveira Filho	bbf9ee9b18	nir: remove dead code from copy_prop_vars When copy_prop_vars also took care of dead write handling, intrin was used as part of store_to_entry. Now it isn't, so this assignment isn't really used. Add a comment clarifying what happens to intrin. Fixes: `4dfa7adc10` "nir: Remove handling of dead writes from copy_prop_vars" Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-01-04 15:18:41 -08:00
Lionel Landwerlin	31e4c9ce40	i965: add CS stall on VF invalidation workaround Even with the previous commit, hangs are still happening. The problem there is that the VF cache invalidate does happen immediately without waiting for previous rendering to complete. What happens is that we invalidate the cache the moment the PIPE_CONTROL is parsed but we still have old rendering in the pipe which continues to pull data into the cache with the old high address bits. The later rendering with the new high address bits then doesn't have the clean cache that it expects/needs. v2: Update commit message/explanation with Jason's Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Fixes: `a363bb2cd0` ("i965: Allocate VMA in userspace for full-PPGTT systems.") Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109072	2019-01-04 11:18:54 +00:00
Lionel Landwerlin	92b7407090	i965: include draw_params/derived_draw_params for VF cache workaround These buffers are using VB slots and should be included in the workaround decision. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Fixes: `a363bb2cd0` ("i965: Allocate VMA in userspace for full-PPGTT systems.") Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109072	2019-01-04 11:18:54 +00:00
Lionel Landwerlin	da634a4acb	intel/blorp: emit VF caching workaround before 3DSTATE_VERTEX_BUFFERS Probably no difference but it's nice to have i965 & blorp emit things in the same order. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-01-04 11:18:51 +00:00
Lionel Landwerlin	e5ed217545	i965: limit VF caching workaround to gen8/9/10 Documentation of the 3DSTATE_VERTEX_BUFFERS packet says this is only needed before ICL. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2019-01-04 11:18:48 +00:00
Andres Gomez	f0312cfa93	glsl/linker: complete documentation for assign_attribute_or_color_locations Commit `27f1298b9d` ("glsl/linker: validate attribute aliasing before optimizations") forgot to complete the documentation. Cc: Tapani Pälli <tapani.palli@intel.com> Signed-off-by: Andres Gomez <agomez@igalia.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2019-01-04 09:04:31 +02:00
Gurchetan Singh	6b7aea9d85	virgl: remove empty file Fixes: 174f53 ("virgl: consolidate transfer code") Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>	2019-01-03 20:59:29 +01:00
Gurchetan Singh	ca66457b05	virgl: don't flush an empty range Otherwise, the gl-1.0-long-dlist Piglit test crashes. Fixes: db7757 ("virgl: modify how we handle GL_MAP_FLUSH_EXPLICIT_BIT") Reported by airlied@ v2: Exit on any invalid range (Erik) Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109190 Reviewed-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Tested-by: Jakob Bornecrantz <jakob@collabora.com>	2019-01-03 20:59:29 +01:00
Eric Engestrom	393a756e6a	docs: advertise distro-provided meson cross-files Hopefully we can kick start the revolution and other distros will start providing them as well :) Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2019-01-03 18:53:21 +00:00
Eric Engestrom	8b363bc42e	docs: fix the meson aarch64 cross-file `gcc-ar` is preferred over the generic `ar`, and the `arm` family is for 32-bit ARM [1]. [1] https://mesonbuild.com/Reference-tables.html#cpu-families Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2019-01-03 18:53:21 +00:00
Jakob Bornecrantz	6a9be6fc0c	virgl/vtest: Use default socket name from protocol header No functional change as the socket name is the same, just removing the double definition of the path. Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org> Signed-off-by: Jakob Bornecrantz <jakob@collabora.com>	2019-01-03 15:50:38 +00:00
Rob Clark	e869481ef3	freedreno: fix staging resource size for arrays A 2d-array texture (for example), should get the # of array elements from box->depth, rather than depth0 which is minified. Fixes dEQP-GLES3.functional.shaders.texture_functions.texture.sampler2darray_bias_float_fragment with tiled textures. Reported-by: Kristian H. Kristensen <hoegsberg@chromium.org> Signed-off-by: Rob Clark <robdclark@gmail.com>	2019-01-03 08:11:40 -05:00
Rob Clark	67a7f6f244	freedreno: remove blit_via_copy_region() If we hit the memcpy() path for copy_region(), that will try to do a transfer_map(), which goes badly for blits to/from staging triggered by transfer_map() or transfer_unmap(). We could possibly add fd_blit2() which has allow_transfer_map param, and call that for staging blits. But I'm not really sure if trying the blit via copy_region() is very useful. At least for newer gens that implement fd_context::blit(), it probably isn't. Signed-off-by: Rob Clark <robdclark@gmail.com>	2019-01-03 08:10:32 -05:00
Rob Clark	2fc17e16a3	freedreno/a6xx: rework blitter API Switch over to using fd_context::blit(), in the same way that a5xx does. The previous patch wires fd_resource_copy_region() up to the blitter so a6xx no longer needs to bypass the core layer to accelerate this. Signed-off-by: Rob Clark <robdclark@gmail.com>	2019-01-03 08:10:23 -05:00
Rob Clark	53b8eb78d5	freedreno: try blitter for fd_resource_copy_region() Signed-off-by: Rob Clark <robdclark@gmail.com>	2019-01-03 08:10:16 -05:00
Rob Clark	228eddd7ee	freedreno: rework blit API First step to unify the way fd5 and fd6 blitter works. Currently a6xx bypasses the blit API in order to also accelerate resource_copy_region() But this approach can lead to infinite recursion: #0 fd_alloc_staging (ctx=0x5555936480, rsc=0x7fac485f90, level=0, box=0x7fbab29220) at ../src/gallium/drivers/freedreno/freedreno_resource.c:291 #1 0x0000007fbdebed04 in fd_resource_transfer_map (pctx=0x5555936480, prsc=0x7fac485f90, level=0, usage=258, box=0x7fbab29220, pptrans=0x7fbab29240) at ../src/gallium/drivers/freedreno/freedreno_resource.c:479 #2 0x0000007fbe5c5068 in u_transfer_helper_transfer_map (pctx=0x5555936480, prsc=0x7fac485f90, level=0, usage=258, box=0x7fbab29220, pptrans=0x7fbab29240) at ../src/gallium/auxiliary/util/u_transfer_helper.c:243 #3 0x0000007fbde2dcb8 in util_resource_copy_region (pipe=0x5555936480, dst=0x7fac485f90, dst_level=0, dst_x=0, dst_y=0, dst_z=0, src=0x7fac47c780, src_level=0, src_box_in=0x7fbab2945c) at ../src/gallium/auxiliary/util/u_surface.c:350 #4 0x0000007fbdf2282c in fd_resource_copy_region (pctx=0x5555936480, dst=0x7fac485f90, dst_level=0, dstx=0, dsty=0, dstz=0, src=0x7fac47c780, src_level=0, src_box=0x7fbab2945c) at ../src/gallium/drivers/freedreno/freedreno_blitter.c:173 #5 0x0000007fbdf085d4 in fd6_resource_copy_region (pctx=0x5555936480, dst=0x7fac485f90, dst_level=0, dstx=0, dsty=0, dstz=0, src=0x7fac47c780, src_level=0, src_box=0x7fbab2945c) at ../src/gallium/drivers/freedreno/a6xx/fd6_blitter.c:587 #6 0x0000007fbde2f3d0 in util_try_blit_via_copy_region (ctx=0x5555936480, blit=0x7fbab29430) at ../src/gallium/auxiliary/util/u_surface.c:864 #7 0x0000007fbdec02c4 in fd_blit (pctx=0x5555936480, blit_info=0x7fbab29588) at ../src/gallium/drivers/freedreno/freedreno_resource.c:993 #8 0x0000007fbdf08408 in fd6_blit (pctx=0x5555936480, info=0x7fbab29588) at ../src/gallium/drivers/freedreno/a6xx/fd6_blitter.c:546 #9 0x0000007fbdebdc74 in do_blit (ctx=0x5555936480, blit=0x7fbab29588, 
fallback=false) at ../src/gallium/drivers/freedreno/freedreno_resource.c:129 #10 0x0000007fbdebe58c in fd_blit_from_staging (ctx=0x5555936480, trans=0x7fac47b7e8) at ../src/gallium/drivers/freedreno/freedreno_resource.c:326 #11 0x0000007fbdebea38 in fd_resource_transfer_unmap (pctx=0x5555936480, ptrans=0x7fac47b7e8) at ../src/gallium/drivers/freedreno/freedreno_resource.c:416 #12 0x0000007fbe5c5c68 in u_transfer_helper_transfer_unmap (pctx=0x5555936480, ptrans=0x7fac47b7e8) at ../src/gallium/auxiliary/util/u_transfer_helper.c:516 #13 0x0000007fbde2de24 in util_resource_copy_region (pipe=0x5555936480, dst=0x7fac485f90, dst_level=0, dst_x=0, dst_y=0, dst_z=0, src=0x7fac47b8e0, src_level=0, src_box_in=0x7fbab2997c) at ../src/gallium/auxiliary/util/u_surface.c:376 #14 0x0000007fbdf2282c in fd_resource_copy_region (pctx=0x5555936480, dst=0x7fac485f90, dst_level=0, dstx=0, dsty=0, dstz=0, src=0x7fac47b8e0, src_level=0, src_box=0x7fbab2997c) at ../src/gallium/drivers/freedreno/freedreno_blitter.c:173 #15 0x0000007fbdf085d4 in fd6_resource_copy_region (pctx=0x5555936480, dst=0x7fac485f90, dst_level=0, dstx=0, dsty=0, dstz=0, src=0x7fac47b8e0, src_level=0, src_box=0x7fbab2997c) at ../src/gallium/drivers/freedreno/a6xx/fd6_blitter.c:587 ... Instead rework the API to push the fallback back to core code, so that we can rework resource_copy_region() to have its own fallback path, and then finally convert fd6 over to work in the same way. This also makes ctx->blit() optional, and cleans up some unnecessary callers. Signed-off-by: Rob Clark <robdclark@gmail.com>	2019-01-03 08:09:52 -05:00
Rob Clark	f1c88336e6	freedreno: skip depth resolve if not written For multi-pass rendering, it is common to keep the same depth buffer from previous pass, to discard geometry that would be hidden by later draws. In the later passes with depth-test enabled, but depth-write disabled, there is no reason to do gmem2mem resolve. TODO probably do something similar for stencil.. although stencil buffer isn't used as commonly these days Signed-off-by: Rob Clark <robdclark@gmail.com>	2019-01-03 08:09:24 -05:00
Timothy Arceri	4d3f6cb973	nir: merge some basic consecutive ifs After trying multiple times to merge if-statements with phis between them I've come to the conclusion that it cannot be done without regressions. The problem is for some shaders we end up with a whole bunch of phis for the merged ifs resulting in increased register pressure. So this patch just merges ifs that have no phis between them. This seems to be consistent with what LLVM does so for radeonsi we only see a change (although it's a large change) in a single shader. Shader-db results i965 (SKL): total instructions in shared programs: 13098176 -> 13098152 (<.01%) instructions in affected programs: 1326 -> 1302 (-1.81%) helped: 4 HURT: 0 total cycles in shared programs: 332032989 -> 332037583 (<.01%) cycles in affected programs: 60665 -> 65259 (7.57%) helped: 0 HURT: 4 The cycles estimates reported by shader-db for i965 seem inaccurate as the only difference in the final code is the removal of the redundant condition evaluations and jumps. Also the biggest code reduction (~7%) for radeonsi was in a tomb raider tressfx shader but for some reason this does not get merged for i965. Shader-db results radeonsi (VEGA): Totals from affected shaders: SGPRS: 232 -> 232 (0.00 %) VGPRS: 164 -> 164 (0.00 %) Spilled SGPRs: 59 -> 59 (0.00 %) Spilled VGPRs: 0 -> 0 (0.00 %) Private memory VGPRs: 0 -> 0 (0.00 %) Scratch size: 0 -> 0 (0.00 %) dwords per thread Code Size: 14584 -> 13520 (-7.30 %) bytes LDS: 0 -> 0 (0.00 %) blocks Max Waves: 13 -> 13 (0.00 %) Wait states: 0 -> 0 (0.00 %) Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2019-01-03 15:17:16 +11:00
Timothy Arceri	19cafe8084	nir: add rewrite_phi_predecessor_blocks() helper This will also be used by the if merge pass in the following commit. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2019-01-03 15:17:16 +11:00
Timothy Arceri	5122fbc4ba	nir: simplify does_varying_match() Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>	2019-01-03 11:47:56 +11:00
Timothy Arceri	8d05ee2005	nir: make use of does_varying_match() helper Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>	2019-01-03 11:47:56 +11:00
Timothy Arceri	0016166d19	nir: make nir_opt_remove_phis_impl() static Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>	2019-01-03 11:47:56 +11:00
Eric Anholt	d2b899c0ec	v3d: Refactor compiler entrypoints. Before, I had per-stage entrypoints with some helpers shared between them. As I extended for compute shaders and shader-db, it turned out that the other common code in the middle wanted to be shared too.	2019-01-02 14:12:29 -08:00
Eric Anholt	0805060573	v3d: Handle dynamically uniform IF statements with uniform control flow. Loops will be trickier, since we need some analysis to figure out if the breaks/continues inside are uniform. Until we get that in NIR, this gets us some quick wins. total instructions in shared programs: 6192844 -> 6174162 (-0.30%) instructions in affected programs: 487781 -> 469099 (-3.83%)	2019-01-02 14:12:29 -08:00
Eric Anholt	5e9ee6e841	v3d: Fold comparisons for IF conditions into the flags for the IF. total instructions in shared programs: 6193810 -> 6192844 (-0.02%) instructions in affected programs: 800373 -> 799407 (-0.12%)	2019-01-02 14:12:29 -08:00
Eric Anholt	078dc176bc	v3d: Don't try to fold non-SSA-src comparisons into bcsels. There could have been a write of a src in between the comparison and the bcsel that would invalidate the comparison.	2019-01-02 14:12:29 -08:00
Eric Anholt	2e0433b687	v3d: Move the "Find the ALU instruction generating our bool" out of bcsel. This will be reused for if statements.	2019-01-02 14:12:29 -08:00
Eric Anholt	c3ae0aa264	v3d: Simplify the emission of comparisons for the bcsel optimization. I wanted to reuse the comparison stuff for nir_ifs, but for that I just want the flags and no destination value. Splitting the conditions from the destinations ended up cleaning the existing code up, anyway.	2019-01-02 14:12:29 -08:00
Eric Anholt	49d8e2aff1	v3d: Don't forget to include RT writes in precompiles. Looking at some assembly dumps for an optimization, we were clearly missing important parts of the shader!	2019-01-02 14:12:29 -08:00
Eric Anholt	3a81c753a3	v3d: Fix segfault when failing to compile a program. We'll still fail at draw time, but this avoids a regression in shader-db execution once I enable TLB writes in precompiles. Fixes: `b38e4d313f` ("v3d: Create a state uploader for packing our shaders together.")	2019-01-02 14:12:29 -08:00
Marek Olšák	3ae57957be	radeonsi: always unmap texture CPU mappings on 32-bit CPU architectures Team Fortress 2 32-bit version runs out of the CPU address space. Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>	2019-01-02 15:01:59 -05:00
Marek Olšák	edfca1f8dc	radeonsi: remove unused variables in si_insert_input_ptr Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>	2019-01-02 15:01:58 -05:00
Marek Olšák	cba475b3e7	radeonsi: use u_decomposed_prims_for_vertices instead of u_prims_for_vertices It seems to be the same, but this doesn't use integer division with a variable divisor. Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>	2019-01-02 15:01:56 -05:00
Marek Olšák	54bc87469a	radeonsi: make si_cp_wait_mem more configurable Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>	2019-01-02 15:01:54 -05:00
Marek Olšák	9d2c3a1fe0	radeonsi: call si_fix_resource_usage for the GS copy shader as well Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>	2019-01-02 15:01:53 -05:00
Marek Olšák	d28e208213	radeonsi: don't emit redundant PKT3_NUM_INSTANCES packets Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>	2019-01-02 15:01:50 -05:00
Caio Marcelo de Oliveira Filho	7d6babf995	nir: add a way to print the deref chain Makes debugging easier when we care about the deref chain and not the deref instruction itself. To make it take a const pointer, constify some of the static functions in nir_print.c. Reviewed-by: Eric Anholt <eric@anholt.net>	2019-01-02 10:09:04 -08:00
Dylan Baker	a2596450ac	meson: Error out if building nouveau and using LLVM without rtti Nouveau requires rtti. Often LLVM is configured without rtti, and code with and without cannot be linked safely. Let's just error out if nouveau is requested and llvm is built without rtti. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109202 Fixes: `c5a97d658e` ("meson: fix builds against LLVM built without rtti") Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2019-01-02 09:30:12 -08:00
Alexander von Gluck IV	1b97a72328	egl/haiku: Fix reference to disp vs dpy Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> Fixes: `00992700c9` "egl: set the EGLDevice when creating a display"	2019-01-02 13:45:09 +00:00
Iago Toral Quiroga	ec79069856	compiler/spirv: use 32-bit polynomial approximation for 16-bit asin() The 16-bit polynomial execution doesn't meet Khronos precision requirements. Also, the half-float denorm range starts at 2^(-14) and with asin taking input values in the range [0, 1], polynomial approximations can lead to flushing relatively easily. An alternative is to use the atan2 formula to compute asin, which is the reference taken by Khronos to determine precision requirements, but that ends up generating too many additional instructions when compared to the polynomial approximation. Specifically, for the Intel case, doing this adds +41 instructions to the program for each asin/acos call, which looks like an undesirable trade off. So for now we take the easy way out and fall back to using the 32-bit polynomial approximation, which is better (faster) than the 16-bit atan2 implementation and gives us better precision that matches Khronos requirements. v2: - Fall back to 32-bit using recursion (Jason). Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-01-02 07:54:39 +01:00
Iago Toral Quiroga	fda3f6d424	compiler/spirv: implement 16-bit frexp Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-01-02 07:54:35 +01:00
Iago Toral Quiroga	7d3c34197a	compiler/spirv: implement 16-bit hyperbolic trigonometric functions v2: - use nir_fadd_imm and nir_fmul_imm helpers (Jason) v3: - since we need to define one for fsub use it for fdiv too (Jason) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-01-02 07:54:05 +01:00
Iago Toral Quiroga	88663ba67c	compiler/spirv: implement 16-bit exp and log v2 - use nir_fmul_imm helper (Jason) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-01-02 07:54:05 +01:00
Iago Toral Quiroga	f18554e2ce	compiler/spirv: implement 16-bit atan2 v2: - fix huge_val for 16-bit, it was meant to be 2^14 not 10^14. v3: - rebase on top of new bool sized opcodes - use nir_b2f helper - use nir_fmul_imm helper Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-01-02 07:54:05 +01:00
Iago Toral Quiroga	1c8de08ec9	compiler/spirv: implement 16-bit atan v2: - use nir_fadd_imm and nir_fmul_imm helpers (Jason) - rebased on top of new sized boolean opcodes - use nir_b2f helper Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-01-02 07:54:05 +01:00
Iago Toral Quiroga	df118535ca	compiler/spirv: implement 16-bit acos Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-01-02 07:54:05 +01:00
Iago Toral Quiroga	dbbbe24d76	compiler/spirv: implement 16-bit asin v2: - use nir_fmul_imm and nir_fadd_imm helpers (Jason) v3: - missed one case where we need to replace nir_imm_float with nir_imm_floatN_t (Jason) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-01-02 07:54:05 +01:00
Iago Toral Quiroga	95b7c29c2c	compiler/spirv: handle 16-bit float in radians() and degrees() v2: - use nir_fmul_imm helper (Jason) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-01-02 07:54:05 +01:00
Iago Toral Quiroga	aeee683780	compiler/nir: add nir_fadd_imm() and nir_fmul_imm() helpers Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-01-02 07:54:05 +01:00
Iago Toral Quiroga	5fc9ad1cb0	compiler/nir: add a nir_b2f() helper Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-01-02 07:54:05 +01:00
Timothy Arceri	70be9afccb	nir: link time opt duplicate varyings If we are outputting the same value to more than one output component rewrite the inputs to read from a single component. This will allow the duplicate varying components to be optimised away by the existing opts. shader-db results i965 (SKL): total instructions in shared programs: 12869230 -> 12860886 (-0.06%) instructions in affected programs: 322601 -> 314257 (-2.59%) helped: 3080 HURT: 8 total cycles in shared programs: 317792574 -> 317730593 (-0.02%) cycles in affected programs: 2584925 -> 2522944 (-2.40%) helped: 2975 HURT: 477 shader-db results radeonsi (VEGA): SGPRS: 31576 -> 31664 (0.28 %) VGPRS: 17484 -> 17064 (-2.40 %) Spilled SGPRs: 184 -> 167 (-9.24 %) Spilled VGPRs: 0 -> 0 (0.00 %) Private memory VGPRs: 0 -> 0 (0.00 %) Scratch size: 0 -> 0 (0.00 %) dwords per thread Code Size: 583340 -> 569368 (-2.40 %) bytes LDS: 0 -> 0 (0.00 %) blocks Max Waves: 6162 -> 6270 (1.75 %) Wait states: 0 -> 0 (0.00 %) vkpipeline-db results RADV (VEGA): Totals from affected shaders: SGPRS: 14880 -> 15080 (1.34 %) VGPRS: 10872 -> 10888 (0.15 %) Spilled SGPRs: 0 -> 0 (0.00 %) Spilled VGPRs: 0 -> 0 (0.00 %) Private memory VGPRs: 0 -> 0 (0.00 %) Scratch size: 0 -> 0 (0.00 %) dwords per thread Code Size: 674016 -> 668396 (-0.83 %) bytes LDS: 0 -> 0 (0.00 %) blocks Max Waves: 2708 -> 2704 (-0.15 %) Wait states: 0 -> 0 (0.00 %) V2: bunch of tidy ups suggested by Jason Reviewed-by: Eric Anholt <eric@anholt.net>	2019-01-02 12:19:17 +11:00
Timothy Arceri	d828694b80	nir: rework nir_link_opt_varyings() This just cleans things up a little and makes things safer for derefs. Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-01-02 12:19:17 +11:00
Timothy Arceri	c0aba8b0dc	nir: add can_replace_varying() helper This will be reused by the following patch. Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-01-02 12:19:17 +11:00
Timothy Arceri	50de3f80a8	nir: rename nir_link_constant_varyings() nir_link_opt_varyings() The following patches will add support for an additional optimisation so this function will no longer just optimise varying constants. Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-01-02 12:19:17 +11:00
Timothy Arceri	0a4378ce56	st/glsl_to_nir: call nir_lower_load_const_to_scalar() in the st This will help the new opt introduced in the following patches allowing us to remove extra duplicate varyings. Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2019-01-02 12:19:17 +11:00
Timothy Arceri	2ef0f944f5	radeonsi: make use of ac_are_tessfactors_def_in_all_invocs() Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-01-02 10:01:31 +11:00
Timothy Arceri	2832bc972b	ac/nir_to_llvm: add ac_are_tessfactors_def_in_all_invocs() The following patch will use this with the radeonsi NIR backend but I've added it to ac so we can use it with RADV in future. This is a NIR implementation of the tgsi function tgsi_scan_tess_ctrl(). Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-01-02 10:01:24 +11:00
Timothy Arceri	2817a4ec0b	radeonsi: remove unrequired param in si_nir_scan_tess_ctrl() Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-01-02 10:01:15 +11:00
Timothy Arceri	4dda445750	tgsi/scan: correctly walk instructions in tgsi_scan_tess_ctrl() The previous code used a do while loop and continues after walking a nested loop/if-statement. This means we end up evaluating the last instruction from the nested block against the while condition and potentially exit early if it matches the exit condition of the outer block. Fixes: `386d165d8d` ("tgsi/scan: add a new pass that analyzes tess factor writes") Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-01-02 09:53:01 +11:00
Timothy Arceri	dd061eb044	tgsi/scan: fix loop exit point in tgsi_scan_tess_ctrl() This just happened not to crash/assert because all loops have at least 1 if-statement and due to a second bug we end up matching the same ENDIF to exit both the iteration over the if-statement and the loop. The second bug is fixed in the following patch. Fixes: `386d165d8d` ("tgsi/scan: add a new pass that analyzes tess factor writes") Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2019-01-02 09:53:01 +11:00
Ilia Mirkin	8f98ff362c	nv30: disable rendering to 3D textures There's no way to tell the 3D engine about swizzling on such textures. While rendering to NPOT ones may be possible, there's no great way to expose that in gallium, nor would there be any practical benefit. Fixes the non-compressed-format "copyteximage 3D" failures. Something odd going on with the compressed formats. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2019-01-01 15:11:14 -05:00
Bas Nieuwenhuizen	8c93ef5de9	radv: Do a cache flush if needed before reading predicates. This caused random failures for two conditional rendering tests: dEQP-VK.conditional_rendering.draw_clear.draw.update_with_rendering_discard dEQP-VK.conditional_rendering.draw_clear.draw.update_with_rendering_no_discard These wrote the predicate with the vertex shader, did a barrier and then started the conditional rendering. However the cache flushes for the barrier only happen on first draw, so after the predicate has been read. Fixes: `e45ba51ea4` "radv: add support for VK_EXT_conditional_rendering" Reviewed-by: Dave Airlie <airlied@redhat.com>	2018-12-31 20:52:08 +01:00
Erik Faye-Lund	86089a7316	anv/autotools: make sure tests link with -msse2 Without this, I get the following error when building the tests with autotools on i686: ---8<--- src/intel/common/gen_clflush.h: In function ‘gen_clflush_range’: src/intel/common/gen_clflush.h:37:7: warning: implicit declaration of function ‘__builtin_ia32_clflush’; did you mean ‘__builtin_ia32_pause’? [-Wimplicit-function-declaration] __builtin_ia32_clflush(p); ^~~~~~~~~~~~~~~~~~~~~~ __builtin_ia32_pause src/intel/common/gen_clflush.h: In function ‘gen_flush_range’: src/intel/common/gen_clflush.h:45:4: warning: implicit declaration of function ‘__builtin_ia32_mfence’; did you mean ‘__builtin_ia32_fnclex’? [-Wimplicit-function-declaration] __builtin_ia32_mfence(); ^~~~~~~~~~~~~~~~~~~~~ __builtin_ia32_fnclex ---8<--- The errors are generated for each of these files: - mesa/src/intel/vulkan/tests/state_pool_no_free.c - mesa/src/intel/vulkan/tests/state_pool.c - mesa/src/intel/vulkan/tests/block_pool_no_free.c - mesa/src/intel/vulkan/tests/state_pool_free_list_only.c This is obviously because gen_clflush.h contains code that uses intrinsics that are only available with SSE2. Since the driver already uses SSE2, it seems reasonable to add this to the tests as well. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Acked-by: Eric Engeström <eric@engestrom.ch>	2018-12-31 17:28:21 +01:00
Erik Faye-Lund	89679e18a9	anv/meson: make sure tests link with -msse2 Without this, I get the following error when building the tests using meson on i686: ---8<--- In file included from ../../../mesa/src/intel/vulkan/anv_private.h:46, from ../../../mesa/src/intel/vulkan/tests/state_pool_no_free.c:26: ../../../mesa/src/intel/common/gen_clflush.h: In function ‘gen_clflush_range’: ../../../mesa/src/intel/common/gen_clflush.h:37:7: error: implicit declaration of function ‘__builtin_ia32_clflush’; did you mean ‘__builtin_ia32_pause’? [-Werror=implicit-function-declaration] __builtin_ia32_clflush(p); ^~~~~~~~~~~~~~~~~~~~~~ __builtin_ia32_pause ../../../mesa/src/intel/common/gen_clflush.h: In function ‘gen_flush_range’: ../../../mesa/src/intel/common/gen_clflush.h:45:4: error: implicit declaration of function ‘__builtin_ia32_mfence’; did you mean ‘__builtin_ia32_fnclex’? [-Werror=implicit-function-declaration] __builtin_ia32_mfence(); ^~~~~~~~~~~~~~~~~~~~~ __builtin_ia32_fnclex ---8<--- The errors are generated for each of these files: - mesa/src/intel/vulkan/tests/state_pool_no_free.c - mesa/src/intel/vulkan/tests/state_pool.c - mesa/src/intel/vulkan/tests/block_pool_no_free.c - mesa/src/intel/vulkan/tests/state_pool_free_list_only.c This is obviously because gen_clflush.h contains code that uses intrinsics that are only available with SSE2. Since the driver already uses SSE2, it seems reasonable to add this to the tests as well. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Eric Engeström <eric@engestrom.ch>	2018-12-31 17:27:33 +01:00
Ilia Mirkin	207fb558e4	nv30: fix some s3tc layout issues s3tc layouts are a bit finicky - they're packed, but not swizzled. Adjust logic to allow for that case: - Don't set a uniform pitch for POT-sized compressed textures - Adjust define_rect API to be less confused about block sizes - Only mark a texture as linear if it has a uniform pitch set This has been tested to fix xonotic (as well as the s3tc-* piglits) on nv3x and keeps it working on nv4x. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2018-12-30 23:32:21 -05:00
Ilia Mirkin	ad251330e8	nv30: use correct helper to get blocks in y direction This doesn't matter since all compressed formats supported by this hardware use square blocks, but best to use the correct helper. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2018-12-30 23:32:21 -05:00
Ilia Mirkin	b04c1907c8	nv30: add support for multi-layer transfers This logic mirrors what we do on nv50. The relatively new texture_subdata callback can cause this to happen with 3D textures, which is triggered at least by xonotic, and probably many piglits. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2018-12-30 23:32:21 -05:00
Ilia Mirkin	b34cfd4749	nv30: fix rare issue with fp unbinding not finding the bufctx If the last-active context gets deleted, the pushbuf doesn't have a bufctx to reference. Then there could be a sequence of binds which would trigger a reset on that bin before validation was done. Instead we just pass in the bufctx in question directly. All other instances of PUSH_RESET happen strictly after a validation is run. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102349 Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2018-12-30 19:44:43 -05:00
Ilia Mirkin	ef3eac9545	nv30: avoid setting user_priv without setting cur_ctx The whole user_priv thing is a mess, but as long as it's there, it basically has to map 1:1 to the cur_ctx. Unfortunately we were setting user_priv to some context, then that context could get deleted without any draws/validations in it, leading user_priv to become NULL, with cur_ctx still pointing at some old context. Then we wouldn't run the switch logic, which in turn led to a NULL bufctx being dereferenced. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102349 Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2018-12-30 19:44:43 -05:00
Eric Anholt	ad1e59cf8d	v3d: Add support for gl_HelperInvocation. We can just look at the MSF flags -- if they're unset, then we're definitely in a helper invocation. Fixes dEQP-GLES31.functional.shaders.helper_invocation.* with GLES3.1 enabled.	2018-12-30 08:05:11 -08:00
Eric Anholt	20021e3473	v3d: Add support for textureSize() on MSAA textures. Fixes failures in dEQP-GLES31.functional.shaders.builtin_functions.texture_size.samples_1_texture_2d in the GLES3.1 suite.	2018-12-30 08:05:11 -08:00
Eric Anholt	f695d62fe5	v3d: Add support for requesting the sample offsets.	2018-12-30 08:05:11 -08:00
Eric Anholt	906fca1b4b	v3d: Add support for non-constant texture offsets. Fixes dEQP-GLES31.functional.texture.gather.offset_dynamic.min_required_offset.2d.rgba8.size_pot.clamp_to_edge_repeat and others.	2018-12-30 08:05:11 -08:00
Eric Anholt	47caefc7b4	v3d: Force sampling from base level for tg4. This is what the GLSL ES 310 spec tells us to do, but apparently the "gather mode" flag doesn't imply it in the HW. Fixes dEQP-GLES31.functional.texture.gather.basic.2d.rgba8.filter_mode.min_nearest_mipmap_linear_mag_linear	2018-12-30 08:05:11 -08:00
Eric Anholt	f9bdce9966	v3d: Add a note for a potential performance win on multop/umul24. Noticed while debugging a testcase.	2018-12-30 08:05:11 -08:00
Eric Anholt	b36757448d	v3d: Dead-code eliminate unused flags updates. The greedy comparison folding in bcsel means that we may have left the original bool-generating NIR ALU instruction dead, but DCE wasn't eliminating the VIR code for it because of the flags updates. total instructions in shared programs: 5186024 -> 5100894 (-1.64%) instructions in affected programs: 1448695 -> 1363565 (-5.88%)	2018-12-30 08:05:11 -08:00
Eric Anholt	20e3526298	v3d: Don't generate temps for comparisons. This was just generated work for vir_opt_dead_code and cluttered up the dumps.	2018-12-30 08:04:54 -08:00
Eric Anholt	ebde5afb93	v3d: Move "does this instruction have flags" from sched to generic helpers. I wanted to reuse it for DCE of flags updates.	2018-12-30 08:03:51 -08:00
Eric Anholt	39b1112189	v3d: Drop incorrect dependency for flpop. It is just shifting probably-means-flags bits out of a value, it doesn't actually update the flags on its own.	2018-12-30 08:03:51 -08:00
Eric Anholt	a7c9fd7573	v3d: Drop unused count_nir_instrs() helper. This was for shader-db, but I haven't cared about NIR instruction counts in a long time.	2018-12-30 08:03:51 -08:00
Eric Anholt	696f63f1b4	v3d: Hook up some shader-db output to GL_ARB_debug_output. This allows the original shader-db project's run.c runner to parse things easily, and is probably a good thing to have for GL_ARB_debug_output in general. I formatted it more like Intel's so I can mostly reuse their report script.	2018-12-30 08:03:51 -08:00
Eric Anholt	87b251a940	v3d: Add a "precompile" debug flag for shader-db. I've been using my apitrace-based shader-db so far, but it's slow (apitrace decompression), intrusive (apitrace windows spamming the screen), and doesn't have much coverage. The original shader-db provides a lot more coverage and compiles faster, at the expense of not having the actual runtime variant key. As v3d has a lot less runtime variation than vc4 did, this tradeoff makes more sense.	2018-12-29 13:52:09 -08:00
Eric Anholt	9ec6a3d621	v3d: Fix uniform pretty printing assertion failure with branches. Fixes: `248a7fb392` ("v3d: Do uniform pretty-printing in the QPU dump.")	2018-12-29 13:52:09 -08:00
Dylan Baker	133a5b8383	meson: Override C++ standard to gnu++11 when building with altivec on ppc64 Otherwise there will be symbol collisions for the vector name. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108943 Distro Bug: https://bugs.gentoo.org/673622 Fixes: `42ea0631f1` ("meson: build clover") Acked-by: Matt Turner <mattst88@gmail.com>	2018-12-28 11:04:57 -08:00
Lionel Landwerlin	f7bccf6ab4	intel/aub_viewer: highlight true booleans Useful to spot PIPE_CONTROL flags. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2018-12-28 16:48:46 +00:00
Lionel Landwerlin	6ba61ea391	intel/aub_viewer: fold binding/sampler table items Makes things easier to read rather than a long block of text. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2018-12-28 16:48:43 +00:00
Lionel Landwerlin	7ab8c80625	intel/aub_viewer: fix shader view Not decoding the shader at the right offset. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2018-12-28 16:48:40 +00:00
Lionel Landwerlin	f3ed4a058d	intel/aub_viewer: print address of missing shader Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2018-12-28 16:48:21 +00:00
Lionel Landwerlin	0382e11989	intel/aub_viewer: fixup 0x address prefix Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2018-12-28 16:48:18 +00:00
Lionel Landwerlin	8e2fda411a	intel/aub_viewer: fix shader get_bo Instruction addresses are always in ppgtt space. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2018-12-28 16:48:08 +00:00
Nicholas Kazlauskas	e260493f2a	radeonsi: Enable adaptive_sync by default for radeon It's better to let most applications make use of adaptive sync by default. Problematic applications can be placed on the blacklist or the user can manually disable the feature. Reviewed-by: Michel Dänzer <michel.daenzer@amd.com> Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>	2018-12-28 17:08:14 +01:00
Nicholas Kazlauskas	2e12fe425f	loader/dri3: Enable adaptive_sync via _VARIABLE_REFRESH property The DDX driver can be notified of adaptive sync suitability by flagging the application's window with the _VARIABLE_REFRESH property. This property is set on the first swap the application performs when adaptive_sync is set to true in the drirc. It's performed here instead of when the loader is initialized for two reasons: (1) The window's drawable can be missing during loader init. This can be observed during the Unigine Superposition benchmark. (2) Adaptive sync will only be enabled closer to when the application actually begins rendering. If adaptive_sync is false then the _VARIABLE_REFRESH property is deleted on loader init. The property is only managed on the glx DRI3 backend for now. This should cover most common applications and games on modern hardware. Vulkan support can be implemented in a similar manner but would likely require splitting the function out into a common helper function. Reviewed-by: Michel Dänzer <michel.daenzer@amd.com> Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>	2018-12-28 16:44:47 +01:00
Nicholas Kazlauskas	a9c36dbf9c	drirc: Initial blacklist for adaptive sync Applications that don't present at a predictable rate (i.e. not games) shouldn't have adaptive sync enabled. This list covers some of the common desktop compositors, web browsers and video players. [ Michel Dänzer: Added entry for firefox-esr ] Reviewed-by: Michel Dänzer <michel.daenzer@amd.com> Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>	2018-12-28 16:44:27 +01:00
Nicholas Kazlauskas	7407670036	util: Add adaptive_sync driconf option This option lets the user decide whether mesa should notify the window manager / DDX driver that the current application is adaptive sync capable. It's off by default. Reviewed-by: Michel Dänzer <michel.daenzer@amd.com> Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>	2018-12-28 16:38:06 +01:00
Nicholas Kazlauskas	759b940389	util: Get program name based on path when possible Some programs start with the path and command line arguments in argv[0] (program_invocation_name). Chromium is an example of an application using mesa that does this. This tries to query the real path for the symbolic link /proc/self/exe to find the program name instead. It only uses the realpath if it was a prefix of the invocation to avoid breaking wine programs. Cc: Timothy Arceri <tarceri@itsqueeze.com> Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2018-12-28 15:41:01 +01:00
Tomeu Vizoso	bf1dfcc3e8	etnaviv: Consolidate buffer references from framebuffers We were leaking surfaces because the references taken in etna_set_framebuffer_state weren't being released on context destroy. Instead of just directly releasing those references in etna_context_destroy, use the util_copy_framebuffer_state helper. Take the chance to remove the duplicated buffer references in compiled_framebuffer_state to avoid confusion. The leak can be reproduced with a client that continuously creates and destroys contexts. Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com> Reported-by: Sjoerd Simons <sjoerd.simons@collabora.co.uk> Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>	2018-12-28 10:22:01 +01:00
Dave Airlie	d1ce7eba8b	virgl/vtest: fix front buffer flush with protocol version 0. Older versions of virglrenderer before 33da7361aec486290df0aec4ad8dfa8ff6adde2c in vtest mode, misrender gears. Fixes: `9d81cd8e7c` (virgl: Pass resource size and transfer offsets) Reviewed-By: Gert Wollny <gert.wollny@collabora.com>	2018-12-28 16:50:38 +10:00
Dylan Baker	6adbd9ac74	docs/autoconf: Mark autoconf as being replaced I know it's not what anyone wants, but how about we start with a message in the documentation that encourages people to try meson. Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Eric Engeström <eric@engestrom.ch>	2018-12-27 09:03:20 -08:00
Dylan Baker	4c32964f49	docs/install: Update python dependency section Note that meson requires python 3, scons requires python 2, and autotools works with either. Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Eric Engeström <eric@engestrom.ch>	2018-12-27 09:03:20 -08:00
Dylan Baker	a57dbe6971	docs/meson: Update LLVM section with information about native files Reviewed-by: Eric Engeström <eric@engestrom.ch>	2018-12-27 09:03:17 -08:00
Dylan Baker	40ec5fec0a	docs/install: Add meson to the main install page Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Eric Engeström <eric@engestrom.ch>	2018-12-27 09:03:07 -08:00
Juan A. Suarez Romero	fe7919acad	docs: update calendar, add news item and link release notes for 18.2.8 Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>	2018-12-27 17:37:33 +01:00
Juan A. Suarez Romero	0d53451890	docs: add sha256 checksums for 18.2.8 Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com> (cherry picked from commit `24c31bc0e2`)	2018-12-27 17:35:04 +01:00
Juan A. Suarez Romero	008478e340	docs: add release notes for 18.2.8 Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com> (cherry picked from commit `785e09e3b3`)	2018-12-27 17:35:02 +01:00
Ilia Mirkin	2269ab8588	nv50,nvc0: add missing CAPs for unsupported features Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2018-12-26 20:28:07 -05:00
Ilia Mirkin	1d10bb2025	nvc0: enable GL_NV_shader_atomic_float on pre-Maxwell Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2018-12-26 20:04:57 -05:00
Ilia Mirkin	0dd55db10f	nv50/ir: add support for converting ATOMFADD to proper ir Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2018-12-26 20:04:57 -05:00
Ilia Mirkin	9867f2a1f7	st/mesa: expose GL_NV_shader_atomic_float when ATOMFADD is supported Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-12-26 20:04:57 -05:00
Ilia Mirkin	4d5a6a1649	st/mesa: select ATOMFADD when source type is float Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-12-26 20:04:57 -05:00
Ilia Mirkin	d139231b32	gallium: add PIPE_CAP_TGSI_ATOMFADD to indicate support ATOMFADD is a little special -- make drivers have to specify it explicitly. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-12-26 20:04:57 -05:00
Ilia Mirkin	5574414edc	tgsi: add ATOMFADD operation This is supported by at least NVIDIA hardware, and exposeable via GL extensions. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-12-26 20:04:57 -05:00
Ilia Mirkin	bac8534267	st/mesa: allow glDrawElements to work with GL_SELECT feedback Not sure if this ever worked, but the current logic for setting the min/max index is definitely wrong for indexed draws. While we're at it, bring in all the usual logic from the non-indirect drawing path. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109086 Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Brian Paul <brianp@vmware.com> Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2018-12-26 19:30:33 -05:00
Eric Anholt	7d7ecfbcbc	gallium/ttn: Fix setup of outputs_written. We need a 64-bit value, otherwise we only handle the low 32, and happen to sign-extend to claim to write all varying slots if VARYING_SLOT_VAR2 was used. Fixes: `4d0b2c7aaa` ("ttn: Update shader->info as we generate code.") Reviewed-by: Rob Clark <robdclark@gmail.com>	2018-12-26 11:42:09 -08:00
Lionel Landwerlin	e2ae5f2f0a	anv: don't do partial resolve on layer > 0 We've made the choice not to use fast clears on layer > 0 with multilayer images. This is partly because we would need to store multiple clear colors for each layer, making the existing memory layout, already including aux surfaces, fast clear color, image state, etc... even more complex. Partial resolves are the operations transferring the clear colors into the auxiliary buffers. This operation is currently implemented in Blorp by loading the clear color from the image's BO, into a shader that then samples from the auxiliary buffer and writes the color only if it isn't there already. The problem here is that we store only one clear color for all layers, and it is used for partial resolves. If you trigger a partial clear on a layer > 0, then you're likely to deal with a color that is not what you actually want. In the particular issues below, we have multiple layers, each cleared with a different color but the partial resolve just writes the wrong color into the auxiliary buffers for layers > 0. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108910 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108911 Cc: mesa-stable@lists.freedesktop.org	2018-12-24 09:42:46 +00:00
Axel Davy	c6b37e5412	st/nine: Increase the limit of cached ff shaders 100 is too small for some games, which triggers recompilations every frame. Increase to 1024. Signed-off-by: Axel Davy <davyaxel0@gmail.com> Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>	2018-12-23 08:14:50 +01:00
Axel Davy	104681c5d5	st/nine: Add src reference to nine_context_range_upload Just like nine_context_box_upload, nine_context_range_upload should reference the src, which holds the ram source buffer. Fixes: https://github.com/iXit/Mesa-3D/issues/327 Signed-off-by: Axel Davy <davyaxel0@gmail.com> Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Cc: mesa-stable@lists.freedesktop.org	2018-12-23 08:14:50 +01:00
Axel Davy	42d672fa6a	st/nine: Bind src not dst in nine_context_box_upload nine_context_box_upload uploads a ram buffer (from src) to a pipe_resource (dst). We already have a refcount on the pipe_resource, what needs to be protected from release is the ram buffer, thus a reference to src. Signed-off-by: Axel Davy <davyaxel0@gmail.com> Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Cc: mesa-stable@lists.freedesktop.org	2018-12-23 08:14:50 +01:00
Axel Davy	f91f748fab	st/nine: Fix volumetexture dtor on ctor failure The dtor is called on allocation failure, thus we must check the volumes are allocated before trying to release them. Signed-off-by: Axel Davy <davyaxel0@gmail.com> Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Cc: mesa-stable@lists.freedesktop.org	2018-12-23 08:14:50 +01:00
Axel Davy	1cc8192ad0	st/nine: Switch to presentation buffer if resize is detected This makes it possible to match the window size on resize in all cases, as it only works currently with presentation buffers. Signed-off-by: Axel Davy <davyaxel0@gmail.com> Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>	2018-12-23 08:14:50 +01:00
Axel Davy	c442dd7890	st/nine: Use helper to release swapchain buffers later This patch introduces a structure to release the present_handles only when they are fully released by the server, thus making "DestroyD3DWindowBuffer" actually release the buffer right away when called. Signed-off-by: Axel Davy <davyaxel0@gmail.com> Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>	2018-12-23 08:14:50 +01:00
Rob Clark	51a44c3aac	freedreno/a6xx: fix 3d texture layout Maybe not 100% perfect, but seems to be a pretty good approximation of that. Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-12-22 15:29:15 -05:00
Rob Clark	8f60f1381d	freedreno: update generated headers Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-12-22 15:28:50 -05:00
Rob Clark	be9ec158d7	freedreno/a6xx: improve setup_slices() debug msgs Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-12-22 15:28:24 -05:00
Rob Clark	2b497fc507	freedreno/a6xx: simplify special case for 3d layout This logic can be re-written as the two cases for 3d (ie. before/after the miplevel sizes start reducing) vs everything else. I think it is easier to read this way. Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-12-22 15:27:57 -05:00
Rob Clark	d71a50f831	freedreno: combine fd_resource_layer_offset()/fd_resource_offset() We really only need this logic in one place. Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-12-22 15:27:37 -05:00
Rob Clark	6667dde098	freedreno/ir3: don't treat all inputs/outputs as vec4 This was a hold-over from the early TGSI days, and mostly not needed with NIR. This avoids burning an entire 4 consecutive scalar regs for vec3 outputs, for example. Which fixes a few places that we were doing worse than we should on register usage. Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-12-22 15:27:21 -05:00
Rob Clark	3453814622	freedreno/ir3: fix fallout of extra assert Fixes the following crash that happened after `d6110d4d` The problem happens if we first compile a "vanilla" shader with nothing lowered in NIR, which performs the final lowering passes on so->shader->nir (including nir_lower_locals_to_regs()), and then later we compile a shader with some lowering. The second time through we would have already done nir_lower_locals_to_regs(). Arguably this was already a bug, just one we hadn't noticed yet. Fixes: `d6110d4d54` intel/compiler: move nir_lower_bool_to_int32 before nir_lower_locals_to_regs Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-12-21 19:04:22 -05:00
Kenneth Graunke	626f2477ab	st/nir: Drop unused gl_program parameter in VS input handling helper. Nobody uses this, so let's drop it. This makes the helper callable from places without a gl_program. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-12-21 15:29:32 -08:00
Kenneth Graunke	3a78b46e59	st/nir: Gather info after applying lowering FS variant features DrawPixels lowering, for example, adds new varyings that need to be accounted for in inputs_read. The earlier info gathering at link time cannot account for this. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-12-21 15:29:30 -08:00
Kenneth Graunke	bcb6f19947	st/mesa: Combine the DrawPixels and Bitmap passthrough VS programs. They're now identical, so we can just compile it once. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-12-21 15:29:29 -08:00
Kenneth Graunke	80dd9dfe33	st/mesa: Don't open code the drawpixels vertex shader. Now that we always copy color, we can just use the util function. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-12-21 15:29:28 -08:00
Kenneth Graunke	ed1a356c5e	st/mesa: Drop !passColor optimization in drawpixels shaders. The glDrawPixels passthrough vertex shader copies position and texcoord vertex attributes to varying outputs. It also optionally copies a third gl_Color attribute, which sometimes is unnecessary. Until now, we've compiled separate variants of the shader, one of which does this extra copy, and the other of which doesn't. We have done this since 2007. But, the vertex shader runs for a whopping four vertices, and so the cost of copying a single input to output is likely inconsequential. In theory, we could bind one fewer vertex element - but we always bind all three regardless. So, we don't even get that savings. This patch unifies the two, so we always copy the optional color, and save having to compile the variant. It also makes the VS input interface match up with the vertex element state without any dead (unused) input attributes. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-12-21 15:29:25 -08:00
Kenneth Graunke	42d31e0516	st/mesa: Drop dead 'passthrough_fs' field. Dead since 2015 (commit `5142564734`). Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-12-21 15:29:20 -08:00
Bas Nieuwenhuizen	bba5749484	radv: Fix wrongly positioned paren. Trivial. Fixes: `9f0bfbed11` "radv: Work around non-renderable 128bpp compressed 3d textures on GFX9."	2018-12-21 21:06:55 +01:00
Dylan Baker	1e872d1486	docs: add note about using backticks for rbs in gitlab So that gitlab will render the < and > correctly allowing the tag to be copy-n-pasted without additional formatting. Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>	2018-12-21 17:43:56 +00:00
Alex Deucher	516160d717	pci_ids: add new VegaM pci id Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: mesa-stable@lists.freedesktop.org	2018-12-21 11:51:34 -05:00
Roland Scheidegger	171983dc89	gallivm: abort when trying to use non-existing intrinsic Whenever llvm removes an intrinsic (we're using), we're hitting segfaults due to llvm doing calls to address 0 in the jitted code instead. However, Jose figured out we can actually detect this with LLVMGetIntrinsicID(), so use this to abort, so we don't have to wonder what got broken. (Of course, someone still needs to fix the code to no longer use this intrinsic.) Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2018-12-21 17:37:00 +01:00
Roland Scheidegger	f3b1acff48	gallivm: don't use pavg.b intrinsic on llvm >= 6.0 This intrinsic disappeared with llvm 6.0, using it ends up in segfaults (due to llvm issuing call to NULL address in the jitted shaders). Add code doing the same thing as the autoupgrade code in llvm so it can be matched and replaced back with a pavgb. While here, also improve lp_test_format, so it tests both with and without cache (as it was, it tested the cache versions only, whereas cache is actually disabled in llvmpipe, and in any case even with it enabled vertex and geometry shaders wouldn't use it). (Although at least for the unorm8 uncached fetch, the code is still quite different to what llvmpipe is using, since that would use unorm8x16 type, whereas the test code is using unorm8x4 type, hence disabling some intrinsic paths.) Fixes: `6f4083143b` ("gallivm: use llvm jit code for decoding s3tc") Reviewed-by: Jose Fonseca <jfonseca@vmware.com> Tested-by: Michel Dänzer <michel.daenzer@amd.com>	2018-12-21 17:35:05 +01:00
Emil Velikov	a8d020c3dc	travis: meson: port gallium build combinations over This commit adds a number of build combinations: - Gallium Drivers {SWR, RadeonSI, Others} Each one has different LLVM requirements. Building SWR alone is twice as slow as all other drivers combined. - Gallium ST Clover LLVM {5,6,7} Because C++ API changes all the time. Analogous to above building Clover takes as much time as building all other ST combined. - Gallium ST Others Nouveau is used, instead of i915g since meson has explicit target tracking. Meaning that a configure error is thrown if we use i915g with say va, vdpau or others. Note: LLVM prior to 5.0 is intentionally dropped. If needed we can add that later. Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2018-12-13 01:34:59 +00:00
Emil Velikov	39634f2f35	travis: meson: add explicit handling to gallium ST Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2018-12-12 13:52:20 +00:00
Emil Velikov	51318c32fe	travis: meson: explicitly control the DRI loaders Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2018-12-12 13:42:36 +00:00
Emil Velikov	e890aaabed	travis: meson: add unwind handling Signed-off-by: Emil Velikov <emil.velikov@collabora.com>	2018-12-12 13:33:14 +00:00
Emil Velikov	266ae2225e	travis: meson: use FOO_DRIVERS directly It makes for a shorter MESON_OPTIONS and cleaner handling. Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2018-12-12 13:18:54 +00:00
Dylan Baker	31c162ad22	travis: meson: enable unit tests v2: [Emil] pass the argument directly to meson Reviewed-by: Emil Velikov <emil.velikov@collabora.com> (v1) Signed-off-by: Emil Velikov <emil.velikov@collabora.com>	2018-12-11 10:34:51 -08:00
Dylan Baker	116f0fb216	travis: Don't try to read libdrm out of configure.ac Since we're going to delete it shortly Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2018-12-11 11:09:21 -08:00
Dylan Baker	ecf96413bb	travis: meson: use native files to override llvm-config This is the supported way to do this, and should be more robust and reliable. v2: [Emil] - enable backslash escapes - don't hardcode the path - pass the argument directly to meson Reviewed-by: Emil Velikov <emil.velikov@collabora.com> (v1) Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2018-12-11 10:40:25 -08:00
Emil Velikov	81173fd69f	travis: printout llvm-config --version Provides quick and easy feedback. Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Acked-by: Dylan Baker <dylan@pnwbakers.com>	2018-12-13 10:38:20 +00:00
Emil Velikov	de72c1fe6c	travis: meson: print the configured state Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2018-12-12 17:43:07 +00:00
Emil Velikov	7c38d7b7c8	travis: flip to distro xenial, drop sudo false The latter is the default these days and Travis will be removing sudo soonish. Flipping to xenial, allows us to remove a bunch of hacks we have. Plus it prevents us from adding new ones, to workaround what seems like a gcc/binutils bug. For example (from the upcoming meson build): FAILED: ccache c++ -o src/gallium/targets/pipe-loader/pipe_r600.so ... ... src/util/libmesa_util.a ... /usr/lib/x86_64-linux-gnu/libz.so ... src/util/libmesa_util.a(disk_cache.c.o): In function `deflate_and_write_to_disk': _build/../src/util/disk_cache.c:746: undefined reference to `deflateInit_' _build/../src/util/disk_cache.c:765: undefined reference to `deflate' ... As we can see, even though libz.so is explicitly passed after the object that requires it - the linker still fails to see the symbols. Avoid all those situations - flip the switch. Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2018-12-13 11:20:41 +00:00
Emil Velikov	12187550f9	configure: add CXX11_CXXFLAGS to LLVM_CXXFLAGS Seemingly with LLVM7 and GCC 5.0, the former won't properly advertise -std=c++11 and the latter will choke. Add this temporary workaround, otherwise we'll get errors like: In file included from /usr/include/c++/5/type_traits:35:0, from /usr/lib/llvm-7/include/llvm/Support/type_traits.h:18, from /usr/lib/llvm-7/include/llvm/ADT/Optional.h:22, from /usr/lib/llvm-7/include/llvm/ADT/STLExtras.h:20, from /usr/lib/llvm-7/include/llvm/ADT/StringRef.h:13, from /usr/lib/llvm-7/include/llvm/Target/TargetMachine.h:17, from ../../../src/amd/common/ac_llvm_helper.cpp:36: /usr/include/c++/5/bits/c++0x_warning.h:32:2: error: #error This file requires compiler and library support for the ISO C++ 2011 standard. This support must be enabled with the -std=c++11 or -std=gnu++11 compiler options. Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2018-12-13 11:56:40 +00:00
Emil Velikov	f331419f26	glx/test: meson: assorted include fixes Swap '..' with the symbolic inc_glx and add glproto as dependency. That will pull the correct include, effectively fixing the tests on macOS. Fixes: `a47c525f32` ("meson: build glx") Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2018-12-12 19:24:14 +00:00
Emil Velikov	e139d7a8a3	glx: meson: wire up the dispatch-index-check test Accidentally dropped with an earlier commit. Fixes: `4ccb981673` ("meson: Use consistent style for tests") Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2018-12-12 19:07:52 +00:00
Emil Velikov	b44875e2dc	glx: meson: drop includes from a link-only library When producing the final libGL.so/libGLX_mesa.so we only link the local static helper lib (libglx). Thus there's no reason for the includes. Fixes: `a47c525f32` ("meson: build glx") Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2018-12-12 17:55:08 +00:00
Emil Velikov	9527f9ea26	TODO: glx: meson: build dri based glx tests, only with -Dglx=dri The library itself (libGL) is only built when -Dglx=dri, yet its accompanying tests are built even with -Dglx=xlib. Adjust the guards, so we don't build the tests when they are not applicable. v2: - Reword commit message (Dylan) - Drop build_by_default hunk (Dylan) Fixes: `a47c525f32` ("meson: build glx") Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2018-12-12 17:47:36 +00:00
Emil Velikov	2eedb79e1a	pipe-loader: meson: reference correct library The library is called libgalliumvl_stub - note singular. Fixes: `42ea0631f1` ("meson: build clover") Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2018-12-13 04:10:50 +00:00
Emil Velikov	9d10581897	meson: don't require glx/egl/gbm with gallium drivers The gallium drivers do not require a DRI loader. Drop the artificial and unnecessary restriction. Fixes: `af9d276134` ("meson: build libmesa_gallium") Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2018-12-13 03:54:03 +00:00
Emil Velikov	e0dbfc9953	bin/get-pick-list.sh: warn when commit lists invalid sha We had cases where people would list old/invalid sha in the commit. Add a trivial checker to catch those and throw a warning. CC: Juan A. Suarez <jasuarez@igalia.com> CC: Dylan Baker <dylan@pnwbakers.com> CC: mesa-stable@lists.freedesktop.org Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Andres Gomez <agomez@igalia.com>	2018-12-21 14:39:52 +00:00
Emil Velikov	6b296f64af	bin/get-pick-list.sh: rework handling of sha nominations Currently our is_sha_nomination does: - folds any whitespace, attempting to extract sha-like information - checks that at least one of the shas has landed Split it in two and do sha-like validation first. This way, commits with mesa-stable and sha nominations will feature the fixes/revert/etc instead of stable (a) or will be omitted if not applicable for the respective branch (b). Misc examples from 18.3 (a) -[ stable ] `5bc509363b` glx: make xf86vidmode mandatory for direct rendering +[ fixes ] `5bc509363b` glx: make xf86vidmode mandatory for direct rendering (b) -[ stable ] `9a7b319903` anv/query: flush render target before copying results CC: Juan A. Suarez <jasuarez@igalia.com> CC: Dylan Baker <dylan@pnwbakers.com> CC: mesa-stable@lists.freedesktop.org Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Andres Gomez <agomez@igalia.com>	2018-12-21 14:39:34 +00:00
Eric Anholt	17218a0406	vc4: Hook up perf_debug() output to GL_ARB_debug_output as well. This is the right channel to report these things, so that end-users don't need to know each driver's custom debug options.	2018-12-20 11:31:25 -08:00
Rhys Kidd	acc481ad79	vc4: Wire up core pipe_debug_callback This lets the driver use pipe_debug_message() for GL_ARB_debug_output. Signed-off-by: Rhys Kidd <rhyskidd@gmail.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2018-12-20 11:31:19 -08:00
Eric Anholt	ba36312fbd	v3d: Hook up perf_debug() output to GL_ARB_debug_output as well. This is the right channel to report these things, so that end-users don't need to know each driver's custom debug options.	2018-12-20 11:31:19 -08:00
Rhys Kidd	d3991d2472	v3d: Wire up core pipe_debug_callback This lets the driver use pipe_debug_message() for GL_ARB_debug_output. Signed-off-by: Rhys Kidd <rhyskidd@gmail.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2018-12-20 11:31:16 -08:00
Eric Anholt	d80761b8f3	v3d: Drop shadow comparison state from shader variant key. The shadow state is now in the sampler.	2018-12-20 11:29:30 -08:00
Eric Anholt	0e2758daad	v3d: Fix simulator mode on i915 render nodes. i915 render nodes refuse the dumb ioctls, so the simulator would crash on the original non-apitrace shader-db. Replace them with direct i915 calls if we detect that we're on one of their gem fds.	2018-12-20 11:29:30 -08:00
Dylan Baker	0ff7eed289	docs/meson: Recommend not using CFLAGS and friends Because of the many caveats involved, using -Dc_args instead of CFLAGS is recommended both by meson upstream and by us. v2: - Fix typo Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> (v1) Reviewed-by: Eric Anholt <eric@anholt.net>	2018-12-20 11:16:40 -08:00
Samuel Pitoiset	9606310081	radv: enable shaderStorageImageMultisample feature on GFX8+ Untested on older chips. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-12-20 18:01:19 +01:00
Samuel Pitoiset	6b976024a8	radv: add support for FMASK expand Original patch by Dave Airlie. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-12-20 18:01:17 +01:00
Samuel Pitoiset	fa16da53d8	radv: initialize FMASK for images in fully expanded mode The value depends on the number of samples. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-12-20 18:01:15 +01:00
Samuel Pitoiset	65d82c84d2	ac/nir: restrict fmask lookup to image load intrinsics We don't ever want to do the fmask lookup on an atomic or store, the fmask should have been decompressed if the surface has been moved to IMAGE_LAYOUT. Original patch by Dave Airlie. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-12-20 18:01:11 +01:00
Samuel Pitoiset	f45e43e156	spirv: add support for SpvCapabilityStorageImageMultisample Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-12-20 18:01:09 +01:00
Samuel Pitoiset	5b1ec10e4c	radv: compute optimal VM alignment for imported buffers This fixes GPU hangs on GFX9 with dEQP-VK.memory.external_memory_host.bind_image_memory_and_render.with_zero_offset.* Copied from RadeonSI. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-12-20 17:34:04 +01:00
Bas Nieuwenhuizen	9f0bfbed11	radv: Work around non-renderable 128bpp compressed 3d textures on GFX9. Exactly what title says, the new addrlib does not allow the above with certain dimensions that the CTS seems to hit. Work around it by not allowing the app to render to it via compat with other 128bpp formats and do not render to it ourselves during copies. Fixes: `776b911365` "amd/addrlib: update Mesa's copy of addrlib" Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2018-12-20 15:07:20 +01:00
Samuel Pitoiset	5c7935f8fc	radv: fix subpass image transitions with multiviews The driver needs to decompress all image layers if a fast depth/color clear has been performed. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-12-20 13:36:37 +01:00
Samuel Pitoiset	0a7e767e58	radv: drop the amdgpu-skip-threshold=1 workaround for LLVM 8 This workaround has been introduced by `135e4d434f` for fixing DXVK GPU hangs with many games. It is no longer needed since LLVM r345718. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-12-20 12:09:57 +01:00
Samuel Pitoiset	576040f2e5	ac/nir: remove the bitfield_extract workaround for LLVM 8 This workaround has been introduced by `3d41757788` and it is no longer needed since LLVM r346422. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-12-20 09:40:16 +01:00
Iago Toral Quiroga	d6110d4d54	intel/compiler: move nir_lower_bool_to_int32 before nir_lower_locals_to_regs The former expects to see SSA-only things, but the latter injects registers. The assertions in the lowering were not seeing this because they asserted on the bit_size values only, not on the is_ssa field, so add that assertion too. Fixes: `11dc130779` "nir: Add a bool to int32 lowering pass" CC: mesa-stable@lists.freedesktop.org Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-12-20 08:02:44 +01:00
Ilia Mirkin	1250383e36	st/mesa: remove sampler associated with buffer texture in pbo logic A long time ago, when this was first implemented, not having a sampler bound would cause problems on Fermi. I didn't work out the reasons, but the solution was simple -- just put the samplers back in. Since then, regular texturing paths appear to have lost their associated samplers which required a fuller investigation and fix in nouveau. Now that this is done, this code should no longer need a sampler state for fetching texels from a buffer texture. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-12-20 00:27:16 -05:00
Roland Scheidegger	6f4083143b	gallivm: use llvm jit code for decoding s3tc This is (much) faster than using the util fallback. (Note that there are two methods here: one would use a cache, similar to the existing code (although the cache was disabled), except the block decode is done with jit code; the other directly decodes the required pixels. For now don't use the cache (being direct-mapped is suboptimal, but it's difficult to come up with something better which doesn't have too much overhead).) Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2018-12-20 06:03:20 +01:00
Jason Ekstrand	ec1d5841fa	radv/query: Use 1-bit booleans in query shaders Fixes: `44227453ec` "nir: Switch to using 1-bit Booleans for almost..." Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Tested-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-12-19 16:36:40 -06:00
Jason Ekstrand	6896c91c10	radv/query: Add a nir_test_flag helper This is little more than an iadd_imm right now but it will help in the next commit where we refactor things further. Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Tested-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-12-19 16:36:26 -06:00
Eduardo Lima Mitev	c2ebc38052	freedreno/ir3: Handle GL_NONE in get_num_components_for_glformat() An earlier patch that introduced the function failed to handle the case where an image format layout qualifier is not specified, which is allowed on desktop GL profiles. In these cases, nir_variable's image format is GL_NONE, and we don't need to print a debug message for those. Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Rob Clark <robdclark@gmail.com>	2018-12-19 22:49:05 +01:00
Eric Anholt	90818558f0	docs: Add an encouraging note about providing reviews and acks. Across several projects I've seen new contributors say "I wasn't sure if I should provide a review tag since I'm not really an expert in this area." Everyone I know already applies some implicit weighting to reviews from different people, so encourage participation. Reviewed-by: Dylan Baker <dylan@pnwbakers.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2018-12-19 12:49:17 -08:00
Eric Anholt	463df0ffe2	docs: Add a note that MRs should still include any r-b or a-b tags. v2: Mention "Tested-by" too Reviewed-by: Dylan Baker <dylan@pnwbakers.com> (v1) Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2018-12-19 12:48:13 -08:00
Eric Anholt	fcfb7f573c	v3d: Load and store aligned utiles all at once. This calls the expensive uif offset function once per utile, but it still gets us a 212.218% +/- 2.41216% (n=10) win on 1024x1024 glTexImage over calling it on each pixel.	2018-12-19 10:27:26 -08:00
Eric Anholt	7c56b7a6ea	v3d: Add a fallthrough path for utile load/store of 32 byte lines. Now that V3D has 8 byte per pixel formats exposed, we've got stride==32 utiles to load and store. Just handle them through the non-NEON paths for now.	2018-12-19 10:27:26 -08:00
Eric Anholt	f6a0f4f41e	vc4: Move the utile load/store functions to a header for reuse by v3d. These implementations of whole-utile load/stores would be the same for v3d, though the layouts of blocks of utiles have changed.	2018-12-19 10:27:26 -08:00
Eric Anholt	8ee752194c	v3d: Implement texture_subdata to reduce teximage upload copies. This lets us store the non-PBO glTexImage data directly into the tiled image without making an extra untiled memcpy for the gallium transfer. Improves 1024x1024 TexImage perf by ~19%, mostly from not thrashing around in the kernel mapping and unmapping the transfer's temporary area.	2018-12-19 10:27:26 -08:00
Eric Anholt	e09d8aecb4	v3d: Remove dead prototypes for load/store utile functions.	2018-12-19 10:27:26 -08:00
Eric Anholt	fcf881adda	v3d: Don't try to create shadow tiled temporaries for 1D textures. They're raster order anyway, so we'd assertion fail along with wasting bandwidth. Fixes: `6ad9e8690d` ("v3d: Add support for texturing from linear.")	2018-12-19 10:27:21 -08:00
Eric Anholt	b5adc744ba	v3d: Fix check for TFU job completion in the simulator. We're waiting for the jobs-completed count to increment (with wrapping), not to reach its starting state. This mostly ended up working out because the next v3d_hw_tick() for a submit CL would end up doing the TFU operation first, but it did fail when a blit was used for glReadPixels() at the end of a test. Fixes: `ee0549ff9a` ("v3d: Add the V3D TFU submit interface to the simulator.")	2018-12-19 10:26:04 -08:00
Eric Anholt	365728dc5d	v3d: Put the dst bo first in the list of BOs for TFU calls. In the UAPI, the first BO is the destination, and the one the kernel should do an exclusive reservation on. Currently we only do exclusive reservations, anyway. However, in the simulator path I was only copying back the "destination" BO (actually src in this case), and this caused regressions once I fixed the simulator to actually complete TFU before returning (since otherwise, the TFU op would happen at the start of the next CL submit and the draw would get the right contents). Fixes: `976ea90bdc` ("v3d: Add support for using the TFU to do some blits.")	2018-12-19 10:26:04 -08:00
Caio Marcelo de Oliveira Filho	947f7b452a	nir: properly find the entry to keep in copy_prop_vars When copy propagation handles a store/copy, it iterates the current copy entries to remove aliases, but keeps the "equal" entry (if exists) to be updated. The removal step may swap the entries around (to ensure there are no holes), invalidating previous iteration pointers. The bug was saving such pointer to use later. Change the code to first perform the removals and then find the remaining right entry. This was causing updates to be lost since they were being made to an entry that was not part of the current copies. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108624 Fixes: `b3c6146925` "nir: Copy propagation between blocks" Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-12-19 09:33:36 -08:00
Michel Dänzer	9d8395bf0e	winsys/amdgpu: Pull in LLVM CFLAGS Fixes build failure if the LLVM headers aren't in a standard include directory. Fixes: `ec22dd34c8` "radeonsi: move SI_FORCE_FAMILY functionality to winsys" Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2018-12-19 17:54:18 +01:00
Caio Marcelo de Oliveira Filho	0ddc911f4d	nir: properly clear the entry sources in copy_prop_vars When updating a copy entry source value from a "non-SSA" (the data come from a copy instruction) to a "SSA" (the data or parts of it come from SSA values), it was possible to hold invalid data in ssa[0] depending on the writemask. Because of the union, ssa[0] could contain a pointer to a nir_deref_instr left over from previous non-SSA usage. Change the code to clean up the array before use to avoid leaving invalid data around. Fixes: `62332d139c` "nir: Add a local variable-based copy propagation pass" Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-12-19 08:35:48 -08:00
Eric Engestrom	0e4c7c3d5b	docs: format code blocks a bit nicely Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>	2018-12-19 16:32:30 +00:00
Eric Engestrom	b0319d0768	docs: add meson cross compilation instructions Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2018-12-19 16:31:51 +00:00
Gurchetan Singh	b45aa6290b	virgl: move resource creation / import / destruction to common code We can remove some duplicated code. Reviewed-by: Elie Tournier <elie.tournier@collabora.com>	2018-12-19 13:29:16 +01:00
Gurchetan Singh	1d3d311133	virgl: move resource metadata into base resource A resource is just a buffer with some metadata. Reviewed-by: Elie Tournier <elie.tournier@collabora.com>	2018-12-19 13:29:16 +01:00
Gurchetan Singh	db77573d7b	virgl: modify how we handle GL_MAP_FLUSH_EXPLICIT_BIT Previously, we ignored the glUnmap(..) operation and flushed before we flush the cbuf. Now, let's just flush the data when we unmap. Neither method is optimal, for example: glMapBufferRange(.., 0, 100, GL_MAP_FLUSH_EXPLICIT_BIT) glFlushMappedBufferRange(.., 25, 30) glFlushMappedBufferRange(.., 65, 70) We'll end up flushing 25 --> 70. Maybe we can fix this later. v2: Add fixme comment in the code (Elie) Reviewed-by: Elie Tournier <elie.tournier@collabora.com>	2018-12-19 13:29:16 +01:00
Gurchetan Singh	11939f6fa2	virgl: make virgl_buffers use resource helpers We can reuse the helpers we created. Reviewed-by: Elie Tournier <elie.tournier@collabora.com>	2018-12-19 13:29:16 +01:00
Gurchetan Singh	4e2c77cd51	virgl: make transfer code with PIPE_BUFFER targets util_format_get_blocksize returns 1 for R8 formats (all PIPE_BUFFERs are R8). Reviewed-by: Elie Tournier <elie.tournier@collabora.com>	2018-12-19 13:29:16 +01:00
Gurchetan Singh	174f530008	virgl: consolidate transfer code We could allocate and destroy transfers in one place. v2: Keep l_stride around. Reviewed-by: Elie Tournier <elie.tournier@collabora.com>	2018-12-19 13:29:16 +01:00
Gurchetan Singh	13626b46f1	virgl: store layer_stride in metadata Reviewed-by: Elie Tournier <elie.tournier@collabora.com>	2018-12-19 13:29:16 +01:00
Gurchetan Singh	2a44acc83b	virgl: move vrend_get_tex_image_offset to common code Will be reused. Reviewed-by: Elie Tournier <elie.tournier@collabora.com>	2018-12-19 13:29:16 +01:00
Gurchetan Singh	f749229a8e	virgl: move virgl_resource_layout to common code Will be reused. Reviewed-by: Elie Tournier <elie.tournier@collabora.com>	2018-12-19 13:29:16 +01:00
Gurchetan Singh	a63da9c062	virgl: move texture metadata to common code Will be reused. Reviewed-by: Elie Tournier <elie.tournier@collabora.com>	2018-12-19 13:29:16 +01:00
Gurchetan Singh	6e7d396ad3	virgl: remove unnecessary code With commit 89b479, we moved to tracking buffer cleanliness when binding. TEST=dEQP-GLES31.functional.image_load_store.buffer.load_store.r32ui Reviewed-by: Elie Tournier <elie.tournier@collabora.com>	2018-12-19 13:29:16 +01:00
Gurchetan Singh	6d13d1aadb	virgl: texture_transfer_pool --> transfer_pool It's used for all types of resources. Reviewed-by: Elie Tournier <elie.tournier@collabora.com>	2018-12-19 13:29:16 +01:00
Nicolai Hähnle	d73a25f2c0	radeonsi: const-ify the si_query_ops Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-12-19 12:02:07 +01:00
Nicolai Hähnle	c85b0dea0a	radeonsi: split perfcounter queries from si_query_hw Remove a level of indirection to make the code more explicit -- should make it easier to follow what's going on. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-12-19 12:02:04 +01:00
Nicolai Hähnle	e0f0d3675d	radeonsi: factor si_query_buffer logic out of si_query_hw This is a move towards using composition instead of inheritance for different query types. This change weakens out-of-memory error reporting somewhat, though this should be acceptable since we didn't consistently report such errors in the first place. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-12-19 12:02:01 +01:00
Nicolai Hähnle	0fc6e573dd	radeonsi: move query suspend logic into the top-level si_query struct Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-12-19 12:01:59 +01:00
Nicolai Hähnle	e2b9329f17	radeonsi: move remaining perfcounter code into si_perfcounter.c Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-12-19 12:01:57 +01:00
Nicolai Hähnle	7dd289d9e4	radeonsi: track constant buffer bind history in si_pipe_set_constant_buffer Other callers of si_set_constant_buffer don't need it. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-12-19 12:01:54 +01:00
Nicolai Hähnle	829d417914	radeonsi: use si_set_rw_shader_buffer for setting streamout buffers Reduce the number of places that encode buffer descriptors. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-12-19 12:01:52 +01:00
Nicolai Hähnle	ce785f5ffd	radeonsi: add an si_set_rw_shader_buffer convenience function Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-12-19 12:01:50 +01:00
Nicolai Hähnle	556c4c42b7	radeonsi: avoid using hard-coded SI_NUM_RW_BUFFERS Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-12-19 12:01:48 +01:00
Nicolai Hähnle	1e49d72317	radeonsi: show the fixed function TCS in debug dumps This is rather important for merged VS/TCS as LSHS shaders... Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-12-19 12:01:45 +01:00
Nicolai Hähnle	6e67e79de4	radeonsi: const-ify si_set_tesseval_regs Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-12-19 12:01:42 +01:00
Nicolai Hähnle	5c841a1b1e	radeonsi: rename SI_RESOURCE_FLAG_FORCE_TILING to clarify its purpose Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-12-19 12:01:39 +01:00
Nicolai Hähnle	0d58dcc3cf	radeonsi: don't set RAW_WAIT for CP DMA clears There is never a read-after-write hazard because the command doesn't read. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-12-19 12:01:34 +01:00
Nicolai Hähnle	23af72af25	radeonsi/gfx9: use SET_UCONFIG_REG_INDEX packets when available Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-12-19 12:01:32 +01:00
Nicolai Hähnle	f18b2ac0db	radeonsi: add si_init_draw_functions and make some functions static Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-12-19 12:01:30 +01:00
Nicolai Hähnle	555cb668cc	radeonsi: extract declare_vs_blit_inputs Prepare for some later refactoring. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-12-19 12:01:27 +01:00
Nicolai Hähnle	ec22dd34c8	radeonsi: move SI_FORCE_FAMILY functionality to winsys This helps some debugging cases by initializing addrlib with slightly more appropriate settings. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-12-19 12:01:25 +01:00
Nicolai Hähnle	0ef263d62f	ac/surface: 3D and cube surfaces are never displayable Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-12-19 12:01:22 +01:00
Nicolai Hähnle	8efaffa893	amd/common: add i1 special case to ac_build_{inclusive,exclusive}_scan Allow for a unified but efficient treatment of adding a bitmask over a wave or an entire threadgroup. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-12-19 12:01:19 +01:00
Nicolai Hähnle	300876a9a7	amd/common: scan/reduce across waves of a workgroup Order-aware scan/reduce can trade-off LDS traffic for external atomics memory traffic in producer/consumer compute shaders. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-12-19 12:01:17 +01:00
Nicolai Hähnle	3963402fd3	amd/common: add ac_build_ifcc Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-12-19 12:01:15 +01:00
Nicolai Hähnle	3c77f26ccc	amd/common: whitespace fixes Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-12-19 12:01:12 +01:00
Nicolai Hähnle	76c5ad1995	amd/sid_tables: add additional python3 compatibility imports This happened to bite me while doing some experiments. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-12-19 12:01:09 +01:00
Nicolai Hähnle	6f0322b16a	r600: remove redundant semicolon Reviewed-By: Gert Wollny <gert.wollny@collabora.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-12-19 12:00:49 +01:00
Nicolai Hähnle	7230cb8f2b	ddebug: always flush when requested, even when hang detection is disabled Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-12-19 11:59:18 +01:00
Nicolai Hähnle	539fdc49f1	ddebug: simplify watchdog loop and fix crash in the no-timeout case The following race condition could occur in the no-timeout case: (1) API thread: dd_before_draw; u_threaded_context draw; dd_after_draw (add to dctx->records, signal watchdog). (2) Watchdog: dump & destroy record. (3) Gallium thread: execute draw; dd_after_draw_async; use-after-free! Alternatively, the same scenario would assert in a debug build when destroying the record because record->driver_finished has not signaled. Fix this and simplify the logic at the same time by - handing the record pointers off to the watchdog thread before each draw call and - waiting on the driver_finished fence in the watchdog thread Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-12-19 11:59:10 +01:00
Tapani Pälli	3627c9efff	anv/android: turn on VK_ANDROID_external_memory_android_hardware_buffer Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-12-19 09:38:42 +02:00
Tapani Pälli	3dc424a4f4	anv: ignore VkSamplerYcbcrConversion on non-yuv formats This fulfills a requirement for clients that want to utilize same code path for images with external formats (VK_FORMAT_UNDEFINED) and "regular" RGBA images where format is known. This is similar to how OES_EGL_image_external works. To support this, we allow color conversion samplers for non-YUV formats but skip setting up conversion when format does not have can_ycbcr flag set. v2: add comment and bundle can_ycbcr to the existing break condition (Lionel) Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-12-19 09:38:41 +02:00
Tapani Pälli	a7b7772cfb	anv: support VkSamplerYcbcrConversionInfo in vkCreateImageView If a conversion struct was passed, then initialize view using format from the conversion structure. v2: use vk_format directly from the anv_format struct v3: added some assertions (Lionel) Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-12-19 09:38:41 +02:00
Tapani Pälli	bb0721aea4	anv: add VkFormat field as part of anv_format Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-12-19 09:38:41 +02:00
Tapani Pälli	c070b0e25f	anv: support VkExternalFormatANDROID in vkCreateSamplerYcbcrConversion If external format is used, we store the external format identifier in conversion to be used later when creating VkImageView. v2: rebase to `b43f955037` changes v3: added assert, ignore components when creating external format conversion (Lionel) Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-12-19 09:38:41 +02:00
Tapani Pälli	f1654fa7e3	anv/android: support creating images from external format Since we don't know the exact format at creation time, some initialization is done only when bound with memory in vkBindImageMemory. v2: demand dedicated allocation in vkGetImageMemoryRequirements2 if image has external format v3: refactor prepare_ahw_image, support vkBindImageMemory2, calculate stride correctly for rgb(x) surfaces, rename as 'resolve_ahw_image' v4: rebase to `b43f955037` changes v5: add some assertions to verify input correctness (Lionel) Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-12-19 09:38:41 +02:00
Tapani Pälli	517103abf1	anv/android: add ahardwarebuffer external memory properties v2: have separate memory properties for android, set usage flags for buffers correctly v3: code cleanup (Jason) + limit maxArrayLayers to 1 for AHardwareBuffer based images v4: rebase to `b43f955037` changes Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-12-19 09:38:41 +02:00
Tapani Pälli	c79a528d2b	anv/android: support import/export of AHardwareBuffer objects v2: add support for non-image buffers (AHARDWAREBUFFER_FORMAT_BLOB) v3: properly handle usage bits when creating from image v4: refactor, code cleanup (Jason) v5: rebase to `b43f955037` changes, initialize bo flags as ANV_BO_EXTERNAL (Lionel) v6: add assert that anv_bo_cache_import succeeds, add comment about multi-bo support to clarify current implementation (Lionel) Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-12-19 09:38:41 +02:00
Tapani Pälli	5c65c60d6c	anv: refactor, remove else block in AllocateMemory This makes it cleaner to introduce more cases where we import memory from different types of external memory buffers. Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-12-19 09:38:41 +02:00
Tapani Pälli	884fc90fde	anv: add anv_ahw_usage_from_vk_usage helper function v2: rebase to `b43f955037` changes Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-12-19 09:38:41 +02:00
Tapani Pälli	1e6a44400a	anv/android: add GetAndroidHardwareBufferPropertiesANDROID Use the anv_format address in formats table as implementation-defined external format identifier for now. When adding YUV format support this might need to change. v2: code cleanup (Jason) v3: set anv_format address as identifier v4: setup suggestedYcbcrModel and suggested[X\|Y]ChromaOffset as expected for HAL_PIXEL_FORMAT_NV12_Y_TILED_INTEL v5: set linear tiling for GPU_DATA_BUFFER usage, add comment about multi-bo support to clarify current implementation (Lionel) Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-12-19 09:38:41 +02:00
Tapani Pälli	aa94e01bfe	anv: add from/to helpers with android and vulkan formats v2: handle R8G8B8X8 as R8G8B8_UNORM (Jason) v3: add HAL_PIXEL_FORMAT_NV12_Y_TILED_INTEL, we make it define for now to avoid direct dependency to minigbm headers Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-12-19 09:38:41 +02:00
Tapani Pälli	c1f15a0a1a	anv: make anv_get_image_format_features public This will be utilized later by GetAndroidHardwareBufferPropertiesANDROID. Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-12-19 09:38:41 +02:00
Tapani Pälli	8a469fd335	anv: refactor make_surface to use data from anv_image Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-12-19 09:38:41 +02:00
Tapani Pälli	2a98e5bbb9	anv: add create_flags as part of anv_image This will make it possible for next patch to rip anv_image_create_info out from make_surface function. Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-12-19 09:38:41 +02:00
Ian Romanick	96c4b135e3	nir/algebraic: Don't put quotes around floating point literals The quotation marks around 1.0 cause it to be treated as a string instead of a floating point value. The generator then treats it as an arbitrary variable replacement, so any iand involving a ('ineg', ('b2i', a)) matches. v2: Remove misleading comment about sized literals (suggested by Timothy). Add assertion that the name of a variable is entirely alphabetic (suggested by Jason). Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Tested-by: Timothy Arceri <tarceri@itsqueeze.com> [v1] Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> [v1] Fixes: `6bcd2af086` ("nir/algebraic: Add some optimizations for D3D-style Booleans") Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109075	2018-12-18 23:28:31 -08:00
Vinson Lee	0f7ba5758b	meson: Fix libsensors detection. Fixes: `5e71efef44` ("meson: Add lmsensors support") Signed-off-by: Vinson Lee <vlee@freedesktop.org> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2018-12-18 19:24:01 -08:00
Vinson Lee	84f39e5971	meson: Fix typo. Fixes: `6b4c7047d5` ("meson: build gallium nine state_tracker") Signed-off-by: Vinson Lee <vlee@freedesktop.org> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2018-12-18 19:14:11 -08:00
Sagar Ghuge	933c44bcc4	nir: Add a new lowering option to lower 3D surfaces from txd to txl. Tested on gen9. v2: Rename lower_txd_3d_surafaces flag to lower_txd_3d (Jason Ekstrand) Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-12-18 13:44:09 -08:00
Christian Gmeiner	7ea8e54dd6	meson: add etnaviv to the tools option Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2018-12-18 21:50:58 +01:00
Adam Jackson	e36d136102	specs: Bump GLX_MESA_query_renderer to version 9 Note that we have an official GL extension number, pick the appropriate section of the GLX spec to modify, and add changelog. Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Signed-off-by: Adam Jackson <ajax@redhat.com>	2018-12-18 15:46:10 -05:00
Adam Jackson	9e8332ebc2	specs: Remove GLX_RENDERER_ID_MESA from GLX_MESA_query_renderer This has not even had an attempt at implementation. If you asked for renderer 0 - which, the spec implies, should always work - then dri2_convert_glx_attribs would fail, we'd silently fall back to creating an indirect context, and xserver would also not recognize the attribute and would throw BadValue at you. The API would be difficult to use in any case, since there's no way to enumerate how many renderers the screen has. I'd be tempted to add that by defining: glXQueryRendererIntegerMESA(dpy, screen, /* renderer = */ -1, 0, &value); to return the number of renderers, but a new entrypoint might be cleaner. Still, better to not specify it at all than to lie about it. Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Signed-off-by: Adam Jackson <ajax@redhat.com>	2018-12-18 15:46:10 -05:00
Adam Jackson	c63c391756	specs: Remove GLES profile interaction text from GLX_MESA_query_renderer In one place we say, if GLES isn't supported then the profile version will be 0.0. Then later we say, if the GLES profile extension isn't supported then GLX_RENDERER_OPENGL_ES_PROFILE_VERSION_MESA is not mentioned in the spec. A strict reading of the latter would mean that GLX_RENDERER_OPENGL_ES_PROFILE_VERSION_MESA is not a recognized token, and the query should instead return False. The implementation does not check for the GLES profile extensions, and the additional complexity doesn't seem worth it. Removing the interaction text makes the spec match the implementation. Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Signed-off-by: Adam Jackson <ajax@redhat.com>	2018-12-18 15:46:10 -05:00
Eduardo Lima Mitev	5820e63418	freedreno/ir3: Make imageStore use num components from image format emit_intrinsic_store_image() is always using 4 components when collecting registers for the value. When image has fewer than 4 components (e.g., r32f, rg32i, etc.) this results in extra mov instructions. This patch uses the actual number of components from the image format. For example, in a shader like: layout (r32f, binding=0) writeonly uniform imageBuffer u_image; ... void main(void) { ... imageStore (u_image, some_offset, vec4(1.0)); ... } the instruction count is reduced by at least 3 instructions (note image format is r32f, 1 component only). This obviously reduces register pressure as well. v2: - Added support for image formats from NV_image_format extension (Ilia Mirkin). - Return 4 components by default instead of asserting. (Rob Clark). v3: Added more missing formats (Ilia Mirkin). v4: Added a debug message for unknown image formats (Rob Clark). Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Rob Clark <robdclark@gmail.com>	2018-12-18 21:15:20 +01:00
Jason Ekstrand	5dad1abfdc	nir/dead_write_vars: Get modes directly from derefs Instead of going all the way back to the variable, just look at the deref. The modes are guaranteed to be the same by nir_validate whenever the variable can be found. This fixes clear_unused_for_modes for derefs that don't have an accessible variable. Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-12-18 13:13:28 -06:00
Jason Ekstrand	fa40a58fd9	nir/copy_prop_vars: Get modes directly from derefs Instead of going all the way back to the variable, just look at the deref. The modes are guaranteed to be the same by nir_validate whenever the variable can be found. This fixes apply_barrier_for_modes for derefs that don't have an accessible variable. Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-12-18 13:13:28 -06:00
Jason Ekstrand	cf7fb39805	nir/lower_wpos_center: Look at derefs for modes This is instead of looking all the way back to the variable which may not exist for all derefs. This makes this code properly ignore casts with modes other than the mode[s] we care about (where casts aren't allowed). Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-12-18 13:13:28 -06:00
Jason Ekstrand	867fe35a16	nir/lower_io_to_scalar: Look at derefs for modes This is instead of looking all the way back to the variable which may not exist for all derefs. This makes this code properly ignore casts with modes other than the mode[s] we care about (where casts aren't allowed). Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-12-18 13:13:28 -06:00
Jason Ekstrand	3fe0363dda	nir/lower_io_arrays_to_elements: Look at derefs for modes This is instead of looking all the way back to the variable which may not exist for all derefs. This makes this code properly ignore casts with modes other than the mode[s] we care about (where casts aren't allowed). Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-12-18 13:13:28 -06:00
Jason Ekstrand	8cc0f92492	nir/linking_helpers: Look at derefs for modes This is instead of looking all the way back to the variable which may not exist for all derefs. This makes this code properly ignore casts with modes other than the mode[s] we care about (where casts aren't allowed). Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-12-18 13:13:28 -06:00
Jason Ekstrand	8410cf66d7	nir/propagate_invariant: Skip unknown vars If we can't find the variable from the deref, just assume it isn't invariant and continue on. This can happen if, for instance, we're writing to a deref that points into an SSBO. Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-12-18 13:13:28 -06:00
Ian Romanick	29e4b949b4	Revert "nir/lower_indirect: Bail early if modes == 0" "There's no point in walking the program if we're never going to actually lower anything." Except we might lower compacted local arrays. In that case, modes will be 0, but there is still lowering to be done. This reverts commit `7f75cf2a94`. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109081 Suggested-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Tested-by: Clayton Craft <clayton.a.craft@intel.com> Cc: Kenneth Graunke <kenneth@whitecape.org>	2018-12-18 10:47:54 -08:00
Lucas Stach	433ca3127a	st/dri: replace format conversion functions with single mapping table Each time I have to touch the buffer import/export functions in the dri state tracker I get lost in the maze of functions converting between DRI_IMAGE_FOURCC, DRI_IMAGE_FORMAT, DRI_IMAGE_COMPONENTS and pipe format. Rip it out and replace by a single table, which defines the correspondence between the different representations. Also this now stores all the known representations in the __DRIimageRec, to avoid the loss of information we currently have when importing a buffer with a fourcc, which doesn't have a corresponding dri format. Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-12-18 19:19:45 +01:00
Lucas Stach	67174d40f1	st/dri: allow both render and sampler compatible dma-buf formats Currently all the EGL APIs are missing a way to specify how an imported dma-buf is intended to be used. Demanding the format to be both usable for sampling and rendering artificially restricts the list of formats a driver is able to import. Looking at how the Intel driver implements those DRI2 image APIs it doesn't distinguish between render or sampler compatible formats. So this patch aligns behavior between Intel and Gallium based drivers. Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-12-18 19:19:40 +01:00
Lucas Stach	a3e592e839	etnaviv: use surface format directly There is no need to do the detour over the resource behind the surface to get the format. Use the surface format directly. Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>	2018-12-18 19:07:10 +01:00
Dylan Baker	7a90886921	meson: Add toggle for glx-direct GNU Hurd needs to turn off glx-direct; rather than special-casing it, we'll just add a toggle. CC: 18.3 <mesa-stable@lists.freedesktop.org> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2018-12-18 09:20:53 -08:00
Dylan Baker	8c77f4c76d	meson: Add support for gnu hurd CC: 18.3 <mesa-stable@lists.freedesktop.org> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2018-12-18 09:20:49 -08:00
Dylan Baker	6cf5f25bc5	meson: remove duplicate definition Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2018-12-18 09:18:12 -08:00
Dylan Baker	e430a034b9	meson: Fix ppc64 little endian detection Old versions of meson returned ppc64le as the cpu_family for little endian power8 cpus; versions >=0.48 don't do this, so the check wouldn't work in that case. This generalizes the check to work for both old and new versions of meson. Fixes: `34bbb24ce7` ("meson: Add support for ppc assembly/optimizations") Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2018-12-18 09:17:54 -08:00
Jason Ekstrand	3feda3cf35	anv: Bump the patch version to 96 Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-12-18 09:40:46 -06:00
Kenneth Graunke	3c71ba3baa	i965: Don't override subslice count to 4 on Gen11. Gen9-10 have fewer than 4 subslices per slice, so they need this to be rounded up. Gen11 isn't documented as needing this hack, and it can also have more than 4 subslices, so the hack actually can break things. Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>	2018-12-17 14:03:45 -08:00
Ian Romanick	af07141b33	intel/compiler: More peephole_select for pre-Gen6 No shader-db changes on any Gen6+ platform. All of the shaders with cycles hurt by more than ~2% are from Master of Orion. All of the shaders have instructions helped. It looks like the pass enables some control flow to be converted to bcsels, then the scheduler does dumb things. These are new shaders (just added before doing this shader-db run), so there's probably some low-hanging fruit. Iron Lake total instructions in shared programs: 8214327 -> 8213684 (<.01%) instructions in affected programs: 84469 -> 83826 (-0.76%) helped: 114 HURT: 26 helped stats (abs) min: 2 max: 18 x̄: 7.75 x̃: 9 helped stats (rel) min: 0.17% max: 13.73% x̄: 2.52% x̃: 1.05% HURT stats (abs) min: 2 max: 20 x̄: 9.23 x̃: 8 HURT stats (rel) min: 0.70% max: 2.48% x̄: 1.66% x̃: 1.61% 95% mean confidence interval for instructions value: -5.87 -3.32 95% mean confidence interval for instructions %-change: -2.32% -1.17% Instructions are helped. total cycles in shared programs: 187736850 -> 187749314 (<.01%) cycles in affected programs: 506750 -> 519214 (2.46%) helped: 104 HURT: 36 helped stats (abs) min: 2 max: 72 x̄: 21.96 x̃: 16 helped stats (rel) min: 0.02% max: 6.16% x̄: 0.97% x̃: 0.63% HURT stats (abs) min: 4 max: 1402 x̄: 409.67 x̃: 40 HURT stats (rel) min: 0.33% max: 23.12% x̄: 5.79% x̃: 1.39% 95% mean confidence interval for cycles value: 28.32 149.74 95% mean confidence interval for cycles %-change: -0.07% 1.61% Inconclusive result (%-change mean confidence interval includes 0). 
GM45 total instructions in shared programs: 5044014 -> 5043652 (<.01%) instructions in affected programs: 46751 -> 46389 (-0.77%) helped: 63 HURT: 13 helped stats (abs) min: 2 max: 29 x̄: 7.65 x̃: 9 helped stats (rel) min: 0.17% max: 13.73% x̄: 2.93% x̃: 1.04% HURT stats (abs) min: 2 max: 20 x̄: 9.23 x̃: 8 HURT stats (rel) min: 0.66% max: 2.35% x̄: 1.58% x̃: 1.52% 95% mean confidence interval for instructions value: -6.54 -2.99 95% mean confidence interval for instructions %-change: -3.04% -1.28% Instructions are helped. total cycles in shared programs: 128143042 -> 128150188 (<.01%) cycles in affected programs: 324564 -> 331710 (2.20%) helped: 57 HURT: 19 helped stats (abs) min: 6 max: 74 x̄: 30.70 x̃: 32 helped stats (rel) min: 0.08% max: 4.74% x̄: 1.22% x̃: 0.81% HURT stats (abs) min: 10 max: 1400 x̄: 468.21 x̃: 60 HURT stats (rel) min: 0.56% max: 19.94% x̄: 5.80% x̃: 1.70% 95% mean confidence interval for cycles value: 6.90 181.15 95% mean confidence interval for cycles %-change: -0.52% 1.59% Inconclusive result (%-change mean confidence interval includes 0). Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-12-17 13:47:06 -08:00
Ian Romanick	378f996771	nir/opt_peephole_select: Don't peephole_select expensive math instructions On some GPUs, especially older Intel GPUs, some math instructions are very expensive. On those architectures, don't reduce flow control to a csel if one of the branches contains one of these expensive math instructions. This prevents a bunch of cycle count regressions on pre-Gen6 platforms with a later patch (intel/compiler: More peephole select for pre-Gen6). v2: Remove stray #if block. Noticed by Thomas. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Thomas Helland <thomashelland90@gmail.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-12-17 13:47:06 -08:00
Ian Romanick	8fb8ebfbb0	intel/compiler: More peephole select Shader-db results: The one shader hurt for instructions is a compute shader that had both spills and fills hurt. v2: Fix typo in comment noticed by Caio. v3: Fix inverted condition in brw_nir.c. Noticed by Lionel. Skylake, Broadwell, and Haswell had similar results. (Skylake shown) total instructions in shared programs: 15072761 -> 15047884 (-0.17%) instructions in affected programs: 895539 -> 870662 (-2.78%) helped: 3623 HURT: 1 helped stats (abs) min: 1 max: 181 x̄: 6.89 x̃: 4 helped stats (rel) min: 0.10% max: 25.00% x̄: 3.93% x̃: 3.20% HURT stats (abs) min: 92 max: 92 x̄: 92.00 x̃: 92 HURT stats (rel) min: 1.92% max: 1.92% x̄: 1.92% x̃: 1.92% 95% mean confidence interval for instructions value: -7.10 -6.63 95% mean confidence interval for instructions %-change: -4.03% -3.82% Instructions are helped. total cycles in shared programs: 369738930 -> 369535732 (-0.05%) cycles in affected programs: 68027851 -> 67824653 (-0.30%) helped: 2609 HURT: 1035 helped stats (abs) min: 1 max: 4508 x̄: 181.44 x̃: 77 helped stats (rel) min: <.01% max: 71.31% x̄: 9.14% x̃: 5.47% HURT stats (abs) min: 1 max: 33336 x̄: 261.04 x̃: 20 HURT stats (rel) min: <.01% max: 47.61% x̄: 2.93% x̃: 1.47% 95% mean confidence interval for cycles value: -96.43 -15.09 95% mean confidence interval for cycles %-change: -6.07% -5.36% Cycles are helped. 
total spills in shared programs: 10158 -> 10159 (<.01%) spills in affected programs: 166 -> 167 (0.60%) helped: 1 HURT: 1 total fills in shared programs: 22105 -> 22116 (0.05%) fills in affected programs: 837 -> 848 (1.31%) helped: 4 HURT: 1 Ivy Bridge total instructions in shared programs: 12021190 -> 11990256 (-0.26%) instructions in affected programs: 910561 -> 879627 (-3.40%) helped: 3344 HURT: 18 helped stats (abs) min: 1 max: 99 x̄: 9.29 x̃: 6 helped stats (rel) min: 0.11% max: 31.18% x̄: 5.19% x̃: 3.31% HURT stats (abs) min: 2 max: 20 x̄: 7.89 x̃: 6 HURT stats (rel) min: 0.70% max: 2.59% x̄: 1.63% x̃: 1.70% 95% mean confidence interval for instructions value: -9.49 -8.91 95% mean confidence interval for instructions %-change: -5.32% -4.98% Instructions are helped. total cycles in shared programs: 179077826 -> 178570196 (-0.28%) cycles in affected programs: 63205667 -> 62698037 (-0.80%) helped: 2767 HURT: 620 helped stats (abs) min: 1 max: 7531 x̄: 217.58 x̃: 88 helped stats (rel) min: <.01% max: 75.86% x̄: 9.59% x̃: 6.09% HURT stats (abs) min: 1 max: 31255 x̄: 152.27 x̃: 11 HURT stats (rel) min: <.01% max: 36.36% x̄: 2.77% x̃: 0.58% 95% mean confidence interval for cycles value: -173.94 -125.81 95% mean confidence interval for cycles %-change: -7.68% -6.97% Cycles are helped. Sandy Bridge total instructions in shared programs: 10852569 -> 10843758 (-0.08%) instructions in affected programs: 235803 -> 226992 (-3.74%) helped: 800 HURT: 0 helped stats (abs) min: 1 max: 88 x̄: 11.01 x̃: 8 helped stats (rel) min: 0.11% max: 23.08% x̄: 4.69% x̃: 3.36% 95% mean confidence interval for instructions value: -11.93 -10.10 95% mean confidence interval for instructions %-change: -4.99% -4.39% Instructions are helped. 
total cycles in shared programs: 154732047 -> 154608941 (-0.08%) cycles in affected programs: 4063110 -> 3940004 (-3.03%) helped: 606 HURT: 253 helped stats (abs) min: 1 max: 2524 x̄: 227.93 x̃: 62 helped stats (rel) min: 0.02% max: 39.24% x̄: 4.36% x̃: 1.81% HURT stats (abs) min: 1 max: 1966 x̄: 59.36 x̃: 11 HURT stats (rel) min: 0.02% max: 67.10% x̄: 3.22% x̃: 0.67% 95% mean confidence interval for cycles value: -170.49 -116.13 95% mean confidence interval for cycles %-change: -2.61% -1.65% Cycles are helped. No change on Iron Lake or GM45. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-12-17 13:47:06 -08:00
Ian Romanick	09b7e1d8e4	nir/opt_peephole_select: Don't try to remove flow control around indirect loads That flow control may be trying to avoid invalid loads. On at least some platforms, those loads can also be expensive. No shader-db changes on any Intel platform (even with the later patch "intel/compiler: More peephole select"). v2: Add a 'indirect_load_ok' flag to nir_opt_peephole_select. Suggested by Rob. See also the big comment in src/intel/compiler/brw_nir.c. v3: Use nir_deref_instr_has_indirect instead of deref_has_indirect (from nir_lower_io_arrays_to_elements.c). v4: Fix inverted condition in brw_nir.c. Noticed by Lionel. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-12-17 13:47:06 -08:00
Ian Romanick	4cd1a0be76	i965/vec4: Propagate conditional modifiers from more compares to other compares If there is a CMP.NZ that compares a single component (via a .zzzz swizzle, for example) with 0, it can propagate its conditional modifier back to a previous CMP that writes only that component. The specific case that I saw was: cmp.l.f0(8) g42<1>.xF g61<4>.xF (abs)g18<4>.zF ... cmp.nz.f0(8) null<1>D g42<4>.xD 0D In this case we can just delete the second CMP. No changes on Broadwell or Skylake because they do not use the vec4 backend. Also no changes on GM45 or Iron Lake. Sandy Bridge, Ivy Bridge, and Haswell had similar results. (Sandy Bridge shown) total instructions in shared programs: 10856676 -> 10852569 (-0.04%) instructions in affected programs: 228322 -> 224215 (-1.80%) helped: 1331 HURT: 0 helped stats (abs) min: 1 max: 7 x̄: 3.09 x̃: 4 helped stats (rel) min: 0.11% max: 6.67% x̄: 1.88% x̃: 1.83% 95% mean confidence interval for instructions value: -3.19 -2.99 95% mean confidence interval for instructions %-change: -1.93% -1.83% Instructions are helped. total cycles in shared programs: 154788865 -> 154732047 (-0.04%) cycles in affected programs: 2485892 -> 2429074 (-2.29%) helped: 1097 HURT: 59 helped stats (abs) min: 2 max: 168 x̄: 51.96 x̃: 64 helped stats (rel) min: 0.12% max: 12.70% x̄: 3.44% x̃: 2.22% HURT stats (abs) min: 2 max: 16 x̄: 3.02 x̃: 2 HURT stats (rel) min: 0.18% max: 0.83% x̄: 0.64% x̃: 0.71% 95% mean confidence interval for cycles value: -51.04 -47.26 95% mean confidence interval for cycles %-change: -3.40% -3.07% Cycles are helped. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-12-17 13:47:06 -08:00
Ian Romanick	9a83c3d3b3	i965/fs: Eliminate unary op on operand of compare-with-zero The (-abs(x) >= 0) => (x == 0) optimization is removed from the vec4 and scalar parts. In the VS part, adding the new pattern was not helpful. The pattern that is removed is really old, and it has been handled by NIR for ages. All Gen7+ platforms had similar results. (Broadwell shown) total instructions in shared programs: 14715715 -> 14715709 (<.01%) instructions in affected programs: 474 -> 468 (-1.27%) helped: 6 HURT: 0 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 1.12% max: 1.35% x̄: 1.28% x̃: 1.35% 95% mean confidence interval for instructions value: -1.00 -1.00 95% mean confidence interval for instructions %-change: -1.40% -1.15% Instructions are helped. total cycles in shared programs: 559569911 -> 559569809 (<.01%) cycles in affected programs: 5963 -> 5861 (-1.71%) helped: 6 HURT: 0 helped stats (abs) min: 16 max: 18 x̄: 17.00 x̃: 17 helped stats (rel) min: 1.45% max: 1.88% x̄: 1.73% x̃: 1.85% 95% mean confidence interval for cycles value: -18.15 -15.85 95% mean confidence interval for cycles %-change: -1.95% -1.51% Cycles are helped. Iron Lake and Sandy Bridge had similar results. (Iron Lake shown) total instructions in shared programs: 7780915 -> 7780913 (<.01%) instructions in affected programs: 246 -> 244 (-0.81%) helped: 2 HURT: 0 total cycles in shared programs: 177876108 -> 177876106 (<.01%) cycles in affected programs: 3636 -> 3634 (-0.06%) helped: 1 HURT: 0 GM45 total instructions in shared programs: 4799152 -> 4799151 (<.01%) instructions in affected programs: 126 -> 125 (-0.79%) helped: 1 HURT: 0 total cycles in shared programs: 122052654 -> 122052652 (<.01%) cycles in affected programs: 3640 -> 3638 (-0.05%) helped: 1 HURT: 0 Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-12-17 13:47:06 -08:00
Ian Romanick	440c051340	i965/vec4/dce: Don't narrow the write mask if the flags are used In an instruction sequence like cmp(8).ge.f0.0 vgrf17:D, vgrf2.xxxx:D, vgrf9.xxxx:D (+f0.0) sel(8) vgrf1:UD, vgrf8.xyzw:UD, vgrf1.xyzw:UD The other fields of vgrf17 may be unused, but the CMP still needs to generate the other flag bits. To my surprise, nothing in shader-db or any test suite appears to hit this. However, I have a change to brw_vec4_cmod_propagation that creates cases where this can happen. This fix prevents a couple dozen regressions in that patch. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `5df88c20` ("i965/vec4: Rewrite dead code elimination to use live in/out.")	2018-12-17 13:47:06 -08:00
Ian Romanick	111bcc8d02	i965/vec4: Silence unused parameter warnings in vec4 compiler tests src/intel/compiler/test_vec4_copy_propagation.cpp: In member function ‘virtual brw::dst_reg* copy_propagation_vec4_visitor::make_reg_for_system_value(int)’: src/intel/compiler/test_vec4_copy_propagation.cpp:57:51: warning: unused parameter ‘location’ [-Wunused-parameter] virtual dst_reg make_reg_for_system_value(int location) ^~~~~~~~ src/intel/compiler/test_vec4_copy_propagation.cpp: In member function ‘virtual void copy_propagation_vec4_visitor::emit_urb_write_header(int)’: src/intel/compiler/test_vec4_copy_propagation.cpp:77:43: warning: unused parameter ‘mrf’ [-Wunused-parameter] virtual void emit_urb_write_header(int mrf) ^~~ src/intel/compiler/test_vec4_copy_propagation.cpp: In member function ‘virtual brw::vec4_instruction copy_propagation_vec4_visitor::emit_urb_write_opcode(bool)’: src/intel/compiler/test_vec4_copy_propagation.cpp:82:57: warning: unused parameter ‘complete’ [-Wunused-parameter] virtual vec4_instruction emit_urb_write_opcode(bool complete) ^~~~~~~~ src/intel/compiler/test_vec4_register_coalesce.cpp: In member function ‘virtual brw::dst_reg register_coalesce_vec4_visitor::make_reg_for_system_value(int)’: src/intel/compiler/test_vec4_register_coalesce.cpp:60:51: warning: unused parameter ‘location’ [-Wunused-parameter] virtual dst_reg make_reg_for_system_value(int location) ^~~~~~~~ src/intel/compiler/test_vec4_register_coalesce.cpp: In member function ‘virtual void register_coalesce_vec4_visitor::emit_urb_write_header(int)’: src/intel/compiler/test_vec4_register_coalesce.cpp:80:43: warning: unused parameter ‘mrf’ [-Wunused-parameter] virtual void emit_urb_write_header(int mrf) ^~~ src/intel/compiler/test_vec4_register_coalesce.cpp: In member function ‘virtual brw::vec4_instruction register_coalesce_vec4_visitor::emit_urb_write_opcode(bool)’: src/intel/compiler/test_vec4_register_coalesce.cpp:85:57: warning: unused parameter ‘complete’ 
[-Wunused-parameter] virtual vec4_instruction emit_urb_write_opcode(bool complete) ^~~~~~~~ src/intel/compiler/test_vec4_cmod_propagation.cpp: In member function ‘virtual brw::dst_reg cmod_propagation_vec4_visitor::make_reg_for_system_value(int)’: src/intel/compiler/test_vec4_cmod_propagation.cpp:60:51: warning: unused parameter ‘location’ [-Wunused-parameter] virtual dst_reg make_reg_for_system_value(int location) ^~~~~~~~ src/intel/compiler/test_vec4_cmod_propagation.cpp: In member function ‘virtual void cmod_propagation_vec4_visitor::emit_urb_write_header(int)’: src/intel/compiler/test_vec4_cmod_propagation.cpp:85:43: warning: unused parameter ‘mrf’ [-Wunused-parameter] virtual void emit_urb_write_header(int mrf) ^~~ src/intel/compiler/test_vec4_cmod_propagation.cpp: In member function ‘virtual brw::vec4_instruction cmod_propagation_vec4_visitor::emit_urb_write_opcode(bool)’: src/intel/compiler/test_vec4_cmod_propagation.cpp:90:57: warning: unused parameter ‘complete’ [-Wunused-parameter] virtual vec4_instruction *emit_urb_write_opcode(bool complete) ^~~~~~~~ Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-12-17 13:47:06 -08:00
Bas Nieuwenhuizen	f67dea5e19	radv: Fix multiview depth clears We were not using the view mask for depth clears, causing only the first view to be cleared. Fixes: `2e86f6b259` "radv: Add multiview clears." Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2018-12-17 20:16:26 +00:00
Bas Nieuwenhuizen	9add63a3a5	radv: Remove redundant format check. The switch directly after the check has a default case that returns NULL too, so the effective return value is not changed. Also this check is wrong once we start dealing with formats introduced by an extension (e.g. YUV formats). Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2018-12-17 20:09:38 +00:00
Eric Anholt	708d8f4d0a	nir: Fix clamping of uints for image store lowering. I botched some copy-and-paste and clamped to signed int max instead of uint max. Fixes KHR-GL46.shader_image_load_store.multiple-uniforms on skl. Fixes: `d3e046e76c` ("nir: Pull some of intel's image load/store format conversion to nir_format.h") Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-12-17 20:02:22 +00:00
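The clamping mistake described in the fix above can be illustrated with a small Python sketch. The helper names (`uint_max`, `int_max`, `clamp_uint`, `clamp_uint_buggy`) are hypothetical and are not taken from the NIR sources; the sketch only shows why clamping an unsigned value to the signed maximum throws away the upper half of the unsigned range.

```python
def uint_max(bits):
    """Largest value representable as an unsigned integer of `bits` bits."""
    return (1 << bits) - 1


def int_max(bits):
    """Largest value representable as a signed integer of `bits` bits."""
    return (1 << (bits - 1)) - 1


def clamp_uint(value, bits):
    """Clamp a non-negative value to the unsigned range (intended behaviour)."""
    return min(value, uint_max(bits))


def clamp_uint_buggy(value, bits):
    """Clamp to the signed maximum instead: the kind of copy-and-paste
    slip the commit message describes (illustration only)."""
    return min(value, int_max(bits))


if __name__ == "__main__":
    for v in (100, 200, 300):
        print(v, clamp_uint(v, 8), clamp_uint_buggy(v, 8))
```

For an 8-bit channel, a value of 200 is in range and should pass through unchanged, but the signed-max variant reduces it to 127.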
Eric Anholt	00e2cbc049	v3d: Fix the argument type for vir_BRANCH(). Apparently this has been spewing warnings for Jason's clang, but not my gcc.	2018-12-17 09:52:23 -08:00
Eric Anholt	376054fff3	vc4: Reuse nir_format_convert.h in our blend lowering. These helpers came along after and have effectively the same implementation.	2018-12-17 09:52:23 -08:00
Samuel Pitoiset	445867c80d	radv: report Vulkan version 1.1.90 for real I thought the value was correctly propagated, but actually not. Fixes: `2ac6d55f38` ("radv: bump reported version to 1.1.90") Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-12-17 17:51:48 +01:00
Jason Ekstrand	cae373117c	anv,radv: Re-enable VK_EXT_pci_bus_info Now at version 2 with the fixed header. Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-12-17 10:42:35 -06:00
Jason Ekstrand	e5b59fe6f5	vulkan: Update the XML and headers to 1.1.96 Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-12-17 10:41:56 -06:00
Rhys Perry	ef198e8c6a	radv: switch from nir_bcsel to nir_b32csel Fixes: `191a1dce92` ('nir: Add 1-bit Boolean opcodes') Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2018-12-17 14:52:39 +00:00
Rhys Perry	bba94a3d85	radv: don't set surf_index for stencil-only images Fixes: `f8d5b377c8` ('radv: set cb base tile swizzles for MRT speedups (v4)') Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108116 Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2018-12-17 14:52:10 +00:00
Ian Romanick	9dc135efa1	nir: Release per-block metadata in nir_sweep nir_sweep already marks all metadata invalid, so it is safe to release the memory here too. mean soft fp64 using uint64: 1,342,759,331 => 1,010,670,475 gfxbench5 aztec ruins high 11: 63,555,571 => 61,889,811 deus ex mankind divided 148: 62,845,304 => 62,829,640 deus ex mankind divided 2890: 71,922,686 => 71,922,686 dirt showdown 676: 69,238,607 => 69,238,607 dolphin ubershaders 210: 77,822,072 => 77,822,072 Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-12-16 14:39:56 -08:00
Ian Romanick	7adafd6e1c	nir: Fix holes in nir_instr Found using pahole. Changes in peak memory usage according to Valgrind massif: mean soft fp64 using uint64: 1,343,991,403 => 1,342,759,331 gfxbench5 aztec ruins high 11: 63,619,971 => 63,555,571 deus ex mankind divided 148: 62,887,728 => 62,845,304 deus ex mankind divided 2890: 72,399,750 => 71,922,686 dirt showdown 676: 69,464,023 => 69,238,607 dolphin ubershaders 210: 78,359,728 => 77,822,072 Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-12-16 14:39:56 -08:00
Ian Romanick	8161a87b24	nir/phi_builder: Use per-value hash table to store [block] -> def mapping Replace the old array in each value with a hash table in each value. Changes in peak memory usage according to Valgrind massif: mean soft fp64 using uint64: 5,499,875,082 => 1,343,991,403 gfxbench5 aztec ruins high 11: 63,619,971 => 63,619,971 deus ex mankind divided 148: 62,887,728 => 62,887,728 deus ex mankind divided 2890: 72,402,222 => 72,399,750 dirt showdown 676: 74,466,431 => 69,464,023 dolphin ubershaders 210: 109,630,376 => 78,359,728 Run-time change for a full run on shader-db on my Haswell desktop (with -march=native) is 1.22245% +/- 0.463879% (n=11). This is about +2.9 seconds on a 237 second run. The first time I sent this version of this patch out, the run-time data was quite different. I had misconfigured the script that ran the test, and none of the tests from higher GLSL versions were run. These are generally more complex shaders, and they are more affected by this change. The previous version of this patch used a single hash table for the whole phi builder. The mapping was from [value, block] -> def, so a separate allocation was needed for each [value, block] tuple. There was quite a bit of per-allocation overhead (due to ralloc), so the patch was followed by a patch that added the use of the slab allocator. The results of those two patches were not quite as good: mean soft fp64 using uint64: 5,499,875,082 => 1,343,991,403 gfxbench5 aztec ruins high 11: 63,619,971 => 63,619,971 deus ex mankind divided 148: 62,887,728 => 62,887,728 deus ex mankind divided 2890: 72,402,222 => 72,402,222 * dirt showdown 676: 74,466,431 => 72,443,591 * dolphin ubershaders 210: 109,630,376 => 81,034,320 * The * denotes tests that are better now. In the tests that are the same in both patches, the "after" peak memory usage was at a different location. I did not check the local peaks.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Suggested-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-12-16 14:39:56 -08:00
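The trade-off behind the phi builder change (a dense per-value array indexed by block versus a per-value hash table holding only the blocks that have a def) can be sketched in Python. This is a toy model under stated assumptions: the class names `DenseValue` and `SparseValue` and their methods are hypothetical and do not mirror the C structures in `nir_phi_builder`; the sketch only compares how many slots each layout keeps alive.

```python
class DenseValue:
    """One slot per block, whether or not the block ever gets a def."""

    def __init__(self, num_blocks):
        self.defs = [None] * num_blocks

    def set_block_def(self, block_index, ssa_def):
        self.defs[block_index] = ssa_def

    def get_block_def(self, block_index):
        return self.defs[block_index]

    def slots(self):
        return len(self.defs)


class SparseValue:
    """Hash table keyed by block index; only populated blocks use memory."""

    def __init__(self):
        self.defs = {}

    def set_block_def(self, block_index, ssa_def):
        self.defs[block_index] = ssa_def

    def get_block_def(self, block_index):
        # Absent key means "no def recorded for this block yet".
        return self.defs.get(block_index)

    def slots(self):
        return len(self.defs)


if __name__ == "__main__":
    num_blocks = 10_000
    dense, sparse = DenseValue(num_blocks), SparseValue()
    for v in (dense, sparse):
        v.set_block_def(3, "def_a")
        v.set_block_def(42, "def_b")
    print(dense.slots(), sparse.slots())
```

In a shader with many blocks and many values that are each defined in only a few blocks, the dense layout scales with values times blocks, which is consistent with the large peak-memory drop reported for the fp64 test; the hash lookups are a plausible source of the small run-time cost the message mentions.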
Ian Romanick	e3043e1276	util/hash_table: Add _mesa_hash_table_init function Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-12-16 14:39:56 -08:00
Jason Ekstrand	db197fdb6c	st/nir: Use nir_src_as_uint for tokens Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-12-16 15:07:28 -06:00
Jason Ekstrand	47e1e0692c	radv: Fix a stupid if in gather_intrinsic_info Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-12-16 15:06:07 -06:00
Jason Ekstrand	6bcd2af086	nir/algebraic: Add some optimizations for D3D-style Booleans D3D Booleans use a 32-bit 0/-1 representation. Because this previously matched NIR exactly, we didn't have to really optimize for it. Now that we have 1-bit Booleans, we need some specific optimizations to chew through the D3D12-style Booleans. Shader-db results on Kaby Lake: total instructions in shared programs: 15136811 -> 14967944 (-1.12%) instructions in affected programs: 2457021 -> 2288154 (-6.87%) helped: 8318 HURT: 10 total cycles in shared programs: 373544524 -> 359701825 (-3.71%) cycles in affected programs: 151029683 -> 137186984 (-9.17%) helped: 7749 HURT: 682 total loops in shared programs: 4431 -> 4399 (-0.72%) loops in affected programs: 32 -> 0 helped: 21 HURT: 0 total spills in shared programs: 10290 -> 10051 (-2.32%) spills in affected programs: 2532 -> 2293 (-9.44%) helped: 18 HURT: 18 total fills in shared programs: 22203 -> 21732 (-2.12%) fills in affected programs: 3319 -> 2848 (-14.19%) helped: 18 HURT: 18 Note that a large chunk of the improvement is fixing regressions caused by switching to 1-bit Booleans. Previously, our ability to optimize D3D booleans was improved by using the D3D representation directly in NIR. Now that NIR does 1-bit bools, we need a few more optimizations. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Eric Anholt <eric@anholt.net> Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-12-16 21:03:02 +00:00
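The relationship between the 32-bit 0/-1 representation and a 1-bit Boolean can be sketched as follows. This is an illustration only: `to_d3d_bool` and `from_d3d_bool` are hypothetical names, and the identities checked here are examples of the kind of fact such optimizations can rely on, not the actual rules added to the algebraic pass.

```python
MASK32 = 0xFFFFFFFF


def to_d3d_bool(b):
    """Encode a Python bool as a D3D-style 32-bit Boolean: 0 or all ones (-1)."""
    return MASK32 if b else 0


def from_d3d_bool(x):
    """Decode a D3D-style Boolean: any non-zero value is true."""
    return (x & MASK32) != 0


if __name__ == "__main__":
    for a in (False, True):
        for b in (False, True):
            # Converting to the 32-bit form and back is a no-op, so a
            # "compare with zero" on a freshly encoded Boolean is redundant.
            assert from_d3d_bool(to_d3d_bool(a)) == a
            # Bitwise ops on the 0/-1 form agree with logical ops on bools.
            assert to_d3d_bool(a) & to_d3d_bool(b) == to_d3d_bool(a and b)
            assert to_d3d_bool(a) | to_d3d_bool(b) == to_d3d_bool(a or b)
    print("identities hold")
```

Because the round trip is an identity, a chain that widens a 1-bit Boolean to 0/-1 and immediately tests it against zero can be collapsed back to the original Boolean.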
Jason Ekstrand	3b30814791	nir/algebraic: Optimize 1-bit Booleans Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-12-16 21:03:02 +00:00
Jason Ekstrand	44227453ec	nir: Switch to using 1-bit Booleans for almost everything This is a squash of a few distinct changes: glsl,spirv: Generate 1-bit Booleans Revert "Use 32-bit opcodes in the NIR producers and optimizations" Revert "nir/builder: Generate 32-bit bool opcodes transparently" nir/builder: Generate 1-bit Booleans in nir_build_imm_bool Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-12-16 21:03:02 +00:00
Jason Ekstrand	11dc130779	nir: Add a bool to int32 lowering pass We also enable it in all of the NIR drivers. Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-12-16 21:03:02 +00:00
Jason Ekstrand	191a1dce92	nir: Add 1-bit Boolean opcodes We also have to add support for 1-bit integers while we're here so we get 1-bit variants of iand, ior, and inot. Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-12-16 21:03:02 +00:00
Jason Ekstrand	615cc26b97	nir/algebraic: Generalize an optimization This just makes it nicely scale across bit sizes. Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-12-16 21:03:02 +00:00
Jason Ekstrand	487514ae61	nir/large_constants: Properly handle 1-bit bools Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-12-16 21:03:02 +00:00
Jason Ekstrand	3191a82372	nir: Add support for 1-bit data types This commit adds support for 1-bit Booleans and integers. Booleans obviously take a value of true or false. Because we have to define the semantics of 1-bit signed and unsigned integers, we define uint1_t to take values of 0 and 1 and int1_t to take values of 0 and -1. 1-bit arithmetic is then well-defined in the usual way, just with fewer bits. The definition of int1_t and uint1_t doesn't usually matter but we do need something for purposes of constant folding. Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-12-16 21:03:02 +00:00
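The 1-bit semantics described in this commit message can be illustrated with a minimal Python sketch. It is not code from Mesa: the helper names `wrap_unsigned` and `wrap_signed` are hypothetical, and the sketch only assumes that arithmetic is done modulo 2**bits, with signed values read as two's complement.

```python
def wrap_unsigned(value: int, bits: int) -> int:
    """Reduce value modulo 2**bits and return it as an unsigned integer."""
    return value & ((1 << bits) - 1)


def wrap_signed(value: int, bits: int) -> int:
    """Reduce value modulo 2**bits and reinterpret it as two's complement."""
    unsigned = wrap_unsigned(value, bits)
    sign_bit = 1 << (bits - 1)
    # If the top bit is set, the value represents a negative number.
    return unsigned - (1 << bits) if unsigned & sign_bit else unsigned


# With bits == 1 the only unsigned values are 0 and 1,
# and the only signed values are 0 and -1.
one_bit_unsigned = sorted({wrap_unsigned(v, 1) for v in range(-4, 5)})
one_bit_signed = sorted({wrap_signed(v, 1) for v in range(-4, 5)})
```

Under these assumptions, 1-bit addition behaves like XOR: `wrap_unsigned(1 + 1, 1)` wraps back to 0, and `wrap_signed(-1 + -1, 1)` does the same for the signed type.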
Jason Ekstrand	2fe8708ffd	nir/constant_expressions: Rework Boolean handling This commit contains three related changes. First, we define boolN_t for N = 8, 16, and 64 and move the definition of boolN_vec to the loop with the other vec definitions. Second, there's no reason why we need the != 0 on the source because that happens implicitly when it's converted to bool. Third, for destinations, we use a signed integer type and just do -(int)bool_val which will give us the 0/-1 behavior we want and neatly scales to all bit widths. Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-12-16 21:03:02 +00:00
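The third change in this commit message, negating the Boolean to get the 0/-1 result, can be sketched as follows. This is an illustrative Python model rather than the generated constant-folding code; `bool_bit_pattern` is a hypothetical name, and the mask stands in for storing the result in a signed integer of the given width.

```python
def bool_bit_pattern(bool_val: bool, bits: int) -> int:
    """Return the bit pattern of -(int)bool_val stored in a bits-wide integer."""
    negated = -int(bool_val)  # 0 for False, -1 for True
    # Masking models truncation to the destination width: -1 becomes all ones.
    return negated & ((1 << bits) - 1)


patterns = {bits: bool_bit_pattern(True, bits) for bits in (1, 8, 16, 32, 64)}
```

The same expression covers every width, which is why no per-size special case is needed: true is always the all-ones pattern and false is always zero.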
Jason Ekstrand	80e8dfe9de	nir: Rename Boolean-related opcodes to include 32 in the name This is a squash of a bunch of individual changes: nir/builder: Generate 32-bit bool opcodes transparently nir/algebraic: Remap Boolean opcodes to the 32-bit variant Use 32-bit opcodes in the NIR producers and optimizations Generated with a little hand-editing and the following sed commands: sed -i 's/nir_op_ball_fequal/nir_op_b32all_fequal/g' */.c sed -i 's/nir_op_bany_fnequal/nir_op_b32any_fnequal/g' */.c sed -i 's/nir_op_ball_iequal/nir_op_b32all_iequal/g' */.c sed -i 's/nir_op_bany_inequal/nir_op_b32any_inequal/g' */.c sed -i 's/nir_op_$[fiu]lt$/nir_op_\132/g' */.c sed -i 's/nir_op_$[fiu]ge$/nir_op_\132/g' */.c sed -i 's/nir_op_$[fiu]ne$/nir_op_\132/g' */.c sed -i 's/nir_op_$[fiu]eq$/nir_op_\132/g' */.c sed -i 's/nir_op_$[fi]$ne32g/nir_op_\1neg/g' */.c sed -i 's/nir_op_bcsel/nir_op_b32csel/g' */.c Use 32-bit opcodes in the NIR back-ends Generated with a little hand-editing and the following sed commands: sed -i 's/nir_op_ball_fequal/nir_op_b32all_fequal/g' */.c sed -i 's/nir_op_bany_fnequal/nir_op_b32any_fnequal/g' */.c sed -i 's/nir_op_ball_iequal/nir_op_b32all_iequal/g' */.c sed -i 's/nir_op_bany_inequal/nir_op_b32any_inequal/g' */.c sed -i 's/nir_op_$[fiu]lt$/nir_op_\132/g' */.c sed -i 's/nir_op_$[fiu]ge$/nir_op_\132/g' */.c sed -i 's/nir_op_$[fiu]ne$/nir_op_\132/g' */.c sed -i 's/nir_op_$[fiu]eq$/nir_op_\132/g' */.c sed -i 's/nir_op_$[fi]$ne32g/nir_op_\1neg/g' */.c sed -i 's/nir_op_bcsel/nir_op_b32csel/g' */.c Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-12-16 21:03:02 +00:00
Jason Ekstrand	b569093566	nir/algebraic: Make an optimization more specific Later in this series, bool is not going to imply 32-bit. Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-12-16 21:03:02 +00:00
Jason Ekstrand	517099809a	nir: Drop support for lower_b2f This was originally added for the out-of-tree Mali driver but I think we've all agreed it's easy enough for them to just do in their back-end. Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-12-16 21:03:02 +00:00
Jason Ekstrand	4bb1a34727	nir/algebraic: Optimize x2b(xneg(a)) -> a Shader-db results on Kaby Lake: total instructions in shared programs: 15072525 -> 15072525 (0.00%) instructions in affected programs: 0 -> 0 helped: 0 HURT: 0 This helps prevent regressions in later commits. Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-12-16 21:03:02 +00:00
Jason Ekstrand	3595a0abf4	nir/constant_folding: Fix source bit size logic Instead of looking at input_sizes[i] which contains the number of components for each source, we look at the bit size of input_types[i]. This fixes a regression in the 1-bit boolean series though I have no idea how we haven't seen it before now. Fixes: `35baee5dce` "nir/constant_folding: fix incorrect bit-size check" Fixes: `9076c4e289` "nir: update opcode definitions for different bit sizes" Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-12-16 21:03:02 +00:00
Jason Ekstrand	9f7bd843af	nir/tgsi: Use nir_bany in ttn_kill_if Reviewed-by: Eric Anholt <eric@anholt.net>	2018-12-16 21:03:02 +00:00
Jason Ekstrand	e17426058c	nir/lower_idiv: Use ilt instead of bit twiddling The previous code was creating a boolean by doing an arithmetic right- shift by 31 which produces a boolean which is true if the argument is negative. This is the same as the expression r < 0 which is much simpler and doesn't depend on NIR's representation of booleans. Reviewed-by: Eric Anholt <eric@anholt.net>	2018-12-16 21:03:02 +00:00
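The equivalence this commit message relies on can be checked with a small Python sketch. It models a 32-bit arithmetic right shift by hand (the helper `asr32` is hypothetical, not a NIR function) and compares it with the plain comparison `r < 0`, assuming the old 32-bit representation where true is -1 and false is 0.

```python
def asr32(value: int, shift: int) -> int:
    """Arithmetic right shift of a 32-bit two's-complement value."""
    value &= 0xFFFFFFFF
    if value & 0x80000000:
        value -= 1 << 32  # reinterpret as a negative Python int
    # Python's >> on negative integers rounds toward negative infinity,
    # which matches an arithmetic (sign-extending) shift.
    return value >> shift


samples = [0, 1, -1, 2, -2, 12345, -12345, 2**31 - 1, -(2**31)]
shift_results = [asr32(r, 31) for r in samples]
compare_results = [-int(r < 0) for r in samples]
```

Both lists hold 0 for non-negative inputs and -1 for negative ones. The comparison form gives the same answer without depending on how Booleans are encoded.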
Eric Anholt	2977c77758	v3d: Use the original bit size when scalarizing uniform loads. Prevents a regression in jekstrand's 1-bit series. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-12-16 21:03:01 +00:00
Eric Anholt	91a0251dbc	vc4: Use the original bit size when scalarizing uniform loads. Prevents a regression in jekstrand's 1-bit series. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-12-16 21:03:01 +00:00
Rhys Perry	bde9f482de	ac: split 16-bit ssbo loads that may not be dword aligned Fixes: `7e7ee82698` ('ac: add support for 16bit buffer loads') Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108114 Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2018-12-16 14:56:10 +00:00
Rhys Perry	12dc7cb202	ac: refactor visit_load_buffer This is so that we can split different types of loads more easily. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2018-12-16 14:56:10 +00:00
Rhys Perry	ed4020fabe	nir: fix constness in nir_intrinsic_align() Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2018-12-16 14:56:10 +00:00
Jan Vesely	e4f9a37ace	clover: Fix build after clang r348827 CodeGenOptions were moved to Basic. Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-by: Aaron Watry <awatry@gmail.com> Tested-by: Aaron Watry <awatry@gmail.com> Reviewed-by: Kai Wasserbäch <kai@dev.carbon-project.org> CC: mesa-stable@lists.freedesktop.org	2018-12-16 06:38:10 -05:00
Jon Turney	d512b35b62	glx: Fix compilation with GLX_USE_WINDOWSGL Sadly, the GLX_USE_APPLEGL and GLX_USE_WINDOWSGL cases are not identical (because GLX_USE_WINDOWSGL uses vtables rather than a maze of ifdefs) Include <sys/time.h> again, as functions prototyped by it are used in the GLX_USE_WINDOWSGL path. Make the include guard around the __glxGetMscRate() definition match the one at its declaration again, as it's referenced from dri_common.c which is built for GLX_USE_WINDOWSGL. Fixes: `a95ec138` ("glx: mandate xf86vidmode only for "drm" dri platforms") Signed-off-by: Jon Turney <jon.turney@dronecode.org.uk> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2018-12-15 13:49:24 +00:00
Eric Anholt	29927e7524	v3d: Drop in a bunch of notes about performance improvement opportunities. These have all been floating in my head, and while I've thought about encoding them in issues on gitlab once they're enabled, they also make sense to just have in the area of the code you'll need to work in.	2018-12-14 17:48:01 -08:00
Eric Anholt	248a7fb392	v3d: Do uniform pretty-printing in the QPU dump. If you're trying to trace what's going on in a QPU dump, this will definitely help you find your way.	2018-12-14 17:48:01 -08:00
Eric Anholt	a370ed76ab	v3d: Use the uniform pretty-printer in v3d_write_uniforms()'s debug code. This will be a lot easier than my usual "38400.000000? that looks like a viewport scale" decoding strategy.	2018-12-14 17:48:01 -08:00
Eric Anholt	532b6c5671	v3d: Move uniform pretty-printing to its own helper function. I want to reuse it in the QPU dump.	2018-12-14 17:48:01 -08:00
Eric Anholt	78ef05bde4	v3d: Move uinfo->data[] dereference to the top of v3d_write_uniforms(). Follows `3954331aff` ("vc4: Pull uinfo->data[i] dereference out to the top of the loop.") which showed a large performance win for vc4, but also cleans up the code a decent bit.	2018-12-14 17:48:01 -08:00
Eric Anholt	a7e15a5086	v3d: Avoid assertion failures when removing end-of-shader instructions. After generating VIR, we leave c->cursor pointing at the end of the shader. If the shader had dead code at the end (for example from preamble instructions in a shader with no side effects), we would assertion fail that we were leaving the cursor pointing at freed memory. Since anything following DCE should be setting up a new cursor anyway, just clear the cursor at the start.	2018-12-14 17:48:01 -08:00
Eric Anholt	5b2cc03852	v3d: Add support for draw indirect for GLES3.1. In trying to enable compute shaders, I found that a bunch of deqp-gles31's compute stuff wanted to interact with indirect dispatch. This was easy to do on its own.	2018-12-14 17:48:01 -08:00
Eric Anholt	ff80e58b38	v3d: Add missing flagging of SYNCB as a TSY op. Fixes: `f2e41daac5` ("broadcom/vc5: Update QPU instruction pack/unpack for v4.2.")	2018-12-14 17:48:01 -08:00
Eric Anholt	3f9bcf9136	v3d: Make sure that a thrsw doesn't split a multop from its umul24. The thrsw will invalidate rtop, just like accumulators and flags. Caught by simulator assertions in CS imulextended/umulextended tests. Fixes: `90269ba353` ("broadcom/vc5: Use THRSW to enable multi-threaded shaders.")	2018-12-14 17:48:01 -08:00
Eric Anholt	332a5cf6a5	v3d: Add safety checks for resource_create(). This should ease my debugging next time I screw it up.	2018-12-14 17:48:01 -08:00
Eric Anholt	6ad9e8690d	v3d: Add support for texturing from linear. Just like vc4, we have to support linear shared BOs for X11 on arbitrary displays. When we're faced with a request to texture from one of those, make a shadow image that we copy using the TFU at the start of the draw call.	2018-12-14 17:48:01 -08:00
Eric Anholt	976ea90bdc	v3d: Add support for using the TFU to do some blits. This will be useful in particular for blits from raster to UIF for X11.	2018-12-14 17:48:01 -08:00
Eric Anholt	e5b4d1f55f	v3d: Don't forget to bump the number of writes when doing TFU ops. generatemipmap is just filling out the rest of the mipmap that's already been written (by a mapping or a draw call), so it didn't matter. As I reuse the TFU code for linear-to-UIF conversions, it'll start mattering.	2018-12-14 17:48:01 -08:00
Eric Anholt	485df2574e	v3d: Set up the right stride for raster TFU. I didn't have any raster images in the generatemipmap path, so the pixels-vs-bytes mixup didn't matter here.	2018-12-14 17:48:01 -08:00
Eric Anholt	e731d53716	v3d: Don't forget to wait for our TFU job before rendering from it. Otherwise we may race to read old contents. This didn't show up in the CTS and piglit for me, but it did once I started using the TFU to do linear->UIF blits for X11. Fixes: `2ebca177dc` ("v3d: Use the TFU to do generatemipmap.")	2018-12-14 17:48:01 -08:00
Ilia Mirkin	153d3fc5f9	nvc0: always keep TSC slot 0 bound to fix TXF Same as on nv50, the TXF op always uses the TSC bound to slot 0, returning blank values if nothing is bound. An earlier change arranges for the TSC entries list to always have valid data at entry 0, so here we just make use of it. Fixes arb_texture_buffer_object-subdata-sync among others. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2018-12-14 20:01:31 -05:00
Ilia Mirkin	4aeaf89aa7	nvc0: replace use of explicit default_tsc with entry 0 This was used for implementing FBFETCH. However that uses TXF, which doesn't do much with a TSC. The only important bit is that sRGB-decoding works as expected, which we can achieve since all samplers we ever generate enable sRGB-decoding. Always point to entry 0 in the TSC table, and ensure that even before it ever gets initialized, the sRGB-decoding enable bit is set. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2018-12-14 20:01:31 -05:00
Rob Clark	5f9085638a	freedreno/a6xx: fix corrupted uniforms For older gen's fd_wfi() is used to conditionally insert a WFI if there hasn't already been one since last draw. But this doesn't work out well with stateobj since the order the stateobj is evaluated might not be what you expect. (Ie. stateobj might not be evaluated until a later draw if there is no geometry from the current draw in a given tile.) Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-12-14 15:01:30 -05:00
Alex Deucher	4db4b3447d	pci_ids: add new vega20 pci id Reviewed-by: Marek Olšák <marek.olsak@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: mesa-stable@lists.freedesktop.org	2018-12-14 14:48:39 -05:00
Alex Deucher	56cf25a114	pci_ids: add new vega10 pci ids Reviewed-by: Marek Olšák <marek.olsak@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: mesa-stable@lists.freedesktop.org	2018-12-14 14:48:18 -05:00
Rafael Antognolli	5c454661c6	i965/gen9: Add workarounds for object preemption. Gen9 hardware requires some workarounds to disable preemption depending on the type of primitive being emitted. We implement this by adding a function that checks the primitive type and number of instances right before the 3DPRIMITIVE. For now, we just ignore blorp. The only primitive it emits is 3DPRIM_RECTLIST, and since it's not listed in the workarounds, we can safely leave preemption enabled when it happens. Or it will be disabled by a previous 3DPRIMITIVE, which should be fine too. v3: - Apply missing workarounds for instanced rendering and line loop (Ken) - Move workaround code to brw_draw_single_prim() Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com> Cc: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-12-14 09:40:27 -08:00
Rafael Antognolli	d8b50e152a	i965/gen10+: Enable object level preemption. Set bit when initializing context. v3: - Always toggle preemption bool to false before enabling it for the first time, so the state gets emitted (Chris Wilson). - Emit end of pipe sync with PIPE_CONTROL_RENDER_TARGET_FLUSH (Ken) Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-12-14 09:40:27 -08:00
Rafael Antognolli	019a92ffa4	intel/genxml: Add register for object preemption. Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-12-14 09:40:27 -08:00
Ian Romanick	a6b7d1151c	util/slab: Rename slab_mempool typed parameters to mempool Now everything with type 'struct slab_child_pool *' is named pool, and everything with type 'struct slab_mempool *' is named mempool. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>	2018-12-14 07:36:05 -08:00
Ian Romanick	ba5402ec9a	nir/phi_builder: Internal users should use nir_phi_builder_value_set_block_def too Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-12-14 07:36:05 -08:00
Christian Gmeiner	489ffaf0c1	etnaviv: drop redundant ctx function parameter There is no need to have an extra ctx parameter as all the other parameters carry all the needed information. Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com> Reviewed-by: Lucas Stach <l.stach@pengutronix.de>	2018-12-14 11:23:00 +01:00
Kenneth Graunke	0b44644ca6	genxml: Consistently use a numeric "MOCS" field When we first started using genxml, we decided to represent MOCS as an actual structure, and pack values. However, in many places, it was more convenient to use a numeric value rather than treating it as a struct, so we added secondary setters in a bunch of places as well. We were not entirely consistent, either. Some places only had one. Gen6 had both kinds of setters for STATE_BASE_ADDRESS, but newer gens only had the struct-based setters. The names were sometimes "Constant Buffer Object Control State" instead of "Memory", making it harder to find. Many had prefixes like "Vertex Buffer MOCS"...in a vertex buffer packet...which is a bit redundant. On modern hardware, MOCS is simply an index into a table, but we were still carrying around the structure with an "Index to MOCS Table" field, in addition to the direct numeric setters. This is clunky - we really just want a number on new hardware. This patch eliminates the struct-based setters, and makes the numeric setters be consistently called "MOCS". We leave the struct definition around on Gen7-8 for reference purposes, but it is unused. v2: Drop bonus "Depth Buffer MOCS" fields on Gen7.5 and Gen9 Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>	2018-12-14 00:44:54 -08:00
Timothy Arceri	a2ec78883f	nir: fix opt_if_loop_last_continue() The pass did not correctly handle loops ending in: if ssa_7 { block block_8: /* preds: block_7 */ continue /* succs: block_1 */ } else { block block_9: /* preds: block_7 */ break /* succs: block_11 */ } The break will get eliminated by another opt but if this pass gets called first (as it does on RADV) we ended up inserting instructions after the break. Fixes: `5921a19d4b` ("nir: add if opt opt_if_loop_last_continue()") Reviewed-by: Dave Airlie <airlied@redhat.com>	2018-12-14 17:21:35 +11:00
Rob Clark	0ac5acaeaa	freedreno/a6xx: fix resource_copy_region() pctx->resource_copy_region() needs to fall back to sw copy for non-renderable formats. But previously for things that we could not use the blitter for, would fall back to 3d. Which won't work if 3d can't render to the dst format either. Instead rework things to fallback to fd_resource_copy_region(), which will try 3d core and then fall back to memcpy(). Fixes (for example) dEQP-GLES3.functional.texture.format.sized.2d.rgb9_e5_pot Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-12-13 15:51:01 -05:00
Rob Clark	4ec2f6129b	freedreno: move fd_resource_copy_region() Code-motion prep for next patch. Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-12-13 15:51:01 -05:00
Rob Clark	57b76ee2a8	freedreno/a6xx: more blitter fixes Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-12-13 15:51:01 -05:00
Rob Clark	d15fc787bc	freedreno: update generated headers Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-12-13 15:51:01 -05:00
Rob Clark	532f8c0043	gallium/aux: add is_unorm() helper We already had one for is_snorm() but not unorm. Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-12-13 15:51:01 -05:00
Rob Clark	85cd4df47f	freedreno/a6xx: fix blitter crash Fixes a crash with unsupported formats in dEQP-GLES3.functional.texture.format.sized.2d.rgb9_e5_pot Also fixes gpu hangs with some formats that are supported, but which we don't know what internal-format to use for the blitter, for ex dEQP-GLES3.functional.texture.format.sized.2d_array.rgb10_a2_pot Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-12-13 15:51:01 -05:00
Rob Clark	cca1e9606c	freedreno/ir3: don't remove unused input components Fixes: `0d240c2214` freedreno/ir3: don't fetch unused tex components Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-12-13 15:51:01 -05:00
Rob Clark	c19c4bf488	freedreno/ir3: fix crash Fixes a crash in dEQP-GLES3.functional.shaders.fragdepth.compare.fragcoord_z Fixes: `0d240c2214` freedreno/ir3: don't fetch unused tex components Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-12-13 15:51:01 -05:00
Rob Clark	3e8e033f4c	freedreno: also set DUMP flag on shaders If we emit shader as a pointer to a GEM object, also set the RELOC_DUMP flag as a hint to kernel that this is a useful buffer to snapshot for debug dumps. Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-12-13 15:51:01 -05:00
Rob Clark	4cd016b5d6	freedreno: debug GEM obj names With a recent enough kernel, set debug names for GEM BOs, which will show up in $debugfs/gem Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-12-13 15:51:01 -05:00
Rob Clark	7ef722861b	freedreno/drm: sync uapi and enable softpin Pull in updated UAPI and use kernel API version to enable softpin. Since MSM_SUBMIT_BO_DUMP flag was added at same time, use that to signal to kernel that cmdstream buffers are useful to dump for debugging/cmdstream-traces. Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-12-13 15:51:01 -05:00
Eric Anholt	4407e688cd	nir: Move intel's half-float image store lowering to nir_format.h. I needed the same function for v3d. This was originally in `d3e046e76c` ("nir: Pull some of intel's image load/store format conversion to nir_format.h") before we made a mistake about simplifying the function. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-12-13 12:24:26 -08:00
Eric Anholt	3a417a044e	Revert "intel: Simplify the half-float packing in image load/store lowering." This reverts commit `06fbcd2cd5`. nir_pack_half_2x16_split isn't vectorizable, it's 1-component only, thus why we had this split-scalar code in the first place. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-12-13 12:24:24 -08:00
Eric Anholt	c2c44dba7a	nir: Print the format of image variables. This helps a lot when debugging image load/store lowering on large testcases. Unfortunately the Mesa enum name stuff is under src/mesa and we can't get at it from the compiler. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-12-13 12:24:12 -08:00
Eric Anholt	19ffcba161	mesa/st: Expose compute shaders when NIR support is advertised. We have a NIR path, and V3D doesn't have TGSI input for compute (only what TTN can handle for the various gallium-internal shaders). Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>	2018-12-13 11:44:47 -08:00
Dave Airlie	b3f2b03ece	radv/xfb: fix counter buffer bounds checks. If we gave this function 0 counter buffers, we'd still try and access pCounterBuffers[0] as this check was incorrect. Fixes crash with ext_transform_feedback-pipeline-basic-primgen on zink on radv. Fixes: `677b496b6` (radv: fix begin/end transform feedback with 0 counter buffers.) Signed-off-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2018-12-13 19:27:05 +00:00
Jason Ekstrand	9ebc00f32e	i965: Enable nir_opt_idiv_const for 32 and 64-bit integers The pass should work for all bit sizes but it's less clear that the extra instructions are worth it on small integers. Also, the hardware doesn't do mul_high on anything other than 32-bit integers and, absent any decent mechanism for testing the pass on 8 and 16-bit types, it's probably best to just leave it disabled for now. Shader-db results on Sky Lake: total instructions in shared programs: 15105795 -> 15111403 (0.04%) instructions in affected programs: 72774 -> 78382 (7.71%) helped: 0 HURT: 265 Note that hurt here actually means helped because we're getting rid of integer quotient operations (which are a send on some platforms!) and replacing them with fairly cheap ALU ops. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2018-12-13 17:49:48 +00:00
Jason Ekstrand	455ec7327d	i965/vec4: Implement nir_op_uadd_sat Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2018-12-13 17:49:48 +00:00
Ian Romanick	e639d39faf	i965/fs: Implement nir_op_uadd_sat Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-12-13 17:49:48 +00:00
Jason Ekstrand	74492ebad9	nir: Add a pass for lowering integer division by constants It's a reasonably well-known fact in the world of compilers that integer divisions by constants can be replaced by a multiply, an add, and some shifts. This commit adds such an optimization to NIR for the easiest case of udiv. Other division operations will be added in following commits. In order to provide some additional driver control, the pass takes a minimum bit size to optimize. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2018-12-13 17:49:48 +00:00
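The idea behind replacing an unsigned division by a constant with a multiply and a shift can be sketched in Python. This is not the NIR pass itself: the function names `udiv_magic` and `udiv_by_const` are hypothetical, and the sketch leans on Python's arbitrary-precision integers, so it needs neither the high-multiply instruction nor the extra add that a fixed-width implementation uses when the multiplier does not fit in the operand width.

```python
def udiv_magic(divisor: int, bits: int = 32) -> tuple[int, int]:
    """Return (multiplier, shift) so that n // divisor == (n * multiplier) >> shift
    for every unsigned n that fits in `bits` bits."""
    if divisor < 1 or divisor >= (1 << bits):
        raise ValueError("divisor must be in the range [1, 2**bits)")
    # ceil(log2(divisor)); zero when divisor == 1.
    extra_shift = (divisor - 1).bit_length()
    shift = bits + extra_shift
    # Round the reciprocal 2**shift / divisor up to the next integer.
    multiplier = ((1 << shift) + divisor - 1) // divisor
    return multiplier, shift


def udiv_by_const(n: int, divisor: int, bits: int = 32) -> int:
    """Divide n by a constant using only a multiply and a right shift."""
    multiplier, shift = udiv_magic(divisor, bits)
    return (n * multiplier) >> shift


quotient = udiv_by_const(1000, 7)
```

The multiplier here can be one bit wider than the operands, which is where the additional add and shift come from in a real 32-bit implementation. In this sketch the wide product simply absorbs it.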
Ian Romanick	090e282407	nir: Add a saturated unsigned integer add opcode Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-12-13 17:49:48 +00:00
Jason Ekstrand	39198a1238	nir/lower_int64: Add support for [iu]mul_high Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2018-12-13 17:49:48 +00:00
Jason Ekstrand	9525971e2b	nir: Allow [iu]mul_high on non-32-bit types Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2018-12-13 17:49:48 +00:00
Emil Velikov	a95ec13879	glx: mandate xf86vidmode only for "drm" dri platforms Currently we have the three dri "platforms" - drm, apple and windows. Since xf86vidmode is a thing only for the drm one, adjust the preprocessor guards and correctly check for the dependency. v2: terminate the GLX_USE_WINDOWSGL hunk Cc: Jon TURNEY <jon.turney@dronecode.org.uk> Fixes: `5bc509363b` ("glx: make xf86vidmode mandatory for direct rendering") Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com> Acked-by: Eric Engestrom <eric.engestrom@intel.com>	2018-12-13 17:38:19 +00:00
Alejandro Piñeiro	c7bdcd67aa	nir: remove unused variable To avoid the following warning: ./src/compiler/nir/nir_loop_analyze.c:807:16: warning: unused variable ‘ns’ [-Wunused-variable] nir_shader *ns = impl->function->shader; Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-12-13 16:35:21 +01:00
Erik Faye-Lund	e888f28d1f	virgl: work around bad assumptions in virglrenderer Virglrenderer does the wrong thing when given an instance divisor; it tries to use the element-index rather than the binding-index as the argument to glVertexBindingDivisor(). This worked fine as long as there was a 1:1 relationship between elements and bindings, which was the case until `19a91841c3` "st/mesa: Use Array._DrawVAO in st_atom_array.c.". So let's detect instance divisors, and restore a 1:1 relationship in that case. This will make old versions of virglrenderer behave correctly. For newer versions, we can consider making a better interface, where the instance divisor isn't specified per element, but rather per binding. But let's save that for another day. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Fixes: `19a91841c3` "st/mesa: Use Array._DrawVAO in st_atom_array.c." Reviewed-by: Mathias Fröhlich <Mathias.Froehlich@web.de> Tested-By: Gert Wollny <gert.wollny@collabora.com>	2018-12-13 16:12:10 +01:00
Erik Faye-Lund	8447b64238	virgl: wrap vertex element state in a struct This just has one member for now; the handle. But this is about to change. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Mathias Fröhlich <Mathias.Froehlich@web.de> Tested-By: Gert Wollny <gert.wollny@collabora.com>	2018-12-13 16:12:10 +01:00
Erik Faye-Lund	b702ff5378	virgl: simplify virgl_hw_set_index_buffer Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Mathias Fröhlich <Mathias.Froehlich@web.de> Tested-By: Gert Wollny <gert.wollny@collabora.com>	2018-12-13 16:12:10 +01:00
Erik Faye-Lund	00143a6241	virgl: simplify virgl_hw_set_vertex_buffers Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Mathias Fröhlich <Mathias.Froehlich@web.de> Tested-By: Gert Wollny <gert.wollny@collabora.com>	2018-12-13 16:12:10 +01:00
Juan A. Suarez Romero	0991085f66	docs: update calendar, add news item and link release notes for 18.2.7 Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>	2018-12-13 15:45:20 +01:00
Juan A. Suarez Romero	e0b0995dcf	docs: add sha256 checksums for 18.2.7 Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com> (cherry picked from commit `e90429cc6d`)	2018-12-13 15:42:49 +01:00
Juan A. Suarez Romero	c8a17b45ea	docs: add release notes for 18.2.7 Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com> (cherry picked from commit `419ee20097`)	2018-12-13 15:42:46 +01:00
Samuel Pitoiset	5088ba2aeb	radv: don't check if format is depth in radv_image_can_enable_htile() This is always TRUE if htile_size is not 0. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-12-13 09:21:21 +01:00
Samuel Pitoiset	eb0034fe28	radv: check if addrlib enabled HTILE in radv_image_can_enable_htile() When htile_size is 0, we can't enable HTILE. This doesn't change anything, except not calling radv_image_alloc_htile(). Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-12-13 09:21:19 +01:00
Samuel Pitoiset	d8325f1f07	radv: switch on EOP when primitive restart is enabled with triangle strips Otherwise, Yakuza hangs the GPU with DXVK. We don't know if linestrip and pointlist are affected, so my point is to do that only for triangle strips. Cc: mesa-stable@lists.freedesktop.org Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-12-13 09:21:16 +01:00
Samuel Pitoiset	74cf3b627c	radv: allow to skip DCC decompressions with the new predicate Feral games aren't affected because they don't decompress DCC. F1 2018 has one DCC decompression per frame, but I don't see any performance improvements. This new predicate will be probably more useful for DCC/MSAA. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-12-13 09:21:14 +01:00
Samuel Pitoiset	3a5adc2879	radv: add a predicate for reflecting DCC decompression state It's somehow similar to the FCE predicate. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-12-13 09:21:10 +01:00
Jordan Justen	c506eae53d	i965/compute: Emit GPGPU_WALKER in genX_state_upload Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-12-12 22:28:06 -08:00
Jordan Justen	1b85c605a6	i965/genX_state: Add register access functions Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-12-12 22:28:02 -08:00
Eric Anholt	06fbcd2cd5	intel: Simplify the half-float packing in image load/store lowering. This was noted by Jason in review when I tried to make a helper for the old path. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-12-12 16:09:48 -08:00
Eric Anholt	d3e046e76c	nir: Pull some of intel's image load/store format conversion to nir_format.h I needed the same functions for v3d. Note that the color value in the Intel lowering has already been cut down to image.chans num_components. v2: Drop the half float one, since it was a 1-liner after cleanup. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-12-12 16:09:43 -08:00
Eric Anholt	19c7cba2ab	nir: Add some more consts to the nir_format_convert.h helpers. Most of the bits were constant, but a few were missed. Avoids warnings from v3d's upcoming static const bits declarations. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-12-12 16:09:37 -08:00
Timothy Arceri	9e6b39e1d5	nir: detect more induction variables This allows loop analysis to detect induction variables that are incremented in both branches of an if rather than in a main loop block. For example: loop { block block_1: /* preds: block_0 block_7 */ vec1 32 ssa_8 = phi block_0: ssa_4, block_7: ssa_20 vec1 32 ssa_9 = phi block_0: ssa_0, block_7: ssa_4 vec1 32 ssa_10 = phi block_0: ssa_1, block_7: ssa_4 vec1 32 ssa_11 = phi block_0: ssa_2, block_7: ssa_21 vec1 32 ssa_12 = phi block_0: ssa_3, block_7: ssa_22 vec4 32 ssa_13 = vec4 ssa_12, ssa_11, ssa_10, ssa_9 vec1 32 ssa_14 = ige ssa_8, ssa_5 /* succs: block_2 block_3 */ if ssa_14 { block block_2: /* preds: block_1 */ break /* succs: block_8 */ } else { block block_3: /* preds: block_1 */ /* succs: block_4 */ } block block_4: /* preds: block_3 */ vec1 32 ssa_15 = ilt ssa_6, ssa_8 /* succs: block_5 block_6 */ if ssa_15 { block block_5: /* preds: block_4 */ vec1 32 ssa_16 = iadd ssa_8, ssa_7 vec1 32 ssa_17 = load_const (0x3f800000 /* 1.000000 */) /* succs: block_7 */ } else { block block_6: /* preds: block_4 */ vec1 32 ssa_18 = iadd ssa_8, ssa_7 vec1 32 ssa_19 = load_const (0x3f800000 /* 1.000000 */) /* succs: block_7 */ } block block_7: /* preds: block_5 block_6 */ vec1 32 ssa_20 = phi block_5: ssa_16, block_6: ssa_18 vec1 32 ssa_21 = phi block_5: ssa_17, block_6: ssa_4 vec1 32 ssa_22 = phi block_5: ssa_4, block_6: ssa_19 /* succs: block_1 */ } Unfortunately GCM could move the addition out of the if for us (making this patch unnecessary) but we still cannot enable the GCM pass without regressions. This unrolls a loop in Rise of The Tomb Raider. 
vkpipeline-db results (VEGA): Totals from affected shaders: SGPRS: 88 -> 96 (9.09 %) VGPRS: 56 -> 52 (-7.14 %) Spilled SGPRs: 0 -> 0 (0.00 %) Spilled VGPRs: 0 -> 0 (0.00 %) Private memory VGPRs: 0 -> 0 (0.00 %) Scratch size: 0 -> 0 (0.00 %) dwords per thread Code Size: 2168 -> 4560 (110.33 %) bytes LDS: 0 -> 0 (0.00 %) blocks Max Waves: 4 -> 4 (0.00 %) Wait states: 0 -> 0 (0.00 %) Reviewed-by: Thomas Helland <thomashelland90@gmail.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=32211	2018-12-13 10:58:35 +11:00
Timothy Arceri	c03d6e61cc	nir: reword code comment Reviewed-by: Thomas Helland <thomashelland90@gmail.com>	2018-12-13 10:58:35 +11:00
Timothy Arceri	48b40380e3	nir: in loop analysis track actual control flow type This will allow us to improve analysis to find more induction variables. Reviewed-by: Thomas Helland <thomashelland90@gmail.com>	2018-12-13 10:58:35 +11:00
Danylo Piliaiev	5921a19d4b	nir: add if opt opt_if_loop_last_continue() Removing the last continue can allow more loops to unroll. Also inserting code into the if branch can allow the various if opts to progress further. The insertion of some loops into the if branch also reduces VGPR use in some shaders. vkpipeline-db results (VEGA): Totals from affected shaders: SGPRS: 6552 -> 6576 (0.37 %) VGPRS: 6544 -> 6532 (-0.18 %) Spilled SGPRs: 0 -> 0 (0.00 %) Spilled VGPRs: 0 -> 0 (0.00 %) Private memory VGPRs: 0 -> 0 (0.00 %) Scratch size: 0 -> 0 (0.00 %) dwords per thread Code Size: 481952 -> 478032 (-0.81 %) bytes LDS: 13 -> 13 (0.00 %) blocks Max Waves: 241 -> 242 (0.41 %) Wait states: 0 -> 0 (0.00 %) Shader-db results radeonsi (VEGA): Totals from affected shaders: SGPRS: 168 -> 168 (0.00 %) VGPRS: 144 -> 140 (-2.78 %) Spilled SGPRs: 157 -> 157 (0.00 %) Spilled VGPRs: 0 -> 0 (0.00 %) Private memory VGPRs: 0 -> 0 (0.00 %) Scratch size: 0 -> 0 (0.00 %) dwords per thread Code Size: 8524 -> 8488 (-0.42 %) bytes LDS: 0 -> 0 (0.00 %) blocks Max Waves: 7 -> 7 (0.00 %) Wait states: 0 -> 0 (0.00 %) v2: (Timothy Arceri): - allow for continues in either branch - move any trailing loops inside the if as well as blocks. - leave nir_opt_trivial_continues() to actually remove the continue. Reviewed-by: Thomas Helland <thomashelland90@gmail.com> Signed-off-by: Timothy Arceri <tarceri@itsqueeze.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=32211	2018-12-13 10:58:35 +11:00
Timothy Arceri	721566bddb	nir: rework force_unroll_array_access() Here we rework force_unroll_array_access() so that we can reuse the induction variable detection in a following patch. Reviewed-by: Thomas Helland <thomashelland90@gmail.com>	2018-12-13 10:39:51 +11:00
Timothy Arceri	48135f175c	nir: factor out some of the complex loop unroll code to a helper Acked-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Thomas Helland <thomashelland90@gmail.com>	2018-12-13 10:34:48 +11:00
Jordan Justen	7fe4e0ad5d	docs: Document GitLab merge request process (email alternative) This documents a process for using GitLab Merge Requests as a second way to submit code changes for Mesa. Only one of the two methods is allowed for each patch series. We will not require all patches to be emailed. Some code changes may be reviewed and merged without any discussion on the mesa-dev email list. v2: * No longer require email. Allow submitter to choose email or a GitLab merge request. * Various feedback from Brian, Daniel, Dylan, Eric, Erik, Jason, Matt, Michel and Rob. Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Acked-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Eric Anholt <eric@anholt.net> Acked-by: Dylan Baker <dylan@pnwbakers.com> Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Acked-by: Rob Clark <robdclark@gmail.com>	2018-12-12 10:05:29 -08:00
Rhys Kidd	ff6f1dd0d3	meson: libfreedreno depends upon libdrm (for fence support) Error message building freedreno Gallium driver with meson: ../src/gallium/drivers/freedreno/freedreno_fence.c:27:21: fatal error: libsync.h: No such file or directory #include <libsync.h> Fixes: `4aa69cc425` ("meson: build freedreno") Signed-off-by: Rhys Kidd <rhyskidd@gmail.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2018-12-12 09:01:06 -08:00
Jason Ekstrand	ca98902d09	nir: Document the function inlining process This has thrown a few people off recently and it's good to have the process and all the rationale for it documented somewhere. A comment at the top of nir_inline_functions seems as good a place as any. Acked-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2018-12-12 08:32:32 -06:00
Jason Ekstrand	5749c0ebc4	intel/blorp: Assert that we don't re-layout a compressed surface Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-12-12 08:32:32 -06:00
Jason Ekstrand	e4fdc650f1	anv/pipeline: Set the correct binding count for compute shaders Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-12-12 08:32:25 -06:00
Samuel Pitoiset	2ac6d55f38	radv: bump reported version to 1.1.90 After going through the spec changelog, it looks like RADV is up to date. Note that ANV also reports 1.1.90. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-12-12 13:51:16 +01:00
Erik Faye-Lund	f856f50194	virgl: force linear texturing support When I made sure that half-float texture-filtering was required for ES3, I didn't realize that virgl doesn't report support for this correctly. This regressed the GLES version available on top of several drivers, including i965 from 3.2 to 2.0. This is going to need protocol changes to fix properly, so let's just restore the previous behavior by enabling floating-point filtering unconditionally for now. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Fixes: `fcf9fcee3c` "mesa/main: do not require float-texture filtering for es3" Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>	2018-12-12 11:44:47 +01:00
Iago Toral Quiroga	3918943211	intel/compiler: do not copy-propagate strided regions to ddx/ddy arguments The implementation of these opcodes in the generator assumes that their arguments are packed, and it generates register regions based on that assumption. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-12-12 08:09:45 +01:00
Jason Ekstrand	a10a450db2	anv: Advertise support for MinLod on Skylake+ These are usually used for dealing with sparse resources but there's no reason why we can't hook them up before we have sparse. We have the hardware; let's light it up. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2018-12-11 21:26:23 -06:00
Jason Ekstrand	cb98e0755f	intel/fs: Support min_lod parameters on texture instructions We have to lower some shadow instructions because they don't exist in hardware and we have to lower txb+offset+clamp because the message gets too big and we run into the sampler message length limit of 11 regs. Acked-by: Ian Romanick <ian.d.romanick@intel.com>	2018-12-11 21:26:23 -06:00
Jason Ekstrand	4ef8f46fd1	nir/lower_tex: Add lowering for some min_lod cases Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2018-12-11 21:26:23 -06:00
Jason Ekstrand	4a691cfa7e	nir/lower_tex: Modify txd instructions instead of replacing them I don't know if one is better than the other or not but this approach has the advantage that we never forget to copy information over and we're not hard-coding quite as many assumptions. It's also a lot simpler and much less code. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2018-12-11 21:26:23 -06:00
Jason Ekstrand	5a968ae473	nir/lower_tex: Simplify lower_gradient logic Instead of having to call two different lower_gradient functions based on whether or not it's a cube, just make lower_gradient handle cubes. This significantly simplifies some of the logic. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2018-12-11 21:26:23 -06:00
Jason Ekstrand	caeffe7549	spirv: Add support for MinLod Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2018-12-11 21:26:23 -06:00
Jason Ekstrand	e1ef6c3c29	intel/ir: Don't allow allocating zero registers This simple check helps catch bugs early that can end up propagating into later stages of the compile and triggering strange asserts. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2018-12-11 21:26:23 -06:00
Roland Scheidegger	86c45fe960	gallivm: remove unused float coord wrapping for aos sampling AoS sampling tries to use integers for coord wrapping when possible, as it should be faster. However, for AVX, this was suboptimal, because only floats can use 8x32bit vectors, whereas integers have to be split into 4x32bit vectors. (I believe part of why it was slower was also that at least earlier llvm versions had trouble optimizing it properly, since you can still do simple bit ops with 8x32bit vectors, so a sequence of int add / and / int add / and with such vectors would actually end up doing 128bit inserts/extracts between the operations instead of just doing the cheap 128bit ands.) Hence, a special float coord wrapping path was added to AoS sampling. But this path was actually disabled for a long time already, since we found that just splitting everything before entering the AoS path was still slightly faster usually, so none of this float coord wrapping code was used anymore (AoS sampling code, when avx2 isn't supported, never sees vectors with length > 4). I thought it might be useful some day again, but I'm not interested anymore in optimizing for very weird instruction sets which have support for 256bit vectors for floats but not for ints, so just drop it. Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2018-12-12 03:50:03 +01:00
Emil Velikov	721c296bdc	docs: update calendar, add news item and link release notes for 18.3.1 Signed-off-by: Emil Velikov <emil.velikov@collabora.com>	2018-12-11 21:25:18 +00:00
Emil Velikov	5391b65ed1	docs: add sha256 checksums for 18.3.1 Signed-off-by: Emil Velikov <emil.velikov@collabora.com>	2018-12-11 21:21:42 +00:00
Emil Velikov	512bd8d3dd	docs: add release notes for 18.3.1 Signed-off-by: Emil Velikov <emil.velikov@collabora.com>	2018-12-11 21:21:41 +00:00
Neil Roberts	8600aa35bd	freedreno: Add .dir-locals to the common directory The commit `aa0fed10d3` moved a bunch of Freedreno code to a common directory. The previous directory had a .dir-locals file for Emacs. This patch copies it to the new directory as well. Reviewed-by: Kristian H. Kristensen <hoegsberg@chromium.org>	2018-12-11 13:14:08 -08:00
Rob Clark	cfe8220904	mesa/st/nir: fix missing nir_compact_varyings LinkedTransformFeedback is normally populated, which had nerf'd varying packing since the check was introduced. Fixes: `dbd52585fa` st/nir: Disable varying packing when doing transform feedback. Signed-off-by: Rob Clark <robdclark@gmail.com> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-12-11 15:51:34 -05:00
Rob Clark	9e3fc0c1e0	nir: fix spelling typo Signed-off-by: Rob Clark <robdclark@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-12-11 15:51:34 -05:00
Jason Ekstrand	8f401b0ce6	anv,radv: Disable VK_EXT_pci_bus_info The Vulkan working group recently discovered that we made a mistake in assuming that PCI domains are 16-bit even though they can potentially be 32-bit values. To fix this, the next spec update will change the types in the VK_EXT_pci_bus_info struct to be 32 bits which will be a backwards-incompatible change. Normally, Khronos tries very hard to never make backwards incompatible changes to specs. Hopefully, the extension is new enough (2 months) that there are no shipping apps which use the extension so this should be safe. This commit disables the extension for both anv and radv in mesa and should be back-ported to 18.3 ASAP so we avoid any potential issues with new apps running on old drivers. I'll send out a commit (which we can also back-port to 18.3 if we really care) to re-enable the extension in both drivers once this week's spec update ships. The one known use of this extension is internal to mesa and will continue working with the extension disabled and will naturally update when we get a new header. Cc: "18.3" <mesa-stable@lists.freedesktop.org> Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2018-12-11 11:30:05 -06:00
Juan A. Suarez Romero	fb88dcf5ca	docs: extends 18.2 lifecycle As 18.3 was published with some delay, let's extend 18.2 life for another extra release. CC: Andres Gomez <agomez@igalia.com> CC: Dylan Baker <dylan@pnwbakers.com> CC: Emil Velikov <emil.l.velikov@gmail.com> Reviewed-by: Andres Gomez <agomez@igalia.com> Acked-by: Emil Velikov <emil.velikov@collabora.com> Acked-by: Dylan Baker <dylan@pnwbakers.com>	2018-12-11 15:20:10 +01:00
Kristian H. Kristensen	c0de7c21a3	glapi: fixup EXT_multisampled_render_to_texture dispatch There's a few missing and convoluted bits: - FramebufferTexture2DMultisampleEXT Missing sanity check, should be desktop="false" - RenderbufferStorageMultisampleEXT Missing sanity check, is aliased to RenderbufferStorageMultisample. Thus it's set only when desktop GL or GLES2 v3.0+, while the extension is GLES2 2.0+. If we flip the aliasing we'll break indirect GLX, so loosen the version to 2.0. Not perfect, yet this is the most sane thing I could think of. v2: [Emil] Fixup RenderbufferStorageMultisampleEXT, commit message Cc: Kristian H. Kristensen <hoegsberg@gmail.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108974 Fixes: `1b331ae505` ("mesa: Add core support for EXT_multisampled_render_to_texture{,2}") Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Signed-off-by: Emil Velikov <emil.velikov@collabora.com>	2018-12-10 15:09:07 -08:00
Kristian H. Kristensen	9578dde1c8	freedreno: Fix the Makefile.am fix Commit `b028ce29f0` fixed a typo in src/freedreno/Makefile.am, but ended up breaking the build for freedreno. The typo inadvertently made things work, as we were not supposed to link with libnir or libmesautil to begin with. Those come in through libmesagallium and the typo prevented the duplicated linkage. Fixes: `b028ce29f` ("freedreno: add the missing _la in libfreedreno_ir3_la") Cc: Emil Velikov <emil.velikov@collabora.com>	2018-12-10 14:28:09 -08:00
Matt Turner	f447a13032	i965/fs: Handle V/UV immediates in dump_instructions()	2018-12-10 10:46:56 -08:00
Sagar Ghuge	694eb342a2	intel/compiler: Always print flag subregister number While disassembling the predicate always print flag subregister number to keep grammar same across the generation for assembler tool. v2: Combine consecutive format calls (Matt Turner) Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2018-12-10 10:07:11 -08:00
Sagar Ghuge	e7598c5a62	intel/compiler: Set swizzle to BRW_SWIZZLE_XXXX for scalar region When RepCtrl is set, the swizzle field is ignored by the hardware. In order to ensure a 1-to-1 correspondence between the human-readable disassembly and the binary instruction encoding always set the swizzle to XXXX (all zeros) when it is unused due to RepCtrl Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2018-12-10 10:06:55 -08:00
Dylan Baker	6d3cbbbe15	meson: Add nir_algebraic_parser_test to suites Just to make it easier to run the nir tests together. Fixes: `a0ae12ca91` ("nir/algebraic: Add unit tests for bitsize validation") Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2018-12-10 09:14:44 -08:00
Emil Velikov	27c4fdfdf8	amd/addrlib: drop si_ci_vi_merged_enum.h from the list Fixes: `776b911365` ("amd/addrlib: update Mesa's copy of addrlib") Signed-off-by: Emil Velikov <emil.velikov@collabora.com>	2018-12-10 16:35:01 +00:00
Emil Velikov	b028ce29f0	freedreno: add the missing _la in libfreedreno_ir3_la Fixes: `aa0fed10d3` ("freedreno: move ir3 to common location") Signed-off-by: Emil Velikov <emil.velikov@collabora.com>	2018-12-10 16:35:01 +00:00
Emil Velikov	b30e37ec64	freedreno: drop duplicate MKDIR_GEN declaration Fixes: `aa0fed10d3` ("freedreno: move ir3 to common location") Signed-off-by: Emil Velikov <emil.velikov@collabora.com>	2018-12-10 16:35:01 +00:00
Rhys Kidd	05c7e726f7	travis: radeonsi and radv require LLVM 7.0 Fixes: `3fbdcd942f` ("amd: remove support for LLVM 6.0") Cc: Marek Olšák <marek.olsak@amd.com> Cc: Jan Vesely <jan.vesely@rutgers.edu> Cc: Andres Gomez <agomez@igalia.com> Cc: Dylan Baker <dylan@pnwbakers.com> Signed-off-by: Rhys Kidd <rhyskidd@gmail.com> Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2018-12-10 16:20:12 +00:00
Kirill Burtsev	a539316485	loader: free error state, when checking the drawable type Currently we distinguish if the drawable is a window or pixmap by checking whether xcb_present_select_input throws an error or not. Yet, we don't always free the error state returned by xcb. Cc: Kirill Burtsev <kirill.burtsev@qt.io> Cc: Boyan Ding <boyan.j.ding@gmail.com> Fixes: `6bd9ba7d07` ("loader: Add dri3 helper") Reviewed-by: Emil Velikov <emil.velikov@collabora.com> [Emil: add commit message, fixes tag] Signed-off-by: Emil Velikov <emil.velikov@collabora.com>	2018-12-10 16:19:55 +00:00
Timothy Arceri	032f247921	nir: make use of new nir_cf_list_clone_and_reinsert() helper Reviewed-by: Thomas Helland <thomashelland90@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-12-10 13:59:50 +11:00
Timothy Arceri	6b961eb534	nir: add a new nir_cf_list_clone_and_reinsert() helper Reviewed-by: Thomas Helland <thomashelland90@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-12-10 13:59:50 +11:00
Timothy Arceri	03d7c65ad8	nir: clarify some nir_loop_info member names Following commits will introduce additional fields such as guessed_trip_count. Renaming these will help avoid confusion as our unrolling feature set grows. Reviewed-by: Thomas Helland <thomashelland90@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-12-10 13:59:50 +11:00
Timothy Arceri	de0aee7638	nir: small tidy ups for nir_loop_analyze() Reviewed-by: Thomas Helland <thomashelland90@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-12-10 13:59:50 +11:00
Kenneth Graunke	41a4a6ba6f	i965: Flip arguments to load_register_reg helpers. load_register_imm and load_register_mem take the destination as the first argument, so I'd like load_register_reg to do the same for the sake of consistency. Otherwise, reading sequences of mixed LRI/LRM/LRR is needlessly confusing. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-12-09 18:39:16 -08:00
Kenneth Graunke	34c9dc2537	i965: Delete dead brw_meta_resolve_color prototype. Dead since commit `09e041d61d` (May 2016).	2018-12-09 18:39:16 -08:00
Karol Herbst	77944fb2b7	nv50/ir: fix use-after-free in ConstantFolding::visit opnd() might delete the passed in instruction, but it's used through i->srcExists() later in visit v2: use continue instead return v3: use brackets for the outer if/else chain Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>	2018-12-09 18:19:59 +01:00
Karol Herbst	d63a133082	nouveau: use atomic operations for driver statistics multiple threads can write to those at the same time Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>	2018-12-09 04:43:20 +01:00
Karol Herbst	a28ff22295	nv50/ir: initialize relDegree statically this race condition is pretty harmless, but also pretty trivial to fix Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>	2018-12-09 04:43:17 +01:00
Eric Anholt	cc6a5e937b	shader-packing	2018-12-07 16:51:12 -08:00
Eric Anholt	09ad0d870c	tfu	2018-12-07 16:49:41 -08:00
Eric Anholt	f1d98204c3	v3d: Fix a leak of the disassembled instruction string during debug dumps. Fixes: `ade416d023` ("broadcom: Add VC5 NIR compiler.")	2018-12-07 16:48:23 -08:00
Eric Anholt	7f8d8b7d27	vc4: Fix a leak of the transfer helper on screen destroy. Fixes: `d009463a65` ("vc4: Switch to using u_transfer_helper for MSAA maps.")	2018-12-07 16:48:23 -08:00
Eric Anholt	3bd73d31a8	v3d: Fix a leak of the transfer helper on screen destroy. Fixes: `7a30517cce` ("broadcom/vc5: Start adding support for rendering to Z32F_S8X24_UINT.")	2018-12-07 16:48:23 -08:00
Eric Anholt	bad95bb13c	v3d: Add VIR dumping of TMU config p0/p1. I had a bit of it for V3D 3.x, but didn't update it for 4.x.	2018-12-07 16:48:23 -08:00
Eric Anholt	1fc78ff3f1	v3d: Simplify VIR uniform dumping using a temporary.	2018-12-07 16:48:23 -08:00
Eric Anholt	5932575299	v3d: Garbage collect unused uniforms code.	2018-12-07 16:48:23 -08:00
Eric Anholt	62a3192112	v3d: Split most of TEXTURE_SHADER_STATE setup out of sampler views. For shader image load/store, we want most of this logic to be shared.	2018-12-07 16:48:23 -08:00
Eric Anholt	8cb1f3bab7	v3d: Avoid confusing auto-indenting in TEXTURE_SHADER_STATE packing Having "v3dx_pack() {" under each #if branch would confuse emacs's indenter.	2018-12-07 16:48:23 -08:00
Eric Anholt	ee9b758053	v3d: Fix handling of texture first_layer offsets for 3D textures. I think this bug predated adding v3d_layer_offset(). Noticed during an unrelated refactor.	2018-12-07 16:48:23 -08:00
Eric Anholt	acecee4c2d	v3d: Return the right gl_SampleMaskIn[] value. It's supposed to be the dispatched sample mask for this pixel, not the GL state's sample mask.	2018-12-07 16:48:23 -08:00
Eric Anholt	6870111051	v3d: Fix a comment typo	2018-12-07 16:48:23 -08:00
Eric Anholt	ca0e4ae4bc	v3d: Convert to using nir_src_as_uint() from const_value derefs. Follows `16870de8a0` ("nir: Use nir_src_is_const and nir_src_as_* in core code") to clean up v3d.	2018-12-07 16:48:23 -08:00
Eric Anholt	503b55c622	v3d: Don't forget to flush writes to UBOs. If someone did TF into a UBO, we might have left the TF job un-flushed at the point of reading.	2018-12-07 16:48:23 -08:00
Eric Anholt	504d06e4c1	v3d: Make an array for frag/vert texture state in the context. This simplifies a bunch of our texture handling, while introducing the slots necessary for adding new shader stages.	2018-12-07 16:48:23 -08:00
Eric Anholt	d1965344ac	v3d: Re-use the wrap mode uniform on V3D 3.3.	2018-12-07 16:48:23 -08:00
Eric Anholt	e94d034a38	v3d: Put default vertex attribute values into the state uploader as well. The default attributes are long-lived (the state struct is cached), and only 256 bytes each.	2018-12-07 16:48:23 -08:00
Eric Anholt	b38e4d313f	v3d: Create a state uploader for packing our shaders together. Shaders are usually quite short, and are private to the context. We can save memory and reduce the work the kernel needs to do at exec time by packing them together in a stream uploader for long-lived state.	2018-12-07 16:48:23 -08:00
Eric Anholt	1911888760	v3d: Update simulator cache flushing code to match the kernel better. We were missing the invalidate between bin and render (possibly relevant for SSBOs), and still trying to flush the nonexistent L2C on 3.3+.	2018-12-07 16:48:23 -08:00
Eric Anholt	2ebca177dc	v3d: Use the TFU to do generatemipmap. This is a separate, dedicated hardware unit for texture layout conversions and mipmap generation.	2018-12-07 16:48:23 -08:00
Eric Anholt	ee0549ff9a	v3d: Add the V3D TFU submit interface to the simulator. The TFU lets us format raster and SAND images into formats that can be read by the texture engine, and do mipmap generation. The UAPI comes from drm-next e69aa5f9b97f ("Merge tag 'drm-misc-next-2018-12-06' of git://anongit.freedesktop.org/drm/drm-misc into drm-next")	2018-12-07 16:48:23 -08:00
Eric Anholt	42652ea51e	v3d: Use combined input/output segments. The HW apparently has some issues (or at least a much more complicated VCM calculation) with non-combined segments, and the closed source driver also uses combined I/O. Until I get the last CTS failure resolved (which does look plausibly like some VPM stomping), let's use combined I/O too.	2018-12-07 16:48:23 -08:00
Eric Anholt	fb9bcf5602	v3d: Add missing OES_half_float_linear support. We were exposing ARB_texture_float, but apparently not the OES subset flag. Fixes regression from GLES3 support to GLES2. Fixes: `fcf9fcee3c` ("mesa/main: do not require float-texture filtering for es3")	2018-12-07 16:48:23 -08:00
Eric Anholt	90e98295a4	v3d: Add support for RGBA_SRGB along with BGRA_SRGB. This is the actual native format for the hardware, without swizzling. Noticed while debugging why GLES3 disappeared.	2018-12-07 16:48:23 -08:00
Kenneth Graunke	f0d51e81c9	intel/blorp: Expand blorp_address::offset to be 64 bits. In the softpin world, surface state base address may be a fixed 64-bit address (with no associated BO). It makes sense to store this in the offset field. But it needs to be the full size. We also update the clear color address to be consistently uint64_t everywhere so we can continue passing intel_miptree_get_clear_color a pointer to the blorp_address's offset field without type mismatches. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>	2018-12-07 16:35:51 -08:00
Rob Clark	d014af98b7	freedreno/drm: fix memory leak Fix an embarrassing memory leak with the non-softpin submit/rb implementation. Fixes: `f3cc0d2747` freedreno: import libdrm_freedreno + redesign submit Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-12-07 14:12:12 -05:00
Rob Clark	5c2c1f0a2d	freedreno/ir3: track max flow control depth for a5xx/a6xx Rather than just hard-coding BRANCHSTACK size. Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-12-07 13:49:21 -05:00
Rob Clark	9517037bdc	freedreno/ir3: code-motion Split up ir3_compiler_nir.c a bit before starting to add new stuff for a6xx SSBO/image instructions. Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-12-07 13:49:21 -05:00
Rob Clark	e37351fa57	freedreno/ir3: sync instr/disasm Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-12-07 13:49:21 -05:00
Rob Clark	0d240c2214	freedreno/ir3: don't fetch unused tex components Detect when a component of an (for example) texture fetch is unused and propagate the updated wrmask back to the parent instruction. Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-12-07 13:49:21 -05:00
Rob Clark	b971afd19e	freedreno/a6xx: blitter fixes Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-12-07 13:49:21 -05:00
Rob Clark	237ae7daf2	freedreno: update generated headers Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-12-07 13:49:21 -05:00
Rob Clark	e779725f0b	freedreno/drm: fix relocs in nested stateobjs If we have a reloc from stateobjA to stateobjB, we would previously leave stateobjB's bos out of the submit's bos table. Handle this case by copying into stateobjA's reloc_bos table. Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-12-07 13:49:21 -05:00
Rob Clark	9f7c6c78bc	freedreno/a5xx+a6xx: remove unused fs/vs pvt mem copy/pasta from older gens Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-12-07 13:49:21 -05:00
Rob Clark	c500e7b747	gallium: fix typo Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-12-07 13:49:21 -05:00
Rob Clark	f6ad286c80	freedreno: remove unused fd_surface fields Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-12-07 13:49:21 -05:00
Nicolai Hähnle	4275cae95c	meson: link LLVM 'native' component when LLVM is available Linking against LLVM built with BUILD_SHARED_LIBS fails otherwise, as the component is required for the draw module. Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2018-12-07 16:26:14 +01:00
Connor Abbott	2845c49218	nir: Fixup algebraic test for variable-sized conversions b2i can now take any size boolean in preparation for 1-bit booleans, so the error message printed is slightly different. Fixes: `dca6cd9ce6` ("nir: Make boolean conversions sized just like the others") Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108961 Cc: Jason Ekstrand <jason@jlekstrand.net> Acked-by: Jason Ekstrand <jason@jlekstrand.net>	2018-12-07 16:07:51 +01:00
Samuel Pitoiset	e8a383ce67	gallium: add missing PIPE_CAP_SURFACE_SAMPLE_COUNT default value Fixes: `2710c40e3c` ("gallium: Add new PIPE_CAP_SURFACE_SAMPLE_COUNT") Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Tested-by: Michel Dänzer <michel.daenzer@amd.com>	2018-12-07 15:06:29 +01:00
Emil Velikov	96d4ecbb11	docs: update calendar, add news item and link release notes for 18.3.0 Signed-off-by: Emil Velikov <emil.velikov@collabora.com>	2018-12-07 11:50:12 +00:00
Emil Velikov	0144bbdb98	docs: add sha256 checksums for 18.3.0 Signed-off-by: Emil Velikov <emil.velikov@collabora.com> (cherry picked from commit `d81beab96a`)	2018-12-07 11:44:33 +00:00
Emil Velikov	b1e0336497	docs: update 18.3.0 release notes Signed-off-by: Emil Velikov <emil.velikov@collabora.com> (cherry picked from commit `d603cd9d84`)	2018-12-07 11:44:31 +00:00
Kristian H. Kristensen	3e55df4f83	freedreno: Add support for EXT_multisampled_render_to_texture There is not much to do in freedreno - tile layout and multisample state for gmem renderings is programmed based on the pfb sample count, while resolve blits take the destination sample count from the resource. Reviewed-by: Rob Clark <robdclark@gmail.com> Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>	2018-12-06 16:56:37 -08:00
Rob Clark	913eb7fa58	freedreno/a6xx: MSAA Reviewed-by: Kristian H. Kristensen <hoegsberg@chromium.org> Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-12-06 16:55:59 -08:00
Kristian H. Kristensen	14ea811c67	st/mesa: Add support for EXT_multisampled_render_to_texture In gallium, we model the attachment sample count as a new nr_samples field in pipe_surface. A driver can indicate support for the extension using the new pipe cap, PIPE_CAP_MULTISAMPLED_RENDER_TO_TEXTURE. Reviewed-by: Eric Anholt <eric@anholt.net> Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>	2018-12-06 16:55:46 -08:00
Kristian H. Kristensen	2710c40e3c	gallium: Add new PIPE_CAP_SURFACE_SAMPLE_COUNT This new pipe cap and the new nr_samples field in pipe_surface lets a state tracker bind a render target with a different sample count than the resource. This allows for implementing EXT_multisampled_render_to_texture and EXT_multisampled_render_to_texture2. Reviewed-by: Eric Anholt <eric@anholt.net> Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>	2018-12-06 16:55:43 -08:00
Kristian H. Kristensen	1b331ae505	mesa: Add core support for EXT_multisampled_render_to_texture{,2} This also turns on EXT_multisampled_render_to_texture which is a subset of EXT_multisampled_render_to_texture2, allowing only COLOR_ATTACHMENT0. Reviewed-by: Eric Anholt <eric@anholt.net> Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>	2018-12-06 16:55:30 -08:00
Vinson Lee	b4fd59075b	nir/algebraic: Make algebraic_parser_test.sh executable. Fixes make check permission error. ../../bin/test-driver: line 107: ./nir/tests/algebraic_parser_test.sh: Permission denied FAIL nir/tests/algebraic_parser_test.sh (exit status: 126) Fixes: `a0ae12ca91` ("nir/algebraic: Add unit tests for bitsize validation") Signed-off-by: Vinson Lee <vlee@freedesktop.org>	2018-12-06 11:48:20 -08:00
Samuel Pitoiset	3fbdcd942f	amd: remove support for LLVM 6.0 Users are encouraged to switch to LLVM 7.0, released in September 2018. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-12-06 14:02:56 +01:00
Kristian H. Kristensen	3b2ad8b290	gallium: Android build fixes A couple of simple fixes for building on Android with autotools. Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2018-12-05 13:56:07 -08:00
Jason Ekstrand	dca6cd9ce6	nir: Make boolean conversions sized just like the others Instead of a single i2b and b2i, we now have i2b32 and b2iN where N is one of 8, 16, 32, or 64. This leads to having a few more opcodes but now everything is consistent and booleans aren't a weird special case anymore. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2018-12-05 15:03:07 -06:00
Jason Ekstrand	be98b1db38	nir/opt_algebraic: Add 32-bit specifiers to a bunch of booleans Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2018-12-05 15:03:03 -06:00
Jason Ekstrand	2715080d65	nir/opt_algebraic: Drop bit-size suffixes from conversions Suffixes are dropped from a bunch of conversion opcodes when it makes sense to do so. Others are kept if we really do want the bit-size restriction. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2018-12-05 15:03:01 -06:00
Jason Ekstrand	ff8e3d3b7b	nir/opt_algebraic: Simplify an optimization using the new search ops Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2018-12-05 15:02:58 -06:00
Jason Ekstrand	05af952a11	nir/algebraic: Add support for unsized conversion opcodes All conversion opcodes require a destination size but this makes constructing certain algebraic expressions rather cumbersome. This commit adds support to nir_search and nir_algebraic for writing conversion opcodes without a size. These meta-opcodes match any conversion of that type regardless of destination size and the size gets inferred from the sizes of the things being matched or from other opcodes in the expression. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2018-12-05 15:02:56 -06:00
Jason Ekstrand	4925290ab1	nir/algebraic: Refactor codegen a bit Instead of using an OrderedDict, just have a (necessarily sorted) array of transforms and a set of opcodes. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2018-12-05 15:02:54 -06:00
Jason Ekstrand	d6aac618fb	nir/algebraic: Clean up some __str__ cruft Both of these things are already handled in the Value base class so we don't need to handle them explicitly in Constant. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2018-12-05 15:02:52 -06:00
Jason Ekstrand	85f0ea9d8f	nir/opcodes: Rename tbool to tbool32 Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2018-12-05 15:02:49 -06:00
Jason Ekstrand	03571a7a6c	nir/opcodes: Pull in the type helpers from constant_expressions While we're at it, we rework them a bit to all use regular expressions and assert more. Reviewed-by: Connor Abbott <cwabbott0@gmail.com>	2018-12-05 15:02:06 -06:00
Connor Abbott	a0ae12ca91	nir/algebraic: Add unit tests for bitsize validation The non-failure path can be tested by just compiling mesa and then testing it, but the failure paths won't be hit unless you make a mistake, so it's best to test them with some unit tests. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2018-12-05 17:57:40 +01:00
Connor Abbott	29a1450e28	nir/algebraic: Rewrite bit-size inference Before this commit, there were two copies of the algorithm: one in C, that we would use to figure out what bit-size to give the replacement expression, and one in Python, that emulated the C one and tried to prove that the C algorithm would never fail to correctly assign bit-sizes. That seemed pretty fragile, and likely to fall over if we make any changes. Furthermore, the C code was really just recomputing more-or-less the same thing as the Python code every time. Instead, we can just store the results of the Python algorithm in the C datastructure, and consult it to compute the bitsize of each value, moving the "brains" entirely into Python. Since the Python algorithm no longer has to match C, it's also a lot easier to change it to something more closely approximating an actual type-inference algorithm. The algorithm used is based on Hindley-Milner, although deliberately weakened a little. It's a few more lines than the old one, judging by the diffstat, but I think it's easier to verify that it's correct while being as general as possible. We could split this up into two changes, first making the C code use the results of the Python code and then rewriting the Python algorithm, but since the old algorithm never tracked which variable each equivalence class, it would mean we'd have to add some non-trivial code which would then get thrown away. I think it's better to see the final state all at once, although I could also try splitting it up. v2: - Replace instances of "== None" and "!= None" with "is None" and "is not None". - Rename first_src to first_unsized_src - Only merge the destination with the first unsized source, since the sources have already been merged. - Add a comment explaining what nir_search_value::bit_size now means. 
v3: - Fix one last instance to use "is not" instead of != - Don't try to be so clever when choosing which error message to print based on whether we're in the search or replace expression. - Fix trailing whitespace. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2018-12-05 17:57:40 +01:00
Samuel Pitoiset	49ef890733	radv: expose VK_EXT_scalar_block_layout Nothing to do, the compiler already handles that. All new dEQP.VK.ubo.* and dEQP.VK.ssbo.* pass, except some 16-bit tests that are quite related to fdo bug #108114. Only enable the extension on CIK+ because it might not work on SI. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-12-05 17:38:20 +01:00
Samuel Pitoiset	c6465fec0c	spirv: add SpvCapabilityInt64Atomics Required for VK_KHR_shader_atomic_int64. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-12-05 14:39:55 +01:00
Michal Srb	63c0916ada	drisw: Use separate drisw_loader_funcs for shm The original code was modifying the global drisw_lf variable, which is bad when there are multiple contexts in a single process, each initialized with a different loader. One may support put_image_shm and the other not. Since there are currently only two possible combinations, let's create two global tables, one for each. Let's make them const, since we won't change them and they can be shared. This fixes a crash in VLC. It used two GL contexts (each in a different thread), one was initialized by its Qt GUI, the other by its video output plugin. The first one set the put_image_shm=drisw_put_image_shm, the second did not, but since the same structure was used, the drisw_put_image_shm was used too. Then it crashed because the second loader did not have putImageShm set. Downstream bug: https://bugzilla.opensuse.org/show_bug.cgi?id=1113533 v2: Added Fixes and described the VLC bug. Fixes: `63c427fa71` ("drisw: use putImageShm if available") Signed-off-by: Michal Srb <msrb@suse.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2018-12-05 13:16:09 +00:00
Michal Srb	c0ac038c97	gallium: Constify drisw_loader_funcs struct The content is not expected to change. Cc: mesa-stable@lists.freedesktop.org Signed-off-by: Michal Srb <msrb@suse.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2018-12-05 13:16:09 +00:00
Samuel Pitoiset	c7ada4901a	radv: wait on the high 32 bits of timestamp queries In case we are unlucky if the low part is 0xffffffff. Fixes: `5d6a560a29` ("radv: do not use the availability bit for timestamp queries") Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-12-05 13:05:58 +01:00
Samuel Pitoiset	e899728769	radv: reset pending_reset_query when flushing caches If the driver used a compute shader for resetting a query pool, it should be completed when caches are flushed. This might reduce the number of stalls if operations are done between vkCmdResetQueryPool() and vkCmdBeginQuery() (or vkCmdWriteTimestamp()). Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Alex Smith <asmith@feralinteractive.com>	2018-12-05 13:05:55 +01:00
Lionel Landwerlin	9a7b319903	anv/query: flush render target before copying results This change tracks render target writes in the pipeline and applies a render target flush before copying the query results to make sure the preceding operations have landed in memory before the command streamer initiates the copy. v2: Simplify logic in CopyQueryResults (Jason) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108909 Fixes: `37f9788e9a` ("anv: flush pipeline before query result copies") Cc: mesa-stable@lists.freedesktop.org	2018-12-05 11:43:34 +00:00
Alex Smith	c1b6cb068c	radv: Flush before vkCmdWriteTimestamp() if needed As done for vkCmdBeginQuery() already. Prevents timestamps from being overwritten by previous vkCmdResetQueryPool() calls if the shader path was used to do the reset. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108925 Fixes: `a41e2e9cf5` ("radv: allow to use a compute shader for resetting the query pool") Signed-off-by: Alex Smith <asmith@feralinteractive.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2018-12-05 10:52:48 +00:00
Samuel Pitoiset	824cfc1ee5	radv: rework the TC-compat HTILE hardware bug with COND_EXEC After investigating on this, it appears that COND_WRITE doesn't work correctly in some situations. I don't know exactly why does it fail to update DB_Z_INFO.ZRANGE_PRECISION, but as AMDVLK also uses COND_EXEC I think there is a reason. Now the driver stores a new metadata value in order to reflect the last fast depth clear state. If a TC-compat HTILE is fast cleared with 0.0f, we have to update ZRANGE_PRECISION to 0 in order to work around that hardware bug. This fixes rendering issues with The Forest and DXVK and doesn't seem to introduce any regressions. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108914 Fixes: `68dead112e` ("radv: update the ZRANGE_PRECISION value for the TC-compat bug") Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-12-05 09:26:31 +01:00
Dieter Nützel	2669dbf881	docs/features: Delete double nv50 entry and wrong enumeration trivial Fix commit `d9b2234042` Signed-off-by: Dieter Nützel <Dieter@nuetzel-hh.de> Signed-off-by: Marek Olšák <marek.olsak@amd.com>	2018-12-04 18:51:18 -05:00
Marek Olšák	5907412d04	st/mesa: expose EXT_render_snorm on GLES Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>	2018-12-04 15:33:29 -05:00
Marek Olšák	1660f3aa05	mesa: expose AMD_texture_texture4 because the closed driver exposes it. Tested by piglit. Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>	2018-12-04 15:33:29 -05:00
Marek Olšák	908f817918	mesa: expose EXT_texture_compression_bptc in GLES tested by piglit. v2: rebase Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu> (v1) Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>	2018-12-04 15:33:29 -05:00
Marek Olšák	34f07ddebb	mesa: expose EXT_texture_compression_rgtc on GLES The spec was modified to support GLES. Tested by piglit. v2: rebase Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu> (v1) Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>	2018-12-04 15:33:29 -05:00
Erik Faye-Lund	91af56e383	mesa/main: fix up _mesa_has_rg_textures for gles2 rg-textures are supported in GLES 2.0 if EXT_texture_rg, so let's make sure the enums are accepted. Fixes: `510b642460` "mesa/main: do not allow rg-textures enums before gles3" Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108936 Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Tested-by: Mark Janes <mark.a.janes@intel.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-12-04 21:14:26 +01:00
Erik Faye-Lund	5bf38bfb64	mesa/main: correct validation for GL_RGB565 Technically speaking, this validation was incorrect, because GL_RGB565 is only supported in OpenGL ES 1.x if OES_framebuffer_object is supported. This couldn't lead to any real incorrect behavior, because all drivers support OES_framebuffer_object. But let's keep the code self-documenting, by correcting the check as per the spec. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2018-12-04 21:14:16 +01:00
Marek Olšák	4b218984d8	mesa: expose GL_EXT_texture_view as an alias of GL_OES_texture_view There are no spec changes. Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>	2018-12-04 12:50:36 -05:00
Marek Olšák	d9b2234042	st/mesa: expose GL_OES_texture_view For format fallbacks like ETC and ASTC, switching between sRGB and linear decoding is undefined, or at least is not bit-exact. Same as EXT_texture_sRGB_decode on GLES. There are no piglit or dEQP regressions. Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>	2018-12-04 12:50:36 -05:00
Eric Engestrom	95d62baac5	loader: deduplicate logger function declaration Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2018-12-04 16:29:32 +00:00
Eric Engestrom	eade6ffeee	mesa: drop unused & deprecated lib DeprecationWarning: the imp module is deprecated in favour of importlib Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2018-12-04 16:26:21 +00:00
Eric Engestrom	919bec1c47	anv: add unreachable() for VK_EXT_fragment_density_map This silences the -Wswitch compiler warning. Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-12-04 16:22:55 +00:00
Eric Engestrom	a0b14c1b02	meson: skip asm check when asm is disabled Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2018-12-04 16:22:51 +00:00
Andrii Simiklit	6ae873b97d	intel/tools: make sure the binary file is properly read 1. tools/i965_disasm.c:58:4: warning: ignoring return value of ‘fread’, declared with attribute warn_unused_result fread(assembly, *end, 1, fp); v2: Fixed incorrect return value check. ( Eric Engestrom <eric.engestrom@intel.com> ) v3: Zero size file check placed before fread with exit() ( Eric Engestrom <eric.engestrom@intel.com> ) v4: - Title is changed. - The 'size' variable was moved to top of a function scope. - The assertion was replaced by the proper error handling. - The error message on a caller side was fixed. ( Eric Engestrom <eric.engestrom@intel.com> ) Signed-off-by: Andrii Simiklit <andrii.simiklit@globallogic.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2018-12-04 16:19:26 +00:00
Toni Lönnberg	d7b99ab947	intel/aubinator_error_decode: Get rid of warning for missing switch case ../src/intel/tools/aubinator_error_decode.c: In function ‘instdone_register_for_ring’: ../src/intel/tools/aubinator_error_decode.c:177:4: warning: enumeration value ‘I915_ENGINE_CLASS_INVALID’ not handled in switch [-Wswitch] switch (class) { ^~~~~~ Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-12-04 12:47:49 +00:00
Ilia Mirkin	bacf8471dc	nouveau: set texture upload budget It doesn't seem like the exact number has too much effect on the performance in "teximage". However setting it to just about anything prevents some OOMs from getting hit. These values are not well-tuned, but don't seem too bad. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2018-12-03 23:11:29 -05:00
Ilia Mirkin	08c64fe7a1	nv50,nvc0: add explicit handling of PIPE_CAP_MAX_VERTEX_ELEMENT_SRC_OFFSET Since the max attrib stride is 2048, the max src offset makes sense as 2047. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2018-12-03 23:11:29 -05:00
Ilia Mirkin	de49e06507	nv50: always keep TSC slot 0 bound All TXF operations implicitly use sampler 0, and fail if it's not bound to anything. This does not happen in LINKED_TSC mode, but we don't currently use this. We ensure that TSC entry at id 0 has the SRGB conversion bit enabled (and all samplers we normally generate will too). Then when the TSC at slot 0 (not to be confused with entry 0 in the global TSC table) is unbound, we bind it to entry 0. This way, TXF operations are not dependent on there being a regular sampler bound there. Fixes arb_texture_buffer_object-subdata-sync among others. (TBO's are particularly susceptible to this as they don't bind a sampler.) Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2018-12-03 23:11:29 -05:00
Dave Airlie	1363a47c9c	radv: use 3d shader for gfx9 copies if dst is 3d This fixes some crucible 3d miptree tests I've been working on when executed using the compute shader path. Fixes: `d08f267814` (radv/gfx9: fix 3d image to image transfers on compute queues.) Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2018-12-04 10:42:31 +10:00
Bas Nieuwenhuizen	12e35a64c0	radv: Check for shareable images in central place. One place to put the logic makes things easier to change. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2018-12-04 01:21:38 +01:00
Bas Nieuwenhuizen	3bf48741e1	radv/android: Use buffer metadata to determine scanout compat. These days we don't always allocate scanout compatible textures anymore. That does mean we have to fix the radv android WSI though. Fixes: `b1444c9ccb` "radv: Implement VK_ANDROID_native_buffer." Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2018-12-04 01:21:38 +01:00
Bas Nieuwenhuizen	51091b3e1f	radv/android: Mark android WSI image as shareable. Fixes: `b1444c9ccb` "radv: Implement VK_ANDROID_native_buffer." Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2018-12-04 01:21:38 +01:00
Matt Turner	dd53bb7e1f	Revert "st/mesa: silenced unhanded enum warning in st_glsl_to_tgsi.cpp" This reverts commit `198c50f487`. This needs to be reverted after commit `017199d2d2` ("mesa: Revert INTEL_fragment_shader_ordering support")	2018-12-03 16:20:43 -08:00
Matt Turner	017199d2d2	mesa: Revert INTEL_fragment_shader_ordering support This extension is not properly tested (testing for GL_ARB_fragment_shader_interlock is not sufficient), and since this was noted in review on August 28th no tests have been sent. Revert "i965: Add INTEL_fragment_shader_ordering support." Revert "mesa: Add GL/GLSL plumbing for INTEL_fragment_shader_ordering" This reverts commit `03ecec9ed2`. This reverts commit `119435c877`. Cc: mesa-stable@lists.freedesktop.org Acked-by: Jason Ekstrand <jason@jlekstrand.net> Acked-by: Eric Anholt <eric@anholt.net>	2018-12-03 15:37:37 -08:00
Dave Airlie	e3f075439c	virgl: fix const warning on debug flags. Fixes: `8d4bb6e5c` (virgl: Add command and flags to initiate debugging on the host (v2))	2018-12-04 08:11:13 +10:00
Jason Ekstrand	71271e167b	vulkan: Update the XML and headers to 1.1.95 Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2018-12-03 14:27:10 -06:00
Tobias Klausmann	9401a2f2e6	amd/vulkan: meson build - use radv_deps for libvulkan_radeon Without this the build breaks with: FAILED: src/amd/vulkan/src@amd@vulkan@@vulkan_radeon@sha/radv_pipeline.c.o cc -Isrc/amd/vulkan/src@amd@vulkan@@vulkan_radeon@sha -Isrc/amd/vulkan -I../src/amd/vulkan -Isrc/../include -I../src/../include -Isrc -I../src -Isrc/mapi -I../src/mapi -Isrc/mesa -I../src/mesa -I../src/gallium/include -Isrc/gallium/auxiliary -I../src/gallium/auxiliary -Isrc/amd -I../src/amd -Isrc/amd/common -I../src/amd/common -Isrc/compiler -I../src/compiler -Isrc/vulkan/util -I../src/vulkan/util -Isrc/vulkan/wsi -I../src/vulkan/wsi -Isrc/compiler/nir -I../src/compiler/nir -I/usr/include -I/usr/include/libdrm -fdiagnostics-color=always -pipe -D_FILE_OFFSET_BITS=64 -Wall -Winvalid-pch -std=c99 -O2 -g '-DVERSION="18.3.0-rc5"' -DPACKAGE_VERSION=VERSION '-DPACKAGE_BUGREPORT="https://bugs.freedesktop.org/enter_bug.cgi?product=Mesa"' -DGLX_USE_TLS -DHAVE_ST_VDPAU -DENABLE_ST_OMX_BELLAGIO=0 -DENABLE_ST_OMX_TIZONIA=0 -DHAVE_X11_PLATFORM -DGLX_INDIRECT_RENDERING -DGLX_DIRECT_RENDERING -DGLX_USE_DRM -DHAVE_DRM_PLATFORM -DENABLE_SHADER_CACHE -DHAVE___BUILTIN_BSWAP32 -DHAVE___BUILTIN_BSWAP64 -DHAVE___BUILTIN_CLZ -DHAVE___BUILTIN_CLZLL -DHAVE___BUILTIN_CTZ -DHAVE___BUILTIN_EXPECT -DHAVE___BUILTIN_FFS -DHAVE___BUILTIN_FFSLL -DHAVE___BUILTIN_POPCOUNT -DHAVE___BUILTIN_POPCOUNTLL -DHAVE___BUILTIN_UNREACHABLE -DHAVE_FUNC_ATTRIBUTE_CONST -DHAVE_FUNC_ATTRIBUTE_FLATTEN -DHAVE_FUNC_ATTRIBUTE_MALLOC -DHAVE_FUNC_ATTRIBUTE_PURE -DHAVE_FUNC_ATTRIBUTE_UNUSED -DHAVE_FUNC_ATTRIBUTE_WARN_UNUSED_RESULT -DHAVE_FUNC_ATTRIBUTE_WEAK -DHAVE_FUNC_ATTRIBUTE_FORMAT -DHAVE_FUNC_ATTRIBUTE_PACKED -DHAVE_FUNC_ATTRIBUTE_RETURNS_NONNULL -DHAVE_FUNC_ATTRIBUTE_VISIBILITY -DHAVE_FUNC_ATTRIBUTE_ALIAS -DHAVE_FUNC_ATTRIBUTE_NORETURN -DUSE_SSE41 -DUSE_GCC_ATOMIC_BUILTINS -DUSE_X86_64_ASM -DMAJOR_IN_SYSMACROS -DHAVE_SYS_SYSCTL_H -DHAVE_LINUX_FUTEX_H -DHAVE_ENDIAN_H -DHAVE_DLFCN_H -DHAVE_STRTOF -DHAVE_MKOSTEMP 
-DHAVE_POSIX_MEMALIGN -DHAVE_TIMESPEC_GET -DHAVE_MEMFD_CREATE -DHAVE_STRTOD_L -DHAVE_DLADDR -DHAVE_DL_ITERATE_PHDR -DHAVE_ZLIB -DHAVE_PTHREAD -DHAVE_PTHREAD_SETAFFINITY -DHAVE_LIBDRM -DHAVE_LLVM=0x0600 -DMESA_LLVM_VERSION_PATCH=1 -DHAVE_WAYLAND_PLATFORM -DWL_HIDE_DEPRECATED -DHAVE_DRI3 -DHAVE_DRI3_MODIFIERS -Werror=implicit-function-declaration -Werror=missing-prototypes -Werror=return-type -fno-math-errno -fno-trapping-math -Wno-missing-field-initializers -Wno-format-truncation -O2 -Wall -D_FORTIFY_SOURCE=2 -fstack-protector-strong -funwind-tables -fasynchronous-unwind-tables -fstack-clash-protection -DNDEBUG -fPIC -pthread -D__STDC_FORMAT_MACROS -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_LIMIT_MACROS -fvisibility=hidden -Wno-override-init -DVK_USE_PLATFORM_XCB_KHR -DVK_USE_PLATFORM_XLIB_KHR -DVK_USE_PLATFORM_WAYLAND_KHR -DVK_USE_PLATFORM_DISPLAY_KHR -DVK_USE_PLATFORM_XLIB_XRANDR_EXT -MD -MQ 'src/amd/vulkan/src@amd@vulkan@@vulkan_radeon@sha/radv_pipeline.c.o' -MF 'src/amd/vulkan/src@amd@vulkan@@vulkan_radeon@sha/radv_pipeline.c.o.d' -o 'src/amd/vulkan/src@amd@vulkan@@vulkan_radeon@sha/radv_pipeline.c.o' -c ../src/amd/vulkan/radv_pipeline.c In file included from ../src/vulkan/util/vk_alloc.h:29, from ../src/amd/vulkan/radv_private.h:52, from ../src/amd/vulkan/radv_debug.h:27, from ../src/amd/vulkan/radv_pipeline.c:30: ../src/../include/vulkan/vulkan.h:54:10: fatal error: wayland-client.h: Datei oder Verzeichnis nicht gefunden #include <wayland-client.h> ^~~~~~~~~~~~~~~~~~ compilation terminated. 
The above command misses the include directory for wayland: -I/usr/include/wayland The missing include is contained in the (until now) unused radv_deps: if with_platform_wayland radv_deps += dep_wayland_client radv_flags += '-DVK_USE_PLATFORM_WAYLAND_KHR' libradv_files += files('radv_wsi_wayland.c') endif Fixes: `673dda8330` "meson: build "radv" vulkan driver for radeon hardware" Signed-off-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de> Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2018-12-03 09:18:48 -08:00
Erik Faye-Lund	fcf9fcee3c	mesa/main: do not require float-texture filtering for es3 The OpenGL ES 3.0 specification, table 3.13 lists half-float textures as filterable, but not float textures. So we shouldn't depend on ARB_float_texture, which requires full filtering support for both. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-12-03 18:16:44 +01:00
Erik Faye-Lund	43015b2a89	mesa/st: do not probe for the same texture-formats twice This should be equivalent to what we did before. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-12-03 18:16:44 +01:00
Erik Faye-Lund	212d270b4e	mesa/main: require EXT_texture_sRGB for gles3 sRGB textures are a requirement for OpenGL ES 3.0, so let's make sure we don't incorrectly enable a too high version. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-12-03 18:16:44 +01:00
Erik Faye-Lund	487010a099	mesa/main: require EXT_texture_type_2_10_10_10_REV for gles3 OpenGL ES 3.0 requires this functionality, so we should also test for it to avoid incorrectly exposing a too high GLES version. On desktop, this has been required since all the way back in OpenGL 1.2 anyway. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-12-03 18:16:44 +01:00
Erik Faye-Lund	74eab1c62f	mesa/main: split float-texture support checking in two On OpenGL ES 2.0, there's separate extensions adding support for half-float and float textures. So we need to validate the enums separately as well. This also prevents these enums from incorrectly being allowed on OpenGL ES 1.x, where there's no extension that enables this in the first place. While we're at it, remove the pointless default-case, and the seemingly stale fallthrough comment. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-12-03 18:16:44 +01:00
Erik Faye-Lund	c4136ed5cc	mesa/main: do not allow EXT_texture_sRGB_R8 enums before gles3 ctx->Extensions.EXT_texture_sRGB_R8 is set regardless of the API that's used, so checking for those directly will always allow the enums from this extension when they are supported by the driver. There's no extension adding support for this on OpenGL ES before version 3.0, so let's tighten the check. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-By: Gert Wollny <gert.wollny@collabora.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-12-03 18:16:44 +01:00
Erik Faye-Lund	d972939986	mesa/main: do not allow sRGB texture enums before gles3 ctx->Extensions.EXT_texture_sRGB is set regardless of the API that's used, so checking for those directly will always allow the enums from this extension when they are supported by the driver. There's no extension adding support for this on OpenGL ES before version 3.0, so let's tighten the check. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-12-03 18:16:44 +01:00
Erik Faye-Lund	3629ee025c	mesa/main: do not allow snorm-texture enums before gles3 ctx->Extensions.EXT_texture_snorm is set regardless of the API that's used, so checking for those directly will always allow the enums from this extension when they are supported by the driver. There's no extension adding support for this on OpenGL ES before version 3.0, so let's tighten the check. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-12-03 18:16:44 +01:00
Erik Faye-Lund	52dc8b4f7b	mesa/main: do not allow floating-point texture enums on gles1 ctx->Extensions.OES_texture_float is set regardless of the API that's used, so checking for those directly will always allow the enums from this extension when they are supported by the driver. There's no extension enabling floating-point textures for OpenGL ES 1.x, so we shouldn't allow those enums there. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-12-03 18:16:44 +01:00
Erik Faye-Lund	167dcd59ae	mesa/main: do not allow type_2_10_10_10_REV enums before gles3 ctx->Extensions.EXT_texture_type_2_10_10_10_REV is set regardless of the API that's used, so checking for those directly will always enable extensions when they are supported by the driver. There's no corresponding extension for OpenGL ES 1.x/2.0, so we shouldn't allow these enums there. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-12-03 18:16:44 +01:00
Erik Faye-Lund	b112e62ba4	mesa/main: do not allow MESA_ycbcr_texture enums on gles This extension requires OpenGL, and shouldn't be available on OpenGL ES. So let's not allow the enums from it either. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-12-03 18:16:44 +01:00
Erik Faye-Lund	1b2e9aca77	mesa/main: do not allow EXT_texture_shared_exponent enums before gles3 ctx->Extensions.EXT_texture_shared_exponent is set regardless of the API that's used, so checking for those directly will always allow the enums from this extension when they are supported by the driver. We also need to make sure this is enabled on OpenGL ES 3. Because the check is repeated, let's introduce a helper. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-12-03 18:16:44 +01:00
Erik Faye-Lund	510b642460	mesa/main: do not allow rg-textures enums before gles3 EXT_packed_float isn't supported on OpenGL ES, so we shouldn't allow these enums there before OpenGL ES 3.0, which also introduces support for these enums. Since this check is repeated a lot, let's make a helper for this. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-12-03 18:16:44 +01:00
Erik Faye-Lund	59690bf0a3	mesa/main: do not allow EXT_packed_float enums before gles3 EXT_packed_float isn't supported on OpenGL ES, so we shouldn't allow these enums there before OpenGL ES 3.0, which also introduces support for these enums. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-12-03 18:16:44 +01:00
Erik Faye-Lund	83db9d3e3a	mesa/main: do not allow ARB_depth_buffer_float enums before gles3 Floating-point depth buffers are only supported on OpenGL 3.0, OpenGL ES 3.0, or if ARB_depth_buffer_float is supported. Because we checked a driver capability rather than using an extension-check helper, we ended up incorrectly allowing this on OpenGL ES 1.x and 2.x. Since this logic is repeated, let's make a helper for it. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-12-03 18:16:44 +01:00
Erik Faye-Lund	3bbd543b6e	mesa/main: do not allow integer-texture enums before gles3 Integer textures shouldn't be implicitly exposed on OpenGL ES 1.x and 2.x, but because the code checked against a driver-capability rather than using an extension-check helper, we ended up accidentally allowing these enums on older versions when the driver supports it. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-12-03 18:16:44 +01:00
Erik Faye-Lund	b5a370dc25	mesa/main: do not allow ARB_texture_rgb10_a2ui enums before gles3 ARB_texture_rgb10_a2ui isn't supported on OpenGL ES, we shouldn't expose it there even if the driver supports it. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-12-03 18:16:44 +01:00
Erik Faye-Lund	76b038bee7	mesa/main: do not allow stencil-texture enums on gles1 ctx->Extensions.ARB_texture_stencil8 is set regardless of the API that's used, so checking for those directly will always allow the enums from this extension when they are supported by the driver. So let's instead check for both ARB_texture_stencil8 and OES_texture_stencil8, so we support depth textures on OpenGL and OpenGL ES 2.0+. There's no extension enabling stencil-textures for OpenGL ES 1.x, so we shouldn't allow those enums there. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-12-03 18:16:44 +01:00
Erik Faye-Lund	19eb0bf28f	mesa/main: do not allow depth-texture enums on gles1 ctx->Extensions.ARB_depth_texture is set regardless of the API that's used, so checking for those directly will always allow the enums from this extension when they are supported by the driver. So let's instead check for both ARB_depth_texture and OES_depth_texture, so we support depth textures on OpenGL and OpenGL ES 2.0+. There's no extension enabling depth-textures for OpenGL ES 1.x, so we shouldn't allow those enums there. This fixes oes_packed_depth_stencil-depth-stencil-texture_gles1 on i965 Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-12-03 18:16:44 +01:00
Erik Faye-Lund	2dfcaf7554	mesa/main: do not allow astc enums on gles1 ctx->Extensions.KHR_texture_compression_astc_ldr is set regardless of the API that's used, so checking for those directly will always enable extensions when they are supported by the driver. But there's no extension enabling ASTC for OpenGL ES 1.x, so we shouldn't allow those enums there. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-12-03 18:16:44 +01:00
Erik Faye-Lund	1aa134038c	mesa/main: do not allow etc2 enums on gles1 ctx->Extensions.ARB_ES3_compatibility is set regardless of the API that's used, so checking for those directly will always enable extensions when they are supported by the driver. But there's no extension enabling ETC2 for OpenGL ES 1.x, so we shouldn't allow those enums there. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-12-03 18:16:44 +01:00
Erik Faye-Lund	27ca87ccca	mesa/main: do not allow s3tc enums on gles1 There's no extension enabling S3TC formats on OpenGL ES 1.x, so we shouldn't allow these even if the driver can support it. So let's check for EXT_texture_compression_s3tc instead of ANGLE_texture_compression_dxt, which is supported on all other OpenGL variations. We also need to use _mesa_has_EXT_texture_compression_s3tc() instead of checking the driver cap directly, otherwise we end up enabling this on OpenGL ES 1.x, as the API isn't checked. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-12-03 18:16:44 +01:00
Erik Faye-Lund	d70cfb322a	mesa/main: use _mesa_has_FOO_bar for compressed format checks _mesa_has_FOO_bar() knows about the APIs these extensions should be supported under, so let's use that to simplify these checks a bit. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-12-03 18:16:44 +01:00
Erik Faye-Lund	70bfd31287	mesa/main: clean up integer texture check This makes the logic a little bit easier to follow, and reduce a bit of repetition. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-12-03 18:16:44 +01:00
Erik Faye-Lund	5109742e7b	mesa/main: clean up ES2_compatibility check This makes the logic a little bit easier to follow; this is either about ES2 compatibility or about gles. GL_RGB565 was added already in OpenGL ES 1.0. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-12-03 18:16:44 +01:00
Erik Faye-Lund	2e753b77dd	mesa/main: clean up OES_texture_float_linear check Using the _mesa_has_FOO_bar helpers is generally safer and should generally be preferred over checking driver-caps like this code did, because the _mesa_has_FOO_bar helpers also verify the API type and version. This shouldn't have any practical effect here, as this function only gets called for OpenGL ES 3.x right now. But if this was to change in the future, this makes the function behave a lot more predictably. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-12-03 18:16:44 +01:00
Erik Faye-Lund	1373d117c2	mesa/main: clean up S3_s3tc check S3_s3tc is the extension that enables this functionality on desktop, so let's check for that one. The _mesa_has_S3_s3tc() helper already verifies the API according to the extension-table. As for the second hunk, we currently already only expose EXT_texture_compression_s3tc on desktop so by using the helper instead, we get rid of this detail here, and once we enable it for GLES we'll automatically get the interaction right. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-12-03 18:16:44 +01:00
Erik Faye-Lund	e8b331ae13	mesa/main: rename format-check function _mesa_es3_error_check_format_and_type isn't specific to OpenGL ES 3.x, it applies to all versions of OpenGL ES. So let's rename it to reflect this. While we're at it, let's also rename a helper function it uses similarly. As the helper is static, we can also remove the namespacing-prefix from the name. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-12-03 18:16:44 +01:00
Erik Faye-Lund	ca8e2a5277	mesa/main: make _mesa_has_tessellation return bool All other _mesa_has_foo functions return bool rather than GLboolean, so let's follow that style here as well. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-12-03 18:16:43 +01:00
Chad Versace	3ef0ca65c9	i965: Fix -Wswitch on INTEL_COPY_STREAMING_LOAD The warning is emitted when building without INLINE_SSE41. Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2018-12-03 13:07:56 +02:00
Karol Herbst	fc0139d283	nv50,nvc0: Fix gallium nine regression regarding sampler bindings The new approach is that samplers don't get unbound even if they won't be used in a draw and we should just leave them be as well. Fixes a regression in multiple windows games using gallium nine and nouveau. v2: adjust num_samplers to keep track of the highest sampler bound v3: rework how to set the new value of num_samplers Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106577 Fixes: `4d6fab245e` "cso: don't track the number of sampler states bound" Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>	2018-12-02 00:05:04 +01:00
Andre Heider	b6f095f7ce	d3dadapter9: use snprintf(..., "%s", ...) instead of strncpy Fixes -Wstringop-truncation compiler warnings. See `f836d799f9` "intel/decoder: use snprintf(..., "%s", ...) instead of strncpy" Signed-off-by: Andre Heider <a.heider@gmail.com> Reviewed-by: Axel Davy <davyaxel0@gmail.com>	2018-12-01 21:32:53 +01:00
Mauro Rossi	37a2072e97	android: st/mesa: fix building error due to sched_getcpu() Android has cpufeatures library but pinning of threads is not supported PIPE_OS_LINUX code path causes build error due to sched_getcpu() unavailable thus we need to avoid setting HAVE_SCHED_GETCPU for Android Fixes: `48f2160` ("st/mesa: regularly re-pin driver threads to the CCX where the app thread is") Signed-off-by: Mauro Rossi <issor.oruam@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-12-01 10:15:58 +01:00
Vinson Lee	4f74580d30	st/xvmc: Add X11 include path. This patch fixes this build error. CC tests/xvmc_bench.o In file included from tests/xvmc_bench.c:35: tests/testlib.h:38:10: fatal error: 'X11/Xlib.h' file not found ^~~~~~~~~~~~ Signed-off-by: Vinson Lee <vlee@freedesktop.org> Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2018-11-30 22:09:43 -08:00
Mauro Rossi	eed3f1121c	android: amd/addrlib: update Mesa's copy of addrlib Needed to fix build error in addrlib in mesa for Android Fixes: `776b911` ("amd/addrlib: update Mesa's copy of addrlib") Signed-off-by: Mauro Rossi <issor.oruam@gmail.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2018-12-01 01:13:53 +01:00
Gurchetan Singh	89b4798c06	virgl: don't mark buffers as unclean after a write We can mark the buffer unclean if it's ever bound as a TBO, SSBO, ABO, or image. This improves dEQP-GLES3.performance.buffer.data_upload.function_call.map_buffer_range.new_specified_buffer.flag_write_full.stream_draw from 9.58 MB/s to 451.17 MB/s. v2: Track buffer cleanliness as a function of bindings (Ilia). v3: virgl_modify_clean --> virgl_dirty_res (Erik) Tested-By: Gert Wollny <gert.wollny@collabora.com> Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>	2018-11-30 12:21:01 +01:00
Gurchetan Singh	d18492c64f	virgl: avoid large inline transfers We flush every time the command buffer (16 kB) is full, which is quite costly. This improves dEQP-GLES3.performance.buffer.data_upload.function_call.buffer_data.new_buffer.usage_stream_draw from 111.16 MB/s to 1930.36 MB/s. In addition, I made the benchmark produce buffers from 0 --> VIRGL_MAX_CMDBUF_DWORDS * 4, and tried ((VIRGL_MAX_CMDBUF_DWORDS * 4) / 2), ((VIRGL_MAX_CMDBUF_DWORDS * 4) / 4), etc. I didn't notice any clear differences, so let's just go with the most obvious heuristic. Tested-By: Gert Wollny <gert.wollny@collabora.com> Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>	2018-11-30 12:20:41 +01:00
Gurchetan Singh	c0773315af	virgl: quadruple command buffer size Tested running WebGL aquarium on Nvidia host (10,000 fishes) This moves us from 7 fps to 9 fps. After quadrupling, performance gains diminish. v2: Remove change ID (Erik) Tested-By: Gert Wollny <gert.wollny@collabora.com> Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>	2018-11-30 12:20:06 +01:00
Lionel Landwerlin	37f9788e9a	anv: flush pipeline before query result copies Pipeline state pending bits should be taken into account when copying results. In the particular bug below, the results of the vkCmdCopyQueryPoolResults() command was being overwritten by the preceding vkCmdCopyBuffer() with a same destination buffer. This is because we copy the buffers using the 3D pipeline whereas we copy the query results using the command streamer. Those pieces of HW work in parallel and the results are somewhat undefined. v2: Unconditionally flush the pipeline before copying the results (Jason) v3: Wrap & expressions (Jason) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Suggested-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108894 Cc: mesa-stable@lists.freedesktop.org	2018-11-29 22:07:31 +00:00
Marek Olšák	39b20b7d4f	Revert "winsys/amdgpu: overallocate buffers for faster address translation on Gfx9" I didn't mean to push this. I don't think it makes any difference. This reverts commit `f737fe00a0`.	2018-11-29 14:46:06 -05:00
Roland Scheidegger	fbf95ce074	draw: fix infinite loop in line stippling The calculated length of a line may be infinite, if the coords we get are bogus. This leads to an infinite loop in line stippling. To prevent this test for this explicitly (although technically on at least x86 sse it would actually work without the explicit test, as long as we use the int-converted length value). While here also get rid of some always-true condition. Note this does not actually solve the root cause, which is that the coords we receive are bogus after clipping. This seems a difficult problem to solve. One issue is that due to float arithmetic, clip w may become 0 after clipping if the incoming geometry is "sufficiently degenerate", hence x/y/z ndc (and window) coords will be all inf (or nan). Even with w not quite 0, I believe it's possible we produce values which are actually outside the view volume. (Also, x=y=z=w=0 coords in clipspace would be not considered subject to clipping, and similarly result in all NaN coords.) We just hope for now other draw stages (and rasterizers) can handle those relatively safely (llvmpipe itself should be sort of robust against this, certainly conversion to fixed point will produce garbage, it might fail a couple assertions but should neither hang nor crash otherwise). Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2018-11-29 18:39:40 +01:00
Józef Kucia	94bfb8bf38	nir: Fix assert in print_intrinsic_instr(). Signed-off-by: Józef Kucia <joseph.kucia@gmail.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-11-29 16:29:37 +00:00
Nicolai Hähnle	776b911365	amd/addrlib: update Mesa's copy of addrlib Update to the internal master as of 2018-11-15. This has a lot of gratuitous whitespace change, but on the plus side it's built using the same tooling that's used for AMDVLK, which should help going forward.	2018-11-29 13:18:24 +01:00
Nicolai Hähnle	621c107760	ac/surface/gfx9: let addrlib choose the preferred swizzle kind Our choices here are simply redundant as long as sin.flags is set correctly. (v2: - remove unused function parameter) Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-11-29 13:18:23 +01:00
Nicolai Hähnle	729ebdf07e	radv: remove dependency on addrlib gfx9_enum.h v2: - use SI_CONTEXT_REG_OFFSET Reviewed-by: Dave Airlie <airlied@redhat.com>	2018-11-29 13:18:23 +01:00
Thomas Hellstrom	058f85d41c	winsys/svga: Fix a memory leak The ioctl.cap_3d member was never freed. Cc: mesa-stable@lists.freedesktop.org Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com> Reviewed-by: Sinclair Yeh <syeh@vmware.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2018-11-29 10:42:06 +01:00
Thomas Hellstrom	7fce3ca375	st/xa: Fix a memory leak Free the context after destruction. Cc: mesa-stable@lists.freedesktop.org Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com> Reviewed-by: Sinclair Yeh <syeh@vmware.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2018-11-29 10:42:06 +01:00
Samuel Pitoiset	cc7deb749c	radv: drop few useless state changes when doing color/depth decompressions Viewport/scissor don't need to be updated for array textures. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-11-29 10:18:55 +01:00
Samuel Pitoiset	6d4f65deea	radv: remove unused pending_clears param in the transition path Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-11-29 10:18:53 +01:00
Samuel Pitoiset	4b9df824f7	radv: optimize CmdClear{Color,DepthStencil}Image() for layered textures If all layers are bound we can perform a fast color or depth clear instead of iterating over all layers. This has the advantage of avoiding trashing the framebuffer for nothing if we end up doing a fast clear when calling radv_clear_image_layer(), and clearing all layers in one shot is obviously faster. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-11-29 10:18:42 +01:00
Samuel Pitoiset	7484bc894b	radv: refactor the fast clear path for better re-use Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-11-29 10:18:42 +01:00
Samuel Pitoiset	f78ee19702	radv: simplify a check in emit_fast_color_clear() Currently only true if RADV_PERFTEST=dccmsaa is set. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-11-29 10:18:42 +01:00
Samuel Pitoiset	eca931a726	radv: add radv_can_fast_clear_{color,depth}() helpers For further optimisations. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-11-29 10:18:42 +01:00
Samuel Pitoiset	93f5ce8fa7	radv: add radv_image_view_can_fast_clear() helper Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-11-29 10:18:42 +01:00
Samuel Pitoiset	aeaf8dbd09	radv: add radv_image_can_fast_clear() helper Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-11-29 10:18:42 +01:00
Samuel Pitoiset	3e718db1ff	radv: remove useless check in emit_fast_color_clear() The driver doesn't support DCC/CMASK for mipmapped textures. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-11-29 10:18:42 +01:00
Vinson Lee	d0c7b079d0	freedreno: Fix autotools build. Fix build error. CXXLD pipe_msm.la ../../../../src/gallium/drivers/freedreno/.libs/libfreedreno.a(freedreno_batch.o): In function `batch_init': src/gallium/drivers/freedreno/freedreno_batch.c:54: undefined reference to `fd_device_version' src/gallium/drivers/freedreno/freedreno_batch.c:59: undefined reference to `fd_submit_new' src/gallium/drivers/freedreno/freedreno_batch.c:61: undefined reference to `fd_submit_new_ringbuffer' src/gallium/drivers/freedreno/freedreno_batch.c:64: undefined reference to `fd_submit_new_ringbuffer' src/gallium/drivers/freedreno/freedreno_batch.c:66: undefined reference to `fd_submit_new_ringbuffer' src/gallium/drivers/freedreno/freedreno_batch.c:70: undefined reference to `fd_submit_new_ringbuffer' Fixes: `b4476138d5` ("freedreno: move drm to common location") Fixes: `aa0fed10d3` ("freedreno: move ir3 to common location") Signed-off-by: Vinson Lee <vlee@freedesktop.org> Reviewed-by: Rob Clark <robdclark@gmail.com>	2018-11-28 22:23:52 -08:00
Marek Olšák	075fd5d8f2	radeonsi: add memory management stress tests for GDS Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-11-28 20:20:27 -05:00
Marek Olšák	c1d3c08699	winsys/amdgpu: add support for allocating GDS and OA resources Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-11-28 20:20:27 -05:00
Marek Olšák	d7a4fa91f0	radeonsi: allow si_cp_dma_clear_buffer to clear GDS from any IB Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-11-28 20:20:27 -05:00
Marek Olšák	72b2b61d8c	winsys/amdgpu: use optimal VM alignment for CPU allocations Acked-by: Christian König <christian.koenig@amd.com>	2018-11-28 20:20:27 -05:00
Marek Olšák	27f9935075	winsys/amdgpu: use optimal VM alignment for imported buffers Window system buffers didn't use the optimal alignment. Acked-by: Christian König <christian.koenig@amd.com>	2018-11-28 20:20:27 -05:00
Marek Olšák	6b554d863f	winsys/amdgpu,radeon: pass vm_alignment to buffer_from_handle Acked-by: Christian König <christian.koenig@amd.com>	2018-11-28 20:20:27 -05:00
Marek Olšák	f737fe00a0	winsys/amdgpu: overallocate buffers for faster address translation on Gfx9 Sadly, the 3 games I tested (DeusEx:MD, DiRT Rally, DOTA 2) are unaffected by the overallocation, because I guess their buffers don't fall into the small range below a power-of-two size. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-11-28 20:20:27 -05:00
Marek Olšák	8c00f778fc	winsys/amdgpu: increase the VM alignment to the MSB of the size for Gfx9 Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-11-28 20:20:27 -05:00
Marek Olšák	a2a6b06d48	winsys/amdgpu: use >= instead of > for VM address alignment Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-11-28 20:20:27 -05:00
Marek Olšák	98f2312b4f	winsys/amdgpu: clean up code around BO VM alignment Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-11-28 20:20:27 -05:00
Marek Olšák	5f9ccf827e	winsys/amdgpu: optimize slab allocation for 2 MB amdgpu page tables - the slab buffer size increased from 128 KB to 2 MB (PTE fragment size) - the max suballocated buffer size increased from 64 KB to 256 KB, this increases memory usage because it wastes memory - the number of suballocators increased from 1 to 3 and they are layered on top of each other to minimize unused space in slabs The final increase in memory usage is: DeusEx:MD: 1.8% DOTA 2: 1.75% DiRT Rally: 0.2% The kernel driver will also receive fewer buffers.	2018-11-28 20:20:27 -05:00
Marek Olšák	cf6835485c	radeonsi: generalize the slab allocator code to allow layered slab allocators There is no change in behavior. It just makes it easier to change the number of slab allocators.	2018-11-28 20:20:27 -05:00
Marek Olšák	9576266a37	winsys/amdgpu: always reclaim/release slabs if there is not enough memory	2018-11-28 20:20:27 -05:00
Marek Olšák	015061beb3	radeonsi: fix is_oneway_access_only for bindless images	2018-11-28 20:20:27 -05:00
Marek Olšák	8c25ab1a23	radeonsi/nir: parse more information about bindless usage fill more tgsi_shader_info fields.	2018-11-28 20:20:27 -05:00
Marek Olšák	2a936f8afa	tgsi/scan: add more information about bindless usage radeonsi will use this.	2018-11-28 20:20:27 -05:00
Marek Olšák	fba91b5173	radeonsi: small cleanup for memory opcodes	2018-11-28 20:20:27 -05:00
Marek Olšák	709905cbb6	radeonsi: fix is_oneway_access_only for image stores We need to look at the Dst for image stores.	2018-11-28 20:20:27 -05:00
Marek Olšák	648dc52367	radeonsi: use structured buffer intrinsics for image views to stop using the workaround in si_make_buffer_descriptor.	2018-11-28 20:20:27 -05:00
Marek Olšák	442dae2693	radeonsi: clean up primitive binning enablement no change in behavior. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-11-28 20:20:27 -05:00
Dave Airlie	8eb8be3f54	virgl: fix undefined shift to use unsigned. Ported from virglrenderer. Signed-off-by: Dave Airlie <airlied@redhat.com>	2018-11-29 09:09:31 +10:00
Dave Airlie	2ddd44d941	r600: make suballocator 256-bytes align Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108311 Cc: <mesa-stable@lists.freedesktop.org>	2018-11-29 09:09:02 +10:00
Kenneth Graunke	f11780779f	intel/compiler: Use nir's info when checking uses_streams. Vulkan and Gallium don't use Mesa's gl_program data structure, so they can't poke at 'prog'. But we can simply use the copy of the shader info stored with the NIR shader, which is guaranteed to exist. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2018-11-28 13:35:29 -08:00
Jason Ekstrand	199a0353d6	nir/derefs: Add a nir_derefs_do_not_alias enum value This makes some of the code more clear. Reviewed-by: Thomas Helland <thomashelland90@gmail.com>	2018-11-28 14:29:25 -06:00
Gurchetan Singh	eb44c36cf1	egl: add missing #include <stddef.h> in egldevice.h Otherwise, I get this error: main/egldevice.h:54:13: error: ‘NULL’ undeclared (first use in this function) dev = NULL; ^~~~ with this config: ./autogen.sh --enable-gles1 --enable-gles2 --with-platforms='surfaceless' --disable-glx --with-dri-drivers="i965" --with-gallium-drivers="" --enable-gbm v3: Use stddef.h (Matt) v4: Modify commit message (Eric) Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2018-11-28 11:22:47 -08:00
Matt Turner	2d48d5116b	gallivm: Use nextafterf(0.5, 0.0) as rounding constant The common truncf(x + 0.5) fails for the floating-point value just less than 0.5 (nextafterf(0.5, 0.0)). nextafterf(0.5, 0.0) + 0.5, after rounding is 1.0, thus truncf does not produce the desired value. The solution is to add nextafterf(0.5, 0.0) instead of 0.5 before truncating. This works for all values. Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2018-11-28 11:22:47 -08:00
Juan A. Suarez Romero	e2ad94d928	docs: update calendar, add news item and link release notes for 18.2.6 Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>	2018-11-28 19:20:09 +01:00
Juan A. Suarez Romero	a53a280479	docs: add sha256 checksums for 18.2.6 Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com> (cherry picked from commit `cfd1f8b92c`)	2018-11-28 19:20:09 +01:00
Juan A. Suarez Romero	f6ab6e2867	docs: add release notes for 18.2.6 Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com> (cherry picked from commit `3e741344d7`)	2018-11-28 19:20:09 +01:00
Nicolai Hähnle	c02390f8fc	egl/wayland: rather obvious build fix Fixes: `ce74a7bb8d` ("egl/wayland: plug memory leak in drm_handle_device()") Fixes: `c59d3aa4b9` ("egl/wayland: bail out when drmGetMagic fails")	2018-11-28 18:30:36 +01:00
Nicolai Hähnle	eb94b6bd5c	winsys/amdgpu: explicitly declare whether buffer_map is permanent or not Introduce a new driver-private transfer flag RADEON_TRANSFER_TEMPORARY that specifies whether the caller will use buffer_unmap or not. The default behavior is set to permanent maps, because that's what drivers do for Gallium buffer maps. This should eliminate the need for hacks in libdrm. Assertions are added to catch when the buffer_unmap calls don't match the (temporary) buffer_map calls. I did my best to update r600 for consistency (r300 needs no changes because it never calls buffer_unmap), even though the radeon winsys ignores the new flag. As an added bonus, this should actually improve the performance of the normal fast path, because we no longer call into libdrm at all after the first map, and there's one less atomic in the winsys itself (there are now no atomics left in the UNSYNCHRONIZED fast path). Cc: Leo Liu <leo.liu@amd.com> v2: - remove comment about visible VRAM (Marek) - don't rely on amdgpu_bo_cpu_map doing an atomic write Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-11-28 18:24:14 +01:00
Nicolai Hähnle	35eb81987c	winsys/amdgpu: add amdgpu_winsys_bo::lock We'll use it in the upcoming mapping change. Sparse buffers have always had one. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-11-28 18:23:29 +01:00
Eric Engestrom	e0f1f74eda	vulkan/wsi: fix s/,/;/ typo Fixes: `59e58c348e` "vulkan/wsi: Only wait on semaphores on the first swapchain" Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-11-28 16:44:01 +00:00
Emil Velikov	ce74a7bb8d	egl/wayland: plug memory leak in drm_handle_device() As we fail to open the node, we leak the node/device name. v2: Log and then free() (Eric) Cc: mesa-stable@lists.freedesktop.org Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2018-11-28 16:12:12 +00:00
Emil Velikov	c59d3aa4b9	egl/wayland: bail out when drmGetMagic fails Currently as the function fails, we pass uninitialized data to the authentication function. Stop doing that and print a warning when the function fails. v2: Plug memory leak in error path (Eric) Cc: mesa-stable@lists.freedesktop.org Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com> (v1) Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2018-11-28 16:11:22 +00:00
Eric Engestrom	9575cd2893	wsi/display: fix mem leak when freeing swapchains Fixes: `da997ebec9` "vulkan: Add KHR_display extension using DRM [v10]" Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Keith Packard <keithp@keithp.com>	2018-11-28 12:09:54 +00:00
Gert Wollny	f08d107054	i965: Set the FBO error state INCOMPLETE_ATTACHMENT only for SRGB_R8 Originally the driver reported GL_FRAMEBUFFER_UNSUPPORTED in all cases, adding more specific error messages was not correct and broke many tests. Mostly revert this and only report GL_FRAMEBUFFER_INCOMPLETE_ATTACHMENT for MESA_FORMAT_R_SRGB8. Fixes: `ebcde34545` i965: be more specific about FBO completeness errors Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108805 Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2018-11-28 10:12:47 +01:00
Gert Wollny	d8bb88d0b4	i965: Explicitely handle swizzles for MESA_FORMAT_R_SRGB8 The format is emulated by using ISL_FORMAT_L8_SRGB, therefore we need to force swizzles for the GBA channels. However, doing this only based on the data type GL_RED breaks other formats, therefore, test specifically for the format. Fixes: `c5363869d4` i965: Force zero swizzles for unused components in GL_RED and GL_RG Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2018-11-28 10:07:02 +01:00
Gert Wollny	091295d7cb	virgl: Don't try handling server fences when they are not supported vtest doesn't implement the according API and would segfault: Program received signal SIGSEGV, Segmentation fault. #0 0x0000000000000000 in ?? () #1 in virgl_fence_server_sync at src/gallium/drivers/virgl/virgl_context.c:1049 #2 in st_server_wait_sync at src/mesa/state_tracker/st_cb_syncobj.c:155 so just don't do the call when the function pointers are not set. Fixes dEQP: dEQP-GLES3.functional.fence_sync.wait_sync_smalldraw dEQP-GLES3.functional.fence_sync.wait_sync_largedraw Fixes: `d1a1c21e76` virgl: native fence fd support Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Reviewed-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Robert Foss <robert.foss@collabora.com>	2018-11-28 10:02:31 +01:00
Gert Wollny	073fdd7382	virgl,vtest: Initialize return value Avoids: Conditional jump or move depends on uninitialised value(s) at 0x9E2B39F: virgl_vtest_winsys_resource_cache_create (virgl_vtest_winsys.c:379) by 0x9E2725F: virgl_buffer_create (virgl_buffer.c:169) by 0x9E246D5: virgl_resource_create (virgl_resource.c:60) by 0xA0C1B9F: bufferobj_data (st_cb_bufferobjects.c:344) by 0xA0C1B9F: st_bufferobj_data (st_cb_bufferobjects.c:390) by 0x9F4ACE3: vbo_use_buffer_objects (vbo_exec_api.c:1136) by 0xA0C68C3: st_create_context_priv (st_context.c:416) by 0xA0C707A: st_create_context (st_context.c:598) by 0x9F81C6B: st_api_create_context (st_manager.c:918) by 0x9BBE591: dri_create_context (dri_context.c:161) by 0x9BB6931: driCreateContextAttribs (dri_util.c:473) by 0x4E97A44: drisw_create_context_attribs (drisw_glx.c:630) by 0x4E7C591: glXCreateContextAttribsARB (create_context.c:78) Uninitialised value was created by a stack allocation at 0x9E2B249: virgl_vtest_winsys_resource_cache_create (virgl_vtest_winsys.c:342) Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Reviewed-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Robert Foss <robert.foss@collabora.com>	2018-11-28 10:02:31 +01:00
Iago Toral Quiroga	e55cbf26ea	intel/compiler: fix register allocation in opt_peephole_sel This wasn't handling 64-bit cases properly. Found by inspection. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2018-11-28 08:28:27 +01:00
Matt Turner	6f737b9207	glsl: Remove unused member variable Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2018-11-27 22:29:53 -08:00
Matt Turner	1a210268b8	nir: Call fflush() at the end of nir_print_shader() We normally call with stderr which is unbuffered, so this won't affect that, but it does let me call nir_print_shader(nir, fopen("log", "w+")) from gdb and actually get the whole shader in my file. Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2018-11-27 22:29:53 -08:00
Eric Anholt	e113b21cb7	v3d: Add renderonly support. I've been using this with the kmsro series to test v3d on VKMS without my old KMS hack in the v3d kernel driver. KMSRO still needs some cleanup, but v3d RO support seems reasonable.	2018-11-27 15:03:02 -08:00
Eric Anholt	55edafa73e	gallium: Remove unused variable in u_tests. Fixes: `0d17b685b1` ("gallium/u_tests: add a compute shader test that clears an image") Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-11-27 15:02:57 -08:00
Bas Nieuwenhuizen	6569644bb6	radv: Align large buffers to the fragment size. Improves performance in Talos by about 15% (and significant improvements in RotR and possibly other but did not bench with final patch) on kernel 4.19 and earlier. On 4.20+ a similar effect comes from 433ca054949a "drm/amdgpu: try allocating VRAM as power of two" v2: Do not impact the alignment of the physical memory. Reviewed-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> CC: <mesa-stable@lists.freedesktop.org>	2018-11-27 22:17:42 +01:00
Hyunjun Ko	76945e4140	freedreno: implements get_sample_position Since `1285f71d3e` landed, it needs to provide apps with proper sample position for MSAA. Currently there is no way to query this from the hw, so these are taken from the blob driver. Fixes: dEQP-GLES31.functional.texture.multisample.samples_#.sample_position Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-11-27 15:44:03 -05:00
Rob Clark	5973a4d0b7	freedreno/a3xx: also set FSSUPERTHREADENABLE We set equiv bit in SP_FS_CTRL_REG0. Somehow the hw doesn't hang with this mismatched config, but does run slower. It is faster with either neither bit set, or both bits set, but both is the fastest of the three configurations. Worth a bit over 10% gain in glmark2. Spotted-by: Jonathan Marek <jonathan@marek.ca> Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-11-27 15:44:03 -05:00
Jonathan Marek	e68cd91251	freedreno: use MSM_BO_SCANOUT with scanout buffers Signed-off-by: Jonathan Marek <jonathan@marek.ca>	2018-11-27 15:44:03 -05:00
Jonathan Marek	3ed4aad524	freedreno: use GENERIC instead of TEXCOORD for blit program blit_fp uses GENERIC as input, so blit_vp should match for linking Signed-off-by: Jonathan Marek <jonathan@marek.ca> Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-11-27 15:44:03 -05:00
Jonathan Marek	3a273a4abc	freedreno: a2xx texture update Adds all missing texture related logic. For everything to work it also needs changes to ir2/fd2_program, which are part of the ir2 update patch. Note: it needs rnndb update Signed-off-by: Jonathan Marek <jonathan@marek.ca> [remove stray patch] Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-11-27 15:44:03 -05:00
Jonathan Marek	4887aba638	freedreno/a2xx: Compute depth base in gmem correctly Note: it needs rnndb update Signed-off-by: Marek Vasut <marex@denx.de> Signed-off-by: Jonathan Marek <jonathan@marek.ca> Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-11-27 15:44:03 -05:00
Jonathan Marek	e7114575f7	freedreno/a2xx: set VIZ_QUERY_ID on a20x Signed-off-by: Jonathan Marek <jonathan@marek.ca> Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-11-27 15:44:03 -05:00
Jonathan Marek	a50b8a0152	freedreno: add missing a20x ids 200: 256KiB GMEM A200 (imx53) 201: 128KiB GMEM A200 (imx51) Signed-off-by: Jonathan Marek <jonathan@marek.ca> Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-11-27 15:44:03 -05:00
Jonathan Marek	4e6ee033ff	freedreno/a2xx: fix POINT_MINMAX_MAX overflow As it stands, it overflows to zero. Signed-off-by: Jonathan Marek <jonathan@marek.ca> Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-11-27 15:44:03 -05:00
Jonathan Marek	78fede86d9	freedreno: a2xx: fd2_draw update Signed-off-by: Jonathan Marek <jonathan@marek.ca> Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-11-27 15:44:02 -05:00
Jonathan Marek	3e7186d472	nir: add fceil lowering lowers ceil(x) as -floor(-x) Signed-off-by: Jonathan Marek <jonathan@marek.ca> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com> Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-11-27 15:44:02 -05:00
Rob Clark	11593f9041	freedreno: update generated headers Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-11-27 15:44:02 -05:00
Rob Clark	d47d77d49d	freedreno/a6xx: set guardband clip On older gens, the CLIP_ADJ bitfields were actually 3.6 fixed point. Which might make more sense. Although this formula comes up with values pretty close to what blob does for various viewport sizes (for at least a5xx and a6xx), and seems to work. Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-11-27 15:44:02 -05:00
Rob Clark	2773919f06	freedreno/a6xx: disable LRZ for z32 `f6131d4ec7` had the side effect of enabling LRZ w/ 32b depth buffers. But there are some bugs with this, which aren't fully understood yet, so for now just skip LRZ w/ z32.. Fixes: `f6131d4ec7` freedreno/a6xx: Clear z32 and separate stencil with blitter Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-11-27 15:44:02 -05:00
Kristian H. Kristensen	9595be67a9	freedreno/a6xx: Clear gmem buffers at flush time We generate an IB to clear the gmem at flush time and jump to it before rendering each tile. This lets us get rid of the command stream patching for gmem offsets. Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org> Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-11-27 15:44:02 -05:00
Kristian H. Kristensen	b5a9bb28c6	freedreno/a6xx: Move resolve blits to an IB Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org> Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-11-27 15:44:02 -05:00
Kristian H. Kristensen	5f068cf3b0	freedreno/a6xx: Move restore blits to IB Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org> Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-11-27 15:44:02 -05:00
Rob Clark	09300bbe03	mesa/st: better colormask check for clear fallback For RGB surfaces (for example) we don't really care that the colormask is 0x7 instead of 0xf. This should not trigger clear_with_quad() slowpath. Signed-off-by: Rob Clark <robdclark@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-11-27 15:44:02 -05:00
Rob Clark	65cee01430	mesa/st: swap order of clear() and clear_with_quad() If we can't clear all the buffers with pctx->clear() (say, for example, because of ColorMask), push the buffers we can clear with pctx->clear() first. Tilers want to see clears coming before draws to enable fast- paths, and clearing one of the attachments with a quad-draw first confuses that logic. Signed-off-by: Rob Clark <robdclark@gmail.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2018-11-27 15:44:02 -05:00
Rob Clark	aa0fed10d3	freedreno: move ir3 to common location Move (most of) the ir3 compiler to src/freedreno/ir3 so that it can be re-used by some future vulkan driver. The parts that are gallium specific have been refactored out and remain in the gallium driver. Getting the move done now so that it can happen before further refactoring to support a6xx specific instructions. NOTE also removes ir3_cmdline compiler tool from autotools build since that was easier than fixing it and I normally use meson build. Waiting patiently for the day that we can remove everything from the autotools build. Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-11-27 15:44:02 -05:00
Rob Clark	556eec249d	freedreno/ir3: remove u_inlines usage Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-11-27 15:44:02 -05:00
Rob Clark	312eae45a3	freedreno/ir3: split up ir3_shader Split the parts that are gallium specific into ir3_gallium so the rest can move to a common location outside of gallium. Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-11-27 15:44:02 -05:00
Rob Clark	ea4cbf601d	freedreno/ir3: remove pipe_stream_output_info dependency A bit annoying to have to copy into our own struct. But this is something the compiler really needs to know, at least on earlier generations where streamout is implemented in shader. Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-11-27 15:44:02 -05:00
Rob Clark	030e98630d	freedreno/ir3: some header file cleanup Clean up some of the low-hanging-fruit usages of freedreno_util.h Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-11-27 15:44:02 -05:00
Rob Clark	2482153d52	freedreno/ir3: use env_var_as_unsigned() Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-11-27 15:44:02 -05:00
Rob Clark	a321f939f6	util: env_var_as_unsigned() helper So I can drop env2u() helper from freedreno_util.h and get rid of one small ir3 dependency on gallium/freedreno Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-11-27 15:44:02 -05:00
Rob Clark	bfd8d26372	freedreno/ir3: move disasm and optmsgs debug flags Move them to IR3_SHADER_DEBUG so we can remove ir3's dependency on fd_mesa_debug. Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-11-27 15:44:02 -05:00
Rob Clark	424d75656f	freedreno: FD_SHADER_DEBUG -> IR3_SHADER_DEBUG Only used by ir3, so move it into ir3 to be more self contained. Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-11-27 15:44:02 -05:00
Rob Clark	8a654f092e	freedreno: remove shader_stage_name() Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-11-27 15:44:02 -05:00
Rob Clark	c635703c50	freedreno: shader_t -> gl_shader_stage Just massive search/replace for the most part. Step towards removing ir3 dependency on disasm.h which is shared by a2xx. One step closer to being able to move ir3 out of gallium. Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-11-27 15:44:02 -05:00
Rob Clark	388aac32ed	freedreno/ir3: standalone compiler updates Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-11-27 15:44:02 -05:00
Rob Clark	b4476138d5	freedreno: move drm to common location So that we can re-use at least parts of it for vulkan driver, and so that we can move ir3 to a common location (which uses fd_bo to allocate storage for shaders) Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-11-27 15:44:02 -05:00
Rob Clark	6cb74eb4f1	freedreno/drm: remove dependency on gallium driver Prep work to move drm to a common location. Slightly hacky, but the softpin debug flag is only temporary. Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-11-27 15:44:02 -05:00
Dylan Baker	88c4680b5a	util: promote u_memory to src/util as well as os_memory* Reviewed-by: Rob Clark <robdclark@gmail.com>	2018-11-27 15:44:02 -05:00
Eric Anholt	bade179153	gallium: Fix uninitialized variable warning in compute test. The compiler doesn't know that ny != 0, so x might be uninitialized for the printf at the end. Reviewed-by: Elie Tournier <elie.tournier@collabora.com>	2018-11-27 11:23:22 -08:00
Bas Nieuwenhuizen	08ea6b9d9b	radv: Clamp gfx9 image view extents to the allocated image extents. Mirrors AMDVLK. Looks like if we go over the alignment of height we actually start to change the addressing. Seems like the extra miplevels actually work with this. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108245 Fixes: `f6cc15dccd` "radv/gfx9: fix block compression texture views. (v2)" Reviewed-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2018-11-27 10:19:52 +01:00
Iago Toral Quiroga	453570cd8c	intel/compiler: fix indentation style in opt_algebraic()	2018-11-27 09:53:09 +01:00
Anuj Phogat	16e4911972	anv/icl: Set use full ways in L3CNTLREG L3 allocation table in h/w specification recommends using 4 KB granularity for programming allocation fields in L3CNTLREG. Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2018-11-26 15:11:36 -08:00
Anuj Phogat	3f55fd3814	intel/icl: Set way_size_per_bank to 4 Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2018-11-26 15:11:36 -08:00
Anuj Phogat	3ce04da5b4	i965/icl: Set use full ways in L3CNTLREG L3 allocation table in h/w specification recommends using 4 KB granularity for programming allocation fields in L3CNTLREG. Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2018-11-26 15:11:36 -08:00
Anuj Phogat	3282c7be89	i965/icl: Fix L3 configurations Use L3 configuration specified in h/w specification. V2: Drop configs which do under allocation of l3 cache. Bump up the comment above table. Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2018-11-26 15:11:36 -08:00
Eric Engestrom	c0c533767e	build: stop defining unused VERSION Scons and autotools don't define it, and as of last commit nothing uses it. `VERSION` is also a generic enough name that something somewhere will eventually clash, and we don't want to repeat the LLVM `DEBUG` fiasco. Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2018-11-26 22:05:02 +00:00
Eric Engestrom	bd12e02530	vulkan/utils: s/VERSION/PACKAGE_VERSION/ Everything else uses PACKAGE_VERSION, so let's be consistent, and VERSION and PACKAGE_VERSION are currently defined to be the same in meson and android, while VERSION is undefined in autotools and scons. Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2018-11-26 22:05:02 +00:00
Eric Engestrom	56d126f8fd	anv: correctly use vulkan 1.0 by default Per chapter 3.2 "Instances": > Providing a NULL VkInstanceCreateInfo::pApplicationInfo or providing > an apiVersion of 0 is equivalent to providing an apiVersion of > VK_MAKE_VERSION(1,0,0). Reported-by: Niklas Haas <git@haasn.xyz> Fixes: `8c048af589` "anv: Copy the appliation info into the instance" Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-11-26 22:05:02 +00:00
Erik Faye-Lund	d6d35d87f1	mesa/main: fixup requirements for GL_PRIMITIVES_GENERATED This enum is also allowed by EXT_tessellation_shader, which is supported on older i965 HW (as opposed to OES_geometry_shader). This was missed when narrowing this code-path, leading to dEQP regressions. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108868 Fixes: `f09d94fbd1` "mesa/main: fix validation of transform-feedback queries" Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Tested-by: Mark Janes <mark.a.janes@intel.com>	2018-11-26 22:12:07 +01:00
Erik Faye-Lund	c120dbfe4d	mesa/main: fix incorrect depth-error If glGetTexImage or glGetnTexImage is called with a level that doesn't exist, we get an error message of this form: Mesa: User error: GL_INVALID_VALUE in glGetTexImage(depth = 0) This is clearly nonsensical, because these APIs don't even have a depth-parameter. The reason is that get_texture_image_dims() returns all-zero dimensions for non-existent texture-images, and we go on to validate these dimensions as if they were user-input, because glGetTextureSubImage requires checking. So let's split this logic in two, so glGetTextureSubImage can have stricter input-validation. All arguments that are no longer validated are generated internally by mesa, so there's no use in validating them. Fixes: `42891dbaa1` "gettextsubimage: verify zoffset and depth are correct" Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>	2018-11-26 12:29:54 +01:00
Erik Faye-Lund	38af69adfa	mesa/main: check cube-completeness in common code This check is the only part of dimensions_error_check that isn't about error-checking the offset and size arguments of glGet[Compressed]TextureSubImage(), so it doesn't really belong in here. This doesn't make a difference right now, apart from changing the precedence of this error. But it will make a difference for the next patch, where we no longer call this method from the non-sub tex-image getters. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>	2018-11-26 12:29:54 +01:00
Erik Faye-Lund	42820c5727	mesa/main: factor out common error-checking This error checking is the same for teximage and texsubimage getters, so let's factor it out to its own function. This will be useful when getteximage and gettexsubimage get their own error checking routines a bit later. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>	2018-11-26 12:29:54 +01:00
Erik Faye-Lund	5e0a84f31c	mesa/main: factor out tex-image error-checking This will be useful when we split error-checking for getteximage and gettexsubimage later. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>	2018-11-26 12:29:54 +01:00
Erik Faye-Lund	38bbb61252	mesa/main: remove bogus error for zero-sized images The explanation quotes the spec on the following wording to justify the error: "An INVALID_VALUE error is generated if xoffset + width is greater than the texture’s width, yoffset + height is greater than the texture’s height, or zoffset + depth is greater than the texture’s depth." However, this shouldn't generate an error in the case where all three of width, xoffset and the texture's width are zero. In this case, we end up generating an unspecified error. So let's remove this check, and instead make sure that we consider this as an empty texture. So let's not generate an error, as none is mandated by the spec in the xoffset/yoffset/zoffset = 0 case. We already avoid doing any work in this case, because of the final, non-error generating check in this function. Fixes: `b37b35a5d2` "getteximage: assume texture image is empty for non defined levels" Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>	2018-11-26 12:29:54 +01:00
Erik Faye-Lund	f1998e15ff	mesa/main: remove ARB suffix from glGetnTexImage This function has been core since OpenGL 4.3, so naming the implementation and reporting errors using an ARB-suffix can be confusing. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>	2018-11-26 12:29:54 +01:00
Gert Wollny	f5d053702f	glsl: free or reuse memory allocated for TF varying When a shader program is de-serialized the gl_shader_program passed in may actually still hold memory allocations for the transform feedback varyings. If that is the case, free the varying names and reallocate the new storage for the names array. This fixes a memory leak: Direct leak of 48 byte(s) in 6 object(s) allocated from: in malloc (/usr/lib64/gcc/x86_64-pc-linux-gnu/7.3.0/libasan.so+0xdb880) in transform_feedback_varyings ../../samba/mesa/src/mesa/main/transformfeedback.c:875 in _mesa_TransformFeedbackVaryings ../../samba/mesa/src/mesa/main/transformfeedback.c:985 ... Indirect leak of 42 byte(s) in 6 object(s) allocated from: in __interceptor_strdup (/usr/lib64/gcc/x86_64-pc-linux-gnu/7.3.0/libasan.so+0x761c8) in transform_feedback_varyings ../../samba/mesa/src/mesa/main/transformfeedback.c:887 in _mesa_TransformFeedbackVaryings ../../samba/mesa/src/mesa/main/transformfeedback.c:985 Fixes: `ab2643e4b0` glsl: serialize data from glTransformFeedbackVaryings Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2018-11-26 09:58:25 +01:00
Bas Nieuwenhuizen	3c96a1e3a9	radv: Fix opaque metadata descriptor last layer. We used the layer count which results in an off by one error. Not sure this really affects anything. Fixes: `f4e499ec79` "radv: add initial non-conformant radv vulkan driver" Reviewed-by: Dave Airlie <airlied@redhat.com>	2018-11-26 09:29:39 +01:00
Mathias Fröhlich	ff466c2d48	mesa/st: Make st_pipe_vertex_format static. Reviewed-by: Marek Olšák <marek.olsak@amd.com> Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>	2018-11-26 07:57:10 +01:00
Mathias Fröhlich	2a3eae82a1	mesa/st: Use binding information from the VAO in feedback rendering. Use VAO binding information in feedback rendering. In theory it should reduce the amount of buffer objects scheduled for rendering. Feedback rendering is implemented in a crude way anyhow, so I do not expect much gain here. But for the sake of code reuse we should use the same code for the same task. And finally if feedback rendering may get improved the array setup is already well done there. Reviewed-by: Marek Olšák <marek.olsak@amd.com> Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>	2018-11-26 07:57:10 +01:00
Mathias Fröhlich	a00a8fb8d1	mesa/st: Avoid extra references in the feedback draw function scope. The change removes the reference that is held on the entries of the vbuffers[] array. The new code does not do that anymore, because following the code into draw_set_vertex_buffers(), the draw context holds another reference until it is reset again further down the function. So by that argument it should already be safe to remove that additional reference count. Reviewed-by: Marek Olšák <marek.olsak@amd.com> Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>	2018-11-26 07:57:10 +01:00
Mathias Fröhlich	6705188cc5	mesa/st: Factor out array and buffer setup from st_atom_array.c. Factor out vertex array setup routines from the array state atom. The factored functions will be used in feedback rendering in the next change. Reviewed-by: Marek Olšák <marek.olsak@amd.com> Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>	2018-11-26 07:57:09 +01:00
Mathias Fröhlich	774d585d49	mesa/st: Only unmap the uploader that was actually used. In st_atom_array, we only need to unmap the upload buffer that was actually used. Reviewed-by: Marek Olšák <marek.olsak@amd.com> Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>	2018-11-26 07:57:09 +01:00
Mathias Fröhlich	65332aff29	mesa/st: Only care about the uploader if it was used. In st_atom_array, we only need to care for unmapping the upload buffer if we actually used it. Reviewed-by: Marek Olšák <marek.olsak@amd.com> Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>	2018-11-26 07:57:09 +01:00
Ilia Mirkin	927ce66b39	nv50/ir: remove dnz flag when converting MAD to ADD due to optimizations dnz flag only applies for multiplications (e.g. to make 0 * Infinity become 0 instead of NaN). Once we optimize a MAD into an ADD, the dnz flag no longer makes sense, and upsets the GM107 emitter (since it looks at the ftz and dnz flags together). Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Karol Herbst <kherbst@redhat.com>	2018-11-24 22:15:53 -05:00
Marek Olšák	d4e7d8b7f0	winsys/amdgpu: fix a device handle leak in amdgpu_winsys_create Cc: 18.2 18.3 <mesa-stable@lists.freedesktop.org> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-11-23 17:08:44 -05:00
Marek Olšák	82aa07f81f	winsys/amdgpu: fix a buffer leak in amdgpu_bo_from_handle Cc: 18.2 18.3 <mesa-stable@lists.freedesktop.org> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-11-23 17:08:42 -05:00
Samuel Pitoiset	9fc1ce258c	radv: ignore subpass self-dependencies for CreateRenderPass() too We really need to refactor this... Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-11-23 11:59:11 +01:00
Samuel Pitoiset	2951a766bd	radv: remove useless sync before CmdClear{Color,DepthStencil}Image() We don't need to flush anything before these two commands as well. This is because they have to be externally synchronized, so the app should have called CmdPipelineBarrier() prior to that and the driver should have flushed the caches. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-11-23 11:59:08 +01:00
Erik Faye-Lund	a652842982	mesa/main: remove overly strict query-validation The rules encoded in this code also apply to OpenGL ES 3.0 and up, but the per-enum validation has already been taught about these rules. So let's get rid of this duplicate, narrow version of the validation. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2018-11-23 10:48:36 +01:00
Erik Faye-Lund	d52be6dd29	mesa/main: fix validation of GL_TIMESTAMP ctx->Extensions.ARB_timer_query is set based on the driver- capabilities, not based on the context type. We need to check against _mesa_has_ARB_timer_query(ctx) instead to figure out if the extension is really supported. We also need to check for EXT_disjoint_timer_query for GLES-support. This shouldn't have any functional effect, as this entry-point is only valid on desktop GL, or on GLES with EXT_disjoint_timer_query in the first place. But if this gets added to the core of a future version of ES, this should be a step in the right direction. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2018-11-23 10:48:36 +01:00
Erik Faye-Lund	7a4d74c35a	mesa/main: fix validation of ARB_query_buffer_object ctx->Extensions.ARB_query_buffer_object is set based on the driver- capabilities, not based on the context type. We need to check against _mesa_has_ARB_query_buffer_object(ctx) instead to figure out if the extension is really supported. This turns attempts to read queries into buffer objects on ES 3 into errors, as required by the spec. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2018-11-23 10:48:36 +01:00
Erik Faye-Lund	75e39b59dc	mesa/main: fix validation of transform-feedback overflow queries ctx->Extensions.ARB_transform_feedback_overflow_query is set based on the driver-capabilities, not based on the context type. We need to check against _mesa_has_ARB_transform_feedback_overflow_query(ctx) instead to figure out if the extension is really supported. This turns usage of GL_TRANSFORM_FEEDBACK_STREAM_OVERFLOW and GL_TRANSFORM_FEEDBACK_OVERFLOW into errors on ES 3, as required by the spec. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2018-11-23 10:48:36 +01:00
Erik Faye-Lund	f09d94fbd1	mesa/main: fix validation of transform-feedback queries ctx->Extensions.EXT_transform_feedback is set based on the driver- capabilities, not based on the context type. We need to check against _mesa_has_EXT_transform_feedback(ctx) instead to figure out if the extension is really supported. We also need to check for OES_geometry_shader. This turns usage of GL_TRANSFORM_FEEDBACK_PRIMITIVES_WRITTEN into an error on ES 2, as well as usage of GL_PRIMITIVES_GENERATED on ES 3, both as required by the spec. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2018-11-23 10:48:36 +01:00
Erik Faye-Lund	b551fe5fa7	mesa/main: fix validation of GL_TIME_ELAPSED ctx->Extensions.EXT_timer_query is set based on the driver- capabilities, not based on the context type. We need to check against _mesa_has_EXT_timer_query(ctx) instead to figure out if the extension is really supported. We also need to check for EXT_disjoint_timer_query, which enables the same functionality for ES. This turns usage of GL_TIME_ELAPSED into an error on ES 3, as is required by the spec. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2018-11-23 10:48:36 +01:00
Erik Faye-Lund	059928e114	mesa/main: fix validation of GL_ANY_SAMPLES_PASSED_CONSERVATIVE ctx->Extensions.ARB_ES3_compatibility is set based on the driver- capabilities, not based on the context type. We need to check against _mesa_has_ARB_ES3_compatibility(ctx) instead to figure out if the extension is really supported. In addition, EXT_occlusion_query_boolean should also allow this behavior. This shouldn't cause any functional change, as all drivers that support ES3_compatibility should in practice enable either ES3_compatibility or EXT_occlusion_query_boolean under all APIs that export this symbol. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2018-11-23 10:48:35 +01:00
Erik Faye-Lund	8ea819dd60	mesa/main: fix validation of GL_ANY_SAMPLES_PASSED ctx->Extensions.ARB_occlusion_query2 is set based on the driver- capabilities, not based on the context type. We need to check against _mesa_has_ARB_occlusion_query2(ctx) instead to figure out if the extension is really supported. In addition, EXT_occlusion_query_boolean should also allow this behavior. This shouldn't cause any functional change, as all drivers that support ARB_occlusion_query2 should in practice enable either ARB_occlusion_query2 or EXT_occlusion_query_boolean under all APIs that export this symbol. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2018-11-23 10:48:35 +01:00
Erik Faye-Lund	fff1738d57	mesa/main: fix validation of GL_SAMPLES_PASSED ctx->Extensions.ARB_occlusion_query is set based on the driver- capabilities, not based on the context type. We need to check against _mesa_has_ARB_occlusion_query(ctx) instead to figure out if the extension is really supported. We also need to check for ARB_occlusion_query2, as ARB_occlusion_query isn't available in core contexts. This turns usage of GL_SAMPLES_PASSED into an error on ES 3, as is required by the spec. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2018-11-23 10:48:35 +01:00
Erik Faye-Lund	9c13ad0ea4	mesa/main: simplify pipeline-statistics query validation The _mesa_has_ARB_pipeline_statistics_query(ctx)-helper will already check the GLES-version according to the extension-table, so if this extension would ever be back-ported to ES, we only need to update the table to support this. This shouldn't have any functional effect. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2018-11-23 10:48:35 +01:00
Erik Faye-Lund	dd4241b34f	mesa/main: use non-prefixed enums for consistency These enums all have the same values as their non-prefixed versions, and there's several aliases for some of them. So let's switch to the non-prefixed versions for simplicity. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2018-11-23 10:48:35 +01:00
Erik Faye-Lund	ba4e8d3754	mesa/main: correct year for EXT_occlusion_query_boolean According to the extension spec, this was initially released in 2011, so let's set this to the correct value. The value of 2001 could be a copy-paste mistake, as ARB_occlusion_query which this is based on was released then. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2018-11-23 10:48:35 +01:00
Erik Faye-Lund	35555b08d7	mesa/main: correct requirement for EXT_occlusion_query_boolean EXT_occlusion_query_boolean requires support for GL_ANY_SAMPLES_PASSED, which ARB_occlusion_query doesn't supply. We need ARB_occlusion_query2 for this instead. This is still not 100% accurate, as we also require support for the GL_SAMPLES_PASSED_CONSERVATIVE target, which isn't guaranteed by either ARB_occlusion_query or ARB_occlusion_query2. But it should be trivial to implement for any driver supporting ARB_occlusion_query2, as it can simply be implemented as GL_ANY_SAMPLES_PASSED. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2018-11-23 10:48:35 +01:00
Tapani Pälli	09adaa4b89	anv: allow exporting an imported SYNC_FD semaphore type Fixes issues with following SkQP tests: unitTest_VulkanHardwareBuffer_Vulkan_EGL_Syncs unitTest_VulkanHardwareBuffer_Vulkan_Vulkan_Syncs Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-11-23 07:49:46 +02:00
Eric Engestrom	896c59d690	glapi: add missing visibility args Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108829 Fixes: `3218056e0e` "meson: Build i965 and dri stack" Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2018-11-22 18:21:05 +00:00
Jason Ekstrand	a24654b49d	anv/nir: Rework arguments to apply_pipeline_layout Instead of taking a whole pipeline (which could be anything!), just take a physical device and robust_buffer_access boolean. This makes it easier to verify that only the things in the hash actually affect pipeline compilation. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2018-11-22 09:17:28 -06:00
Jason Ekstrand	617e402b3d	anv: Put robust buffer access in the pipeline hash It affects apply_pipeline_layout. Shaders compiled with the wrong value will work but they may not be robust as requested by the app. Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2018-11-22 09:17:10 -06:00
Jason Ekstrand	a845c2bc10	anv: Expose VK_EXT_scalar_block_layout Our compiler already splits UBO loads into scalars and the untyped surface read messages we use for SSBO reads and writes only require dword alignment. Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>	2018-11-22 08:16:47 -06:00
Jason Ekstrand	2ca9a4417d	vulkan: Update the XML and headers to 1.1.93 Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>	2018-11-22 08:16:40 -06:00
Samuel Pitoiset	4ff4af3d91	radv: remove useless sync after CmdClear{Color,DepthStencil}Image() 'post_flush' is only set to NULL for the normal clear path (ie. only vkCmdClearColorImage() and vkCmdClearDepthStencilImage() are affected commands). Because these two operations have to be externally synchronized with VK_PIPELINE_STAGE_TRANSFER_BIT and VK_ACCESS_TRANSFER_WRITE_BIT, it's useless to set those flags internally. VK_PIPELINE_STAGE_TRANSFER_BIT will wait for compute to be idle, while VK_ACCESS_TRANSFER_WRITE_BIT will invalidate both L1 vector caches and L2. RADV_CMD_FLAG_WRITEBACK_GLOBAL_L2 will be superseded by RADV_CMD_FLAG_INV_GLOBAL_L2. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-11-22 08:56:36 +01:00
Bas Nieuwenhuizen	33b2f74e77	vulkan: Allow storage images in the WSI. Since apps also have to follow the ImageFormatProperties query, we can disallow formats that don't allow image stores (for AMD that would be SRGB formats). Note that this only affects anything if the app actually decides to use the flag. Had someone ask for this on IRC and at least on the AMD side we can support it. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-11-21 21:36:55 +01:00
Axel Davy	1f1d4d571a	st/nine: Remove thread_submit warning thread_submit can be useful even without DRI_PRIME, as it can help avoid missed pageflips. Signed-off-by: Axel Davy <davyaxel0@gmail.com> Tested-by: Andre Heider <a.heider@gmail.com>	2018-11-21 19:55:28 +01:00
Axel Davy	d304f0aa31	st/nine: Allow 'triple buffering' with thread_submit The path allowing triple buffering behaviour wasn't implemented yet for thread_submit Signed-off-by: Axel Davy <davyaxel0@gmail.com> Tested-by: Andre Heider <a.heider@gmail.com>	2018-11-21 19:55:28 +01:00
Robert Foss	19af208c7d	virgl: add assert and missing function parameter Verify the pipe_fd_type to be of PIPE_FD_TYPE_NATIVE_SYNC. Fixes: `d1a1c21e76` "virgl: native fence fd support" Suggested-by: Eric Engestrom <eric.engestrom@intel.com> Signed-off-by: Robert Foss <robert.foss@collabora.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2018-11-21 15:59:00 +01:00
Gert Wollny	61b535437e	r600: clean up the GS ring buffers when the context is destroyed This fixes two memory leaks reported by ASAN: Direct leak of 248 byte(s) in 1 object(s) allocated from: in malloc (/usr/lib64/gcc/x86_64-pc-linux-gnu/7.3.0/libasan.so+0xdb880) in r600_alloc_buffer_struct ../../samba/mesa/src/gallium/drivers/r600/r600_buffer_common.c:578 in r600_buffer_create ../../samba/mesa/src/gallium/drivers/r600/r600_buffer_common.c:600 in r600_resource_create_common ../../samba/mesa/src/gallium/drivers/r600/r600_pipe_common.c:1265 in r600_resource_create ../../samba/mesa/src/gallium/drivers/r600/r600_pipe.c:725 in pipe_buffer_create ../../samba/mesa/src/gallium/auxiliary/util/u_inlines.h:291 in update_gs_block_state ../../samba/mesa/src/gallium/drivers/r600/r600_state_common.c:1482 Direct leak of 248 byte(s) in 1 object(s) allocated from: in malloc (/usr/lib64/gcc/x86_64-pc-linux-gnu/7.3.0/libasan.so+0xdb880) in r600_alloc_buffer_struct ../../samba/mesa/src/gallium/drivers/r600/r600_buffer_common.c:578 in r600_buffer_create ../../samba/mesa/src/gallium/drivers/r600/r600_buffer_common.c:600 in r600_resource_create_common ../../samba/mesa/src/gallium/drivers/r600/r600_pipe_common.c:1265 in r600_resource_create ../../samba/mesa/src/gallium/drivers/r600/r600_pipe.c:722 in pipe_buffer_create ../../samba/mesa/src/gallium/auxiliary/util/u_inlines.h:291 in update_gs_block_state ../../samba/mesa/src/gallium/drivers/r600/r600_state_common.c:1489 Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Fixes: `1371d65a7f` r600g: initial support for geometry shaders on evergreen (v2) Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2018-11-21 10:34:17 +01:00
Samuel Pitoiset	4b9bc4791b	radv: only sync CP DMA for transfer operations or bottom pipe CP DMA can only be busy when the driver copies buffers. The only affected Vulkan commands are vkCmdCopyBuffer() and vkCmdUpdateBuffer() (because we fallback to a copy depending on a threshold). Clear operations are currently not concerned because the driver always syncs after the last DMA operation. Per the spec, these two operations have to be externally synchronized with VK_PIPELINE_STAGE_TRANSFER_BIT. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-11-21 10:03:01 +01:00
Samuel Pitoiset	457ac6ce1e	radv: ignore subpass self-dependencies Unnecessary as they allow the app to call vkCmdPipelineBarrier() inside the render pass. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-11-21 10:02:59 +01:00
Iago Toral Quiroga	8e73b57634	Revert "nir/builder: Assert that intN_t immediates fit" This reverts commit `1f29f4db1e`. For this to work the compiler must ensure that it never puts the values that arrive to this helper into unsigned variables at any point in its processing, since that would not apply sign extension to the value and it would break the expectations here. Unfortunately, we use uint64_t extensively to pass and copy things around, so sometimes we get to this helper with values that are not properly sign extended to 64-bit. Here is an example for an 8-bit value that comes from a switch case: (gdb) p /x x $1 = 0xffffffd6 The value seems to have been sign extended to 32-bit at some point getting proper sign extension, but then copied into a uint64_t which won't apply sign extension, breaking the expectations of the assertion. Reviewed-by: Juan A. Suarez <jasuarez@igalia.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-11-21 08:12:50 +01:00
Iago Toral Quiroga	387888e3b7	nir/from_ssa: fix bit-size of temporary register Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-11-21 08:07:22 +01:00
Mathias Fröhlich	2d3c466add	mesa: Remove unneeded bitfield widths from the VAO. With the current VAO layout we do not need to make these fields a bitfield. We get a tight struct layout with this change for VAO attributes. v2: Change unsigned char -> GLubyte. Reviewed-by: Brian Paul <brianp@vmware.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>	2018-11-21 06:27:19 +01:00
Mathias Fröhlich	0a7020b4e6	mesa: Factor out struct gl_vertex_format. Factor out struct gl_vertex_format from array attributes. The data type is supposed to describe the type of a vertex element. At this current stage the data type is only used with the VAO, but actually is useful in various other places. Due to the bitfields being used, special care needs to be taken for the glGet code paths. v2: Change unsigned char -> GLubyte. Use struct assignment for struct gl_vertex_format. Reviewed-by: Brian Paul <brianp@vmware.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>	2018-11-21 06:27:19 +01:00
Mathias Fröhlich	2da7b0a2fb	tnl: Use gl_array_attribute::_ElementSize. Instead of open coding the size computation, use the already available gl_array_attribute::_ElementSize value. Reviewed-by: Brian Paul <brianp@vmware.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>	2018-11-21 06:27:19 +01:00
Mathias Fröhlich	a4c01839c2	nouveau: Use gl_array_attribute::_ElementSize. Instead of open coding the size computation, use the already available gl_array_attribute::_ElementSize value. Reviewed-by: Brian Paul <brianp@vmware.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>	2018-11-21 06:27:19 +01:00
Mathias Fröhlich	182ed6de8c	mesa: Unify glEdgeFlagPointer data type. Use GL_UNSIGNED_BYTE as initialization data type for the edge flag vertex attribute array. The same datatype is used in the glEdgeFlagPointer function when setting the array pointer. Reviewed-by: Brian Paul <brianp@vmware.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>	2018-11-21 06:27:19 +01:00
Mathias Fröhlich	1b743e2966	mesa: Work with bitmasks when en/dis-abling VAO arrays. For enabling or disabling VAO arrays it is now possible to change a set of arrays with a single call without the need to iterate the attributes. Make use of this technique in the vao module. Reviewed-by: Brian Paul <brianp@vmware.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>	2018-11-21 06:27:19 +01:00
Mathias Fröhlich	3c46fa5988	mesa: Remove gl_array_attributes::Enabled. Now that all users go via the VAO Enabled bitfield, get rid of the Enabled boolean. Reviewed-by: Brian Paul <brianp@vmware.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>	2018-11-21 06:27:19 +01:00
Mathias Fröhlich	093aeb3565	mesa: Use gl_vertex_array_object::Enabled for glGet. Instead of using gl_array_attributes::Enabled use the much more compact representation stored in gl_vertex_array_object::Enabled using the corresponding bits. Keep the glGet changes in a separate patch at least for review. Reviewed-by: Brian Paul <brianp@vmware.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>	2018-11-21 06:27:19 +01:00
Mathias Fröhlich	1217a8448c	mesa: Use the gl_vertex_array_object::Enabled bitfield. Instead of using gl_array_attributes::Enabled use the much more compact representation stored in gl_vertex_array_object::Enabled using the corresponding bits. Reviewed-by: Brian Paul <brianp@vmware.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>	2018-11-21 06:27:19 +01:00
Mathias Fröhlich	73d2d313e9	mesa: Rename gl_vertex_array_object::_Enabled -> Enabled. Mark the up to now derived bitfield value now as primary value by removing the underscore. Reviewed-by: Brian Paul <brianp@vmware.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>	2018-11-21 06:27:19 +01:00
Marek Olšák	ea9f95e2a6	radeonsi: go back to using bottom-of-pipe for beginning of TIME_ELAPSED Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102597 Cc: 18.3 <mesa-stable@lists.freedesktop.org> Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Reviewed-by: Dave Airlie <airlied@redhat.com>	2018-11-20 21:18:48 -05:00
Marek Olšák	6c1a34d2e7	radeonsi: don't send data after write-confirm with BOTTOM_OF_PIPE_TS There are no writes. Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Reviewed-by: Dave Airlie <airlied@redhat.com>	2018-11-20 21:18:46 -05:00
Marek Olšák	bc5adc27b5	st/mesa: pin driver threads to a fixed CCX when glthread is enabled radeonsi has 3 driver threads (glthread, gallium, winsys), other drivers may have 2 (glthread, gallium), so it makes sense to pin them to a random CCX and keep that irrespective of the app thread. Reviewed-by: Dave Airlie <airlied@redhat.com>	2018-11-20 21:18:43 -05:00
Marek Olšák	48f2160936	st/mesa: regularly re-pin driver threads to the CCX where the app thread is This is used when glthread is disabled. Mesa pretty much chases the app thread on the CPU. The performance is the same as pinning the app thread. Reviewed-by: Dave Airlie <airlied@redhat.com>	2018-11-20 21:18:30 -05:00
Marek Olšák	ce7f84eb77	drirc: enable glthread for Talos Principle Ryzen 1700X, Vega 56, 1600x900, 4xAA: improvement +4.4% Immediate mode was needed. Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-11-20 21:17:42 -05:00
Marek Olšák	7f1cac7ba6	mesa/glthread: enable immediate mode Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-11-20 21:17:41 -05:00
Marek Olšák	247d5a8e94	mesa/glthread: pass the function name to _mesa_glthread_restore_dispatch If you insert printf there, you'll know why glthread was disabled. Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-11-20 21:17:38 -05:00
Marek Olšák	25d95ed535	gallium/u_tests: fix MSVC build by using old-style zero initializers	2018-11-20 19:06:40 -05:00
Kenneth Graunke	562448b75a	i965: Do NIR shader cloning in the caller. This moves nir_shader_clone() to the driver-specific compile function, rather than the shared src/intel/compiler code. This allows i965 to do key-specific passes before calling brw_compile_*. Vulkan should not need this cloning as it doesn't compile multiple variants. We do need to continue cloning in the compute shader code because we lower various things in NIR based on the SIMD width. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>	2018-11-20 15:53:46 -08:00
Kenneth Graunke	6a10dd08f4	i965: Use a 'nir' temporary rather than poking at brw_program It's shorter and will also be useful when I adjust cloning soon. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>	2018-11-20 15:53:46 -08:00
Marek Olšák	0d17b685b1	gallium/u_tests: add a compute shader test that clears an image	2018-11-20 18:50:48 -05:00
Dave Airlie	3486fe655a	ac: handle cast derefs Just give back the same value for now. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-11-21 08:54:46 +10:00
Dave Airlie	baa4bdd3a6	radv: handle loading from shared pointers We won't have a var to load from, so don't try to do the processing required if we don't need it. This avoids crashes in: dEQP-VK.spirv_assembly.instruction.compute.variable_pointers.compute.workgroup_two_buffers Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-11-21 08:54:42 +10:00
Dave Airlie	ec9fe8abc7	ac: avoid casting pointers on bcsel and stores For variable pointers we really don't want to cast the pointers to int without a good reason, just add a wrapper for bcsel loading and result storing. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-11-21 08:54:25 +10:00
Dylan Baker	a999798daa	meson: Add tests to suites Meson test has a concept of suites, which allow tests to be grouped together. This allows for a subset of tests to be run only (say only the tests for nir). A test can be added to more than one suite, but for the most part I've only added a test to a single suite, though I've added a compiler group that includes nir, glsl, and glcpp tests. To use this you'll need to invoke meson test directly, instead of ninja test (which always runs all targets). It can be invoked as: `meson test -C builddir --suite $suitename` (meson test has additional options that are pretty useful). Tested-By: Gert Wollny <gert.wollny@collabora.com> Acked-by: Eric Engestrom <eric.engestrom@intel.com>	2018-11-20 09:09:22 -08:00
Andrii Simiklit	b787dcf57b	i965/batch: avoid reverting batch buffer if saved state is an empty There's no point reverting to the last saved point if that save point is the empty batch, we will just repeat ourselves. v2: Merge with new commits, changes was minimized, added the 'fixes' tag v3: Added in to patch series v4: Fixed the regression which was introduced by this patch Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108630 Reported-by: Mark Janes <mark.a.janes@intel.com> The solution provided by: Jordan Justen <jordan.l.justen@intel.com> CC: Chris Wilson <chris@chris-wilson.co.uk> Fixes: `3faf56ffbd` "intel: Add an interface for saving/restoring the batchbuffer state." Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107626 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108630 (fixed in v4) Signed-off-by: Andrii Simiklit <andrii.simiklit@globallogic.com> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-11-20 06:33:43 -08:00
Emil Velikov	982e012b3a	travis: adding missing x11-xcb for meson+vulkan Required by the x11 WSI Fixes: `df82012b2c` ("travis: add meson build for vulkan drivers.") Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2018-11-20 11:16:46 +00:00
Emil Velikov	5bc509363b	glx: make xf86vidmode mandatory for direct rendering Currently we detect the module and if missing, the glXGetMsc* API is effectively a stub, always returning false. This is what effectively has been happening with our meson build :-( Thus users have no chance of using it - they cannot even distinguish if the failure is due to a misconfigured build. There's no reason for keeping xf86vidmode optional - it has been available in all distributions for years. Cc: mesa-stable@lists.freedesktop.org Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Acked-by: Dylan Baker <dylan@pnwbakers.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> Fixes: `a47c525f32` "meson: build glx"	2018-11-20 11:13:20 +00:00
Emil Velikov	84445a86d1	travis: drop unneeded x11proto-xf86vidmode-dev The only place where the package is needed is for building the DRI based libGL library. Cc: mesa-stable@lists.freedesktop.org Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Acked-by: Dylan Baker <dylan@pnwbakers.com> Acked-by: Eric Engestrom <eric.engestrom@intel.com>	2018-11-20 11:13:20 +00:00
Samuel Pitoiset	f4563d8f5b	ac/nir: fix intrinsic name string size in visit_image_atomic() Fixes an assertion in SoTTR. Fixes: `dd0172e865` ("radv: Use structured intrinsics instead of indexing workaround for GFX9.") Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-11-20 10:23:45 +01:00
Bas Nieuwenhuizen	dd0172e865	radv: Use structured intrinsics instead of indexing workaround for GFX9. These force the index to be used in the instruction so we don't need the workaround. Totals: SGPRS: 1321642 -> 1321802 (0.01 %) VGPRS: 943664 -> 943788 (0.01 %) Spilled SGPRs: 28468 -> 28480 (0.04 %) Spilled VGPRs: 88 -> 89 (1.14 %) Private memory VGPRs: 0 -> 0 (0.00 %) Scratch size: 80 -> 80 (0.00 %) dwords per thread Code Size: 52415292 -> 52338932 (-0.15 %) bytes LDS: 400 -> 400 (0.00 %) blocks Max Waves: 233903 -> 233803 (-0.04 %) Wait states: 0 -> 0 (0.00 %) Totals from affected shaders: SGPRS: 238344 -> 238504 (0.07 %) VGPRS: 232732 -> 232856 (0.05 %) Spilled SGPRs: 13125 -> 13137 (0.09 %) Spilled VGPRs: 88 -> 89 (1.14 %) Private memory VGPRs: 0 -> 0 (0.00 %) Scratch size: 80 -> 80 (0.00 %) dwords per thread Code Size: 15752712 -> 15676352 (-0.48 %) bytes LDS: 139 -> 139 (0.00 %) blocks Max Waves: 31680 -> 31580 (-0.32 %) Wait states: 0 -> 0 (0.00 %) Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2018-11-19 23:36:00 +01:00
Kenneth Graunke	0990168642	i965: Allow only one slot of clip distances to be set on Gen4-5. The existing backend code assumed that if VARYING_SLOT_CLIP_DIST0 was written, then VARYING_SLOT_CLIP_DIST1 would be as well. That's true with the current lowering, but not necessary if there are 4 or fewer clip distances. Separate out the checks to allow this. The new NIR-based lowering will trigger this case, which would have caused backend validation errors (src is null) without this patch. Reviewed-by: Eric Anholt <eric@anholt.net>	2018-11-19 14:33:16 -08:00
Kenneth Graunke	5b682143da	nir: Make nir_lower_clip_vs optionally work with variables. The way nir_lower_clip_vs() works with store_output intrinsics makes a ton of assumptions about the driver_location field. In i965 and iris, I'd rather do this lowering early and work with variables. v3d may want to switch to that as well, and ir3 could too, but I'm not sure exactly what would need updating. For now, handle both methods. Reviewed-by: Eric Anholt <eric@anholt.net>	2018-11-19 14:33:16 -08:00
Kenneth Graunke	d0f746b645	nir: Save nir_variable pointers in nir_lower_clip_vs rather than locs. I'll want the variables in the next patch. Reviewed-by: Eric Anholt <eric@anholt.net>	2018-11-19 14:33:16 -08:00
Kenneth Graunke	63c8696874	nir: Inline lower_clip_vs() into nir_lower_clip_vs(). It's now called exactly once, and there's not really any distinction. Reviewed-by: Eric Anholt <eric@anholt.net>	2018-11-19 14:33:14 -08:00
Kenneth Graunke	bfa789aceb	nir: Use nir_shader_get_entrypoint in nir_lower_clip_vs(). Reviewed-by: Eric Anholt <eric@anholt.net>	2018-11-19 14:31:20 -08:00
Dave Airlie	c8a35285f0	nir: handle shared pointers in lowering indirect derefs. Check if the base ends up with no variable, and continue if we see that case outside the loop. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-11-20 05:36:52 +10:00
Dave Airlie	760859cac2	nir: move getting deref from var after we check deref type. I posted a load of hacks before to do this, Jason suggested this, just check the deref mode, not the variable mode and delay getting the variable until we know the type. avoids crashes when derefing shared memory pointers. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-11-20 05:36:38 +10:00
Dave Airlie	2f4f5a5055	spirv/vtn: handle variable pointers without offset lowering Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-11-20 05:36:16 +10:00
Jason Ekstrand	dca35c598d	intel/fs,vec4: Fix a compiler warning ../src/intel/compiler/brw_fs_nir.cpp:3534:46: warning: comparison of integer expressions of different signedness: ‘unsigned int’ and ‘int’ [-Wsign-compare] assert(nir_intrinsic_write_mask(instr) == ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~ (1 << instr->num_components) - 1); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ This was caused by `6339aba775` which added these completely valid checks. However clang likes to complain about signedness mismatches. Fixes: `6339aba775` "intel/compiler: Lower SSBO and shared..." Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>	2018-11-19 09:57:41 -06:00
Jason Ekstrand	060817b2fa	intel,nir: Move gl_LocalInvocationID lowering to nir_lower_system_values It's not at all intel-specific; the formula is dictated by OpenGL and Vulkan. The only intel-specific thing is that we need the lowering. As a nice side-effect, the new version is variable-group-size ready. Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>	2018-11-19 09:57:41 -06:00
Eric Engestrom	486091bc00	gbm: add missing comma between strings Fixes: `d971a4230d` "loader: Factor out the common driver opening logic from each loader." Signed-off-by: Eric Engestrom <eric@engestrom.ch> Reviewed-by: Eric Anholt <eric@anholt.net>	2018-11-19 15:50:56 +00:00
Samuel Pitoiset	724107553c	radv: implement fast HTILE clears for depth or stencil only on GFX9 This allows to fast clear the depth part (or the stencil part) of a depth+stencil surface when HTILE is enabled. I didn't test on GFX8, so it's disabled currently. This gives a very nice boost, for example when clearing the depth aspect of a 4096x4096 D32_SFLOAT_S8_UINT image (18x faster). BEFORE: 235 us AFTER: 13 us Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-11-19 16:32:18 +01:00
Samuel Pitoiset	7dcddbe54d	radv: rewrite the condition that checks allowed depth/stencil values Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-11-19 16:32:16 +01:00
Samuel Pitoiset	9133bbf186	radv: check allowed fast HTILE clears a bit earlier Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-11-19 16:32:14 +01:00
Samuel Pitoiset	193ad4748b	radv: add radv_is_fast_clear_{depth,stencil}_allowed() helpers Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-11-19 16:32:12 +01:00
Samuel Pitoiset	c7e142ed78	radv: add radv_get_htile_fast_clear_value() helper Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-11-19 16:32:10 +01:00
Samuel Pitoiset	6f3fbcc041	radv: remove unnecessary goto in the fast clear paths Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-11-19 16:32:08 +01:00
Samuel Pitoiset	36006e3cec	radv/winsys: remove the max IBs per submit limit for the sysmem path This path will be eventually improved later but as it's only used on SI (or with RADV_DEBUG=noibs), I'm not sure if that matters much. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-11-19 16:32:06 +01:00
Samuel Pitoiset	4d30f2c6f4	radv/winsys: remove the max IBs per submit limit for the fallback path The chained submission is the fastest path and it should now be used more often than before. This removes some EOP events. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-11-19 16:32:04 +01:00
Lucas Stach	8ca8a6a7b1	etnaviv: use dummy RT buffer when rendering without color buffer At least GC2000 seems to push some dirt from the PE color cache into the last bound render target when drawing depth only. Newer cores seem to behave properly and don't do this, but I have found no way to fix it on GC2000. Flushes and stalls don't seem to make any difference. In order to stop the core from pushing the dirt into a precious real render target, plug in dummy buffer when rendering without a color buffer. Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>	2018-11-19 15:48:10 +01:00
Dave Airlie	8706204074	virgl: fix vtest regression since fencing changes. The in_fence_fd needs to be initialised to -1. Fixes: `d1a1c21e7` (virgl: native fence fd support) Reviewed-by: Robert Foss <robert.foss@collabora.com>	2018-11-19 15:33:19 +01:00
Samuel Pitoiset	55c75d2b49	radv: always clear the FCE predicate after DCC/FMASK/CMASK decompressions DCC and FMASK also imply a fast-clear eliminate, so it should be safe to reset the predicate unconditionally. We still only skip FMASK or CMASK decompressions for now. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-11-19 14:05:35 +01:00
Samuel Pitoiset	483a28bfd4	radv: tidy up radv_set_dcc_need_cmask_elim_pred() This is just a small cleanup. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-11-19 14:05:33 +01:00
Nicolai Hähnle	46a59ce026	radeonsi: fix an out-of-bounds read reported by ASAN We read 4 values out of sample_locs_8x, so make sure the array is big enough. Fixes: `ac76aeef20` ("radeonsi: switch back to standard DX sample positions") Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-11-19 11:16:35 +01:00
Gert Wollny	d174cbccfa	r600: Only set context streamout strides info from the shader that has outputs With 5d517a streamout info is only attached to the shader for which the transform feedback is actually recorded, but the driver set the context info with each state submitted, thereby always using the info data that was attached to the vertex shader. Pass the streamout stride info to the context only from the shader that actually has outputs. (Thanks to Marek Olšák for pointing me in the right direction) Fixes regression with: dEQP-GLES31.functional.tessellation.invariance.* Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108734 Fixes: `5d517a599b` st/mesa: Don't record garbage streamout information in the non-SSO case. Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Reviewed-by: Dave Airlie <airlied@redhat.com>	2018-11-19 11:06:56 +01:00
Gert Wollny	18a8e11aea	i965:use FRAMEBUFFER_UNSUPPORTED instead of FRAMEBUFFER_INCOMPLETE_DIMENSIONS FRAMEBUFFER_INCOMPLETE_DIMENSIONS is not supported for GLES 3.0 and later and not defined for Desktop OpenGL. Instead use FRAMEBUFFER_UNSUPPORTED like it was done before. Thanks to Iago Toral and Andrey Simiklit for pointing out the problem and the details. Fixes: `ebcde34545` i965: be more specific about FBO completeness errors Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2018-11-19 11:06:52 +01:00
Gert Wollny	40eca7d3e1	virgl: Use file descriptor instead of un-allocated object The structure qdws is not allocated at this point, nor is the file descriptor set to its member. Use the fd directly instead. Fixes: `d1a1c21e76` virgl: native fence fd support Signed-off-by: Gert Wollny <gert.wollny@collabora.com>	2018-11-19 11:03:56 +01:00
Gert Wollny	78fdc507a3	i965: Add support for and expose EXT_texture_sRGB_R8 Emulate MESA_FORMAT_R_SRGB8 by using L8_UNORM_SRGB. This is possible because component swizzling is handled based on the mesa format and, hence, an r001 swizzling can be used to correct the components. Enables and makes pass (tested on Kabylake) dEQP-GLES31.functional.srgb_texture_decode.skip_decode.sr8.* dEQP-GLES31.functional.texture.filtering.cube_array.formats.sr8* Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Acked-by: Eric Engestrom <eric.engestrom@intel.com>	2018-11-19 08:05:44 +01:00
Gert Wollny	c5363869d4	i965: Force zero swizzles for unused components in GL_RED and GL_RG This makes it possible to use a hardware luminance format as RED format. Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2018-11-19 08:05:44 +01:00
Gert Wollny	ebcde34545	i965: be more specific about FBO completeness errors The driver was returning GL_FRAMEBUFFER_UNSUPPORTED for all cases of an incomplete fbo, be a bit more specific about this following the description of glCheckFramebufferStatus. This helps to keep dEQP happy when adding EXT_texture_sRGB_R8 support. Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2018-11-19 08:05:44 +01:00
Gert Wollny	24a02157dd	i965: Correct L8_UNORM_SRGB table entry As the name says, the format is an sRGB format. Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2018-11-19 08:05:44 +01:00
Robert Foss	70692adf48	virgl: Clean up fences commit Remove a dead variable, a int->bool conversion and some whitespace changes. Signed-off-by: Robert Foss <robert.foss@collabora.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2018-11-18 12:14:55 +01:00
Kenneth Graunke	c2e3d0f163	i915: Delete swizzling detection logic. This is all leftover from the i965 split. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-11-17 10:26:31 -08:00
Ilia Mirkin	beb66d3747	nv50/ir/ra: enforce max register requirement, and change spill order On nv50, certain operations must happen on regs below 64, due to encoding requirements. First of all, we add infrastructure to enforce this. Secondly we change the spill order to first spill RIG nodes that are unconstrained, followed by ones that are. This makes the gamecube logo shadertoy compile properly. Curiously, if we adjust the spill order so that we first spill the constrained RIG nodes instead, the RA also succeeds. However it seems more logical to first spill the unconstrained ones. While we're at it, drop the nv50 max register to reserve r127 as the zero register of last resort (r63 is preferred). Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Acked-by: Karol Herbst <kherbst@redhat.com>	2018-11-16 22:43:52 -05:00
Ilia Mirkin	799e021894	nv50/ir/ra: improve condition for short regs, unify with cond for 16-bit Instead of the size restriction existing in two places, and potentially being applied twice, we move this together. Ops with 16-bit register addresses can only take a short reg, and ops with immediates can only take a short reg. Of course we leave the immediate 0 in place since we know that it will be replaced by r63/r127 down the line, so don't treat zeroes as an immediate. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Karol Herbst <kherbst@redhat.com>	2018-11-16 20:53:33 -05:00
Ilia Mirkin	955d943c33	nv50/ir: delete MINMAX instruction that is no longer in the BB We removed the op from the BB, but it was still listed in its sources' uses. This could trip up some logic down the line which analyzes all the uses of an l-value, e.g. spilling. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Karol Herbst <kherbst@redhat.com>	2018-11-16 20:53:09 -05:00
Eric Anholt	7e9fc11ff8	egl: Print the actual message to the console from _eglError(). Previously we would print errors on the console like: libEGL debug: EGL user error 0x3001 (EGL_NOT_INITIALIZED) in eglInitialize When we had everything we needed for: libEGL debug: EGL user error 0x3001 (EGL_NOT_INITIALIZED) in eglInitialize: DRI2: failed to find EGLDevice (for a gbm error in my case) Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2018-11-16 17:49:31 -08:00
Eric Anholt	d971a4230d	loader: Factor out the common driver opening logic from each loader. I copied the code from egl_dri2.c, but the functionality was equivalent between all the loaders other than their particular environment variables. v2: Drop the logging function equivalent to loader_default_logger() (requested by Eric, Emil). Move the SCons workaround across. Drop the now-unused driGetDriverExtensions() declaration that was lost in a rebase. Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> (v1) Reviewed-by: Emil Velikov <emil.velikov@collabora.com> (v1)	2018-11-16 17:49:17 -08:00
Eric Anholt	cc19815738	loader: Stop using a local definition for an in-tree header I need other types from the header now, and "gl.h is big" is not a good reason to duplicate definitions. Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2018-11-16 15:38:18 -08:00
Eric Anholt	2bc1f5c2e7	egl: Move loader_set_logger() up to egl_dri2.c. Everyone needs to call it, and platform_x11 forgot to. Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2018-11-16 15:38:18 -08:00
Eric Anholt	c2b515379b	glx: Move DRI extensions pointer loading to driOpenDriver(). The only thing you do with a dri driver handle is get the extensions pointer, so just fold it in to simplify the callers. v2: Add the declaration of driGetDriverExtensions() that got lost in a rebase. Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> (v1) Reviewed-by: Emil Velikov <emil.velikov@collabora.com> (v1)	2018-11-16 15:38:18 -08:00
Eric Anholt	7076e9f116	glx: Remove an old DEFAULT_DRIVER_DIR default. You can tell by "Mesa/configs/default" how old this is. Your build system really has to provide the DEFAULT_DRIVER_DIR, or other loaders will break. v2: Move the bad (non-prefix-dependent) define to the SConscript to avoid breaking it. Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> (v1) Reviewed-by: Emil Velikov <emil.velikov@collabora.com> (v1)	2018-11-16 15:37:47 -08:00
Samuel Pitoiset	d031d5c999	radv: enable primitive binning by default After doing a bunch of benchmarks, primitive binning helps some games like The Talos Principle (+5%) or Serious Sam 2017 (+3%). For other titles, either it doesn't change anything or it hurts very few (less than 1%). This only affects GFX9. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-11-16 17:51:15 +01:00
Samuel Pitoiset	afd834b62e	radv: add a debug option for disabling primitive binning Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-11-16 17:51:12 +01:00
Robert Foss	d1a1c21e76	virgl: native fence fd support Following the support for fences on the virtio driver add support for native fence on virgl. This was somewhat based on the freedreno one. Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.com> Signed-off-by: Robert Foss <robert.foss@collabora.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2018-11-16 14:41:57 +01:00
Lionel Landwerlin	0db898cef2	intel/aub_viewer: Print blend states properly Identical fix to : commit `70de31d0c1` Author: Jason Ekstrand <jason.ekstrand@intel.com> Date: Fri Aug 24 16:05:08 2018 -0500 intel/batch_decoder: Print blend states properly Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Toni Lönnberg <toni.lonnberg@intel.com>	2018-11-16 11:40:38 +00:00
Lionel Landwerlin	ac324a6809	intel/aub_viewer: fix dynamic state printing Identical fix to : commit `cbd4bc1346` Author: Jason Ekstrand <jason.ekstrand@intel.com> Date: Fri Aug 24 16:04:03 2018 -0500 intel/batch_decoder: Fix dynamic state printing Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Toni Lönnberg <toni.lonnberg@intel.com>	2018-11-16 11:40:14 +00:00
Lionel Landwerlin	59c1059528	intel/aubinator: fix ring buffer pointer We can only start parsing commands from the head pointer. This was working fine up to now because we only dealt with a "made up" ring buffer (generated by aub_write) which always had its head at 0. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Toni Lönnberg <toni.lonnberg@intel.com>	2018-11-16 11:39:54 +00:00
Lionel Landwerlin	25443cbb72	intel/decoders: read ring buffer length Use this value to limit reading the ring buffer. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Toni Lönnberg <toni.lonnberg@intel.com>	2018-11-16 11:37:08 +00:00
Lionel Landwerlin	1c56d21156	egl/dri: fix error value with unknown drm format According to the EGL_EXT_image_dma_buf_import spec, creating an EGL image with a DRM format not supported should yield the BAD_MATCH error : " * If <target> is EGL_LINUX_DMA_BUF_EXT, and the EGL_LINUX_DRM_FOURCC_EXT attribute is set to a format not supported by the EGL, EGL_BAD_MATCH is generated. " Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `20de7f9f22` ("egl/dri2: support for creating images out of dma buffers") Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Chad Versace <chadversary@chromium.org>	2018-11-16 10:28:06 +00:00
Daniel Stone	5e1fe240c4	gbm: Clarify acceptable formats for gbm_bo gbm_bo_create() was presumably meant to originally accept gbm_bo_format enums, but it's accepted GBM_FORMAT_* tokens since the dawn of time. This is good, since gbm_bo_format is rarely used and covers a lot less ground than GBM_FORMAT_*. Change the documentation to refer to both; this involves removing a 'see also' for gbm_bo_format, since we can't also use \sa to refer to a family of anonymous #defines. Signed-off-by: Daniel Stone <daniels@collabora.com> Reported-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk> Reviewed-by: Eric Anholt <eric@anholt.net>	2018-11-16 09:40:46 +00:00
Connor Abbott	ba94a00c7c	Revert "radv: disable VK_SUBGROUP_FEATURE_VOTE_BIT" This reverts commit `647c2b90e9`. There was one recently-introduced bug in ac for dvec3 loads, but the other test failures were actually bugs in the tests. See `9429e621c4` Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-11-16 10:32:03 +01:00
Eric Anholt	cc71bf529c	vc4: Don't return a vc4 BO handle on a renderonly screen. The handles exported need to be on the KMS device's fd, anything else is failure. Also, this code is assuming that the scanout resource has been created already, so assert it.	2018-11-15 21:11:44 -08:00
Eric Anholt	cc0bc76a38	vc4: Make sure we make ro scanout resources for create_with_modifiers. The DRI3 create_with_modifiers paths don't set tmpl.bind to SCANOUT or SHARED, with the theory that given that you've got modifiers, that's all you need. However, we were looking at the tmpl.bind for setting up the KMS handle in the renderonly case, so we'd end up trying to use vc4's handle on the hx8357d fd. Fixes: `84ed8b67c5` ("vc4: Set shareable BOs as T tiled if possible")	2018-11-15 21:11:44 -08:00
Danylo Piliaiev	f9fd0cf479	i965: Fix calculation of layers array length for isl_view Handle all cases in calculation of layers count for isl_view taking into account texture view and image unit. st_convert_image was taken as a reference. When u->Layered is true the whole level is taken with respect to image view. In other case only one layer is taken. v3: (Józef Kucia and Ilia Mirkin) - Rewrote patch by taking st_convert_image as a reference - Removed now unused get_image_num_layers function - Changed commit message v4: (Jason Ekstrand) - Added assert Fixes: `5a8c8903` Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107856 Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-11-15 19:59:54 -06:00
Jason Ekstrand	6339aba775	intel/compiler: Lower SSBO and shared loads/stores in NIR We have a bunch of code to do this in the back-end compiler but it's fairly specific to typed surface messages and the way we emit them. This breaks it out into NIR where it's easier to do things a bit more generally. It also means we can easily share the code between the vec4 and FS back-ends if we wish. Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>	2018-11-15 19:59:49 -06:00
Jason Ekstrand	d34fd81e76	nir: Add alignment parameters to SSBO, UBO, and shared access This also changes spirv_to_nir and glsl_to_nir to set them. The one place that doesn't set them is shared memory access lowering in nir_lower_io. That will have to be updated before any consumers of it can effectively use these new alignments. Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Acked-by: Karol Herbst <kherbst@redhat.com>	2018-11-15 19:59:42 -06:00
Jason Ekstrand	fb127f7729	nir/lower_io: Add shared to get_io_offset_src Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>	2018-11-15 19:59:31 -06:00
Jason Ekstrand	b5c48271d4	nir/glsl: Force 32-bit for UBO and SSBO Booleans Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>	2018-11-15 19:59:30 -06:00
Jason Ekstrand	44b7005581	nir/spirv: Force 32-bit for UBO and SSBO Booleans Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>	2018-11-15 19:59:29 -06:00
Jason Ekstrand	f16bd8a9fe	nir/builder: Add a nir_pack/unpack/bitcast helpers The new helpers can generate any pack/unpack operation including those for which we do not have specific opcodes and they express a bitcast in terms of these pack/unpack operations. In particular, the new helpers properly handle 8-bit types. Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>	2018-11-15 19:59:28 -06:00
Jason Ekstrand	b77d68b78e	nir/builder: Add iadd_imm and imul_imm helpers The pattern of adding or multiplying an integer by an immediate is fairly common especially in deref chain handling. This adds a helper for it and uses it in a few places. The advantage to the helper is that it automatically handles bit sizes for you. Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Karol Herbst <kherbst@redhat.com>	2018-11-15 19:59:27 -06:00
Jason Ekstrand	1f29f4db1e	nir/builder: Assert that intN_t immediates fit This assert won't catch all mistakes with this helper but it will at least ensure that the top bits are all zero or all one which should help catch bugs. Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>	2018-11-15 19:59:26 -06:00
Jason Ekstrand	4266932c0b	nir/lower_alu_to_scalar: Don't try to lower unpack_32_2x16 It messes up when trying to lower. Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>	2018-11-15 19:59:09 -06:00
Ian Romanick	425c133ab9	glsl: Refactor type checking for redeclarations Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-11-15 14:27:32 -08:00
Ian Romanick	61e003ce7e	glsl: Omit redundant qualifier checks on redeclarations Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-11-15 14:27:29 -08:00
Ian Romanick	9b9f3218db	glsl: prevent qualifiers modification of predeclared variables Section 3.7 (Identifiers) of the GLSL spec says: However, as noted in the specification, there are some cases where previously declared variables can be redeclared to change or add some property, and predeclared "gl_" names are allowed to be redeclared in a shader only for these specific purposes. More generally, it is an error to redeclare a variable, including those starting "gl_". This patch should fix piglit tests: clip-distance-redeclare-without-inout.frag clip-distance-redeclare-without-inout.vert However, this causes a regression in clip-distance-out-values.shader_test. A fix for that test has been sent to the piglit list for review: https://patchwork.freedesktop.org/patch/255201/ As far as I understand from the following mailing list thread: https://lists.freedesktop.org/archives/piglit/2013-October/007935.html it looks like we agreed to remove the ability to change qualifiers but have not done it yet (unless I missed something). v2 (idr): Move 'earlier->data.mode != var->data.mode' test much earlier in the function. Add special handling for gl_LastFragData. Signed-off-by: Andrii Simiklit <andrii.simiklit@globallogic.com> Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-11-15 14:27:26 -08:00
Eric Anholt	538bca78e2	v3d: Don't try to set PF flags on a LDTMU operation We need an ALU op in order to set PF. Fixes a recent assertion failure in dEQP-GLES3.functional.ubo.single_basic_type.shared.bool_vertex	2018-11-15 11:12:54 -08:00
Eric Anholt	03928dd682	v3d: Fix double-swapping of R/B on V3D 4.1 Fixes: `4018eb04e8` ("v3d: Use the TLB R/B swapping instead of recompiles when available.")	2018-11-15 11:12:54 -08:00
Eric Engestrom	2b2f790e59	egl: fix bad rebase I screwed up a rebase over a refactor and didn't notice locally because the uncommitted refactor hid the issue. Fixes: `c973364967` "egl: add missing glvnd entrypoint for EGL_ANDROID_blob_cache" Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>	2018-11-15 17:51:40 +00:00
Sagar Ghuge	6e60ff1ea9	intel/compiler: Disassemble GEN6_SFID_DATAPORT_SAMPLER_CACHE as dp_sampler Both BRW_SFID_SAMPLER and GEN6_SFID_DATAPORT_SAMPLER_CACHE are getting disassembled as "sampler", which is misleading for assembler tool. Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>	2018-11-15 09:36:55 -08:00
Eric Engestrom	c973364967	egl: add missing glvnd entrypoint for EGL_ANDROID_blob_cache Fixes dEQP-EGL.functional.get_proc_address.extension.egl_android_blob_cache on builds with glvnd enabled. Fixes: `6f5b57093b` "egl: add support for EGL_ANDROID_blob_cache" Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2018-11-15 16:27:27 +00:00
Eric Engestrom	2640854399	gbm: add new entrypoint to symbols check Fixes: `6328536ff2` "gbm: Introduce a helper function for printing GBM format names." Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2018-11-15 16:25:42 +00:00
Emil Velikov	adbdfc6666	bin/get-pick-list.sh: handle reverts prior to the branchpoint Currently we detect when a breaking commit: - has landed in stable, and - is referenced by an untagged fix in master Yet we did not consider the case of breaking commit: - prior to the branchpoint, and - is referenced by an untagged fix in master Addressing the latter is extremely slow, due to the size of the lookup. That said, we can trivially use the existing is_sha_nomination() helper to catch reverts. Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Juan A. Suarez <jasuarez@igalia.com> Signed-off-by: Emil Velikov <emil.velikov@collabora.com>	2018-11-15 16:15:15 +00:00
Emil Velikov	c0012a0708	bin/get-pick-list.sh: use test instead of [ ] Latter is rather picky wrt surrounding white space. The explicit `test` doesn't have that problem, plus the statements read a bit easier. Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Juan A. Suarez <jasuarez@igalia.com> Signed-off-by: Emil Velikov <emil.velikov@collabora.com>	2018-11-15 15:55:51 +00:00
Emil Velikov	77ff0bfb5f	bin/get-pick-list.sh: handle unofficial "broken by" tag We have a number of cases where devs will use a tag "broken by". While it's not something officially documented or recommended, checking for it is trivial enough. Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Juan A. Suarez <jasuarez@igalia.com> Signed-off-by: Emil Velikov <emil.velikov@collabora.com>	2018-11-15 15:55:47 +00:00
Emil Velikov	209525aafb	bin/get-pick-list.sh: handle fixes tag with missing colon Every so often, we forget to add the colon after "fixes". Trivially tweak the script to catch it. Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Juan A. Suarez <jasuarez@igalia.com> Signed-off-by: Emil Velikov <emil.velikov@collabora.com>	2018-11-15 15:55:44 +00:00
Emil Velikov	b7418d1f3f	bin/get-pick-list.sh: flesh out is_sha_nomination Refactor is_fixes_nomination into a is_sha_nomination helper. This way we can reuse it for more than the usual "Fixes:" tag. Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Juan A. Suarez <jasuarez@igalia.com> Signed-off-by: Emil Velikov <emil.velikov@collabora.com>	2018-11-15 15:55:40 +00:00
Emil Velikov	533fead423	bin/get-pick-list.sh: tweak the commit sha matching pattern Currently we match on: - any arbitrary length of, - any a-z A-Z and 0-9 characters At the same time, a commit sha consists of lowercase hexadecimal numbers. Any sha shorter than 8 characters is ambiguous - in some cases even 11+ are required. So change the pattern to a-f0-9 and adjust the length to 8-40. As we're here we could use a single grep, instead of the grep/sed combo. Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Juan A. Suarez <jasuarez@igalia.com> Signed-off-by: Emil Velikov <emil.velikov@collabora.com>	2018-11-15 15:55:36 +00:00
Emil Velikov	181203f3c5	bin/get-pick-list.sh: handle the fixes tag Having a separate script to handle the fixes tag, brings a number of issues, so let's fold it in get-pick-list.sh. v2: - pass the sha as argument to the function - Keep original sed pattern Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Juan A. Suarez <jasuarez@igalia.com> Signed-off-by: Emil Velikov <emil.velikov@collabora.com>	2018-11-15 15:55:31 +00:00
Emil Velikov	e6b3a3b201	bin/get-pick-list.sh: handle "typod" usecase. As the comment in get-typod-pick-list.sh says, there's little point in having a duplicate file. Add the new pattern + tag to get-pick-list.sh and nuke this file. v2: - pass the sha as argument to the function - grep -q instead of using a variable (Eric) Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Juan A. Suarez <jasuarez@igalia.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> Signed-off-by: Emil Velikov <emil.velikov@collabora.com>	2018-11-15 15:55:24 +00:00
Emil Velikov	fac10169bb	bin/get-pick-list.sh: prefix output with "[stable] " With later commits we'll fold all the different scripts into one. Add the explicit prefix, so that we know the origin of the nomination v2: - pass the sha as argument to the function - swap $tag = none for an else statement (Juan) - grep -q instead of using a variable (Eric) - print the tag and commit oneline separately (Eric) v3: - drop unused "tag=none" assignment (Juan) - typo nomination Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Juan A. Suarez <jasuarez@igalia.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> (v2) Signed-off-by: Emil Velikov <emil.velikov@collabora.com>	2018-11-15 15:54:48 +00:00
Emil Velikov	559c32d241	bin/get-pick-list.sh: simplify git oneline printing Currently we force disable the pager via "|cat" where --no-pager exists. Additionally we could use git show instead of git log -n1. Use those for a slightly more understandable code. Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Juan A. Suarez <jasuarez@igalia.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> Signed-off-by: Emil Velikov <emil.velikov@collabora.com>	2018-11-15 15:51:24 +00:00
Emil Velikov	7d9556681d	docs: document the staging branch and add reference to it A while back we agreed that having a live/staging branch is beneficial. Sadly we forgot to document that, so here is my first attempt. Document the caveat that the branch history is not stable. CC: Andres Gomez <agomez@igalia.com> CC: Dylan Baker <dylan@pnwbakers.com> Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>	2018-11-15 15:48:15 +00:00
Emil Velikov	4ae749acf1	docs/submittingpatches.html: correctly handle the <p> tag As pointed out by the w3c validator. Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>	2018-11-15 15:48:13 +00:00
Emil Velikov	19a081473f	docs/releasing.html: polish cherry-picking/testing text Reword slightly and highlight the important parts of the text. CC: Andres Gomez <agomez@igalia.com> CC: Dylan Baker <dylan@pnwbakers.com> Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>	2018-11-15 15:48:08 +00:00
Guido Günther	ab5653680e	etnaviv: Make sure rs alignment checks match etna_resource_alloc and etna_resource_from_handle currently use different checks. This leads to etna_resource_from_handle:492: target=2, format=PIPE_FORMAT_B8G8R8X8_UNORM, 1080x1920x1, array_size=1, last_level=0, nr_samples=0, usage=0, bind=8000a, flags=0 etna_resource_from_handle:541: BO stride 4320 is too small for RS engine width padding (4352, format PIPE_FORMAT_B8G8R8X8_UNORM) since etna_resource_from_handle wants to be aligned to a 16 byte boundary while the etna_resource_alloc does not. Adjust the two checks by using a common function. Broken by `baff59ebf0` Signed-off-by: Guido Günther <guido.gunther@puri.sm> Signed-off-by: Lucas Stach <l.stach@pengutronix.de>	2018-11-15 16:38:35 +01:00
Juan A. Suarez Romero	52368ef83a	docs: update calendar, add news item and link release notes for 18.2.5 Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>	2018-11-15 13:08:58 +00:00
Juan A. Suarez Romero	aa7a419b8b	docs: add sha256 checksums for 18.2.5 Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com> (cherry picked from commit 79be754f9a74a43b5748dc0934241e7701cb9581)	2018-11-15 13:06:12 +00:00
Juan A. Suarez Romero	e53ec08931	docs: add release notes for 18.2.5 Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com> (cherry picked from commit `f34bddc325`)	2018-11-15 13:06:10 +00:00
Marek Olšák	9367514524	radeonsi: fix video APIs on Raven2 This was missed when I added the new enum. Cc: 18.3 <mesa-stable@lists.freedesktop.org> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Leo Liu <leo.liu@amd.com>	2018-11-14 17:08:34 -05:00
Andrii Simiklit	e13dd70581	i965: avoid 'unused variable' warnings 1. brw_pipe_control.c:311:34: warning: unused variable ‘devinfo’ 2. brw_program_binary.c:209:19: warning: unused variable ‘gen_size’ 3. brw_program_binary.c:216:19: warning: unused variable ‘nir_size’ v2: Changes for unreproducible issues were removed Signed-off-by: Andrii Simiklit <andrii.simiklit@globallogic.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2018-11-14 14:41:58 +00:00
Andrii Simiklit	7aca650122	compiler: avoid 'unused variable' warnings 1. nir/nir_lower_vars_to_ssa.c:691:21: warning: unused variable ‘var’ nir_variable *var = path->path[0]->var; v2: Changes for some part of 'may be used uninitialized' warnings were removed, seems like it is a compiler issue. ( Eric Engestrom <eric.engestrom@intel.com> ) Possibly like this one: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=46684 This issue is flagged as duplicate but the original one is not closed yet. Signed-off-by: Andrii Simiklit <andrii.simiklit@globallogic.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2018-11-14 13:35:38 +00:00
Andrii Simiklit	69ee49ac46	intel/tools: avoid 'unused variable' warnings 1. tools/aub_read.c:271:31: warning: unused variable ‘end’ const uint32_t *p = data, *end = data + data_len, *next; 2. tools/aub_mem.c:292:13: warning: unused variable ‘res’ void *res = mmap((uint8_t *)bo.map + map_offset, 4096, PROT_READ, tools/aub_mem.c:357:13: warning: unused variable ‘res’ void *res = mmap((uint8_t *)bo.map + (page - bo.addr), 4096, PROT_READ, v2: The i965_disasm.c changes were moved into a separate patch The 'end' variable is declared separately with MAYBE_UNUSED to avoid affecting the other variables. ( Eric Engestrom <eric.engestrom@intel.com> ) Signed-off-by: Andrii Simiklit <andrii.simiklit@globallogic.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2018-11-14 13:35:28 +00:00
Thomas Hellstrom	25b48e3df9	st/xa: Bump minor Bump minor to signal support for new formats and higher precision solid pictures. Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com> Reviewed-by: Brian Paul <brianp@vmware.com>	2018-11-14 13:12:09 +01:00
Thomas Hellstrom	c9085f6d3b	st/xa: Support Component Alpha with trivial blending Support Component Alpha for those composite operations that do not require per-channel alpha blending. Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com> Reviewed-by: Brian Paul <brianp@vmware.com> Reviewed-by: Sinclair Yeh <syeh@vmware.com>	2018-11-14 13:12:09 +01:00
Thomas Hellstrom	0477d17f51	st/xa: Minor renderer cleanups constify function arguments to clean up the code a bit. Reported-by: Brian Paul <brianp@vmware.com> Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com> Reviewed-by: Brian Paul <brianp@vmware.com> Reviewed-by: Sinclair Yeh <syeh@vmware.com>	2018-11-14 13:12:09 +01:00
Thomas Hellstrom	56aa23b146	st/xa: Fix transformations when we have both source and mask samplers In the case when we had both source and mask samplers, transformations were typically not applied correctly. Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com> Reviewed-by: Brian Paul <brianp@vmware.com> Reviewed-by: Sinclair Yeh <syeh@vmware.com>	2018-11-14 13:12:09 +01:00
Thomas Hellstrom	e1298def9f	st/xa: Support a couple of new formats Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com> Reviewed-by: Brian Paul <brianp@vmware.com>	2018-11-14 13:12:09 +01:00
Thomas Hellstrom	258d20152a	st/xa: Support higher color precision for solid pictures The only solid fill picture type we supported only had 8 bit color channels. Add a new solid picture type that supports float channels. Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com> Reviewed-by: Brian Paul <brianp@vmware.com>	2018-11-14 13:11:51 +01:00
Thomas Hellstrom	d86ad38205	st/xa: Render update. Better support for solid pictures Remove unused and obsolete code for gradients and component-alpha Support solid source- and mask pictures using a variable number of samplers in the composite pipeline rather than the fixed number we used before. Tested using rendercheck for XA. Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com> Reviewed-by: Brian Paul <brianp@vmware.com>	2018-11-14 13:07:00 +01:00
Gert Wollny	4bba280937	nir: Allow to skip integer ops in nir_lower_to_source_mods Some hardware supports source mods only for float operations. Make it possible to skip lowering to source mods in these cases. v2: use option flags instead of a boolean (Jason Ekstrand) Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-11-14 08:59:26 +01:00
Karol Herbst	b4380cb070	nir/spirv: cast shift operand to u32 v2: fix for specialization constants as well Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Cc: mesa-stable@lists.freedesktop.org Signed-off-by: Karol Herbst <kherbst@redhat.com>	2018-11-14 02:09:11 +01:00
Karol Herbst	099728b115	nir: replace nir_load_system_value calls with appropriate builder functions This helps reduce the overall code changes when a bit_size parameter is added to nir_load_system_value Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Eric Anholt <eric@anholt.net> Signed-off-by: Karol Herbst <kherbst@redhat.com>	2018-11-14 02:09:11 +01:00
Karol Herbst	80db331c2d	nir: add const_index parameters to system value builder function this allows to replace some nir_load_system_value calls with the specific system value constructor Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Eric Anholt <eric@anholt.net> Signed-off-by: Karol Herbst <kherbst@redhat.com>	2018-11-14 02:09:11 +01:00
Timothy Arceri	95b513c937	radv: make use of nir_move_out_const_to_consumer() vkpipeline-db results: Totals from affected shaders: SGPRS: 28400 -> 28576 (0.62 %) VGPRS: 27916 -> 27692 (-0.80 %) Spilled SGPRs: 140 -> 138 (-1.43 %) Spilled VGPRs: 0 -> 0 (0.00 %) Private memory VGPRs: 0 -> 0 (0.00 %) Scratch size: 0 -> 0 (0.00 %) dwords per thread Code Size: 1534456 -> 1520560 (-0.91 %) bytes LDS: 0 -> 0 (0.00 %) blocks Max Waves: 3541 -> 3582 (1.16 %) Wait states: 0 -> 0 (0.00 %) Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2018-11-14 09:41:50 +11:00
Lionel Landwerlin	ea53f76d7b	anv: move helper function internally It's only used in anv_image.c Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-11-13 18:56:31 +00:00
Lionel Landwerlin	8b00d3d6eb	anv: use image aspects rather than computed ones This shouldn't make any difference but I feel uneasy to use the expanded aspects that do not represent the image in its entirety. If we ever change the implementation of the anv_image_aspect_to_plane() helper, this is safer. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-11-13 18:56:27 +00:00
Lionel Landwerlin	465de47bad	anv: associate vulkan formats with aspects This will make it easier to associate an aspect with a plane number. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-11-13 18:56:24 +00:00
Lionel Landwerlin	fe3b7fe982	anv/lower_ycbcr: make sure to set 0s on all components To play around with debugging, we might want to disable one or the other component. Having 0s as default values makes this work. Otherwise we might have NULL components, leading to crashes. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-11-13 18:56:21 +00:00
Lionel Landwerlin	ee8d65c25a	anv/image: remove unused parameter Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-11-13 18:56:13 +00:00
Lionel Landwerlin	352e297091	anv: simplify internal address offset Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-11-13 18:56:10 +00:00
Eric Engestrom	4fa2fb3524	meson: fix wayland-less builds Those empty variables in the !wayland case are useless and running that meson.build with them breaks the build: [287/850] Generating wayland-drm-client-protocol.h with a custom command. FAILED: src/egl/wayland/wayland-drm/wayland-drm-client-protocol.h client-header ../src/egl/wayland/wayland-drm/wayland-drm.xml src/egl/wayland/wayland-drm/wayland-drm-client-protocol.h /bin/sh: client-header: command not found ninja: build stopped: subcommand failed. Fixes: `d1992255bb` "meson: Add build Intel "anv" vulkan driver" Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2018-11-13 17:25:02 +00:00
Eric Engestrom	7df80de6e6	gbm: remove unnecessary meson include `inc_wayland_drm` is only used if wayland is built, and it's already added in that case a few lines below. Fixes: `a29869e872` "gbm: Don't traverse backwards for includes" Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2018-11-13 17:25:02 +00:00
Eric Engestrom	3832db275e	meson: only run vulkan's meson.build when building vulkan Fixes: `d1992255bb` "meson: Add build Intel "anv" vulkan driver" Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2018-11-13 17:25:02 +00:00
Eric Engestrom	4f1ae271e1	xmlpool: update translation po files These files are close to 4 years out of date; a lot's changed since. Let's just check in a recently-regenerated version. Changes generated by running `ninja xmlpool-{pot,update-po,gmo}`. Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com> Acked-by: Emil Velikov <emil.velikov@collabora.com>	2018-11-13 17:25:02 +00:00
Eric Engestrom	1e918e5bef	REVIEWERS: add Vulkan reviewer group Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Acked-by: Emil Velikov <emil.l.velikov@gmail.com>	2018-11-13 17:25:02 +00:00
Eric Engestrom	59b3335496	REVIEWERS: add Emil as EGL reviewer Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Acked-by: Emil Velikov <emil.l.velikov@gmail.com>	2018-11-13 17:25:02 +00:00
Eric Engestrom	923aca84b2	REVIEWERS: add include path for EGL Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Acked-by: Emil Velikov <emil.l.velikov@gmail.com>	2018-11-13 17:25:02 +00:00
Toni Lönnberg	2af4e3345f	intel/genxml: Add engine definition to render engine instructions (gen11) Instructions meant for the render engine now have a definition specifying that so that we can differentiate instructions meant for different engines due to shared opcodes. v2: Divided into individual patches for each gen v3: Added additional engine definitions. v4: Added missing engine definition to MI_TOPOLOGY_FILTER. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-11-13 15:10:12 +00:00
Toni Lönnberg	1921982d3e	intel/genxml: Add engine definition to render engine instructions (gen10) Instructions meant for the render engine now have a definition specifying that so that we can differentiate instructions meant for different engines due to shared opcodes. v2: Divided into individual patches for each gen v3: Added additional engine definitions. v4: Added missing engine definition to MI_TOPOLOGY_FILTER. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-11-13 15:10:12 +00:00
Toni Lönnberg	030fe0f981	intel/genxml: Add engine definition to render engine instructions (gen9) Instructions meant for the render engine now have a definition specifying that so that we can differentiate instructions meant for different engines due to shared opcodes. v2: Divided into individual patches for each gen v3: Added additional engine definitions. v4: Added more missing engine definitions. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-11-13 15:10:12 +00:00
Toni Lönnberg	12e34fc7ba	intel/genxml: Add engine definition to render engine instructions (gen8) Instructions meant for the render engine now have a definition specifying that so that we can differentiate instructions meant for different engines due to shared opcodes. v2: Divided into individual patches for each gen v3: Added additional engine definitions. v4: Added missing engine tag for MI_TOPOLOGY_FILTER and MI_LOAD_URB_MEM. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-11-13 15:10:12 +00:00
Toni Lönnberg	a883fd2277	intel/genxml: Add engine definition to render engine instructions (gen75) Instructions meant for the render engine now have a definition specifying that so that we can differentiate instructions meant for different engines due to shared opcodes. v2: Divided into individual patches for each gen v3: Added additional engine definitions. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-11-13 15:10:12 +00:00
Toni Lönnberg	27cf6252d3	intel/genxml: Add engine definition to render engine instructions (gen7) Instructions meant for the render engine now have a definition specifying that so that we can differentiate instructions meant for different engines due to shared opcodes. v2: Divided into individual patches for each gen v3: Added additional engine definitions. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-11-13 15:10:12 +00:00
Toni Lönnberg	ecf62a967e	intel/genxml: Add engine definition to render engine instructions (gen6) Instructions meant for the render engine now have a definition specifying that so that we can differentiate instructions meant for different engines due to shared opcodes. v2: Divided into individual patches for each gen v3: Added additional engine definitions v4: Added missing engine to MEDIA_GATEWAY_STATE Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-11-13 15:10:12 +00:00
Toni Lönnberg	571d6447d8	intel/genxml: Add engine definition to render engine instructions (gen5) Instructions meant for the render engine now have a definition specifying that so that we can differentiate instructions meant for different engines due to shared opcodes. v2: Divided into individual patches for each gen v3: Added additional engine definitions. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-11-13 15:10:12 +00:00
Toni Lönnberg	6463ceca69	intel/genxml: Add engine definition to render engine instructions (gen45) Instructions meant for the render engine now have a definition specifying that so that we can differentiate instructions meant for different engines due to shared opcodes. v2: Divided into individual patches for each gen v3: Added additional engine definitions. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-11-13 15:10:12 +00:00
Toni Lönnberg	a4ca710c96	intel/genxml: Add engine definition to render engine instructions (gen4) Instructions meant for the render engine now have a definition specifying that so that we can differentiate instructions meant for different engines due to shared opcodes. v2: Divided into individual patches for each gen v3: Added additional engine definitions. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-11-13 15:10:12 +00:00
Toni Lönnberg	102dadec81	intel/decoder: tools: Use engine for decoding batch instructions The engine to which the batch was sent is now set in the decoder context when decoding the batch. This is needed so that we can distinguish between instructions as the render and video pipe share some of the instruction opcodes. v2: The engine is now in the decoder context and the batch decoder uses a local function for finding the instruction for an engine. v3: Spec uses engine_mask now instead of engine, replaced engine class enums with the definitions from UAPI. v4: Fix up aubinator_viewer (Lionel) Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-11-13 15:10:12 +00:00
Toni Lönnberg	a6aab7e436	intel/decoder: tools: gen_engine to drm_i915_gem_engine_class Removed the gen_engine enum and changed the involved functions to use the drm_i915_gem_engine_class enum from UAPI instead. v3: Wrong engine was being used for blocks in video ring v4: Fixed aubinator_viewer.cpp Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-11-13 15:10:12 +00:00
Toni Lönnberg	b00bccd012	intel/decoder: Engine parameter for instructions Preliminary work for adding handling of different pipes to gen_decoder. Each instruction needs to have a definition describing which engine it is meant for. If left undefined, by default, the instruction is defined for all engines. v2: Changed to use the engine class definitions from UAPI v3: Changed I915_ENGINE_CLASS_TO_MASK to use BITSET_BIT, change engine to engine_mask, added check for incorrect engine and added the possibility to define an instruction to multiple engines using the "|" as a delimiter in the engine attribute. v4: Fixed the memory leak. v5: Removed an unnecessary ralloc_free(). Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-11-13 15:10:12 +00:00
Gert Wollny	8d4bb6e5cd	virgl: Add command and flags to initiate debugging on the host (v2) On the host VREND_DEBUG=guestallow must be set to let the guest override the debug flags. v2: Send flag string instead of flags, this avoids the need to keep the flags in sync. v3: Only request host logging if the host actually understands the command Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>	2018-11-13 14:42:22 +01:00
Gert Wollny	caa964b422	mesa: Reference count shaders that are used by transform feedback objects Transform feedback objects may hold a pointer to a shader program, and at least in Gallium, this must be a valid pointer until ctx->Driver.EndTransformFeedback in glEndTransformFeedback has been called - which conforms to the spec that any program that is part of a current rendering state should only be flagged for deletion by glDeleteProgram. This was not handled properly for the transform feedback objects so that a call sequence glUseProgram(x) glBeginTransformFeedback(...) glPauseTransformFeedback(...) glDeleteProgram(x) glEndTransformFeedback(...) would result in a use after free bug. With this patch the transform feedback object also updates the reference count to the used program thereby keeping the program valid as long as the transform feedback object links to it. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108713 Fixes: `654587696b` mesa: add end_transform_feedback() helper Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2018-11-13 10:57:25 +01:00
Samuel Pitoiset	90d68858ed	radv: set optimal OVERWRITE_COMBINER_WATERMARK on GFX9 Ported from RadeonSI. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-11-13 10:24:36 +01:00
Samuel Pitoiset	f70c5d31cd	radv: set PA.SC_CONSERVATIVE_RASTERIZATION.NULL_SQUAD_AA_MASK_ENABLE Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-11-13 10:24:33 +01:00
Samuel Pitoiset	b5f213bb1d	radv: binding streamout buffers doesn't change context regs Cc: 18.3 <mesa-stable@lists.freedesktop.org> Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-11-13 10:24:31 +01:00
Plamena Manolova	c5f3013cba	nir: Don't lower the local work group size if it's variable. If the local work group size is variable it won't be available at compile time so we can't lower it in nir_lower_system_values(). Signed-off-by: Plamena Manolova <plamena.n.manolova@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Karol Herbst <kherbst@redhat.com>	2018-11-13 10:57:04 +02:00
Matt Turner	efb1ccadca	util/ralloc: Make sizeof(linear_header) a multiple of 8 Prior to this patch sizeof(linear_header) was 20 bytes in a non-debug build on 32-bit platforms. We do some pointer arithmetic to calculate the next available location with ptr = (linear_size_chunk *)((char *)&latest[1] + latest->offset); in linear_alloc_child(). The &latest[1] adds 20 bytes, so an allocation would only be 4-byte aligned. On 32-bit SPARC a 'sttw' instruction (which stores a consecutive pair of 4-byte registers to memory) requires an 8-byte aligned address. Such an instruction is used to store to an 8-byte integer type, like intmax_t which is used in glcpp's expression_value_t struct. As a result of the 4-byte alignment returned by linear_alloc_child() we would generate a SIGBUS (unaligned exception) on SPARC. According to the GNU libc manual malloc() always returns memory that has at least an alignment of 8-bytes [1]. I think our allocator should do the same. So, simple fix with two parts: (1) Increase SUBALLOC_ALIGNMENT to 8 unconditionally. (2) Mark linear_header with an aligned attribute, which will cause its sizeof to be rounded up to that alignment. (We already do this for ralloc_header) With this done, all Mesa's unit tests now pass on SPARC. [1] https://www.gnu.org/software/libc/manual/html_node/Aligned-Memory-Blocks.html Fixes: `47e1758692` ("glcpp: use the linear allocator for most objects") Bug: https://bugs.gentoo.org/636326 Reviewed-by: Eric Anholt <eric@anholt.net>	2018-11-12 20:54:49 -08:00
Matt Turner	7e3748c268	util/ralloc: Switch from DEBUG to NDEBUG The debug code is all asserts, so protect it with the same thing that controls assert. Reviewed-by: Eric Anholt <eric@anholt.net>	2018-11-12 20:54:49 -08:00
Timothy Arceri	34dffcf913	nir: add support for removing redundant stores to copy prop var For example the following type of thing is seen in TCS from a number of Vulkan and DXVK games: vec1 32 ssa_557 = deref_var &oPatch (shader_out float) vec1 32 ssa_558 = intrinsic load_deref (ssa_557) () vec1 32 ssa_559 = deref_var &oPatch@42 (shader_out float) vec1 32 ssa_560 = intrinsic load_deref (ssa_559) () vec1 32 ssa_561 = deref_var &oPatch@43 (shader_out float) vec1 32 ssa_562 = intrinsic load_deref (ssa_561) () intrinsic store_deref (ssa_557, ssa_558) (1) /* wrmask=x */ intrinsic store_deref (ssa_559, ssa_560) (1) /* wrmask=x */ intrinsic store_deref (ssa_561, ssa_562) (1) /* wrmask=x */ No shader-db changes on i965 (SKL). vkpipeline-db results RADV (VEGA): Totals from affected shaders: SGPRS: 7832 -> 7728 (-1.33 %) VGPRS: 6476 -> 6740 (4.08 %) Spilled SGPRs: 0 -> 0 (0.00 %) Spilled VGPRs: 0 -> 0 (0.00 %) Private memory VGPRs: 0 -> 0 (0.00 %) Scratch size: 0 -> 0 (0.00 %) dwords per thread Code Size: 469572 -> 456596 (-2.76 %) bytes LDS: 0 -> 0 (0.00 %) blocks Max Waves: 989 -> 960 (-2.93 %) Wait states: 0 -> 0 (0.00 %) The Max Waves and VGPRS changes here are misleading. What is happening is a bunch of TCS outputs are being optimised away as they are now recognised as unused. This results in more varyings being compacted via nir_compact_varyings() which can result in more register pressure when they are not packed in an optimal way. This is an existing problem independent of this patch. I've run some benchmarks and haven't noticed any performance regressions in affected games. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-11-13 15:19:36 +11:00
Timothy Arceri	3561108de0	anv/i965: make use of nir_link_constant_varyings() shader-db results for SKL: total instructions in shared programs: 13106498 -> 13091573 (-0.11%) instructions in affected programs: 1186244 -> 1171319 (-1.26%) helped: 6186 HURT: 0 total cycles in shared programs: 332062633 -> 331961653 (-0.03%) cycles in affected programs: 8537165 -> 8436185 (-1.18%) helped: 5371 HURT: 862 LOST: 6 GAINED: 14 Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-11-13 14:06:32 +11:00
Eric Anholt	621b0fa892	egl: Improve the debugging of gbm format matching in DRI configs. Previously the debug would be: libEGL debug: No DRI config supports native format 0x20203852 libEGL debug: No DRI config supports native format 0x38385247 but libEGL debug: No DRI config supports native format R8 libEGL debug: No DRI config supports native format GR88 is a lot easier to understand. Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Daniel Stone <daniels@collabora.com>	2018-11-12 15:20:23 -08:00
Eric Anholt	6328536ff2	gbm: Introduce a helper function for printing GBM format names. This requires that the caller make a little (stack) allocation to store the string. v2: Use gbm_format_canonicalize (suggested by Daniel) Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Daniel Stone <daniels@collabora.com>	2018-11-12 15:20:23 -08:00
Eric Anholt	ee7f848c00	gbm: Move gbm_format_canonicalize() to the core. I want it for the format name debugging code. Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Daniel Stone <daniels@collabora.com>	2018-11-12 15:20:23 -08:00
Dylan Baker	4eab98b66e	meson: fix libatomic tests There are two problems: 1) the extra underscore in MISSING_64BIT_ATOMICS 2) we should link with libatomic if the previous test decided we needed it Fixes: `d1992255bb` ("meson: Add build Intel "anv" vulkan driver") Reviewed-and-Tested-by: Matt Turner <mattst88@gmail.com>	2018-11-12 13:29:00 -08:00
Marek Olšák	32a334777c	mesa: mark GL_SR8_EXT non-renderable on GLES Fixes: dEQP-GLES3.functional.fbo.completeness.renderable.texture.color0.sr8_ext Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>	2018-11-12 16:19:43 -05:00
Marek Olšák	e0c7114eb3	st/mesa: disable L3 thread pinning This implementation can have massive drawbacks. Cc: 18.3 <mesa-stable@lists.freedesktop.org> Reviewed-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>	2018-11-12 16:18:15 -05:00
Christian Gmeiner	c6aaafa3a1	nir: add lowering for ffloor Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-11-12 21:57:25 +01:00
Alyssa Rosenzweig	41c8f99137	util: Fix warning in u_cpu_detect on non-x86 regs is only set and used on x86; on other platforms (like ARM), this code causes a trivial warning, solved by moving the regs declaration to the architecture-dependent usage. Reviewed-by: Matt Turner <mattst88@gmail.com> Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>	2018-11-12 10:28:04 -08:00
Dylan Baker	9c2a95b298	meson: Don't set -Wall meson does this for you with its warn levels, so we don't need to set it ourselves. Fixes: `d1992255bb` ("meson: Add build Intel "anv" vulkan driver") Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2018-11-12 08:55:55 -08:00
Rob Clark	4a0c2cfdd6	freedreno/drm: fix unused 'entry' warnings Looks like importing libdrm_freedreno into mesa crossed paths with `e27902a261`. Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-11-12 10:45:48 -05:00
Lionel Landwerlin	89785e2d56	i965: add support for sampling from AYUV Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2018-11-12 13:22:54 +00:00
Lionel Landwerlin	252ca7b43f	dri: add AYUV format v2: Add an AYUV entry in the android backend (Tapani) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2018-11-12 13:22:54 +00:00
Lionel Landwerlin	8a15f06d19	nir/lower_tex: Add AYUV lowering support Byte ordering is : 0: V 1: U 2: Y 3: A v2: Split refactoring of alpha channel (Lionel) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com> (v1) Acked-by: Eric Engestrom <eric.engestrom@intel.com> (v2)	2018-11-12 13:22:54 +00:00
Lionel Landwerlin	0a30c33e83	nir/lower_tex: add alpha channel parameter for yuv lowering We're about to introduce AYUV support which provides its own alpha channel. So give alpha as a parameter and set it to 1 on existing formats. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2018-11-12 13:22:54 +00:00
Samuel Pitoiset	97fb1a02fd	radv: make use of num_good_cu_per_sh in si_emit_graphics() too Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-11-12 09:35:46 +01:00
Samuel Pitoiset	d9d14346c2	radv: clean up setting partial_es_wave for distributed tess on VI Only needed when the pipeline actually uses tessellation. I don't think that changes anything, except improving readability. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-11-12 09:35:44 +01:00
Samuel Pitoiset	cc4569b733	radv: cleanup and document a Hawaii bug with offchip buffers Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-11-12 09:35:42 +01:00
Hanno Böck	8dc2085baf	glsl/test: Fix use after free in test_optpass. The variable state is free'd and afterwards state->error is used as the return value, resulting in a use after free bug detected by memory safety tools like address sanitizer. Signed-off-by: Hanno Böck <hanno@hboeck.de> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108636 Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2018-11-12 07:42:58 +02:00
Timothy Arceri	a068958692	nir: don't pack varyings ints with floats unless flat Fixes: `1c9c42d16b` ("nir: add varying component packing helpers") Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-11-12 15:38:56 +11:00
Timothy Arceri	9dd737bb02	nir: add glsl_type_is_integer() helper Fixes: `1c9c42d16b` ("nir: add varying component packing helpers") Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-11-12 15:38:56 +11:00
Francisco Jerez	552642066f	intel/fs: Prevent emission of IR instructions not aligned to their own execution size. This can occur during payload setup of SIMD-split send message instructions, which can lead to the emission of header setup instructions with a non-zero channel group and fixed SIMD width. Such instructions could end up using undefined channel enable signals except they don't care since they're always marked force_writemask_all. Not known to affect correctness of any workload at this point, but it would be trivial to back-port to stable if something comes up. Reported-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com> Tested-by: Sagar Ghuge <sagar.ghuge@intel.com>	2018-11-09 19:39:22 -08:00
Timothy Arceri	590fcb50e7	st/mesa: make use of nir_link_constant_varyings() Shader-db results radeonsi (VEGA): Totals from affected shaders: SGPRS: 161464 -> 161368 (-0.06 %) VGPRS: 86904 -> 86292 (-0.70 %) Spilled SGPRs: 296 -> 314 (6.08 %) Spilled VGPRs: 0 -> 0 (0.00 %) Private memory VGPRs: 0 -> 0 (0.00 %) Scratch size: 0 -> 0 (0.00 %) dwords per thread Code Size: 3618596 -> 3573852 (-1.24 %) bytes LDS: 0 -> 0 (0.00 %) blocks Max Waves: 26189 -> 26276 (0.33 %) Wait states: 0 -> 0 (0.00 %) Reviewed-by: Eric Anholt <eric@anholt.net>	2018-11-10 11:41:00 +11:00
Timothy Arceri	d40dd05553	nir: add new linking opt nir_link_constant_varyings() This pass moves constant outputs to the consuming shader stage where possible. Reviewed-by: Eric Anholt <eric@anholt.net>	2018-11-10 11:41:00 +11:00
Andre Heider	414470854d	st/nine: clean up thread shutdown sequence a bit Just break out of the loop instead, it does the same thing. Signed-off-by: Andre Heider <a.heider@gmail.com> Reviewed-by: Axel Davy <davyaxel0@gmail.com>	2018-11-09 22:37:27 +01:00
Andre Heider	123bf9cbe7	st/nine: plug thread related leaks Signed-off-by: Andre Heider <a.heider@gmail.com> Reviewed-by: Axel Davy <davyaxel0@gmail.com>	2018-11-09 22:37:27 +01:00
Andre Heider	10598c9667	st/nine: fix stack corruption due to ABI mismatch This fixes various crashes and hangs when using nine's 'thread_submit' feature. On 64bit, the thread function's data argument would just be NULL. On 32bit, the data argument would be garbage depending on the compiler flags (in my case -march>=core2). Fixes: `f3fa7e3068` ("st/nine: Use WINE thread for threadpool") Cc: mesa-stable@lists.freedesktop.org Signed-off-by: Andre Heider <a.heider@gmail.com> Reviewed-by: Axel Davy <davyaxel0@gmail.com>	2018-11-09 22:37:26 +01:00
Marek Olšák	d2b2364313	radeonsi: stop command submission with PIPE_CONTEXT_LOSE_CONTEXT_ON_RESET only Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>	2018-11-09 14:55:04 -05:00
Marek Olšák	4bec5025ac	gallium: add PIPE_CONTEXT_LOSE_CONTEXT_ON_RESET Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>	2018-11-09 14:55:04 -05:00
Marek Olšák	9dc776f3f2	radeonsi: don't set the CB clear color registers for 0/1 clear colors on Raven2 and add has_dcc_constant_encode.	2018-11-09 14:55:04 -05:00
Marek Olšák	832ab883e2	radeonsi: use better DCC clear codes Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>	2018-11-09 14:55:04 -05:00
Marek Olšák	d059eae269	ac/surface: remove the overallocation workaround for Vega12 not needed anymore (probably since the tile_swizzle fix) Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2018-11-09 14:55:04 -05:00
Lionel Landwerlin	959e2a5aeb	intel/aub_read: remove useless breaks Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2018-11-09 18:17:30 +00:00
Erik Faye-Lund	b55af392d9	Revert "mesa: expose NV_conditional_render on GLES" This reverts commit `5213be9fab`.	2018-11-09 17:39:25 +01:00
Erik Faye-Lund	cf8b271cbe	Revert "mesa/main: fixup make check after NV_conditional_render for gles" This reverts commit `cccd7a253f`.	2018-11-09 17:39:22 +01:00
Erik Faye-Lund	cccd7a253f	mesa/main: fixup make check after NV_conditional_render for gles It seems I missed some details when exposing NV_conditional_render on GLES; this fixes up "make check". Fixes: `5213be9fab` ("mesa: expose NV_conditional_render on GLES") Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-and-Tested-by: Eric Engestrom <eric.engestrom@intel.com>	2018-11-09 16:47:34 +01:00
Nicolai Hähnle	8c97abc066	radv: include LLVM IR in the VK_AMD_shader_info "disassembly" Helpful for debugging compiler backend problems: this allows us to easily retrieve the LLVM IR from RenderDoc. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2018-11-09 14:54:37 +01:00
Erik Faye-Lund	5213be9fab	mesa: expose NV_conditional_render on GLES The extension spec has been updated to include GLES 2 support, so let's enable it there. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-11-09 13:03:00 +01:00
Iago Toral Quiroga	35baee5dce	nir/constant_folding: fix incorrect bit-size check nir_alu_type_get_type_size takes a type as parameter and we were passing a bit-size instead, which did what we wanted by accident, since a bit-size of zero matches nir_type_invalid, which has a size of 0 too. Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>	2018-11-09 08:22:15 +01:00
Iago Toral Quiroga	6c418dfa42	intel/compiler: fix node interference of simd16 instructions SIMD16 instructions need to have additional interferences to prevent source / destination hazards when the source and destination registers are off by one register. While we already have code to handle this, it was only running for SIMD16 dispatches, however, we can have SIMD16 instructions in a SIMD8 dispatch. An example of this are pull constant loads since commit `b56fa830c6`, but there are more cases. This fixes a number of CTS test failures found in work-in-progress tests that were hitting this situation for 16-wide pull constants in a SIMD8 program. Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>	2018-11-09 08:22:08 +01:00
Roland Scheidegger	a3c898dc97	gallivm: fix improper clamping of vertex index when fetching gs inputs Because we only have one file_max for the (2d) gs input file, the value actually represents the max of attrib and vertex index (although I'm not entirely sure if we really want the max, since the max valid value of the vertex dimension can be easily deduced from the input primitive). Thus in cases where the number of inputs is higher than the number of vertices per prim, we did not properly clamp the vertex index, which would result in out-of-bound fetches, potentially causing segfaults (the segfaults seemed actually difficult to trigger, but valgrind certainly wasn't happy). This might have happened even if the shader did not actually try to fetch bogus vertices, if the fetching happened in non-active conditional clauses. To fix simply use the correct max vertex index value (derived from the input prim type) instead when clamping for this case. Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2018-11-09 00:53:03 +01:00
Aditya Swarup	a5c39ed974	i965: Lift restriction in external textures for EGLImage support Fixes Skqp's unitTest_EGLImageTest test. For Intel platforms, we support external textures only for EGLImages created with EGL_EXT_image_dma_buf_import. This restriction seems to be Intel specific and not present for other platforms. While running SKQP test - unitTest_EGLImageTest, GL_INVALID is sent to the test because of this restriction. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105301 Signed-off-by: Aditya Swarup <aditya.swarup@intel.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Chad Versace <chadversary@chromium.org>	2018-11-08 12:33:06 -08:00
Ian Romanick	c5a4c26450	glsl: Add pragma to disable all warnings Use #pragma warning(off) and #pragma warning(on) to disable or enable all warnings. This is a big hammer. If we ever need a smaller hammer, we can enhance this functionality. There is one lame thing about this. Because we parse everything, create an AST, then convert the AST to GLSL IR, we have to treat the #pragma like a statement. This means that you can't do something like ' void ' #pragma warning(off) ' __foo ' #pragma warning(on) ' (float param0); Fixing that would, as far as I can tell, require a huge amount of work. I did try just handling the #pragma during parsing (like we do for state for the whole shader. v2: Fix the #pragma lines in the commit message that git-commit ate. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2018-11-08 11:00:00 -08:00
Ian Romanick	011abfc963	glsl: Add warning tests for identifiers with __ Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2018-11-08 10:59:53 -08:00
Jason Ekstrand	d28bc35ece	intel/fs: Add an assert to optimize_frontfacing_ternary Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-11-08 10:09:25 -06:00
Jason Ekstrand	bcc6aab065	anv: Use nir_src_is_const and friends in lowering code Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-11-08 10:09:25 -06:00
Jason Ekstrand	52145070c0	intel/analyze_ubo_ranges: Use nir_src_is_const and friends Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-11-08 10:09:25 -06:00
Jason Ekstrand	1413512b4c	intel/vec4: Use the new nir_src_is_const and friends As of this commit, all uses of const sources either go through a nir_src_as_<type> helper which handles bit sizes correctly or else are accompanied by a nir_src_bit_size() == 32 assertion to assert that we have the size we think we have. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-11-08 10:09:25 -06:00
Jason Ekstrand	61e15348c4	nir: Add a read_mask helper for ALU instructions Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-11-08 10:09:22 -06:00
Jason Ekstrand	344cfe6980	intel/fs: Use the new nir_src_is_const and friends As of this commit, all uses of const sources either go through a nir_src_as_<type> helper which handles bit sizes correctly or else are accompanied by a nir_src_bit_size() == 32 assertion to assert that we have the size we think we have. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-11-08 10:09:20 -06:00
Jason Ekstrand	6b2918709a	intel/fs,vec4: Clean up a repeated pattern with SSBOs Everywhere we handle SSBO intrinsics, we have exactly the same pattern for computing the index so we may as well make a helper for it. We also add a get_nir_src_imm to vec4 and use it for SSBO offsets. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-11-08 10:09:06 -06:00
Samuel Pitoiset	c472ad82e4	radv: fix GPU hangs when loading depth/stencil clear values on SI/CIK HTILE is supported on these chips, not sure how I missed that. This restores using PFP_SYNC_ME when LOAD_CONTEXT_REG is not used. Fixes: `f425d9ee74` ("radv: use LOAD_CONTEXT_REG when loading fast clear values") Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2018-11-08 11:20:03 +01:00
Samuel Pitoiset	f425d9ee74	radv: use LOAD_CONTEXT_REG when loading fast clear values This avoids syncing the Micro Engine. This is only supported for VI+ currently. There is probably a way for using LOAD_CONTEXT_REG on previous chips but that could be done later. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Dave Airlie <airlied@redhat.com>	2018-11-08 10:41:45 +01:00
Samuel Pitoiset	0dcd99c687	radv: only expose VK_SUBGROUP_FEATURE_ARITHMETIC_BIT for VI+ Inclusive and exclusive scans are missing because older chips don't have llvm.amdgcn.update.dpp. This fixes crashes with dEQP-VK.subgroups.arithmetic.*. CC: mesa-stable@lists.freedesktop.org Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Dave Airlie <airlied@redhat.com>	2018-11-08 10:41:41 +01:00
Adam Jackson	16f1023037	glx: Demand success from CreateContext requests (v2) GLXCreate{,New}Context, like most X resource creation requests, does not emit a reply and therefore is emitted into the X stream asynchronously. However, unlike most resource creation requests, the GLXContext we return is a handle to library state instead of an XID. So if context creation fails for any reason - say, the server doesn't support indirect contexts - then we will fail in strange places for strange reasons. We could make every GLX entrypoint robust against half-created contexts, or we could just verify that context creation worked. Reuse the __glXIsDirect code to do this, as a cheap way of verifying that the XID is real. glXCreateContextAttribsARB solves this by using the _checked version of the xcb command, so effectively this change makes the classic context creation paths as robust as CreateContextAttribs. v2: Better use of Bool, check that error != NULL first (Olivier Fourdan) Signed-off-by: Adam Jackson <ajax@redhat.com> Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2018-11-07 12:38:05 -05:00
Karol Herbst	f7fae7f64e	gm107/ir: fix compile time warning in getTEXSMask In function 'uint8_t nv50_ir::getTEXSMask(uint8_t)': warning: control reaches end of non-void function [-Wreturn-type] Reported-by: Moiman@freenode Fixes: `f821e80213` "gm107/ir: use scalar tex instructions where possible" Signed-off-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>	2018-11-07 17:48:58 +01:00
Michel Dänzer	32b0eb51a3	winsys/amdgpu: Stop using amdgpu_bo_handle_type_kms_noimport It only behaves any different from amdgpu_bo_handle_type_kms with libdrm 2.4.93, and it breaks if an older version is picked up. Bugzilla: https://bugs.freedesktop.org/108096 Reviewed-by: Marek Olšák <marek.olsak@amd.com> Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>	2018-11-07 17:37:47 +01:00
Lionel Landwerlin	792dde66f2	intel/dump_gpu: add platform option Got tired of remembering the PCI ids. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2018-11-07 11:27:41 +00:00
Lionel Landwerlin	e262cc0353	intel/dump_gpu: move output option together Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2018-11-07 11:27:38 +00:00
Samuel Pitoiset	0a0aa2ba6c	radv: disable conditional rendering for vkCmdCopyQueryPoolResults() VK_EXT_conditional_rendering says that copy commands should not be affected by conditional rendering. Cc: 18.2 18.3 <mesa-stable@lists.freedesktop.org> Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Dave Airlie <airlied@redhat.com>	2018-11-07 11:31:36 +01:00
Samuel Pitoiset	1e7c3379e1	radv: allocate enough space in CS when copying query results with compute Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Dave Airlie <airlied@redhat.com>	2018-11-07 11:31:34 +01:00
Timothy Arceri	9aa3c1915e	ac/nir_to_llvm: fix b2f for f64 Fixes: `d7e0d47b9d` ("nir: Add a bunch of b2[if] optimizations") Reviewed-by: Dave Airlie <airlied@redhat.com>	2018-11-07 16:35:07 +11:00
Karol Herbst	f821e80213	gm107/ir: use scalar tex instructions where possible TEXS, TLD4 and TLD4S are variants of tex instructions which are more scalar, which gives RA more freedom and is less likely to insert silly MOVs to satisfy quad registers. shader-db changes: total instructions in shared programs : 7687265 -> 7614782 (-0.94%) total gprs used in shared programs : 803620 -> 798045 (-0.69%) total shared used in shared programs : 639636 -> 639636 (0.00%) total local used in shared programs : 24648 -> 24648 (0.00%) total bytes used in shared programs : 82103400 -> 81330696 (-0.94%) local shared gpr inst bytes helped 0 0 3648 10647 10647 hurt 0 0 464 205 205 Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>	2018-11-06 19:57:05 +01:00
Karol Herbst	edd6c41751	nv50/ir: add scalar field to TexInstructions Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>	2018-11-06 19:57:05 +01:00
Karol Herbst	8d825f78fc	nv50/ra: add condenseDef overloads for partial condenses Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>	2018-11-06 19:57:05 +01:00
Karol Herbst	a4550de434	nv50/ir: print color masks of tex instructions v2: print the mask for TXG as well make the mask to be printed more mask like Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>	2018-11-06 19:57:05 +01:00
Jason Ekstrand	610061838a	vulkan: Update the XML and headers to 1.1.91 The biggest change here is the rename of VK_NVX_ray_tracing to VK_NV_ray_tracing and the total removal of VK_KHR_mir_surface. Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-11-06 12:21:19 -06:00
Gert Wollny	c171d76b94	r600: Add support for EXT_texture_sRGB_R8 Enables on R600 and makes pass: dEQP-GLES31.functional.srgb_texture_decode.skip_decode.sr8.* dEQP-GLES31.functional.texture.filtering.cube_array.formats.sr8* v2: remove chunk for dri/radeon (Emil) Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2018-11-06 18:49:02 +01:00
Lionel Landwerlin	421fa01d64	anv/android: mark gralloc allocated BOs as external Allocating through Gralloc implies buffers are going to be used outside the driver. We have special MOCS settings for external BOs and we probably want to use them here too. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `a1220e7311` ("anv/android: Set the BO flags in bo_cache_import (v2)") Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2018-11-06 15:28:07 +00:00
Lionel Landwerlin	b43f955037	anv: stub internal android code This reduces the amount of #ifdef ANDROID we'll have to have inside the driver. Potentially offering better coverage of the android extensions. v2: Move anv_android.h include before anv_entrypoints.h (Tapani) Fix autotools android build (Lionel) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2018-11-06 15:28:07 +00:00
Kristian H. Kristensen	f6131d4ec7	freedreno/a6xx: Clear z32 and separate stencil with blitter Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>	2018-11-06 08:56:38 -05:00
Rob Clark	3bbad81c80	freedreno/a6xx: fix VSC bug with larger # of tiles At higher resolutions with the addition of MSAA, the number of tiles can increase to the point where we use more than one VSC pipe per tile. Which would cause us to calculate an out-of-bounds offset for VSC_SIZE_ADDRESS. So don't try to be clever, just always put it at a fixed offset assuming the max 32 VSC pipes in use. Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-11-06 08:56:21 -05:00
Rob Clark	2d9c3a5db2	freedreno: update generated headers Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-11-06 08:43:27 -05:00
Olivier Fourdan	55af17ffed	wayland/egl: Resize EGL surface on update buffer for swrast After commit `a9fb331ea` ("wayland/egl: update surface size on window resize"), the surface size is updated as soon as the resize is done, and `update_buffers()` would resize only if the surface size differs from the attached size. However, in the case of swrast, there is no resize callback and the attached size is updated in `dri2_wl_swrast_commit_backbuffer()` prior to the `swrast_update_buffers()` so the attached size is always up to date when it reaches `swrast_update_buffers()` and the surface is never resized. This can be observed with "totem" using the GDK backend on Wayland (the default) when running on software rendering: $ LIBGL_ALWAYS_SOFTWARE=true CLUTTER_BACKEND=gdk totem Resizing the window would leave the EGL surface size unchanged. To avoid the issue, partially revert the part of commit `a9fb331ea` for `swrast_update_buffers()` and resize on the win size and not the attached size. Fixes: `a9fb331ea` - wayland/egl: update surface size on window resize Signed-off-by: Olivier Fourdan <ofourdan@redhat.com> CC: Daniel Stone <daniel@fooishbar.org> CC: Juan A. Suarez Romero <jasuarez@igalia.com> CC: mesa-stable@lists.freedesktop.org Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>	2018-11-06 13:59:38 +01:00
Lionel Landwerlin	b47a69ed4c	intel/decoders: fix instruction base address parsing Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `00103db04a` ("intel: Fix decoding for partial STATE_BASE_ADDRESS updates.") Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-11-05 13:22:35 -08:00
Emil Velikov	b3ade65387	egl/glvnd: correctly report errors when vendor cannot be found If the user provides an invalid display or device the ToVendor lookup will fail. In this case, the local [Mesa vendor] error code will be set. Thus on sequential eglGetError(), the error will be EGL_SUCCESS. To be more specific, GLVND remembers the last vendor and calls back into its eglGetError, although there's no guarantee to ever have had one. v2: - Add _eglError call, so the debug callback is executed (Kyle) - Drop XXX comment. Piglit: tests/egl/spec/egl_ext_device_query Fixes: `ce562f9e3f` ("EGL: Implement the libglvnd interface for EGL (v3)") Cc: Eric Engestrom <eric@engestrom.ch> Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Kyle Brenneman <kbrenneman@nvidia.com>	2018-11-05 20:53:05 +00:00
Emil Velikov	2a8fefdeb0	egl: add EGL_EXT_device_base entrypoints eglQueryDevicesEXT (unlike the other three functions) does not depend on the display. It is implemented in GLVND, which calls into each driver collecting the list of devices and presenting it to the user. For the other entrypoints, GLVND acts as pass through stub calling into the vendor library. The vendor implementation calls back into GLVND to get the vendor dispatch. Then the driver proceeds to call itself via the said dispatch. This design makes it possible to keep using "old" GLVND with newer vendor drivers. Since effectively all the extension code is within the latter itself. Without said entrypoints, any user will outright crash - as reported in the bug report. Note: there's a follow-up fix needed to our GLVND code, to make piglit happy. v2: add some beefy documentation in the commit message. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108635 Fixes: `7552fcb7b9` ("egl: add base EGL_EXT_device_base implementation") Reported-by: kyle.devir@mykolab.com Cc: kyle.devir@mykolab.com Acked-by: Eric Engestrom <eric@engestrom.ch> Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Tested-by: Emil Velikov <emil.velikov@collabora.com>	2018-11-05 20:53:05 +00:00
Emil Velikov	7e169cf2a0	docs: mention EXT_shader_implicit_conversions Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Signed-off-by: Emil Velikov <emil.velikov@collabora.com>	2018-11-05 20:53:05 +00:00
Marek Olšák	04298a2f24	st/va: fix incorrect use of resource_destroy Fixes: `4373dd3215` ("st/va: Support YUV formats in vaCreateSurfaces") Cc: Drew Davenport <ddavenport@chromium.org> Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com> Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2018-11-05 15:47:50 -05:00
Sergii Romantsov	5aeee1ab15	i965/batch/debug: Allow log be dumped before assert Message that may show the culprit of assert now will be dumped before that for debug purposes. Signed-off-by: Sergii Romantsov <sergii.romantsov@globallogic.com> Reviewed-by: Lionel G Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-11-05 09:24:55 -08:00
Lionel Landwerlin	4fd0ff75f3	intel/sanitize_gpu: add debug message on mmap fail Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2018-11-05 15:45:08 +00:00
Lionel Landwerlin	e400ac52e4	intel/sanitize_gpu: deal with non page multiple buffer sizes We can only map at page aligned offsets. We got that wrong with buffer size where (size % 4096) != 0 (anv has a WA buffer of 1024). Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2018-11-05 15:45:07 +00:00
Lionel Landwerlin	c5fca35af1	intel/sanitize_gpu: add help/gdb options to wrapper Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2018-11-05 15:45:07 +00:00
Lionel Landwerlin	9ab5089150	intel/dump_gpu: add missing gdb option Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2018-11-05 15:43:34 +00:00
Eric Engestrom	d515ded4d9	wsi/wayland: only finish() a successfully init()ed display Fixes: `4369102498` "vulkan/wsi/wayland: Stop caching Wayland displays" Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>	2018-11-05 15:29:21 +00:00
Eric Engestrom	dcee22afed	wsi/wayland: use proper VkResult type Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Dave Airlie <airlied@redhat.com>	2018-11-05 14:55:05 +00:00
Sergii Romantsov	ce837a5372	autotools: library-dependency when no sse and 32-bit Building of 32bit Mesa may fail if __SSE__ is not specified. Added missed dependency from libm. v2: avoided dependency on any flag, just link v3: meson doesn't fail, but have added dependency on libm CC: Dylan Baker <dylan@pnwbakers.com> CC: Lionel G Landwerlin <lionel.g.landwerlin@intel.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108560 Signed-off-by: Sergii Romantsov <sergii.romantsov@globallogic.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2018-11-05 13:21:49 +01:00
Samuel Pitoiset	f7fd0d86a9	radv: more use of radv_cp_wait_mem() Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Dave Airlie <airlied@redhat.com>	2018-11-05 09:48:50 +01:00
Samuel Pitoiset	c571ca7a08	radv: replace si_emit_wait_fence() with radv_cp_wait_mem() Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Dave Airlie <airlied@redhat.com>	2018-11-05 09:48:50 +01:00
Samuel Pitoiset	b1b2dd06a7	radv: add missing TFB queries support to CmdCopyQueryPoolsResults() Cc: 18.3 <mesa-stable@lists.freedesktop.org> Fixes: `b4eb029062` ("radv: implement VK_EXT_transform_feedback") Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Dave Airlie <airlied@redhat.com>	2018-11-05 09:48:43 +01:00
Samuel Pitoiset	dc3419195c	radv: remove useless sync after copying query results with compute The spec says: "vkCmdCopyQueryPoolResults is considered to be a transfer operation, and its writes to buffer memory must be synchronized using VK_PIPELINE_STAGE_TRANSFER_BIT and VK_ACCESS_TRANSFER_WRITE_BIT before using the results." VK_PIPELINE_STAGE_TRANSFER_BIT will wait for compute to be idle, while VK_ACCESS_TRANSFER_WRITE_BIT will invalidate both L1 vector caches and L2. So, it's useless to set those flags internally. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-11-05 09:47:55 +01:00
Vinson Lee	64a9ed8848	r600/sb: Fix constant logical operand in assert. Fixes: `da977ad907` ("r600/sb: start adding GDS support") Signed-off-by: Vinson Lee <vlee@freedesktop.org> Reviewed-By: Gert Wollny <gert.wollny@collabora.com>	2018-11-04 21:09:55 -08:00
Kenneth Graunke	5d517a599b	st/mesa: Don't record garbage streamout information in the non-SSO case. In the non-SSO case, where multiple shader stages are linked together, we were recording garbage pipe_stream_output_info structures for all but the last enabled geometry-processing stage. Specifically, we were using the gl_transform_feedback_info from shader_program->last_vert_prog (the stage whose outputs will be recorded)...but were pairing it with the output varying mappings from the current shader stage. For example, a program with a VS and GS, the VS's pipe_shader_state would have a pipe_stream_output_info based on the GS transform feedback info, but the VS output mapping. This generally worked out okay because only the pipe_stream_output_info for the last stage really matters - the others can be ignored. However, we'd like to avoid confusing the pipe driver. In particular, my new driver translates the stream out information to hardware packets at bind_{vs,tes,gs}_state() time...and was hitting asserts about garbage varyings that didn't exist. This patch changes st/mesa to record a blank pipe_stream_output_info with num_outputs = 0 for all stages prior to last_vert_prog. The last one is captured as normal. (In the fully-SSO case, nothing should change - each program contains a single shader stage, so last_vert_prog is the current shader.) Tested with llvmpipe (piglit's gpu profile), and freedreno (a3xx, gpu profile with -t transform.feedback). Fixes several hundred CTS tests on my new driver. Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-11-03 23:34:36 -07:00
Kenneth Graunke	b6410a2d22	st/nir: Drop unused parameter from st_nir_assign_uniform_locations(). ARB programs won't have one of these, and we don't use it anyway. Reviewed-by: Rob Clark <robdclark@gmail.com>	2018-11-03 23:34:36 -07:00
Kenneth Graunke	5294d65011	st/mesa: Pull nir_lower_wpos_ytransform work into a helper function. This will let me use it in the ARB program code as well. Reviewed-by: Rob Clark <robdclark@gmail.com>	2018-11-03 23:34:34 -07:00
Kenneth Graunke	424a6052df	intel: Use a URB start offset of 0 for disabled stages. There are some cases where the VS is the only stage enabled, it uses the entire URB, and the URB is large enough that placing later stages after the VS exceeds the number of bits for "URB Starting Address". For example, on Icelake GT2, "varying-packing-simple mat2x4 array" from Piglit is getting a starting offset of 128 for the GS/HS/DS. But the field is only large enough to hold an offset of 127. i965 doesn't hit any genxml assertions because it's still using the old OUT_BATCH mechanism. 128 << GEN7_URB_STARTING_ADDRESS_SHIFT (57) == 0, with the extra bit falling off the end. So we place the disabled stage at the beginning of the URB (overlapping with push constants). This is likely okay since it's a zero size region (0 entries). It seems like the Vulkan driver might hit this assertion, however, and the situation seems harmless. To work around this, always place disabled stages at the start of the URB, so the last enabled stage can fill the remaining space without overflowing the field. Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>	2018-11-03 23:25:57 -07:00
Mauro Rossi	5c0cff868a	android: radv: add libmesa_git_sha1 static dependency libmesa_git_sha1 whole static dependency is added to get git_sha1.h header and avoid following building error: external/mesa/src/amd/vulkan/radv_device.c:46:10: fatal error: 'git_sha1.h' file not found ^ 1 error generated. Fixes: `9d40ec2cf6` ("radv: Add support for VK_KHR_driver_properties.") Signed-off-by: Mauro Rossi <issor.oruam@gmail.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2018-11-03 10:48:45 +01:00
Eric Anholt	0d78c6af0d	vc4: Use the normal simulator ioctl path for CL submit as well. The simulator no longer needs to look back into the gallium structs.	2018-11-02 14:26:38 -07:00
Eric Anholt	c80e267a0a	vc4: Maintain a separate GEM mapping of BOs in the simulator. This will let us avoid looking back into the gallium driver's vc4_bo.	2018-11-02 14:26:38 -07:00
Eric Anholt	645ca269d2	vc4: Take advantage of _mesa_hash_table_remove_key() in the simulator.	2018-11-02 14:26:38 -07:00
Eric Anholt	f32ba7abd7	v3d: Remove the special path for simulation of the submit ioctl. Now that it doesn't need to find the struct v3d_bos, it can just take the normal v3d_ioctl() path.	2018-11-02 14:26:38 -07:00
Eric Anholt	df9f574c13	v3d: Maintain a mapping of the GEM buffer in the simulator. This way we don't need to reach back into the gallium driver code to get the mapping.	2018-11-02 14:26:38 -07:00
Dylan Baker	7652931d33	meson: link gallium nine with pthreads In some cases (not building with llvm, which automatically pulls in pthreads) nine needs to be directly linked with pthreads. Fixes building on x86 (32 bit) without llvm. Distro bug: https://bugs.gentoo.org/670094 Fixes: `6b4c7047d5` ("meson: build gallium nine state_tracker") Tested-by: Rafal Lalik <rafallalik@gmail.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2018-11-02 13:10:33 -07:00
Anuj Phogat	1c140470ef	anv/icl: Disable prefetching of sampler state entries WA_1606682166: Incorrect TDL's SSP address shift in SARB for 16:6 & 18:8 modes. Disable the Sampler state prefetch functionality in the SARB by programming 0xB000[30] to '1'. This is to be done at boot time and the feature must remain disabled permanently. Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-11-02 08:34:33 -07:00
Topi Pohjolainen	9a41a10f8a	i965/icl: Disable prefetching of sampler state entries In the same spirit as commit `a5889d70f2` "i965/icl: Disable binding table prefetching". Fixes some 110+ intermittent piglit failures with tex-miplevel-selection variants. WA_1606682166: Incorrect TDL's SSP address shift in SARB for 16:6 & 18:8 modes. Disable the Sampler state prefetch functionality in the SARB by programming 0xB000[30] to '1'. This is to be done at boot time and the feature must remain disabled permanently. Anuj: Set SamplerCount = 0 for vs, gs, hs, ds and wm units as well. Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com> Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com> Cc: Mark Janes <mark.a.janes@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-11-02 08:34:33 -07:00
Jan Vesely	9cab8ccd6c	amd: Make vgpr-spilling depend on llvm version The option was removed in LLVM r345763 Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>	2018-11-02 10:32:47 -04:00
Timothy Arceri	769ae9fb7f	nir: fix condition propagation when src has a swizzle We cannot use nir_build_alu() to create the new alu as it has no way to know how many components of the src we will use. This results in it guessing the max number of components from one of its inputs. Fixes the following CTS tests: dEQP-VK.spirv_assembly.instruction.graphics.selection_block_order.out_of_order_frag dEQP-VK.spirv_assembly.instruction.graphics.selection_block_order.out_of_order_geom dEQP-VK.spirv_assembly.instruction.graphics.selection_block_order.out_of_order_tessc dEQP-VK.spirv_assembly.instruction.graphics.selection_block_order.out_of_order_vert Fixes: `2975422ceb` ("nir: propagates if condition evaluation down some alu chains") Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-11-03 00:44:01 +11:00
Mauro Rossi	b9dec214f5	android: gallium/auxiliary: add include to get u_debug.h header To avoid build error in u_debug_stack_android.cpp due to now missing u_debug.h header: external/mesa/src/gallium/auxiliary/util/u_debug_stack_android.cpp:26:10: fatal error: 'u_debug.h' file not found #include "u_debug.h" ^ 1 error generated. Fixes: `37db383abb` ("util: Move u_debug to utils") Signed-off-by: Mauro Rossi <issor.oruam@gmail.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2018-11-02 13:31:37 +01:00
Gert Wollny	b710680093	virgl/vtest-winsys: Use virgl version of bind flags The bind flags defined by mesa/gallium might not always be in sync with the ones copied to virglrenderer/gallium. Therefore, use the flags defined in virgl like it is done for all the other calls to create resources. Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Reviewed-by: Dave Airlie <airlied@redhat.com>	2018-11-02 11:53:09 +01:00
Gert Wollny	acd2968005	mesa/st: Add support for EXT_texture_sRGB_R8 This only adds support on the Gallium core level, for the drivers it is likely that additional changes are needed to support the new texture format and thereby enabling the extension. Enables on softpipe and makes pass: dEQP-GLES31.functional.srgb_texture_decode.skip_decode.sr8.* v2: - add include for getting GL_SR8_EXT v4: - since the extension is not required don't bother providing a fallback (Ilia Mirkin) - split patch (2/2) to separate Gallium and mesa/st parts (Roland Scheidegger) - trim commit message to only contain the history of the patch relevant to this part v5: - don't include GLES headers (required enum has been added to glheader.h) (Ilia Mirkin) Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>	2018-11-02 11:52:44 +01:00
Gert Wollny	29f0ab2c30	Gallium: Add format PIPE_FORMAT_R8_SRGB This format is needed to support EXT_texture_sRGB_R8. The patch adds a new format enum, the format entries in Gallium and svga, the mapping between sRGB and linear formats, and tests. v2: - add mapping to linear format for PIPE_FORMAT_R8_SRGB v3: - Add texture format to svga format table since otherwise building mesa will fail when this driver is enabled. It was not tested whether the extension actually works. v4: - svga: remove the SVGA specific format definitions and table entries and only add correct the location of PIPE_FORMAT_R8_SRGB in the format_conversion_table (Ilia Mirkin) - Split patch (1/2) to separate Gallium part and mesa/st part. (Roland Scheidegger) - Trim the commit message to only contain the relevant parts from the split. v5: - svga: correct location of PIPE_FORMAT_SRGB_R8 (Ilia Mirkin) Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Reviewed-by: Roland Scheidegger <sroland@vmware.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>	2018-11-02 11:52:44 +01:00
Gert Wollny	b8e9c6522d	mesa/core: Add definitions and translations for EXT_texture_sRGB_R8 v2: - fix format definition line - disable for desktop GL - don't add GL_R8_EXT to glext.h since it is already in GLES2/gl2ext.h in glext.h and include this header where needed (all Emil) v3: - swrast: Fill the function table for sRGB_R8 The size of the function table is checked at compile time and must correspond to the number of mesa texture formats. dri/swrast being gles-2.0 doesn't support the extension though v4: - correct format layout comment (Ilia Mirkin) - correct logic for accepting GL_RED only textures (in part Ilia Mirkin) EXT_texture_sRGB_R8 requires OpenGL ES 3.0 which includes ARB_texture_rg/EXT_texture_rg, so one only must check for the first when SR8_EXT is really requested. v5: - add define for GL_SR8_EXT to glheader.h and don't include GLES headers (Ilia Mirkin) Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>	2018-11-02 11:52:44 +01:00
Erik Faye-Lund	742dace825	glsl: do not allow implicit casts of unsized array initializers The GLSL 4.6 specification (section 4.1.14. "Implicit Conversions") says: "There are no implicit array or structure conversions. For example, an array of int cannot be implicitly converted to an array of float." So let's add a check in place when assigning array initializers to implicitly sized arrays, to avoid incorrectly allowing code of the form: int[] foo = float[](1.0, 2.0, 3.0) This fixes the following dEQP test-cases: - dEQP-GLES31.functional.shaders.implicit_conversions.es31.invalid.arrays.int_to_float_vertex - dEQP-GLES31.functional.shaders.implicit_conversions.es31.invalid.arrays.int_to_float_fragment - dEQP-GLES31.functional.shaders.implicit_conversions.es31.invalid.arrays.int_to_uint_vertex - dEQP-GLES31.functional.shaders.implicit_conversions.es31.invalid.arrays.int_to_uint_fragment - dEQP-GLES31.functional.shaders.implicit_conversions.es31.invalid.arrays.uint_to_float_vertex - dEQP-GLES31.functional.shaders.implicit_conversions.es31.invalid.arrays.uint_to_float_fragment - dEQP-GLES31.functional.shaders.implicit_conversions.es32.invalid.arrays.int_to_float_vertex - dEQP-GLES31.functional.shaders.implicit_conversions.es32.invalid.arrays.int_to_float_fragment - dEQP-GLES31.functional.shaders.implicit_conversions.es32.invalid.arrays.int_to_uint_vertex - dEQP-GLES31.functional.shaders.implicit_conversions.es32.invalid.arrays.int_to_uint_fragment - dEQP-GLES31.functional.shaders.implicit_conversions.es32.invalid.arrays.uint_to_float_vertex - dEQP-GLES31.functional.shaders.implicit_conversions.es32.invalid.arrays.uint_to_float_fragment Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2018-11-02 11:10:36 +01:00
Erik Faye-Lund	6df922f438	mesa/glsl: add support for EXT_shader_implicit_conversions EXT_shader_implicit_conversions adds support for implicit conversions for GLES 3.1 and above. This is essentially a subset of ARB_gpu_shader5, and augments OES_gpu_shader5. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2018-11-02 11:10:36 +01:00
Erik Faye-Lund	ecab2d6f14	glsl: fall back to inexact function-match In GLES, we currently either need an exact match with a local function, or an exact match with a builtin. However, if we add support for implicit conversions for GLES shaders, we also need to fall back to a non-exact match in the case where there was no builtin match either. Luckily, we already have a variable ready with this, so let's just return it if the builtin-search failed. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2018-11-02 11:10:36 +01:00
Erik Faye-Lund	e975c5b785	glsl: add has_implicit_uint_to_int_conversion()-helper This makes the code a bit easier to read, as well as reduces repetition, especially when we add support for EXT_shader_implicit_conversions. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2018-11-02 11:10:36 +01:00
Erik Faye-Lund	12f001f013	glsl: add has_implicit_conversions()-helper This makes the code a bit easier to read, and will reduce repetition when we add support for EXT_shader_implicit_conversions. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2018-11-02 11:10:36 +01:00
Mathias Fröhlich	9f009c1a8f	mesa: Remove needless indirection in some draw functions. Reviewed-by: Brian Paul <brianp@vmware.com> Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>	2018-11-02 08:42:03 +01:00
Timothy Arceri	c7bdda8aa5	nir: allow propagation of if evaluation for bcsel Shader-db results Skylake: total instructions in shared programs: 13109035 -> 13109024 (<.01%) instructions in affected programs: 4777 -> 4766 (-0.23%) helped: 11 HURT: 0 total cycles in shared programs: 332090418 -> 332090443 (<.01%) cycles in affected programs: 19474 -> 19499 (0.13%) helped: 6 HURT: 4 Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-11-02 15:56:34 +11:00
Dave Airlie	677b496b6b	radv: fix begin/end transform feedback with 0 counter buffers. If the user gives 0 counterBuffers then the driver should still enable transform feedback on all targets. This changes the driver to always enable xfb, and use counter buffers where one is defined for the target in question. Fixes: `b4eb029062` (radv: implement VK_EXT_transform_feedback) Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2018-11-02 04:15:07 +00:00
Dave Airlie	7f37a52a21	radv: apply xfb buffer offset at buffer binding time not later. (v2) In order to handle pause/resume properly, the offset should be added to the buffer binding not to the begin/end paths. v2: don't add offset to size Fixes ext_transform_feedback-alignment* under zink Fixes: `b4eb029062` (radv: implement VK_EXT_transform_feedback) Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2018-11-02 04:13:31 +00:00
Mark Janes	5f312e95f8	Revert "i965/batch: avoid reverting batch buffer if saved state is an empty" This reverts commit `a9031bf9b5`. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108630	2018-11-01 16:28:05 -07:00
Eric Anholt	43a397c580	vc4: Drop the winsys_stride relayout in the simulator Since `0c1dd9dee0` ("broadcom/vc4: Allow importing linear BOs with arbitrary offset/stride."), we have the vc4-side BO properly laid out (assuming it's linear) in the winsys BO so that we can skip this extra copy.	2018-11-01 14:34:02 -07:00
Eric Anholt	4e1b163eed	v3d: Update the TLB config for depth writes on V3D 4.2. Fixes 311 piglit cases on the simulator.	2018-11-01 13:56:30 -07:00
Eric Anholt	4018eb04e8	v3d: Use the TLB R/B swapping instead of recompiles when available. The recompile reduction is nice, but this also makes it so that a straight texture copy could get optimized some day to not unpack/repack the f16 values.	2018-11-01 13:56:30 -07:00
Eric Anholt	3923cf626d	v3d: Take advantage of _mesa_hash_table_remove_key() in the simulator.	2018-11-01 13:54:36 -07:00
Eric Anholt	47586ab569	v3d: Respect user-passed strides for BO imports. If the caller has passed in a stride for (linear) BO import, we should use that stride when rendering to the BO (or, if we some day support texturing from linear-imported BOs, when doing the linear-to-UIF shadow copy). This lets us remove the extra stride-changing relayout in the simulator.	2018-11-01 13:54:36 -07:00
Eric Anholt	5313fb8abd	v3d: Drop #if 0-ed out v3d_dump_to_file(). This came from vc4, where we had a file format for GPU hangs. I don't have one of those for V3D, and I probably won't ever have the simulator side produce dumps even if I do.	2018-11-01 13:54:36 -07:00
Eric Anholt	d3f66c385b	v3d: Fix a typo in a comment in job handling.	2018-11-01 13:54:36 -07:00
Eric Anholt	b93fc160f4	v3d: Fix a copy-and-paste comment in the simulator code.	2018-11-01 13:54:36 -07:00
Anuj Phogat	13c955182f	anv/icl: Set Error Detection Behavior Control Bit in L3CNTLREG The default setting of this bit is not the desirable behavior. WA_1406697149 Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-11-01 12:00:23 -07:00
Anuj Phogat	b3d6937fb0	i965/icl: Set Error Detection Behavior Control Bit in L3CNTLREG The default setting of this bit is not the desirable behavior. WA_1406697149 Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-11-01 12:00:23 -07:00
Emil Velikov	ac95a0e024	docs: add 19.0.0-devel release notes template Signed-off-by: Emil Velikov <emil.velikov@collabora.com>	2018-11-01 18:56:54 +00:00
Emil Velikov	97c73c9174	mesa: bump version to 19.1.0-devel Signed-off-by: Emil Velikov <emil.velikov@collabora.com>	2018-11-01 18:54:02 +00:00
Dylan Baker	1f41104b9b	meson: don't install translation files Tested-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> Fixes: `7834926a4f` ("meson: add support for generating translation mo files")	2018-11-01 10:49:16 -07:00
Eric Engestrom	4da169d368	egl: use the LC_ALL hammer instead of LANG Some environments (like Travis apparently) set LC_* vars, messing up the sort ordering, so let's use the envvar with the highest priority to make sure this is actually sorted in ASCII order. Suggested-by: Michel Dänzer <michel@daenzer.net> Fixes: `b42dc50a5f` "egl: fix entrypoint sorting test" Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2018-11-01 17:25:08 +00:00
Eric Engestrom	b42dc50a5f	egl: fix entrypoint sorting test Fixes: `68dc591af1` "egl: Fix eglentrypoint.h sort order." Cc: Emil Velikov <emil.l.velikov@gmail.com> Cc: Tapani Pälli <tapani.palli@intel.com> Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Mathias Fröhlich <Mathias.Froehlich@web.de>	2018-11-01 15:45:26 +00:00
Andrii Simiklit	fc3cecda8c	intel/tools: fix resource leak Some memory and file descriptors are not freed/closed. v2: fixed case where we skipped the 'aub' variable initialization Signed-off-by: Andrii Simiklit <andrii.simiklit@globallogic.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-11-01 13:21:07 +00:00
Jonathan Gray	ae8e81b0e3	intel/tools: include stdarg.h in error2aub Include stdarg.h in error2aub.c otherwise it fails to build on OpenBSD due to not finding definitions for va_list va_start va_end. Signed-off-by: Jonathan Gray <jsg@jsg.id.au> Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-11-01 10:27:26 +00:00
Mathias Fröhlich	68dc591af1	egl: Fix eglentrypoint.h sort order. Fixes a make check failure. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108617 Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>	2018-11-01 10:56:21 +01:00
Samuel Pitoiset	9cbdcc86b7	radv: set PA_SU_PRIM_FILTER_CNTL optimally Ported from RadeonSI. It's always TRUE for CIK+ because RADV doesn't support 16 samples. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Dave Airlie <airlied@redhat.com>	2018-11-01 08:49:15 +01:00
Samuel Pitoiset	85010585cd	radv: only enable gl_SampleMask if MSAA is enabled too Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Dave Airlie <airlied@redhat.com>	2018-11-01 08:49:11 +01:00
Samuel Pitoiset	0c08074cef	radv: use radeon_info::num_good_cu_per_sh Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Dave Airlie <airlied@redhat.com>	2018-11-01 08:49:08 +01:00
Samuel Pitoiset	9278089d05	ac/nir: make use of i1false in few more places Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Dave Airlie <airlied@redhat.com>	2018-11-01 08:49:05 +01:00
Samuel Pitoiset	79410b1e87	radv: add support for Raven2 Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Dave Airlie <airlied@redhat.com>	2018-11-01 08:48:52 +01:00
Mathias Fröhlich	ad52e19408	mesa: Collect all the draw functions in draw.{h,c}. Some of these functions were distributed across different implementation and header files. Put them at a central place. Reviewed-by: Brian Paul <brianp@vmware.com> Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>	2018-11-01 06:08:49 +01:00
Mathias Fröhlich	3d64f3c795	mesa/vbo: Move _vbo_draw_indirect -> _mesa_draw_indirect Reviewed-by: Brian Paul <brianp@vmware.com> Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>	2018-11-01 06:08:49 +01:00
Mathias Fröhlich	f726c61cc1	mesa/vbo: Move src/mesa/vbo/vbo_exec_array.c -> src/mesa/main/draw.c The array type draw is no longer directly dependent on the vbo module. Thus move array type draws into mesa/main/draw.c. Rename symbols starting with vbo_* to _mesa_* and apply some reindenting to make it consistent. Reviewed-by: Brian Paul <brianp@vmware.com> Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>	2018-11-01 06:08:49 +01:00
Mathias Fröhlich	952a5da584	vbo: Pull the _mesa_set_draw_vao calls out of the if clauses. These calls are just the same in each if branch. So pull that before the if. Reviewed-by: Brian Paul <brianp@vmware.com> Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>	2018-11-01 06:08:49 +01:00
Mathias Fröhlich	b00cb994ef	vbo: Preserve vbo_save::no_current_update on primitive restart. With this change we preserve the no_current_update property when we observe a glPrimitiveRestart call. That means that we now also get the no_current_update optimization for display lists that are made out of indexed draws using primitive restart. Reviewed-by: Brian Paul <brianp@vmware.com> Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>	2018-11-01 06:08:49 +01:00
Mathias Fröhlich	f2a52b3c25	vbo: Make no_current_update an argument to vbo_save_NotifyBegin. Instead of coding additional information into the primitive mode, make the only remaining flag there a direct argument to vbo_save_NotifyBegin. v2: Fix incorrect no_current_update in glRectf. Reviewed-by: Brian Paul <brianp@vmware.com> Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>	2018-11-01 06:08:49 +01:00
Mathias Fröhlich	b899f5e59c	vbo: Move no_current_update out of _mesa_prim. The _mesa_prim::no_current_update flag should tell the compiled display list if the current attributes that are placed in the dlists vbo shall take a defined state past replay of a display list. Immediate mode draws compiled into display lists should set the current values. Array draws may leave the current values in undefined state. So finally this flag is not a property of every primitive but it is a property of the compiled display list and there it is a property of the last primitive compiled into the list. So move the flag out of _mesa_prim into vbo_save. Reviewed-by: Brian Paul <brianp@vmware.com> Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>	2018-11-01 06:08:49 +01:00
Mathias Fröhlich	eae4ee9419	vbo: Remove the now unused VBO_SAVE_PRIM_WEAK define. Reviewed-by: Brian Paul <brianp@vmware.com> Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>	2018-11-01 06:08:49 +01:00
Mathias Fröhlich	873adb06fa	vbo: Remove the always false branch dlist replay. The previous patch left a constant if (0) in the code. Clean that up now. Reviewed-by: Brian Paul <brianp@vmware.com> Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>	2018-11-01 06:08:49 +01:00
Mathias Fröhlich	1387b4d533	vbo: Test for VBO_SAVE_PRIM_WEAK in _mesa_prim::mode is false. When setting the _mesa_prim::mode field we always filter out all non OpenGL primitive mode bits. So this tested bit cannot be there anymore and the test evaluates to zero. The zero is removed with the next patch to ease review. Reviewed-by: Brian Paul <brianp@vmware.com> Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>	2018-11-01 06:08:49 +01:00
Mathias Fröhlich	cee0dd8d5a	vbo: Remove VBO_SAVE_PRIM_WEAK from vbo_save_NotifyBegin calls. Now looking at the implementation of vbo_save_NotifyBegin. The VBO_SAVE_PRIM_WEAK flag, delivered in the primitive mode argument to vbo_save_NotifyBegin, is not evaluated anymore. There are two users of the mode argument. One is the primitive mode itself, where the VBO_SAVE_PRIM_WEAK bit is masked out to retrieve the underlying OpenGL primitive mode. The other is the check for the VBO_SAVE_PRIM_NO_CURRENT_UPDATE bit, which is different from VBO_SAVE_PRIM_WEAK. So, since vbo_save_NotifyBegin does not care about VBO_SAVE_PRIM_WEAK, we can safely remove it from the call arguments of vbo_save_NotifyBegin. Reviewed-by: Brian Paul <brianp@vmware.com> Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>	2018-11-01 06:08:49 +01:00
Mathias Fröhlich	b632c072b2	vbo: Remove set but not used weak field from _mesa_prim. The only reader of the weak field in _mesa_prim is pretty console printing. By that, remove the weak field from _mesa_prim. Reviewed-by: Brian Paul <brianp@vmware.com> Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>	2018-11-01 06:08:49 +01:00
Mathias Fröhlich	2dc951b7c3	vbo: Remove the VBO_SAVE_FALLBACK flag. On finishing a display list playback the VBO_SAVE_FALLBACK bit is still kept in vbo_save_context::replay_flags. But examining replay_flags and the display list flags that feed this value the corresponding bit is never set these days anymore. So, since it is nowhere set or checked, we can safely remove it. Reviewed-by: Brian Paul <brianp@vmware.com> Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>	2018-11-01 06:08:49 +01:00
Mathias Fröhlich	5b41504f66	vbo: Remove unused vbo_save_fallback function. Reviewed-by: Brian Paul <brianp@vmware.com> Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>	2018-11-01 06:08:49 +01:00
Emil Velikov	075f92b2b7	docs/relnotes: add the EGL Device extensions Signed-off-by: Emil Velikov <emil.velikov@collabora.com>	2018-11-01 00:05:43 +00:00
Emil Velikov	83c7fbb4e4	meson: egl: group dri2 bits separately from haiku One cannot have haiku and dri2 - surfaceless,x11,etc. Group things up, which will make the addition of platform_device a bit easier. Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2018-11-01 00:05:43 +00:00
Emil Velikov	c7cc135e23	egl: enable EGL_EXT_device_{base,enumeration,query} Now that we fully support the extensions, enable them. The specs mandate that we always have at least one device and each dpy has a device associated with it. Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Mathias Fröhlich <Mathias.Froehlich@web.de>	2018-11-01 00:05:43 +00:00
Emil Velikov	00992700c9	egl: set the EGLDevice when creating a display This is the final requirement from the base EGLDevice spec. v2: - split from another patch - move wayland hunk after we have the fd Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Mathias Fröhlich <Mathias.Froehlich@web.de>	2018-11-01 00:05:43 +00:00
Emil Velikov	dbb4457d98	egl: add EGL_EXT_device_drm support Add implementation based around the drmDevice API. As such it's only available when building with libdrm. With the latter already a requirement when using !SW code paths in the platform code. Note: the current code will work if a device is hot-plugged. Yet hot-unplug is not implemented, since I have no way of testing it. v2: - add some _eglDeviceSupports checks - require DRM_NODE_RENDER - add _eglGetDRMDeviceRenderNode helper v3: - flip inverted asserts (Mathias) Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Mathias Fröhlich <Mathias.Froehlich@web.de>	2018-11-01 00:05:43 +00:00
Emil Velikov	f73c5d27c1	egl: add EGL_MESA_device_software support Add a plain software device, which is always available. We can safely assign it as the first/initial device in _eglGlobals, although we ensure that's the case with a handful of _eglDeviceSupports checks throughout the code. v2: - s/_eglFindDevice/_eglAddDevice/ (Eric) - s/_eglLookupAllDevices/_eglRefreshDeviceList/ (Eric) - move ^^ helpers into a earlier patch (Eric, Mathias) - set the SW device on _eglGlobal init. (Eric) - add a number of _eglDeviceSupports checks (Mathias) - split Device/Display attach to a separate patch v3: - flip inverted asserts (Mathias) - s/on-stack/static/ (Mathias) Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Mathias Fröhlich <Mathias.Froehlich@web.de>	2018-11-01 00:05:43 +00:00
Adam Jackson	3f08e500c4	specs: Add EGL_MESA_device_software The device extension string is expected to contain the name of the extension defining what kind of device it is, so the caller can know what kinds of operations it can perform with it. So that string had better be non-empty, hence this trivial extension. v2: - drop "fallback", update history and update contributor list Signed-off-by: Adam Jackson <ajax@redhat.com> Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Mathias Fröhlich <Mathias.Froehlich@web.de>	2018-11-01 00:05:43 +00:00
Emil Velikov	7552fcb7b9	egl: add base EGL_EXT_device_base implementation Introduce the API for device query and enumeration. Those at the moment produce nothing useful since zero devices are actually available. That contradicts with the spec, so the extension isn't advertised just yet. With later commits we'll add support for software (always) and hardware devices. Each one exposing the respective extension string. v2: - fold API boilerplate into this patch - move _eglAddDevice, _eglDeviceSupports, _eglRefreshDeviceList to this patch (Eric, Mathias) - make _eglFiniDevice the one called last v3: - comment on the dummy _egl_device_extension enum entry (Eric) - annotate dev as MAYBE_UNUSED (Mathias) Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Mathias Fröhlich <Mathias.Froehlich@web.de>	2018-11-01 00:05:43 +00:00
Emil Velikov	e55c1bcb08	glx: be explicit about when mapping X <> GLX visuals Write down both X and GLX visual types when mapping from one to the other. Makes grepping through the code a tiny bit easier. Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2018-11-01 00:05:43 +00:00
Emil Velikov	833e3cad19	glx: remove unused __glXPreferEGL() declaration The function definition is no longer around, drop the useless declaration. Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Brian Paul <brianp@vmware.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2018-11-01 00:05:43 +00:00
Emil Velikov	4428eed896	travis: use mako for python2 Earlier commit flipped the default to python2 but forgot to update the travis file. Thanks to pip caching, things "worked" for a little while. Fixes: `f22ad5ef18` ("travis: use python3 for the autoconf builds") Signed-off-by: Emil Velikov <emil.velikov@collabora.com>	2018-11-01 00:05:43 +00:00
Dave Airlie	fcf15a007d	radv/xfb: don't increase offset by component mask start. This is incorrect, the offset is into the buffer, and it's legal to write loc 0,0 -> buffer0, offset 0 loc 0,1 -> buffer1, offset 0 This fixes a bunch of piglits running on my zink xfb code on radv. Fixes: `6c21645046` (radv: emit stream outputs for vertex and tessellation stages) Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2018-10-31 23:48:10 +00:00
Dylan Baker	d25179469b	util/gen_xmlpool: Make use of python's foreach loop Instead of using a while loop with indexing. This is much cleaner. This requires some other small changes. Acked-by: Emil Velikov <emil.velikov@collabora.com>	2018-10-31 16:37:46 -07:00
Dylan Baker	465cfcb266	util/gen_xmlpool: Don't use len to test for container emptiness This is a very common python anti-pattern. Not using length allows us to go through faster C paths, but has the same meaning. Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2018-10-31 16:37:46 -07:00
Dylan Baker	b9cd81ea31	util/gen_xmlpool: Don't write via shell redirection Using shell redirection to write to a file is more complicated than necessary, and has the potential to run into unicode encoding problems. It's also less code. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108530 v2: - update commit message to say less about LANG=C - use flags instead of positional arguments for the script (Emil) Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2018-10-31 16:37:46 -07:00
Dylan Baker	1df086662a	util/gen_xmlpool: use with statement to open file Which ensures it is closed at the end of the scope. Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2018-10-31 16:37:12 -07:00
Dylan Baker	bc4a7645e4	util/gen_xmlpool: use a main function Again, just good style Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2018-10-31 16:37:12 -07:00
Dylan Baker	187fad5c0b	util/gen_xmlpool: Use print function instead of sys.stderr.write This ensures that stderr is flushed, unlike writing Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2018-10-31 16:37:12 -07:00
Dylan Baker	2c2aa98ee7	util/gen_xmlpool: Use more standard style gen_xmlpool uses a style unlike the rest of mesa, spaces between function/method calls and the parens, strange whitespace to force lining up method calls, and some other whitespace stuff. Since I'm going to be doing some work in the file, I'm going to start cleaning those up. Acked-by: Emil Velikov <emil.velikov@collabora.com>	2018-10-31 16:37:12 -07:00
Dylan Baker	a8004ef03e	docs/meson: Add note about update translations Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2018-10-31 16:37:12 -07:00
Dylan Baker	0621e91a8c	util/xmlpool: Update for meson generation Meson won't put the .gmo files in the layout that python's gettext.translation() expects, it puts them in the build directory in a flat layout. This modifies android and autotools to do the same (scons doesn't work with translations at all) v3: - Squash 4 patches into this patch Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2018-10-31 16:37:12 -07:00
Dylan Baker	7834926a4f	meson: add support for generating translation mo files Meson has a handy built-in module for handling gettext called i18n. This module works a bit differently than our autotools build does: namely, it doesn't automatically generate translations; instead it creates 3 new top level targets to run. These are: xmlpool-pot xmlpool-update-po xmlpool-gmo v2: - Add new files to autotools dist tarball Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2018-10-31 16:37:12 -07:00
Dylan Baker	2857b18991	util/gen_xmlpool: use argparse for argument handling This is a little cleaner than just looking at sys.argv, but it's also going to allow us to handle the differences in the way meson and autotools handle translations more cleanly. Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2018-10-31 16:37:12 -07:00
Timothy Arceri	5b757b4097	nir: fix if condition propagation for alu use We need to update the cursor before we check if the alu use is dominated by the if condition. Previously we were checking if the current location of the alu instruction was dominated by the if condition which would miss some optimisation opportunities. Fixes: `a3b4cb3458` ("nir/opt_if: Rework condition propagation") Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-11-01 09:22:55 +11:00
Vinson Lee	802ae533ab	freedreno: Do not link ir3_compiler with valgrind libraries. This patch fixes this freedreno autotools build error. CXXLD ir3_compiler /usr/lib/valgrind/libcoregrind-amd64-linux.a(libcoregrind_amd64_linux_a-m_main.o): In function `_start': (.text+0x0): multiple definition of `_start' /usr/lib/gcc/x86_64-linux-gnu/7/../../../x86_64-linux-gnu/Scrt1.o:(.text+0x0): first defined here /usr/bin/ld: /usr/lib/valgrind/libcoregrind-amd64-linux.a(libcoregrind_amd64_linux_a-m_main.o): relocation R_X86_64_32S against undefined symbol `vgPlain_interim_stack' can not be used when making a PIE object; recompile with -fPIC /usr/bin/ld: /usr/lib/valgrind/libcoregrind-amd64-linux.a(libcoregrind_amd64_linux_a-m_trampoline.o): relocation R_X86_64_32 against `.text' can not be used when making a PIE object; recompile with -fPIC /usr/bin/ld: /usr/lib/valgrind/libcoregrind-amd64-linux.a(libcoregrind_amd64_linux_a-dispatch-amd64-linux.o): relocation R_X86_64_32S against symbol `vgPlain_stats__n_xindirs_32' can not be used when making a PIE object; recompile with -fPIC /usr/bin/ld: final link failed: Nonrepresentable section on output collect2: error: ld returned 1 exit status Fixes: `f3cc0d2747` ("freedreno: import libdrm_freedreno + redesign submit") Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108595 Signed-off-by: Vinson Lee <vlee@freedesktop.org> Reviewed-by: Rob Clark <robdclark@gmail.com>	2018-10-31 15:05:28 -07:00
Emil Velikov	f22ad5ef18	travis: use python3 for the autoconf builds Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2018-10-31 19:16:00 +00:00
Emil Velikov	986033a275	configure: allow building with python3 Pretty much all of the scripts are python2+3 compatible. Check and allow using python3, while adjusting the PYTHON2 refs. Note: - python3.4 is used as it's the earliest supported version - python2 chosen prior to python3 v2: use python2 by default Cc: Ilia Mirkin <imirkin@alum.mit.edu> Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Acked-by: Eric Engestrom <eric.engestrom@intel.com>	2018-10-31 19:15:50 +00:00
Juan A. Suarez Romero	6d7d3dbda5	docs: update calendar, add new item and link release notes for 18.2.4 Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>	2018-10-31 19:58:00 +01:00
Juan A. Suarez Romero	5b074c756e	docs: add sha256 checksums for 18.2.4 Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com> (cherry picked from commit `624e384ea8`)	2018-10-31 19:55:28 +01:00
Juan A. Suarez Romero	7c2239aa55	docs: add release notes for 18.2.4 Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com> (cherry picked from commit `1cdef5e70c`)	2018-10-31 19:55:25 +01:00
Eric Engestrom	091da79bb0	meson: hide warnings from external project `gtest` gtest is an external project that is copied in this tree for technical reasons, but isn't maintained by us, so its warnings are irrelevant. Cc: Emil Velikov <emil.l.velikov@gmail.com> Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2018-10-31 18:20:25 +00:00
Eric Engestrom	455a3cd515	tools/imgui: disable all warnings This is an external project we have no control over, and will not be fixing (other than by sometimes pulling the latest sources), so warnings serve no purpose here. Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-10-31 16:28:33 +00:00
Alejandro Piñeiro	95b8da22cf	glspirv: no need to force entrypoint name to "main" Since commit "intel/compiler: Stop assuming the entrypoint is called "main"" there is no need to force the entrypoint name to be "main". Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-10-31 15:57:23 +01:00
Tapani Pälli	27f1298b9d	glsl/linker: validate attribute aliasing before optimizations Patch does a 'dry run' of assign_attribute_or_color_locations before optimizations to catch cases where we have aliasing of unused attributes which is forbidden by the GLSL ES 3.x specifications. We need to run this pass before unused attributes may be removed and with attribute binding information from program, therefore we re-use existing pass in linker rather than attempt to write another one. This fixes WebGL2 test 'gl-bindAttribLocation-aliasing-inactive' and Piglit test 'gles-3.0-attribute-aliasing'. Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106833 Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-10-31 14:53:47 +02:00
Eric Engestrom	a96749b13c	egl: drop EGL driver `name` This is a revert of Marek's `2cb9ab53dd` revert. It was needed to revert the previous commit, and didn't have any issue itself. -- The "DRI2" name was reported as confusing when printing EGL infos (one user reported thinking DRI3 was not working on his X server), and the only alternative is Haiku, which can only be used on a Haiku machine. The name therefore doesn't add any information that the user wouldn't know already, so let's just drop it. Suggested-by: Emil Velikov <emil.l.velikov@gmail.com> Related-to: `b174a1ae72` ("egl: Simplify the "driver" interface") Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com> Reviewed-by: Kai Wasserbäch <kai@dev.carbon-project.org> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2018-10-31 11:01:54 +00:00
Eric Engestrom	cb0980e69a	egl: move alloc & init out of _eglBuiltInDriver{DRI2,Haiku} This is a revert of Marek's `84f3afc2e1` revert, with a missing line added back. I failed a rebase and dropped that crucial line, and didn't do a runtime test after my rebase, and as a result broke EGL for everyone. This commit has been tested by Intel's CI and I re-read it once more, so it should be good this time. -- Note: dropping the EGL_BAD_ALLOC in egl_haiku because it's overwritten by the EGL_NOT_INITIALIZED in eglInitialize(). Signed-off-by: Eric Engestrom <eric@engestrom.ch> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2018-10-31 11:01:54 +00:00
Christian Gmeiner	21d9b78289	Revert "imx: make use of loader_open_render_node(..) helper" This reverts commit `773d6ea6e7`. Since kernel 4.17 (drm/etnaviv: remove the need for a gpu-subsystem DT node) the etnaviv DRM driver doesn't have an associated DT node anymore. This is technically correct, as the etnaviv device is a virtual device driving multiple hardware devices. Before 4.17 the userspace had access to the following information: DRIVER=etnaviv OF_NAME=gpu-subsystem OF_FULLNAME=/gpu-subsystem OF_COMPATIBLE_0=fsl,imx-gpu-subsystem OF_COMPATIBLE_N=1 MODALIAS=of:Ngpu-subsystemT<NULL>Cfsl,imx-gpu-subsystem DRIVER=imx-drm OF_NAME=display-subsystem OF_FULLNAME=/display-subsystem OF_COMPATIBLE_0=fsl,imx-display-subsystem OF_COMPATIBLE_N=1 After 4.17: DRIVER=etnaviv MODALIAS=platform:etnaviv The OF node has never been part of the etnaviv UABI, simply due to the fact that it's still possible to instantiate the etnaviv driver from a platform file, instead of a devicetree node. A patch set to fix this problem was sent out [1] but it looks like a proper solution needs more time to bake. [1] https://lists.freedesktop.org/archives/dri-devel/2018-October/194651.html Suggested-by: Emil Velikov <emil.l.velikov@gmail.com> Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>	2018-10-31 09:41:26 +01:00
Samuel Pitoiset	9ef8ea1451	radv: use WAIT_REG_MEM_GREATER_OR_EQUAL instead of a magic value Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Dave Airlie <airlied@redhat.com>	2018-10-31 09:21:28 +01:00
Samuel Pitoiset	a9a56f47f8	radv: use pool->stride when calling radv_query_shader() Not needed to recompute the stride. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Dave Airlie <airlied@redhat.com>	2018-10-31 09:21:28 +01:00
Samuel Pitoiset	e60ab66e33	radv: rename some parameters in Cmd{Begin,End}TransformFeedbackEXT() To match latest spec. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Dave Airlie <airlied@redhat.com>	2018-10-31 09:21:28 +01:00
Samuel Pitoiset	57982b683b	radv/winsys: do not assign last submission when chained path failed I don't think we want to wait for something that hasn't been correctly submitted. This is similar to the fallback path. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Dave Airlie <airlied@redhat.com>	2018-10-31 09:21:28 +01:00
Samuel Pitoiset	ae3aecd07f	radv/winsys: fix buffer deletion in the sysmem path In case we failed to submit the CS correctly. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Dave Airlie <airlied@redhat.com>	2018-10-31 09:21:28 +01:00
Samuel Pitoiset	72877865d9	radv/winsys: cleanup the chained submission path Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Dave Airlie <airlied@redhat.com>	2018-10-31 09:21:28 +01:00
Samuel Pitoiset	d12dd16a97	radv/winsys: remove unused surface_best() Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Dave Airlie <airlied@redhat.com>	2018-10-31 09:21:28 +01:00
Jason Ekstrand	d3a0d8b750	intel/compiler: Stop assuming the entrypoint is called "main" This isn't true for Vulkan so we have to whack it to "main" in anv which is silly. Instead of walking the list of functions and asserting that everything is named "main" and hoping there's only one function named "main", just use the nir_shader_get_entrypoint() helper which has better assertions anyway. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-10-30 20:14:52 -05:00
Timothy Arceri	31596836fc	st/glsl_to_nir: fix next_stage gathering ffs() just returns the bit that is set, we need to know what stage that bit represents so use u_bit_scan() instead. Fixes: `2ca5d9548f` ("st/glsl_to_nir: gather next_stage in shader_info") Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-10-31 09:33:17 +11:00
Timothy Arceri	9ec4a5ef29	st/mesa: calculate buffer size correctly for packed uniforms Fixes: `edded12376` ("mesa: rework ParameterList to allow packing") Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-10-31 09:32:41 +11:00
Dylan Baker	fb02bd3d1c	util: move u_cpu_detect to util CC: vlee@freedesktop.org Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107870 Fixes: `80825abb5d` ("move u_math to src/util") Tested-by: Brian Paul <brianp@vmware.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-10-30 14:32:52 -07:00
Dylan Baker	37db383abb	util: Move u_debug to utils Tested-by: Brian Paul <brianp@vmware.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-10-30 14:32:52 -07:00
Dylan Baker	2fd5dff7e7	util: Move os_misc to util this is needed by u_debug Tested-by: Brian Paul <brianp@vmware.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-10-30 14:32:52 -07:00
Dylan Baker	f1f104e548	gallium/util: remove u_inlines.h from u_debug.c It's not used, and I'm not pulling u_inlines into src/util. Tested-by: Brian Paul <brianp@vmware.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-10-30 14:32:52 -07:00
Dylan Baker	59d494c1cc	gallium/util: remove p_format.h from u_debug.h Tested-by: Brian Paul <brianp@vmware.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-10-30 14:32:52 -07:00
Dylan Baker	314777e86a	gallium/util: move memory debug declarations into u_debug_gallium Tested-by: Brian Paul <brianp@vmware.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-10-30 14:32:52 -07:00
Dylan Baker	68074dfa0e	gallium/util: move debug_print_transfer_flags to u_debug_gallium This also appears to be unused. Tested-by: Brian Paul <brianp@vmware.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-10-30 14:32:52 -07:00
Dylan Baker	fc39dc9841	gallium/util: move debug_print_bind_flags to u_debug_gallium This also appears to be unused. Tested-by: Brian Paul <brianp@vmware.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-10-30 14:32:52 -07:00
Dylan Baker	e4f1fea821	gallium/util: move debug_print_usage_enum to the u_debug_gallium This isn't used in mesa, maybe vmware uses this in a closed source state tracker? Tested-by: Brian Paul <brianp@vmware.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-10-30 14:32:52 -07:00
Dylan Baker	078b3cdb34	gallium/util: start splitting u_debug into generic and gallium specific components In order to pull u_debug into src/util we need to break the generically useful bits from the bits that are tightly coupled to gallium. Tested-by: Brian Paul <brianp@vmware.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-10-30 14:32:52 -07:00
Dylan Baker	389d59c72a	gallium: split u_prim_name out of u_debug.h This allows us to pull u_prim.h out of u_debug.h Tested-by: Brian Paul <brianp@vmware.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-10-30 14:32:52 -07:00
Andre Heider	25a3ce97d5	gallium/hud: fix power sensor readings for amdgpu users amdgpu doesn't use the INPUT but the AVERAGE subfeature: $ sensors -u amdgpu-pci-0100 Adapter: PCI adapter power1: power1_average: 17.233 power1_cap: 180.000 Signed-off-by: Andre Heider <a.heider@gmail.com> Signed-off-by: Marek Olšák <marek.olsak@amd.com>	2018-10-30 16:30:32 -04:00
Rhys Perry	5172eb231d	glsl_to_tgsi: don't create 64-bit integer MAD/FMA TGSI has no I64MAD/U64MAD opcode. Fixes: `278580729a` ('st/glsl_to_tgsi: add support for 64-bit integers') Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-10-30 20:27:12 +00:00
Marek Olšák	26cb93e229	radeonsi: add support for Raven2 (v2) v2: fix enabling primitive binning Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2018-10-30 16:03:02 -04:00
Marek Olšák	0dea85928e	radeonsi: clean up decompress flags in fast color clear	2018-10-30 16:03:02 -04:00
Marek Olšák	99835fff08	radeonsi/gfx9: set optimal OVERWRITE_COMBINER_WATERMARK	2018-10-30 16:03:02 -04:00
Marek Olšák	8ad12c8bec	gallium: rework PIPE_HANDLE_USAGE_* flags Only radeonsi uses them, so adjust them to match its needs.	2018-10-30 16:03:02 -04:00
Danylo Piliaiev	00fc56a68d	anv: Disable dual source blending when shader doesn't support it on gen8+ Dual source blending behaviour is undefined when shader doesn't have second color output. "If SRC1 is included in a src/dst blend factor and a DualSource RT Write message is not used, results are UNDEFINED. (This reflects the same restriction in DX APIs, where undefined results are produced if “o1” is not written by a PS – there are no default values defined)." Dismissing fragment in such situation leads to a hang on gen8+ if depth test is enabled. Since blending cannot be gracefully fixed in such case and the result is undefined - blending is simply disabled. v2 (Jason Ekstrand): - Apply the workaround to each individual entry - Emit a warning through debug_report Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-10-30 12:59:53 -07:00
Danylo Piliaiev	eca4a6548d	i965: Disable dual source blending when shader doesn't support it on gen8+ Dual source blending behaviour is undefined when shader doesn't have second color output, dismissing fragment in such situation leads to a hang on gen8+ if depth test is enabled. Since blending cannot be gracefully fixed in such case and the result is undefined - blending is simply disabled. v2 (Kenneth Graunke): - Listen to BRW_NEW_FS_PROG_DATA in 3DSTATE_PS_BLEND - Also whack BLEND_STATE[] to keep the two in sync, since we're not sure exactly which copy of the redundant info the hardware will use. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107088 Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-10-30 12:59:53 -07:00
Kenneth Graunke	337a808062	i965: Respect GL_TEXTURE_SRGB_DECODE_EXT in GenerateMipmaps() Apparently, we're supposed to look at the texture object's built-in sampler object's sRGB decode setting in order to decide whether to decode/downsample/re-encode, or simply downsample as-is. Previously, I had always done the decoding/encoding. Fixes SKQP's Skia_Unit_Tests.SRGBMipMaps test. Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2018-10-30 12:59:53 -07:00
Andrii Simiklit	e4e0fd5ffe	i965/batch: don't ignore the 'brw_new_batch' call for a 'new batch' If we restore the 'new batch' using 'intel_batchbuffer_reset_to_saved' function we must restore the default state of the batch using 'brw_new_batch' function because the 'intel_batchbuffer_flush' function will not do it for the 'new batch' again. At least the following fields of the batch 'state_base_address_emitted', 'aperture_space', 'state_used' should be restored to default values to avoid: 1. the aperture_space overflow 2. the missed STATE_BASE_ADDRESS command in the batch 3. the memory overconsumption of the 'statebuffer' due to uncleared 'state_used' field. etc. v2: merge with new commits, changes were minimized, added the 'fixes' tag v3: added to the patch series Fixes: `3faf56ffbd` "intel: Add an interface for saving/restoring the batchbuffer state." Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107626 Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-10-30 12:09:17 -07:00
Andrii Simiklit	a9031bf9b5	i965/batch: avoid reverting batch buffer if saved state is empty There's no point reverting to the last saved point if that save point is the empty batch, we will just repeat ourselves. CC: Chris Wilson <chris@chris-wilson.co.uk> Fixes: `3faf56ffbd` "intel: Add an interface for saving/restoring the batchbuffer state." Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107626 Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-10-30 12:09:09 -07:00
Eric Engestrom	ea738a91de	egl: add messages to a few assert() and turn a couple into unreachable() Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2018-10-30 18:10:59 +00:00
Eric Engestrom	d0d6ec549d	util: s/0/NULL/ for pointer Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2018-10-30 18:10:59 +00:00
Eric Engestrom	5c64847322	i965: add missing case to fix -Wswitch While at it, turn "unreachable" assert() into unreachable(). Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2018-10-30 18:10:59 +00:00
Eric Engestrom	2894e278cf	mesa: fix struct/class mismatch Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2018-10-30 18:10:59 +00:00
Eric Engestrom	6000895e2d	mesa: fix memcpy() and memset(0) of non-trivial structs Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2018-10-30 18:10:59 +00:00
Eric Engestrom	69eb6d58e8	nouveau: remove unused class member Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>	2018-10-30 18:10:59 +00:00
Eric Engestrom	6f9309d5d4	scons: drop unused HAVE_STDINT_H macro This was required back when MSVC didn't support C99 and was missing this header, but since MSVC 2013 (or maybe earlier?) it does, and this code isn't doing anything anymore. Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2018-10-30 18:10:59 +00:00
Eric Engestrom	a18d726621	aub_viewer: show vertex buffer pitch Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-10-30 18:10:59 +00:00
Eric Engestrom	0bbee28a3b	meson: add note about intel tools build options Fixes: `ea83a1d304` "intel: tools: import ImGui" Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2018-10-30 18:10:59 +00:00
Eric Engestrom	4a266d01a7	vl: drop left-over variable Fixes: `6ccc435e7a` "pipe-loader: move dup(fd) within pipe_loader_drm_probe_fd" Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2018-10-30 18:10:59 +00:00
Eric Anholt	68657d76b9	vc4: Fix unused variable warning. Fixes: `bb84fa146f` ("util: use C99 declaration in the for-loop hash_table_foreach() macro")	2018-10-30 10:46:52 -07:00
Eric Anholt	cc54e1acf9	v3d: Use nir_remove_unused_io_vars to handle binner shader output DCE We were doing this late after nir_lower_io, but we can just reuse the core code. By doing it at this stage, we won't even set up the VS attributes as inputs, reducing our VPM size.	2018-10-30 10:46:52 -07:00
Eric Anholt	c152c79d5e	v3d: Only add output slot tracking for the current varying slot. We always emit 4 slots per slot because things like color output and position processing in the epilogue will potentially look up more values than the variable declaration had. However, when we get a .location_frac != 0, we don't want to overwrite components of the following .driver_location.	2018-10-30 10:46:52 -07:00
Eric Anholt	17c8198952	v3d: Use nir_lower_io_to_scalar_early to DCE unused VS input components. This lets us trim unused trailing components in the vertex attributes, reducing the size of our VPM allocations.	2018-10-30 10:46:52 -07:00
Eric Anholt	fc85f7cfdc	v3d: Don't rely on sorting input vars for VPM read setup. For supporting scalar VPM i/o at the NIR level, we need to do a pass over the vars to figure out how big each attribute is after DCE. Once we've done that, we can just walk over c->vattr_sizes[] instead of bothering with vars.	2018-10-30 10:46:52 -07:00
Eric Anholt	cc78676030	v3d: Split out NIR input setup between FS and VPM. They don't share much code, and I'm about to rewrite the remaining shared code for the VPM case.	2018-10-30 10:46:52 -07:00
Eric Anholt	8265dfaa87	nir: Allow using nir_lower_io_to_scalar_early on VS input vars. This will be used on V3D to cut down the size of the VS inputs in the VPM (memory area for sharing data between shader stages). Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-10-30 10:46:52 -07:00
Jason Ekstrand	f48b742289	anv: Bump the advertised patch version to 90 Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-10-30 11:43:43 -05:00
Emil Velikov	29283921b7	m4: add Werror when checking for compiler flags Seemingly, at some point clang started accepting _any_ flags, whereas previously it would error out. These days, you can give it -Whamsandwich and it will succeed, while at the same time throwing an annoying warning. Add -Werror so that everything gets flagged and set accordingly. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108082 Cc: Vinson Lee <vlee@freedesktop.org> Reported-by: Vinson Lee <vlee@freedesktop.org> Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2018-10-30 16:41:05 +00:00
Dylan Baker	a8bed38b54	docs/calendar: Add 18.3 plan and expand 18.2 Emil will be helping out with 18.3, while Juan finalises 18.2 v2: [Emil] add Emil for 18.3, fix typos CC: Emil Velikov <emil.velikov@collabora.com> CC: Juan A. Romero Suarez <jasuarez@igalia.com> Cc: Dylan Baker <dylan@pnwbakers.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com> Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>	2018-10-30 16:35:58 +00:00
Emil Velikov	c210d0c3b7	vulkan/wsi: use the drmGetDevice2() API On older kernels, the drmGetDevice() call will wake up all the GPUs on the system, while fetching the PCI revision. Use the 2 version of the API and pass flags == 0, so we don't fetch the device PCI revision, since we don't need that information. Fixes: `baa38c144f` ("vulkan/wsi: Use VK_EXT_pci_bus_info for DRM fd matching") Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-10-30 16:35:50 +00:00
Jason Ekstrand	a45b6fb452	spirv: Pass SSA values through functions Previously, we would create temporary variables and fill them out. Instead, we create as many function parameters as we need and pass them through as SSA defs. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2018-10-30 11:22:44 -05:00
Mauro Rossi	bfe0e32913	android: i965/tiled_memcpy: fix build for x86 generic target The x86 32-bit generic target does not enable ARCH_X86_HAVE_SSE4_1; for this reason, all Android library modules using SSE4_1 in mesa are built conditionally on ARCH_X86_HAVE_SSE4_1. The same approach is now applied to libmesa_intel_tiled_memcpy_sse41 in order to avoid the following build errors: external/mesa/src/mesa/drivers/dri/i965/intel_tiled_memcpy.c:574:15: error: initializing '__m128i' (vector of 2 'long long' values) with an expression of incompatible type 'int' __m128i val = _mm_stream_load_si128((__m128i *)src); ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ external/mesa/src/mesa/drivers/dri/i965/intel_tiled_memcpy.c:578:15: error: initializing '__m128i' (vector of 2 'long long' values) with an expression of incompatible type 'int' __m128i val0 = _mm_stream_load_si128(((__m128i *)src) + 0); ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ external/mesa/src/mesa/drivers/dri/i965/intel_tiled_memcpy.c:579:15: error: initializing '__m128i' (vector of 2 'long long' values) with an expression of incompatible type 'int' __m128i val1 = _mm_stream_load_si128(((__m128i *)src) + 1); ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ external/mesa/src/mesa/drivers/dri/i965/intel_tiled_memcpy.c:580:15: error: initializing '__m128i' (vector of 2 'long long' values) with an expression of incompatible type 'int' __m128i val2 = _mm_stream_load_si128(((__m128i *)src) + 2); ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ external/mesa/src/mesa/drivers/dri/i965/intel_tiled_memcpy.c:581:15: error: initializing '__m128i' (vector of 2 'long long' values) with an expression of incompatible type 'int' __m128i val3 = _mm_stream_load_si128(((__m128i *)src) + 3); ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 5 errors generated. 
Fixes: `11b1afdc92` ("i965/tiled_memcpy: inline movntdqa loads in tiled_to_linear") Signed-off-by: Mauro Rossi <issor.oruam@gmail.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2018-10-30 14:45:16 +02:00
Toni Lönnberg	50e952840f	intel: tools: Add handling for video pipe Preliminary work for adding handling of different pipes to gen_decoder. We need to be able to distinguish between different pipes in order to decode the packets correctly due to opcode re-use. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-10-30 12:43:00 +00:00
Toni Lönnberg	d5a938c58d	intel/decoder: Use 'DWord Length' and 'bias' fields for packet length. Use the 'DWord Length' and 'bias' fields from the instruction definition to parse the packet length from the command stream when possible. The hardcoded mechanism is used whenever an instruction doesn't have this field. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-10-30 12:43:00 +00:00
Marek Olšák	a09cbaffbf	mesa: expose EXT_texture_compression_s3tc on GLES The spec was modified to support GLES. Tested-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>	2018-10-30 13:31:00 +01:00
Michał Janiszewski	2734baa9e2	mesa: Add missing include guards Signed-off-by: Michał Janiszewski <janisozaur+signed@gmail.com> Reviewed-by: Brian Paul <brianp@vmware.com>	2018-10-30 06:19:10 -06:00
Michał Janiszewski	ec994ca0fc	glx: Add missing include guards Signed-off-by: Michał Janiszewski <janisozaur+signed@gmail.com> Reviewed-by: Brian Paul <brianp@vmware.com>	2018-10-30 06:19:10 -06:00
Michał Janiszewski	8ebd7039c4	svga: Add missing include guards Signed-off-by: Michał Janiszewski <janisozaur+signed@gmail.com> Reviewed-by: Brian Paul <brianp@vmware.com>	2018-10-30 06:19:09 -06:00
Michał Janiszewski	0654450911	glsl: Add missing include guards Signed-off-by: Michał Janiszewski <janisozaur+signed@gmail.com> Reviewed-by: Brian Paul <brianp@vmware.com>	2018-10-30 06:19:09 -06:00
Eric Engestrom	fddf384d1d	intel/batch-decoder: remove never-used function This function was there when the file was introduced in commit `38f10d5a03` "intel: tools: add aubinator viewer", but was never actually used. Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-10-30 10:59:43 +00:00
Eric Engestrom	e9fb81375a	st/dri: remove leftover local variable Left over from the cleanup in `6ccc435e7a` "pipe-loader: move dup(fd) within pipe_loader_drm_probe_fd" Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2018-10-30 10:20:58 +00:00
Vadym Shovkoplias	7d66eddbbd	glsl/linker: Fix out variables linking during single stage Since out variables are copied from the shader objects' instruction streams to the linked shader instruction stream, they should be cloned first to keep the source instruction stream unaltered. Fixes: `966a797e43` ("glsl/linker: Link all out vars from a shader objects on a single stage") Signed-off-by: Vadym Shovkoplias <vadym.shovkoplias@globallogic.com> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105731	2018-10-30 10:19:17 +11:00
Marek Olšák	8676af12c8	ac: fix ac_build_fdiv for f64 trivial Fixes: `a5f35aa742`	2018-10-29 17:24:21 -04:00
Brian Paul	9007c0ed26	nir: fix yet another MSVC build break Trivial.	2018-10-29 11:15:12 -06:00
Eric Engestrom	f3a5757eba	vulkan/wsi: simplify meson file tracking Meson already automatically tracks included headers, so there's no need to add them everywhere; cleans up the code a bit. Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2018-10-29 16:39:47 +00:00
Eric Engestrom	1df0c1e8fb	clover: add missing meson build dependency Fixes: `42ea0631f1` "meson: build clover" Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2018-10-29 16:39:42 +00:00
Eric Engestrom	98e7c3e7a7	svga: add missing meson build dependency Fixes: `a537231b22` "meson: build svga driver on linux" Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2018-10-29 16:39:38 +00:00
Eric Engestrom	912cd0ce3b	radv: add missing meson build dependency Fixes: `9d40ec2cf6` "radv: Add support for VK_KHR_driver_properties." Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2018-10-29 16:39:34 +00:00
Eric Engestrom	2be1f9ceba	anv: add missing meson build dependency Fixes: `e4538b93f5` "anv: Implement VK_KHR_driver_properties" Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2018-10-29 16:39:07 +00:00
Samuel Pitoiset	b4eb029062	radv: implement VK_EXT_transform_feedback This implementation should work and potential bugs can be fixed during the release candidates window anyway. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Dave Airlie <airlied@redhat.com>	2018-10-29 17:10:58 +01:00
Samuel Pitoiset	f8d0337299	radv: add multiple streams support for the GS copy shader Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Dave Airlie <airlied@redhat.com>	2018-10-29 17:09:08 +01:00
Samuel Pitoiset	6c21645046	radv: emit stream outputs for vertex and tessellation stages Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Dave Airlie <airlied@redhat.com>	2018-10-29 17:09:08 +01:00
Samuel Pitoiset	19f1b49236	radv: declare streamout SGPRs Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Dave Airlie <airlied@redhat.com>	2018-10-29 17:09:08 +01:00
Samuel Pitoiset	f4fa8de794	radv: gather stream output info Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Dave Airlie <airlied@redhat.com>	2018-10-29 17:09:08 +01:00
Samuel Pitoiset	fe551ec122	radv: allow to emit a vertex to a specified stream This is required for GS multiple streams support. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Dave Airlie <airlied@redhat.com>	2018-10-29 17:09:08 +01:00
Samuel Pitoiset	a59f1b06ef	radv: allow to use up to 4 GSVS ring buffers For all streams. We basically just need to update the base address and compute a stride for every stream. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Dave Airlie <airlied@redhat.com>	2018-10-29 17:09:08 +01:00
Samuel Pitoiset	98c09c3fcd	radv: adjust the number of output components per stream Same as the previous patch, except that it is only the number of components. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Dave Airlie <airlied@redhat.com>	2018-10-29 17:09:08 +01:00
Samuel Pitoiset	4649471a9e	radv: adjust the GSVS ring sizes based on the number of components For multiple streams support we have to set the different ring buffer sizes correctly. This relies on the number of output components per stream. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Dave Airlie <airlied@redhat.com>	2018-10-29 17:09:08 +01:00
Samuel Pitoiset	8e428e24a8	radv: gather which GS stream is used for every output To only emit outputs for the given stream. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Dave Airlie <airlied@redhat.com>	2018-10-29 17:09:08 +01:00
Samuel Pitoiset	dd996d1885	radv: gather the number of output components per stream This will be also used for splitting the GS->VS ring buffer. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Dave Airlie <airlied@redhat.com>	2018-10-29 17:09:08 +01:00
Samuel Pitoiset	87e6866b04	radv: gather the number of streams used by geometry shaders This will be used for splitting the GS->VS ring buffer. The stream ID is always 0 for now. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Dave Airlie <airlied@redhat.com>	2018-10-29 17:09:08 +01:00
Jason Ekstrand	19064b8c3a	nir: Add a pass for gathering transform feedback info This is different from the GL_ARB_spirv pass because it generates a much simpler data structure that isn't tied to OpenGL and mtypes.h. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2018-10-29 17:09:08 +01:00
Jason Ekstrand	e8a5fa054d	vulkan: Update the XML and headers to 1.1.90 This doesn't include any new features but it does include an XML and header typo fix for modifiers. Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2018-10-29 10:17:19 -05:00
Samuel Pitoiset	9e56ffb0b4	radv: remove wrong comment in calculate_gs_ring_sizes() about streams The computation seems correct compared to RadeonSI. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Dave Airlie <airlied@redhat.com>	2018-10-29 12:33:58 +01:00
Rob Clark	a61952e737	freedreno: don't flush when new and old pfb is identical In the 'inorder' case (ie. FD_MESA_DEBUG=inorder, or old kernel), if the u_blitter clear path is used (a3xx, a4xx, and some fallback cases on newer gens), util_blitter_restore_fb_state() will set_framebuffer_state() to something that is identical to the current fb state, which triggers an unnecessary flush, and then eventually an assert: (gdb) bt #0 0x0000007fbf24a078 in kill () from /lib64/libc.so.6 #1 0x0000007fbe061278 in _debug_assert_fail (expr=0x7fbe93a820 "!batch->flushed", file=0x7fbe93a628 "../src/gallium/drivers/freedreno/freedreno_batch.c", line=491, function=0x7fbe93a990 <__func__.17380> "fd_batch_check_size") at ../src/gallium/auxiliary/util/u_debug.c:322 #2 0x0000007fbe1ccb8c in fd_batch_check_size (batch=0x55556d5a70) at ../src/gallium/drivers/freedreno/freedreno_batch.c:491 #3 0x0000007fbe1d0e08 in fd_clear (pctx=0x55555c61e0, buffers=5, color=0x55556e388c, depth=1, stencil=0) at ../src/gallium/drivers/freedreno/freedreno_draw.c:463 #4 0x0000007fbe57afa4 in st_Clear (ctx=0x55556e17b0, mask=18) at ../src/mesa/state_tracker/st_cb_clear.c:452 The assert was introduced in `4b847b38ae`, so from a functionality standpoint this patch fixes that commit. But it should also avoid an unnecessary flush in the 'inorder' case, fixing a performance bug. Fixes: `4b847b38ae` freedreno: make fd_batch a one-shot thing Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-10-28 14:03:38 -04:00
Rob Clark	32dd75b927	freedreno: dependency tracking for z/s depends on ZSA state ZSA state can change whether depth or stencil is enabled This plus previous patch fix stk, and various things w/ FD_MESA_DEBUG=inorder Fixes: `ec717fc629` freedreno: reduce resource dependency tracking overhead Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-10-28 14:03:38 -04:00
Rob Clark	05e868925c	freedreno: mark all state dirty after switching batch The problem isn't directly with `ec717fc629` but rather that commit exposes the problem. When we switch batch we cannot assume previous state is clean so we should mark all state dirty. Fixes: `ec717fc629` freedreno: reduce resource dependency tracking overhead Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-10-28 14:03:38 -04:00
Jason Ekstrand	1bd4f8fefc	anv: Use absolute timeouts in wait_for_bo_fences We were previously using relative timeouts and decrementing the user-provided timeout as we waited. Instead, this commit refactors things to use absolute timeouts throughout. This should fix a subtle bug in the waitAll case where we aren't decrementing the timeout after a successful GPU wait. Since pthread_cond_timedwait already takes an absolute timeout, it's also significantly simpler. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-10-27 16:18:33 -05:00
Jason Ekstrand	cbd4468695	anv: Flag semaphore BOs as external It probably doesn't actually break anything but it does cause some assertions in debug builds. Fixes: `7a89a0d9ed` "anv: Use separate MOCS settings for external BOs" Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-10-27 00:02:32 -05:00
Jason Ekstrand	663a113700	anv: Improve the asserts in anv_buffer_get_range Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-10-27 00:02:32 -05:00
Rob Clark	c41772d17a	freedreno/a6xx: inline draw_impl() Now that it is just called once per draw (instead of once for binning and once for draw), let's just inline it. If nothing else, it makes perf-annotate easier to look at. Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-10-26 18:10:00 -04:00
Rob Clark	604b5f1dca	freedreno/a6xx: small cleanup Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-10-26 18:10:00 -04:00
Rob Clark	2a74d9ae8d	freedreno/a6xx: move where we handle dirty vbo state Historically this wasn't in fdN_emit_state(), because prior to addition of blitter in a5xx, fdN_emit_state() was also used in the clear path. These days that is only true for a2xx (a3xx and a4xx use u_blitter). So the reason for it not to be in fd6_emit_state() no longer exists. Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-10-26 18:10:00 -04:00
Rob Clark	ddb7fadaf8	freedreno: avoid no-op flushes by re-using last-fence Noticed that with webgl (in chromium, at least) we end up generating a lot of no-op submits just to get a fence. Tracking the last fence and returning that if there is no rendering since last flush avoids this. Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-10-26 18:10:00 -04:00
Kristian H. Kristensen	01194cd582	freedreno/a6xx: Move stencil/depth/alpha state to IB Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>	2018-10-26 18:10:00 -04:00
Kristian H. Kristensen	a664dc2d59	freedreno/a6xx: Move stencil mask emit to FD_DIRTY_ZSA group Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>	2018-10-26 18:10:00 -04:00
Kristian H. Kristensen	3073926512	freedreno/a6xx: Rename FD6_GROUP_ZSA to FD6_GROUP_LRZ Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>	2018-10-26 18:10:00 -04:00
Kristian H. Kristensen	edc0f1b10f	freedreno/a6xx: Move rasterizer state to state object Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>	2018-10-26 18:10:00 -04:00
Kristian H. Kristensen	3264eb691a	freedreno/a6xx: Fix set_blit_scissor helper The scissor maxx/maxy are non-inclusive, so don't subtract one from framebuffer width and height. Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>	2018-10-26 18:10:00 -04:00
Kristian H. Kristensen	4222fe8af2	freedreno/a2xx: Squash a compiler warning We get a warning here for assigning a const char * pointer to char *swizzle in struct ir2_src_register. The constructor strdups a 4 byte string here, so just memcpy to that instead. Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>	2018-10-26 18:10:00 -04:00
Kristian H. Kristensen	4fd6265f42	freedreno/a6xx: Use fd6_emit_ib from a6xx Move it to a header and use it where possible to avoid vfunc call. Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>	2018-10-26 18:10:00 -04:00
Rob Clark	f3cc0d2747	freedreno: import libdrm_freedreno + redesign submit In the pursuit of lowering driver overhead, it became clear that some amount of redesign of how libdrm_freedreno constructs the submit ioctl would be needed. In particular, as the gallium driver is starting to make heavier use of CP_SET_DRAW_STATE state groups/objects, the overhead of tracking cmd buffers and relocs becomes too much. And for "streaming" state, which isn't ever reused (like uniform uploads) the overhead of allocating/freeing ringbuffer[1] objects is too high. This redesign makes two main changes: 1) Introduces a fd_submit object for tracking bos and cmds table for the submit ioctl, making ringbuffer objects more lightweight. This was previously done in the ringbuffer. But we have many ringbuffer instances involved in a submit (gmem + draw + potentially 1000's of state-group rbs), and only need a single bos and cmds table. (Reloc table is still per-rb) The submit is also a convenient place for a slab allocator for ringbuffer objects. Other options would have required locking because, while we can guarantee allocations will only happen on a single thread, free's could happen either on the application thread or the flush_queue thread. With the slab allocator in the submit object, any frees that happen on the flush_queue thread happen after we know that the application thread is done with the submit. 2) Introduce a new "softpin" msm_ringbuffer_sp implementation that does not use relocs and only has cmds table entries for IB1 (ie. the cmdstream buffers that kernel needs to CP_INDIRECT_BUFFER to from the RB). To do this properly will require some updates on the kernel side, so whether you get the softpin or legacy submit/ringbuffer implementation at runtime depends on your kernel version. To make all these changes in libdrm would basically require adding a libdrm_freedreno2, so this is a good point to just pull the libdrm code into mesa. Plus it allows for using mesa's hashtable, slab allocator, etc. And it lets us have asserts enabled for debug mesa builds but omitted for release builds. And it makes life easier if further API changes become necessary. At this point I haven't tried to pull in the kgsl backend. Although I left the level of vfunc indirection which would make it possible to have other backends. (And this was convenient to keep to allow for the "softpin" ringbuffer to coexist.) NOTE: if bisecting a build error takes you here, try a clean build. There are a bunch of ways things can go wrong if you still have libdrm_freedreno cflags. [1] "ringbuffer" is probably a bad name, the only level of cmdstream buffer that is actually a ring is RB managed by kernel. Userspace cmdstream is all IB1/IB2 and state-groups. Reviewed-by: Kristian H. Kristensen <hoegsberg@chromium.org> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-10-26 18:10:00 -04:00
Jason Ekstrand	aa02d7e878	Revert "anv/skylake: disable ForceThreadDispatchEnable" This reverts commit `0fa9e6d7b3`. The real issue appears to have been that HiZ ops don't like having WM thread dispatch force-enabled. The previous commit fixes that problem so we can go back to using the ForceThreadDispatchEnable bit even on SKL+. Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-10-26 16:39:47 -05:00
Jason Ekstrand	b6b2b27809	blorp: Emit a dummy 3DSTATE_WM prior to 3DSTATE_WM_HZ_OP Cc: mesa-stable@lists.freedesktop.org Suggested-by: Francisco Jerez <currojerez@riseup.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-10-26 16:39:35 -05:00
Axel Davy	2318ca68bb	st/nine: Handle window resize when a presentation buffer is used Usually when a window is resized, the app calls d3d to resize the back buffer to the window size. In some cases, it is not done, and it expects the output resizes to the window size, even if the back buffer size is unchanged. This patch introduces the behaviour when a presentation buffer is used. ID3DPresent_GetWindowInfo is a function available with D3DPresent v1.0, and thus we don't need to check if the function is available. The function had been introduced to implement this very feature. Signed-off-by: Axel Davy <davyaxel0@gmail.com>	2018-10-26 22:16:16 +02:00
Axel Davy	e50d374b61	d3dadapter: Fix wrong naming in header file GetWindowInfo used to be GetWindowSize before gallium nine was merged. A left-over remained... Signed-off-by: Axel Davy <davyaxel0@gmail.com>	2018-10-26 22:16:16 +02:00
Axel Davy	3d975e98e4	st/nine: Reduce MaxSimultaneousTextures to 8 Windows drivers don't set this flag (which affects ff) to more than 8. Do the same in case some games check for 8. v2: Remove any dependence on MaxSimultaneousTextures. For non-ff the number of textures is 16 when the device is capable of vs/ps3. Add this requirement of 16 textures to the driver requirements. Signed-off-by: Axel Davy <davyaxel0@gmail.com>	2018-10-26 22:16:16 +02:00
Axel Davy	739c700950	st/nine: Enable shadow mapping for ps 1.X We didn't implement shadow textures for ps 1.X, assuming the case couldn't happen... Well it does. Fixes: https://github.com/iXit/Mesa-3D/issues/261 Signed-off-by: Axel Davy <davyaxel0@gmail.com>	2018-10-26 22:16:16 +02:00
Axel Davy	847861aab4	st/nine: Do not set unused states for stateblocks A lot of these states are used only for the context, and are unused for stateblocks (which just uses the changed.* fields instead for a lot of them). Signed-off-by: Axel Davy <davyaxel0@gmail.com>	2018-10-26 22:16:16 +02:00
Axel Davy	6f373b9b74	st/nine: Fix aliasing states for stateblocks If NINE_STATE_FF_MATERIAL is set, the stateblock will upload its recorded materials matrix. If NINE_STATE_FF_LIGHTING is set, the lighting set is uploaded. These flags could be set by a NineDevice9_SetTransform call or by setting some states related to ff, but that shouldn't trigger these stateblock behaviours. We don't need to follow the context states dirtied by render states. NINE_STATE_FF_VSTRANSF is exactly the state controlling stateblock updates of transformation matrices, NINE_STATE_FF is too broad. These two changes avoid setting the two mentioned states when we shouldn't. Fixes: https://github.com/iXit/Mesa-3D/issues/320 Signed-off-by: Axel Davy <davyaxel0@gmail.com>	2018-10-26 22:16:16 +02:00
Axel Davy	454201b452	st/nine: Never update device changed.* fields The device state changed.* field are never used. These fields are used only for stateblocks. Avoid setting them at all for clarity. Signed-off-by: Axel Davy <davyaxel0@gmail.com>	2018-10-26 22:16:16 +02:00
Axel Davy	2594b2efdc	st/nine: Capture also default matrices for D3DSBT_ALL We avoid allocating space for never used matrices. However we must do as if we had captured them. Thus when a D3DSBT_ALL stateblock apply has fewer matrices than device state, allocate the default matrices for the stateblock before applying. Signed-off-by: Axel Davy <davyaxel0@gmail.com>	2018-10-26 22:16:16 +02:00
Axel Davy	bbeddb801e	st/nine: Mark transform matrices dirty for D3DSBT_ALL D3DSBT_ALL stateblocks capture the transform matrices. Fixes some d3d test programs not displaying properly. Signed-off-by: Axel Davy <davyaxel0@gmail.com>	2018-10-26 22:16:16 +02:00
Axel Davy	a4e9bbb8f8	st/nine: Don't update unused world matrices While to the application we have to track accurately all 256 world matrices (including in stateblocks), hw vertex processing enables to set a limit to the number of world matrices the hardware can access to in the advertised caps, which is 8 for nine. Thus don't bother in the stateblock code to send the updated values for the unreachable matrices. Signed-off-by: Axel Davy <davyaxel0@gmail.com>	2018-10-26 22:16:16 +02:00
Axel Davy	2e51c4c7cc	st/nine: Remove two unused states. NINE_STATE_MATERIAL was used incorrectly at one location. Replace it with the correct state. Signed-off-by: Axel Davy <davyaxel0@gmail.com>	2018-10-26 22:16:16 +02:00
Axel Davy	cb8ea21e1c	st/nine: Remove commented nine_context_apply_stateblock At some point the project was to adapt the commented version to csmt. The csmt rework enabled to fix some state aliasing issues between stateblocks and internal state updates. The commented version needs a lot of work to work with that. Just drop it. Signed-off-by: Axel Davy <davyaxel0@gmail.com>	2018-10-26 22:16:16 +02:00
Brian Paul	7e64e39f8b	nir: Fix array initializer Empty initializer is not standard C. This fixes MSVC build. Trivial.	2018-10-26 12:35:48 -06:00
Jason Ekstrand	07eb8e7466	anv: Return VK_ERROR_DEVICE_LOST from anv_device_set_lost This lets us get rid of a bunch of duplicated error messages. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2018-10-26 13:27:21 -05:00
Jason Ekstrand	ade22ae1ac	anv/util: Split a vk_errorv helper out of vk_errorf Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2018-10-26 13:27:21 -05:00
Brian Paul	d6be0b5556	scons/svga: remove opt from the list of valid build types This reverts commit `a5fd54f8bf`. The whole point was to add a way to pass -DVMX86_STATS to the build, but we can do that with a command line argument when we invoke scons. Reviewed-by: José Fonseca <jfonseca@vmware.com>	2018-10-26 12:09:00 -06:00
Nanley Chery	5bcf479524	intel/blorp: Define the clear value bounds for HiZ clears Follow the restriction of making sure the clear value is between the min and max values defined in CC_VIEWPORT. Avoids a simulator warning for some piglit tests, one of them being: ./bin/depthstencil-render-miplevels 146 d=z32f_s8 Jason found this to fix incorrect clearing on SKL. Fixes: `09948151ab` ("intel/blorp: Add the BDW+ optimized HZ_OP sequence to BLORP") Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Tested-by: Jason Ekstrand <jason@jlekstrand.net>	2018-10-26 10:34:07 -07:00
Eric Engestrom	285ebc84c7	radv: remove duplicate brackets in version string MESA_GIT_SHA1 resolves to either an empty "" string if not built from git, or " (git-DEADBEEF)" if it is. No need to wrap it in additional "()". Fixes: `9d40ec2cf6` "radv: Add support for VK_KHR_driver_properties." Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-10-26 18:33:11 +01:00
Eric Engestrom	738f0f789b	vulkan: drop always-true param Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-10-26 18:33:11 +01:00
Boyuan Zhang	f4126cfaab	radeon/vcn: use util function to get h264 profile idc Use utility function for converting h264 pipe video profile to profile idc, instead of using array. Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com> Acked-by: Christian König <christian.koenig at amd.com>	2018-10-26 13:23:06 -04:00
Boyuan Zhang	55cf565698	radeon/vce: use util function to get h264 profile idc Use utility function for converting h264 pipe video profile to profile idc, instead of using array. Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com> Acked-by: Christian König <christian.koenig at amd.com>	2018-10-26 13:23:06 -04:00
Boyuan Zhang	b15d0200a9	vl: get h264 profile idc Adding a function for converting h264 pipe video profile to profile idc Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com> Acked-by: Christian König <christian.koenig at amd.com>	2018-10-26 13:23:06 -04:00
Jason Ekstrand	5cdeefe057	intel/nir: Use the OPT macro for more passes Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2018-10-26 11:45:29 -05:00
Jason Ekstrand	18fb2c5d92	spirv: Initialize subgroup destinations with the destination type Instead of initializing them manually, just use the type that we already have sitting there. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2018-10-26 11:45:29 -05:00
Jason Ekstrand	8fa70cfcfd	spirv: Use the right bit-size for spec constant ops Previously, we would always pull the bit size from the destination which is wrong for opcodes like nir_ilt where the sources are variable-sized but the destination is a fixed size. We were getting lucky before because nir_op_ilt returns a 32-bit value and basically everyone who uses spec constants uses 32-bit ones. Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2018-10-26 11:45:29 -05:00
Jason Ekstrand	1d2ed694c1	nir/prog: Use nir_bany in kill handling We have a helper that does exactly what the bany_inequal was doing. It emits the same code but is a bit higher level and is designed to operate on a bvec4. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2018-10-26 11:45:29 -05:00
Jason Ekstrand	2fe3031440	glsl/nir: Use i2b instead of ine for fixing UBO/SSBO Booleans They do the same thing in the end but i2b is a bit simpler. Also, let's clean up the mess of code for SSBO handling with one line of builder. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2018-10-26 11:45:29 -05:00
Jason Ekstrand	5bfce5fcc2	nir/system_values: Use the bit size from the load_deref This isn't a great solution for bit-sizes but we don't have a particularly convenient way to get a bit size from the system value enum and this keeps the lowering pass from changing it. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2018-10-26 11:45:29 -05:00
Jason Ekstrand	a3b4cb3458	nir/opt_if: Rework condition propagation Instead of doing our own constant folding, we just emit instructions and let constant folding happen. This is substantially simpler and lets us use the nir_imm_bool helper instead of dealing with the const_value's ourselves. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-10-26 11:45:29 -05:00
Jason Ekstrand	4cd8a58595	nir/search: Use the nir_imm_* helpers from nir_builder This requires that we rework the interface a bit to use nir_builder but that's a nice little modernization anyway. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2018-10-26 11:45:29 -05:00
Jason Ekstrand	6e32115bd6	nir/builder: Handle 16-bit floats in nir_imm_floatN_t Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2018-10-26 11:45:29 -05:00
Jason Ekstrand	ff45649bc2	nir/builder: Add nir_imm_true/false helpers Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2018-10-26 11:45:29 -05:00
Jason Ekstrand	249e32ab17	nir/constant_folding: Use nir_src_as_bool for discard_if Missed one while converting to the nir_src_as_* helpers. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2018-10-26 11:45:29 -05:00
Jason Ekstrand	6de1869e86	nir/constant_folding: Add an unreachable to a switch Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2018-10-26 11:45:29 -05:00
Jason Ekstrand	28bb6abd1d	nir/validate: Print when the validation failed Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2018-10-26 11:45:29 -05:00
Jason Ekstrand	292ebdbf98	anv: Handle the device loss abort in anv_device_set_lost Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2018-10-26 08:40:23 -05:00
Jason Ekstrand	cd0960b430	anv: Add helpers for setting/checking device lost Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2018-10-26 08:40:21 -05:00
Jason Ekstrand	319ff6f1ad	anv: Provide an error message with a DEVICE_LOST Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2018-10-26 08:40:10 -05:00
Alex Smith	3bd239f71d	anv: Fix sanitization of stencil state when the depth test is disabled When depth testing is disabled, we shouldn't pay attention to the specified depthCompareOp, and just treat it as always passing. Before, if the depth test is disabled, but depthCompareOp is VK_COMPARE_OP_NEVER (e.g. from the app having zero-initialized the structure), then sanitize_stencil_face() would have incorrectly changed passOp to VK_STENCIL_OP_KEEP. v2: Roll the depthTestEnable check into the ds_aspect check below since they now both do the same thing. Fixes: `028e1137e6` "anv/pipeline: Be smarter about depth/stencil state" Signed-off-by: Alex Smith <asmith@feralinteractive.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-10-26 10:25:40 +01:00
Samuel Pitoiset	79bbdf8e45	radv: implement image to image operations for R32G32B32 This should address the remaining failures in Batman Arkham City. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107765 Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-10-26 10:50:08 +02:00
Samuel Pitoiset	6198245775	radv: fix a comment in radv_meta_buffer_to_image_cs_r32g32b32() Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-10-26 10:50:05 +02:00
Samuel Pitoiset	02ccef7874	radv: add get_image_stride_for_r32g32b32() helper For the special R32G32B32 paths. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-10-26 10:50:03 +02:00
Samuel Pitoiset	468c33e2f7	radv: add create_bview_for_r32g32b32() helper For the special R32G32B32 paths. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-10-26 10:50:00 +02:00
Samuel Pitoiset	e60e3e1b3f	radv: add create_buffer_from_image() helper For the special R32G32B32 paths. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-10-26 10:49:58 +02:00
Sagar Ghuge	416abe809a	intel/compiler: Print message descriptor as immediate source While disassembling send(c) instruction print message descriptor as immediate source operand along with message descriptor. This allows assembler to read immediate source operand and set bits accordingly. Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>	2018-10-26 06:42:14 +02:00
Sagar Ghuge	d15fa24860	intel/compiler: Print hex representation along with floating point value While encoding immediate floating point values in an instruction we use values up to precision 9, but while disassembling we print to a precision of 6 places, which rounds the value and gives a wrong interpretation of the encoded immediate constant. To avoid misinterpretation of encoded immediate values in the instruction and the disassembled output, print the hex representation along with the floating point value, which can be used by an assembler in the future. Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>	2018-10-26 06:41:08 +02:00
David McFarland	07a00a8729	util: Change remaining uint32 cache ids to sha1 After discussion with Timothy Arceri. disk_cache_get_function_identifier was using only the first byte of the sha1 build-id. Replace disk_cache_get_function_identifier with implementation from radv_get_build_id. Instead of writing a uint32_t it now writes to a mesa_sha1. All drivers using disk_cache_get_function_identifier are updated accordingly. Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> Fixes: `83ea8dd99b` ("util: add disk_cache_get_function_identifier()")	2018-10-26 14:49:22 +11:00
Hyunjun Ko	3d198926a4	freedreno: use fd_bc_alloc_batch instead of fd_batch_create. Following the commit `2385d7b066` and `8e798e28f7`, for resource dependency tracking. Fixes: dEQP-GLES31.functional.image_load_store.early_fragment_tests.no_early_fragment_tests_depth_fbo with FD_MESA_DEBUG=inorder Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-10-25 18:46:19 -04:00
Hyunjun Ko	703271c22a	freedreno/ir3: take reg->num out of union in ir3_register To avoid wrong result when identifying the type of register. Ie. If the reg is an array, it might be identified as address or predicate register. Fixes: dEQP-GLES31.functional.ssbo.layout.random.arrays_of_arrays.6 Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-10-25 18:45:45 -04:00
Rob Clark	3c402d0dc2	freedreno/a6xx: disable unused groups Don't leave vsconst/fsconst group enabled if we switch to shader with no uniforms. Fixes: `abcdf5627a` freedreno/a6xx: move const emit to state group Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-10-25 18:38:53 -04:00
Rob Clark	d53074d3f1	freedreno: add useful assert Would have been useful to catch the problem fixed in `8e798e28f7` Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-10-25 18:38:53 -04:00
Alok Hota	edf38019a0	swr/rast: ignore CreateElementUnorderedAtomicMemCpy This function's API changed between LLVM 5 and 6. Compile errors occur when building with LLVM 6+ if LLVM 5 was used for a dist tarball CC: <mesa-stable@lists.freedesktop.org> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107865 Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2018-10-25 11:05:59 -05:00
Alok Hota	8c872ac2e3	swr/rast: fix intrinsic/function for LLVM 7 compatibility Converted from x86 VFMADDPS intrinsic to generic LLVM intrinsic, and removed createInstructionSimplifierPass, which were both removed in LLVM 7.0.0 These changes combine patches we received from the community and our own internal patches Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com> Tested-by: Chuck Atkins <chuck.atkins@kitware.com>	2018-10-25 10:32:27 -05:00
Rhys Perry	26ed0f0234	nvc0: increase NOUVEAU_TRANSFER_PUSHBUF_THRESHOLD to 1024 on Kepler+ Gives a +3.89% to +5.27% FPS improvement with Hitman and +2.73% to +2.82% FPS improvement with Dirt Rally on my GTX 1060. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>	2018-10-25 15:25:10 +01:00
Bas Nieuwenhuizen	d41c3cc013	radv: Emit enqueued pipeline barriers on event write. Since the CPU can read them we need to execute any GPU->CPU flushes before the event is written. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108524 Fixes: `f4e499ec79` "radv: add initial non-conformant radv vulkan driver" Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2018-10-25 16:17:54 +02:00
Bas Nieuwenhuizen	9d40ec2cf6	radv: Add support for VK_KHR_driver_properties. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-10-25 16:14:43 +02:00
Eric Engestrom	e27902a261	util: use C99 declaration in the for-loop set_foreach() macro Signed-off-by: Eric Engestrom <eric@engestrom.ch> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-10-25 12:43:18 +01:00
Eric Engestrom	bb84fa146f	util: use C99 declaration in the for-loop hash_table_foreach() macro Signed-off-by: Eric Engestrom <eric@engestrom.ch> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-10-25 12:43:18 +01:00
Dylan Baker	3d261cf77b	gen: Add AMD_gpu_shader_int64.xml to tarball CC: Ian Romanick <ian.d.romanick@intel.com> CC: Marek Olšák <marek.olsak@amd.com> Fixes: `b3c17330e6` ("mesa: expose AMD_gpu_shader_int64") Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>	2018-10-24 11:29:30 -07:00
Dylan Baker	6d5fa65c74	gen: Add EXT_vertex_attrib_64bit.xml to dependency lists Which is also required to put it in the tarball, a requirement for building with meson from the tarball. CC: Ian Romanick <ian.d.romanick@intel.com> CC: Marek Olšák <marek.olsak@amd.com> Fixes: `263c962cfd` ("mesa: expose EXT_vertex_attrib_64bit") Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>	2018-10-24 11:29:29 -07:00
Eric Engestrom	edc06dd533	anv: move variable to proper scope and mark as MAYBE_UNUSED Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-10-24 18:16:20 +01:00
Eric Engestrom	ed5d65a6a1	anv: use snprintf() instead of memset()+strcpy() snprintf() guarantees that it will not write more chars than allowed, and that the string will be null-terminated, without the need to fill the whole thing with zeroes to begin with. Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-10-24 18:15:56 +01:00
Eric Engestrom	33d757096d	anv: drop unused includes Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-10-24 18:15:05 +01:00
Dylan Baker	c4de8ba036	autotools: include intel_tiled_memcopy.c There are two problems with the fixed patch. First, it fails to create a dependency on the sourced .c file, so changes to intel_tiled_memcpy.c won't trigger a rebuild. It also doesn't get included in the dist tarball. Fixes: `11b1afdc92` ("i965/tiled_memcpy: inline movntdqa loads in tiled_to_linear") Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>	2018-10-24 09:22:15 -07:00
Dylan Baker	43b0d5fa04	meson: fix formatting and add extra_files to i965 extra_files is just a nice way to tell certain IDEs (and those reading the file) that this file is also a dependency. Meson will use the .d file generated by the compiler to figure out what the target actually depends on. Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>	2018-10-24 09:22:13 -07:00
Eduardo Lima Mitev	b0c427043b	ir3_compiler/nir: fix imageSize() for buffer-backed images GL_EXT_texture_buffer introduced texture buffers, which can be used in shaders through a new type imageBuffer. Because how image access is implemented in freedreno, calling imageSize on an imageBuffer returns the size in bytes instead of texels, which is incorrect. This patch adds a division of imageSize result by the bytes-per-pixel of the image format, when image is buffer-backed. Fixes all tests under dEQP-GLES31.functional.image_load_store.buffer.image_size.* v2: Pre-compute and submit the log2 of the image format's bpp as shader constant instead of emitting the LOG2 instruction in code. (Rob Clark) v3: Use ffs (find-first-bit) helper for computing log2 (Ilia Mirkin) Reviewed-by: Rob Clark <robdclark@gmail.com>	2018-10-24 18:18:35 +02:00
Jose Fonseca	d9a04196d9	nir: Fix array initializer. Empty initializer is not standard C. This fixes MSVC build. Trivial.	2018-10-24 11:37:09 +01:00
Liviu Prodea	d99fda17c8	scons: Put to rest zombie texture_float build option. I found a remnant of texture_float build option that wasn't removed in commit `66673bef94` This patch removes it. Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2018-10-24 11:10:17 +01:00
Alex Smith	6c56c1fbd4	anv: Allow presenting via a different GPU anv_GetPhysicalDeviceSurfaceSupportKHR will already return success for this, but anv_GetPhysicalDevice{Xcb,Xlib}PresentationSupportKHR do not. Apps which check for presentation support via the latter (all Feral Vulkan games at least) will therefore fail. This allows me to render on an Intel GPU and present to a display connected to an AMD card (tested HD 530 + Vega 64). v2: Rebase on current master. Signed-off-by: Alex Smith <asmith@feralinteractive.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-10-24 09:40:02 +01:00
Juan A. Suarez Romero	3112da346b	nir: fix nir_copy_propagation test Use nir_src_comp_as_uint() to read the proper second component, as nir_src_as_uint() returns the first one. v2: Use nir_src_comp_as_uint() [Jason] Fixes: `16870de8a0` ("nir: Use nir_src_is_const and nir_src_as_* in core code") Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108532 Tested-by: Michel Dänzer <michel.daenzer@amd.com> Tested-by: Vinson Lee <vlee@freedesktop.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-10-24 09:13:24 +02:00
Timothy Arceri	0ff1ccca25	radv: call nir_link_xfb_varyings() Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2018-10-24 08:21:29 +11:00
Timothy Arceri	c769ed10de	radv: move nir_lower_io_to_scalar_early() to radv_link_shaders() nir_lower_io_to_scalar_early() is really part of the link time optimisations. Moving it here allows the code to be simplified and also keeps the code easy to follow in the next patch. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2018-10-24 08:21:29 +11:00
Samuel Pitoiset	7c694cbfa4	nir: add linking helper nir_link_xfb_varyings() The linking opts shouldn't try removing or compacting XFB varyings in the consumer. To avoid this we copy the always_active_io flag from the producer. Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-10-24 08:21:29 +11:00
Sagar Ghuge	0a7664fe8c	intel/compiler: Change src1 reg type to unsigned doubleword To have uniform behavior while disassembling send(c) instruction use register type of unsigned doubleword for src1 when message descriptor is immediate value. Bspec does not specify anything for src1 immediate default type. Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Matt Turner <mattst88@gmail.com> Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>	2018-10-23 12:44:24 -07:00
Eduardo Lima Mitev	22ddd4988e	mesa/glformats: Remove redundant helper _mesa_base_format_component_count There exists _mesa_components_in_format() which already includes all cases handled in _mesa_base_format_component_count(). Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2018-10-23 21:29:15 +02:00
Jason Ekstrand	ecb7775e1c	nir/algebraic: Fix a typo in the bit size validation code The conon_bit_class and canon_var_class variables got switched. Fixes: `932c650e0b` "nir/algebraic: Loosen a restriction on variables" Reported-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2018-10-23 12:22:29 -05:00
Leo Liu	b75fb8ee36	amd/common: check DRM version 3.27 for JPEG decode JPEG was added after DRM version 3.26 Signed-off-by: Leo Liu <leo.liu@amd.com> Fixes: 4558758c51749(amd/common: add vcn jpeg ip info query) Cc: Boyuan Zhang <boyuan.zhang@amd.com> Cc: Alex Smith <asmith@feralinteractive.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com>	2018-10-23 13:12:05 -04:00
Juan A. Suarez Romero	a8c2a6b0ac	docs: update calendar I'll take care of 18.2 releases series on Andres behalf. CC: Andres Gomez <agomez@igalia.com> CC: Dylan Baker <dylan@pnwbakers.com> CC: Emil Velikov <emil.l.velikov@gmail.com>	2018-10-23 18:40:09 +02:00
Lionel Landwerlin	a8594887bc	intel/decoders: fix end of batch limit Pointer arithmetic... v2: s/4/sizeof(uint32_t)/ (Eric) v3: Give bytes to print_batch() in error_decode (Lionel) Make clear what values we're dealing with in error_decode (Lionel) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> (v2) Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-10-23 14:49:33 +01:00
Boyuan Zhang	55e7de7b19	radeonsi: enable vcn jpeg decode for raven Enable vcn jpeg decode for raven. Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com>	2018-10-23 08:50:02 -04:00
Boyuan Zhang	97c473bb29	winsys/amdgpu: add vcn jpeg cs support Add vcn jpeg cs support, align cs by no-op. Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com>	2018-10-23 08:50:02 -04:00
Boyuan Zhang	4558758c51	amd/common: add vcn jpeg ip info query Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com>	2018-10-23 08:50:02 -04:00
Boyuan Zhang	6d2d910653	radeon/vcn: implement jpeg target buffer cmd Implement jpeg target buffer cmd by programming registers directly, since there is no firmware for VCN Jpeg decode. Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com> Acked-by: Leo Liu <leo.liu@amd.com>	2018-10-23 08:50:02 -04:00
Boyuan Zhang	0ee5630cfc	radeon/vcn: implement jpeg bitstream buffer cmd Implement jpeg bitstream buffer cmd by programming registers directly, since there is no firmware for VCN Jpeg decode. Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com> Acked-by: Leo Liu <leo.liu@amd.com>	2018-10-23 08:50:02 -04:00
Boyuan Zhang	9b478b0c7a	radeon/uvd: remove get mjpeg slice header Move the previous get_mjpeg_slice_header function and eoi from "radeon/vcn" to "st/va". Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com>	2018-10-23 08:50:02 -04:00
Boyuan Zhang	4fc2368e3b	st/va: get mjpeg slice header Move the previous get_mjpeg_slice_header function and eoi from "radeon/vcn" to "st/va". Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com>	2018-10-23 08:50:02 -04:00
Boyuan Zhang	c7a5ef26ad	radeon/vcn: add jpeg decode implementation Add a new file to handle VCN Jpeg decode specific functions. Use Jpeg specific cmd sending function in end_frame call. Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com>	2018-10-23 08:50:02 -04:00
Boyuan Zhang	40fceb55f3	radeon/vcn: separate send cmd call from end frame Use function pointer for sending cmd in end_frame call. By doing this, we can assign different cmd sending logics for Jpeg decode later. Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com>	2018-10-23 08:50:02 -04:00
Boyuan Zhang	4f1f128f8e	radeon/vcn: create cs based on ring type Add RING_VCN_JPEG for VCN Jpeg decode, and keep RING_VCN_DEC for other codecs. Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com>	2018-10-23 08:50:02 -04:00
Boyuan Zhang	f7116e4ff8	radeon/winsys: add vcn jpeg ring type Add a new ring type for vcn jpeg. Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com>	2018-10-23 08:50:02 -04:00
Boyuan Zhang	e7e68d15b5	radeon/vcn: add vcn jpeg decode interface Add VCN Jpeg decode interfaces and register defines. Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com>	2018-10-23 08:50:02 -04:00
Boyuan Zhang	6bc0a3a834	radeon/vcn: move radeon decoder define to header file Move radeon_decoder definition from "radeon_vcn_dec.c" to "radeon_vcn_dec.h", so that it can be included by other files later. Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com>	2018-10-23 08:50:02 -04:00
Boyuan Zhang	0f59e3f088	meson: update required amdgpu version to 2.4.95 VCN jpeg requires new hw ip Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com>	2018-10-23 08:50:02 -04:00
Boyuan Zhang	2e768ade61	configure.ac: update libdrm amdgpu version to 2.4.95 VCN jpeg requires new hw ip Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com>	2018-10-23 08:50:02 -04:00
Samuel Pitoiset	69c44de798	radv: fix btoi for R32G32B32 when the dest offset is not 0 Fixes: `593996bc02` ("radv: implement buffer to image operations for R32G32B32") Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-10-23 14:29:26 +02:00
Scott D Phillips	54c823ec79	i965/miptree: Use cpu tiling/detiling when mapping Rename the (un)map_gtt functions to (un)map_map (map by returning a map) and add new functions (un)map_tiled_memcpy that return a shadow buffer populated with the intel_tiled_memcpy functions. Tiling/detiling with the cpu will be the only way to handle Yf/Ys tiling, when support is added for those formats. v2: Compute extents properly in the x|y-rounded-down case (Chris Wilson) v3: Add units to parameter names of tile_extents (Nanley Chery) Use _mesa_align_malloc for the shadow copy (Nanley) Continue using gtt maps on gen4 (Nanley) v4: Use streaming_load_memcpy when detiling v5: (edited by Ken) Move map_tiled_memcpy above map_movntdqa, so it takes precedence. Add intel_miptree_access_raw, needed after rebasing on commit `b499b85b0f`. v6: refactor to changes done for sse41 separation (Tapani) Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> (v5) Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com> Signed-off-by: Tapani Pälli <tapani.palli@intel.com>	2018-10-23 14:08:05 +03:00
Scott D Phillips	11b1afdc92	i965/tiled_memcpy: inline movntdqa loads in tiled_to_linear The reference for MOVNTDQA says: For WC memory type, the nontemporal hint may be implemented by loading a temporary internal buffer with the equivalent of an aligned cache line without filling this data to the cache. [...] Subsequent MOVNTDQA reads to unread portions of the WC cache line will receive data from the temporary internal buffer if data is available. This hidden cache line sized temporary buffer can improve the read performance from wc maps. v2: Add mfence at start of tiled_to_linear for streaming loads (Chris) v3: add Android build support (Tapani) v4: squash 'fix i915: Fix streaming loads for intel_tiled_memcpy' separate sse41 to own static library (Tapani) Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> (v2) Reviewed-by: Matt Turner <mattst88@gmail.com> (v2) Acked-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com> Signed-off-by: Tapani Pälli <tapani.palli@intel.com>	2018-10-23 14:08:05 +03:00
Tapani Pälli	91d3a5d1a8	i965: expose type of memcpy instead of memcpy function itself There is currently no use of the returned memcpy functions outside intel_tiled_memcpy. This patch changes intel_get_memcpy to return the memcpy type instead of the actual function. This makes it easier to later separate the streaming load copy into its own static library. Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com>	2018-10-23 14:08:05 +03:00
Eric Engestrom	bc021be78d	util: use unsigned ints for bit operations Fixes errors thrown by GCC's Undefined Behaviour sanitizer (ubsan) every time this macro is used. Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2018-10-23 11:44:02 +01:00
Eric Engestrom	17b03b5320	radv: s/abs/fabsf/ for floats Fixes: `a4c4efad89` "radv: Rework guard band calculation" Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-10-23 11:43:51 +01:00
Eric Engestrom	8629d807aa	meson: drop option description relic `platforms` is no longer a comma-separated string, and some of our option descriptions are way too long already. Just drop the incorrect bit. Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2018-10-23 11:43:51 +01:00
Jason Ekstrand	8b626a22b2	st/mesa: Record shader access qualifiers for images They're not required to be the same as the access flag on the image unit. For hardware that does shader image lowering based on the qualifier (Intel), it may be required for state setup. v2: (by Kenneth Graunke, incorporating feedback from Marek Olšák) - Reduce both access and shader_access to uint16_t to avoid making the pipe_image_view structure larger. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-10-23 02:36:24 -07:00
Jason Ekstrand	bf441d22a7	nir/algebraic: Provide descriptive asserts for bit size checks This will hopefully make debugging opt_algebraic bit-size compile failures easier. Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>	2018-10-22 16:00:18 -05:00
Jason Ekstrand	932c650e0b	nir/algebraic: Loosen a restriction on variables Previously, we would fail if a variable had an assigned but unknown bit size X and we tried to assign it an actual bit size. However, this is ok because, at the time we do the search, the variable does have an actual bit size and it will match X because of the NIR rules. Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>	2018-10-22 16:00:18 -05:00
Jason Ekstrand	ea9e651423	nir/algebraic: A bit of validation refactoring We rename some local variables in validate() to be more readable and plumb the var through to get/set_var_bit_class instead of the var index. Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>	2018-10-22 16:00:18 -05:00
Jason Ekstrand	641f4be8e8	nir/algebraic: Make internal classes str-able Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>	2018-10-22 16:00:18 -05:00
Jason Ekstrand	6068be543b	nir/algebraic: Generalize an optimization There's nothing boolean about (a | ~a) ~> -1 Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>	2018-10-22 16:00:18 -05:00
Jason Ekstrand	69618a8678	nir/algebraic: Use bool internally instead of bool32 Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>	2018-10-22 16:00:18 -05:00
Kenneth Graunke	00103db04a	intel: Fix decoding for partial STATE_BASE_ADDRESS updates. STATE_BASE_ADDRESS only modifies various bases if the "modify" bit is set. Otherwise, we want to keep the existing base address. Iris uses this for updating Surface State Base Address while leaving the others as-is. v2: Also update aubinator_viewer_decoder (caught by Lionel) Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-10-22 13:38:44 -07:00
Jason Ekstrand	16870de8a0	nir: Use nir_src_is_const and nir_src_as_* in core code Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-10-22 14:24:15 -05:00
Jason Ekstrand	ce36f412c9	nir/search_helpers: Use nir_src_is_const and friends This not only makes them safe for more bit sizes but it also fixes a bug in is_zero_to_one where it would return true for constant NaN. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-10-22 14:24:15 -05:00
Jason Ekstrand	7bae7828aa	nir/search: Use nir_src_is_const and friends Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-10-22 14:24:15 -05:00
Jason Ekstrand	bca5c2c688	nir: Add some new helpers for working with const sources Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-10-22 14:24:15 -05:00
Alyssa Rosenzweig	e0c267c752	mesa/st: Only call nir_lower_io_to_scalar_early on scalar ISAs On scalar ISAs, nir_lower_io_to_scalar_early enables significant optimizations. However, on vector ISAs, it is counterproductive and impedes optimal codegen. This patch only calls nir_lower_io_to_scalar_early for scalar ISAs. It appears that at present there are no upstreamed drivers using Gallium, NIR, and a vector ISA, so for existing code, this should be a no-op. However, this patch is necessary for the upcoming Panfrost (Midgard) and Lima (Utgard) compilers, which are vector. With this patch, Panfrost is able to consume NIR directly, rather than TGSI with the TGSI->NIR conversion. For how this affects Lima, see https://www.mail-archive.com/mesa-dev@lists.freedesktop.org/msg189216.html Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>	2018-10-22 20:37:07 +02:00
Dylan Baker	4e785fb383	meson: don't require libelf for r600 without LLVM r600 doesn't have a hard requirement on LLVM, and therefore doesn't have a hard requirement on libelf. Currently the logic doesn't allow that however. Distro-bug: https://bugs.gentoo.org/669058 Fixes: `5060c51b6f` ("meson: build r600 driver") Reviewed-by: Matt Turner <mattst88@gmail.com>	2018-10-22 11:29:55 -07:00
Jason Ekstrand	ca4e465f7d	anv,radv: Trivially expose two new VK_GOOGLE extensions This patch exposes support for the following two extensions: * VK_GOOGLE_decorate_string * VK_GOOGLE_hlsl_functionality1 There's nothing for the driver to do; it's all handled in spirv_to_nir. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107971 Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-10-22 10:50:20 -05:00
Jason Ekstrand	891886da2f	spirv: Add no-op support for VK_GOOGLE_hlsl_functionality1 This extension adds two new decorations which carry meaning only for HLSL shaders. They are expected to be handled by higher level layers and can be ignored by implementations. However, it does save the client a bit of work if the implementation safely ignores them instead of the client having to strip them out of the SPIR-V in order for it to be valid. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-10-22 10:49:53 -05:00
Jason Ekstrand	5f0322d5c3	spirv: Add support for SPV_GOOGLE_decorate_string Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-10-22 10:49:53 -05:00
Rob Herring	2bb05d70af	android: Build kms_swrast for the Android platform Signed-off-by: Rob Herring <robh@kernel.org> Signed-off-by: Robert Foss <robert.foss@collabora.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2018-10-22 13:08:17 +01:00
Connor Abbott	27fe3f5b5a	ac: Fix loading a dvec3 from an SSBO The comment was wrong, since the loop above casts to a type with the correct bitsize already. Fixes: `7e7ee82698` ("ac: add support for 16bit buffer loads") Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-10-22 09:44:51 +02:00
Connor Abbott	59535b05cf	ac: Introduce ac_build_expand() And implement ac_build_expand_to_vec4() on top of it. Fixes: `7e7ee82698` ("ac: add support for 16bit buffer loads") Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-10-22 09:44:51 +02:00
Eduardo Lima Mitev	fdd926d5b2	ir3/nir: Set up image_dims consts for image_deref_size intrinsic too `nir_intrinsic_image_deref_size` is not being considered during the scan for driver constants, so image constants are not emitted if a shader only ever queries the size of an image (no load, store, atomic op, etc). This is unlikely, but possible. Reviewed-by: Rob Clark <robdclark@gmail.com>	2018-10-21 21:29:18 +02:00
Karol Herbst	2d235d69c8	nv50/ir: fix ConstantFolding::createMul for 64 bit muls Fixes: `2f52925f5c` "nv50/ir: move a * b -> a << log2(b) code into createMul()" Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Signed-off-by: Karol Herbst <kherbst@redhat.com>	2018-10-20 03:00:04 +02:00
Sonny Jiang	bfb2b90246	radeonsi: Disable clear_state with radeon kernel driver Signed-off-by: Sonny Jiang <sonny.jiang@amd.com> Tested-by: Michel Dänzer <michel.daenzer@amd.com>	2018-10-19 16:16:57 -04:00
Kenneth Graunke	f91f9bab83	meson: Add -Werror=return-type when supported. This warning detects non-void functions with a missing return statement, return statements with a value in void functions, and functions with a bogus return type that ends up defaulting to int. It's already enabled by default with -Wall. Generally, these are fairly serious bugs in the code, which developers would like to notice and fix immediately. This patch promotes it from a warning to an error, to help developers catch such mistakes early. I would not expect this warning to change much based on the compiler version, so hopefully it won't become a problem for packagers/builders. See the GCC documentation or 'man gcc' for more details: https://gcc.gnu.org/onlinedocs/gcc-7.3.0/gcc/Warning-Options.html#index-Wreturn-type Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2018-10-19 10:16:57 -07:00
Jason Ekstrand	0d380af809	anv: Define trampolines as the weak functions Instead of having weak references to the anv functions and separate trampoline functions with their own dispatch table, just make the trampoline functions weak. This gets rid of a dispatch table and potentially lets the compiler delete the unused weak function. The end result is a reduction in the .text section of 5.7K and a reduction in the .data section of 1.4K. Before: text data bss dec hex filename 3190329 282232 8960 3481521 351fb1 _install/lib64/libvulkan_intel.so After: text data bss dec hex filename 3184548 280792 8960 3474300 35037c _install/lib64/libvulkan_intel.so Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-10-19 11:52:00 -05:00
Juan A. Suarez Romero	f8e789d2ac	docs: fix typo in 18.2.3 release notes link Fixes: `86b4bd52dc` ("docs: update calendar, add news item and link release notes for 18.2.3") Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>	2018-10-19 18:48:12 +02:00
Juan A. Suarez Romero	86b4bd52dc	docs: update calendar, add news item and link release notes for 18.2.3 Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>	2018-10-19 18:45:41 +02:00
Juan A. Suarez Romero	01f5d37d3e	docs: add sha256 checksums for 18.2.3 Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com> (cherry picked from commit `27fd12857b`)	2018-10-19 18:43:49 +02:00
Juan A. Suarez Romero	e30970e2cd	docs: add release notes for 18.2.3 Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com> (cherry picked from commit `d219361b42`)	2018-10-19 18:43:48 +02:00
Jose Fonseca	45bacc4b63	scons: Remove gles option. It's broken, and WGL state tracker is always built with GLES support nowadays. Reviewed-by: Brian Paul <brianp@vmware.com> Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2018-10-19 16:50:26 +01:00
Bas Nieuwenhuizen	68c7833540	radv: Fix WSI & PCI bus info initialization order. Trying to access the bus info before it is initialized is not going to work. Fixes: `baa38c144f` "vulkan/wsi: Use VK_EXT_pci_bus_info for DRM fd matching" Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108491 Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Tested-by: Andre Heider <a.heider@gmail.com>	2018-10-19 13:24:19 +02:00
Marek Olšák	69a87b5d47	radeonsi: fix a typo in a comment in emit_guardband	2018-10-18 18:01:22 -04:00
Marek Olšák	2a26b1c045	radeonsi: fix gnome-shell crash I wasn't expecting to get viewports with the center having negative coordinates. Broken by: `6cc79e4411`	2018-10-18 17:55:44 -04:00
Jason Ekstrand	8c0b9fdfa1	Revert "anv: Stop generating weak references for instance entrypoints" This reverts commit `00bb42105d`. It was not as well thought out as I had intended and broke the build when VK_KHR_display is disabled in the build.	2018-10-18 15:36:26 -05:00
Marek Olšák	77bcbe712e	radeonsi: clamp point size to the limit This fixes dEQP-GLES2.functional.rasterization.limits.points. Broken by: `ea039f789d` Tested-by: Jakob Bornecrantz <jakob@collabora.com>	2018-10-18 16:08:56 -04:00
Marek Olšák	eae8f49fc6	radeonsi: fix a VGT hang with primitive restart on Polaris10 and later Cc: 18.1 18.2 <mesa-stable@lists.freedesktop.org> Tested-by: Jakob Bornecrantz <jakob@collabora.com>	2018-10-18 16:08:56 -04:00
Marek Olšák	165817d47f	radeonsi: fix a deadlock due to partially-initialized context on CI	2018-10-18 16:08:56 -04:00
Jan Vesely	06bf56725d	radeonsi: Bump number of allowed global buffers to 32 Fixes assertion failure/crash when running luxmark/luxball on clover. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108272 CC: mesa-stable@lists.freedesktop.org Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-10-18 16:02:42 -04:00
Andres Rodriguez	e71a87775e	radv: fix check for perftest options size It was using the debug options array size. CC: mesa-stable@lists.freedesktop.org Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-10-18 15:42:20 -04:00
Marek Olšák	6cc79e4411	radeonsi: fix incorrect hw screen offset and guardband computation It resulted in assertion failures or incorrect rendering. Broken by: `9e182b8313`	2018-10-18 14:42:42 -04:00
Jason Ekstrand	baa38c144f	vulkan/wsi: Use VK_EXT_pci_bus_info for DRM fd matching This lets us avoid passing the DRM fd around all over the place and gets us closer to layer utopia. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2018-10-18 11:29:00 -05:00
Michel Dänzer	c20ba1be18	loader/dri3: Also wait for front buffer fence if we triggered it In that case, we have to wait for the fence to synchronize with the corresponding drawing we triggered in the X server. Fixes incorrect display with the i965 driver and some applications, e.g. solvespace. Bugzilla: https://bugs.freedesktop.org/108097 Fixes: `aefac10fec` "loader/dri3: Only wait for back buffer fences in dri3_get_buffer" Tested-by: Sergii Romantsov <sergii.romantsov@globallogic.com>	2018-10-18 16:52:06 +02:00
Jason Ekstrand	00bb42105d	anv: Stop generating weak references for instance entrypoints We don't need weak references to instance entrypoints because we never have more than one of each so we don't need the NULL fall-back. This also helps us avoid forgetting things because we now get link errors for missing instance entrypoints. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-10-18 09:17:39 -05:00
Jason Ekstrand	7c65cf9844	vulkan/wsi: Implement GetPhysicalDevicePresentRectanglesKHR This got missed during 1.1 enabling because it was defined as an interaction between device groups and WSI and it wasn't obvious it was in the delta. The idea behind it is that it's supposed to provide a hint to the application in a multi-GPU setup to indicate which regions of the screen are being scanned out by which GPU so a multi-device split-screen rendering application can render each part of the screen on the GPU that will be presenting it and avoid extra bus traffic between GPUs. On a single-GPU setup or one which doesn't support this present mode, we need to do something. We choose to return the window size (or a max-size rect) if the compositor, X server, or crtc is associated with the given physical device and zero rectangles otherwise. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-10-18 09:17:39 -05:00
Jason Ekstrand	7629c00557	vulkan/wsi: Store the instance allocator in wsi_device We already have wsi_device and we know the instance allocator at wsi_device_init time so there's no need to pass it into the physical device queries. This also fixes a memory allocation domain bug that can occur if CreateSwapchain gets called prior to any queries (not likely) in which case the cached connection gets allocated off the device instead of the instance. Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-10-18 09:17:39 -05:00
Michał Janiszewski	0ef50ecc69	st/xlib: Use more appropriate include guard Signed-off-by: Michał Janiszewski <janisozaur+signed@gmail.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2018-10-18 11:03:04 +01:00
Michał Janiszewski	bcc613acc1	gallium: Fix mismatched ifdef-guards Signed-off-by: Michał Janiszewski <janisozaur+signed@gmail.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2018-10-18 11:03:03 +01:00
Gert Wollny	74adc624b6	softpipe: dynamically allocate space for immediate constants The number of immediate constants was fixed and the size check was only done by means of an assertion. Given this, a shader that emits more immediate constants would result in memory corruption when mesa is built in release mode. Instead of using this fixed limit, allocate the space dynamically, let it grow as needed, and also remove the unused ImmArray. Fixes: dEQP-GLES31.functional.ssbo.layout.random.arrays_of_arrays.1 Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2018-10-18 10:59:51 +02:00
Timothy Arceri	3a95396f3c	radv: use nir_shrink_vec_array_vars() Totals from affected shaders: SGPRS: 1096 -> 1096 (0.00 %) VGPRS: 1192 -> 1056 (-11.41 %) Spilled SGPRs: 0 -> 0 (0.00 %) Spilled VGPRs: 0 -> 0 (0.00 %) Private memory VGPRs: 0 -> 0 (0.00 %) Scratch size: 0 -> 0 (0.00 %) dwords per thread Code Size: 100940 -> 94384 (-6.49 %) bytes LDS: 0 -> 0 (0.00 %) blocks Max Waves: 100 -> 112 (12.00 %) Wait states: 0 -> 0 (0.00 %) All affected shaders are from Batman Arkham City. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-10-18 15:04:09 +11:00
Timothy Arceri	8086fa1bcd	radv: use nir_split_array_vars() We call in the opt loop in case another pass results in an array with indirect access being turned into direct access. Totals from affected shaders: SGPRS: 512 -> 496 (-3.12 %) VGPRS: 456 -> 452 (-0.88 %) Spilled SGPRs: 0 -> 0 (0.00 %) Spilled VGPRs: 0 -> 0 (0.00 %) Private memory VGPRs: 0 -> 0 (0.00 %) Scratch size: 0 -> 0 (0.00 %) dwords per thread Code Size: 40040 -> 39664 (-0.94 %) bytes LDS: 0 -> 0 (0.00 %) blocks Max Waves: 41 -> 43 (4.88 %) Wait states: 0 -> 0 (0.00 %) All affected shaders are from Batman Arkham City. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-10-18 15:04:09 +11:00
Timothy Arceri	06675711e7	radv: use nir_opt_find_array_copies() Totals from affected shaders: SGPRS: 1112 -> 1112 (0.00 %) VGPRS: 1492 -> 1196 (-19.84 %) Spilled SGPRs: 0 -> 0 (0.00 %) Spilled VGPRs: 0 -> 0 (0.00 %) Private memory VGPRs: 0 -> 0 (0.00 %) Scratch size: 0 -> 0 (0.00 %) dwords per thread Code Size: 112172 -> 101316 (-9.68 %) bytes LDS: 0 -> 0 (0.00 %) blocks Max Waves: 93 -> 98 (5.38 %) Wait states: 0 -> 0 (0.00 %) All affected shaders are from "Batman: Arkham City" over DXVK. The pass detects that the temporary array created by DXVK for storing TCS inputs is a copy of the input arrays and allows us to avoid copying all of the input data and then indirecting on it with if-ladders, instead we just do indirect indexing. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-10-18 15:04:09 +11:00
Timothy Arceri	9d5b106b2e	radv: use nir_opt_copy_prop_vars and nir_opt_dead_write_vars Totals from affected shaders: SGPRS: 2856 -> 2856 (0.00 %) VGPRS: 3236 -> 3248 (0.37 %) Spilled SGPRs: 0 -> 0 (0.00 %) Spilled VGPRs: 0 -> 0 (0.00 %) Private memory VGPRs: 0 -> 0 (0.00 %) Scratch size: 0 -> 0 (0.00 %) dwords per thread Code Size: 236560 -> 233548 (-1.27 %) bytes LDS: 0 -> 0 (0.00 %) blocks Max Waves: 277 -> 283 (2.17 %) Wait states: 0 -> 0 (0.00 %) Even in the cases where we have increased VGPR use it appears the NIR is improved significantly. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-10-18 15:04:09 +11:00
Keith Packard	67a2c1493c	vulkan: Add VK_EXT_calibrated_timestamps extension (radv and anv) [v5] Offers three clocks, device, clock monotonic and clock monotonic raw. Could use some kernel support to reduce the deviation between clock values. v2: Ensure deviation is at least as big as the GPU time interval. v3: Set device->lost when returning DEVICE_LOST. Use MAX2 and DIV_ROUND_UP instead of open coding these. Delete spurious TIMESTAMP in radv version. Suggested-by: Jason Ekstrand <jason@jlekstrand.net> Suggested-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> v4: Add anv_gem_reg_read to anv_gem_stubs.c Suggested-by: Jason Ekstrand <jason@jlekstrand.net> v5: Adjust maxDeviation computation to max(sampled_clock_period) + sample_interval. Suggested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Suggested-by: Jason Ekstrand <jason@jlekstrand.net> Signed-off-by: Keith Packard <keithp@keithp.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-10-17 20:10:15 -07:00
Topi Pohjolainen	a11cafbd7a	intel/compiler/icl: Use invocation id bits 22:16 instead of 23:17 Identifier bits in the dispatch header have changed. See Bspec: SINGLE_PATCH Payload: 3D Pipeline Stages - 3D Pipeline Geometry - Hull Shader (HS) Stage IVB+ - Payloads IVB+ Fixes: KHR-GL46.tessellation_shader.tessellation_shader_tc_barriers.barrier_guarded_read_write_calls Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com> Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2018-10-17 21:19:57 +03:00
Neil Roberts	a9475d9337	Fix setting indent-tabs-mode in the Emacs .dir-locals.el files Some of the .dir-locals.el had the wrong name for the truthy value so it wasn’t setting indent-tabs-mode. Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>	2018-10-17 19:03:08 +02:00
Rob Clark	d27b1c83b9	freedreno/a6xx: don't allocate binning rb Now that a single cmdstream is used for both binning and draw passes, we can skip allocation of cmdstream buffer for binning. Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-10-17 12:44:49 -04:00
Rob Clark	24d57a6d8f	freedreno/a6xx: single cmdstream for draw+binning Now that state which is different for draw vs binning pass is split out into different state-groups with appropriate enable_mask (so the appropriate one is chosen for draw vs binning), switch over to using a single cmdstream for both passes. This should significantly lower draw overhead for CPU bound benchmarks. Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-10-17 12:44:49 -04:00
Rob Clark	72f6164fef	freedreno/a6xx: split binning vs draw program stateobj's Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-10-17 12:44:49 -04:00
Rob Clark	3313d693af	freedreno/a6xx: split VBO state into binning/draw variants Blob seems to manage to use same input registers for BS (binning pass) vs VS (draw pass) shaders, so it can use the same VBO state for both. We can't quite do that yet, so split them. Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-10-17 12:44:49 -04:00
Rob Clark	b23fc4cacb	freedreno/a6xx: move VBO state to stateobj Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-10-17 12:44:49 -04:00
Rob Clark	e194056832	freedreno/a6xx: move ZSA state to stateobj Step towards single cmdstream, where we need different state-group-id's for binning vs draw ZSA state. Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-10-17 12:44:48 -04:00
Rob Clark	a50a9a44e8	freedreno/a6xx: remove vismode param We don't need to keep this IGNORE_VISIBILITY in binning pass. Prep work for using single cmdstream for both draw and binning passes. Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-10-17 12:44:48 -04:00
Rob Clark	d9dbc9c21f	freedreno/ir3: move binning-pass fixup for a6xx+ Move this to after ir3_cp (which can add lowered immediates to the const state) for a6xx+, to ensure the uniform state matches between binning and vertex shaders. This way we can emit just a single VS_CONST state-group when we re-use single cmdstream for both binning and draw passes. Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-10-17 12:44:48 -04:00
Rob Clark	1a51c4a87e	freedreno/a6xx: a bit more state emit cleanup Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-10-17 12:44:48 -04:00
Rob Clark	2ffc79c7d1	freedreno/a6xx: move framebuffer state emit to emit_mrt() No point in checking this per-draw, since framebuffer change means new batch. Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-10-17 12:44:48 -04:00
Rob Clark	5894f37b85	freedreno/a6xx: small emit_mrt() cleanup On a6xx, this is only used for pfb->cbufs so we can just directly pass the pfb state. Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-10-17 12:44:48 -04:00
Rob Clark	b4e94af37d	freedreno/a6xx: use program cache Use the in-memory cache to construct shader program state and re-use it on subsequent draws, to lower driver overhead. Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-10-17 12:44:48 -04:00
Rob Clark	1d7fbe2cd1	freedreno/ir3: shader variant cache Cache that maps gallium hwcso (in this case, 'struct ir3_shader') plus shader variant key to a generation specific state object. This could eventually replace the linked list of shader variants, but for now it lets us re-use the work currently done in fdN_program_emit() Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-10-17 12:44:48 -04:00
Rob Clark	2e9c08c0bc	freedreno/ir3: move binning_pass out of shader variant key Prep work for a following patch, that introduces a cache to map from program state (all shader stages) plus variant key to pre-baked hw state (which could be emit'd via CP_SET_DRAW_STATE, for example). To do that, we really want the variant key to be immutable, and to treat the binning pass shader as an extra shader stage, rather than as a VS variant. Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-10-17 12:44:48 -04:00
Rob Clark	8b1a3b5dde	freedreno/ir3: track # of samplers used by shader This is useful for a6xx to avoid program state from depending on bound tex/samp state. Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-10-17 12:44:48 -04:00
Rob Clark	1b9d69410c	freedreno/a6xx: texture state obj Unfortunately gallium doesn't match what the hw wants perfectly here, in using a separate CSO for each texture/sampler. So we have to use a hash table to map the collection of texture/samplers to hw state object. We probably could use separate hw state objects for texture and sampler state, but mesa/st tends to update the tex and samp state together. Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-10-17 12:44:48 -04:00
Rob Clark	e8606b11dd	freedreno: add resource seqno Intended to be something more compact than a 64b pointer, which could be used as a key into hashtables. Prep work for texture state objects. Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-10-17 12:44:48 -04:00
Rob Clark	abcdf5627a	freedreno/a6xx: move const emit to state group Eventually we want to move nearly everything, but no other state depends on const state, so this is the easiest one to move first. For webgl aquarium, this reduces GPU load by about 10%, since for each fish it does a uniform upload plus draw. Fish are frequently visible in only a single tile, so this skips the uniform uploads for other tiles. The additional step of avoiding WFI's when using CP_SET_DRAW_STATE seems to be worth an additional 10% gain for aquarium. Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-10-17 12:44:48 -04:00
Rob Clark	a398d26fd2	freedreno/a6xx: add infrastructure for CP_DRAW_STATE Add helper to add state-groups to emit, and code to emit CP_DRAW_STATE packet if we have any state-groups. Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-10-17 12:44:48 -04:00
Rob Clark	ec717fc629	freedreno: reduce resource dependency tracking overhead Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-10-17 12:44:48 -04:00
Neil Roberts	ee61790daf	freedreno: Remove the Emacs mode lines These are not necessary because the corresponding settings are set via the .dir-locals.el file anyway. Most of them were missing a ‘:’ after “tab-width” which was making Emacs display an annoying warning whenever you open the file. This patch was made with: sed -ri '/-\*- mode:/,/^$/d' \ $(find src/gallium/{drivers,winsys} -name \*.\[ch\] \ -exec grep -l -- '-\*- mode:' {} \+) Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-10-17 12:44:48 -04:00
Neil Roberts	afe640b360	freedreno: Fix the Emacs indentation configuration file The .dir-locals.el had the wrong name for the truthy value so it wasn’t setting indent-tabs-mode. Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-10-17 12:44:48 -04:00
Hyunjun Ko	8e798e28f7	freedreno: allocate batches from the cache in launch_grid Needs to allocate batches from the cache so that it could get a valid index and make resource dependency tracking right. In addition this fixes an assertion on debug builds since the commit `1a40faa8` landed. Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-10-17 12:44:48 -04:00
Hyunjun Ko	2385d7b066	freedreno: adds nondraw param to fd_bc_alloc_batch Needs to specify nondraw when creating a batch through fd_bc_alloc_batch since it'd better create a batch through it rather than fd_batch_create. Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-10-17 12:44:48 -04:00
Rob Clark	9e6019bd46	freedreno/a6xx: remove fd6_emit_render_cntl() It was dead code carried over from a5xx Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-10-17 12:44:48 -04:00
Rob Clark	835cb06965	freedreno/ir3: fix broken texcoord inputs TODO not sure if this is best solution, but current logic is broken for texcoord inputs. It is definitely the simplest solution. Fixes: `1a24f51966` freedreno/ir3: ignore unused inputs Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-10-17 12:44:48 -04:00
Rob Clark	cbf9fe50b5	freedreno: fix off-by-one error in BEGIN_RING() Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-10-17 12:44:48 -04:00
Marek Olšák	669dd22983	util: document a limitation of util_fast_udiv32 trivial	2018-10-17 12:27:58 -04:00
Matt Turner	58a51d0a67	i965/fs: Add 64-bit int immediate support to dump_instructions() Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2018-10-16 17:48:17 -07:00
Marek Olšák	fcc70e4855	radeonsi: track context rolls better for the Vega scissor bug workaround We should get fewer context rolls with the SET_CONTEXT_REG optimization, but it would have been for nothing if the scissor state rolled the context anyway. Don't emit the scissor state if there is no context roll.	2018-10-16 17:23:25 -04:00
Marek Olšák	25ddb15cfe	radeonsi: emit sample locations for 1xAA only when the hw bug is present	2018-10-16 17:23:25 -04:00
Marek Olšák	9b331e462e	radeonsi: use compute shaders for clear_buffer & copy_buffer Fast color clears should be much faster. Also, fast color clears on evicted buffers should be 200x faster on GFX8 and older.	2018-10-16 17:23:25 -04:00
Marek Olšák	5030adcbe0	radeonsi: use copy_buffer in buffer_do_flush_region directly	2018-10-16 17:23:25 -04:00
Marek Olšák	0b40fbc879	radeonsi: use faster integer division for instance divisors We know the divisors when we upload them, so instead we can precompute and upload division factors derived from each divisor. This fast division consists of add, mul_hi, and two shifts, and we have to load 4 dwords instead of 1. This probably won't affect any apps.	2018-10-16 17:23:25 -04:00
Marek Olšák	bfc795670e	ac: add helpers for fast integer division by a constant	2018-10-16 17:23:25 -04:00
Marek Olšák	ea039f789d	radeonsi: use higher subpixel precision (QUANT_MODE) for smaller viewports	2018-10-16 15:28:22 -04:00
Marek Olšák	4fd8d2df9c	radeonsi: move emission of PA_SU_VTX_CNTL into emit_guardband We'll modify the quant mode there, which also affects the guardband computation.	2018-10-16 15:28:22 -04:00
Marek Olšák	41a6c3de1f	radeonsi: don't re-upload the sample position constant buffer repeatedly	2018-10-16 15:28:22 -04:00
Marek Olšák	b94824c787	radeonsi: set PA_SU_PRIM_FILTER_CNTL optimally	2018-10-16 15:28:22 -04:00
Marek Olšák	9e182b8313	radeonsi: center viewport to improve guardband clipping for high resolutions This will be more useful when we change the quant mode to increase subpixel precision and decrease the viewport range (which might not be possible if the viewport is not centered in the viewport range).	2018-10-16 15:28:22 -04:00
Marek Olšák	fedc1fda30	radeonsi: save raster config in screen, add se_tile_repeat	2018-10-16 15:28:22 -04:00
Marek Olšák	ac76aeef20	radeonsi: switch back to standard DX sample positions Apps may rely on them.	2018-10-16 15:28:22 -04:00
Marek Olšák	67f02cf810	radeonsi: add GDS support to CP DMA	2018-10-16 15:28:22 -04:00
Marek Olšák	0d05581578	radeonsi: rename si_gfx_* functions to si_cp_* and write_event_eop -> release_mem	2018-10-16 15:28:22 -04:00
Marek Olšák	6e1cf6532d	radeonsi: make si_gfx_write_event_eop more configurable	2018-10-16 15:28:22 -04:00
Sergii Romantsov	0fa9e6d7b3	anv/skylake: disable ForceThreadDispatchEnable On Skylake enabling of ForceThreadDispatchEnable causes gpu-hang. -v2: enabling of ForceThreadDispatchEnable is only for gen8, for gen9 and higher reverted enabling of PixelShaderHasUAV. -v3 (Jason Ekstrand): Rework the comments a bit. CC: Jason Ekstrand <jason.ekstrand@intel.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107941 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107760 Fixes: `79270d2140` (anv: Stop setting 3DSTATE_PS_EXTRA::PixelShaderHasUAV) Signed-off-by: Sergii Romantsov <sergii.romantsov@globallogic.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-10-16 13:20:51 -05:00
Lionel Landwerlin	322a919a41	anv: Implement VK_EXT_pci_bus_info Even though Intel GPUs are always at the same PCI location, all the info we need is already provided by libdrm. Let's be future proof. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-10-16 12:47:55 +01:00
Jose Fonseca	8550be7a2f	appveyor: Cache pip's cache files. It should speed up the Python packages installation. Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2018-10-16 09:41:14 +01:00
Jose Fonseca	bfb8afb14d	appveyor: Update to newer Mako/winflexbison versions. As that's what most people are bound to use. Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2018-10-16 09:41:12 +01:00
Jose Fonseca	b94f9cd8f9	appveyor: Update to MSVC 2017. That's what we (and I suppose most people out there) are using now. Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2018-10-16 09:41:07 +01:00
Samuel Pitoiset	647c2b90e9	radv: disable VK_SUBGROUP_FEATURE_VOTE_BIT This feature isn't used for now, so disable it until wwm is fixed in LLVM. Fixes dEQP-VK.subgroups.vote.graphics.subgroupallequal* https://bugs.freedesktop.org/show_bug.cgi?id=108115 Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-10-16 10:24:19 +02:00
Samuel Pitoiset	593996bc02	radv: implement buffer to image operations for R32G32B32 This should fix rendering issues with Batman Arkham City. We will probably need to implement itob and itoi at some point, but currently nothing hits these paths. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107765 Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-10-16 09:22:38 +02:00
Alex Smith	ca83d51cfb	ac/nir: Use context-specific LLVM types LLVMInt*Type() return types from the global context and therefore are not safe for use in other contexts. Use types from our own context instead. Fixes frequent crashes seen when doing multithreaded pipeline creation. Fixes: `4d0b02bb5a` "ac: add support for 16bit load_push_constant" Fixes: `7e7ee82698` "ac: add support for 16bit buffer loads" Cc: "18.2" <mesa-stable@lists.freedesktop.org> Signed-off-by: Alex Smith <asmith@feralinteractive.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2018-10-16 08:18:24 +01:00
Vadym Shovkoplias	ad558408ff	glsl: Check the subroutine associated functions names Adding compile time check for subroutine functions with the same names. Similar check for intrastage linking was already landed in commit `5f0567a4f6`. From Section 6.1.2 (Subroutines) of the GLSL 4.00 specification "A program will fail to compile or link if any shader or stage contains two or more functions with the same name if the name is associated with a subroutine type." Fixes: * no-overloads.vert Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108109 Signed-off-by: Vadym Shovkoplias <vadym.shovkoplias@globallogic.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2018-10-16 08:15:21 +03:00
Vadym Shovkoplias	d2ea3d4a76	glsl/linker: Change the format of spec quotation Also there is no "OpenGL ES Shading Language 4.00" spec, so change it to GLSL 4.00 spec. Signed-off-by: Vadym Shovkoplias <vadym.shovkoplias@globallogic.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2018-10-16 08:15:21 +03:00
Dave Airlie	ff281e6204	nir: fix clip cull lowering to not assert if GLSL already lowered. If GLSL has already done the lowering, we'd rather not crash in this pass. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-10-15 18:53:48 -07:00
Kenneth Graunke	5bd8369681	i965: Add PCI IDs for new Amberlake parts that are Coffeelake based See commit c0c46ca461f136a0ae1ed69da6c874e850aeeb53 in the Linux kernel, where José Roberto de Souza added this new PCI ID there. Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2018-10-15 18:10:27 -07:00
Kenneth Graunke	8f8111646c	intel: disable FS IR validation in release mode. We probably don't need to iterate, fprintf, and abort in release mode. Reviewed-by: Matt Turner <mattst88@gmail.com>	2018-10-15 18:10:27 -07:00
Caio Marcelo de Oliveira Filho	b3c6146925	nir: Copy propagation between blocks Extend the pass to propagate the copies information along the control flow graph. It performs two walks: first it collects the vars that were written inside each node. Then it walks applying the copy propagation using a list of copies previously available. At each node the list is invalidated according to results from the first walk. This approach is simpler than a full data-flow analysis, but covers various cases. If derefs are used for operating on more memory resources (e.g. SSBOs), the difference from a regular pass is expected to be more visible -- as the SSA copy propagation pass won't apply to those. A full data-flow analysis would handle more scenarios: conditional breaks in the control flow and merge equivalent effects from multiple branches (e.g. using a phi node to merge the source for writes to the same deref). However, as previous commentary in the code stated, its complexity 'rapidly get out of hand'. The current patch is a good intermediate step towards more complex analysis. The 'copies' linked list was modified to use util_dynarray to make it more convenient to clone it (to handle ifs/loops). Annotated shader-db results for Skylake: total instructions in shared programs: 15105796 -> 15105451 (<.01%) instructions in affected programs: 152293 -> 151948 (-0.23%) helped: 96 HURT: 17 All the HURTs and many HELPs are one instruction. Looking at pass by pass outputs, the copy prop kicks in removing a bunch of loads correctly, which ends up altering what other optimizations kick in. In those cases the copies would be propagated after lowering to SSA. In few HELPs we are actually helping doing more than was possible previously, e.g. consolidating load_uniforms from different blocks. Most of those are from shaders/dolphin/ubershaders/. total cycles in shared programs: 566048861 -> 565954876 (-0.02%) cycles in affected programs: 151461830 -> 151367845 (-0.06%) helped: 2933 HURT: 2950 A lot of noise on both sides. total loops in shared programs: 4603 -> 4603 (0.00%) loops in affected programs: 0 -> 0 helped: 0 HURT: 0 total spills in shared programs: 11085 -> 11073 (-0.11%) spills in affected programs: 23 -> 11 (-52.17%) helped: 1 HURT: 0 The shaders/dolphin/ubershaders/12.shader_test was able to pull a couple of loads from inside if statements and reuse them. total fills in shared programs: 23143 -> 23089 (-0.23%) fills in affected programs: 2718 -> 2664 (-1.99%) helped: 27 HURT: 0 All from shaders/dolphin/ubershaders/. LOST: 0 GAINED: 0 The other generations follow the same overall shape. The spills and fills HURTs are all from the same game. shader-db results for Broadwell. total instructions in shared programs: 15402037 -> 15401841 (<.01%) instructions in affected programs: 144386 -> 144190 (-0.14%) helped: 86 HURT: 9 total cycles in shared programs: 600912755 -> 600902486 (<.01%) cycles in affected programs: 185662820 -> 185652551 (<.01%) helped: 2598 HURT: 3053 total loops in shared programs: 4579 -> 4579 (0.00%) loops in affected programs: 0 -> 0 helped: 0 HURT: 0 total spills in shared programs: 80929 -> 80924 (<.01%) spills in affected programs: 720 -> 715 (-0.69%) helped: 1 HURT: 5 total fills in shared programs: 93057 -> 93013 (-0.05%) fills in affected programs: 3398 -> 3354 (-1.29%) helped: 27 HURT: 5 LOST: 0 GAINED: 2 shader-db results for Haswell: total instructions in shared programs: 9231975 -> 9230357 (-0.02%) instructions in affected programs: 44992 -> 43374 (-3.60%) helped: 27 HURT: 69 total cycles in shared programs: 87760587 -> 87727502 (-0.04%) cycles in affected programs: 7720673 -> 7687588 (-0.43%) helped: 1609 HURT: 1416 total loops in shared programs: 1830 -> 1830 (0.00%) loops in affected programs: 0 -> 0 helped: 0 HURT: 0 total spills in shared programs: 1988 -> 1692 (-14.89%) spills in affected programs: 296 -> 0 helped: 1 HURT: 0 total fills in shared programs: 2103 -> 1668 (-20.68%) fills in affected programs: 438 -> 3 (-99.32%) helped: 4 HURT: 0 LOST: 0 GAINED: 1 v2: Remove the DISABLE prefix from tests we now pass. v3: Add comments about missing write_mask handling. (Caio) Add unreachable when switching on cf_node type. (Jason) Properly merge the component information in written map instead of replacing. (Jason) Explain how removal from written arrays works. (Jason) Use mode directly from deref instead of getting the var. (Jason) v4: Register the local written mode for calls. (Jason) Prefer cf_node instead of node. (Jason) Clarify that remove inside iteration only works in backward iterations. (Jason) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-10-15 17:29:46 -07:00
Caio Marcelo de Oliveira Filho	dc349f07b5	nir: Take call instruction into account in copy_prop_vars Calls are not used yet (functions are inlined), but since new code is already taking them into account, do it here too. The convention here and in other places is that no writable memory is assumed to remain unchanged, as well as global variables. Also, explicitly state the modes affected (instead of using the reverse logic) in one of the apply_for_barrier_modes calls. Suggested by Jason. v2: Consider local vars used by a call to be conservative, SPIR-V has such cases. (Jason) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-10-15 17:29:46 -07:00
Caio Marcelo de Oliveira Filho	797f01c220	nir: Add tests for copy propagation of derefs Also tests for removal of redundant loads, that we currently handle as part of the copy propagation. Note some tests involve multiple blocks and are currently DISABLED because they (expectedly) fail. v2: Add missing DISABLED prefix to "multi block" tests. (Jason) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-10-15 17:29:46 -07:00
Caio Marcelo de Oliveira Filho	4dfa7adc10	nir: Remove handling of dead writes from copy_prop_vars These are covered by another pass now. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-10-15 17:29:46 -07:00
Caio Marcelo de Oliveira Filho	c20dd1f77c	intel/nir, freedreno/ir3: Use the separated dead write vars pass No changes to shader-db for intel. No changes to shader-db expected for freedreno. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-10-15 17:29:46 -07:00
Caio Marcelo de Oliveira Filho	cb126cf67a	nir: Separate dead write removal into its own pass Instead of doing this as part of the existing copy_prop_vars pass. Separation makes it easier to expand the scope of both passes to be more than per-block. For copy propagation, the information about valid copies comes from previous instructions; while the dead write removal depends on information from later instructions ("has any instruction used this deref before overwriting it?"). Also change the tests to use this pass (instead of copy prop vars). Note that the disabled tests continue to fail, since the standalone pass is still per-block. v2: Remove entries from dynarray instead of marking items as deleted. Use foreach_reverse. (Caio) (all from Jason) Do not cache nir_deref_path. Not worth it for this patch. Clear unused writes when hitting a call instruction. Clean up enumeration of modes for barriers. Move metadata calls to the inner function. v3: For copies, use the vector length to calculate the mask. (all from Jason) Use nir_component_mask_t when applicable. Rename functions for clarity. Consider local vars used by a call to be conservative (SPIR-V has such cases). Comment and assert the assumption that stores and copies are always to a deref that ends with a vector or scalar. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-10-15 17:29:46 -07:00
Caio Marcelo de Oliveira Filho	a02fd7000d	nir: Add tests for dead write elimination Note at the moment the pass called is nir_opt_copy_prop_vars, because dead write elimination is implemented there. Also added tests that involve identifying dead writes in multiple blocks (e.g. the overwrite happens in another block). Those currently fail as expected, so are marked to be skipped. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-10-15 17:29:46 -07:00
Caio Marcelo de Oliveira Filho	bbda2a17f7	nir: Add test file for vars related passes Add basic helpers for doing tests on the vars related optimization passes. The main goal is to lower the barrier to create tests during development and debugging of the passes. Full coverage is not a requirement. v2: Make find_next_intrinsic() skip blocks before 'after'. (Jason) Move nir_imm_ivec2() to nir_builder.h. (Jason) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-10-15 17:29:46 -07:00
Caio Marcelo de Oliveira Filho	c869646b7d	nir: Add nir_imm_ivec2 helper Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-10-15 17:29:46 -07:00
Caio Marcelo de Oliveira Filho	3966f053a1	util: Add foreach_reverse for dynarray Useful to walk the array removing elements by swapping them with the last element. v2: Change iteration to make sure we never underflow. (Jason) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-10-15 17:29:46 -07:00
Eric Anholt	8ec83dc51e	v3d: Add support for hardware pack/unpack of half floats. Cuts the formerly 7-minute simulation time of fs-packHalf2x16.shader_test in half.	2018-10-15 17:16:44 -07:00
Eric Anholt	7d77fe1bcc	nir: Expose nir_remove_unused_io_vars(). For gallium drivers where you want to do some linking at variant compile time, you don't have the other producer/consumer shader on hand to modify. By exposing the inner function, the driver can have the used varyings in the compiled shader cache key and still do linking. This is also useful for V3D, where the binning shader wants to only output position and TF varyings. We've been removing those after nir_lower_io, but this will be less driver-specific code and let more of the shader get DCEed early in NIR. Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-10-15 17:16:44 -07:00
Eric Anholt	b788ab6d5c	nir: Be sure to fix deref modes after demoting shader i/o vars to global. Fixes assertion failures when calling nir_remove_unused_varyings() or nir_remove_unused_io_vars(). Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-10-15 17:16:44 -07:00
Eric Anholt	dda1ae9b3c	gallium/ttn: Convert inputs and outputs to derefs of variables. This means that TTN shaders more closely resemble GTN shaders: they have inputs and outputs as variable derefs, with the variables having their .driver_location already set up for you. This will be useful for v3d to do input variable DCE in NIR, which we can't do when the TTN shaders never have a pre-nir_lower_io stage. Acked-by: Rob Clark <robdclark@gmail.com>	2018-10-15 17:16:43 -07:00
Eric Anholt	da15a0d88e	gallium/ttn: Fix the type of gl_FragDepth. In TGSI we have a vec4 of which only .z is used, but for NIR we should be using a float the same as other NIR IR. We were already moving TGSI's .z to the .x channel. Acked-by: Rob Clark <robdclark@gmail.com>	2018-10-15 17:16:43 -07:00
Kristian H. Kristensen	f93e431272	freedreno/a6xx: Enable blitter Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org> Reviewed-by: Rob Clark <robdclark@gmail.com>	2018-10-15 15:22:38 -07:00
Kristian H. Kristensen	47bc9fad3e	freedreno/a6xx: Update headers Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org> Reviewed-by: Rob Clark <robdclark@gmail.com>	2018-10-15 15:22:35 -07:00
Kristian H. Kristensen	421863412c	freedreno/a6xx: Remove unnecessary GRAS_2D_BLIT_INFO write Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org> Reviewed-by: Rob Clark <robdclark@gmail.com>	2018-10-15 15:20:28 -07:00
Jason Ekstrand	e4c9bcd037	anv: Don't advertise ASTC support on BSW Tested-by: Mark Janes <mark.a.janes@intel.com> Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>	2018-10-15 16:55:25 -05:00
Samuel Pitoiset	26a2ce35ab	radv: do not force the flat qualifier for clip/cull distances This fixes some new CTS that reads clip/cull distances from the fragment shader stage: dEQP-VK.clipping.user_defined.clip_* Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-10-15 21:55:28 +02:00
Samuel Pitoiset	80c84bdba9	radv: bump discreteQueuePriorities to 2 It's the minimum value required by the spec. This fixes dEQP-VK.api.info.device.properties. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-10-15 21:55:25 +02:00
Jason Ekstrand	ae18c53ba6	anv: Split dispatch tables into device and instance There's no reason why we need to generate trampoline functions for instance functions or carry N copies of the instance dispatch table around for every hardware generation. Splitting the tables and being more conservative shaves about 34K off .text and about 4K off .data when built with clang. Before splitting dispatch tables: text data bss dec hex filename 3224305 286216 8960 3519481 35b3f9 _install/lib64/libvulkan_intel.so After splitting dispatch tables: text data bss dec hex filename 3190325 282232 8960 3481517 351fad _install/lib64/libvulkan_intel.so Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-10-15 13:30:24 -05:00
Kenneth Graunke	18cc65edf8	i965: Drop assert about number of uniforms in ARB handling. My recent prog_to_nir patch started making new sampler uniforms, which apparently increased the number of parameters. We used to poke at the one parameter directly, making it important that there was only one, but we haven't done that in a while. It should be safe to just delete the assertion. Fixes: `1c0f92d8a8` "nir: Create sampler variables in prog_to_nir." Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-10-15 10:56:12 -07:00
Jason Ekstrand	2241be1d1b	vulkan: Add the fuchsia headers These were missing in the last couple of spec updates. Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-10-15 10:20:31 -05:00
Bas Nieuwenhuizen	6ed0fd24d4	radv: Implement VK_EXT_pci_bus_info. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2018-10-15 12:27:49 +02:00
Kenneth Graunke	38a23517fd	gallium/u_transfer_helper: Add support for separate Z24/S8 as well. u_transfer_helper already had code to handle treating packed Z32_S8 as separate Z32_FLOAT and S8_UINT resources, since some drivers can't handle that interleaved format natively. Other hardware needs depth and stencil as separate resources for all formats. For example, V3D3 needs this for 24-bit depth as well. This patch adds a new flag to lower all depth/stencil formats, and implements support for Z24_UNORM_S8_UINT. (S8_UINT_Z24_UNORM is left as an exercise to the reader, preferably someone who has access to a machine that uses that format.) Reviewed-by: Eric Anholt <eric@anholt.net>	2018-10-14 23:36:28 -07:00
Kenneth Graunke	c3d219837a	gallium/format: Add a helper to combine separate Z24 and S8 stencil. This new function takes separate Z24 depth and S8 stencil sources, and packs them into a single combined Z24S8 buffer. Reviewed-by: Eric Anholt <eric@anholt.net>	2018-10-14 23:36:28 -07:00
Kenneth Graunke	5849e0612c	gallium/auxiliary: Add util_format_get_depth_only() helper. This will be used by u_transfer_helper.c shortly, in order to split packed depth-stencil into separate resources. Reviewed-by: Eric Anholt <eric@anholt.net>	2018-10-14 23:36:28 -07:00
Kenneth Graunke	1c0f92d8a8	nir: Create sampler variables in prog_to_nir. This is needed for nir_gather_info to actually count the textures, since it operates solely on variables. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Dave Airlie <airlied@redhat.com>	2018-10-14 23:35:47 -07:00
Kenneth Graunke	ed169c9ad2	nir: Create sampler2D variables in nir_lower_{bitmap,drawpixels}. This is needed for nir_gather_info to actually count the new textures, since it operates solely on variables. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Dave Airlie <airlied@redhat.com>	2018-10-14 23:35:35 -07:00
Jason Ekstrand	b7397b09d5	spirv: Update SPIR-V json and headers to Khronos master This corresponds to commit 801cca8104245c07e8cc532 on GitHub. Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-10-13 09:56:18 -05:00
Samuel Pitoiset	13fd4e601c	vulkan: Update the XML and headers to 1.1.88 Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Acked-by: Jason Ekstrand <jason@jlekstrand.net>	2018-10-13 09:56:18 -05:00
Vinson Lee	cc33621e3b	r600/sb: Fix constant-logical-operand warning. sb/sb_bc_parser.cpp:620:27: warning: use of logical '&&' with constant operand [-Wconstant-logical-operand] if (cf->bc.op_ptr->flags && FF_GDS) ^ ~~~~~~ sb/sb_bc_parser.cpp:620:27: note: use '&' for a bitwise operation if (cf->bc.op_ptr->flags && FF_GDS) ^~ & sb/sb_bc_parser.cpp:620:27: note: remove constant to silence this warning if (cf->bc.op_ptr->flags && FF_GDS) ~^~~~~~~~~ Fixes: `da977ad907` ("r600/sb: start adding GDS support") Signed-off-by: Vinson Lee <vlee@freedesktop.org> Reviewed-by: Dave Airlie <airlied@redhat.com>	2018-10-12 10:58:58 -07:00
Rafael Antognolli	ca168ec008	i965/miptree: Use enum instead of boolean. ISL_AUX_USAGE_NONE happens to be the same as "false", but let's do the right thing and use the enum. v2: fix intel_miptree_finish_depth too (Caio) Reviewed-by: Dylan Baker <dylan@pnwbakers.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-10-12 10:14:20 -07:00
Samuel Pitoiset	2c139e2cdf	radv: do not support blitting surfaces for R32G32B32 formats Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108113 Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-10-12 15:28:21 +02:00
Jose Fonseca	7c5aececda	scons: Allow building with custom MSVC_USE_SCRIPT script. SCons MSVC support relies on vcvarsall.bat to extract the PATH, CPP includes, library paths, etc. SCons also has a build env var named MSVC_USE_SCRIPT which one can use to point to an alternative vcvarsall.bat script. This change exposes this MSVC_USE_SCRIPT build env variable as a SCons command line variable. This will enable using MSVC outside Program Files (e.g., network shares, etc.) This change also links the advapi32 library, necessary for the Windows Registry API used by the WGL state tracker, avoiding missing symbols. Reviewed-by: Brian Paul <brianp@vmware.com> Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2018-10-12 07:45:53 +01:00
Samuel Pitoiset	416013b4f5	radv: emit the GLC bit for SSBO loads/stores when needed This fixes some new memory model tests: dEQP-VK.memory_model.message_passing.core11.u32.coherent.fence_fence.atomicwrite.device.* Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108112 Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-10-12 08:42:08 +02:00
Samuel Pitoiset	4b74f05f6b	spirv/nir: handle memory access qualifiers for SSBO loads/stores v2: - change how the access qualifiers are accumulated v3: - duplicate members in struct_member_decoration_cb() - handle access qualifiers on variables - remove access qualifiers handling in _vtn_variable_load_store() - fix setting access qualifiers on type->array_element Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-10-12 08:42:08 +02:00
Tapani Pälli	26a10e3844	anv/android: we need git_sha1.h in include paths Fixes: `e4538b9` "anv: Implement VK_KHR_driver_properties" Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Acked-by: Eric Engestrom <eric.engestrom@intel.com>	2018-10-12 07:29:03 +03:00
Nanley Chery	0ee0e0b6b9	anv: Clear WM_HZ_OP overrides in init_device_state This is basically a port of commit, `3ade766684` ("i965: Disable 3DSTATE_WM_HZ_OP fields.") The BDW+ docs describe how to use the 3DSTATE_WM_HZ_OP instruction in the section titled, "Optimized Depth Buffer Clear and/or Stencil Buffer Clear." It mentions that the packet overrides GPU state for the clear operation and needs to be reset to 0s to clear the overrides. Depending on the kernel, we may not get a context with the GPU state for this packet zeroed. Do it ourselves just in case. Prevents a number of GPU hangs when running crucible on ICL. I tried to get the exact number of hangs that occurs without this patch, but was unsuccessful. The test machine became unresponsive before completing the full run. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-10-11 16:31:08 -07:00
Jordan Justen	494d2ec277	i965/gen10+: Initialize new fields in STATE_BASE_ADDRESS Ref: `263b584d5e` "i965/skl: Emit extra zeros in STATE_BASE_ADDRESS on Skylake." Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>	2018-10-11 15:16:04 -07:00
Jordan Justen	d18a0d955e	anv/gen9+: Initialize new fields in STATE_BASE_ADDRESS Ref: `263b584d5e` "i965/skl: Emit extra zeros in STATE_BASE_ADDRESS on Skylake." Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>	2018-10-11 15:16:00 -07:00
Jason Ekstrand	d7e0d47b9d	nir: Add a bunch of b2[if] optimizations The b2f and b2i conversions always produce zero or one which are both representable in every type and size. Since b2i and b2f support all bit sizes, we can just get rid of the conversion opcode. total instructions in shared programs: 15089335 -> 15084368 (-0.03%) instructions in affected programs: 212564 -> 207597 (-2.34%) helped: 896 HURT: 0 total cycles in shared programs: 369831123 -> 369826267 (<.01%) cycles in affected programs: 2008647 -> 2003791 (-0.24%) helped: 693 HURT: 216 Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2018-10-11 15:21:19 -05:00
Jason Ekstrand	0e0dc596a2	intel/vec4: Fix nir_op_b2[fi] with 64-bit result This is valid NIR but you can't actually hit this case today. GLSL IR doesn't have a bool to double opcode; it does f2d(b2f(x)). In SPIR-V we don't have any to/from bool conversion opcodes at all. However, the next commit will make us start generating it so we should be ready. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2018-10-11 15:21:19 -05:00
Jason Ekstrand	497675c21e	intel/fs: Fix nir_op_b2[fi] with 64-bit result on Gen8 LP and Gen9 LP Several of the Atom GPUs have additional restrictions on alignment when moving < 64-bit source to a 64-bit destination. All of the nir_op_*2*64 code generation paths respected this, but nir_op_b2[fi] did not. Previous to commit `a68dd47b91` it was not possible to generate such an instruction from the GLSL path. It may have been possible from SPIR-V, but it's not clear. The aforementioned patch converts a 64-bit nir_op_fsign into a sequence of operations including a nir_op_b2f with a 64-bit result. This "just works" everywhere except these Atom parts. This problem was not detected during normal CI testing because the Atom parts are not included in developer builds. v2 (idr): Make the patch compile, and make some cosmetic changes. Add a commit message. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108319 Fixes: `a68dd47b91` "nir/algebraic: Simplify fsat of fsign" Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2018-10-11 15:21:19 -05:00
Vinson Lee	4ece6aa552	egl: Use correct shared libraries suffix on macOS. Signed-off-by: Vinson Lee <vlee@freedesktop.org> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2018-10-11 11:30:00 -07:00
Illia Iorin	b18f8e63ef	mesa: Fix pack_uint_Z_FLOAT32() Fixed pack_uint_Z_FLOAT32 by casting row data to float instead of uint. Remove the duplicate function pack_uint_Z_FLOAT32_X24S8. Edited case in "_mesa_get_pack_uint_z_func". Now it looks like "_mesa_get_pack_float_z_func". Remove _mesa_problem call, which was added for debugging this issue. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91433 Signed-off-by: Illia Iorin <illia.iorin@globallogic.com> Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>	2018-10-11 10:15:09 -07:00
Rodrigo Vivi	24db1c7fcc	intel: Introducing Whiskey Lake platform Whiskey Lake uses the same gen graphics as Coffee Lake, including some ids that were previously marked as reserved on Coffee Lake, but that now are moved to WHL page. This follows the ids and approach used on kernel's commit b9be78531d27 ("drm/i915/whl: Introducing Whiskey Lake platform") and commit c1c8f6fa731b ("drm/i915: Redefine some Whiskey Lake SKUs") v2: Lionel noticed that GT{1,2,3} on kernel wasn't following spec when looking to number of EUs, so kernel has been updated. Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: José Roberto de Souza <jose.souza@intel.com> Cc: Anuj Phogat <anuj.phogat@gmail.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-10-11 10:02:40 -07:00
Boyuan Zhang	d76c277421	st/va: use provided sizes and coords for vlVaGetImage vlVaGetImage should respect the width, height, and coordinates x and y that passed in. Therefore, pipe_box should be created with the passed in values instead of surface width/height. v2: add input size check, return error when size out of bounds v3: fix the size check for vaimage v4: add size adjustment for x and y coordinates Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com> Cc: "18.2" <mesa-stable@lists.freedesktop.org> Reviewed-by: Leo Liu <leo.liu@amd.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu> Acked-by: Christian König <christian.koenig@amd.com>	2018-10-11 09:00:18 -04:00
Samuel Pitoiset	229803b66a	radv: implement clear operations for R32G32B32 This fixes crashes for some CTS: dEQP-VK.api.copy_and_blit.core.blit_image.all_formats.color..linear__* dEQP-VK.api.copy_and_blit.core.blit_image.all_formats.color.._linear_* Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108113 Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-10-11 14:49:16 +02:00
Samuel Pitoiset	c3ba3c2611	radv: disallow 3D images and mipmaps/layers for R32G32B32 linear formats R32G32B32 are weird formats and we are only going to support some basic operations for now. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-10-11 14:49:14 +02:00
Samuel Pitoiset	d179312b53	radv: add a workaround for a VGT hang with prim restart and strips Otherwise, Yakuza and The Evil Within hang the GPU with DXVK. This apparently only works on Polaris. Suggested by Marek. Cc: mesa-stable@lists.freedesktop.org Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-10-11 10:16:11 +02:00
Timothy Arceri	3bc012a34e	glsl: remove redundant es_shader checks The es check is already covered by the is_version() check. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2018-10-11 14:45:43 +11:00
Dave Airlie	cc2fe57922	st/glsl_to_tgsi: initialise need_uarl in constructor Found by coverity Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-10-11 10:20:37 +10:00
Dave Airlie	c5c3da6c90	glspirv: drop pointless assert (size_t is unsigned) Found by coverity Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>	2018-10-11 10:19:48 +10:00
Dave Airlie	600d8ecb57	radv: remove unsigned comparison against 0 The value is always >= 0 here. Found by coverity Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2018-10-11 10:19:20 +10:00
Dave Airlie	6e1d294804	radv: remove dead code for master_fd close We have never opened master_fd at this point, so remove code to close it. Found by coverity. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2018-10-11 10:19:16 +10:00
Dave Airlie	7c04b96f03	radv: don't pass shader key by copy Coverity pointed out we were copying 168 bytes here unnecessarily. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2018-10-11 10:18:43 +10:00
Dave Airlie	29a7631986	anv: add missing unlock in error path. Not going to matter, but be consistent. Found by coverity Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Fixes: `caf41c78c` (anv/allocator: Support softpin in the BO cache)	2018-10-11 09:50:27 +10:00
Jason Ekstrand	4ba445e011	intel: Don't propagate conditional modifiers if a UD source is negated This fixes a bug uncovered by my NIR integer division by constant optimization series. Fixes: `19f9cb72c8` "i965/fs: Add pass to propagate conditional..." Fixes: `627f94b72e` "i965/vec4: adding vec4_cmod_propagation..." Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2018-10-10 13:13:12 -05:00
Jason Ekstrand	328d4d080b	util: Add tests for fast integer division by constants While I generally trust rediculousfish to have done his homework, we've made some adjustments to suit the needs of mesa and it'd be good to test those. Also, there's no better place than unit tests to clearly document the different edge cases of the different methods. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-10-10 13:13:12 -05:00
Marek Olšák	a9be8dddfe	util: Add power-of-two divisor support to compute_fast_udiv_info Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-10-10 13:13:12 -05:00
Jason Ekstrand	7cde4dbcd7	util: Generalize fast integer division to be variable bit-width There's nothing inherently fixed-width in the code. All that's required to generalize it is to make everything internally 64-bit and pass UINT_BITS in as a parameter to util_compute_fast_[us]div_info. With that, it can now handle 8, 16, 32, and 64-bit integer division by a constant. We also add support for division by 1 and by other powers of 2. This is useful if you want to divide by a uniform value in a shader where you have the opportunity to adjust the uniform on the CPU before passing it in. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-10-10 13:13:12 -05:00
Marek Olšák	64eb0738d4	util: Add fast division helpers Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-10-10 13:13:12 -05:00
Marek Olšák	2940c257a6	util: import public domain code for integer division by a constant Compilers can use this to generate optimal code for integer division by a constant. Additionally, an unsigned division by a uniform that is constant but not known at compile time can still be optimized by passing 2-4 division factors to the shader as uniforms and executing one of the fast_udiv* variants. The signed division algorithm doesn't have this capability. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-10-10 13:13:12 -05:00
Jason Ekstrand	0dca6730b4	util: Add a simple big math library Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2018-10-10 13:13:12 -05:00
Dylan Baker	b8521704ed	meson: Don't allow building EGL on Windows or MacOS Currently mesa only supports EGL on Unix like systems, cygwin, and haiku. Meson should actually enforce this. This fixes the default build on MacOS. v2: - invert the condition, mark darwin and windows as not supported instead of trying to mark what is supported. v3: - add missing ) v3: - Update comment to reflect condition change in v2 CC: 18.2 <mesa-stable@lists.freedesktop.org> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2018-10-10 11:02:36 -07:00
Timothy Arceri	0346ad3774	glsl: ignore trailing whitespace when define redefined The Nvidia/AMD binary drivers allow this, as does GCC. This fixes shader compilation issues in the latest update of No Mans Sky. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2018-10-10 15:08:32 +11:00
Ian Romanick	b44c9292b7	intel/compiler: Don't handle fsign.sat No shader-db or CI changes on any Intel platform. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Thomas Helland <thomashelland90@gmail.com>	2018-10-09 13:56:42 -07:00
Ian Romanick	a68dd47b91	nir/algebraic: Simplify fsat of fsign This allows us to not support fsign.sat in the Intel compiler backend, and that will simplify some later changes. No shader-db changes on any Intel platform. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Thomas Helland <thomashelland90@gmail.com>	2018-10-09 13:56:42 -07:00
Ian Romanick	1546204cdd	nir/algebraic: sign(x)*x*x is abs(x)*x shader-db results: All Gen7+ platforms had similar results. (Skylake shown) total instructions in shared programs: 15106023 -> 15105981 (<.01%) instructions in affected programs: 300 -> 258 (-14.00%) helped: 6 HURT: 0 helped stats (abs) min: 7 max: 7 x̄: 7.00 x̃: 7 helped stats (rel) min: 14.00% max: 14.00% x̄: 14.00% x̃: 14.00% 95% mean confidence interval for instructions value: -7.00 -7.00 95% mean confidence interval for instructions %-change: -14.00% -14.00% Instructions are helped. total cycles in shared programs: 566050327 -> 566050075 (<.01%) cycles in affected programs: 2826 -> 2574 (-8.92%) helped: 6 HURT: 0 helped stats (abs) min: 40 max: 44 x̄: 42.00 x̃: 42 helped stats (rel) min: 8.89% max: 8.94% x̄: 8.92% x̃: 8.92% 95% mean confidence interval for cycles value: -44.30 -39.70 95% mean confidence interval for cycles %-change: -8.95% -8.88% Cycles are helped. No changes on Gen6 or earlier. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Thomas Helland <thomashelland90@gmail.com>	2018-10-09 13:56:42 -07:00
Ian Romanick	10f4a8871e	nir: Add helper functions to get the instruction that generated a nir_src Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Thomas Helland <thomashelland90@gmail.com>	2018-10-09 13:56:42 -07:00
Brian Paul	797e34f658	svga: change svga_destroy_shader_variant() to return void svga_destroy_shader_variant() itself flushes and retries the command if there's a failure. So no need for the callers to do it. Other callers of the function were already ignoring the return value. This also fixes a corner-case double-free reported by Coverity (and reported by Dave Airlie). Tested with various OpenGL apps. Reviewed-by: Charmaine Lee <charmainel@vmware.com>	2018-10-09 11:17:14 -06:00
Dylan Baker	b781688636	meson: Don't build glsl compiler tests unless OpenGL is enabled Since there are no other users of the glsl compiler. Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2018-10-09 08:56:00 -07:00
Dylan Baker	d84f003b95	meson: Only build gallium state tracker tests with shared_glapi This has always been a requirement, it's just somehow been missed in the meson build. Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2018-10-09 08:55:56 -07:00
Dylan Baker	0fa6a8271a	meson: only build clapi tests when OpenGL is being built Otherwise building just vulkan (among other things) will build these tests, pull in a bunch of stuff they shouldn't, and potentially fail to compile. Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2018-10-09 08:55:48 -07:00
Ilia Mirkin	92f56fbd89	nvc0: fix blitting red to srgb8_alpha For some reason the 2d engine can't handle this. Red formats get special treatment there, so perhaps related. Fixes dEQP-GLES3 tests of the form: dEQP-GLES3.functional.fbo.blit.conversion.r{8,16f,32f}_to_srgb8_alpha8 Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Karol Herbst <kherbst@redhat.com> Cc: mesa-stable@lists.freedesktop.org	2018-10-09 10:33:11 -04:00
Ilia Mirkin	9bf0614116	nv50,nvc0: guard against zero-size blits The current state tracker can generate these sometimes. Fixing this is more involved, and due to some integer math we can generate divisions-by-zero. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Karol Herbst <kherbst@redhat.com> Cc: mesa-stable@lists.freedesktop.org	2018-10-09 10:33:11 -04:00
Ilia Mirkin	78d3640e49	nv50,nvc0: mark RGBX_UINT formats as renderable This helps st/mesa avoid some (apparently) buggy fallbacks. Specifically the CopyTexSubImage fallback tries to read texture A as RGBA_FLOAT and write back that data into the target format, which fails for integer formats which have no appropriate logic to do the conversion. Since integer formats don't blend, there's no harm in the fact that the "A" component gets written anyways. Fixes, among others: https://www.khronos.org/registry/webgl/sdk/tests/conformance2/textures/canvas/tex-2d-rgb8ui-rgb_integer-unsigned_byte.html Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Cc: mesa-stable@lists.freedesktop.org	2018-10-09 10:33:11 -04:00
Eric Engestrom	976188737d	radv: add missing meson c++ visibility arguments Fixes: `6f3aee40f9` "radv: using tls to store llvm related info and speed up compiles (v10)" Cc: Dave Airlie <airlied@redhat.com> Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2018-10-09 14:22:24 +01:00
Michel Dänzer	9d3fefdc41	gbm: Add GBM_FORMAT_ARGB1555 support Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-10-09 10:32:51 +02:00
Michel Dänzer	e7e033ed8a	st/dri: Handle BGRA5551 format Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-10-09 10:32:50 +02:00
Rob Clark	fa52ff856d	freedreno/a5xx+a6xx: fix LRZ pitch alignment Both RB_2D_DST_SIZE.PITCH (a6xx) and RB_MRT[n].PITCH (a5xx) need alignment to 64. Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-10-08 19:05:14 -04:00
Rob Clark	82c3b6fe49	freedreno/a6xx: add LRZ support As with a5xx, hidden behind FD_MESA_DEBUG=lrz due to being paranoid about z-fighting issues with some games (in particular, this was observed with 0ad on a5xx.. but I think the proper solution to enable this by default is to figure out how to do driver specific driconf options). Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-10-08 19:05:14 -04:00
Rob Clark	a877451a41	freedreno: update generated headers Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-10-08 18:03:35 -04:00
Rob Clark	bf79a7cc25	freedreno/a6xx: add helper for various CP_EVENT_WRITE Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-10-08 17:50:26 -04:00
Rob Clark	60af89815e	freedreno/a6xx: remove unused fxns Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-10-08 17:50:26 -04:00
Rob Clark	d5bd3ce89c	freedreno/a6xx: remove fd6_shader_stateobj Earlier gen's already got this cleanup, but a6xx was still off on a branch then. Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-10-08 17:50:26 -04:00
Ilia Mirkin	1bb1c03d61	glsl: fix array assignments of a swizzled vector This happens in situations where we might do vec.wzyx[i] = ... The swizzle would get effectively ignored because of the interaction between how ir_assignment->set_lhs works and overwriting the write_mask. There are two cases, one where i is a constant, and another where i is variable. We have to be extra-careful in both cases. Fixes the following WebGL test: https://www.khronos.org/registry/webgl/sdk/tests/conformance2/glsl3/vector-dynamic-indexing-swizzled-lvalue.html And the new piglit tests: swizzled-writemask-indexing-nonconst.shader_test swizzled-writemask-indexing.shader_test Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Cc: mesa-stable@lists.freedesktop.org	2018-10-08 14:29:14 -04:00
Samuel Pitoiset	d3682766f6	radv: tidy up radv_pipeline_init_multisample_state() Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-10-08 14:17:43 +02:00
Samuel Pitoiset	b38228ccb0	radv: always set PA_SC_MODE_CNTL_1.OUT_OF_ORDER_WATER_MARK It has probably no effect without out of order rasterization anyway. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-10-08 14:17:40 +02:00
Samuel Pitoiset	937986ca1d	radv: set DB_EQAA.INCOHERENT_EQAA_READS My attempt was to set this field instead of duplicating one. Fixes: `6cfa321c39` ("radv: add potential missing fields for DB_EQAA") Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-10-08 14:17:33 +02:00
Chystiakov, Dmytro	47e3338b04	i965: fallback RGBX to RGBA in glEGLImageTargetRenderbufferStorageOES In the same fashion as is done for glEGLImageTextureTarget2D. v2: share the fallback which sets baseformat and internalformat correctly which makes both of the tests pass (Tapani) Fixes android.hardware.nativehardware.cts.AHardwareBufferNativeTests: #SingleLayer_ColorTest_GpuColorOutputCpuRead_R8G8B8X8_UNORM #SingleLayer_ColorTest_GpuColorOutputIsRenderable_R8G8B8X8_UNORM Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>	2018-10-08 08:03:45 +03:00
Tapani Pälli	d1fa69ed61	glsl: do not attempt assignment if operand type not parsed correctly v2: check types of both operands (Ian) Cc: mesa-stable@lists.freedesktop.org Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108012	2018-10-08 08:02:50 +03:00
Marek Olšák	d877451b48	util/u_queue: add UTIL_QUEUE_INIT_SET_FULL_THREAD_AFFINITY Initial version discussed with Rob Clark under a different patch name. This approach leaves his driver unaffected.	2018-10-06 22:05:58 -04:00
Marek Olšák	066aa44fc5	radeonsi: fix a typo at CS_PARTIAL_FLUSH harmless	2018-10-06 21:50:52 -04:00
Marek Olšák	77903c8cfb	ac: add ac_build_round	2018-10-06 21:50:09 -04:00
Marek Olšák	fa023f293e	ac: correct PKT3_COPY_DATA definitions	2018-10-06 21:50:09 -04:00
Marek Olšák	82f5f89bf6	ac: simplify LLVM alloca helpers	2018-10-06 21:50:09 -04:00
Marek Olšák	a668c8d6ba	ac: define all address spaces properly	2018-10-06 21:50:09 -04:00
Gert Wollny	8f77156c26	gallivm: Make it possible to disable some optimization shortcuts in release builds For testing it is of interest that all tests of dEQP pass, e.g. to test virglrenderer on a host only providing software rendering like in a CI. Hence make it possible to disable certain optimizations that make tests fail. While we are there also add some documentation to the flags to make it clear that this is opt-out. Setting the environment variable "GALLIVM_PERF=no_filter_hacks" can be used to make the following tests pass in release mode: dEQP-GLES2.functional.texture.mipmap.2d.affine._linear_ dEQP-GLES2.functional.texture.mipmap.cube.generate.* dEQP-GLES2.functional.texture.vertex.2d.filtering._mipmap_linear_ dEQP-GLES2.functional.texture.vertex.2d.wrap.* Related: https://bugs.freedesktop.org/show_bug.cgi?id=94957 v2: rename optimization disabling flag to 'safemath' and also move the nopt flag to the perf flags. v3: rename flag "safemath" to "no_filter_hacks" since safemath is usually associated with floating point operations (Roland) Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2018-10-06 13:12:48 +02:00
Tomeu Vizoso	9d81cd8e7c	virgl: Pass resource size and transfer offsets Pass the size of a resource when creating it so a backing can be kept in the other side. Also pass the required offset to transfer commands. This moves vtest closer to how virtio-gpu works, making it more useful for testing. v2: - Use new messages for creation and transfers, as changing the behavior of the existing messages would be messy given that we don't want to break compatibility with older servers. v3: - Use correct strides: The resource corresponding to the output display might have a different line stride than the IOVs, so when reading back to this resource take the resource stride and the IOV stride into account. v4: Fix transfer size calculation (Andrey Simiklit) v5: Add comment about transfer size value in the PUT command (Gurchetan). Add a comment about the size correction for transfers for reading and writing the resource. Fixing this by correctly evaluating the size upfront will need some work also on the virglrenderer side. Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com> (v2) Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>	2018-10-06 13:12:44 +02:00
Gert Wollny	5d7858f151	virgl, vtest: Correct the transfer size calculation The transfer size used in virglrenderer refers to uint32_t, so one must add 3 and then divide by 4 instead of adding 3/4 which is a no-op with integers. Fixes: `b3b82fe8ea` virgl/vtest: add vtest driver Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>	2018-10-06 13:12:44 +02:00
Alan Coopersmith	066850edad	util: Make xmlconfig.c build on Solaris without d_type in dirent (v2) v2: check for lstat() failing Fixes: `04bdbbcab3` "xmlconfig: read more config files from drirc.d/" Signed-off-by: Alan Coopersmith <alan.coopersmith@oracle.com> Reviewed-by: Roland Mainz <roland.mainz@nrubsig.org> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2018-10-05 17:30:45 -07:00
Sonny Jiang	084cf3b966	radeonsi:optimizing SET_CONTEXT_REG for shaders vgt_vertex_reuse Signed-off-by: Sonny Jiang <sonny.jiang@amd.com> Signed-off-by: Marek Olšák <marek.olsak@amd.com>	2018-10-05 19:04:13 -04:00
Sonny Jiang	ce1d72609d	radeonsi:optimizing SET_CONTEXT_REG for shaders Tessellation Signed-off-by: Sonny Jiang <sonny.jiang@amd.com> Signed-off-by: Marek Olšák <marek.olsak@amd.com>	2018-10-05 19:04:13 -04:00
Sonny Jiang	4de328da07	radeonsi:optimizing SET_CONTEXT_REG for shaders PS Signed-off-by: Sonny Jiang <sonny.jiang@amd.com> Signed-off-by: Marek Olšák <marek.olsak@amd.com>	2018-10-05 19:04:13 -04:00
Sonny Jiang	f243980f2c	radeonsi:optimizing SET_CONTEXT_REG for shaders VS Signed-off-by: Sonny Jiang <sonny.jiang@amd.com> Signed-off-by: Marek Olšák <marek.olsak@amd.com>	2018-10-05 19:04:13 -04:00
Sonny Jiang	4052624398	radeonsi:optimizing SET_CONTEXT_REG for shaders GS Signed-off-by: Sonny Jiang <sonny.jiang@amd.com> Signed-off-by: Marek Olšák <marek.olsak@amd.com>	2018-10-05 19:04:13 -04:00
Marek Olšák	86f004bdfc	radeonsi: optimize and allow reg > 31 in radeon_opt_set_context_reg functions reg_saved will have 64 bits, and (1 << reg) where reg > 31 has undefined behavior. (1ull << reg) would be correct for 64 bits. This commit shifts the other way in order to merge the conditions.	2018-10-05 19:04:13 -04:00
Sonny Jiang	eeb9170599	radeonsi: optimizing SET_CONTEXT_REG for shaders ES Signed-off-by: Sonny Jiang <sonny.jiang@amd.com> Signed-off-by: Marek Olšák <marek.olsak@amd.com>	2018-10-05 17:53:52 -04:00
Samuel Pitoiset	a1bc152340	spirv: mark variables decorated with XfbBuffer as always active Otherwise, they are removed during NIR linking or in some lowering passes. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-10-05 18:13:25 +02:00
Juan A. Suarez Romero	5bd03d02c1	docs: update calendar, add news and link release notes to 18.2.2 Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>	2018-10-05 12:51:34 +02:00
Juan A. Suarez Romero	c565eeee0b	docs: add sha256 checksums for 18.2.2 Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com> (cherry picked from commit `cb63a4e114`)	2018-10-05 12:46:33 +02:00
Juan A. Suarez Romero	3537465059	docs: add release notes for 18.2.2 Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com> (cherry picked from commit `abaeb79eb2`)	2018-10-05 12:46:31 +02:00
Jason Ekstrand	dd553bc67f	nir/alu_to_scalar: Use ssa_for_alu_src in hand-rolled expansions The ssa_for_alu_src helper will correctly handle swizzles and other source modifiers for you. The expansions for unpack_half_2x16, pack_uvec2_to_uint, and pack_uvec4_to_uint were all broken with regards to swizzles. The brokenness of unpack_half_2x16 was causing rendering errors in Rise of the Tomb Raider on Intel ever since `c11833ab24` which added an extra copy propagation to the optimization pipeline and caused us to start seeing swizzles where we hadn't seen any before. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107926 Fixes: `9ce901058f` "nir: Add lowering of nir_op_unpack_half_2x16." Fixes: `9b8786eba9` "nir: Add lowering support for packing opcodes." Tested-by: Alex Smith <asmith@feralinteractive.com> Tested-by: Józef Kucia <joseph.kucia@gmail.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2018-10-04 12:43:59 -05:00
Vadym Shovkoplias	5f0567a4f6	glsl/linker: Check the subroutine associated functions names From Section 6.1.2 (Subroutines) of the GLSL 4.00 specification "A program will fail to compile or link if any shader or stage contains two or more functions with the same name if the name is associated with a subroutine type." v2: - error out earlier (Tapani) - style fixes (Iago) Fixes: * no-overloads.vert Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108109 Signed-off-by: Vadym Shovkoplias <vadym.shovkoplias@globallogic.com> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2018-10-04 17:41:19 +02:00
Tomeu Vizoso	ed53a79cf8	virgl: Negotiate version with vtest server Check if server supports version negotiation by sending a PING_PROTOCOL_VERSION message right before a dummy RESOURCE_BUSY_WAIT. If we don't get a reply for the first, we know the server doesn't support it. If it does support it, we can query the max protocol version supported by the server and fall back if needed. v2: - Send a new message to negotiate the protocol version, checking if the server supports this message by immediately sending a busy wait message. (Dave Airlie) v3: - Send a zero-arg command PING_PROTOCOL_VERSION so we actually keep compatibility with older servers. (Code by Dave Airlie) Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com> Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>	2018-10-04 16:18:36 +02:00
Sagar Ghuge	0c70e11206	intel: aubinator: Fix memory leaks Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-10-04 10:01:56 +01:00
Sagar Ghuge	29a2eaf3db	intel/decoder: construct correct xml filename construct correct gen xml filename when we try to load hardware xml description from a given path v2: remove temporary variable (Francesco Ansanelli) Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-10-04 10:01:56 +01:00
Sagar Ghuge	f9c8468c82	intel/decoder: Avoid freeing invalid pointer v2: Free ctx.spec if error while reading genxml (Lionel Landwerlin) v3: Handle case where genxml is empty (Lionel Landwerlin) Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-10-04 10:01:56 +01:00
Sagar Ghuge	ba3304e764	intel/decoder: add gen_spec_init method Initialize gen_spec instance properly when loading hardware xml description from specific directory to avoid segmentation fault. v2: correct function definition (Lionel Landwerlin) Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-10-04 10:01:56 +01:00
Samuel Pitoiset	2b34985d93	radv: fix resetting the pool for timestamp queries Since the driver no longer uses the availability bit for timestamp queries it shouldn't reset it. Instead, it should reset the query values to UINT32_MAX. This fixes VM faults. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108164 Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Tested-by: Józef Kucia <joseph.kucia@gmail.com> Reviewed-by: Dave Airlie <airlied@redhat.com>	2018-10-04 10:56:25 +02:00
Guido Günther	b2a876a42b	etnaviv: Use write combine instead of uncached mappings for shader bo The latter are sensitive to unaligned accesses on arm64[1] and we don't need an uncached mapping here. [1]: https://lists.freedesktop.org/archives/etnaviv/2018-September/001956.html Signed-off-by: Guido Günther <guido.gunther@puri.sm> Signed-off-by: Lucas Stach <l.stach@pengutronix.de>	2018-10-04 10:33:25 +02:00
Marek Olšák	8e0b4cb8a1	drirc: add a workaround for ARMA 3 Cc: 18.2 <mesa-stable@lists.freedesktop.org>	2018-10-04 01:01:54 -04:00
Jason Ekstrand	f5bab06428	anv/batch_chain: Don't start a new BO just for BATCH_BUFFER_START Previously, we just went ahead and emitted MI_BATCH_BUFFER_START as normal. If we are near enough to the end, this can cause us to start a new BO just for the MI_BATCH_BUFFER_START which messes up chaining. We always reserve enough space at the end for an MI_BATCH_BUFFER_START so we can just increment cmd_buffer->batch.end prior to emitting the command. Fixes: `a0b133286a` "anv/batch_chain: Simplify secondary batch return..." Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107926 Tested-by: Alex Smith <asmith@feralinteractive.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-10-03 09:03:12 -05:00
Jason Ekstrand	7a89a0d9ed	anv: Use separate MOCS settings for external BOs On Broadwell and above, we have to use different MOCS settings to allow the kernel to take over and disable caching when needed for external buffers. On Broadwell, this is especially important because the kernel can't disable eLLC so we have to do it in userspace. We very badly don't want to do that on everything so we need separate MOCS for external and internal BOs. In order to do this, we add an anv-specific BO flag for "external" and use that to distinguish between buffers which may be shared with other processes and/or display and those which are entirely internal. That, together with an anv_mocs_for_bo helper lets us choose the right MOCS settings for each BO use. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99507 Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-10-03 09:03:03 -05:00
Emil Velikov	08bff097e1	meson: remove invalid "opencl" llvm component Seeming copy/paste mistake from configure.ac which uses $2 for the component and $3 for the fancy name printing. Cc: Dylan Baker <dylan@pnwbakers.com> Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2018-10-03 13:38:06 +01:00
Emil Velikov	fe8be81b4a	Revert "mesa: remove unnecessary 'sort by year' for the GL extensions" This reverts commit `3d81e11b49`. As reported by Federico, some games require the 'sort by year' since they truncate the extensions which do not fit the fixed size string array. Seemingly I did not consider that, as the documentation (both Mesa and Nvidia) mentions about program crashes ... which are worked around by setting the env. variable. This commit reinstates the workaround and enhances the documentation. Cc: Marek Olšák <maraeo@gmail.com> Cc: Ian Romanick <idr@freedesktop.org> Reported-by: Federico Dossena <info@fdossena.com> Fixes: `3d81e11b49` ("mesa: remove unnecessary 'sort by year' for the GL extensions") Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> Reviewed-by: Roland Scheidegger <sroland@vmware.com> Tested-by: Federico Dossena <info@fdossena.com>	2018-10-03 13:38:06 +01:00
Emil Velikov	91ff8b1dd9	mesa: reorder and document the tokens in glheader.h Split into different sections, document each one as well as strange cases like GL_ATI_texture_compression_3dc. Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-10-03 13:38:06 +01:00
Emil Velikov	5f70964b1d	mesa: remove duplicate declarations from glheader.h Remove all the desktop GL and GLX entries from the list. Former are pulled by the gl.h and glext.h includes at the top while the latter are no longer needed. Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-10-03 13:38:06 +01:00
Emil Velikov	01b92916af	i965: reference __DRI_ATTRIB_SWAP_COPY token over the GLX one Earlier commit updated the code to use the DRI tokens, yet forgot to update the comment. Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-10-03 13:38:06 +01:00
Emil Velikov	e04b2c0376	i915: reference __DRI_ATTRIB_SWAP_COPY token over the GLX one Earlier commit updated the code to use the DRI tokens, yet forgot to update the comment. Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-10-03 13:38:06 +01:00
Emil Velikov	d26b122ee8	dri/common: move the required GLX_* token definitions locally Will allow us to remove even bigger hack elsewhere. But more importantly, we should not be using _any_ GLX tokens in DRI. Document the gory details about the current side-effects. Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-10-03 13:38:06 +01:00
Emil Velikov	4ef53669af	dri/common: use __DRI_ATTRIB_SWAP* instances when describing db_modes Somewhat recently Thomas Hellstrom added the respective DRI tokens and updated the drivers. Update the documentation to match reality. Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-10-03 13:38:06 +01:00
Emil Velikov	d6a6760139	egl/x11: remove eglSwap* surface check Already handled further up in eglapi.c. To make things a tiny bit strange, X11+DRI3 was doing the wrong thing by returning EGL_FALSE (+ no error), while X11+DRI2 was returning EGL_TRUE. Cc: samiuddi <sami.uddin.mohammad@intel.com> Cc: Eric Engestrom <eric.engestrom@intel.com> Cc: Erik Faye-Lund <kusmabite@gmail.com> Cc: Tomasz Figa <tfiga@chromium.org> Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>	2018-10-03 13:38:06 +01:00
Emil Velikov	8030741996	egl/surfaceless: remove eglSwap* stubs The API validation in eglapi.c already returns if the surface type is !window. Cc: samiuddi <sami.uddin.mohammad@intel.com> Cc: Erik Faye-Lund <kusmabite@gmail.com> Cc: Tomasz Figa <tfiga@chromium.org> Cc: Gurchetan Singh <gurchetansingh@chromium.org> Cc: Chad Versace <chadversary@chromium.org> Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2018-10-03 13:38:06 +01:00
Emil Velikov	a370e278d3	egl/drm: remove eglSwap* surface check Already handled further up in eglapi.c Cc: samiuddi <sami.uddin.mohammad@intel.com> Cc: Erik Faye-Lund <kusmabite@gmail.com> Cc: Tomasz Figa <tfiga@chromium.org> Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2018-10-03 13:38:06 +01:00
Emil Velikov	91ccb59ff4	egl/android: remove eglSwap* surface check Already handled further up in eglapi.c Cc: samiuddi <sami.uddin.mohammad@intel.com> Cc: Erik Faye-Lund <kusmabite@gmail.com> Cc: Tomasz Figa <tfiga@chromium.org> Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2018-10-03 13:38:06 +01:00
Emil Velikov	8f66743ca2	egl: make eglSwapBuffers* a no-op for !window surfaces Analogous to the previous commit - the spec says the function is a no-op when a pbuffer or pixmap surface is used. Cc: samiuddi <sami.uddin.mohammad@intel.com> Cc: Erik Faye-Lund <kusmabite@gmail.com> Cc: Tomasz Figa <tfiga@chromium.org> Cc: <mesa-stable@lists.freedesktop.org> Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2018-10-03 13:38:05 +01:00
Emil Velikov	64b4ccde0c	egl: make eglSwapInterval a no-op for !window surfaces As the spec says, the function is a no-op when the surface is not a window one. That spec implies that EGL_TRUE should be returned in that case, yet the ARM driver seems to return EGL_FALSE + EGL_BAD_SURFACE. The Nvidia driver returns EGL_TRUE. We follow that behaviour until a decision is made. https://gitlab.khronos.org/egl/API/merge_requests/17 Cc: samiuddi <sami.uddin.mohammad@intel.com> Cc: Erik Faye-Lund <kusmabite@gmail.com> Cc: Tomasz Figa <tfiga@chromium.org> Cc: <mesa-stable@lists.freedesktop.org> Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2018-10-03 13:38:05 +01:00
Emil Velikov	c231b49c53	freedreno: add the a6xx sources to the Android build Add the files otherwise things just won't build. Haven't actually tested it, but it's a small step in the right direction. Fixes: `de3b34df97` ("freedreno: Add a6xx backend") Cc: Kristian H. Kristensen <hoegsberg@chromium.org> Cc: Rob Clark <robdclark@gmail.com> Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Rob Clark <robdclark@gmail.com>	2018-10-03 13:38:05 +01:00
Emil Velikov	7419b22413	pipe-loader: add a dup() in pipe_loader_sw_probe_kms The pipe_loader_release API closes the fd given, even if the pipe-loader should _not_ take ownership of it. With earlier commit we fixed pipe_loader_drm_probe_fd, and now we cover the final piece. Note that unlike the DRM case, here the caller _did_ forget to dup before using it ... most likely leading to all sorts of fun. Don't forget the close in the error path. Seems like the things are a bit leaky/asymmetrical with the semi-recent config work. But we can shave that yak another day ;-) Signed-off-by: Emil Velikov <emil.velikov@collabora.com>	2018-10-03 13:38:05 +01:00
Emil Velikov	6ccc435e7a	pipe-loader: move dup(fd) within pipe_loader_drm_probe_fd Currently pipe_loader_drm_probe_fd takes ownership of the fd given. To match that, pipe_loader_release closes it. Yet we have many instances which do not want the change of ownership, and thus duplicate the fd before passing it to the pipe-loader. Move the dup() within pipe-loader, explicitly document that and document all the cases through the codebase. A trivial git grep -2 pipe_loader_release makes things as obvious as it gets ;-) Cc: Leo Liu <leo.liu@amd.com> Cc: Thomas Hellstrom <thellstrom@vmware.com> Cc: Axel Davy <davyaxel0@gmail.com> Cc: Patrick Rudolph <siro@das-labor.org> Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com> Reviewed-by: Axel Davy <davyaxel0@gmail.com> (for nine)	2018-10-03 13:38:05 +01:00
Emil Velikov	7b8d1b313c	st/nine: do not double-close the fd on teardown As the newly introduced comment says: The pipe loader takes ownership of the fd Thus, there's no need to close it again. Cc: Patrick Rudolph <siro@das-labor.org> Cc: Axel Davy <davyaxel0@gmail.com> Cc: <mesa-stable@lists.freedesktop.org> Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Axel Davy <davyaxel0@gmail.com>	2018-10-03 13:38:05 +01:00
Emil Velikov	fa9df82f67	mesa: fold _glapi_check_multithread() back into _mesa_make_current With commit `c6c0f94714`, back in 2006 Brian removed the _glapi_check_multithread() call from core mesa - _mesa_make_current. It was done to remove fairly awkward #ifdef guard which caused subtle differences in core mesa. Since that guard is long gone, we can drop the duplication and reintroduce the call in core. Note that the function was missing when using EGL + classic dri HW drivers. Yet on TLS builds it's a no-op, so we're safe. Any non TLS users - more or less anything !Linux (or even musl on Linux up-to semi-recently) may have experienced problems. v2: don't remove the call from swrast - move it to core (Eric) Cc: Eric Anholt <eric@anholt.net> Cc: Brian Paul <brianp@vmware.com> Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Brian Paul <brianp@vmware.com>	2018-10-03 13:38:05 +01:00
Emil Velikov	d081ad2aa2	vl/dri3: do full teardown on screen_destroy Earlier commit added support for 'front_buffers', erroneously adding a return in vl_dri3_screen_destroy. Effectively leaking a lot of state. Fixes: `8d7ac0a4e4` ("vl/dri3: implement DRI3 BufferFromPixmap") Cc: Leo Liu <leo.liu@amd.com> Cc: Christian König <christian.koenig@amd.com> Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Leo Liu <leo.liu@amd.com>	2018-10-03 13:38:05 +01:00
Emil Velikov	1301674c39	st/dri: make swrast_no_present member of dri_screen Just like the dri2 options, this is better suited in the dri_screen struct. Signed-off-by: Emil Velikov <emil.velikov@collabora.com>	2018-10-03 13:38:05 +01:00
Emil Velikov	80b62e2d6d	st/dri: inline dri2_buffer.h within dri2.c The header was used only by dri2.c, containing a two-member struct and cast wrapper. Just inline it where it's used/needed. Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2018-10-03 13:38:05 +01:00
Emil Velikov	89c2c386c0	st/xa: remove unused xa_screen::d[s]_depth_bits_last Unused since the initial import. Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2018-10-03 13:38:05 +01:00
Emil Velikov	5ade4b10e2	mesa: use C99 initializer in get_gl_override() The overrides array contains entries indexed on the gl_api enum. Use a C99 initializer to make it a bit more obvious. Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2018-10-03 13:38:05 +01:00
Gabriel Majeri	f0b987646a	anv: Ensure discreteQueuePriorities is at least 2 This is the minimum value according to the spec. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2018-10-03 07:57:37 +02:00
Timothy Arceri	2b5f42068d	r600: use build-id when available for disk cache Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-10-03 09:49:21 +10:00
Timothy Arceri	397f2603eb	nouveau: use build-id when available for disk cache Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-10-03 09:49:21 +10:00
Timothy Arceri	2169acbf34	radeonsi: use build-id when available for disk cache Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-10-03 09:49:21 +10:00
Timothy Arceri	83ea8dd99b	util: add disk_cache_get_function_identifier() This can be used as a drop in replacement for disk_cache_get_function_timestamp(). Here we use build-id to generate a driver-id rather than build timestamp if available. This should resolve issues such as distros using reproducible builds and flatpak not having real build timestamps. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-10-03 09:49:21 +10:00
Timothy Arceri	6a884014e4	util: rename timestamp param in disk_cache_create() Only some drivers use a timestamp here. Others use things such as build-id, or even a combination of build-ids from Mesa and LLVM. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-10-03 09:49:21 +10:00
Józef Kucia	e24a4e05c7	radeonsi: avoid sending GS_EMIT in shaders without outputs Fixes GPU hangs. Cc: 18.1 18.2 <mesa-stable@lists.freedesktop.org> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107857 Signed-off-by: Józef Kucia <joseph.kucia@gmail.com> Signed-off-by: Marek Olšák <marek.olsak@amd.com>	2018-10-02 17:13:52 -04:00
Fritz Koenig	08f97407fb	i965: Replace checks for rb->Name with FlipY (v2) In the GL_MESA_framebuffer_flip_y implementation _mesa_is_winsys_fbo checks were replaced with FlipY checks. rb->Name is also used to determine if a buffer is winsys. v2: Fixes annotation [for emil] Fixes: `ab05dd183c` ("i965: implement GL_MESA_framebuffer_flip_y [v3]") Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Chad Versace <chadversary@chromium.org>	2018-10-02 11:28:46 -07:00
Marek Olšák	2fd58d8eb2	radeonsi: initialize ac_gpu_info::name when using SI_FORCE_FAMILY so that it's not NULL when loading radeonsi and a GCN GPU is not present in the system.	2018-10-02 12:21:49 -04:00
Marek Olšák	0b062f0419	radeonsi: don't set the VS prolog key for the blit VS	2018-10-02 12:21:49 -04:00
Jason Ekstrand	58360ca09d	spirv: Move function call handling to vtn_cfg It makes way more sense for it to live there with the rest of function handling. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2018-10-02 10:24:56 -05:00
Jason Ekstrand	00f385e6d4	nir/from_ssa: Don't rewrite derefs destinations to registers We already call nir_rematerialize_derefs_in_use_blocks_impl prior to calling nir_lower_ssa_defs_to_regs_block so the assertion that all deref uses are in the block should hold. This fixes the following CTS test when dEQP is run with SPIR-V optimization recipe 1: dEQP-VK.glsl.struct.local.loop_nested_struct_array_vertex Fixes: `606eb56ab9` "intel/nir: Only lower load/store derefs" Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2018-10-02 10:24:56 -05:00
Jason Ekstrand	bfc89c668e	nir/cf: Remove phi sources if needed in nir_handle_add_jump If the block in which the jump is inserted is the predecessor of a phi then we need to remove phi sources otherwise the phi may end up with things improperly connected. This fixes the following CTS test when dEQP is run with SPIR-V optimization recipe 1: dEQP-VK.glsl.functions.control_flow.return_in_nested_loop_vertex Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2018-10-02 10:24:56 -05:00
Eric Engestrom	7b0752fb10	anv: suppress warning about unhandled image layout Let's just be explicit that VK_NV_shading_rate_image is not supported. Suggested-by: Jason Ekstrand <jason.ekstrand@intel.com> Fixes: `6ee1709170` "vulkan: Update the XML and headers to 1.1.86" Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>	2018-10-02 15:09:29 +01:00
Rob Clark	ae78489d3e	freedreno/a6xx: hwbinning Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-10-02 10:08:18 -04:00
Rob Clark	8ff349e564	freedreno: update generated headers Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-10-02 10:08:18 -04:00
Jason Ekstrand	7e7959fcb7	intel/fs: Fix a typo in need_matching_subreg_offset This fixes a bunch of Vulkan subgroup tests on little core platforms. Fixes: `4150920b95` "intel/fs: Add a helper for emitting scan operations" Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Tested-by: Mark Janes <mark.a.janes@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-10-02 07:44:25 -05:00
Timothy Arceri	ea66bfda88	util: disable cache if we have no build-id and timestamp is zero Timestamp can be zero for example when Flatpak is used. In this case just disable the cache rather than segfaulting when incompatible cache items are loaded. V2: actually return false when mtime is 0. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-10-02 22:07:55 +10:00
Eric Engestrom	0bdf7b1d0f	include: sync eglext.h from Khronos Signed-off-by: Eric Engestrom <eric@engestrom.ch> Acked-by: Tapani Pälli <tapani.palli@intel.com>	2018-10-02 12:10:46 +01:00
Timothy Arceri	0e6cdfd561	radeonsi: add a workaround for bitfield_extract when count is 0 This ports the fix from `3d41757788`. Both LLVM 7 & 8 continue to have this problem. It fixes rendering issues in some menu and loading screens of Civ VI which can be seen in the trace from bug 104602. Note: This does not fix the black triangles on Vega for bug 104602. Reviewed-by: Marek Olšák <marek.olsak@amd.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104602 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107276	2018-10-02 08:39:51 +10:00
Jason Ekstrand	e4538b93f5	anv: Implement VK_KHR_driver_properties Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-10-01 13:21:12 -05:00
Jason Ekstrand	6ee1709170	vulkan: Update the XML and headers to 1.1.86 Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-10-01 11:43:20 -05:00
Samuel Pitoiset	c2867e4c2a	radv: do not try to set DCC_CONTROL when image doesn't use DCC Unnecessary. While we are at it, remove the check for pre-VI because it's already checked earlier. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-10-01 12:13:12 +02:00
Samuel Pitoiset	f622ab889a	radv: add a sanity check for mutable formats and TC-compat HTILE If apps use the MUTABLE bit and the same formats as the image one in the list, we can still enable TC-compat HTILE. I don't think this happens often but given the fact that TC-compat HTILE allows a nice boost in some situations, it's worth checking. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-10-01 12:13:09 +02:00
Samuel Pitoiset	dc91c4d40a	radv: disable HTILE for very small depth surfaces Like we disable DCC/CMASK for small color surfaces as well. Serious Sam 2017 creates a 1x1 depth surface and I think it should be faster to do slow clears on the graphics queue instead of fast clears on compute, and eventually a depth expand if the surface isn't TC-compatible HTILE. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-10-01 10:16:33 +02:00
Samuel Pitoiset	6cfa321c39	radv: add potential missing fields for DB_EQAA Other drivers set these two as well, just apply the same rule. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-10-01 10:16:30 +02:00
Samuel Pitoiset	bd6df2f923	radv: disable complicated point clipping against user clip planes I don't think this is required by Vulkan too. Ported from RadeonSI (AMDVLK doesn't set it either). Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-10-01 10:16:25 +02:00
Michel Dänzer	cb863de626	gallium/util: Clarify comment in util_init_thread_pinning As discussed in the review of the patch which added the comment: Nothing happens when a thread is created, because pthread_atfork doesn't affect creating threads. However, spawning a child process will likely crash. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-09-28 17:52:11 +02:00
Samuel Pitoiset	3fb4adae83	radv: do not sync CP DMA when copying buffers We already track if the DMA engine is busy/idle with a flag, and we emit a packet that waits for all CP DMA operations to be complete. This is done at end of command buffer because the kernel doesn't wait for them, and also when emitting barriers, so it should be safe. This improves small copies for both aligned and unaligned sizes. Aligned sizes: BEFORE: 1 KB: 59.840000 ms 2 KB: 71.200000 ms AFTER: 1 KB: 31.200000 ms 2 KB: 31.040000 ms Unaligned sizes: BEFORE: 2 KB: 68.3200 ms 3 KB: 79.3600 ms 5 KB: 76.6400 ms 9 KB: 90.8800 ms 17 KB: 116.0000 ms AFTER: 2 KB: 31.0400 ms 3 KB: 32.0000 ms 5 KB: 30.8800 ms 9 KB: 30.5600 ms 17 KB: 29.6000 ms Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-09-28 09:08:52 +02:00
Samuel Pitoiset	621e70dd40	radv: adjust the CmdUpdateBuffer threshold for optimal performance According to my benchmark results, it appears that we should reduce the threshold to 1024. BEFORE: 1 KB: 68.656000 ms 2 KB: 118.368000 ms AFTER: 1 KB: 31.760000 ms 2 KB: 29.840000 ms Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-09-28 09:08:44 +02:00
Samuel Pitoiset	5d6a560a29	radv: do not use the availability bit for timestamp queries It's unnecessary because we can just check if the timestamp is different from the default value set when a pool is created or reset. Instead of waiting for the availability bit to be 1, we have to emit a not equal WAIT_REG_MEM for checking if the timestamp is ready. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Dave Airlie <airlied@redhat.com>	2018-09-28 09:08:03 +02:00
Kristian H. Kristensen	3e90505224	freedreno/a6xx: Build up draw dword0 outside visibility if statement Pulling this logic out means we can share the logic and avoid a couple of temporary variables that helped make things clearer before. Note that in either vismode case, we always program vismode 0. Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>	2018-09-27 16:08:52 -04:00
Kristian H. Kristensen	74a87cdaa6	freedreno/a6xx: Simplify draw_emit() branches a bit Now that we've copied the emit logic into each branch of the if (info->index_size) statement, we can simplify the logic a bit according to which case we're in. Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>	2018-09-27 16:08:52 -04:00
Kristian H. Kristensen	2516073cb6	freedreno/a6xx: Copy OUT_RING() part into each branch of the index if Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>	2018-09-27 16:08:52 -04:00
Kristian H. Kristensen	c3d58d9ffc	freedreno/a6xx: Split fd6_draw_emit into direct and indirect paths This splits the two code paths into separate functions and moves the "if (info->indirect)" test into draw_impl(). Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>	2018-09-27 16:08:52 -04:00
Kristian H. Kristensen	adcd83fb22	freedreno/a6xx: Inline fd6_draw() Simplify the code a bit by inlining this helper. Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>	2018-09-27 16:08:52 -04:00
Kristian H. Kristensen	fb1c6b89a2	freedreno/a6xx: Move emit_marker and wfi to draw_impl() This way the markers clearly bracket the draw call and aren't duplicated for both direct and indirect draw code. Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>	2018-09-27 16:08:52 -04:00
Kristian H. Kristensen	0559050557	freedreno/a6xx: Move inline functions out of fd6_draw.h Only used in fd6_draw.c so put them there. Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>	2018-09-27 16:08:52 -04:00
Hyunjun Ko	1a40faa864	freedreno: fix a typo in launch_grid	2018-09-27 16:06:19 -04:00
Hyunjun Ko	aef410f31e	freedreno/ir3: fix the param order of cmpxchg According to the following definition, int AtomicCompSwap(inout int mem, uint compare, uint data); the preceding one in atomic_comp_swap of NIR is compare and data is followed, while src0 for cmpxchg needs vec2(data, compare) So for ssbo/image deref comp_swap, that should be reversed. Fixes: dEQP-GLES31.functional.image_load_store..atomic.comp_swap	2018-09-27 16:05:49 -04:00
Rob Clark	49d22c2dfc	freedreno/a6xx: fix shaders w/ >= 24 regs Possibly these bits mean something else now. Blob always seems to use FOUR_QUADS, and changing to TWO_QUADS seems to cause different threads to overlap registers. Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-09-27 15:49:14 -04:00
Rob Clark	6530fcc4a7	freedreno/a6xx: fix gl_FragCoord.w Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-09-27 15:45:44 -04:00
Rob Clark	919741b8d5	freedreno: handle invalidated buffers harder Do a better job of skipping mem2gmem/gmem2mem.. Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-09-27 15:41:46 -04:00
Rob Clark	19e9d28646	freedreno/a6xx: fix constlen Fix a few bits of confusion, as with previous gen's constlen is aligned to 4, and value in bitfield is left-shifted by 2 (ie. divided by 4). But this is done by the CONSTLEN() accessor/builder fxn, so don't do it twice. Also HLSQ_FS_CNTL.CONSTLEN is not special. Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-09-27 15:33:10 -04:00
Rob Clark	12de415ad1	freedreno: fix inorder rendering case Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-09-27 15:32:39 -04:00
Rob Clark	b65b6f7606	freedreno/a6xx: backface stencil state Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-09-27 15:31:56 -04:00
Rob Clark	93db15d300	freedreno/a6xx: fix gpu crash with separate-stencil Fixes a crash in (of all things) dEQP-GLES2.info.vendor with --deqp-surface-type=fbo.. Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-09-27 15:31:34 -04:00
Rob Clark	a52ef80d24	freedreno/a6xx: fix MRT config Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-09-27 15:30:36 -04:00
Rob Clark	8930e83642	freedreno: fix potential hang when destroying batch batch_flush_reset_dependencies() expects to be called unlocked, and can call fd_batch_reference() which can try to acquire the screen lock again. Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-09-27 15:29:45 -04:00
Rob Clark	ef6d15f8a8	freedreno: fix corrupted fb state In `c3d9f29b` we allowed ctx->batch to be null, and started tracking the current framebuffer state in fd_context. But the existing logic in fd_blitter_pipe_begin() would, if !ctx->batch, set null fb state to be restored after blit. Which broke the world of deqp (and probably other things) Fixes: `c3d9f29b78` freedreno: allocate ctx's batch on demand Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-09-27 15:27:38 -04:00
Rob Clark	5bb96bf73a	freedreno: simplify pctx->clear() This is defined to always clear the entire surface(s) specified, regardless of scissor state.. mesa/st will turn scissored clears into a draw. So rip out a bunch of unnecessary machinery. Also remove a comment that was obsolete since using u_blitter to turn clear into draw (for the cases where there isn't a hw blitter fast-path). Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-09-27 15:26:32 -04:00
Rob Clark	a7fa44cd33	freedreno: fix FD_MESA_DEBUG=flush The logic to force a flush every draw was short-circuited with newer kernels. Also it should apply to clears as well. Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-09-27 15:25:49 -04:00
Rob Clark	83c5c026ee	freedreno: fix scissor state emit The effective scissor changes based on rasterizer->scissor flag, so we need to re-emit scissor state when rasterizer state changes. Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-09-27 15:25:24 -04:00
Rob Clark	106f18258a	freedreno: update generated headers Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-09-27 15:25:01 -04:00
Erik Faye-Lund	c3486cd8c9	st/mesa: do not call update_framebuffer_size with NULL pointer In st_renderbuffer_alloc_storage, we avoid allocating storage for zero-sized buffers, leading to this pointer being NULL. We already take care to avoid dereferencing these pointers for color-buffers, but not for depth/stencil-buffers. So let's tread a bit more carefully here. This avoids a crash while running Piglit's glx/glx-visuals-stencil test, both on virgl and r600g. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Guillaume Charifi <guillaume.charifi@sfr.fr> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-09-27 10:33:44 +02:00
Maxime	dd333c66bd	vulkan: Disable randr lease for libxcb < 1.13 Since the Randr lease code was added, compiling against libxcb 1.12 no longer works. CC: mesa-stable@lists.freedesktop.org Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108024 Fixes: `7ab1fffcd2` Tested-By: Maxime <berillions@gmail.com> Fixes: `7ab1fffcd2` "vulkan: Add EXT_acquire_xlib_display [v5]"	2018-09-27 16:31:42 +10:00
Bas Nieuwenhuizen	40585ddb48	radv: Remove garbage comment. Trivial.	2018-09-27 02:04:06 +02:00
Bas Nieuwenhuizen	0207ebcbf1	radv: Do not use multiple draws for multisample copies. Use sample rate shading instead, should give better locality. Makes Nier with 8x msaa on a Raven go 5 fps -> 7 fps in the menu. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2018-09-27 02:04:00 +02:00
Jordan Justen	ca1d3fc538	anv: If softpin is supported, use it with the hiz clear value bo Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>	2018-09-26 10:21:23 -07:00
Jordan Justen	2a97390552	anv: s/batch/value_bo/ on anv_device_init_hiz_clear_batch Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>	2018-09-26 10:21:23 -07:00
Dylan Baker	e9bd071f49	docs: update calendar, add news and link release notes for 18.1.9	2018-09-26 09:44:40 -07:00
Dylan Baker	d4bdcf5d22	docs: Add sha256 sums to 18.1.9	2018-09-26 09:41:53 -07:00
Dylan Baker	4769f49455	docs: Add 18.1.9 release notes	2018-09-26 09:40:56 -07:00
Jason Ekstrand	b3f477ef7a	intel/isl: Add unit suffixes to some struct fields and variables I was about to make the claim to someone that every field in isl_surf is either an enum or has explicit units. Then I looked at isl_surf and discovered this claim was wrong. We should fix that. This commit does a few refactors: * Add _B suffixes to some struct fields * Add _B to some variables and parameters * Rename row_pitch_tiles -> row_pitch_tl Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>	2018-09-26 08:52:26 -05:00
Axel Davy	0d495bec25	radeonsi: NaN should pass kill_if Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=105333 Fixes: https://github.com/iXit/Mesa-3D/issues/314 For this application, NaN is passed to KILL_IF and is expected to pass. v2: Explain in the code why UGE is used. Signed-off-by: Axel Davy <davyaxel0@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> CC: <mesa-stable@lists.freedesktop.org>	2018-09-25 22:05:24 +02:00
Axel Davy	46814e771a	st/nine: Do not mark both ff vs and ps updated Previously if only ff vs or only ff ps was used, the constants for both were marked as updated, while only the constants of the used ff shader were updated. Now that NINE_STATE_FF_VS and NINE_STATE_FF_PS do not intersect anymore, we can mark the correct set of constants as updated. Fixes: https://github.com/iXit/Mesa-3D/issues/319 Signed-off-by: Axel Davy <davyaxel0@gmail.com>	2018-09-25 22:05:24 +02:00
Axel Davy	8e0526555d	st/nine: Split NINE_STATE_FF_OTHER NINE_STATE_FF_OTHER was mostly ff vs states. Rename it to NINE_STATE_FF_VS_OTHER and move common states with ps to NINE_STATE_FF_PS_CONSTS (renamed from NINE_STATE_FF_PSSTAGES). Signed-off-by: Axel Davy <davyaxel0@gmail.com>	2018-09-25 22:05:24 +02:00
Axel Davy	5f7a41c33b	st/nine: Add dummy ff shader state Some states only affect the ff shader, not its constants. Currently we don't check anything and always recompute the ff shader key. However we do check for NINE_STATE_FF_OTHER and if set we reupload some constants. Thus for those states which had NINE_STATE_FF_OTHER set but didn't need it, replace by a dummy ff shader state (which is easier to understand for an external reader than just setting 0 and more future proof). Signed-off-by: Axel Davy <davyaxel0@gmail.com>	2018-09-25 22:05:24 +02:00
Axel Davy	f6bf1d2db0	st/nine: Mark pointsize states as ff states The pointsize states were missing the ff NINE_STATE_FF_OTHER flag, and thus might miss state updates when using ff. Fixes some wine tests. Signed-off-by: Axel Davy <davyaxel0@gmail.com>	2018-09-25 22:05:24 +02:00
Axel Davy	89beea100f	st/nine: Minor refactor of a few NINE_STATE_* flags Rename NINE_STATE_FOG_SHADER, NINE_STATE_POINTSIZE_SHADER and NINE_STATE_PS1X_SHADER into NINE_STATE_VS_PARAMS_MISC and NINE_STATE_PS_PARAMS_MISC. The behaviour is unchanged, except one minor change: D3DRS_FOGTABLEMODE doesn't need to affect VS. Signed-off-by: Axel Davy <davyaxel0@gmail.com>	2018-09-25 22:05:24 +02:00
Axel Davy	7ae2509ce0	st/nine: Increase maximum number of temp registers With some test app I hit the limit. As we allocate on demand (up to the maximum), it is free to increase the limit. Signed-off-by: Axel Davy <davyaxel0@gmail.com> CC: <mesa-stable@lists.freedesktop.org>	2018-09-25 22:05:24 +02:00
Axel Davy	dc4b53e129	st/nine: Lock the entire buffer in some cases. Previously we had already found that for MANAGED buffers the buffer started dirty (which meant all writes out of bound before the first draw call using the buffer have to be taken into account). Possibly it is the same for the other types of buffers. For now always lock the entire buffer (starting from the offset) for these (except for DYNAMIC buffers, which might hurt performance too much). Fixes: https://github.com/iXit/Mesa-3D/issues/301 Signed-off-by: Axel Davy <davyaxel0@gmail.com>	2018-09-25 22:05:24 +02:00
Axel Davy	0eeb583650	st/nine: Don't call SetCursor until a cursor is set The previous code was ignoring the input until a cursor is set inside d3d (with SetCursorProperties), as expected by wine tests. However it did still make a call to ID3DPresent_SetCursor, which would result in a SetCursor(NULL) call, thus hiding any cursor set outside d3d, which we shouldn't do. Add comment about not avoiding redundant ID3DPresent_SetCursor calls once a cursor has been set in d3d, as it has been tested to cause regressions. Fixes: https://github.com/iXit/Mesa-3D/issues/197 Signed-off-by: Axel Davy <davyaxel0@gmail.com>	2018-09-25 22:05:24 +02:00
Axel Davy	dcfde02bb0	st/nine: Avoid redundant SetCursorPos calls For some applications SetCursorPosition is called when a cursor event is received. Our SetCursorPosition was always calling wine SetCursorPos which would trigger a cursor event. The infinite loop is avoided by not calling SetCursorPos when the position hasn't changed. Found thanks to wine tests. Fixes unresponsive GUI for some applications. Fixes: https://github.com/iXit/Mesa-3D/issues/173 Signed-off-by: Axel Davy <davyaxel0@gmail.com> CC: <mesa-stable@lists.freedesktop.org>	2018-09-25 22:05:24 +02:00
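The loop-breaking idea in the commit above (skip the OS call when the position is unchanged) can be sketched as follows. This is an illustration under assumptions, not the nine implementation: `struct cursor_state`, `set_cursor_position` and the `os_calls` counter are hypothetical stand-ins, with the counter taking the place of the real SetCursorPos call.

```c
#include <stdbool.h>

/* Hypothetical cursor state; not the actual nine device structure. */
struct cursor_state {
    int x, y;
    int os_calls;   /* counts forwarded calls, standing in for SetCursorPos */
};

/* Forward the position to the OS only when it actually changed.
 * Returns true if a call was forwarded. */
static bool set_cursor_position(struct cursor_state *c, int x, int y)
{
    if (c->x == x && c->y == y)
        return false;   /* unchanged: skip the call, so no new cursor event */

    c->x = x;
    c->y = y;
    c->os_calls++;      /* placeholder for the real OS call */
    return true;
}
```

An application that answers every cursor event by setting the same position again therefore triggers at most one forwarded call instead of an endless event cycle.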
Axel Davy	112c770597	st/nine: Init cursor position at device creation This is only useful for software cursor, but at least now we won't start it at (0, 0). Signed-off-by: Axel Davy <davyaxel0@gmail.com>	2018-09-25 22:05:24 +02:00
Axel Davy	62ea55ec8b	st/nine: Initialize manually cursor structure Initialize manually the cursor structure fields for more clarity on its content. Signed-off-by: Axel Davy <davyaxel0@gmail.com>	2018-09-25 22:05:24 +02:00
Axel Davy	110950318c	st/nine: Check if format is DS before retrieving flags d3d9_get_pipe_depth_format_bindings assumes the input format is a depth stencil format. Previously the user could hit this function with an invalid format. Protect the last non protected call with a depth_stencil_format check. Another solution is to have d3d9_get_pipe_depth_format_bindings support non depth stencil format, but we don't want the user to create depth buffers with d3d formats that can't be one, it's better to check if the format can be depth buffer with d3d. Signed-off-by: Axel Davy <davyaxel0@gmail.com>	2018-09-25 22:05:24 +02:00
Axel Davy	af60fbc0a4	st/nine: Remove clamping when mul_zero_wins Tests show the clamping can be removed when mul_zero_wins is supported. Signed-off-by: Axel Davy <davyaxel0@gmail.com>	2018-09-25 22:05:24 +02:00
Axel Davy	a0afa80889	st/nine: Implement predicated instructions Most of the work was already there, just not implemented. Fixes: https://github.com/iXit/Mesa-3D/issues/318 Signed-off-by: Axel Davy <davyaxel0@gmail.com>	2018-09-25 22:05:24 +02:00
Axel Davy	e7e82bcdc9	st/nine: Fix aliased read in ff Fix aliasing of colorarg_b4 with colorarg_b5. Fixes: https://github.com/iXit/Mesa-3D/issues/302 Signed-off-by: Axel Davy <davyaxel0@gmail.com>	2018-09-25 22:05:24 +02:00
Axel Davy	9fc6aa1bbe	st/nine: Fix ff assignment with aliasing "tex_stage[s][D3DTSS_COLORARG0] >> 4" could be a two bit number, thus colorarg_b4 was incorrectly set. Signed-off-by: Axel Davy <davyaxel0@gmail.com>	2018-09-25 22:05:24 +02:00
Axel Davy	8c35fb0280	st/nine: Clarify some ff assignments colorarg0, etc are 3 bits wide. Make the code more readable by adding an & 0x7 to further indicate we only remember the first 3 bits only. The 4th bit is always 0, and colorarg_b4, colorarg_b5, etc are used to store the 5th and 6th bits. Signed-off-by: Axel Davy <davyaxel0@gmail.com>	2018-09-25 22:05:24 +02:00
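The bit layout described in the three ff commits above (a 3-bit argument, an always-zero 4th bit, and the 5th and 6th bits stored separately) can be sketched like this. The struct and function names are hypothetical and are not taken from the nine source; only the masks follow the commit text.

```c
#include <stdint.h>

/* Hypothetical unpacked form of one texture-stage color argument. */
struct colorarg_bits {
    uint8_t arg;   /* low 3 bits of the argument */
    uint8_t b4;    /* 5th bit (bit index 4) */
    uint8_t b5;    /* 6th bit (bit index 5) */
};

static struct colorarg_bits split_colorarg(uint32_t v)
{
    struct colorarg_bits out;

    out.arg = v & 0x7;          /* keep only the first 3 bits */
    out.b4  = (v >> 4) & 0x1;   /* mask, since v >> 4 may still hold two bits */
    out.b5  = (v >> 5) & 0x1;
    return out;
}
```

Without the `& 0x1` on the shifted value, a set 6th bit would leak into the field meant for the 5th bit, which is the kind of aliasing the fixes describe.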
Axel Davy	59aaeeb730	st/nine: Print transform matrices in debug This is useful to see the matrices content in the log to debug. Signed-off-by: Axel Davy <davyaxel0@gmail.com>	2018-09-25 22:05:24 +02:00
Axel Davy	d9da0a1f6d	st/nine: Add ff key hash to help debug This is very useful to find in the log the ff shader source of a given call. Signed-off-by: Axel Davy <davyaxel0@gmail.com>	2018-09-25 22:05:24 +02:00
Axel Davy	fcbb00a502	st/nine: Avoid RefToBind calls in ff When using csmt, ff shader creation happens on the csmt thread. Creating the shaders, then calling RefToBind causes the device ref to be increased then decreased. However the device dtor assumes that no work pending on the csmt thread could increase the device ref, leading to a hang. The issue is avoided by creating the shaders with a bind count directly. Fixes: https://github.com/iXit/Mesa-3D/issues/295 Signed-off-by: Axel Davy <davyaxel0@gmail.com>	2018-09-25 22:05:24 +02:00
Axel Davy	e83b15cba0	st/nine: Add new helper for object creation with bind Add a new helper to create objects starting with a bind count instead of a ref count. Signed-off-by: Axel Davy <davyaxel0@gmail.com>	2018-09-25 22:05:24 +02:00
Axel Davy	fd86ce7c14	st/nine: Add parameter to start with bind Add a parameter to start new object with a bind instead of a refcount. Signed-off-by: Axel Davy <davyaxel0@gmail.com>	2018-09-25 22:05:24 +02:00
Axel Davy	a9bf82ecf4	st/nine: Use perspective correction for ps depth fog Emulate perspective interpolation of depth for programmable ps fog ff ps fog uses position z, or 1/w depending on the ff projection matrix set. This is according to public documents found describing the algorithm and tests we made. In the case of programmable ps, we used position's z, which was sufficient to pass wine tests (which test shaders don't set w). Issue https://github.com/iXit/Mesa-3D/issues/315 showed that this calculation was wrong. Using perspective interpolation on z, that is using z * 1/w seems to satisfy both this application and wine tests. Fixes: https://github.com/iXit/Mesa-3D/issues/315 Signed-off-by: Axel Davy <davyaxel0@gmail.com>	2018-09-25 22:05:24 +02:00
Axel Davy	7ee5e5e239	st/nine: Clamp RCP when 0inf!=0 Tests done on several devices of all 3 vendors and of different generations showed that there are several ways of handling infs and NaN for d3d9. Tests showed Intel on Windows always clamps RCP, RSQ and LOG (thus preventing inf/nan generation), for all shader versions (some vendor behaviours vary with shader versions). Doing this in nine avoids 0inf issues for drivers that can't generate 0*inf=0 (which is controlled by TGSI's MUL_ZERO_WINS). For now clamp for all drivers. A later optimization would be to avoid clamping for drivers with MUL_ZERO_WINS for the specific shader versions where NV or AMD don't clamp. LOG and RSQ being already clamped, this patch only clamps RCP. Fixes: https://github.com/iXit/Mesa-3D/issues/316 Signed-off-by: Axel Davy <davyaxel0@gmail.com> CC: <mesa-stable@lists.freedesktop.org>	2018-09-25 22:05:23 +02:00
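The effect of clamping RCP can be shown with a small C sketch. It is not the code nine emits: `rcp_clamped` is a hypothetical helper, and clamping to `FLT_MAX` is an assumption made for illustration. Under IEEE 754, `0 * inf` is NaN, whereas `0 * FLT_MAX` is 0, so a clamped reciprocal cannot produce the problematic product.

```c
#include <float.h>

/* Reciprocal clamped to the finite float range, so it never returns
 * +inf or -inf. A NaN input still yields NaN. */
static float rcp_clamped(float x)
{
    float r = 1.0f / x;   /* +inf or -inf when x is +0 or -0 */

    if (r > FLT_MAX)
        return FLT_MAX;
    if (r < -FLT_MAX)
        return -FLT_MAX;
    return r;
}
```

With this helper, `0.0f * rcp_clamped(0.0f)` evaluates to zero rather than NaN, which is the behaviour drivers without MUL_ZERO_WINS cannot otherwise provide.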
Jan Vesely	1f3fe4aaeb	.travis: Drop note about Clover builds being slow SWR takes 17+ minutes to build. Clover builds take ~6-7 minutes. Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2018-09-25 14:08:06 -04:00
Jan Vesely	cb1b109733	.travis: Add LLVM-7 Clover build Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2018-09-25 14:08:06 -04:00
Caio Marcelo de Oliveira Filho	3cf07361ac	intel/compiler: Export TCS passthrough creation Move create_passthrough_tcs() from i965 so can be used in other contexts. Acked-by: Jason Ekstrand <jason@jlekstrand.net>	2018-09-25 09:16:31 -07:00
Gert Wollny	47a6f98e15	mesa/st: In the presence of integer buffers enable per buffer blending Since blending will be disabled later for integer formats we have to consider that in the case of a mixed set of integer/non-integer format buffers blending must be handled on a per buffer basis. Fixes on r600: dEQP-GLES31.functional.draw_buffers_indexed.random. max_required_draw_buffers.13 Fixes: `8fb966688b` st/mesa: Disable blending for integer formats. Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-09-25 15:54:38 +02:00
Eric Engestrom	97ae5a858d	meson+autotools: get rid of spammy GCC warning -Wformat-truncation That warning fires every time a string function takes an argument that could possibly be longer than its max output, which triggers all over the place, especially when working with file paths ("what if every file path is MAX_PATH long?" is what GCC is saying, which is really annoying when we know that "/dev/dri/cardN" is not gonna be 4096 char long and it's safe to store it in a 32-char array). Anyway, we either add a ton of dead code all over the place to make GCC happy, or we get rid of its spam. I chose the latter. Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>	2018-09-25 11:40:08 +01:00
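For context, the sort of extra checking the commit above declines to add everywhere looks roughly like the sketch below. It is illustrative only: `card_path` is a hypothetical helper, and the sketch relies only on the standard `snprintf` contract, which returns the length the full output would have had.

```c
#include <stdbool.h>
#include <stdio.h>

/* Format "/dev/dri/cardN" into buf and report whether it fit.
 * snprintf returns the untruncated length, so a return value
 * >= len means the output was cut short. */
static bool card_path(char *buf, size_t len, int n)
{
    int written = snprintf(buf, len, "/dev/dri/card%d", n);

    return written >= 0 && (size_t)written < len;
}
```

For a 32-byte buffer and a realistic card number the check always succeeds, which is why the commit treats it as dead code and silences the warning instead.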
Eric Engestrom	1a37a80bf6	meson: make it trivial to add other -Wno-foo CFLAGS Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2018-09-25 11:39:56 +01:00
Eric Engestrom	f5b41f9121	gallivm: ensure string is null-terminated instead of assert()ing Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2018-09-25 11:39:30 +01:00
Topi Pohjolainen	1cc17fb731	intel/compiler/icl: Use barrier id bits 24:30 instead of 24:27,31 Fixes gpu hangs with Carchase and Manhattan. Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com> Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2018-09-25 09:59:59 +03:00
Andres Rodriguez	ec1fcf92ae	radv: only emit ZPASS_DONE for timestamp queries on gfx queues A ZPASS_DONE packet doesn't make sense for the compute queue. It will result in a gpu hang. This change resolves a gpu hang for SteamVR+Vega. Cc: mesa-stable@lists.freedesktop.org Fixes: `1f616a840e` "radv: emit a dummy ..." Signed-off-by: Andres Rodriguez <andresx7@gmail.com> Reviewed-by: Dave Airlie <airlied@redhat.com>	2018-09-25 02:30:34 -04:00
Timothy Arceri	72e4287e8f	radv: make use of nir_lower_load_const_to_scalar() This allows NIR to CSE more operations. LLVM does this also so the impact is limited, however doing this in NIR allows other opts to make progress. For example in radeonsi more loops are unrolled in Civilization Beyond Earth. The actual pipeline-db stats are not overwhelming but even in the negatively affected shaders the NIR is clearly better. It just happens that the code shuffling and in some cases calls to max rather than a flt result in the final output from LLVM not giving as good numbers. However this is an incremental opt that further passes build off so the change should be made IMO. Totals from affected shaders: SGPRS: 20192 -> 20184 (-0.04 %) VGPRS: 19516 -> 19524 (0.04 %) Spilled SGPRs: 437 -> 444 (1.60 %) Spilled VGPRs: 0 -> 0 (0.00 %) Private memory VGPRs: 0 -> 0 (0.00 %) Scratch size: 0 -> 0 (0.00 %) dwords per thread Code Size: 1527444 -> 1522276 (-0.34 %) bytes LDS: 6 -> 6 (0.00 %) blocks Max Waves: 1018 -> 1016 (-0.20 %) Wait states: 0 -> 0 (0.00 %) Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-09-25 09:31:22 +10:00
Dylan Baker	f03a160592	meson: de-duplicate LLVM check By adding `_llvm == 'true'` to the required argument we can check the 'auto' and 'true' case in one path. Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2018-09-24 13:02:07 -07:00
Eric Engestrom	f2519e3493	vulkan/wsi/display: wsi_display_select_crtc() doesn't need to modify the connector Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>	2018-09-24 17:38:11 +01:00
Eric Engestrom	bde3102c0d	vulkan/wsi/display: check if wsi_swapchain_init() succeeded Fixes: `da997ebec9` "vulkan: Add KHR_display extension using DRM [v10]" Cc: Keith Packard <keithp@keithp.com> Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-09-24 17:37:43 +01:00
Leo Liu	3e7b5e5db2	radeon/uvd: use bitstream coded number for symbols of Huffman tables Signed-off-by: Leo Liu <leo.liu@amd.com> Fixes: 130d1f456(radeon/uvd: reconstruct MJPEG bitstream) Cc: "18.2" <mesa-stable@lists.freedesktop.org> Reviewed-by: Boyuan Zhang <boyuan.zhang@amd.com>	2018-09-24 09:12:49 -04:00
Rhys Perry	6ca1402c11	nv50/ir: fix link-time build failure Seems this fixes linking problems that occur in some situations. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>	2018-09-23 18:20:08 +01:00
Rhys Perry	b473fcc9a3	nvc0: fix bindless multisampled images on Maxwell+ NVC0_CB_AUX_BINDLESS_INFO isn't written to on Maxwell+ and it's too small anyway. With these changes, TXQ is used to determine the number of samples and the coordinate adjustment information looked up in a small array in the driver constant buffer. v2: rework to use TXQ and a small array instead of a larger array with an entry for each texture v3: get rid of the small array and calculate the adjustments in the shader Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Fixes: `c2ae9b4052` ('nvc0: implement multisampled images on Maxwell+') Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>	2018-09-22 20:13:17 +01:00
Eric Engestrom	ed797f6597	docs: fix couple typos/outdated info `git-branch` doesn't exist, and mesa3d-dev hasn't been used in a great many years :) Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2018-09-22 17:23:18 +01:00
Eric Engestrom	ae2694efe0	docs: update repo URLs after GitLab move I also updated the developer instructions; presumably someone who's been given commit rights already knows how to clone a repository :) A more useful thing is to show how to update the pushurl, and how to use access tokens to push over HTTPS (especially for us at Intel, where non-http traffic is a pain). Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2018-09-22 17:23:18 +01:00
Stuart Young	c95dd966c4	docs: Update FAQ with respect to s3tc support It's just over 10 months since 17.3.0 was released with s3tc support enabled. Probably a good idea to update the FAQ page. v2: Incorporate feedback from Adam Jackson <ajax@redhat.com> Reviewed-by: Adam Jackson <ajax@redhat.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> Fixes: `04396a134f` ("mesa: Import libtxc_dxtn sources") Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2018-09-22 17:23:18 +01:00
Rhys Perry	f580a895b1	nvc0: warn about changing NVC0_CB_AUX_MP_INFO and NVC0_CB_AUX_DRAW_INFO Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>	2018-09-22 16:50:39 +01:00
Rhys Perry	01fa76b707	nvc0: Update counter reading shaders to new NVC0_CB_AUX_MP_INFO Fixes: `66ca7e400b` ('nvc0: add support for programmable sample locations') Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>	2018-09-22 16:50:22 +01:00
Eric Anholt	cd667edecc	vc4: Remove dead i == 0 code from the cos() implementation. The loop starts at 1.	2018-09-21 17:16:43 -07:00
Eric Anholt	10d5d2d527	vc4: Fix sin(0.0) and cos(0.0) accuracy to fix SDL rendering rotation. SDL has some shaders that compute sin(angle) and cos(angle) for a rotation matrix in the VS, and angle is usually 0.0. Our previous implementation had quite a bit of error around 0.0, causing single-pixel rotations at typical window sizes. SDL2 has changed as of August 28th (commit 12156:e5a666405750) to not need sin/cos in the VS, but we should still fix this for existing implementations or similar patterns that other programs may have. glsl-cos goes from 32 instructions to 36, but 9 uniforms to 7. glsl-sin goes from 32 instructions to 34, but 8 uniforms to 7. This seems like a fine impact to have for the bugfix. Cc: 18.1 18.2 <mesa-stable@lists.freedesktop.org> Fixes: https://github.com/anholt/mesa/issues/110	2018-09-21 17:16:43 -07:00
Anuj Phogat	a0baedb638	intel/icl: Fix URB size for different SKUs Different ICL SKUs have different URB sizes. Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-09-21 14:40:04 -07:00
Anuj Phogat	fa1ff71a0f	i965/icl: Set Enabled Texel Offset Precision Fix bit h/w specification requires this bit to be always set. V2: Fix bit mask (Chris Wilson) Suggested-by: Kenneth Graunke <kenneth@whitecape.org> Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-09-21 14:40:04 -07:00
Anuj Phogat	5eb173304b	anv/icl: Set Enabled Texel Offset Precision Fix bit h/w specification requires this bit to be always set. Suggested-by: Kenneth Graunke <kenneth@whitecape.org> Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-09-21 14:40:04 -07:00
Alex Deucher	afb7c6b301	pci_ids: add new polaris pci id Reviewed-by: Marek Olšák <marek.olsak@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: mesa-stable@lists.freedesktop.org	2018-09-21 14:33:13 -05:00
Marek Olšák	f0cd7dbcd7	glsl_to_tgsi: invert gl_SamplePosition.y for the default framebuffer Fixes dEQP-GLES31.functional.shaders.sample_variables.sample_pos.correctness.default_framebuffer with --deqp-gl-config-name=rgba8888d24s8ms4 Cc: 18.1 18.2 <mesa-stable@lists.freedesktop.org>	2018-09-21 13:39:00 -04:00
Caio Marcelo de Oliveira Filho	b29ec31854	util: Add macro to get number of elements in dynarray Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>	2018-09-21 10:12:51 -07:00
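An element-count macro for a byte-sized dynamic array can be sketched as below. The names here (`struct dynarray`, `DYNARRAY_NUM_ELEMENTS`) are hypothetical and chosen for illustration; they are not claimed to match the definitions in Mesa's util headers. The sketch assumes the array tracks its used size in bytes.

```c
#include <stddef.h>

/* Hypothetical dynamic array that tracks its used size in bytes. */
struct dynarray {
    void  *data;
    size_t size;       /* bytes in use */
    size_t capacity;   /* bytes allocated */
};

/* Number of elements of the given type currently stored. */
#define DYNARRAY_NUM_ELEMENTS(buf, type) ((buf)->size / sizeof(type))
```

A macro is used rather than a function because the element type has to be passed as an argument, which C functions cannot accept.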
Dylan Baker	be56f8a788	docs/meson: Add note about llvm-config$version and llvm-config-$version v2: - fix typo These are how FreeBSD and Debian handle multiple versions of LLVM installed at the same time, respectively. Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2018-09-21 10:03:15 -07:00
Dylan Baker	e0829f9c1a	docs/meson: Update notes on using CFLAGS and -Dc_args v2: - Use ${} to denote variables instead of just $ - fix spelling error bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107313 Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2018-09-21 10:03:15 -07:00
Dylan Baker	1da60667b5	docs: update meson docs to reflect the current status v2: - minor grammar changes - fix typo Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2018-09-21 10:03:15 -07:00
Dylan Baker	509ea4649a	meson: Don't force libva to required from auto We already correctly handle va being auto, but we force it to being true, which is bad. Fixes `94cf397092` ("meson: Fix auto option for va") Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2018-09-21 10:03:15 -07:00
Dylan Baker	5dcb77e491	meson: Don't compile pipe loader with dri support when not using dri Corrects building glx as gallium-xlib without any dri targets. v2: - fix ugly formatting Fixes: `66c94b9313` ("meson: build gallium winsys for dri, null, and wrapper") Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2018-09-21 10:03:15 -07:00
Samuel Pitoiset	fe3f13cc5a	radv: use the resolve compute path if dest uses multiple layers The hardware path doesn't support resolving layers, for both source and destination images. This fixes a reflection issue when MSAA is enabled which affects GTA V and probably DIRT3. CC: <mesa-stable@lists.freedesktop.org> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107786 Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Tested-by: Gregor Münch <gr.muench_at_gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-09-21 16:35:59 +02:00
Jason Ekstrand	ab80889e92	anv,radv: Implement vkAcquireNextImage2 This was added as part of 1.1 but it's very hard to track exactly what extension added it. In any case, we should implement it. Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Dave Airlie <Airlied@redhat.com>	2018-09-21 07:02:35 -05:00
Juan A. Suarez Romero	24bacaddef	docs: update calendar, add news and link release notes to 18.2.1 Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>	2018-09-21 13:09:21 +02:00
Juan A. Suarez Romero	eefc77e691	docs: add sha256 checksums for 18.2.1 Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com> (cherry picked from commit `686eab6642`)	2018-09-21 13:06:14 +02:00
Juan A. Suarez Romero	17fbb1ef74	docs: add release notes for 18.2.1 Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com> (cherry picked from commit `3c8c851fe4`)	2018-09-21 13:06:12 +02:00
Samuel Pitoiset	674fcfaecc	radv: only enable shaderInt16 on GFX9+ and LLVM7+ The throughput is similar to 32-bit integers on GFX8 and AMDVLK does not expose 16-bit integers on pre Vega as well. On GFX9+, only LLVM 7+ has support. This fixes a bunch of CTS crashes on GFX9/LLVM 6. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-09-21 10:56:17 +02:00
Marek Olšák	945e9cdb2b	docs/features: add EXT_direct_state_access features Acked-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-09-21 03:01:58 -04:00
Bas Nieuwenhuizen	0a77e70d10	radv: Fix driver UUID SHA1 init. Was missing the init, found by Emil. Fixes: `d17443a459` "radv: Use build ID if available for cache UUID." CC: <mesa-stable@lists.freedesktop.org> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2018-09-20 23:38:38 +02:00
Charmaine Lee	64731e7c5e	svga: fix uninitialized fields in DefineDepthStencilView/DefineStreamOutput This patch fixes uninitialized fields in DefineDepthStencilView and DefineStreamOutput commands that are not relevant in SM4 device. Reviewed-by: Brian Paul <brianp@vmware.com>	2018-09-20 13:20:10 -06:00
Brian Paul	7f4e6f4c97	r300g: add PIPE_SHADER_CAP_SCALAR_ISA switch case to silence warning Reviewed-by: Mathias Fröhlich <Mathias.Froehlich@web.de>	2018-09-20 13:20:10 -06:00
Brian Paul	198c50f487	st/mesa: silenced unhandled enum warning in st_glsl_to_tgsi.cpp Add ir_intrinsic_begin_fragment_shader_ordering switch case to silence warning Reviewed-by: Mathias Fröhlich <Mathias.Froehlich@web.de>	2018-09-20 13:20:10 -06:00
Brian Paul	35ea66a68e	mesa: use GLsizeiptrARB, GLintptrARB in bufferobj.c The function pointer declarations in dd.h for the BufferData() and BufferSubData() use the ARB-suffixed datatypes. This patch changes the buffer_data_fallback() and buffer_sub_data_fallback() functions to use those datatypes too. This fixes a build warning when building 32-bit libraries. Evidently, GLsizeiptrARB and GLsizeiptr are defined differently in that situation. All implementations of these driver hooks use the ARB-suffixed types. Reviewed-by: Mathias Fröhlich <Mathias.Froehlich@web.de>	2018-09-20 13:20:10 -06:00
Neha Bhende	708d34d41a	svga: Enable OpenGL 3.3 compatibility profile With this patch, svga driver will start advertising OpenGL 3.3 compatibility profile. Tested with some mesa demos, piglit and glretrace. Reviewed-by: Brian Paul <brianp@vmware.com> Reviewed-by: Charmaine Lee <charmainel@vmware.com>	2018-09-20 13:20:10 -06:00
Neha Bhende	ede805dd19	svga: Apply texcoord scale factors only if there is sampler view We need to convert unnormalized texcoords to normalized texcoords when we are sampling from texture. We don't need this conversion if there is no sampler view. Tested with piglit, glretrace Fixes vmware bug 2101970 Reviewed-by: Brian Paul <brianp@vmware.com> Reviewed-by: Charmaine Lee <charmainel@vmware.com>	2018-09-20 13:20:10 -06:00
Charmaine Lee	1dcf377a76	svga: fix texture array layer index in transfer map In gallium, the layer index of a texture array to be mapped is specified in the z component, whereas in svga device, the index is specified in a separate argument. Currently in svga_texture_transfer_map(), we explicitly modify the z value in the base transfer map to 0 so the layer offset will not be applied twice, but this causes problem when state tracker later refers to the base transfer map and expects the slice index to be specified in z (commit `463b0ea1f6`). To fix the problem, this patch makes a local copy of the box in svga_transfer and modifies the z value in this copy instead. Fixes spec@khr_texture_compression-astc piglit test crashes. Fixes regression in the dma path with commit 1fdd3dd94a. Tested with mtt glretrace, piglit on Windows VM and Linux VM. Reviewed-by: Brian Paul <brianp@vmware.com>	2018-09-20 13:20:10 -06:00
Dylan Baker	18a6e426f3	Revert "utils/u_math: break dependency on gallium/utils" This reverts commit `0abce6d770`. Which broke the windows build.	2018-09-20 10:36:33 -07:00
Caio Marcelo de Oliveira Filho	2567ad28bb	i965: remove outdated comment about TCS passthrough Since commit `75881bed9e` "i965: Rework the TCS passthrough shader to use NIR." the created nir_shader is not dummy, and it is compiled by the backend like the others. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-09-20 09:58:55 -07:00
Christoph Haag	b01834b56c	meson: add option to statically link llvm Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2018-09-20 06:08:50 -07:00
Dylan Baker	0abce6d770	utils/u_math: break dependency on gallium/utils Currently u_math needs gallium utils for cpu detection. Most of what u_math uses out of u_cpu_detection is duplicated in src/mesa/x86 (surprise!), so I've just reworked it as much as possible to use the x86/common_x86_features.h macros instead of the gallium ones. The mesa implementation is a header only approach, with no external dependencies. There is one small function that was copied over, as promoting u_cpu_detection is itself a fairly hefty undertaking, as it depends on u_debug, and this fixes the bug for now. bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107870 Tested-by: Vinson Lee <vlee@freedesktop.org>	2018-09-20 05:52:23 -07:00
Emil Velikov	b8b3517a49	egl/android: rework device probing Unlike the other platforms, here we aim to guess if the device that we somewhat arbitrarily picked, is supported or not. In particular: when a vendor is _not_ requested we loop through all devices, picking the first one which can create a DRI screen. When a vendor is requested - we use that and do _not_ fall-back to any other device. The former seems a bit fiddly, but considering EGL_EXT_explicit_device and EGL_MESA_query_renderer are MIA, this is the best we can do for the moment. With those (proposed) extensions userspace will be able to create a separate EGL display for each device, query device details and make the conscious decision which one to use. v2: - update droid_open_device_drm_gralloc() - set the dri2_dpy->fd before using it - return a EGLBoolean for droid_{probe,open}_device* - do not warn on droid_load_driver failure (Tomasz) - plug mem leak on dri2_create_screen failure (Tomasz) - fixup function name typo (Tomasz, Rob) v3: - add forward declaration for droid_load_driver() Fixes the HAVE_DRM_GRALLOC build (Mauro) - split dup() assignment and check in separate lines (Tomasz, Eric) - make droid_load_driver() static (Tomasz) - drop unused prop_set variable (Tomasz) v4: - rebase - fwd declaration should be for droid_probe_device() Cc: Robert Foss <robert.foss@collabora.com> Cc: Tomasz Figa <tfiga@chromium.org> Cc: Mauro Rossi <issor.oruam@gmail.com> Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Tomasz Figa <tfiga@chromium.org> Tested-by: Tomasz Figa <tfiga@chromium.org> Tested-by: Tapani Pälli <tapani.palli@intel.com>	2018-09-20 10:15:38 +01:00
Danylo Piliaiev	18be7403a1	glsl: Add an assert when cloning ir_dereference_record with invalid field Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-09-20 08:30:11 +10:00
Danylo Piliaiev	6f3c7374b1	glsl: Avoid propagating incompatible type of initializer do_assignment validated assignment but when rhs type was not compatible it proceeded without issues and returned error_emitted = false. On the other hand process_initializer expected do_assignment to always return compatible type and never fail. As a result when variable was initialized with incompatible type the type of variable changed to the incompatible one. This manifested in unnecessary error messages and in one case in crash. Example GLSL: vec4 tmp = vec2(0.0); tmp.z -= 1.0; Past error messages: initializer of type vec2 cannot be assigned to variable of type vec4 invalid swizzle / mask `z' type mismatch operands to arithmetic operators must be numeric After this patch: initializer of type vec2 cannot be assigned to variable of type vec4 In the other case when we initialize variable with incompatible struct, accessing variable's field led to a crash. Example: uniform struct {float field;} data; ... vec4 tmp = data; tmp.x -= 1.0; After the patch there is only error line without a crash: initializer of type #anon_struct cannot be assigned to variable of type vec4 Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107547	2018-09-20 08:30:11 +10:00
Michal Srb	194bf0a2e0	st/dri: don't set queryDmaBufFormats/queryDmaBufModifiers if the driver does not implement it This is equivalent to commit `a65db0ad1c`, but for dri_kms_init_screen. Without this gbm_dri_is_format_supported always returns false. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104926 Fixes: `e14fe41e0b` ("st/dri: implement createImageFromRenderbuffer(2)") Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Adam Jackson <ajax@redhat.com> Tested-by: Adam Williamson <adamwill@fedoraproject.org>	2018-09-19 15:20:04 -04:00
Jason Ekstrand	c811af767e	anv/so_memcpy: Don't consider src/dst_offset when computing block size The only thing that matters is the size since we never specify any offsets in terms of blocks. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-09-19 09:38:04 -05:00
Jakob Bornecrantz	09171705d5	Revert "mesa: only update framebuffer-state for clears" This reverts commit `fb86365148`.	2018-09-19 15:21:26 +01:00
Samuel Pitoiset	121f226471	radv: use a 64-bit unsigned integer when allocating a descriptor pool pool->size is a 64-bit unsigned integer too. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-09-19 13:36:12 +02:00
Samuel Pitoiset	35656823b9	radv: enable VK_SUBGROUP_FEATURE_ARITHMETIC_BIT All CTS pass on Polaris/Vega with LLVM 6, 7 and master, so I think it's safe to enable the feature. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-09-19 13:36:10 +02:00
Samuel Pitoiset	febdc13a6c	radv: do not support blitting surfaces with depth and stencil Fixes: dEQP-VK.api.copy_and_blit.core.blit_image.all_formats.depth_stencil.d32_sfloat_s8_uint_d32_sfloat_s8_uint.optimal_optimal_nearest And all friends that try to blit a surface with different depth and stencil formats. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-09-19 13:36:07 +02:00
Erik Faye-Lund	fb86365148	mesa: only update framebuffer-state for clears If we update the program-state etc, we risk compiling needless shaders, which can cost quite a bit of performance. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-09-19 11:52:53 +02:00
Juan A. Suarez Romero	0c82e3603e	nir: add initializer data to fix MSVC compile error CC: Jason Ekstrand <jason@jlekstrand.net> Fixes: 82799a5d1b8 ("nir: Add a small pass to rematerialize derefs per-block") Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>	2018-09-19 11:46:44 +02:00
Jason Ekstrand	976046a8d8	nir: Add some asserts that we don't put derefs in phis The lcssa and phis_to_regs passes are used by various NIR optimizations that modify the CFG. Putting a couple of asserts will help ensure that we don't accidentally put derefs in phis as part of an optimization pass. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2018-09-19 02:00:49 -05:00
Jason Ekstrand	864c780566	nir/opt_if: Re-materialize derefs in use blocks before peeling loops Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107879 Cc: "18.2" <mesa-stable@lists.freedesktop.org>	2018-09-19 02:00:49 -05:00
Jason Ekstrand	0796c3934e	nir/loop_unroll: Re-materialize derefs in use blocks before unrolling When we're about to re-arrange a bunch of blocks, it's a good idea to make sure that we don't have deref uses crossing block boundaries. Otherwise we may end up with a deref going through a phi and that would be bad. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Cc: "18.2" <mesa-stable@lists.freedesktop.org>	2018-09-19 01:59:40 -05:00
Jason Ekstrand	7d1d1208c2	nir: Add a small pass to rematerialize derefs per-block This pass re-materializes deref instructions on a per-block basis to ensure that every use of a deref occurs in the same block as the instruction which uses it. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Cc: "18.2" <mesa-stable@lists.freedesktop.org>	2018-09-19 01:59:40 -05:00
Kenneth Feng	4490fce166	amd: Add Picasso device id No changes here compared to Raven. Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Huang Rui <ray.huang@amd.com> Signed-off-by: Kenneth Feng <kenneth.feng@amd.com> Signed-off-by: Huang Rui <ray.huang@amd.com> Cc: 18.1 18.2 <mesa-stable@lists.freedesktop.org>	2018-09-18 18:05:17 -04:00
Bas Nieuwenhuizen	95bb7d82ca	Revert "radv: fix descriptor pool allocation size" This reverts commit `90819abb56`. This logic was wrong, the original code is correct. The direct impact is that we allocate up to approximately a squared amount of memory compared to what we should allocate. Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2018-09-18 22:51:42 +02:00
Samuel Pitoiset	c9dbe52f84	radv: implement VK_EXT_conservative_rasterization Only supported by GFX9+. The conservativeraster Sascha demo seems to work as expected. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-09-18 13:28:01 +02:00
Samuel Pitoiset	450a325858	radv: do not re-create the sampler for every blit in CmdBlitImage() Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-09-18 13:27:59 +02:00
Samuel Pitoiset	3871dd7a92	radv: allow to force anisotropy via RADV_TEX_ANISO Ported from RadeonSI. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-09-18 13:27:58 +02:00
Timothy Arceri	b54a2311a9	mesa: enable EXT_framebuffer_object in core profile Since user defined names are not allowed in core profile we remove the allow_user_names bool and just check if we have a core profile like all other buffer/texture object handling code does. This extension is required by "Wolfenstein: The Old Blood" and is exposed in core in the Nvidia binary driver. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-09-18 19:58:24 +10:00
Timothy Arceri	02843ed768	mesa: move legacy dri config option texture_depth Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-09-18 19:43:05 +10:00
Timothy Arceri	f958ea6eff	mesa: move legacy dri config option fthrottle_mode Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-09-18 19:43:05 +10:00
Timothy Arceri	4b1a81ef9d	mesa: move legacy dri config option def_max_anisotropy Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-09-18 19:43:05 +10:00
Timothy Arceri	6164d59bcc	mesa: move legacy dri config option no_neg_lod_bias Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-09-18 19:43:05 +10:00
Timothy Arceri	6d1890fa07	mesa: move legacy dri config option round_mode Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-09-18 19:43:05 +10:00
Timothy Arceri	3a1d09fd55	mesa: remove unused dri option float_depth This seems to have only been used by DRI1 drivers which were removed with `e4344161bd`. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-09-18 19:43:05 +10:00
Timothy Arceri	91e76ce493	mesa: move legacy dri config option dither_mode Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-09-18 19:43:05 +10:00
Timothy Arceri	2d7dc9591d	mesa: move legacy dri config option color_reduction Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-09-18 19:43:05 +10:00
Timothy Arceri	408d41a413	mesa: move legacy TCL dri config options Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-09-18 19:43:05 +10:00
Timothy Arceri	024abd3534	util: use force_compat_profile for Wolfenstein The Old Blood This game is looking for some odd extensions after creating a core context such as ARB_vertex_program and EXT_framebuffer_object. Rather than enabling these in core this forces the game to use compat. This allows the game to run and seems to work without issues. All other id tech games/engines use a compat profile. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-09-18 19:34:54 +10:00
Timothy Arceri	64ec50d52f	mesa/st: add force_compat_profile option to driconfig Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-09-18 19:34:54 +10:00
Timothy Arceri	7a992fcfa0	Revert "radeonsi: avoid syncing the driver thread in si_fence_finish" This reverts commit `bc65dcab3b`. This was manually reverted. Reverting stops the menu hanging in some id tech games such as RAGE and Wolfenstein The New Order. Reviewed-by: Marek Olšák <marek.olsak@amd.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107891	2018-09-18 19:21:32 +10:00
Eric Anholt	4e1af6808c	v3d: Switch from FLUSH_ALL_STATE to FLUSH for ending our bin CLs. The HW for FLUSH_ALL_STATE isn't validated, since the closed driver only uses FLUSH. Now that we don't have any new state at the end of our bin CLs, follow their lead.	2018-09-17 16:35:45 -07:00
Eric Anholt	0b8007b523	v3d: Stop clearing the OQ state at the end of the job. Ever since we added OQ support, we've been clearing OQ state at the start of the job anyway. We're intentionally breaking old-and-new-driver-mix systems, because we need to stop using the unvalidated FLUSH_ALL_STATE.	2018-09-17 16:35:45 -07:00
Eric Anholt	350cb79045	v3d: Always emit a TF disable at the start of drawing on V3D 4.x. The HW's FLUSH_ALL_STATE is not validated, so we probably shouldn't use it, meaning that we need to reset state at the start. By doing this, we also make ourselves more resilient to another client leaving the TF state enabled at the end of their batch (as we now do, ourselves). However, we still need to emit a single TF disable at the end of the frame, for SWVC5-718.	2018-09-17 16:35:45 -07:00
Dylan Baker	7f08bcb73f	build: Don't overlink gallium xlib target Currently gallium's xlib target will fail to link due to multiple definitions of all the symbols in libmesautil, this only shows up in autotools, and not in meson due to differences in the way that meson and autotools handle linking static archives into static archives. Autotools uses -Wl,--whole-archive implicitly, meson requires this behavior to be opted-into. The solution is just to remove libmesautil from the libgl-xlib target, since it will get all of those symbols from libmesagallium. I've dropped the link from meson as well, it doesn't seem to hurt anything and should make linking just a little faster. Fixes: `8396043f30` ("Replace uses of _mesa_bitcount with util_bitcount") bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107923 Tested-by: Brian Paul <brianp@vmware.com> Tested-by: Vinson Lee <vlee@freedesktop.org> Cc: Sergii Romantsov <sergii.romantsov@globallogic.com>	2018-09-17 13:21:01 -07:00
Dylan Baker	3acc18fcf7	move pthread_setaffinity_np check to the build system Rather than trying to encode all of the rules in a header, let's just put them in the build system where they belong. This fixes the build on FreeBSD, which does have pthread_setaffinity_np, but it's in a pthread_np.h, not behind _GNU_SOURCE. FreeBSD also implements cpu_set slightly differently, so additional changes would be required to get it working right there anyway. v2: - fix #define in autotools Fixes: `9f1bbbdbbd` ("util: try to fix the Android and MacOS build") Cc: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2018-09-17 13:16:46 -07:00
Fritz Koenig	60d0c0d062	mesa: FramebufferParameteri parameter checking Missing break; causes parameter checking to never pass GL_FRAMEBUFFER_FLIP_Y_MESA parameters. Fixes: `318c265160` ("mesa: GL_MESA_framebuffer_flip_y extension [v4]") Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Brian Paul <brianp@vmware.com>	2018-09-17 11:48:00 -07:00
Fritz Koenig	ba6cc32cf9	mesa: Additional FlipY applications Instances where direction was determined based on winsys or user fbo and should be determined based on FlipY. Key STATE_FB_WPOS_Y_TRANSFORM off of FlipY instead of _mesa_is_user_fbo. This corrects gl_FragCoord usage when applying GL_MESA_framebuffer_flip_y. Fixes: `ab05dd183c` ("i965: implement GL_MESA_framebuffer_flip_y [v3]") Reviewed-by: Brian Paul <brianp@vmware.com>	2018-09-17 11:48:00 -07:00
Bas Nieuwenhuizen	d17443a459	radv: Use build ID if available for cache UUID. To get a useful UUID for systems that have a non-useful mtime for the binaries. I started using SHA1 to ensure we get reasonable mixing in the various possibilities and the various build id lengths. CC: <mesa-stable@lists.freedesktop.org> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-09-17 20:19:52 +02:00
Samuel Pitoiset	08103c5f65	radv: enable shaderInt16 capability Not sure if this is all wired up. CTS does pass and the Tangrams demo works fine on Vega. There are corruption issues on Polaris but not sure if that is related to 16-bit support. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-09-17 15:18:39 +02:00
Samuel Pitoiset	cd76ce0078	ac: add 16-bit support to ac_build_bitfield_reverse() Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-09-17 15:18:37 +02:00
Samuel Pitoiset	fc398f4d67	ac: add 16-bit support to ac_build_bit_count() Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-09-17 15:18:34 +02:00
Samuel Pitoiset	94dd08eb7c	ac: add 16-bit support to ac_find_lsb() Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-09-17 15:18:32 +02:00
Samuel Pitoiset	5a6c8ca3e8	ac: add 16-bit support to ac_build_umsb() Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-09-17 15:18:30 +02:00
Samuel Pitoiset	3e7f3e2cd1	ac: add 16-bit support to ac_build_isign() Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-09-17 15:18:28 +02:00
Samuel Pitoiset	cfd6314cfe	ac: add 16-bit constant values for zero and one Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-09-17 15:18:26 +02:00
Samuel Pitoiset	074e29183c	ac: add ac_build_bitfield_reverse() helper Are we missing 64-bit support? Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-09-17 15:18:23 +02:00
Samuel Pitoiset	371c35e5bb	ac: add ac_build_bit_count() helper Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-09-17 15:18:20 +02:00
Samuel Pitoiset	aec9151464	radv: fix use of unreachable() in the meta blit path Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-09-17 11:29:25 +02:00
Samuel Pitoiset	6521d4a659	Revert "radv: Optimize rebinding the same descriptor set." This introduces random GPU hangs on Vega, at least. This reverts commit `02a43edf18`.	2018-09-17 11:20:57 +02:00
Samuel Pitoiset	90819abb56	radv: fix descriptor pool allocation size The size has to be multiplied by the number of sets. This gets rid of the OUT_OF_POOL_KHR error and fixes a crash with the Tangrams demo. CC: 18.1 18.2 <mesa-stable@lists.freedesktop.org> Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-09-17 10:18:01 +02:00
Jason Ekstrand	67094e11e9	anv/query: Add an emit_srm helper Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-09-17 02:57:21 -05:00
Jason Ekstrand	40149441b8	anv: Add a mi_memset and use it for zeroing queries Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-09-17 02:57:21 -05:00
Jason Ekstrand	b11e9b5ffe	anv/query: Use anv_address everywhere Instead of passing around BOs and offsets, use addresses which are anv's GPU equivalent of pointers. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-09-17 02:57:21 -05:00
Jason Ekstrand	07e214f1ce	anv/query: Write both dwords in emit_zero_queries Each query slot is a uint64_t and we were only zeroing half of it. Fixes: `7ec6e4e689` "anv/query: implement multiview interactions" Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-09-17 02:57:21 -05:00
Jason Ekstrand	c0420a62c9	anv/query: Increment an index while writing results Instead of computing an index at the end which we hope maps to the number of things written, just count the number of things as we go. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-09-17 02:57:21 -05:00
Ian Romanick	df9dbc03d3	i965/fs: Don't propagate conditional modifiers from integer compares to adds No shader-db changes on any Intel platform... which probably explains why no bugs have been bisected to this problem since it landed in Mesa 18.1. :( The commit mentioned below is in 18.2, so 18.1 would need a slightly different fix (due to code refactoring). Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Fixes: `77f269bb56` "i965/fs: Refactor propagation of conditional modifiers from compares to adds" Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> (reviewed the original patch) Cc: Matt Turner <mattst88@gmail.com> (reviewed the original patch)	2018-09-17 00:38:22 -07:00
Bas Nieuwenhuizen	0dd8189f15	radv: Only allow 16 user SGPRs for compute on GFX9+. Apparently for compute there are only 16 instead of the 32 for the graphics path. Fixes dEQP-VK.binding_model.descriptorset_random.sets16.noarray.ubolimitlow.sbolimitlow.imglimitlow.noiub.comp.0 CC: <mesa-stable@lists.freedesktop.org> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2018-09-16 12:50:58 +02:00
Bas Nieuwenhuizen	d97c892584	radv: Set the user SGPR MSB for Vega. Otherwise using 32 user SGPRs would be broken. CC: <mesa-stable@lists.freedesktop.org> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2018-09-16 12:50:58 +02:00
Bas Nieuwenhuizen	02a43edf18	radv: Optimize rebinding the same descriptor set. This makes it cheaper to just change the dynamic offsets with the same descriptor sets. Suggested-by: Philip Rebohle <philip.rebohle@tu-dortmund.de> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2018-09-16 12:50:19 +02:00
Gert Wollny	14976817f4	r600/sb: use safe math optimizations when TGSI contains precise operations Fixes: dEQP-GLES3.functional.shaders.invariance.highp.common_subexpression_3 dEQP-GLES3.functional.shaders.invariance.mediump.common_subexpression_3 dEQP-GLES3.functional.shaders.invariance.lowp.common_subexpression_3 Signed-off-by: Gert Wollny <gw.fossdev@gmail.com> Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2018-09-15 20:44:53 +02:00
Mauro Rossi	cc3b99bb48	android: broadcom/cle: export the broadcom top level path headers Fixes the following building error in vc4 build: In file included from external/mesa/src/gallium/drivers/vc4/kernel/vc4_render_cl.c:34: In file included from external/mesa/src/gallium/drivers/vc4/kernel/vc4_drv.h:27: In file included from external/mesa/src/gallium/drivers/vc4/vc4_simulator_validate.h:34: In file included from external/mesa/src/gallium/drivers/vc4/vc4_context.h:39: In file included from external/mesa/src/gallium/drivers/vc4/vc4_cl.h:56: gen/STATIC_LIBRARIES/libmesa_broadcom_genxml_intermediates/broadcom/cle/v3d_packet_v21_pack.h:12:10: fatal error: 'cle/v3d_packet_helpers.h' file not found ^~~~~~~~~~~~~~~~~~~~~~~~~~ 1 error generated. Fixes: `5b102160ae` ("broadcom/genxml: Introduce a V3D packet/struct decoder.") Cc: "18.2" <mesa-stable@lists.freedesktop.org> Acked-by: Eric Anholt <eric@anholt.net> Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>	2018-09-15 09:14:46 +02:00
Mauro Rossi	9158e0bd82	android: broadcom/cle: add gallium include path Fixes the following building error: In file included from external/mesa/src/broadcom/cle/v3d_decoder.c:38: In file included from external/mesa/src/broadcom/cle/v3d_packet_helpers.h:29: external/mesa/src/gallium/auxiliary/util/u_math.h:42:10: fatal error: 'pipe/p_compiler.h' file not found ^~~~~~~~~~~~~~~~~~~ 1 error generated. Fixes: `5b102160ae` ("broadcom/genxml: Introduce a V3D packet/struct decoder.") Cc: "18.2" <mesa-stable@lists.freedesktop.org> Acked-by: Eric Anholt <eric@anholt.net> Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>	2018-09-15 09:14:42 +02:00
Mauro Rossi	3341429d74	android: broadcom/genxml: fix collision with intel/genxml header-gen macro Fixes the following building error, happening when building both intel and broadcom: Gen Header: libmesa_broadcom_genxml_32 <= v3d_packet_v21_pack.h FAILED: gen/STATIC_LIBRARIES/libmesa_broadcom_genxml_intermediates/broadcom/cle/v3d_packet_v21_pack.h /bin/bash -c "python external/mesa/src/broadcom/cle/gen_pack_header.py \ external/mesa/src/broadcom/cle/v3d_packet_v21.xml \ > gen/STATIC_LIBRARIES/libmesa_broadcom_genxml_intermediates/broadcom/cle/v3d_packet_v21_pack.h" Traceback (most recent call last): File "external/mesa/src/broadcom/cle/gen_pack_header.py", line 626, in <module> p = Parser(sys.argv[2]) IndexError: list index out of range header-gen macro is already defined by Intel genxml building rules and the existing header-gen does not have the $(PRIVATE_VER) argument, in fact the bash command line logged in the building error is missing exactly the $(PRIVATE_VER) argument. Renaming the macro as pack-header-gen in src/broadcom/Android.genxml.mk solves the building error, another possible way is to keep the gen rules commands expanded and not use the macros. Fixes: `7f80a9ff13` ("vc4: Introduce XML-based packet header generation like Intel's.") Cc: "18.2" <mesa-stable@lists.freedesktop.org> Acked-by: Eric Anholt <eric@anholt.net> Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>	2018-09-15 09:14:33 +02:00
Caio Marcelo de Oliveira Filho	f9d25f630c	anv/memcpy: fix build after starting to use addresses The offsets now come from the anv_address, these references were not updated and using the old variable. Fixes: `e1ab834557` "anv/memcpy: Use addresses instead of bo+offset" Tested-by: Clayton Craft <clayton.a.craft@intel.com>	2018-09-14 21:45:50 -07:00
Jason Ekstrand	d6a73824bd	anv/cmd_buffer: Take an address in emit_lrm Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2018-09-14 22:12:11 -05:00
Jason Ekstrand	e1ab834557	anv/memcpy: Use addresses instead of bo+offset Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2018-09-14 22:12:11 -05:00
Jason Ekstrand	90b46f6c17	anv/so_memcpy: Use the correct SO_BUFFER size on gen8+ This shouldn't matter as we'll never write OOB anyway but we may as well get it right. It's supposed to be in dwords - 1. Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>	2018-09-14 22:12:11 -05:00
Timothy Arceri	e29f0ede75	ac: fix get_image_coords() for radeonsi Because this was setting image to true we would end up calling si_load_image_desc() when we should be calling si_load_sampler_desc(). This fixes an assert() in Deus Ex: MD Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-09-15 12:23:32 +10:00
Marek Olšák	914bd3014f	gallium/util: don't let child processes inherit our thread affinity v2: corrected the comment	2018-09-14 21:15:39 -04:00
Marek Olšák	7d41a7593a	gallium/util: start with a random L3 cache index for AMD Zen	2018-09-14 21:05:37 -04:00
Josh Pieper	936e0dcd61	st/mesa: Validate the result of pipe_transfer_map in make_texture (v2) When using Freecad, I was getting intermittent segfaults inside of mesa. I traced it down to this path in st_cb_drawpixels.c where the result of pipe_transfer_map wasn't being checked. In my case, it was returning NULL because nouveau_bo_new returned ENOENT. I'm by no means a mesa developer, but this patch solves the problem for me and seems reasonable enough. v2: Marek - also unmap the PBO and release the texture, and call the make_texture function sooner for less cleanup Cc: 18.1 18.2 <mesa-stable@lists.freedesktop.org>	2018-09-14 21:05:37 -04:00
Samuel Pitoiset	c79aad30ae	radv: emit the initial config only once in the preambles It shouldn't be needed to emit the initial graphics or compute state when beginning a new command buffer. Emitting them in the preamble should be enough and this will reduce IB sizes. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-09-14 10:59:52 +02:00
Samuel Pitoiset	9de062ef20	radv: fix setting global locations for indirect descriptors Indirect descriptors only need one entry, we don't have to emit a location for every descriptor. Fixes GPU hangs with new CTS: dEQP-VK.binding_model.descriptorset_random.* CC: 18.2 <mesa-stable@lists.freedesktop.org> Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-09-14 10:59:52 +02:00
Samuel Pitoiset	748f4cce18	radv: fix flushing indirect descriptors Let's say we first bind a graphics pipeline that needs indirect descriptors sets. The userdata pointers will be emitted at draw time. Then if we bind a compute pipeline that doesn't need any indirect descriptors, the driver will re-emit them for all graphics stages. To avoid this, just check the bind point type. CC: 18.2 <mesa-stable@lists.freedesktop.org> Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-09-14 10:59:52 +02:00
Samuel Pitoiset	063264db5b	radv: fix GPU hangs with 32-bit indirect descriptors LLVM 6 isn't affected. Fixes GPU hangs with new CTS: dEQP-VK.binding_model.descriptorset_random.* CC: 18.2 <mesa-stable@lists.freedesktop.org> Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-09-14 10:59:52 +02:00
Samuel Pitoiset	aa30205929	radv: handle loc->indirect correctly for the first descriptor This was wrong for descriptor #0 when all of them are indirect. This is because indirect_offset was 0 and we emitted a "normal" descriptor pointer for nothing. While we are at it remove radv_userdata_info::indirect_offset which is useless. CC: 18.2 <mesa-stable@lists.freedesktop.org> Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-09-14 10:59:52 +02:00
Samuel Pitoiset	b9f6521157	radv: bump the maximum number of arguments to 64 Bumping to 64 should be safe enough. Fixes some crashes with new CTS: dEQP-VK.binding_model.descriptorset_random.* CC: 18.2 <mesa-stable@lists.freedesktop.org> Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-09-14 10:59:52 +02:00
Samuel Pitoiset	c28ea92947	radv: tidy up ac_setup_rings() for the GSVS rings Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-09-14 10:59:52 +02:00
Samuel Pitoiset	40fb8c7fca	radv: fix setting the number of entries for GSVS on VI+ According to RadeonSI, it's unnecessary to multiply by the stride. That field seems to always be 64. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-09-14 10:59:52 +02:00
Samuel Pitoiset	a006c24237	radv: always compute the number of components from the output mask That removes two special cases for clip/cull distances. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-09-14 10:59:52 +02:00
Samuel Pitoiset	9447e91329	radv: emit data contiguously in the GS->VS ring buffer Instead of having holes. The other ring parameters like offset and stride can be updated later. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-09-14 10:59:52 +02:00
Samuel Pitoiset	fbc064a5b4	radv: make use of the output usage mask in GS copy shader This is just for consistency because LLVM can detect and remove unused loads. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-09-14 10:59:52 +02:00
Samuel Pitoiset	f398595dca	radv: improve a comment in si_emit_set_predication_state() Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-09-14 10:59:52 +02:00
Samuel Pitoiset	abdf396cbe	radv: fix VK_EXT_conditional_rendering visibility It's actually just the opposite. This fixes the new Sascha conditionalrender demo. CC: 18.2 <mesa-stable@lists.freedesktop.org> Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-09-14 10:59:52 +02:00
Samuel Pitoiset	18464d298b	radv: make use of ac_unpack_param() instead of ac_build_bfe() Same code is generated because LLVM ends up by using bfe, but that seems cleaner to me. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-09-14 10:59:52 +02:00
Timothy Arceri	21e34bab09	nir: add loop unroll support for complex wrapper loops In GLSL IR we cheat with switch statements and simply convert them into loops with a single iteration. This allowed us to make use of the existing jump instruction handling provided by the loop handling code, it also allows dead code to be cleaned up once we have wrapped the code in a loop. However using loops in this way created previously unrollable loops which limits further optimisations. Here we provide a way to unroll loops that end in a break and have multiple other exits. All shader-db changes are from the dolphin uber shaders. There is a small amount of HURT shaders but in general the improvements far exceed the HURT. shader-db results IVB: total instructions in shared programs: 10018187 -> 10016468 (-0.02%) instructions in affected programs: 104080 -> 102361 (-1.65%) helped: 36 HURT: 15 total cycles in shared programs: 220065064 -> 154529655 (-29.78%) cycles in affected programs: 126063017 -> 60527608 (-51.99%) helped: 51 HURT: 0 total loops in shared programs: 2515 -> 2308 (-8.23%) loops in affected programs: 903 -> 696 (-22.92%) helped: 51 HURT: 0 total spills in shared programs: 4370 -> 4124 (-5.63%) spills in affected programs: 1397 -> 1151 (-17.61%) helped: 9 HURT: 12 total fills in shared programs: 4581 -> 4419 (-3.54%) fills in affected programs: 2201 -> 2039 (-7.36%) helped: 9 HURT: 15 Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-09-14 16:07:36 +10:00
Timothy Arceri	2975422ceb	nir: propagates if condition evaluation down some alu chains v2: - only allow nir_op_inot or nir_op_b2i when alu input is 1. - use some helpers as suggested by Jason. v3: - evaluate alu op for single input alu ops - add helper function to decide if to propagate through alu - make use of nir_before_src in another spot shader-db IVB results: total instructions in shared programs: 9993483 -> 9993472 (-0.00%) instructions in affected programs: 1300 -> 1289 (-0.85%) helped: 11 HURT: 0 total cycles in shared programs: 219476091 -> 219476059 (-0.00%) cycles in affected programs: 7675 -> 7643 (-0.42%) helped: 10 HURT: 1 Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-09-14 16:07:36 +10:00
Timothy Arceri	ef4ad7baf1	nir: evaluate if condition uses inside the if branches Since we know what side of the branch we ended up on we can just replace the use with a constant. All the spill changes in shader-db are from Dolphin uber shaders, despite some small regressions the change is clearly positive. V2: insert new constant after any phis in the use->parent_instr->type == nir_instr_type_phi path. v3: - use nir_after_block_before_jump() for inserting const - check dominance of phi uses correctly v4: - create some helpers as suggested by Jason. v5 (Jason Ekstrand): - Use LIST_ENTRY to get the phi src shader-db results IVB: total instructions in shared programs: 9999201 -> 9993483 (-0.06%) instructions in affected programs: 163235 -> 157517 (-3.50%) helped: 132 HURT: 2 total cycles in shared programs: 231670754 -> 219476091 (-5.26%) cycles in affected programs: 143424120 -> 131229457 (-8.50%) helped: 115 HURT: 24 total spills in shared programs: 4383 -> 4370 (-0.30%) spills in affected programs: 1656 -> 1643 (-0.79%) helped: 9 HURT: 18 total fills in shared programs: 4610 -> 4581 (-0.63%) fills in affected programs: 374 -> 345 (-7.75%) helped: 6 HURT: 0 Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-09-14 16:07:36 +10:00
Erik Faye-Lund	fa5e9f1f73	virgl: adjust strides when mapping temp-resources When we're mapping temp-resources, we clip the resource to the transfer-box, which means the stride might not be correct any more. So let's update the stride from the temp-resource, and recompute the layer-stride. This fixes crashes when running dEQP with --deqp-gl-config-name=rgba8888d24s8ms4 Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Fixes: `a8987b88ff` "virgl: add driver for virtio-gpu 3D (v2)" Reviewed-by: Dave Airlie <airlied@redhat.com>	2018-09-14 10:59:02 +10:00
Pierre Moreau	21b92b3464	nvir: Always split 64-bit IMAD/IMUL operations Those operations do not map to actual hardware instructions, therefore those should always be lowered to 32-bit instructions. Fixes: `009c54aa7a` "nv50/ir: Split 64-bit integer MAD/MUL operations" Signed-off-by: Pierre Moreau <pierre.morrow@free.fr> Reviewed-by: Karol Herbst <kherbst@redhat.com> Signed-off-by: Karol Herbst <kherbst@redhat.com>	2018-09-13 20:49:38 +02:00
Leo Liu	cb63e5d1eb	st/vdpau: Use output buffer as back buffer with 24-bit color only Using output buffer with 8 bits video RGB as back buffer certainly is not working for 30 bits color depth visual. Signed-off-by: Leo Liu <leo.liu@amd.com> Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2018-09-13 14:28:32 -04:00
Leo Liu	4d8ec12f03	vl/dri: add color depth to vl winsys For VDPAU use later Signed-off-by: Leo Liu <leo.liu@amd.com> Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2018-09-13 14:28:32 -04:00
Leo Liu	cd77d49ecf	vl/dri3: add support for 10 bits format Signed-off-by: Leo Liu <leo.liu@amd.com> Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2018-09-13 14:28:32 -04:00
Leo Liu	902358de4b	vl/dri: add 10 bits format supports v2: Tell B10G10R10X2 and R10G10B10X2 formats for different HW. Signed-off-by: Leo Liu <leo.liu@amd.com> Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2018-09-13 14:28:32 -04:00
Kristian H. Kristensen	aaafae4f55	egl/android: Declare droid_load_driver() static Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>	2018-09-13 11:12:35 -07:00
Samuel Pitoiset	d4bf954fe6	radv: fix function names for VK_EXT_conditional_rendering Otherwise they are not exported. CC: 18.2 <mesa-stable@lists.freedesktop.org> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Dave Airlie <airlied@redhat.com> Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2018-09-13 16:03:18 +02:00
Jason Ekstrand	1a263b377c	anv: Silence a couple compiler warnings [63/93] Compiling C object 'src/intel/vulkan/...intel@vulkan@@anv_common@sta/anv_device.c.o'. ../src/intel/vulkan/anv_device.c:685:30: warning: passing 'const char *' to parameter of type 'void *' discards qualifiers [-Wincompatible-pointer-types-discards-qualifiers] vk_free(&instance->alloc, instance->app_info.app_name); ^~~~~~~~~~~~~~~~~~~~~~~~~~~ ../src/vulkan/util/vk_alloc.h:62:51: note: passing argument to parameter 'data' here vk_free(const VkAllocationCallbacks *alloc, void *data) ^ ../src/intel/vulkan/anv_device.c:686:30: warning: passing 'const char *' to parameter of type 'void *' discards qualifiers [-Wincompatible-pointer-types-discards-qualifiers] vk_free(&instance->alloc, instance->app_info.engine_name); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ../src/vulkan/util/vk_alloc.h:62:51: note: passing argument to parameter 'data' here vk_free(const VkAllocationCallbacks *alloc, void *data) ^ [65/93] Compiling C object 'src/intel/vulkan/...ommon@sta/anv_nir_apply_pipeline_layout.c.o'. ../src/intel/vulkan/anv_nir_apply_pipeline_layout.c:519:13: warning: unused variable 'image_uniform' [-Wunused-variable] unsigned image_uniform; Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2018-09-12 21:20:27 -05:00
Michel Dänzer	e34dd4f508	loader/dri3: Don't wait for fence of old buffer when re-allocating it We only need to wait for the fence before drawing to a buffer, not before reading from it. This might avoid hangs when re-allocating the fake front buffer, similar to the previous change. But I haven't seen any evidence that this was actually happening in practice. Tested-by: Olivier Fourdan <ofourdan@redhat.com>	2018-09-12 16:55:09 +02:00
Michel Dänzer	aefac10fec	loader/dri3: Only wait for back buffer fences in dri3_get_buffer We don't need to wait before drawing to the fake front buffer, as front buffer rendering by definition is allowed to produce artifacts. Fixes hangs in some cases when re-using the fake front buffer, due to it still being busy (i.e. in use for presentation). Cc: mesa-stable@lists.freedesktop.org Bugzilla: https://bugs.freedesktop.org/106404 Bugzilla: https://bugs.freedesktop.org/107757 Tested-by: Olivier Fourdan <ofourdan@redhat.com> Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com>	2018-09-12 16:53:58 +02:00
Vadym Shovkoplias	9b5c0c520f	glsl/linker: Check the invariance of built-in special variables From Section 4.6.4 (Invariance and Linkage) of the GLSL ES 1.0 specification "The invariance of varyings that are declared in both the vertex and fragment shaders must match. For the built-in special variables, gl_FragCoord can only be declared invariant if and only if gl_Position is declared invariant. Similarly gl_PointCoord can only be declared invariant if and only if gl_PointSize is declared invariant. It is an error to declare gl_FrontFacing as invariant. The invariance of gl_FrontFacing is the same as the invariance of gl_Position." Fixes: * glsl-pcoord-invariant.shader_test * glsl-fcoord-invariant.shader_test * glsl-fface-invariant.shader_test Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107734 Signed-off-by: Vadym Shovkoplias <vadym.shovkoplias@globallogic.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2018-09-12 11:43:21 +03:00
Tapani Pälli	30580640f2	intel/tools: fix initial position of window in aubinator viewer Currently position is set before widgets are sized by gtk and calculation can get wrong results where window is positioned offscreen. Patch fixes this by setting aubfile window position as 0,0 only when size_allocate has been called to the widget. Now window is always positioned to 0,0 if imgui.ini is missing. Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-09-12 11:43:21 +03:00
Erik Faye-Lund	eaa718588e	winsys/virgl: avoid unintended behavior If we end up never taking the loop that writes ret, we can end up with an uninitialized value, and if we're really unlucky, that value can be -1, causing us to go down an error-path instead of a success path. This was obviously not intended, so let's just initialize this to zero. Noticed by Valgrind: Conditional jump or move depends on uninitialised value(s) at 0xBA640A0: virgl_drm_winsys_resource_cache_create (virgl_drm_winsys.c:348) by 0xBA62FCF: virgl_buffer_create (virgl_buffer.c:170) by 0xBA605AC: virgl_resource_create (virgl_resource.c:60) by 0xBCF816F: bufferobj_data (st_cb_bufferobjects.c:344) by 0xBCF816F: st_bufferobj_data (st_cb_bufferobjects.c:390) by 0xBB7E836: vbo_use_buffer_objects (vbo_exec_api.c:1136) by 0xBCFCC6E: st_create_context_priv (st_context.c:414) by 0xBCFD3CD: st_create_context (st_context.c:590) by 0xBBB30CA: st_api_create_context (st_manager.c:896) by 0xB981E76: dri_create_context (dri_context.c:155) by 0xB97BDCE: driCreateContextAttribs (dri_util.c:473) by 0x5288331: dri3_create_context_attribs (dri3_glx.c:309) by 0x5264D64: glXCreateContextAttribsARB (create_context.c:78) Fixes: `a8987b88ff` ("virgl: add driver for virtio-gpu 3D (v2)") Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>	2018-09-12 10:14:43 +02:00
Juan A. Suarez Romero	d631916f29	travis: use python3.5 for meson Newer Meson versions require python >=3.5. But in Trusty default python3 version is 3.4.x. Install python3.5 and make it the default version for Meson using update-alternatives method. CC: Jan Vesely <jano.vesely@gmail.com> CC: Andres Gomez <agomez@igalia.com> CC: Emil Velikov <emil.l.velikov@gmail.com> CC: Jon Turney <jon.turney@dronecode.org.uk> CC: Eric Engestrom <eric.engestrom@intel.com> CC: Dylan Baker <dylan@pnwbakers.com> Fixes: `3824c8e7cd` "meson: disable asserts by default on release builds" Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Andres Gomez <agomez@igalia.com>	2018-09-11 14:27:58 +01:00
Samuel Pitoiset	3d08631fe5	radv: adjust ESGS ring buffer size computation on VI+ Noticed while working in this area. Ported from RadeonSI. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Dave Airlie <airlied@redhat.com>	2018-09-11 11:30:19 +02:00
Gert Wollny	47e01e77d8	mesa/texture: Also check for LA texture when querying intensity component size Gallium may pick L16A16_FLOAT to represent GL_INTENSITY16F if no intensity format is provided by the driver. However, when calling glGetTexLevelParameteriv(..., GL_TEXTURE_INTENSITY_SIZE, ...) mesa will return a zero size because the actually used format has no intensity channel and as a fallback only the sizes of the red/green channels are checked. Also checking for LA sizes in the allocated texture resolves this problem. v2: Only check alpha channel size and return it (Marek) L and A size are always the same in this case. Fixes (on virgl): ext_framebuffer_multisample-fast-clear GL_ARB_texture_float * Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107832 Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-09-11 09:07:05 +02:00
Ilia Mirkin	133e12fb69	nv50,nvc0: warn on not-explicitly-handled caps Not handling caps explicitly means that we're likely getting incorrect values -- these need to be reviewed and set appropriately. While we're at it, add in some missing caps, and set all the subpixel stuff to 8 as that seems to be what the blob reports. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2018-09-11 01:25:19 -04:00
Timothy Arceri	e66c2158f8	mesa: remove duplicate dispatch sanity tests This removes duplicate tests from gl_core_functions_possible that are already covered by common_desktop_functions_possible. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-09-11 10:13:31 +10:00
Timothy Arceri	355a5ef761	mesa: tidy up init_matrix_stack() Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>	2018-09-11 09:26:04 +10:00
Christopher Egert	51995f6920	radeon: fix ColorMask Since commit `af3685d149` various OpenGL applications regressed on the classic mesa radeon driver. Signed-off-by: Christopher Egert <cme3000@gmail.com> CC: 18.1 18.2 <mesa-stable@lists.freedesktop.org> Signed-off-by: Marek Olšák <marek.olsak@amd.com>	2018-09-10 16:57:20 -04:00
Elie Tournier	9179c745f6	gallium: Correctly handle no config context creation This patch fixes the following Piglit test: spec@egl_mesa_configless_context@basic It also fixes a few tests in a virgl guest. v2: Evaluate the value of no_config (Ilia) Suggested-by: Emil Velikov <emil.velikov@collabora.com> Signed-off-by: Elie Tournier <elie.tournier@collabora.com> Signed-off-by: Marek Olšák <marek.olsak@amd.com>	2018-09-10 15:30:17 -04:00
Bas Nieuwenhuizen	f6e09db2e6	radv: Support v3 of VK_EXT_vertex_attribute_divisor. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> CC: 18.2 <mesa-stable@lists.freedesktop.org>	2018-09-10 21:26:17 +02:00
Marek Olšák	867f7aaed2	radeonsi/nir: port some bindless and sampler code from TGSI These might be all missing changes for bindless textures. Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>	2018-09-10 15:23:21 -04:00
Marek Olšák	b00deed66f	radeonsi: adjust and simplify max_alloc_size determination Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>	2018-09-10 15:19:56 -04:00
Marek Olšák	203ef19f48	radeonsi: split si_copy_buffer compute and SDMA will be added into it. Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>	2018-09-10 15:19:56 -04:00
Marek Olšák	986d6f12fb	radeonsi: don't call VBO prefetch with size=0 for the next commit. Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>	2018-09-10 15:19:56 -04:00
Marek Olšák	1119fe5c25	radeonsi: merge SI and CI dma_clear_buffer and remove the callback also use assertions for the requirements that offset and size are a multiple of 4. Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>	2018-09-10 15:19:56 -04:00
Marek Olšák	be0bd95abf	radeonsi: fix GPU hangs with bindless textures and LLVM 7.0 Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>	2018-09-10 15:19:56 -04:00
Marek Olšák	fa595e3d0c	ac: remove deprecated use of LLVMInt1Type() Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>	2018-09-10 15:19:56 -04:00
Marek Olšák	cc36ebbdc3	ac: use iN_0/1 constants Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>	2018-09-10 15:19:56 -04:00
Marek Olšák	bc09c3d59e	ac: add radeon_info::num_good_cu_per_sh Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>	2018-09-10 15:19:56 -04:00
Marek Olšák	a5f35aa742	ac: revert new LLVM 7.0 behavior for fdiv Cc: 18.2 <mesa-stable@lists.freedesktop.org> Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>	2018-09-10 15:19:56 -04:00
Marek Olšák	662db03577	radeonsi: fix printing a BO list into ddebug reports important for debugging Cc: 18.1 18.2 <mesa-stable@lists.freedesktop.org> Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>	2018-09-10 15:19:56 -04:00
Marek Olšák	da72b6296c	r600: fix HTILE for NPOT textures with mipmapping Cc: 18.1 18.2 <mesa-stable@lists.freedesktop.org> Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>	2018-09-10 15:19:56 -04:00
Marek Olšák	d4e52281aa	winsys/radeon: fix CMASK fast clear for NPOT textures with mipmapping on SI/CI Cc: 18.2 <mesa-stable@lists.freedesktop.org> Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>	2018-09-10 15:19:56 -04:00
Marek Olšák	a1b9a00f82	radeonsi: fix HTILE for NPOT textures with mipmapping on SI/CI VI uses addrlib so it's unaffected. Cc: 18.1 18.2 <mesa-stable@lists.freedesktop.org> Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>	2018-09-10 15:19:56 -04:00
Brian Paul	5162735957	docs: document new features/extensions in driver for WS 15 / Fusion 11 Reviewed-by: Charmaine Lee <charmainel@vmware.com>	2018-09-10 13:07:30 -06:00
Brian Paul	7baf45dfc7	svga: assorted fixes/changes in svga_pipe_blit.c To align the code with VMware's in-house copy. Signed-off-by: Brian Paul <brianp@vmware.com>	2018-09-10 13:07:30 -06:00
Brian Paul	25fceccf72	svga: set buffer bind_flags in svga_buffer_add_host_surface() To match the in-house VMware code. Signed-off-by: Brian Paul <brianp@vmware.com>	2018-09-10 13:07:30 -06:00
Charmaine Lee	337a74aa40	svga: add format conversion for legacy formats This patch extends the format_conversion table to support different view formats on texture buffer. For legacy image formats such as INTENSITY, LUMINANCE, LUMINANCE_ALPHA, special swizzle masks will be used on the red or RG channels. This fixes piglit test arb_texture_buffer_object-formats fs\|vs arb Reviewed-by: Brian Paul <brianp@vmware.com>	2018-09-10 13:07:30 -06:00
Charmaine Lee	389450a271	svga: remove obsolete code to reemit gs binding The svga_reemit_gs_bindings function is no longer needed. Remove it. Reviewed-by: Neha Bhende <bhenden@vmware.com> Reviewed-by: Brian Paul <brianp@vmware.com>	2018-09-10 13:07:30 -06:00
Brian Paul	c174ee9f9d	svga: move variant->fs_shadow_compare_units assignment Fixes a crash since the variant object isn't allocated until later in the function. Not sure how this got through. Reviewed-by: Charmaine Lee <charmainel@vmware.com>	2018-09-10 13:07:30 -06:00
Charmaine Lee	cb70474b20	svga: fix resource checking in is_blending_enabled() This patch makes sure a valid color buffer is bound before checking its resource. This fixes Unigine Valley running in SM41 device. Reviewed-by: Brian Paul <brianp@vmware.com>	2018-09-10 13:07:30 -06:00
Neha Bhende	c6103328ab	svga: Use texture_copy_region instead of texture_copy_handle for multisampling This fixes some of the test cases in arb_copy_image-formats and also fixes SurfaceCopy related errors in vmware.log when multi sampled surfaces are used. Tested with piglit, glretrace on windows and linux VM. v2: As per Brian's comment Reviewed-by: Brian Paul <brianp@vmware.com> Reviewed-by: Charmaine Lee <charmainel@vmware.com>	2018-09-10 13:07:30 -06:00
Charmaine Lee	fdf5885183	svga: add missing devcap check for texture array support The patch checks DXFMT_ARRAY devcap for texture array support. Tested with MTT-piglit. No regressions. Reviewed-by: Brian Paul <brianp@vmware.com>	2018-09-10 13:07:30 -06:00
Charmaine Lee	3069581260	svga: no need to check MULTISAMPLE devcap for view format According to the current SVGA contract, any view format can be used on the underlying resource that is multisample. So there is no need to check the MULTISAMPLE devcap for the view format. Fixes black rendering issue with Tropics running with 4xMSAA. Reviewed-by: Brian Paul <brianp@vmware.com>	2018-09-10 13:07:30 -06:00
Charmaine Lee	6f254ad9b4	svga: sync devcap name changes in svga3d_devcaps.h Reviewed-by: Brian Paul <brianp@vmware.com>	2018-09-10 13:07:30 -06:00
Charmaine Lee	49428c8d61	svga: explicit set DXFMT_SHADER_SAMPLE for DS format for pre-SM41 device Explicit set the DXFMT_SHADER_SAMPLE bit for depth stencil formats for pre-SM41 device only. This bit is now set by the SM41 device. Reviewed-by: Brian Paul <brianp@vmware.com>	2018-09-10 13:07:30 -06:00
Charmaine Lee	379a2f265f	svga: remove unused variable Trivial.	2018-09-10 13:07:30 -06:00
Brian Paul	cbcc416a58	svga: draw round points when msaa is enabled See comments for details. This allows the piglit ext_framebuffer_multisample-point-smooth test to pass. Also, test the pipe_rasterizer_state::point_quad_rasterization field to see if sprite point rasterization is needed because it's possible for no sprite_coord_enable bits to be set when drawing sprites. Finally, remove old, stale comments. Reviewed-by: Charmaine Lee <charmainel@vmware.com>	2018-09-10 13:07:30 -06:00
Brian Paul	6b039c7d7c	svga: check number of samples before emitting MSAA decls/opcodes If real MSAA is not available, we only support 1 sample/pixel. In that case, we must not declare MSAA resources or emit MSAA opcodes. Do that by checking the sample count. Fixes several piglit MSAA tests, such as arb_texture_multisample-sample-depth (when the hard-coded sample count of 4 is fixed in that test). Reviewed-by: Charmaine Lee <charmainel@vmware.com>	2018-09-10 13:07:30 -06:00
Brian Paul	cf2fb6813c	svga: remove obsolete comment on format_cap_table[] We removed the special cases referred to in this comment in the commit "svga: add a separate function to get dx format capabilities from vgpu10 device". Reviewed-by: Charmaine Lee <charmainel@vmware.com>	2018-09-10 13:07:30 -06:00
Brian Paul	0fc6c17bf2	svga: allow TGSI_TEXTURE_CUBE_ARRAY in emit_tg4() Technically, SM4.1 doesn't support cube map arrays, but our backend renderers actually do. This allows the Piglit textureGather cube map array tests to pass. Tested with GLrenderer, DX11renderer and SWrenderer. Reviewed-by: Neha Bhende <bhenden@vmware.com> Reviewed-by: Charmaine Lee <charmainel@vmware.com>	2018-09-10 13:07:30 -06:00
Charmaine Lee	3467a274e0	svga: no dma on multisample surface Force direct map on multisample surface. Fixes SVGA Driver Errors running multisample piglit tests on Linux VM v2: use texture for the check. Reviewed-by: Neha Bhende <bhenden@vmware.com>	2018-09-10 13:07:30 -06:00
Charmaine Lee	5f14444184	svga: src surface for IntraSurfaceCopy cannot be multisample Fixes SVGA Driver Errors with piglit test arb_copy_image-targets Reviewed-by: Brian Paul <brianp@vmware.com> Reviewed-by: Neha Bhende <bhenden@vmware.com>	2018-09-10 13:07:30 -06:00
Charmaine Lee	026e1ad7bb	svga: fix missing format multisample devcap check In commit e4048f6cd1, svga_is_dx_format_supported() is supposed to also check the SVGA3D_DXFMT_MULTISAMPLE bit for multisample support of a format. Somehow that code is not included in that commit. This patch fixes it. Fixes piglit test spec@ext_framebuffer_multisample@formats all_samples. Reviewed-by: Brian Paul <brianp@vmware.com>	2018-09-10 13:07:30 -06:00
Charmaine Lee	285d8b47b1	svga: fix incorrect multisample support in VGPU9 device Commit e4048f6cd1 unintentionally allows multisample support for VGPU9 device. This patch fixes this regression. Reviewed-by: Neha Bhende <bhenden@vmware.com>	2018-09-10 13:07:30 -06:00
Charmaine Lee	59a56ca1c8	svga: fix the missing devcap for SVGA3D_BC3_UNORM_SRGB Set the devcap to SVGA3D_DEVCAP_DXFMT_BC3_UNORM_SRGB Reviewed-by: Brian Paul <brianp@vmware.com> Reviewed-by: Neha Bhende <bhenden@vmware.com>	2018-09-10 13:07:30 -06:00
Charmaine Lee	16666eb470	svga: add a separate function to get dx format capabilities from vgpu10 device Currently we have one function to get format capabilities and we convert DX10 devcaps back to DX9. This can be confusing. Going forward we will have a separate function for dealing with dx formats. This patch also fixes the depth stencil devcap. Instead of hardcoding the capabilities for the depth stencil formats, we will inquire the device for the capabilities. Note: we will still need to explicitly set the SVGA3D_DXFMT_SHADER_SAMPLE bit for SVGA3D_R32_FLOAT_X8X24 and SVGA3D_R24_UNORM_X8 since this bit is not advertised but supported by the device. v2: reapply the patch after svga_is_format_supported is moved to svga_format.c Reviewed-by: Brian Paul <brianp@vmware.com> Reviewed-by: Neha Bhende <bhenden@vmware.com>	2018-09-10 13:07:30 -06:00
Charmaine Lee	b1aee7ff05	svga: assign a separate function for is_format_supported() for vgpu10 device This patch adds a new function svga_is_dx_format_supported() to check for format support in a VGPU10 device. v2: reapply the patch after svga_is_format_supported is moved to svga_format.c Reviewed-by: Brian Paul <brianp@vmware.com> Reviewed-by: Neha Bhende <bhenden@vmware.com>	2018-09-10 13:07:30 -06:00
Brian Paul	1ea9c80d6d	svga: add some devcap debugging code Reviewed-by: Charmaine Lee <charmainel@vmware.com>	2018-09-10 13:07:30 -06:00
Charmaine Lee	96ef81e39e	svga: fix depth and coverage mask output declaration Set the component mask to zero for both registers. Reviewed-by: Brian Paul <brianp@vmware.com>	2018-09-10 13:07:30 -06:00
Charmaine Lee	7187a2f7ff	svga: add sample positions for 2 samples Fixes piglit tests spec@arb_sample_shading@builtin-gl-sample-position 2 spec@arb_texture_multisample@fb-completeness@2 Reviewed-by: Brian Paul <brianp@vmware.com>	2018-09-10 13:07:30 -06:00
Charmaine Lee	73c850fb9a	svga: check sample count devcaps Check sample count devcaps from the svga device to determine the supported sample counts. Reviewed-by: Brian Paul <brianp@vmware.com>	2018-09-10 13:07:30 -06:00
Brian Paul	afacde3553	svga: fix 1-element cube map array issue As with 1D and 2D array textures, if there's only one array element (one cubemap in this case) we have to issue different shader code. This fixes a number of Piglit cubemap array tests. Reviewed-by: Charmaine Lee <charmainel@vmware.com>	2018-09-10 13:07:30 -06:00
Brian Paul	767c1eb436	svga: simplify array test in svga_init_shader_key_common() And squash commit a patch to silence a compiler warning (add default case to the switch statement). Reviewed-by: Charmaine Lee <charmainel@vmware.com>	2018-09-10 13:07:30 -06:00
Charmaine Lee	45517f492b	winsys/drm: check for CAPS2/SM41 support if VGPU10 is enabled No need to check for HW_CAPS2 or SM4_1 support if VGPU10 is not enabled or is explicitly disabled via the environment variable SVGA_VGPU10. Reviewed-by: Deepak Rawat <drawat@vmware.com>	2018-09-10 13:07:30 -06:00
Deepak Rawat	159e706c4c	winsys/drm: Add support for quality level in surface ioctl A new argument "quality level" is added in surface define v3 which represents precision settings for surface. This commit adds support for quality level in DRM_VMW_GB_SURFACE_CREATE_EXT and DRM_VMW_GB_SURFACE_REF_EXT. Signed-off-by: Deepak Rawat <drawat@vmware.com> Reviewed-by: Charmaine Lee <charmainel@vmware.com> Reviewed-by: Brian Paul <brianp@vmware.com>	2018-09-10 13:07:30 -06:00
Charmaine Lee	b343c6915c	svga: sync svga3d_types.h with upstream changes Reviewed-by: Brian Paul <brianp@vmware.com>	2018-09-10 13:07:30 -06:00
Charmaine Lee	b5827db2ea	winsys/drm: enable intra_surface_copy if HW_CAP2 is supported With drm version 2_15, we can inquire for support of HW_CAP2. If it is supported, we can enable intra_surface_copy support. Reviewed-by: Brian Paul <brianp@vmware.com> Reviewed-by: Sinclair Yeh <syeh@vmware.com> Reviewed-by: Deepak Rawat <drawat@vmware.com>	2018-09-10 13:07:30 -06:00
Charmaine Lee	7448bb0089	svga: add git version logging at init time Before we can log the git version in the host log, we'll add the git version in the init debug message. Reviewed-by: Brian Paul <brianp@vmware.com>	2018-09-10 13:07:30 -06:00
Charmaine Lee	4669ffd29b	svga: fix a typo in svga_texture_copy_region() Trivial.	2018-09-10 13:07:30 -06:00
Charmaine Lee	3233d05390	svga: use helper function to do copy region Use the common helper function svga_texture_copy_region for copy region command. Reviewed-by: Brian Paul <brianp@vmware.com>	2018-09-10 13:07:30 -06:00
Charmaine Lee	74791b80b9	svga: fix cubemap array rendering with backed surface view This patch fixes the layer index when rendering to a backed surface view of a cubemap array. Fixes piglit test fbo-generatemipmap-cubemap array. Reviewed-by: Brian Paul <brianp@vmware.com>	2018-09-10 13:07:30 -06:00
Charmaine Lee	2d39e6d0c8	svga: add a helper function to send ResolveCopy command Reviewed-by: Brian Paul <brianp@vmware.com>	2018-09-10 13:07:30 -06:00
Charmaine Lee	9a24b08a49	svga: sync svga3d header files This is a squash of what was originally three commits. Reviewed-by: Brian Paul <brianp@vmware.com> Reviewed-by: Neha Bhende <bhenden@vmware.com> Reviewed-by: Charmaine Lee <charmainel@vmware.com>	2018-09-10 13:07:30 -06:00
Charmaine Lee	f3eda3e5e1	svga: add SM4_1 enable debug print Reviewed-by: Brian Paul <brianp@vmware.com>	2018-09-10 13:07:30 -06:00
Charmaine Lee	ccd895db76	svga: fix swizzling for texture gather Texture swizzling for texture gather needs to be done to the selected texels rather than to the returned vector. This patch has special cases for the different swizzles in emit_tg4(). Fixes a lot of piglit texture gather tests. Reviewed-by: Neha Bhende <bhenden@vmware.com> Reviewed-by: Brian Paul <brianp@vmware.com>	2018-09-10 13:07:30 -06:00
Charmaine Lee	be1993d6ed	svga: fix starting index for system values Currently, the starting index for system values is assigned to the next index after the highest index of the tgsi declared input registers. But the tgsi index might be different from the actual assigned index, hence this might cause overlap of indices. With this patch, the shader linker keeps track of the highest index of the translated input registers, and the next index will be used for the starting index for system values. Fixes SHIM errors running arb_copy_image-formats on SM4_1 device. Reviewed-by: Brian Paul <brianp@vmware.com>	2018-09-10 13:07:30 -06:00
Deepak Rawat	569f838987	winsys/svga: Add support for new surface ioctl, multisample pattern Kernel driver version 2.15 added new surface ioctl named: DRM_VMW_GB_SURFACE_CREATE_EXT DRM_VMW_GB_SURFACE_REF_EXT The new ioctl has support for 64-bit svga3d_flags if DRM_VMW_PARAM_SM4_1 is available. Multisampling surface mob size calculation is added. Also synced the relevant header update. svga device modified the surface define command V3 with new parameter multisampling pattern. Adding support for that in winsys. Signed-off-by: Deepak Rawat <drawat@vmware.com> Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com> Reviewed-by: Sinclair Yeh <syeh@vmware.com> Reviewed-by: Brian Paul <brianp@vmware.com> Reviewed-by: Charmaine Lee <charmainel@vmware.com>	2018-09-10 13:07:30 -06:00
Brian Paul	3f55425ee6	svga: enable MSAA for SM4_1 device The SVGA device is deprecating the DX9 MSAA support. This patch enables MSAA for SM4_1 device by explicitly setting the SVGA3D_SURFACE_MULTISAMPLE bit. For SM4_1 device, only 4 samples is supported. Reviewed-by: Brian Paul <brianp@vmware.com>	2018-09-10 13:07:30 -06:00
Charmaine Lee	8088cb6f53	svga: add sample count to the surface_can_create interface With this patch, sample count is also taken into account when determining if a resource can be created. Reviewed-by: Brian Paul <brianp@vmware.com>	2018-09-10 13:07:30 -06:00
Brian Paul	4a1976bfcf	svga: implement support for GL_ARB_texture_query_lod Just translate the TGSI LODQ instruction to VGPU10 LOD instruction. All (4) Piglit GL_ARB_texture_query_lod tests pass. Reviewed-by: Charmaine Lee <charmainel@vmware.com> Reviewed-by: Neha Bhende <bhenden@vmware.com>	2018-09-10 13:07:30 -06:00
Neha Bhende	252e97ecdf	svga: Add support for arb_texture_gather With sm4_1, we can support single channel 2D or CubeMap textures. This patch exercises this feature. Tested with piglit v2: As per Brian's comment Reviewed-by: Brian Paul <brianp@vmware.com> Reviewed-by: Charmaine Lee <charmainel@vmware.com>	2018-09-10 13:07:30 -06:00
Brian Paul	36c84bcd77	svga: add support for interpolation at sample position Vs. sampling at the centroid or the fragment center. Note that this does not fix failures with the Piglit arb_sample_shading-interpolate-at-sample-position or arb_sample_shading-ignore-centroid-qualifier.exe tests at this time. Reviewed-by: Charmaine Lee <charmainel@vmware.com>	2018-09-10 13:07:30 -06:00
Brian Paul	bcf7aaa9f7	svga: clarify sys value -> input register mapping We translate TGSI system value registers to VGPU10 input registers. Add a comment and set file = TGSI_FILE_INPUT. That's not strictly necessary since we map both TGSI_FILE_INPUT and TGSI_FILE_SYSTEM_VALUE to VGPU10_OPERAND_TYPE_INPUT, but this makes the code a bit more understandable. Reviewed-by: Charmaine Lee <charmainel@vmware.com>	2018-09-10 13:07:30 -06:00
Brian Paul	9de5bdb341	svga: add support for FS sample mask output This, with the previous work for sample position/id query, allows us to enable per-sample shading for VGPU 10.1. Note that quite a few Piglit arb_sample_shading tests still do not pass, but many do. Reviewed-by: Charmaine Lee <charmainel@vmware.com>	2018-09-10 13:07:30 -06:00
Brian Paul	0a219dd918	svga: add support for sample id, sample position Sample ID is just a system value. Sample position must be implemented with the VGPU10_OPCODE_SAMPLE_POS instruction. Reviewed-by: Charmaine Lee <charmainel@vmware.com>	2018-09-10 13:07:30 -06:00
Brian Paul	ac4a0c0e82	svga: implement no-op svga_set_min_samples() This is part of the per-sample shading feature (PIPE_CAP_SAMPLE_SHADING). Reviewed-by: Charmaine Lee <charmainel@vmware.com>	2018-09-10 13:07:30 -06:00
Charmaine Lee	3c3fc7154e	svga: add support for independent blend function per render target This patch adds support for GL_ARB_draw_buffers_blend extension for SM4_1 device. Fixes piglit test fbo-draw-buffers-blend. This patch is squashed with a subsequent patch which fixed a regression. Reviewed-by: Brian Paul <brianp@vmware.com> Reviewed-by: Neha Bhende <bhenden@vmware.com> Reviewed-by: Charmaine Lee <charmainel@vmware.com>	2018-09-10 13:07:30 -06:00
Brian Paul	5512f943b8	svga: emit shader version as 4.0 or 4.1 depending on device support Reviewed-by: Charmaine Lee <charmainel@vmware.com>	2018-09-10 13:07:30 -06:00
Brian Paul	1d806b6f13	svga: restructure nested if's in emit_src_register() To make it cleaner for subsequent changes. Reviewed-by: Charmaine Lee <charmainel@vmware.com>	2018-09-10 13:07:30 -06:00
Brian Paul	16439085f5	svga: sync VGPU10ShaderTokens.h with upstream changes This includes new DX 10.1 opcodes and tokens. Reviewed-by: Charmaine Lee <charmainel@vmware.com>	2018-09-10 13:07:30 -06:00
Charmaine Lee	22e8099711	svga: add support for shadow cubemap array Reviewed-by: Neha Bhende <bhenden@vmware.com> Reviewed-by: Brian Paul <brianp@vmware.com>	2018-09-10 13:07:30 -06:00
Charmaine Lee	f929247d24	svga: add support for rendering to cubemap array Fixes piglit test arb_texture_cube_map_array-fbo-cubemap-array Reviewed-by: Neha Bhende <bhenden@vmware.com> Reviewed-by: Brian Paul <brianp@vmware.com>	2018-09-10 13:07:30 -06:00
Charmaine Lee	1df17fc697	svga: add support for TXL2 opcode This patch adds support for cubemap array texture lookup with explicit LOD. Fixes piglit test arb_texture_cube_map_array-cubemap-lod Reviewed-by: Neha Bhende <bhenden@vmware.com> Reviewed-by: Brian Paul <brianp@vmware.com>	2018-09-10 13:07:30 -06:00
Charmaine Lee	62402be407	svga: add support for cubemap array This patch adds support for cubemap array for SM4_1. Fixes piglit test arb_texture_cube_map_array-cubemap Reviewed-by: Neha Bhende <bhenden@vmware.com> Reviewed-by: Brian Paul <brianp@vmware.com>	2018-09-10 13:07:30 -06:00
Brian Paul	018ff0112f	svga: add have_sm4_1 flag, helper function Signed-off-by: Brian Paul <brianp@vmware.com>	2018-09-10 13:07:30 -06:00
Marek Olšák	d211679017	gallium/u_inlines: remove the destroy variable in pipe_reference_described Reviewed-by: Michel Dänzer <michel.daenzer@amd.com> Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>	2018-09-10 14:53:01 -04:00
Marek Olšák	ed880fe192	gallium/u_inlines: improve pipe_reference_described perf for debug builds Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2018-09-10 14:53:01 -04:00
Marek Olšák	c042a34b14	gallium/auxiliary: don't dereference counters twice needlessly Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>	2018-09-10 14:52:32 -04:00
Marek Olšák	61767c059e	gallium/u_inlines: normalize naming, use dst & src, style fixes (v2) v2: update comments Reviewed-by: Michel Dänzer <michel.daenzer@amd.com> Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>	2018-09-10 14:52:32 -04:00
Marek Olšák	9f1bbbdbbd	util: try to fix the Android and MacOS build Bionic does not have pthread_setaffinity_np. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107869 Reviewed-by: Jose Fonseca <jfonseca@vmware.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2018-09-10 14:49:07 -04:00
Jason Ekstrand	6f00785765	anv: Support v3 of VK_EXT_vertex_attribute_divisor Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-09-10 13:45:32 -05:00
Jason Ekstrand	34a17a48d4	vulkan: Update the XML and headers to 1.1.84 Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-09-10 13:30:21 -05:00
Sergii Romantsov	bbe551f3ea	mesa/meson: 32bit xmlconfig linkage Building of 32bit mesa with meson causes linkage issue: "undefined reference to `util_get_process_name'" Fixed by adding link-with mesa_util for xmlconfig primary. v2: Removed '[]', commit message corrected. v3: Reverted changes in gbm and glx libraries. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107843 Fixes: `2e1e6511f7` "util: extract get_process_name from xmlconfig.c" Cc: Marek Olšák <marek.olsak@amd.com> Cc: Dylan Baker <dylan@pnwbakers.com> Signed-off-by: Sergii Romantsov <sergii.romantsov@globallogic.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2018-09-10 08:57:42 -07:00
Jose Fonseca	52ca32121b	Require Visual Studio 2015. We no longer need or use Visual Studio 2013. https://ci.appveyor.com/project/jrfonseca/mesa/build/52 Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2018-09-10 10:10:16 +01:00
Jose Fonseca	d5f934522d	util: Make util_context_thread_changed a no-op on Windows. Despite using thrd_t types, these functions are wed to pthreads, and break Windows builds, because thrd_current() is not implemented there, as it's impossible to have an efficient thrd_current() implementation on Windows. Trivial.	2018-09-10 10:10:16 +01:00
Erik Faye-Lund	c4017106bb	virgl: do not map zero-sized resource When creating textures, we avoid creating backing-store for all multisampled textures, not just depth buffers. So we can't try to map them later. That's just going to fail. So let's take the blit-based code-path that seems to avoid this problem. This makes this piglit test-case no longer crash (although it still fails): bin/copyteximage 2D -samples=2 -auto Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Dave Airlie <airlied@redhat.com>	2018-09-10 10:35:42 +02:00
Erik Faye-Lund	8083464013	virgl: remove dead code We don't use the size we calculate in this function, so let's just drop the calculation Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Dave Airlie <airlied@redhat.com>	2018-09-10 10:35:32 +02:00
Erik Faye-Lund	b9c40e492d	virgl: drop needless return-code We always return TRUE, and we never check the return-value. Let's just drop the return value instead. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Dave Airlie <airlied@redhat.com>	2018-09-10 10:35:20 +02:00
Erik Faye-Lund	9635869d73	virgl: free trans on map-error When we fail to map memory, we should also free trans to avoid leaking memory. Noticed while reading code. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Dave Airlie <airlied@redhat.com>	2018-09-10 10:35:02 +02:00
Chris Wilson	44e3e6a9b4	i965: Bump aperture tracking to u64 As a prelude to handling large address spaces, first allow ourselves the luxury of handling the full 4G. Reported-by: Andrey Simiklit <asimiklit.work@gmail.com> Cc: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-09-10 09:14:46 +01:00
Mathias Fröhlich	2fece204c0	etnaviv: Reduce max offset to available hardware bits. Reviewed-by: Brian Paul <brianp@vmware.com> Reviewed-by: Roland Scheidegger <sroland@vmware.com> Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>	2018-09-10 07:59:31 +02:00
Mathias Fröhlich	4569bc6ad0	gallium: New cap PIPE_CAP_MAX_VERTEX_ELEMENT_SRC_OFFSET. Introduce a new capability for the maximum value of pipe_vertex_element::src_offset. Initially just every driver backend returns the value previously set from _mesa_init_constants. So this shall end up in no functional change. Reviewed-by: Brian Paul <brianp@vmware.com> Reviewed-by: Roland Scheidegger <sroland@vmware.com> Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>	2018-09-10 07:59:31 +02:00
Dave Airlie	240af61494	virgl: don't send a shader create with no data. (v2) This fixes the situation where we'd send a shader with just the header and no data. piglit/glsl-max-varyings test was causing this to happen, and the renderer fix was breaking it. v2: drop fprintf Fixes: `a8987b88ff` "virgl: add driver for virtio-gpu 3D (v2)" Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>	2018-09-10 12:23:30 +10:00
Timothy Arceri	14fe9fa11b	mesa: enable ARB_vertex_buffer_object in core profile This extension is required by "Wolfenstein: The Old Blood" and is exposed in core in the Nvidia binary driver. All the functions are just alias of the core functions so there should be nothing more to do. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2018-09-08 14:35:09 +10:00
Marek Olšák	21ca322e63	st/mesa: throttle texture uploads if their memory usage goes beyond a limit This prevents radeonsi from running out of memory. It also increases texture upload performance by being nice to the kernel memory manager.	2018-09-07 17:59:02 -04:00
Marek Olšák	9ce2cef68f	gallium: add PIPE_CAP_MAX_TEXTURE_UPLOAD_MEMORY_BUDGET	2018-09-07 17:59:02 -04:00
Andres Gomez	ecfe41e690	docs: update calendar, add news item and link release notes for 18.2.0 Signed-off-by: Andres Gomez <agomez@igalia.com>	2018-09-08 00:40:43 +03:00
Andres Gomez	5382a90cb2	docs: add sha256 checksums for 18.2.0 Signed-off-by: Andres Gomez <agomez@igalia.com> (cherry picked from commit `cb1ddf48e2`)	2018-09-08 00:28:23 +03:00
Andres Gomez	65f3327db6	docs: update 18.2.0 release notes Signed-off-by: Andres Gomez <agomez@igalia.com> (cherry picked from commit `7378180e7a`)	2018-09-08 00:28:21 +03:00
Marek Olšák	7ac52c2e38	Revert "gallium/os_thread: simplify helper pipe_current_thread_get_time_nano" This reverts commit `6d477bc546`. It fixes the Windows build hopefully.	2018-09-07 16:52:36 -04:00
Jason Ekstrand	465e5a868c	anv: Clamp scissors to the framebuffer boundary The Vulkan 1.1.81 spec says: "It is legal for offset.x + extent.width or offset.y + extent.height to exceed the dimensions of the framebuffer - the scissor test still applies as defined above. Rasterization does not produce fragments outside of the framebuffer, so such fragments never have the scissor test performed on them." Elsewhere, the Vulkan 1.1.81 spec says: "The application must ensure (using scissor if necessary) that all rendering is contained within the render area, otherwise the pixels outside of the render area become undefined and shader side effects may occur for fragments outside the render area. The render area must be contained within the framebuffer dimensions." Unfortunately, there's some room for interpretation here as to what the consequences are of having the render area set to exactly the framebuffer dimensions and having a scissor that is larger than the framebuffer. Given that GL and other APIs provide automatic clipping to the framebuffer, it makes sense that applications would assume that Vulkan does this as well. It costs us very little to play it safe and just clamp client-provided scissors to the framebuffer dimensions. Fortunately, the user is required to provide us with at least one scissor so we don't need to handle the case where they don't. Fixes: `fb2a5ceb32` "anv: Emit DRAWING_RECTANGLE once at driver..." Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-09-07 15:19:02 -05:00
Jason Ekstrand	b08b4b2b25	anv: Disable the vertex cache when tessellating on SKL GT4 I have no idea if I'm correct about what's going wrong or if this is the correct fix. However, in my multiple weeks of banging my head on this hang, a VUE reference counting bug seems to match all the symptoms and it definitely fixes the hang. Cc: mesa-stable@lists.freedesktop.org Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107280 Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-09-07 15:19:02 -05:00
Jason Ekstrand	5dee89438a	anv: Implement a VF cache invalidate workaround Known to fix nothing whatsoever but it's in the docs. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-09-07 15:19:02 -05:00
Jason Ekstrand	c643c5e18d	anv: Re-emit vertex buffers when the pipeline changes Some of the bits of VERTEX_BUFFER_STATE such as access type, instance data step rate, and pitch come from the pipeline. Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-09-07 15:19:02 -05:00
Marek Olšák	25ffb84016	radeonsi: pin the winsys thread to the requested L3 cache (v2) v2: rebase Reviewed-by: Brian Paul <brianp@vmware.com>	2018-09-07 16:03:36 -04:00
Marek Olšák	8016639f63	gallium/u_threaded: implement set_context_param for thread pinning (v2) v2: - use set_context_param - set set_context_param even if the driver doesn't implement it Reviewed-by: Brian Paul <brianp@vmware.com>	2018-09-07 16:03:36 -04:00
Marek Olšák	8d473f555a	st/mesa: pin driver threads to a specific L3 cache on AMD Zen (v2) v2: use set_context_param Reviewed-by: Brian Paul <brianp@vmware.com>	2018-09-07 16:03:30 -04:00
Marek Olšák	e5e3b5cdcc	gallium: add pipe_context::set_context_param for tuning perf on AMD Zen (v2) State trackers will not use the new param directly, but will instead use a helper in MakeCurrent that does the right thing. v2: rework the interface Reviewed-by: Brian Paul <brianp@vmware.com>	2018-09-07 15:48:31 -04:00
Marek Olšák	6d477bc546	gallium/os_thread: simplify helper pipe_current_thread_get_time_nano Reviewed-by: Brian Paul <brianp@vmware.com>	2018-09-07 15:48:31 -04:00
Marek Olšák	15fa2c5e35	gallium/u_cpu_detect: get the number of cores per L3 cache for AMD Zen Reviewed-by: Brian Paul <brianp@vmware.com>	2018-09-07 15:48:31 -04:00
Marek Olšák	ce432e259d	gallium/u_cpu_detect: fix parsing the CPU family According to: https://support.amd.com/TechDocs/54945_PPR_Family_17h_Models_00h-0Fh.pdf Also Intel: https://www.microbe.cz/docs/CPUID.pdf Reviewed-by: Brian Paul <brianp@vmware.com>	2018-09-07 15:48:31 -04:00
Marek Olšák	a84fd58f48	gallium/u_cpu_detect: fix a race condition on initialization Reviewed-by: Brian Paul <brianp@vmware.com>	2018-09-07 15:48:31 -04:00
Dylan Baker	8396043f30	Replace uses of _mesa_bitcount with util_bitcount and _mesa_bitcount_64 with util_bitcount_64. This fixes a build problem in nir for platforms that don't have popcount or popcountll, such as 32bit msvc. v2: - Fix additional uses of _mesa_bitcount added after this was originally written Acked-by: Eric Engestrom <eric.engestrom@intel.com> (v1) Acked-by: Eric Anholt <eric@anholt.net> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2018-09-07 10:21:26 -07:00
Dylan Baker	80825abb5d	move u_math to src/util Currently we have two sets of functions for bit counts, one in gallium and one in core mesa. The ones in core mesa are header only in many cases, since they reduce to "#define _mesa_bitcount popcount", but they provide a fallback implementation. This is important because 32bit msvc doesn't have popcountll, just popcount; so when nir (for example) includes the core mesa header it doesn't (and shouldn't) link with core mesa. To fix this we'll promote the version out of gallium util, then replace the core mesa uses with the util version, since nir (and other non-core mesa users) can and do link with mesautils. Acked-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2018-09-07 10:21:26 -07:00
Dylan Baker	aa4386ebfe	docs: update calendar, add news item and link release notes for X.Y.Z Signed-off-by: Dylan Baker <dylan@pnwbakers.com>	2018-09-07 10:19:33 -07:00
Dylan Baker	d514f55611	docs/relnotes: Add sha256 sums for mesa 18.1.8	2018-09-07 10:17:38 -07:00
Dylan Baker	f6a9f44529	docs: Add release notes for 18.1.8	2018-09-07 10:17:36 -07:00
Jason Ekstrand	f9e630e23d	i965: Workaround the gen9 hw astc5x5 sampler bug gen9 hardware has a bug in the sampler cache that can cause GPU hangs whenever a texture with aux compression enabled is in the sampler cache together with an ASTC5x5 texture. Because we can't control what the client binds at any given time, we have two options: resolve the CCS or decompress the ASTC. Doing a CCS or HiZ resolve is far less drastic and will likely have a smaller performance impact. Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com> Tested-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>	2018-09-07 10:42:40 -05:00
Eric Anholt	a91b158bd9	v3d: Fix setup of the VCM cache size. There were two bugs working together to make things mostly work: I wasn't dividing the VPM output size available by the size of a batch (vertex), but I also had the size of the VPM reduced by a factor of 8. Fixes dEQP-GLES3.functional.vertex_array_objects.all_attributes and it seems also my intermittent varying failures. Fixes: `1561e4984e` ("v3d: Emit the VCM_CACHE_SIZE packet.")	2018-09-07 08:11:38 -07:00
Eric Anholt	f73f748323	v3d: Fix SRC_ALPHA_SATURATE blending for RTs without alpha. Fixes dEQP-GLES3.functional.fragment_ops.blend.default_framebuffer.rgb_func_alpha_func.dst.src_alpha_saturate_src_alpha_saturate and friends with --deqp-egl-config-name=rgb565d0s0 Cc: "18.2" <mesa-stable@lists.freedesktop.org>	2018-09-07 08:11:05 -07:00
Lionel Landwerlin	69874e9a6a	intel/genxml: turn SLM Enable bit into boolean Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-09-07 14:46:20 +01:00
Sergii Romantsov	97fcccb25e	i965/tools: 32bit compilation with meson Building of 32bit mesa with meson causes issue: "implicit declaration of function ‘__builtin_ia32_clflush’". Fixed by adding msse2 compilation flag. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107843 Fixes: `314879f7fe` (i965: Fix asynchronous mappings on !LLC platforms.) Signed-off-by: Sergii Romantsov <sergii.romantsov@globallogic.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-09-07 13:46:48 +01:00
Sergii Romantsov	d709f12792	intel: compiler option msse2 and mstackrealign It seems that for a 32-bit library, using msse2 causes stack corruption or incorrect instructions. Using it together with mstackrealign fixes that case. v2: Fixed meson. v3: Definition of c_sse2_args moved on the top (L.Landwerlin). Added mstackrealign for Android's mks where msse4.1 is used. v4: Added for Vulkan also. v5: Commit message correction. CC: <mesa-stable@lists.freedesktop.org> Fixes: `6b05c080f2` (i965: Compile with -msse3) Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107779 Signed-off-by: Sergii Romantsov <sergii.romantsov@globallogic.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-09-07 13:45:46 +01:00
Rob Clark	5404e0637f	freedreno: fix rast->depth_clip_near/far Fixes: `daa19363de` gallium: split depth_clip into depth_clip_near & depth_clip_far Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-09-07 07:41:43 -04:00
Marek Olšák	fda7683726	gallium: enable GL_AMD_depth_clamp_separate on r600, radeonsi	2018-09-06 21:53:00 -04:00
Marek Olšák	daa19363de	gallium: split depth_clip into depth_clip_near & depth_clip_far for AMD_depth_clamp_separate.	2018-09-06 21:53:00 -04:00
Jason Ekstrand	7b26741806	anv/pipeline: Only consider double elements which actually exist The brw_vs_prog_data::double_inputs_read field comes directly from shader_info::double_inputs which may contain inputs which are not actually read. Instead of using it directly, AND it with inputs_read which is only things which are read. Otherwise, we may end up subtracting too many elements when computing elem_count. Cc: mesa-stable@lists.freedesktop.org Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103241 Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-09-06 16:07:50 -05:00
Jason Ekstrand	44ec31cd75	nir: Drop the vs_inputs_dual_locations option It was very inconsistently handled; the only things that made use of it were glsl_to_nir, glspirv, and nir_gather_info. In particular, nir_lower_io completely ignored it so anyone using nir_lower_io on 64-bit vertex attributes was going to be in for a shock. Also, as of the previous commit, it's set by every driver that supports 64-bit vertex attributes. There's no longer any reason to have it be an option so let's just delete it. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-09-06 16:07:50 -05:00
Jason Ekstrand	0909a57b63	radeonsi/nir: Set vs_inputs_dual_locations and let NIR do the remap We were going out of our way to disable dual-location re-mapping in NIR only to then do the remapping in st_glsl_to_nir.cpp. Presumably, this was so that double_inputs would be correct for the core state tracker. However, now that we've moved it to gl_program::DualSlotInputs which is unaffected by NIR lowering, we can let NIR lower things for us. The one tricky bit here is that we have to remap the inputs_read bitfield back to the single-slot convention for the gallium state tracker to use. Since radeonsi is the only NIR-capable gallium driver that also supports GL_ARB_vertex_attrib_64bit, we only have to worry about radeonsi when making core gallium state tracker changes. Acked-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-09-06 16:07:50 -05:00
Jason Ekstrand	25efd787cf	compiler: Move double_inputs to gl_program::DualSlotInputs Previously, we had two fields in shader_info: double_inputs_read and double_inputs. Presumably, the one was for all double inputs that are read and the other is all that exist. However, because nir_gather_info regenerates these two values, there is a possibility, if a variable gets deleted, that the value of double_inputs could change over time. This is a problem because double_inputs is used to remap the input locations to a two-slot-per-dvec3/4 scheme for i965. If that mapping were to change between glsl_to_nir and back-end state setup, we would fall over when trying to map the NIR outputs back onto the GL location space. This commit changes the way slot re-mapping works. Instead of the double_inputs field in shader_info, it adds a DualSlotInputs bitfield to gl_program. By having it in gl_program, we more easily guarantee that NIR passes won't touch it after it's been set. It also makes more sense to put it in a GL data structure since it's really a mapping from GL slots to back-end and/or NIR slots and not really a NIR shader thing. Tested-by: Alejandro Piñeiro <apinheiro@igalia.com> (ARB_gl_spirv tests) Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-09-06 16:07:50 -05:00
Marek Olšák	1285f71d3e	gallium: add PIPE_CAP_RASTERIZER_SUBPIXEL_BITS Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2018-09-06 16:07:40 -04:00
Eric Engestrom	3824c8e7cd	meson: disable asserts by default on release builds By the time Mesa 18.3 comes out (probably December '18), Meson 0.45 will be 9 months old (March '18), so I think this is reasonable. (btw, the currently-required Meson 0.44.1 was released less than 12 days before 0.45, so we're really not bumping by much.) Currently, the Meson versions in the major distributions are: Arch: ships 0.47.2 CentOS: 7 ships 0.47.1 Debian: stable ships 0.37.1, so it hasn't been usable in a long time. everything more recent ships 0.47.2 Fedora: 28 ships 0.45.1 FreeBSD: ships 0.46.1 (ports) Gentoo: ships 0.46.1 OpenSUSE: 15 ships 0.46 Ubuntu: 18.04 ships 0.45.1 Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2018-09-06 18:16:31 +01:00
Andrii Simiklit	2930b76cfe	mesa/util: add missing va_end() after va_copy() MSDN: "va_end must be called on each argument list that's initialized with va_start or va_copy before the function returns." Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107810 Fixes: `c6267ebd6c` "gallium/util: Stop bundling our snprintf implementation." Signed-off-by: Andrii Simiklit <andrii.simiklit@globallogic.com>	2018-09-06 17:33:27 +01:00
Andrii Simiklit	65cfe698b0	mesa/util: don't ignore NULL returned from 'malloc' We should exit from the function 'util_vasprintf' with error code -1 for case where 'malloc' returns NULL Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> Fixes: `864148d69e` "util: add util_vasprintf() for Windows (v2)" Signed-off-by: Andrii Simiklit <andrii.simiklit@globallogic.com>	2018-09-06 17:33:27 +01:00
Andrii Simiklit	570cacba7a	mesa/util: don't use the same 'va_list' instance twice The first usage of the 'va_list' instance could change it. Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> Fixes: `864148d69e` "util: add util_vasprintf() for Windows (v2)" Signed-off-by: Andrii Simiklit <andrii.simiklit@globallogic.com>	2018-09-06 17:33:27 +01:00
Andrii Simiklit	267ed29288	apple/glx/log: added missing va_end() after va_copy() Each invocation of va_copy() must be matched by a corresponding invocation of va_end() Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> Fixes: `51691f0767` "darwin: Use ASL for logging" Signed-off-by: Andrii Simiklit <andrii.simiklit@globallogic.com>	2018-09-06 17:33:27 +01:00
Eric Engestrom	6daba55aa1	meson: drop unnecessary llvm version hacks The current minimum meson version supported is 0.44.1, so we have met both the 0.43 and 0.44 requirement to not need these hacks anymore :) Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2018-09-06 17:16:58 +01:00
Danylo Piliaiev	2b98a023d9	mesa: add missing return statement for GL_RG_SNORM case Fixes: `0d356cf478` "mesa: enable EXT_render_snorm extension" Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2018-09-06 17:24:53 +03:00
Eric Engestrom	e67dadd3a9	meson: consolidate langs lists Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2018-09-06 15:22:24 +01:00
Eric Engestrom	07ff56791d	intel/compiler: remove unused get_image_base_type() Unused since `09f1de97a7` "anv,i965: Lower away image derefs in the driver". Cc: Jason Ekstrand <jason.ekstrand@intel.com> Signed-off-by: Eric Engestrom <eric@engestrom.ch> Acked-by: Jason Ekstrand <jason@jlekstrand.net>	2018-09-06 15:22:24 +01:00
Mathias Fröhlich	a6232b6932	tnl: Fix green gun regression in xonotic. Fix another regression of mesa: Make gl_vertex_array contain pointers to first order VAO members. The regression showed up with drivers using the tnl module and was reproducible using xonotic-glx -benchmark demos/the-big-keybench.dem. Fixes: `64d2a20480` mesa: Make gl_vertex_array contain pointers to first order VAO members. Tested-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>	2018-09-06 14:35:12 +02:00
Lionel Landwerlin	2dce1175c1	Revert "i965/tools: 32bit compilation with meson" This reverts commit `4aec44c0d9`. Unfortunately this patch needed another one to be committed first.	2018-09-06 12:25:07 +01:00
Sergii Romantsov	4aec44c0d9	i965/tools: 32bit compilation with meson Building of 32bit mesa with meson causes issue: "implicit declaration of function ‘__builtin_ia32_clflush’". Fixed by adding msse2 compilation flag. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107843 Fixes: `314879f7fe` (i965: Fix asynchronous mappings on !LLC platforms.) Signed-off-by: Sergii Romantsov <sergii.romantsov@globallogic.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-09-06 11:55:57 +01:00
Timothy Arceri	b9fe8ff23d	glsl: fix lexer for unreachable defines If we have something like: #ifdef NOT_DEFINED #define A_MACRO(x) \ if (x) #endif The # on the #define is not skipped but the define itself is so this then gets recognised as #if. Until `28a3731e3f` this didn't happen because we ended up in <HASH>{NONSPACE} where BEGIN INITIAL was called stopping the problem from happening. This change makes sure we never call RETURN_TOKEN_NEVER_SKIP for if/else/endif when processing a define. Cc: Ian Romanick <idr@freedesktop.org> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107772 Tested-By: Eero Tamminen <eero.t.tamminen@intel.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2018-09-06 10:13:21 +10:00
Hyunjun Ko	2454742a84	freedreno/ir3: insert mov if same instruction in the outputs. For example, result0 = texture(sampler[indexBase + 5], coords); result1 = texture(sampler[indexBase + 0], coords); result2 = texture(sampler[indexBase + 0], coords); out_result0 = result0; out_result1 = result1; out_result2 = result2; In this kind of case we need to insert an extra mov to the outputs so that the result could be assigned to each register respectively. Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-09-05 13:38:43 -04:00
Hyunjun Ko	b4da2f6667	freedreno/ir3: make immediates array dynamic Since most shaders wouldn't need that large array of immediates, making the array dynamic could save unnecessary spaces. In addition, sometimes we can potentially have a much larger array of immediates to be lowered, which might be more than 64. Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-09-05 13:38:43 -04:00
Rob Clark	c3d9f29b78	freedreno: allocate ctx's batch on demand Don't fall over when app wants more than 32 contexts. Instead allocate contexts on demand. Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-09-05 13:38:43 -04:00
Rob Clark	a122118c14	freedreno: add fd_context_batch() accessor For cases in which (after the following commit) ctx->batch may be null. Prep work for following commit. Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-09-05 13:38:43 -04:00
Rob Clark	a45e1802db	freedreno/a6xx: fix mem2gmem for zsbuf Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-09-05 13:38:43 -04:00
Rob Clark	c77e0948c7	freedreno/batch: fix crash in !reorder case We aren't using the batch-cache if reorder==false. Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-09-05 13:38:43 -04:00
Rob Clark	2c623e7071	freedreno/ir3: better compile_error() printing Try to show the error at the appropriate line of nir Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-09-05 13:38:43 -04:00
Rob Clark	ca758251ba	freedreno/a6xx: bordercolor fixes Port fixes from a5xx (`f0715442`) TODO maybe this should move to shared code, since it seems to be the same. Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-09-05 13:38:43 -04:00
Rob Clark	73378013d7	freedreno: fix context teardown harder The border_color_uploaders need to be torn down before the transfer_pool is destroyed. Fixes: `e11e9d6394` freedreno: fix context teardown race Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-09-05 13:38:43 -04:00
Rob Clark	1a24f51966	freedreno/ir3: ignore unused inputs We could end up w/ inputs larger than vec4, simply because unused inputs are not split. Fixes things like dEQP-GLES31.functional.separate_shader.random.77 (and probably a handful of others) Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-09-05 13:38:43 -04:00
Rob Clark	6b4397feab	freedreno/a6xx: fix debug build crash Porting `0c8d9e923a` to a6xx. Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-09-05 13:38:43 -04:00
Dylan Baker	d25a27ec56	meson: Print a message about why a libdrm version was selected We require a single version of libdrm for all of our libdrm dependencies (core and driver), but the way this is structured can make the error message less than helpful, as one driver might be the one setting the libdrm requirement, while another might be the one that generates the version failure. This adds a simple message to the output announcing which libdrm module set the version, which might be more helpful. v2: - Use message suggested by Eric Engestrom Fixes: `c445b1d56f` ("meson: Use the same version for all libdrm checks") Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2018-09-05 10:32:51 -07:00
Charmaine Lee	af104ad799	svga: rename face to layer_face Reviewed-by: Neha Bhende <bhenden@vmware.com> Reviewed-by: Brian Paul <brianp@vmware.com>	2018-09-05 11:22:42 -06:00
Brian Paul	e334e104d0	svga: encode sample count in resource declarations No regressions before the corresponding host-side change. Reviewed-by: Neha Bhende <bhenden@vmware.com>	2018-09-05 11:22:42 -06:00
Charmaine Lee	49678e9e49	svga: sync with upstream changes to surface flags SVGA device now supports 64 bits surface flags. This patch updates the winsys interface to allow 64 bits surface flags. The linux winsys layer will for now only honor the lower 32 bits of the surface flags. Reviewed-by: Brian Paul <brianp@vmware.com>	2018-09-05 11:22:42 -06:00
Neha Bhende	4310649ccb	svga: avoid try_blit() for some depth formats on non vgpu10. On non vgpu10, driver doesn't support util_blitter_blit for SVGA3D_Z_D16, SVGA3D_Z_D24x8, SVGA3D_Z_D24S8. Patch fixes following piglit tests regression on hwv8 caused by commit 27bf35caea5e: spec@arb_depth_texture@fbo-depth-gl-depth-component16-blit spec@arb_depth_texture@fbo-depth-gl-depth-component24-blit spec@arb_depth_texture@fbo-depth-gl-depth-component32-blit Tested with mtt-piglit on hw 8,9,10,11,13 and mtt-glretrace on windows and linux. Reviewed-by: Charmaine Lee <charmainel@vmware.com>	2018-09-05 11:22:42 -06:00
Neha Bhende	53091a0312	svga: convert dst format to linear when blending is enabled. When blending is enabled, framebuffer colorspace has to be linear. Previously, we never hit this case because we were not supporting sRGB drawable. Previous patch added that support. Tested with mtt glretrace, viewperf, piglit, conform. Reviewed-by: Brian Paul <brianp@vmware.com> Reviewed-by: Charmaine Lee <charmainel@vmware.com>	2018-09-05 11:22:42 -06:00
Neha Bhende	dfab1289e8	winsys/svga: Avoid cap2 code path for now CAP2 functionality is not yet part of vmwgfx. This is causing unnecessary dmesg error messages. Reviewed-by: Charmaine Lee <charmainel@vmware.com>	2018-09-05 11:22:42 -06:00
Neha Bhende	8449c33a27	svga: start using SVGA3dCmdIntraSurfaceCopy command for svga_blit. Basically, the SVGA3dCmdIntraSurfaceCopy command allows copying when source and destination are the same. Tested with MTT piglit, glretrace, viewperf, conform v2: changes as per Charmaine's comment v3: changes as per Charmaine's comment Reviewed-by: Charmaine Lee <charmainel@vmware.com> Reviewed-by: Brian Paul <brianp@vmware.com>	2018-09-05 11:22:42 -06:00
Neha Bhende	4639ef3763	svga/winsys: Add cap2 support in winsys Reviewed-by: Charmaine Lee <charmainel@vmware.com> Reviewed-by: Brian Paul <brianp@vmware.com>	2018-09-05 11:22:42 -06:00
Neha Bhende	6b3627da08	svga: Add SVGA3dCmdIntraSurfaceCopy command support in OpenGL driver v2: changes as per Charmaine's comment Reviewed-by: Charmaine Lee <charmainel@vmware.com> Reviewed-by: Brian Paul <brianp@vmware.com>	2018-09-05 11:22:42 -06:00
Brian Paul	bac94dfefa	svga: update device header files from upstream This is a squash commit of several earlier patches. Signed-off-by: Brian Paul <brianp@vmware.com>	2018-09-05 11:22:42 -06:00
Charmaine Lee	f4f39fa5d9	winsys/drm: Fix assert when try to accumulate an invalid fd This patch makes sure there is a valid fd before merging it to the context's fd in vmw_svga_winsys_fence_server_sync(). This fixes the assert running webot. No regression running kmscube. Reviewed-by: Sinclair Yeh <syeh@vmware.com>	2018-09-05 11:22:42 -06:00
Eric Anholt	16f17e3a3c	loader: Drop unused argument from dri3_update_drawable(). The argument has never been used since the function was added. Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>	2018-09-05 10:11:27 -07:00
Alejandro Piñeiro	4e1f8d82c2	i965/fs: include multisamplers on image_intrinsic_coord_components This is the second patch needed to fix the following piglit tests: tests/spec/arb_gl_spirv/linker/uniform/multisampler.shader_test tests/spec/arb_gl_spirv/linker/uniform/multisampler-array.shader_test Although in this case it doesn't affect so many borrowed tests, as there aren't too many tests using multisamplers on Intel. It is worth to note that this patch is also needed when those tests are run on GLSL mode (using the --glsl option). Although most Intel drivers would not be able to run/execute tests using multisamplers, as GL_MAX_IMAGE_SAMPLES is zero, technically those tests are expected to link correctly, so linking tests should pass. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-09-05 17:02:28 +02:00
Alejandro Piñeiro	8969777686	i965: move brw_nir_lower_gl_images call At this moment that lowering is using info coming from the UniformStorage, so for the ARB_gl_spirv codepath, it needs to be done after calling gl_nir_link_uniforms. As for the GLSL codepath it can also be called later, we just move the call on both cases, to avoid adding several shader->spirv_data checks, and keep the patch as small as possible. This is the first patch needed to fix the following piglit tests: tests/spec/arb_gl_spirv/linker/uniform/multisampler.shader_test tests/spec/arb_gl_spirv/linker/uniform/multisampler-array.shader_test but fixes thousands of tests when borrowing the tests from other specs (that needs to be done manually right now). Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-09-05 17:02:28 +02:00
Alejandro Piñeiro	2a6182fe06	intel/compiler: rename brw_nir_lower_glsl_images To brw_nir_lower_gl_images, as it will be also used on the ARB_gl_spirv codepath, that doesn't involves GLSL at all. So the lowering is about images following the OpenGL semantics. In any case "brw_nir_lower_opengl_images" seemed too long to me, so I just used gl. That shortening is already used on other parts of the code. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-09-05 17:02:28 +02:00
Alejandro Piñeiro	960f6459be	intel/compiler: remove unused variable num_images Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-09-05 17:02:28 +02:00
Gert Wollny	218ff0d510	winsys/virgl/vtest: Correct off-by-one error in resource allocation The resource bo array must already be extended when the target index is equal to the current size of the array. Signed-off-by: Gert Wollny <gert.wollny@collabora.com>	2018-09-05 13:54:01 +02:00
Gert Wollny	5341260f62	winsys/virgl: Initialize value to silence valgrind Silences: Conditional jump or move depends on uninitialised value(s) at 0xB72F2C0: virgl_drm_winsys_create (virgl_drm_winsys.c:854) by 0xB72F2C0: virgl_drm_screen_create (virgl_drm_winsys.c:926) by 0xB21C885: pipe_virgl_create_screen (drm_helper.h:275) by 0xB7201F0: pipe_loader_create_screen (pipe_loader.c:137) by 0xB639C91: dri2_init_screen (dri2.c:2112) by 0xB634F68: driCreateNewScreen2 (dri_util.c:153) by 0x63023E6: dri3_create_screen (dri3_glx.c:893) by 0x62D35BD: AllocAndFetchScreenConfigs (glxext.c:820) by 0x62D35BD: __glXInitialize (glxext.c:946) by 0x62CECB3: GetGLXPrivScreenConfig (glxcmds.c:174) by 0x62CF69C: glXQueryExtensionsString (glxcmds.c:1304) by 0x60AA7D9: ??? (in /usr/lib/x86_64-linux-gnu/libwaffle-1.so.0.5.2) by 0x4F81450: wfl_checked_display_connect (piglit-util-waffle.h:74) by 0x4F829E0: piglit_wfl_framework_init (piglit_wfl_framework.c:627) Signed-off-by: Gert Wollny <gert.wollny@collabora.com>	2018-09-05 13:54:01 +02:00
Gert Wollny	9b0e8d8723	winsys/virgl: correct resource and handle allocation (v2) Fixes crash with piglit/bin/map_buffer_range-invalidate CopyBufferSubData \ increment-offset -auto -fbo * Resize the resource storage already when the count is equal to the allocated size, fixes: Invalid write of size 8 at 0xB72E4CF: virgl_drm_add_res (virgl_drm_winsys.c:629) by 0xB72E4CF: virgl_drm_emit_res (virgl_drm_winsys.c:663) by 0xB72A44A: virgl_encode_resource_copy_region (virgl_encode.c:776) by 0xB40CD12: st_copy_buffer_subdata (st_cb_bufferobjects.c:585) by 0xB244A3B: _mesa_CopyBufferSubData (bufferobj.c:2940) by 0x109A1E: upload (invalidate.c:169) by 0x109C2F: piglit_display (invalidate.c:215) by 0x4F80FBE: run_test (piglit_fbo_framework.c:52) by 0x4F66E5F: piglit_gl_test_run (piglit-framework-gl.c:229) by 0x10949D: main (invalidate.c:47) Address 0xbe07d30 is 0 bytes after a block of size 4,096 alloc'd at 0x4C31B25: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) by 0xB72DAAF: virgl_drm_cmd_buf_create (virgl_drm_winsys.c:567) * Also resize the space allocated for the handles, fixes: Invalid write of size 4 at 0xB72E4F0: virgl_drm_add_res (virgl_drm_winsys.c:631) by 0xB72E4F0: virgl_drm_emit_res (virgl_drm_winsys.c:663) by 0xB72A44A: virgl_encode_resource_copy_region (virgl_encode.c:776) by 0xB40CD12: st_copy_buffer_subdata (st_cb_bufferobjects.c:585) by 0xB244A3B: _mesa_CopyBufferSubData (bufferobj.c:2940) by 0x109A1E: upload (invalidate.c:169) by 0x109C2F: piglit_display (invalidate.c:215) by 0x4F80FBE: run_test (piglit_fbo_framework.c:52) by 0x4F66E5F: piglit_gl_test_run (piglit-framework-gl.c:229) by 0x10949D: main (invalidate.c:47) Address 0xbe08570 is 0 bytes after a block of size 2,048 alloc'd at 0x4C2FB0F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) by 0xB72DAC8: virgl_drm_cmd_buf_create (virgl_drm_winsys.c:572) Fixes: `4b15b5e803` ("virgl: resize resource bo allocation if we need to.") v2: - Use REALLOC macro and avoid memory leak when re-allocation fails - add Fixes tag (both Emil Velikov) - reorder commit message Signed-off-by: Gert Wollny <gert.wollny@collabora.com>	2018-09-05 13:54:01 +02:00
Tomeu Vizoso	f13de57edb	virgl: use hw-atomics instead of in-ssbo ones Emulating atomics on top of ssbos can lead to too small max SSBO count, so let's use the hw-atomics mechanism to expose atomic buffers instead. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>	2018-09-05 05:46:58 +01:00
Erik Faye-Lund	1bd927d997	virgl: update minor differences to upstream header virgl_protocol.h is considered to have its upstream in the virglrenderer repository, and somehow these minor differences have crept in. Let's sync with the upstream to avoid this. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>	2018-09-05 05:46:52 +01:00
Erik Faye-Lund	5a587d18d5	gallium: add PIPE_CAP_MAX_COMBINED_HW_ATOMIC_COUNTER{S,_BUFFERS} This moves the evergreen-specific max-sizes out as a driver-cap, so other drivers with less strict requirements also can use hw-atomics. Remove ssbo_atomic as it's no longer needed. We should now be able to use hw-atomics for some stages and not for other, if needed. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>	2018-09-05 05:46:46 +01:00
Erik Faye-Lund	d641d3f48b	gallium: add PIPE_CAP_MAX_COMBINED_SHADER_BUFFERS This gets rid of a r600 specific hack in the state-tracker, and prepares for other drivers to be able to use hw-atomics. While we're at it, clean up some indentation in the various drivers. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>	2018-09-05 05:46:37 +01:00
Erik Faye-Lund	84795f8c64	st/mesa: simplify MaxAtomicBufferSize-logic MaxAtomicCounters has already been assigned in the loop above in the ssbo_atomic = true case, so this will calculate the same value as the default. While we're at it, fixup indentation on the MaxAtomicBufferBindings assign. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>	2018-09-05 05:46:33 +01:00
Erik Faye-Lund	38f0c078de	st/mesa: clean up atomic vs ssbo code This makes the code a bit easier to follow; we first set up MaxShaderStorageBlocks, then we either set up a dedicated MaxAtomicBuffers, or we split MaxShaderStorageBlocks in two. While we're at it, also make the SSBO-splitting code tolerate the hypothetical case of having an odd number of SSBOs without incorrectly dropping the last SSBO. This has the nice result that the SSBOs and atomic buffers are dealt with almost completely orthogonally, easing some upcoming patches. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>	2018-09-05 05:46:27 +01:00
Erik Faye-Lund	a805e4e9de	st/mesa: use real bool for can_ubo We're doing full c99 now, so there's no point in using the old boolean type. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>	2018-09-05 05:46:09 +01:00
Marek Olšák	28e542dcdb	gallium/u_threaded: increase batch size to increase performance This reduces mutex overhead. radeonsi: +4.4% performance with piglit/drawoverhead, DrawElements, Ryzen X1700 iris_dri.so: +14% with piglit/drawoverhead, DrawArrays, i7 7700HQ. Acked-by: Kenneth Graunke <kenneth@whitecape.org>	2018-09-04 14:31:56 -04:00
Marek Olšák	ebd5806e0f	st/vdpau: silence an uninitialized-variable warning	2018-09-04 14:01:43 -04:00
Marek Olšák	725e8ad559	st/mesa: help fix stencil border color for GL_DEPTH_STENCIL textures GL_STENCIL_INDEX uses GL_INTENSITY for the border color, which is nicer to hardware that doesn't read the stencil border value from the X channel. This fixes a bunch of dEQP tests on Vega & Raven. Cc: 18.1 18.2 <mesa-stable@lists.freedesktop.org>	2018-09-04 14:01:43 -04:00
Ernestas Kulik	d49904085a	glsl_to_tgsi: Fix potential leak Reported by Coverity: arr_live_ranges is freed in a different branch than the one in which it was allocated. Signed-off-by: Ernestas Kulik <ernestas.kulik@gmail.com> Signed-off-by: Marek Olšák <marek.olsak@amd.com>	2018-09-04 14:01:43 -04:00
Ernestas Kulik	ea1e50cc16	u_vbuf: Fix leak Reported by Coverity: data is heap-allocated, but only freed in the info->index_size != 0 branch. Signed-off-by: Ernestas Kulik <ernestas.kulik@gmail.com> Signed-off-by: Marek Olšák <marek.olsak@amd.com> Cc: 18.2 <mesa-stable@lists.freedesktop.org>	2018-09-04 14:01:43 -04:00
Eric Anholt	2e59b88903	freedreno: Drop a bunch of duplicated gallium PIPE_CAP default code. Now that we have the util function for the default values, we can get rid of the boilerplate. v2: Rebase on new gallium caps Reviewed-by: Rob Clark <robdclark@gmail.com> (v1)	2018-09-04 08:08:22 -07:00
Eric Anholt	492b74b445	v3d: Drop a bunch of duplicated gallium PIPE_CAP default code. Now that we have the util function for the default values, we can get rid of the boilerplate. v2: Rebase on new gallium caps	2018-09-04 08:08:18 -07:00
Eric Anholt	c311e00000	vc4: Drop a bunch of duplicated gallium PIPE_CAP default code. Now that we have the util function for the default values, we can get rid of the boilerplate. v2: drop GLSL level in favor of defaults. v3: Rebase on new gallium caps	2018-09-04 08:08:10 -07:00
Eric Anholt	ad782a7020	gallium: Add a helper for implementing PIPE_CAP_* default values. One of the pains of implementing a gallium driver is filling in a million pipe caps you don't know about yet when you're just starting out. One of the pains of working on gallium is copy-and-pasting your new PIPE_CAP into each driver. We can fix both of these by having each driver call into the default helper from their default case, so that both sides can ignore each other until they need to. v2: fix i915g build, revert swr change to avoid breaking scons build (https://travis-ci.org/anholt/mesa/jobs/419739857) v3: Rebase on 3 new gallium caps. Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v1) Cc: Bruce Cherniak <bruce.cherniak@intel.com> Cc: George Kyriazis <george.kyriazis@intel.com> Cc: Kenneth Graunke <kenneth@whitecape.org>	2018-09-04 08:07:52 -07:00
Jason Ekstrand	67571ae796	intel/compiler: Remove redundant nir_remove_dead_variables call As of `07a2098a70`, brw_nir_optimize calls nir_remove_dead_variables as the last optimization. Doing it again is just pointless. Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2018-09-04 09:03:16 -05:00
Lionel Landwerlin	07a2098a70	intel: compiler: remove dead local variables at optimization pass We're hitting an assert in gfxbench because one of the local variable is a sampler (according to Jason this isn't valid) : testfw_app: ../src/compiler/nir_types.cpp:551: void glsl_get_natural_size_align_bytes(const glsl_type, unsigned int, unsigned int*): Assertion `!"type does not have a natural size"' failed. Since this particular variable isn't used, it can be eliminated by removing unused local variables at the end of the optimization loop. This makes sense also for valid local variables. v2: Move additional local variable removal out of optimization loop, but before large constant removal (Jason/Lionel) v3: Move the removal at the end of brw_nir_optimize() Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107806 Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-09-03 17:24:19 +01:00
Andrii Simiklit	095600dad6	intel/decoder: fix the possible out of bounds group_iter The "gen_group_get_length" function can return a negative value and it can lead to the out of bounds group_iter. v2: printing of "unknown command type" was added v3: just the asserts are added Signed-off-by: Andrii Simiklit <andrii.simiklit@globallogic.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-09-03 11:14:30 +01:00
Bas Nieuwenhuizen	233718a199	radv: Fix CMASK dimensions. Mirrors `1e40f69483` "ac/surface: fix CMASK fast clear for NPOT textures with mipmapping on SI/CI/VI" CC: <mesa-stable@lists.freedesktop.org> Reviewed-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2018-09-03 09:24:30 +02:00
Bas Nieuwenhuizen	ab64891f4c	radv: Use a lower max offchip buffer count. No clue what gets fixed by this but both radeonsi and amdvlk do it. CC: <mesa-stable@lists.freedesktop.org> Reviewed-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2018-09-03 09:24:30 +02:00
Bas Nieuwenhuizen	4dc244eb44	radv: Add VEGA20 support. Just mirror the radeonsi bits. Since this is just adding the extra switch entries for new HW I think this should be fine for stable. CC: <mesa-stable@lists.freedesktop.org> Reviewed-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2018-09-03 09:24:30 +02:00
Dave Airlie	c1ba33c34b	radv: don't expose linear depth surfaces on SI/CIK/VI either. ac_surface.c: gfx6_compute_surface says /* DB doesn't support linear layouts. */ Now if we expose linear depth and create a linear depth image and use CmdCopyImage to copy into it, we can't map the underlying memory and read it linearly which I think should work. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-09-03 11:38:00 +10:00
Mauro Rossi	ac0856ae41	egl/android: do not indent HAVE_DRM_GRALLOC preprocessor directive Fixes: `3f7bca44d9` ("egl/android: #ifdef out flink name support") Fixes: `c7bb82136b` ("egl/android: Add DRM node probing and filtering") Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>	2018-09-02 11:27:08 +02:00
Jason Ekstrand	2ad9917e18	anv/blorp: Fix a comment as per Nanley's review feedback This accidentally didn't make it into `62378c5e9e`	2018-09-01 09:12:08 -05:00
Jason Ekstrand	62378c5e9e	anv/blorp: Do more flushing around HiZ clears We make the flush after a HiZ clear unconditional and add a flush/stall before the clear as well. Cc: mesa-stable@lists.freedesktop.org Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107760 Reviewed-by: Chad Versace <chadversary@chromium.org> Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>	2018-09-01 09:08:36 -05:00
Ian Romanick	82530ce1b5	i965/vec4: Clamp indirect tes input array reads with 0x0fffffff Page 190 of "Volume 7: 3D Media GPGPU Engine (Haswell)" says the valid range of the offset is [0, 0FFFFFFFh]. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Cc: mesa-stable@lists.freedesktop.org	2018-09-01 00:23:45 -07:00
Ian Romanick	75666605c9	i965/vec4: Correctly handle uniform sources in generate_tes_add_indirect_urb_offset Fixes failure in the new piglit test tes-patch-input-array-vec2-index-invalid-rd.shader_test. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Cc: mesa-stable@lists.freedesktop.org	2018-09-01 00:23:43 -07:00
Andres Gomez	adad7e3aa8	docs: update calendar to extended the 18.1 cycle by one more release Due to having 2 additional RCs for 18.2. Cc: Dylan Baker <dylan.c.baker@intel.com> Cc: Juan A. Suarez <jasuarez@igalia.com> Cc: Emil Velikov <emil.velikov@collabora.com> Signed-off-by: Andres Gomez <agomez@igalia.com> Acked-by: Dylan Baker <dylan@pnwbakers.com> Acked-by: Juan A. Suarez <jasuarez@igalia.com> Acked-by: Emil Velikov <emil.velikov@collabora.com>	2018-09-01 02:23:14 +03:00
Rodrigo Vivi	e8c42ed4ab	intel: Introducing Amber Lake platform Amber Lake uses the same gen graphics as Kaby Lake, including a id that were previously marked as reserved on Kaby Lake, but that now is moved to AML page. This follows the ids and approach used on kernel's commit e364672477a1 ("drm/i915/aml: Introducing Amber Lake platform") Reported-by: Timo Aaltonen <timo.aaltonen@canonical.com> Cc: José Roberto de Souza <jose.souza@intel.com> Cc: Anuj Phogat <anuj.phogat@gmail.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-08-31 13:57:52 -07:00
Rodrigo Vivi	886a048feb	intel: aubinator: Adding missed platforms to the error message. Many new platforms got added to gen_device_name_to_pci_device_id() but the error message inside aubinator didn't reflect those changes. So syncing on the same order to be sure that we are not missing any now. Cc: Anuj Phogat <anuj.phogat@gmail.com> Cc: Matt Turner <mattst88@gmail.com> Cc: Jordan Justen <jordan.l.justen@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-08-31 13:57:41 -07:00
Nanley Chery	904c2a617d	i965/gen7_urb: Re-emit PUSH_CONSTANT_ALLOC on some gen9 According to internal docs, some gen9 platforms have a pixel shader push constant synchronization issue. Although not listed among said platforms, this issue seems to be present on the GeminiLake 2x6's we've tested. We consider the available workarounds to be too detrimental on performance. Instead, we mitigate the issue by applying part of one of the workarounds. Re-emit PUSH_CONSTANT_ALLOC at the top of every batch (as suggested by Ken). Fixes ext_framebuffer_multisample-accuracy piglit test failures with the following options: * 6 depth_draw small depthstencil * 8 stencil_draw small depthstencil * 6 stencil_draw small depthstencil * 8 depth_resolve small * 6 stencil_resolve small depthstencil * 4 stencil_draw small depthstencil * 16 stencil_draw small depthstencil * 16 depth_draw small depthstencil * 2 stencil_resolve small depthstencil * 6 stencil_draw small * all_samples stencil_draw small * 2 depth_draw small depthstencil * all_samples depth_draw small depthstencil * all_samples stencil_resolve small * 4 depth_draw small depthstencil * all_samples depth_draw small * all_samples stencil_draw small depthstencil * 4 stencil_resolve small depthstencil * 4 depth_resolve small depthstencil * all_samples stencil_resolve small depthstencil v2: Include more platforms in WA (Ken). Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106865 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93355 Cc: <mesa-stable@lists.freedesktop.org> Tested-by: Mark Janes <mark.a.janes@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-08-31 13:19:17 -07:00
Christian Gmeiner	773d6ea6e7	imx: make use of loader_open_render_node(..) helper Gets rid of hard-coded gpu device path. Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2018-08-31 21:47:13 +02:00
Christian Gmeiner	b05a8f4f41	tegra: make use loader_open_render_node(..) helper Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2018-08-31 21:46:32 +02:00
Christian Gmeiner	ab348885eb	loader: add loader_open_render_node(..) This helper is almost a 1:1 copy of tegra_open_render_node(). Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2018-08-31 21:46:03 +02:00
Christian Gmeiner	d0b09e2dfe	tegra: fix memory leak Fixes: `1755f608f5` ("tegra: Initial support") Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2018-08-31 21:45:16 +02:00
Daniel Stone	01c0aa9f05	st/dri: Don't expose sRGB formats to clients Though the SARGB8888 format is used internally through its FourCC value, it is not a real format as defined by drm_fourcc.h; it cannot be used with KMS or other interfaces expecting drm_fourcc.h format codes. Ensure we don't advertise it through the dmabuf format/modifier query interfaces, preventing us from tripping over an assert. Signed-off-by: Daniel Stone <daniels@collabora.com> Reported-by: Michel Dänzer <michel.daenzer@amd.com> Fixes: `8c1b9882b2` ("egl/dri2: Guard against invalid fourcc formats") Acked-by: Jason Ekstrand <jason.ekstrand@intel.com>	2018-08-31 18:02:42 +01:00
Samuel Pitoiset	686ec97cfb	radv: add missing support for protected memory properties Fixes Vulkan CTS CL#2849. Similar to the ANV driver. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-08-31 17:35:13 +02:00
Samuel Pitoiset	7355e9326b	radv: remove dead code in scan_shader_output_decl() Never used. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Dave Airlie <airlied@redhat.com>	2018-08-31 17:34:41 +02:00
Samuel Pitoiset	e9acf069b2	radv: remove radv_shader_context::num_output_{clips,culls} Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Dave Airlie <airlied@redhat.com>	2018-08-31 17:34:41 +02:00
Samuel Pitoiset	a6a6441c75	radv: adjust the cull dist mask in scan_shader_output_decl() Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Dave Airlie <airlied@redhat.com>	2018-08-31 17:34:41 +02:00
Samuel Pitoiset	ea778e760c	radv: get length of the clip/cull distances array from usage mask Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Dave Airlie <airlied@redhat.com>	2018-08-31 17:34:41 +02:00
Samuel Pitoiset	732679c25e	radv: do not recompute the output usage mask for clipdist twice The shader info pass takes care of this now. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Dave Airlie <airlied@redhat.com>	2018-08-31 17:34:41 +02:00
Samuel Pitoiset	730c704f86	radv: gather the output usage mask for clip/cull distances correctly It's a special case because both are combined into a single array. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Dave Airlie <airlied@redhat.com>	2018-08-31 17:34:41 +02:00
Samuel Pitoiset	ffe3a2a298	radv: add set_output_usage_mask() helper Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Dave Airlie <airlied@redhat.com>	2018-08-31 17:34:41 +02:00
Samuel Pitoiset	6f47df3129	radv: fix passing clip/cull distances from VS to PS CTS doesn't test input clip/cull distances for the fragment shader stage, which explains why this was totally broken. I wrote a simple test locally that works now. This fixes a crash with GTA V and DXVK. Note that we are exporting unused parameters from the vertex shader now, but this can't be optimized easily because we don't keep the fragment shader info... Cc: mesa-stable@lists.freedesktop.org Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107477 Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-08-31 17:34:36 +02:00
Juan A. Suarez Romero	54a9622dd5	egl/wayland: do not leak wl_buffer when it is locked If color buffer is locked, do not set its wayland buffer to NULL; otherwise it can not be freed later. Rather, flag it in order to destroy it later on the release event. v2: instruct release event to unlock only or free wl_buffer too (Daniel) This also fixes dEQP-EGL.functional.swap_buffers_with_damage.* tests. CC: Daniel Stone <daniel@fooishbar.org> Reviewed-by: Daniel Stone <daniels@collabora.com>	2018-08-31 16:29:36 +02:00
Dave Airlie	2c1f249f2b	ac/radeonsi: fix CIK copy max size While adding transfer queues to radv, I started writing some tests, the first test I wrote fell over copying a buffer larger than this limit. Checked AMDVLK and found the correct limit. Cc: <mesa-stable@lists.freedesktop.org> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-08-31 15:11:49 +10:00
Dave Airlie	c9f5448695	radeonsi: fix regression in indirect input swizzles. This fixes: tests/spec/arb_enhanced_layouts/execution/component-layout/vs-fs-array-dvec3.shader_test since I reworked the 64-bit swizzles. Fixes: `bb17ae49ee` (gallivm: allow to pass two swizzles into fetches.) Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-08-31 06:08:24 +01:00
Dave Airlie	750b829daf	radeonsi: fix tess/gs fetchs for new swizzle. I have piglit results from my machine, but I must have messed up, and not built mesa in between properly. Fixes: `bb17ae49ee` (gallivm: allow to pass two swizzles into fetches.) Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-08-31 06:08:21 +01:00
Marek Olšák	355ed029b0	mesa: ignore VAO IDs equal to 0 in glDeleteVertexArrays This fixes a firefox crash. Fixes: `781a78914c` Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2018-08-30 22:30:28 -04:00
Kenneth Graunke	b147254d36	Revert "intel/tools/aubwrite: Always use physical addresses for traces." This reverts commit `f8cfc77660`. This appears to break intel_dump_gpu for Gen9 systems - I can load them in the simulator, but nothing happens. Reverting the patch makes the simulator properly execute our commands and shaders again.	2018-08-30 14:36:28 -07:00
Jason Ekstrand	a0f18f2142	intel/nir: Lowering image loads and stores trashes all metadata This fixes the GL_ARB_fragment_shader_interlock piglit test on gen8 platforms where the lack of metadata dirtying was causing another pass to accidentally delete a much needed loop. https://bugs.freedesktop.org/show_bug.cgi?id=107745 Fixes: `37f7983bcc` "intel/compiler: Do image load/store lowering..." Jason Ekstrand <jason@jlekstrand.net> writes: Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2018-08-30 14:06:31 -05:00
Jason Ekstrand	d9cf4308ce	i965/screen: Allow modifiers on sRGB formats This effectively reverts `a266934935` which was a misguided attempt at protecting intel_query_dma_buf_modifiers from invalid formats. Unfortunately, in some internal EGL cases, we can get an SRGB format validly in this function. Rejecting such formats caused us to not allow CCS in some cases where we should have been allowing it. This regressed the performance of some SynMark tests as well as GfxBench ALU2, Tessellation and Manhattan 3.0 tests. There's some question of whether or not we really should be using SRGB "fourcc" formats that aren't actually in drm_fourcc.h but there's not much harm in allowing them through here. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107223 Fixes: `a266934935` "i965/screen: Return false for unsupported..." Tested-By: Eero Tamminen <eero.t.tamminen@intel.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2018-08-30 11:41:50 -05:00
Jason Ekstrand	8c1b9882b2	egl/dri2: Guard against invalid fourcc formats We already reject attempts to import images with invalid fourcc formats but don't really guard the queries all that well. This makes us error out in any calls to eglQueryDmaBufModifiersEXT if the given format is not a valid fourcc format. We also add an assert to ensure that drivers don't advertise any non-fourcc formats. Cc: mesa-stable@lists.freedesktop.org Tested-By: Eero Tamminen <eero.t.tamminen@intel.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2018-08-30 11:41:50 -05:00
Jason Ekstrand	b95896f492	egl/dri2: Add a helper for the number of planes for a FOURCC format This also serves as a convenient "is this a fourcc format" check as well which we'll take advantage of in the next commit. Cc: mesa-stable@lists.freedesktop.org Tested-By: Eero Tamminen <eero.t.tamminen@intel.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2018-08-30 11:41:50 -05:00
Jason Ekstrand	19bdc7dd0f	radv/meta: Set num_components on image_store intrinsics Now that image load/store intrinsics are variable-width, we need to set num_components accordingly. In `15d39f474b`, both glsl_to_nir and spirv_to_nir were updated to properly set num_components but radv meta was left behind. Fixes: `15d39f474b` "nir: Make image load/store intrinsics..." Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-08-30 08:26:14 -05:00
Vicki Pfau	8c0e3f3822	gallivm: Detect VSX separately from Altivec Previously gallivm would attempt to use VSX instructions on all systems where it detected that Altivec is supported; however, VSX was added to POWER long after Altivec, causing lots of crashes on older POWER/PPC hardware, e.g. PPC Macs. By detecting VSX separately from Altivec we can automatically disable it on hardware that supports Altivec but not VSX Signed-off-by: Vicki Pfau <vi@endrift.com>	2018-08-30 06:09:49 +02:00
Ilia Mirkin	3e04c67950	nv50: bump compat glsl level to same as core Passes the compat piglits. I'm sure that there will be odd issues that aren't caught by them, but at least it should basically work. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2018-08-29 20:51:40 -04:00
Ilia Mirkin	a608e5cc9f	nvc0: bump compat GLSL version to match core This passes the handful of tests in piglit. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2018-08-29 20:51:40 -04:00
Ilia Mirkin	52a7297dc6	glsl: avoid lowering texcoord array except in simple cases With compat creeping up to geometry and tess shaders, lowering texcoord accesses/writes becomes more complicated. Since it's an optimization anyways, just avoid the complication for now. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-08-29 20:51:23 -04:00
Andres Gomez	3731233cba	docs: update calendar 18.2.0-rc5 is out, extend to 18.2.0-rc6 Signed-off-by: Andres Gomez <agomez@igalia.com>	2018-08-30 03:33:08 +03:00
Timothy Arceri	9c47c39687	st/mesa, gallium: add a workaround for No Mans Sky The spec seems clear this is not allowed but the Nvidia binary forces apps to add layout qualifiers so this works around the issue for No Mans Sky until the CTS can be sorted out. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-08-30 09:54:40 +10:00
Timothy Arceri	9ce7d79cdc	glsl: add a mechanism to allow layout qualifiers on function params The spec is quite clear this is not allowed: From Section 4.4. (Layout Qualifiers) of the GLSL 4.60 spec: "Layout qualifiers can appear in several forms of declaration. They can appear as part of an interface block definition or block member, as shown in the grammar in the previous section. They can also appear with just an interface-qualifier to establish layouts of other declarations made with that qualifier: layout-qualifier interface-qualifier ; Or, they can appear with an individual variable declared with an interface qualifier: layout-qualifier interface-qualifier declaration ;" From Section 4.10 (Memory Qualifiers) of the GLSL 4.60 spec: "Layout qualifiers cannot be used on formal function parameters, and layout qualification is not included in parameter matching." However on the Nvidia binary driver they actually fail to compile if image function params don't have a layout qualifier. This results in applications such as No Mans Sky using layout qualifiers on params. I've submitted a CTS test to expose this problem in the Nvidia driver but until that is resolved this patch will help Mesa drivers work around the issue. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-08-30 09:54:40 +10:00
Timothy Arceri	28a3731e3f	glsl: skip stringification in preprocessor if in unreachable branch This fixes compilation of some "No Mans Sky" shaders where the stringification happens in branches intended for DX12. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2018-08-30 09:51:57 +10:00
Bas Nieuwenhuizen	4738b6ac81	radv: Add missing checks in radv_get_image_format_properties. CC: <mesa-stable@lists.freedesktop.org> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2018-08-30 01:21:20 +02:00
Dave Airlie	bb17ae49ee	gallivm: allow to pass two swizzles into fetches. This hijacks the top 16-bits of swizzle, to pass in the swizzle for the second channel. This fixes handling .yx swizzles of 64-bit values. This should fixup radeonsi and llvmpipe. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107524 Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-08-30 00:15:40 +01:00
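The packing described in the gallivm commit above can be illustrated with a minimal Python sketch. It assumes the first swizzle sits in the low 16 bits and the second in the top 16 bits of a 32-bit value; the helper names are hypothetical and are not gallivm functions.

```python
SWIZZLE_BITS = 16
SWIZZLE_MASK = (1 << SWIZZLE_BITS) - 1


def pack_swizzles(first, second):
    """Pack two channel swizzles into one 32-bit value (hypothetical helper)."""
    if not (0 <= first <= SWIZZLE_MASK and 0 <= second <= SWIZZLE_MASK):
        raise ValueError("each swizzle must fit in 16 bits")
    # The second swizzle occupies the top 16 bits, the first the low 16 bits.
    return (second << SWIZZLE_BITS) | first


def unpack_swizzles(packed):
    """Recover (first, second) from a value built by pack_swizzles."""
    return packed & SWIZZLE_MASK, (packed >> SWIZZLE_BITS) & SWIZZLE_MASK
```

With this layout a caller that only reads the low 16 bits still sees the first swizzle unchanged, which is presumably why an otherwise unused part of the existing argument was chosen.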
Timothy Arceri	3bcec6cf1c	radeonsi: enable radeonsi_zerovram for No Mans Sky Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-08-30 07:57:38 +10:00
Timothy Arceri	5566dd8a61	radeonsi: add radeonsi_zerovram driconfig option More and more games seem to require this so let's make it a config option. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-08-30 07:57:38 +10:00
Timothy Arceri	406c3d748d	radeonsi: enable GL 4.5 in compat profile Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-08-30 07:57:38 +10:00
Timothy Arceri	781a78914c	mesa: enable ARB_direct_state_access in compat for GL3.1+ We could enable it for lower versions of GL but this allows us to just use the existing version/extension checks that are already used by the core profile. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-08-30 07:57:38 +10:00
Marek Olšák	93b8b987d0	radeonsi: add a thorough clear/copy_buffer benchmark	2018-08-29 15:31:42 -04:00
Marek Olšák	5914f5bd4a	radeonsi: let internal compute dispatches tune WAVES_PER_SH	2018-08-29 15:31:42 -04:00
Marek Olšák	c5442c1165	radeonsi: add TGSI_SEMANTIC_CS_USER_DATA for reading up to 4 SGPRs with TGSI	2018-08-29 15:31:42 -04:00
Marek Olšák	d7250e4304	radeonsi: add SI_QUERY_TIME_ELAPSED_SDMA_SI for measuring DMA on SI DMA on SI doesn't support the timestamp packet, so it's emulated.	2018-08-29 15:31:42 -04:00
Marek Olšák	c359880d8b	radeonsi: add SI_QUERY_TIME_ELAPSED_SDMA for measuring SDMA performance	2018-08-29 15:31:42 -04:00
Marek Olšák	0c5429cc73	radeonsi: add flag L2_STREAM for minimal cache usage	2018-08-29 15:31:41 -04:00
Marek Olšák	8f6e06d160	gallium: add TGSI_MEMORY_STREAM_CACHE_POLICY For internal radeonsi shaders.	2018-08-29 15:31:41 -04:00
Jason Ekstrand	d8033d4083	intel/compiler: Remove surface_idx from brw_image_param Now that the drivers are lowering to surface indices themselves, we no longer need to push the surface index into the shader. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-08-29 14:04:03 -05:00
Jason Ekstrand	3cbc02e469	intel: Use TXS for image_size when we have a typed surface Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-08-29 14:04:03 -05:00
Jason Ekstrand	09f1de97a7	anv,i965: Lower away image derefs in the driver Previously, the back-end compiler turned image access into magic uniform reads and there was a complex contract between back-end compiler and driver about setting up and filling out those params. As of this commit, both drivers now lower image_deref_load_param_intel intrinsics to load_uniform intrinsics controlled by the driver and lower the other image_deref_* intrinsics to image_* intrinsics which take an actual binding table index. There are still "magic" uniforms but they are now added and controlled entirely by the driver and that contract no longer spans components. This also has the side-effect of making most image use compile-time binding table indices. Previously, all image access pulled the binding table index from a uniform. Part of the reason for this was that the magic uniforms made it difficult to decouple binding table indices from the uniforms and, since they are indexed completely differently (especially in Vulkan), it was hard to pull them apart. Now that the driver is handling both, it's trivial to decouple the two and provide actual binding table indices. Shader-db results on Kaby Lake: total instructions in shared programs: 15166872 -> 15164293 (-0.02%) instructions in affected programs: 115834 -> 113255 (-2.23%) helped: 191 HURT: 0 total cycles in shared programs: 571311495 -> 571196465 (-0.02%) cycles in affected programs: 4757115 -> 4642085 (-2.42%) helped: 73 HURT: 67 total spills in shared programs: 10951 -> 10926 (-0.23%) spills in affected programs: 742 -> 717 (-3.37%) helped: 7 HURT: 0 total fills in shared programs: 22226 -> 22201 (-0.11%) fills in affected programs: 1146 -> 1121 (-2.18%) helped: 7 HURT: 0 Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-08-29 14:04:03 -05:00
Jason Ekstrand	0de003be03	nir: Add handle/index-based image intrinsics Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-08-29 14:04:02 -05:00
Jason Ekstrand	3942943819	nir: Use a bitfield for image access qualifiers This commit expands the current memory access enum to contain the extra two bits provided for images. We choose to follow the SPIR-V convention of NonReadable and NonWriteable because readonly implies that you can read so readonly + writeonly doesn't make as much sense as NonReadable + NonWriteable. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-08-29 14:04:02 -05:00
Jason Ekstrand	48e4fa7dd8	glsl/link,i965: Make ImageAccess four-state The GLSL spec allows you to set both the "readonly" and "writeonly" qualifiers on images to indicate that it can only be used with imageSize. However, we had no way of representing this in the linked shader and flagged it as GL_READ_ONLY. This is good from a "does it use this buffer?" perspective but not from a format and access lowering perspective. By using GL_NONE if "readonly" and "writeonly" are both set, we can detect this case in the driver and handle it correctly. Nothing currently relies on the type of surface in the "readonly" + "writeonly" case but that's about to change. i965 is the only driver which uses the ImageAccess field and gl_bindless_image::access is currently unused. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-08-29 14:04:02 -05:00
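The four-state mapping described in the ImageAccess commit above can be sketched as follows. This is only an illustration in Python: the function name is hypothetical, and the constants are the standard OpenGL enum values rather than anything taken from the patch.

```python
# Standard OpenGL enum values.
GL_NONE = 0
GL_READ_ONLY = 0x88B8
GL_WRITE_ONLY = 0x88B9
GL_READ_WRITE = 0x88BA


def image_access(readonly, writeonly):
    """Map the two GLSL image qualifiers to one of four access states."""
    if readonly and writeonly:
        # Only imageSize() is usable on such an image, so neither read
        # nor write access is reported.
        return GL_NONE
    if readonly:
        return GL_READ_ONLY
    if writeonly:
        return GL_WRITE_ONLY
    return GL_READ_WRITE
```

Before the change, the first case would have been reported as GL_READ_ONLY, which hides from the driver the fact that the image is never actually read.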
Jason Ekstrand	4289143899	intel/compiler: Use two components for 1D array image sizes Having the array length component stored in .z was a small convenience for the ISL image param filling code and an annoyance in the NIR lowering code. The only convenience of treating 1D arrays like 2D arrays in the lowering code is in the address calculation code so let's put all the complexity there as well. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-08-29 14:04:02 -05:00
Jason Ekstrand	b1c414ef28	isl: Use the view array length for the image size Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-08-29 14:04:02 -05:00
Jason Ekstrand	37f7983bcc	intel/compiler: Do image load/store lowering to NIR This commit moves our storage image format conversion codegen into NIR instead of doing it in the back-end. This has the advantage of letting us run it through NIR's optimizer which is pretty effective at shrinking things down. In the common case of rgba8, the number of instructions emitted after NIR is done with it is half of what it was with the lowering happening in the back-end. On the downside, the back-end's lowering is able to directly use predicates and the NIR lowering has to use IFs. Shader-db results on Kaby Lake: total instructions in shared programs: 15166910 -> 15166872 (<.01%) instructions in affected programs: 5895 -> 5857 (-0.64%) helped: 15 HURT: 0 Clearly, we don't have that much image_load_store happening in the shaders in shader-db.... Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-08-29 14:04:02 -05:00
Jason Ekstrand	b217705dec	nir/types: Add a wrapper for coordinate_components Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-08-29 14:04:02 -05:00
Jason Ekstrand	f2d0a2b110	anv/pipeline: Remove dead image loads in lower_input_attachments Dead code will get rid of them eventually but it's better if they're just gone so we guarantee they won't trip up later passes. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-08-29 14:04:02 -05:00
Jason Ekstrand	15d39f474b	nir: Make image load/store intrinsics variable-width Instead of requiring 4 components, this allows them to potentially use fewer. Both the SPIR-V and GLSL paths still generate vec4 intrinsics so drivers which assume 4 components should be safe. However, we want to be able to shrink them for i965. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-08-29 14:04:02 -05:00
Jason Ekstrand	7cdf8f9339	nir/format_convert: Fix a bitmask in unpack_11f11f10f Fixes: `4e337b42f9` "nir/format_convert: Add pack/unpack for R11F_G11F_B10F" Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-08-29 14:04:02 -05:00
Jason Ekstrand	1f7be4968f	nir/format_convert: Rename pack_r11g11b10f to pack_11f11f10f This matches the unpack function. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-08-29 14:04:02 -05:00
Jason Ekstrand	7bd0363d6f	nir/format_convert: Add [us]norm conversion helpers Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-08-29 14:04:02 -05:00
Jason Ekstrand	152fdeddbb	nir/format_convert: Rename nir_format_bitcast_uint_vec We have a name for that, it's called a uvec. This just makes the function name a bit shorter. While we're here, we also add an assert for one of the assumptions this function makes. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-08-29 14:04:02 -05:00
Jason Ekstrand	7c5df52bdc	nir/format_convert: Add vec mask and sign-extend helpers Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-08-29 14:04:02 -05:00
Jason Ekstrand	ea4f200864	nir/format_convert: Add support for unpacking signed integers Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-08-29 14:04:02 -05:00
Jason Ekstrand	80c424148b	nir/opcodes: Make unpack_half_2x16_split_* variable-width There is nothing inherent about these opcodes that requires them to only take scalars. It's very convenient if we let them take vectors as well. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-08-29 14:04:02 -05:00
Jason Ekstrand	d448fa3ae3	nir/algebraic: Add some max/min optimizations Found by inspection. This doesn't help much now but we'll see this pattern with images if you load UNORM and then store UNORM. Shader-db results on Kaby Lake: total instructions in shared programs: 15166916 -> 15166910 (<.01%) instructions in affected programs: 761 -> 755 (-0.79%) helped: 6 HURT: 0 Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-08-29 14:04:02 -05:00
Jason Ekstrand	4dd5263663	nir/algebraic: Add more extract_[iu](8\|16) optimizations This adds the "(a << N) >> M" family of mask or sign-extensions. Not a huge win right now but this pattern will soon be generated by NIR format lowering code. Shader-db results on Kaby Lake: total instructions in shared programs: 15166918 -> 15166916 (<.01%) instructions in affected programs: 36 -> 34 (-5.56%) helped: 2 HURT: 0 Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-08-29 14:04:02 -05:00
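To see why the "(a << N) >> M" family in the commit above is equivalent to a byte extraction, here is a small Python model of 32-bit shifts. The helper names mirror NIR opcode names but are plain Python functions written for this sketch, and the particular N/M pairs checked are chosen for illustration; the commit's exact pattern list is not reproduced here.

```python
MASK32 = 0xFFFFFFFF


def ishl(a, n):
    """32-bit left shift."""
    return (a << n) & MASK32


def ushr(a, n):
    """32-bit logical (zero-filling) right shift."""
    return (a & MASK32) >> n


def ishr(a, n):
    """32-bit arithmetic (sign-filling) right shift."""
    a &= MASK32
    if a & 0x80000000:
        a -= 1 << 32
    return (a >> n) & MASK32


def extract_u8(a, byte):
    """Zero-extended byte `byte` (0 = least significant) of a 32-bit value."""
    return ((a & MASK32) >> (8 * byte)) & 0xFF


def extract_i8(a, byte):
    """Sign-extended byte `byte` of a 32-bit value, as a 32-bit pattern."""
    v = extract_u8(a, byte)
    return (v - 0x100 if v & 0x80 else v) & MASK32
```

For example, `ushr(ishl(a, 16), 24)` moves byte 1 to the top of the word and then shifts it back down with zero fill, which is `extract_u8(a, 1)`; using `ishr` for the second shift gives the sign-extending `extract_i8` form.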
Jason Ekstrand	116b47fe3c	nir/algebraic: Be more careful converting ushr to extract_u8/16 If it's not the right bit-size, it may not actually be the correct extraction. For now, we'll only worry about 32-bit versions. Fixes: `905ff86198` "nir: Recognize open-coded extract_u16" Fixes: `76289fbfa8` "nir: Recognize open-coded extract_u8" Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-08-29 14:04:02 -05:00
Sagar Ghuge	40fc4b5acd	intel/tools: new i965_disasm tool Adds a new i965 instruction disassemble tool v2: 1) fix a few nits (Matt Turner) 2) Remove i965_disasm header (Matt Turner) v3: 1) Redirect output to correct file descriptors (Matt Turner) 2) Refactor code (Matt Turner) 3) Use better formatting style (Matt Turner) Reviewed-by: Matt Turner <mattst88@gmail.com> Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>	2018-08-29 11:19:55 -07:00
Kenneth Graunke	8fb966688b	st/mesa: Disable blending for integer formats. Blending isn't valid for integer formats. Rather than having drivers worry about this, just disable blending in this case. This hopefully will increase hits in the CSO cache as well, by eliminating most of the meaningless fields in this case. Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-08-29 10:51:11 -07:00
Brian Paul	18e9b4791b	svga: add missing switch cases for shadow textures This doesn't seem to make any difference in testing, but it fixes a failed assertion when dumping sm3 shaders. Reviewed-by: Charmaine Lee <charmainel@vmware.com>	2018-08-29 11:29:07 -06:00
Brian Paul	fb7e462c97	svga: fix vgpu9 sprite coordinate bug Setting GL_POINT_SPRITE_COORD_ORIGIN to GL_LOWER_LEFT did not work for vgpu9. We can use the rasterizer sprite_coord_enable bitfield as-is. We need to index into it using the TGSI semantic index, not the register index. This fixes the Piglit fbo-gl_pointcoord and glsl-fs-pointcoord tests. Testing done: Piglit, Mesa sprite demos Reviewed-by: Charmaine Lee <charmainel@vmware.com>	2018-08-29 11:29:07 -06:00
Brian Paul	8331d69a87	svga: fix PIPE_TEXTURE_RECT/BUFFER const buffer issue The flag_rect and flag_buffer fields didn't sufficiently capture the state changes needed for those resource types. For example, if a texture binding was changed from a 500x500 rect texture to a 400x400 rect texture we didn't set SVGA_NEW_TEXTURE_CONSTS. But we need to do that to emit the new texcoord scale factors to the constant buffers. Rather than track the sizes of all bound resources, just set the flag if the resource is a rect. Same story with texture buffers. Also, since rect/buffer textures are usable with VS/GS shaders, add SVGA_NEW_TEXTURE_CONSTS to the flags we check for emitting VS/GS constants. This seems to help with XFCE / xfwm4 desktop scaling. VMware issue 2156696. Reviewed-by: Charmaine Lee <charmainel@vmware.com>	2018-08-29 11:29:07 -06:00
Brian Paul	46c7433da8	svga: minor improvements in svga_state_constants.c Add const qualifiers. Add 'f' suffix on floats to avoid double promotion. Remove unneeded shader type assertion since the switch statement handled it already. Reviewed-by: Charmaine Lee <charmainel@vmware.com>	2018-08-29 11:29:07 -06:00
Jason Ekstrand	cdea5d996e	anv: Free the app and engine name Fixes: `8c048af589` "anv: Copy the appliation info into the instance" Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-08-29 11:24:57 -05:00
Rhys Kidd	f7d0c112cb	nv50/ir: silence partitionLoadStore() unused function warning Move this now-unused function into the existing comment block, which was its only prior use. ../../../../../src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp:2645:1: warning: unused function 'partitionLoadStore' [-Wunused-function] partitionLoadStore(uint8_t comp[2], uint8_t size[2], uint8_t mask) Fixes: ("86e4440361 nouveau: codegen: Disable more old resource handling code") Signed-off-by: Rhys Kidd <rhyskidd@gmail.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>	2018-08-29 08:59:27 -04:00
vadym.shovkoplias	966a797e43	glsl/linker: Link all out vars from a shader objects on a single stage During intra stage linking some out variables can be dropped because they are not used in a shader with the main function. But these out vars can be referenced on later stages which can lead to further linking errors. Signed-off-by: Vadym Shovkoplias <vadym.shovkoplias@globallogic.com> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105731	2018-08-29 20:03:56 +10:00
Lionel Landwerlin	5a1c23d150	anv: blorp: support multiple aspect blits Newer blit tests are enabling depth&stencils blits. We currently don't support it but can do by iterating over the aspects masks (copy some logic from the CopyImage function). Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `9f44745eca` ("anv: Use blorp to implement VkBlitImage") Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-08-29 10:31:06 +01:00
Tapani Pälli	a72dbc461b	mesa: allow GL_UNSIGNED_BYTE type for SNORM reads OpenGL ES spec states: "For normalized fixed-point rendering surfaces, the combination format RGBA and type UNSIGNED_BYTE is accepted." This fixes following failing VK-GL-CTS tests: KHR-GLES3.packed_pixels.pbo_rectangle.rgba8_snorm KHR-GLES3.packed_pixels.rectangle.rgba8_snorm KHR-GLES3.packed_pixels.varied_rectangle.rgba8_snorm Signed-off-by: Tapani Pälli <tapani.palli@intel.com> https://bugs.freedesktop.org/show_bug.cgi?id=107658 Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Marek Olšák <marek.olsak@amd.com> Tested-by: Andres Gomez <agomez@igalia.com>	2018-08-29 09:26:23 +03:00
Timothy Arceri	5db981952a	nir: add loop unroll support for wrapper loops This adds support for unrolling the classic do { // ... } while (false) that is used to wrap multi-line macros. GLSL IR also wraps switch statements in a loop like this. shader-db results IVB: total loops in shared programs: 2515 -> 2512 (-0.12%) loops in affected programs: 33 -> 30 (-9.09%) helped: 3 HURT: 0 Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-08-29 16:02:05 +10:00
Timothy Arceri	0f450b57a1	nir/opt_loop_unroll: Remove unneeded phis if we make progress Now that SSA values can be derefs and they have special rules, we have to be a bit more careful about our LCSSA phis. In particular, we need to clean up in case LCSSA ended up creating a phi node for a deref. This avoids validation issues with some CTS tests with the following patch, but it's possible that we could also see the same problem with the existing unrolling passes. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-08-29 16:02:05 +10:00
Timothy Arceri	5a6b04d94b	nir: add complex_loop bool to loop info In order to be sure loop_terminator_list is an accurate representation of all the jumps in the loop we need to be sure we didn't encounter any other complex behaviour such as continues, nested breaks, etc during analysis. This will be used in the following patch. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-08-29 16:02:05 +10:00
Timothy Arceri	fef6325e58	nir: always attempt to find loop terminators This will help later patches with unrolling loops that end with a break, i.e. loops that always exit on their first iteration. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-08-29 16:02:05 +10:00
Marek Olšák	1e40f69483	ac/surface: fix CMASK fast clear for NPOT textures with mipmapping on SI/CI/VI This fixes VM faults and corruption. Cc: 18.1 18.2 <mesa-stable@lists.freedesktop.org> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-08-28 19:51:51 -04:00
Ian Romanick	c836326a29	i965/vec4: Emit BRW_AOP_INC or BRW_AOP_DEC for atomicAdd of +1 or -1 No shader-db changes on any Intel platform. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2018-08-28 15:35:50 -07:00
Ian Romanick	c856403868	i965/fs: Emit BRW_AOP_INC or BRW_AOP_DEC for imageAtomicAdd of +1 or -1 v2: Refactor selection of atomic opcode to a separate function. Suggested by Jason. No changes on any other Intel platforms. Skylake total instructions in shared programs: 14304261 -> 14304241 (<.01%) instructions in affected programs: 1625 -> 1605 (-1.23%) helped: 4 HURT: 0 helped stats (abs) min: 1 max: 8 x̄: 5.00 x̃: 5 helped stats (rel) min: 1.01% max: 14.29% x̄: 5.86% x̃: 4.07% 95% mean confidence interval for instructions value: -10.66 0.66 95% mean confidence interval for instructions %-change: -15.91% 4.19% Inconclusive result (value mean confidence interval includes 0). total cycles in shared programs: 527531226 -> 527531194 (<.01%) cycles in affected programs: 92204 -> 92172 (-0.03%) helped: 2 HURT: 0 Haswell and Broadwell had similar results. (Broadwell shown) total instructions in shared programs: 14615730 -> 14615710 (<.01%) instructions in affected programs: 1838 -> 1818 (-1.09%) helped: 4 HURT: 0 helped stats (abs) min: 1 max: 8 x̄: 5.00 x̃: 5 helped stats (rel) min: 0.89% max: 13.04% x̄: 5.37% x̃: 3.78% 95% mean confidence interval for instructions value: -10.66 0.66 95% mean confidence interval for instructions %-change: -14.59% 3.85% Inconclusive result (value mean confidence interval includes 0). Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-08-28 15:35:46 -07:00
Ian Romanick	b6e247cf0e	i965/fs: Refactor image atomics to be a bit more like other atomics This greatly simplifies the next patch. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2018-08-28 15:35:46 -07:00
Ian Romanick	fabe3ead57	i965/fs: Emit BRW_AOP_INC or BRW_AOP_DEC for atomicAdd of +1 or -1 Funny story... a single shader was hurt for instructions, spills, fills. That same shader was also the most helped for cycles. #GPUsAreWeird No changes on any other Intel platform. v2: Refactor selection of atomic opcode to a separate function. Suggested by Jason. Haswell, Broadwell, and Skylake had similar results. (Skylake shown) total instructions in shared programs: 14304116 -> 14304261 (<.01%) instructions in affected programs: 12776 -> 12921 (1.13%) helped: 19 HURT: 1 helped stats (abs) min: 1 max: 16 x̄: 2.32 x̃: 1 helped stats (rel) min: 0.05% max: 7.27% x̄: 0.92% x̃: 0.55% HURT stats (abs) min: 189 max: 189 x̄: 189.00 x̃: 189 HURT stats (rel) min: 4.87% max: 4.87% x̄: 4.87% x̃: 4.87% 95% mean confidence interval for instructions value: -12.83 27.33 95% mean confidence interval for instructions %-change: -1.57% 0.31% Inconclusive result (value mean confidence interval includes 0). total cycles in shared programs: 527552861 -> 527531226 (<.01%) cycles in affected programs: 1459195 -> 1437560 (-1.48%) helped: 16 HURT: 2 helped stats (abs) min: 2 max: 21328 x̄: 1353.69 x̃: 6 helped stats (rel) min: 0.01% max: 5.29% x̄: 0.36% x̃: 0.03% HURT stats (abs) min: 12 max: 12 x̄: 12.00 x̃: 12 HURT stats (rel) min: 0.03% max: 0.03% x̄: 0.03% x̃: 0.03% 95% mean confidence interval for cycles value: -3699.81 1295.92 95% mean confidence interval for cycles %-change: -0.94% 0.30% Inconclusive result (value mean confidence interval includes 0). total spills in shared programs: 8025 -> 8033 (0.10%) spills in affected programs: 208 -> 216 (3.85%) helped: 1 HURT: 1 total fills in shared programs: 10989 -> 11040 (0.46%) fills in affected programs: 444 -> 495 (11.49%) helped: 1 HURT: 1 Ivy Bridge total instructions in shared programs: 11709181 -> 11709153 (<.01%) instructions in affected programs: 3505 -> 3477 (-0.80%) helped: 3 HURT: 0 helped stats (abs) min: 1 max: 23 x̄: 9.33 x̃: 4 helped stats (rel) min: 0.11% max: 1.16% x̄: 0.63% x̃: 0.61% total cycles in shared programs: 254741126 -> 254738801 (<.01%) cycles in affected programs: 919067 -> 916742 (-0.25%) helped: 3 HURT: 0 helped stats (abs) min: 21 max: 2144 x̄: 775.00 x̃: 160 helped stats (rel) min: 0.03% max: 0.90% x̄: 0.32% x̃: 0.03% total spills in shared programs: 4536 -> 4533 (-0.07%) spills in affected programs: 40 -> 37 (-7.50%) helped: 1 HURT: 0 total fills in shared programs: 4819 -> 4813 (-0.12%) fills in affected programs: 94 -> 88 (-6.38%) helped: 1 HURT: 0 Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> [v1] Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-08-28 15:35:38 -07:00
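The opcode selection that the atomicAdd commits above factor into a separate function can be sketched roughly as follows. This is a Python illustration, not the i965 code: the function name is hypothetical, the opcodes are represented as strings, and BRW_AOP_ADD as the fallback is an assumption, since only BRW_AOP_INC and BRW_AOP_DEC are named in the commit messages.

```python
def select_atomic_add_op(operand):
    """Pick an atomic opcode for an add (hypothetical helper).

    `operand` is the added value if it is a compile-time constant,
    or None if it is only known at run time.
    """
    if operand == 1:
        return "BRW_AOP_INC"   # increment needs no source operand
    if operand == -1:
        return "BRW_AOP_DEC"   # decrement needs no source operand
    return "BRW_AOP_ADD"       # assumed general fallback
```

The saving comes from not having to materialize the constant as a payload for the atomic message when the increment or decrement form applies.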
Ian Romanick	41399f4bc7	intel/compiler: Silence unused parameter warnings in brw_eu.h All of the other brw_*_desc functions take a devinfo parameter, and all of the others at least have an assert that uses it. Keep the parameter, but mark it as unused. Silences 37 warnings like: In file included from src/intel/common/gen_disasm.c:27:0: src/intel/compiler/brw_eu.h: In function ‘brw_pixel_interp_desc’: src/intel/compiler/brw_eu.h:377:53: warning: unused parameter ‘devinfo’ [-Wunused-parameter] brw_pixel_interp_desc(const struct gen_device_info *devinfo, ^~~~~~~ Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2018-08-28 15:35:38 -07:00
Sagar Ghuge	56574f4df3	i965: enable AMD_depth_clamp_separate Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2018-08-28 12:57:27 -07:00
Sagar Ghuge	e6adea0dc0	i965: add functional changes for AMD_depth_clamp_separate Gen >= 9 has the ability to control clamping of depth values separately at the near and far planes. z_w is clamped to the range [min(n,f), 0] if clamping at the near plane is enabled, [0, max(n,f)] if clamping at the far plane is enabled and [min(n,f), max(n,f)] if clamping at both planes is enabled. v2: 1) Use better coding style (Ian Romanick) Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2018-08-28 12:57:27 -07:00
Sagar Ghuge	2765749e0f	mesa: add EXTRA_EXT for AMD_depth_clamp_separate Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-08-28 12:57:27 -07:00
Sagar Ghuge	2770446740	mesa: add support for GL_AMD_depth_clamp_separate tokens _mesa_set_enable() and _mesa_IsEnabled() extended to accept two new tokens, GL_DEPTH_CLAMP_NEAR_AMD and GL_DEPTH_CLAMP_FAR_AMD. v2: Remove unnecessary parentheses (Marek Olsak) Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-08-28 12:57:27 -07:00
Sagar Ghuge	5650d39978	mesa: Add support for AMD_depth_clamp_separate Enable _mesa_PushAttrib() and _mesa_PopAttrib() to handle GL_DEPTH_CLAMP_NEAR_AMD and GL_DEPTH_CLAMP_FAR_AMD tokens. Remove DepthClamp, because DepthClampNear + DepthClampFar replaces it, as suggested by Marek Olsak. Driver that enables AMD_depth_clamp_separate will only ever look at DepthClampNear and DepthClampFar, as suggested by Ian Romanick. v2: 1) Remove unnecessary parentheses (Marek Olsak) 2) if AMD_depth_clamp_separate is unsupported, TEST_AND_UPDATE GL_DEPTH_CLAMP only (Marek Olsak) 3) Clamp against near and far plane separately (Marek Olsak) 4) Clip point separately for near and far Z clipping plane (Marek Olsak) v3: Clamp raster position zw to the range [min(n,f), 0] for near plane and [0, max(n,f)] for far plane (Marek Olsak) v4: Use MIN2 and MAX2 instead of CLAMP (Marek Olsak) Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-08-28 12:57:27 -07:00
Sagar Ghuge	379949b967	mesa: Add types for AMD_depth_clamp_separate. Add some basic types and storage for the AMD_depth_clamp_separate extension. v2: 1) Drop unnecessary definition (Marek Olsak) 2) Expose extension in compatibility profile (Marek Olsak) Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-08-28 12:57:27 -07:00
Sagar Ghuge	f663fb5487	glapi: define AMD_depth_clamp_separate Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-08-28 12:57:27 -07:00
Jason Ekstrand	c92a463d23	anv: Claim to support depthBounds for ID games Cc: "18.2" <mesa-stable@lists.freedesktop.org> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-08-28 13:05:54 -05:00
Jason Ekstrand	8c048af589	anv: Copy the appliation info into the instance Cc: "18.2" <mesa-stable@lists.freedesktop.org> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-08-28 13:05:54 -05:00
Jason Ekstrand	4ffb575da5	vulkan/alloc: Add a vk_strdup helper Cc: "18.2" <mesa-stable@lists.freedesktop.org> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-08-28 13:05:54 -05:00
Dylan Baker	7c00db9527	meson: Actually load translation files Currently we run the script but don't actually load any files, even in a tarball where they exist. Fixes: `3218056e0e` ("meson: Build i965 and dri stack") Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2018-08-28 08:51:05 -07:00
Caio Marcelo de Oliveira Filho	f172a77dd8	nir: Remove outdated comment Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-08-28 08:11:03 -07:00
Kevin Rogovin	03ecec9ed2	i965: Add INTEL_fragment_shader_ordering support. Adds support for INTEL_fragment_shader_ordering. We achieve the fragment ordering by using the same instruction as for beginInvocationInterlockARB() which is by issuing a memory fence via sendc. Signed-off-by: Kevin Rogovin <kevin.rogovin@intel.com> Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>	2018-08-28 17:15:10 +03:00
Kevin Rogovin	119435c877	mesa: Add GL/GLSL plumbing for INTEL_fragment_shader_ordering This extension provides a new GLSL built-in function, beginFragmentShaderOrderingINTEL(), that guarantees (taking the wording of the GL_INTEL_fragment_shader_ordering extension) that any memory transactions issued by shader invocations from previous primitives mapped to the same xy window coordinates (and same sample when per-sample shading is active) complete and are visible to the shader invocation that called beginFragmentShaderOrderingINTEL(). One advantage of INTEL_fragment_shader_ordering over ARB_fragment_shader_interlock is that it provides a function that operates as a memory barrier (instead of defining a critical section) that can be called under arbitrary control flow from any function (in contrast, the begin/end of ARB_fragment_shader_interlock may only be called once, from main(), under no control flow). Signed-off-by: Kevin Rogovin <kevin.rogovin@intel.com> Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>	2018-08-28 17:15:10 +03:00
Andrii Simiklit	1b0df8a460	i965/gen6/xfb: handle case where transform feedback is not active When the SVBI Payload Enable is false I guess the register R1.4 which contains the Maximum Streamed Vertex Buffer Index is filled with zero and GS stops writing transform feedback when the transform feedback is not active. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107579 Signed-off-by: Andrii Simiklit <andrii.simiklit@globallogic.com> Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>	2018-08-28 15:32:45 +02:00
Rhys Perry	743e11c10b	docs: add forgotten features to 18.2.0 release notes Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu> Cc: 18.2: <mesa-stable@lists.freedesktop.org>	2018-08-28 13:50:51 +01:00
Erik Faye-Lund	a4e60ccb56	virgl: add debug-switch to output TGSI This is quite useful for debugging shader-transpiling issues in virglrenderer. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-By: Gert Wollny <gert.wollny@collabora.com>	2018-08-28 14:13:43 +02:00
Erik Faye-Lund	4ab06cc56e	virgl: introduce $VIRGL_DEBUG=verbose This adds an environment-variable that can be used for driver-specific flags, as well as a flag for it to enable verbose output. While we're at it, quiet some overly chatty debug-output by default. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-By: Gert Wollny <gert.wollny@collabora.com>	2018-08-28 14:13:43 +02:00
Erik Faye-Lund	1b2444dffc	virgl: replace fprintf-call with debug_printf This is the only direct call-site for fprintf in virgl; all other call-sites call debug_printf instead. So let's follow in style here. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-By: Gert Wollny <gert.wollny@collabora.com>	2018-08-28 14:13:43 +02:00
Erik Faye-Lund	2ebfa90abe	virgl: delete commented out fprintf-call This is just debug-cruft left over. Let's just get rid of it. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-By: Gert Wollny <gert.wollny@collabora.com>	2018-08-28 14:13:43 +02:00
Guido Günther	9de34b4dde	meson: Don't enable any vulkan drivers on arm, aarch64 There's no Vulkan support for arm atm. Signed-off-by: Guido Günther <guido.gunther@puri.sm> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2018-08-27 11:32:04 -07:00
Guido Günther	05e2fc6860	meson: Be a bit more helpful when arch or OS is unknown V2: Add one missing @0@ Signed-off-by: Guido Günther <guido.gunther@puri.sm> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2018-08-27 11:31:52 -07:00
Sagar Ghuge	a1e3305f75	intel/eu: print bytes instead of 32 bit hex value INTEL_DEBUG=hex prints a 32 bit hex value and, due to the endianness of the CPU, the byte order is reversed. In order to disassemble binary files, print each byte instead of a 32 bit hex value. v2: Print blank spaces in order to vertically align output of compacted instructions hex value with uncompacted instructions hex value. (Matt Turner) v3: Fix line wrap at correct length Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2018-08-27 11:07:39 -07:00
Lionel Landwerlin	440a988bd1	intel: decoder: handle 0 sized structs Gen7.5 has a BLEND_STATE of size 0 which includes a variable length group. We did not deal with that very well, leading to an endless loop. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107544 Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-08-27 18:33:18 +01:00
Rhys Perry	e56e600bd3	nv50/ir,nvc0: use constant buffers for compute when possible on Kepler+ Gives a +7.79% increase in FPS with Hitman on lowest quality settings on my GTX 1060. total instructions in shared programs : 5787979 -> 5748677 (-0.68%) total gprs used in shared programs : 669901 -> 669373 (-0.08%) total shared used in shared programs : 548832 -> 548832 (0.00%) total local used in shared programs : 21068 -> 21064 (-0.02%) local shared gpr inst bytes helped 1 0 152 274 274 hurt 0 0 0 0 0 Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Karol Herbst <kherbst@redhat.com>	2018-08-27 14:23:42 +01:00
Rhys Perry	d27c791891	nv50/ir: optimize multiplication by 16-bit immediates into two xmads Rather than the usual three that would be created. total instructions in shared programs : 5796385 -> 5786560 (-0.17%) total gprs used in shared programs : 670103 -> 669968 (-0.02%) total shared used in shared programs : 548832 -> 548832 (0.00%) total local used in shared programs : 21164 -> 21068 (-0.45%) local shared gpr inst bytes helped 1 0 64 1040 1040 hurt 0 0 27 0 0 Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Karol Herbst <kherbst@redhat.com>	2018-08-27 13:57:11 +01:00
Rhys Perry	400a4eb964	nv50/ir: optimize near power-of-twos into shladd total instructions in shared programs : 5819319 -> 5796385 (-0.39%) total gprs used in shared programs : 670571 -> 670103 (-0.07%) total shared used in shared programs : 548832 -> 548832 (0.00%) total local used in shared programs : 21164 -> 21164 (0.00%) local shared gpr inst bytes helped 0 0 318 1758 1758 hurt 0 0 63 0 0 Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Karol Herbst <kherbst@redhat.com>	2018-08-27 13:57:01 +01:00
Rhys Perry	2f52925f5c	nv50/ir: move a * b -> a << log2(b) code into createMul() With this commit, OP_MAD is handled on nv50 too. This commit is also useful for later commits. Also, instead of creating a shladd, it relies on LateAlgebraicOpt to create one. This simplifies the code and helps shader-db slightly overall. total instructions in shared programs : 5820882 -> 5819319 (-0.03%) total gprs used in shared programs : 670595 -> 670571 (-0.00%) total shared used in shared programs : 548832 -> 548832 (0.00%) total local used in shared programs : 21164 -> 21164 (0.00%) local shared gpr inst bytes helped 0 0 18 230 230 hurt 0 0 8 263 263 Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Karol Herbst <kherbst@redhat.com>	2018-08-27 13:56:47 +01:00
Rhys Perry	b60bc7a4ab	nv50/ir: optimize imul/imad to xmads This hits the shader-db numbers a good bit, though a few xmads is way faster than an imul or imad and the cost is mitigated by the next commit, which optimizes many multiplications by immediates into shorter and less register heavy instructions than the xmads. total instructions in shared programs : 5768871 -> 5820882 (0.90%) total gprs used in shared programs : 669919 -> 670595 (0.10%) total shared used in shared programs : 548832 -> 548832 (0.00%) total local used in shared programs : 21068 -> 21164 (0.46%) local shared gpr inst bytes helped 0 0 38 0 0 hurt 1 0 365 3076 3076 Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Karol Herbst <kherbst@redhat.com>	2018-08-27 13:56:44 +01:00
Rhys Perry	bcbcdf8448	gm107/ir: add support for OP_XMAD on GM107+ v4: make the immediate field 16 bits v5: don't ever emit h1 flags for immediates Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Karol Herbst <kherbst@redhat.com>	2018-08-27 13:56:41 +01:00
Rhys Perry	5d6952d2de	nv50/ir: add preliminary support for OP_XMAD v4: remove uint16_t(...) v4: don't allow immediates outside [0,65535] in insnCanLoad() Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Karol Herbst <kherbst@redhat.com>	2018-08-27 13:56:36 +01:00
vadym.shovkoplias	4a8444d5bc	glsl/linker: Allow unused in blocks which are not declared on previous stage From Section 4.3.4 (Inputs) of the GLSL 1.50 spec: "Only the input variables that are actually read need to be written by the previous stage; it is allowed to have superfluous declarations of input variables." Fixes: * interstage-multiple-shader-objects.shader_test v2: Update comment in ir.h since the usage of "used" field has been extended. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101247 Signed-off-by: Vadym Shovkoplias <vadym.shovkoplias@globallogic.com> Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-08-27 12:13:53 +02:00
Jason Ekstrand	07a227f543	nir: Pull block_ends_in_jump into nir.h We had two different implementations in different files. May as well have one and put it in nir.h. Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-08-27 02:15:38 -05:00
Samuel Iglesias Gonsálvez	59a8e0dbf8	anv: Add support for protected memory properties on anv_GetPhysicalDeviceProperties2() VkPhysicalDeviceProtectedMemoryProperties structure is new on Vulkan 1.1. Fixes Vulkan CTS CL#2849. Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-08-27 09:07:52 +02:00
Jason Ekstrand	aad501f15e	intel/tools: Add 0x in front of a couple of hex values Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-08-25 18:47:08 -05:00
Jason Ekstrand	76b0e4d8c9	anv: Fill holes in the VF VUE to zero This fixes a GPU hang in DOOM 2016 running under wine. Cc: mesa-stable@lists.freedesktop.org Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104809 Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-08-25 18:47:08 -05:00
Kai Wasserbäch	b2313ef4a8	intel: tools: Fix aubinator_error's fprintf call (format-security) The recent commit `4616639b49` introduced the new function aubinator_error() which is a trivial wrapper around fprintf() to STDERR. The call to fprintf() however is passed the message msg directly: fprintf(stderr, msg); This is a format-security violation and leads to an FTBFS with -Werror=format-security (GCC 8): ../../../src/intel/tools/aubinator.c: In function 'aubinator_error': ../../../src/intel/tools/aubinator.c:74:4: error: format not a string literal and no format arguments [-Werror=format-security] fprintf(stderr, msg); ^~~~~~~ This patch fixes this trivially by introducing a catch-all "%s" format argument. Fixes: `4616639b49` ("intel: tools: split aub parsing from aubinator") Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Signed-off-by: Kai Wasserbäch <kai@dev.carbon-project.org> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-08-25 16:52:12 +01:00
Jason Ekstrand	70de31d0c1	intel/batch_decoder: Print blend states properly Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-08-25 07:50:45 -05:00
Jason Ekstrand	cbd4bc1346	intel/batch_decoder: Fix dynamic state printing Instead of printing addresses like everyone else, we were accidentally printing the offset from state base address. Also, state_map is a void pointer so we were incrementing in bytes instead of dwords and every state other than the first was wrong. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-08-25 07:50:43 -05:00
Jason Ekstrand	d1971be6ea	intel/decoder: Print ISL formats for vertex elements Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-08-25 07:50:40 -05:00
Jason Ekstrand	2abd7ae189	intel/decoder: Clean up field iteration and fix sub-dword fields First of all, setting iter->name in advance_field is unnecessary because it gets set by gen_decode_field which gets called immediately after advance_field in the one call-site. Second, we weren't properly initializing start_bit and end_bit in the initial condition of gen_field_iterator_next so the first field of a struct would get printed wrong if it doesn't start on the first bit. This is fixed by adding an iter_start_field helper which sets the field and also sets up the other bits we need. This fixes decoding of 3DSTATE_SBE_SWIZ. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-08-25 07:50:36 -05:00
Kenneth Graunke	1281608849	gallium: Split out PIPE_CAP_TEXTURE_MIRROR_CLAMP_TO_EDGE. Some hardware can do PIPE_TEX_WRAP_MIRROR_REPEAT but not PIPE_TEX_WRAP_MIRROR_CLAMP and PIPE_TEX_WRAP_MIRROR_CLAMP_TO_BORDER. Drivers for such hardware would like to advertise support for ARB_texture_mirror_clamp_to_edge but not EXT_texture_mirror_clamp. This commit adds a new PIPE_CAP_TEXTURE_MIRROR_CLAMP_TO_EDGE bit, changes the extension enable to be based on that, and enables it in all upstream drivers which supported PIPE_CAP_TEXTURE_MIRROR_CLAMP (so they continue supporting this mode).	2018-08-24 17:25:36 -07:00
Lionel Landwerlin	f430a37fa7	intel: decoder: unify MI_BB_START field naming The batch decoder looks for a field with a particular name to decide whether an MI_BB_START leads into a second batch buffer level. Because the names are different between Gen7.5/8 and the newer generation we fail that test and keep on reading (invalid) instructions. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107544 Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-08-24 23:10:08 +01:00
Dylan Baker	7f745c19c1	docs: Update calendar, news, relnotes for 18.1.7	2018-08-24 09:35:24 -07:00
Dylan Baker	82c2e7bf9e	docs: Add mesa 18.1.7 notes	2018-08-24 09:34:03 -07:00
Dylan Baker	2d8569073e	docs: Add mesa 18.1.7 docs	2018-08-24 09:33:59 -07:00
Andres Gomez	0d3bb146a8	docs: update calendar 18.2.0-rc4 is out, extend to 18.2.0-rc5 Signed-off-by: Andres Gomez <agomez@igalia.com>	2018-08-24 18:58:00 +03:00
Kevin Rogovin	e345247092	docs/relnotes: Mark NV_fragment_shader_interlock support in i965 Acked-by: Jason Ekstrand <jason@jlekstrand.net>	2018-08-24 08:59:54 -05:00
Emil Velikov	081395e99d	egl/drm: use gbm_dri_bo() wrapper Remove the explicit cast, using the appropriate wrapper instead. Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> Acked-by: Daniel Stone <daniels@collabora.com>	2018-08-24 11:53:24 +01:00
Emil Velikov	7b4269a5e0	egl/drm: use gbm_dri_surface() wrapper Remove the explicit cast, using the appropriate wrapper instead. Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> Acked-by: Daniel Stone <daniels@collabora.com>	2018-08-24 11:53:20 +01:00
Emil Velikov	7eb4a28d41	egl/drm: use gbm_dri_device() wrapper Remove the explicit cast, using the appropriate wrapper instead. Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> Acked-by: Daniel Stone <daniels@collabora.com>	2018-08-24 11:52:48 +01:00
Emil Velikov	2c049384b1	egl/android: simplify device open/probe Currently droid_probe_device does not do any 'probing'; it only filters out a device if it doesn't match the vendor string given. Rename the function, straighten the return type and call it only as needed - when an actual vendor string is provided. Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Tomasz Figa <tfiga@chromium.org>	2018-08-24 11:52:44 +01:00
Emil Velikov	2f8403a4ca	egl/android: remove drmVersion::name NULL check The name string is guaranteed to be non-NULL. Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Tomasz Figa <tfiga@chromium.org>	2018-08-24 11:52:41 +01:00
Emil Velikov	d1211f3112	egl/android: remove droid_probe_driver() The function name is misleading - it effectively checks if loader_get_driver_for_fd fails, which can happen only on strdup error - a close to impossible scenario. Drop the function - we call the loader API at a later stage. Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Tomasz Figa <tfiga@chromium.org>	2018-08-24 11:52:39 +01:00
Emil Velikov	9b5bf7afce	egl/android: use strcmp with drmVersion::name The name string is guaranteed to be NULL terminated. Drop the explicit length check that comes with strncmp(). Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Tomasz Figa <tfiga@chromium.org>	2018-08-24 11:52:37 +01:00
Emil Velikov	3827966643	egl/android: use drmDevice instead of the manual /dev/dri iteration Replace the manual handling of /dev/dri in favor of the drmDevice API. The latter provides a consistent way of enumerating the devices, providing device details as needed. v2: - Use ARRAY_SIZE (Frank) - s/famour/favor/ typo (Frank) - Make MAX_DRM_DEVICES a macro - fix vla errors (RobF) - Remove left-over dev_path instance (RobF) Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Robert Foss <robert.foss@collabora.com> (v1) Reviewed-by: Tomasz Figa <tfiga@chromium.org>	2018-08-24 11:50:36 +01:00
Emil Velikov	cff80b6c15	Revert "configure: allow building with python3" This reverts commit `ae7898dfdb`. Turns out the python scripts are _not_ fully python 3 compatible. As Ilia reported using get_xmlpool.py with LANG=C produces some weird output - see the link for details. Even though the issue was spotted with the autoconf build, it exposes a genuine problem with the script (and lack of lang handling of the meson build.) https://lists.freedesktop.org/archives/mesa-dev/2018-August/203508.html	2018-08-24 11:14:15 +01:00
Emil Velikov	7a4d2d1fdf	Revert "travis: use python3 for the autoconf builds" This reverts commit `855af9a5a2`. Turns out the python scripts are _not_ fully python 3 compatible. As Ilia reported using get_xmlpool.py with LANG=C produces some weird output - see the link for details. Even though the issue was spotted with the autoconf build, it exposes a genuine problem with the script (and lack of lang handling of the meson build.) https://lists.freedesktop.org/archives/mesa-dev/2018-August/203508.html	2018-08-24 11:10:24 +01:00
Kenneth Graunke	93e8e17fa4	Revert "mesa: bump GL_MAX_ELEMENTS_INDICES and GL_MAX_ELEMENTS_VERTICES" This reverts commit `095515e16c`. This breaks KHR-GL46.map_buffer_alignment.functional on i965. This code was apparently not reviewed and I don't know why we would move from a driver configurable constant to a hardcoded value for all drivers. This really looks like an accidental hack push.	2018-08-24 00:36:01 -07:00
Kenneth Graunke	9d670fd86c	Revert recent changes about not including compute in combined limits. As far as I can tell, no one reviewed these changes, they made i965 assert fail on driver load, and I am not certain they are correct. (Hopefully reverting these does not break radeonsi too badly...) The uniform related changes seem fine and reasonable, but the texture image units change is possibly incorrect. According to the OES_tessellation_shader spec issue 5: (5) How are aggregate shader limits computed? RESOLVED: Following the GL 4.4 model, but we restrict uniform buffer bindings to 12/stage instead of 14, this results in MAX_UNIFORM_BUFFER_BINDINGS = 72 This is 12 bindings/stage * 6 shader stages, allowing a static partitioning of the bindings even though at most 5 stages can appear in a program object). MAX_COMBINED_UNIFORM_BLOCKS = 60 This is 12 blocks/stage * 5 stages, since compute shaders can't be mixed with other stages. MAX_COMBINED_TEXTURE_IMAGE_UNITS = 96 This is 16 textures/stage * 6 stages. which definitely is including compute shaders in that last limit. Not including compute shaders breaks the following test: dEQP-GLES31.functional.state_query.integer.max_combined_texture_image_units_getinteger There was enough breakage that I figured we should just send this back to the drawing board. Revert "i965: don't include compute resources in "Combined" limits" Revert "st/mesa: don't include compute resources in "Combined" limits" Revert "mesa: don't include compute resources in MAX_COMBINED_* limits" This reverts commit `b03dcb1e5f`. This reverts commit `cff290df4c`. This reverts commit `45f87a48f9`.	2018-08-24 00:36:01 -07:00
Roland Scheidegger	8e1be9a34a	gallivm: don't use saturated unsigned add/sub intrinsics for llvm 8.0 These have been removed. Unfortunately auto-upgrade doesn't work for jit. (Worse, it seems we don't get a compilation error anymore when compiling the shader, rather llvm will just do a call to a null function in the jitted shaders making it difficult to detect when intrinsics vanish.) Luckily the signed ones are still there. I helped convince llvm that removing them is a bad idea for now, since while the unsigned ones have sort of agreed-upon simplest patterns to replace them with, this is not the case for the signed ones, and they require _significantly_ more complex patterns - to the point that the recognition is IMHO probably unlikely to ever work reliably in practice (due to other optimizations interfering). (Even for the relatively trivial unsigned patterns, llvm already added test cases where recognition doesn't work, unsaturated add followed by saturated add may produce atrocious code.) Nevertheless, it seems there's a serious quest to squash all cpu-specific intrinsics going on, so I'd expect patches to nuke them as well to resurface. Adapt the existing fallback code to match the simple patterns llvm uses and hope for the best. I've verified with lp_test_blend that it does produce the expected saturated assembly instructions. Our cmp/select build helpers don't use boolean masks, but that doesn't seem to interfere with llvm's ability to recognize the pattern. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106231 Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2018-08-24 07:50:13 +02:00
Marek Olšák	45b5f5fa25	st/mesa: expose KHR_texture_compression_astc_sliced_3d This is ASTC 2D LDR allowing texture arrays and 3D, compressing each slice as a separate 2D image. Tested by piglit. Trivial.	2018-08-24 00:36:18 -04:00
Marek Olšák	dae4cf397d	st/mesa: expose EXT_disjoint_timer_query same cap as ARB_timer_query, no changes needed, tested by piglit	2018-08-24 00:36:18 -04:00
Marek Olšák	263c962cfd	mesa: expose EXT_vertex_attrib_64bit because the closed driver exposes it. It's the same as the ARB extension. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2018-08-24 00:36:18 -04:00
Marek Olšák	5c90091036	mesa: expose AMD_query_buffer_object it's a subset of the ARB extension. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2018-08-24 00:36:18 -04:00
Marek Olšák	056b9a5a36	mesa: expose AMD_multi_draw_indirect because the closed driver exposes it. This is equivalent to the ARB extension. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2018-08-24 00:36:18 -04:00
Marek Olšák	b3c17330e6	mesa: expose AMD_gpu_shader_int64 because the closed driver exposes it. It's equivalent to ARB_gpu_shader_int64. In this patch, I did everything the same as we do for ARB_gpu_shader_int64. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2018-08-24 00:36:18 -04:00
Marek Olšák	1cf3631b9c	mesa: expose ARB_post_depth_coverage in the Compatibility profile It only contains GLSL changes. v2: allow the layout qualifier on GLSL <= 1.30	2018-08-24 00:36:18 -04:00
Jason Ekstrand	8d8222461f	intel/nir: Enable nir_opt_find_array_copies We have to be a bit careful with this one because we want it to run in the optimization loop but only in the first brw_nir_optimize call. Later calls assume that we've lowered away copy_deref instructions and we don't want to introduce any more. Shader-db results on Kaby Lake: total instructions in shared programs: 15176942 -> 15176942 (0.00%) instructions in affected programs: 0 -> 0 helped: 0 HURT: 0 In spite of the lack of any shader-db improvement, this patch completely eliminates spilling in the Batman: Arkham City tessellation shaders. This is because we are now able to detect that the temporary array created by DXVK for storing TCS inputs is a copy of the input arrays and use indirect URB reads instead of making a copy of 4.5 KiB of input data and then indirecting on it with if-ladders. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2018-08-23 21:47:51 -05:00
Jason Ekstrand	53072582dc	nir: Add an array copy optimization This peephole optimization looks for a series of load/store_deref or copy_deref instructions that copy an array from one variable to another and turns it into a copy_deref that copies the entire array. The pattern it looks for is extremely specific but it's good enough to pick up on the input array copies in DXVK and should also be able to pick up the sequence generated by spirv_to_nir for an OpLoad of a large composite followed by OpStore. It can always be improved later if needed. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2018-08-23 21:47:47 -05:00
Jason Ekstrand	a4a9c07549	intel/nir: Use nir_shrink_vec_array_vars Shader-db results on Kaby Lake: total instructions in shared programs: 15177605 -> 15176765 (<.01%) instructions in affected programs: 4259 -> 3419 (-19.72%) helped: 1 HURT: 0 total spills in shared programs: 10954 -> 10855 (-0.90%) spills in affected programs: 295 -> 196 (-33.56%) helped: 1 HURT: 0 total fills in shared programs: 22222 -> 22117 (-0.47%) fills in affected programs: 417 -> 312 (-25.18%) helped: 1 HURT: 0 The helped shader is from the OglCSDof synmark test. On my Kaby Lake laptop, the actual framerate of the benchmark didn't appear to improve beyond the noise. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2018-08-23 21:46:56 -05:00
Jason Ekstrand	be8d009908	nir: Add an array-of-vector variable shrinking pass This pass looks for variables with vector or array-of-vector types and narrows the type to only the components used. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2018-08-23 21:46:56 -05:00
Jason Ekstrand	02a5442dd7	intel/nir: Use the new structure and array splitting passes We call structure splitting once because it is guaranteed to split all the structures in the entire shader in one go. We call array splitting in the loop in case future optimizations turn indirects into direct dereferences and we can split more arrays. Shader-db results on Kaby Lake: total instructions in shared programs: 15177605 -> 15177605 (0.00%) instructions in affected programs: 0 -> 0 helped: 0 HURT: 0 This is unsurprising because nir_lower_vars_to_ssa already effectively does structure and array splitting internally. It doesn't actually split the variables but its ability to reason about aliasing in the presence of arrays and structures and pick out scalars or vectors to be lowered to SSA values is fairly advanced. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2018-08-23 21:44:14 -05:00
Jason Ekstrand	fa6417495c	nir: Add an array splitting pass This pass looks for array variables where at least one level of the array is never indirected and splits it into multiple smaller variables. This pass doesn't really do much now because nir_lower_vars_to_ssa can already see through arrays of arrays and can detect indirects on just one level or even see that arr[i][0][5] does not alias arr[i][1][j]. This pass exists to help other passes more easily see through arrays of arrays. If a back-end does implement arrays using scratch or indirects on registers, having more smaller arrays is likely to have better memory efficiency. v2 (Jason Ekstrand): - Better comments and naming (some from Caio) - Rework to use one hash map instead of two v2.1 (Jason Ekstrand): - Fix a couple of bugs that were added in the rework including one which basically prevented it from running Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2018-08-23 21:44:14 -05:00
Jason Ekstrand	26eb077ec4	nir: Add a structure splitting pass This pass doesn't really do much now because nir_lower_vars_to_ssa can already see through structures and considers them to be "split". This pass exists to help other passes more easily see through structure variables. If a back-end does implement arrays using scratch or indirects on registers, having more smaller arrays is likely to have better memory efficiency. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2018-08-23 21:44:14 -05:00
Jason Ekstrand	b489998e63	nir/types: Add array_or_matrix helpers Reviewed-by: Thomas Helland <thomashelland90@gmail.com>	2018-08-23 21:44:14 -05:00
Kenneth Graunke	b03dcb1e5f	i965: don't include compute resources in "Combined" limits The combined limits should only include shader stages that can be active at the same time. We don't need to include compute. See also `cff290df4c` for st/mesa. Unbreaks i965 from assert failing on driver load since Marek's `45f87a48f9`, which dropped the core Mesa capabilities before adjusting driver limits down to match.	2018-08-23 17:27:27 -07:00
Marek Olšák	9176703788	radeonsi: increase the maximum UBO size to 2 GB Same as the closed driver. This causes a failure in GL45-CTS.compute_shader.max, which has a trivial bug. Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>	2018-08-23 16:56:17 -04:00
Marek Olšák	5693ca865d	radeonsi: bump MAX_GS_INVOCATIONS same as the closed driver Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>	2018-08-23 16:56:17 -04:00
Marek Olšák	d3c1b212bc	gallium: add PIPE_CAP_MAX_SHADER_BUFFER_SIZE Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>	2018-08-23 16:56:17 -04:00
Marek Olšák	f6ccd594e7	gallium: add PIPE_CAP_MAX_GS_INVOCATIONS Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>	2018-08-23 16:56:17 -04:00
Marek Olšák	8c71b70f07	tgsi/ureg: don't call tgsi_sanity when it's too slow Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>	2018-08-23 16:56:17 -04:00
Marek Olšák	80aecad0ca	st/mesa: fix up uniform limits to be able to expose large UBOs Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>	2018-08-23 16:56:17 -04:00
Marek Olšák	cff290df4c	st/mesa: don't include compute resources in "Combined" limits The combined limits should only include shader stages that can be active at the same time. Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>	2018-08-23 16:56:17 -04:00
Marek Olšák	d36af3a9d9	st/mesa: set ctx->Const.SubPixelBits Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>	2018-08-23 16:56:17 -04:00
Marek Olšák	3867af39f9	glsl: fix error checking against MAX_UNIFORM_LOCATIONS Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>	2018-08-23 16:56:17 -04:00
Marek Olšák	f01338118c	mesa: make MaxCombinedUniformComponents 64-bit to allow large UBOs Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>	2018-08-23 16:56:17 -04:00
Marek Olšák	a8b71f2db8	mesa: add ctx->Const.MaxGeometryShaderInvocations radeonsi wants to report a different value Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>	2018-08-23 16:56:17 -04:00
Marek Olšák	45f87a48f9	mesa: don't include compute resources in MAX_COMBINED_* limits 5 is the maximum number of shader stages that can be used by 1 execution call at the same time (e.g. a draw call). The limit ensures that each stage can use all of its binding points. Compute is separate and doesn't need the 5x multiplier. Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>	2018-08-23 16:56:17 -04:00
Marek Olšák	095515e16c	mesa: bump GL_MAX_ELEMENTS_INDICES and GL_MAX_ELEMENTS_VERTICES same number as our closed GL driver v2: don't use MaxArrayLockSize Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>	2018-08-23 16:56:17 -04:00
Marek Olšák	356ff963ec	mesa: remove incorrect change for EXT_disjoint_timer_query Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>	2018-08-23 16:56:17 -04:00
Marek Olšák	37eee90df7	glapi: actually implement GL_EXT_robustness for GLES The extension was exposed but not the functions. This fixes: dEQP-GLES31.functional.debug.negative_coverage.get_error.buffer.readn_pixels dEQP-GLES31.functional.debug.negative_coverage.get_error.state.get_nuniformfv dEQP-GLES31.functional.debug.negative_coverage.get_error.state.get_nuniformiv Cc: 18.1 18.2 <mesa-stable@lists.freedesktop.org> Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2018-08-23 16:54:30 -04:00
Kenneth Graunke	578e45ab7b	intel/decoder: Decode SFIXED values. This lets us examine SAMPLER_STATE's LOD Bias field, among other things. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-08-23 13:04:53 -07:00
Emil Velikov	855af9a5a2	travis: use python3 for the autoconf builds Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2018-08-23 17:00:28 +01:00
Emil Velikov	ae7898dfdb	configure: allow building with python3 Pretty much all of the scripts are python2+3 compatible. Check and allow using python3, while adjusting the PYTHON2 refs. Note: - python3.4 is used as it's the earliest supported version - python3 chosen prior to python2 Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Acked-by: Eric Engestrom <eric.engestrom@intel.com>	2018-08-23 17:00:13 +01:00
Emil Velikov	c51e7486d9	bin/git_sha1_gen.py: remove execute bit/shebang The script is executed explicitly via the build system, that uses PYTHON/prog_python and equivalent. Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2018-08-23 17:00:04 +01:00
Eric Engestrom	993a456360	vk/wsi: avoid reading uninitialised memory It will be ignored by x11_swapchain_result() anyway (because reaching the `fail` label without setting `result` means the swapchain status was already a hard error), but the compiler still complains about reading uninitialised memory. While at it, drop the unused assignment right before returning. Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2018-08-23 14:47:59 +01:00
Eric Engestrom	a0f6a11944	egl: drop unused _EGL_BUILT_IN_DRIVER_DRI2 Unused since `b174a1ae72` "egl: Simplify the "driver" interface". Signed-off-by: Eric Engestrom <eric@engestrom.ch> Reviewed-by: Eric Anholt <eric@anholt.net>	2018-08-23 14:47:59 +01:00
Samuel Pitoiset	87fbc16e34	radv/gfx9: implement coherent shaders for VK_ACCESS_SHADER_READ_BIT Single-sample color and single-sample depth (not stencil) are coherent with shaders. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-08-23 15:42:56 +02:00
Mathieu Bridon	6027d354d1	bin/install_megadrivers.py: Remove shebang and executable bit Since the script is never executed directly, but launched by Meson as an argument to the Python interpreter, those are not needed any more. In addition, they are the reason this script was missed when I moved the Meson buildsystem to Python 3, so removing them helps avoiding future confusion. Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2018-08-23 12:12:06 +01:00
Mathieu Bridon	8c8fd0bb8e	meson: Run the install script with Python 3 The script was being run directly as an executable, and it has a Python 2 shebang. Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2018-08-23 12:12:06 +01:00
Emil Velikov	48820ed8da	glsl: remove execute bit and shebang from python tests Just like the rest of the tree - these should be run either as part of the build system check target, or at the very least with an explicitly versioned python executable. Fixes: `db8cd8e367` ("glcpp/tests: Convert shell scripts to a python script") Fixes: `97c28cb082` ("glsl/tests: Convert optimization-test.sh to pure python") Fixes: `3b52d29227` ("glsl/tests: reimplement warnings-test in python") Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2018-08-23 12:02:45 +01:00
Emil Velikov	e39b916d0c	docs: update required mako version The requirement was bumped a while back, but we forgot to update the docs. Fixes: `ed871af91c` ("configure.ac: raise Mako required version to 0.8.0") Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2018-08-23 12:02:45 +01:00
Emil Velikov	e7149369bd	configure: use distutils in ax_check_python_mako_module Handling the version comparison by hand is a bad idea. Python has a handy module distutils for that - use it. Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2018-08-23 11:59:48 +01:00
Emil Velikov	df2042d99a	configure: enforce python 2.7 with AM_PATH_PYTHON Currently we use AC_CHECK_PROGS looking for python2.7, python2 and finally python. That is due to the varying names used across the different OS. Use the handy AM_PATH_PYTHON which finds the correct name and checks for the version. Note: python2.7 has been an unofficial requirement for quite some time. Update the docs to reflect that. Cc: Dylan Baker <dylan@pnwbakers.com> Signed-off-by: Emil Velikov <emil.velikov@collabora.com>	2018-08-23 11:55:55 +01:00
Ian Romanick	c7c0b391ef	i965: Enable INTEL_shader_atomic_float_minmax on Gen9+ Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2018-08-22 20:31:32 -07:00
Ian Romanick	59c17dbc6c	i965: Sort Gen9+ extension enables This is a strictly alphabetic sort, as is done in extensions_table.h There are other options. We should pick one and document it. Right now, this file is chaos. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2018-08-22 20:31:32 -07:00
Ian Romanick	d515c75463	intel/compiler: Implement untyped atomic float min, max, and compare-swap dataport messages v2: Split changes to the message type field to another patch. Suggested by Caio. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2018-08-22 20:31:32 -07:00
Ian Romanick	f347348f8a	intel/compiler: Expand untyped atomic message type field by a bit This is necessary for a new Gen9 message type that will be added in the next patch. There are also Gen8 message types that need the extra bit (mostly for bindless). v2: Split off from the next patch. Suggested by Caio. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2018-08-22 20:31:32 -07:00
Ian Romanick	d628642a34	intel/compiler: Silence unused parameter warnings src/intel/compiler/brw_disasm_info.c: In function ‘nir_print_instr’: src/intel/compiler/brw_disasm_info.c:30:61: warning: unused parameter ‘instr’ [-Wunused-parameter] __attribute__((weak)) void nir_print_instr(const nir_instr *instr, FILE *fp) {} ^~~~~ src/intel/compiler/brw_disasm_info.c:30:74: warning: unused parameter ‘fp’ [-Wunused-parameter] __attribute__((weak)) void nir_print_instr(const nir_instr *instr, FILE *fp) {} ^~ src/intel/compiler/brw_disasm.c: In function ‘src_ia1’: src/intel/compiler/brw_disasm.c:850:18: warning: unused parameter ‘_reg_file’ [-Wunused-parameter] unsigned _reg_file, ^~~~~~~~~ src/intel/compiler/brw_fs_surface_builder.cpp: In function ‘void brw::surface_access::emit_byte_scattered_write(const brw::fs_builder&, const fs_reg&, const fs_reg&, const fs_reg&, unsigned int, unsigned int, unsigned int, brw_predicate)’: src/intel/compiler/brw_fs_surface_builder.cpp:193:57: warning: unused parameter ‘size’ [-Wunused-parameter] unsigned dims, unsigned size, ^~~~ v2: Update commit message. brw_fs_generator.cpp warnings were already fixed by another patch. Noticed by Caio. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2018-08-22 20:31:32 -07:00
Ian Romanick	0842655ac6	nir: Add floating point atomic min, max, and compare-swap intrinsics Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2018-08-22 20:31:32 -07:00
Ian Romanick	69ce7baa9e	nir: Add floating point atomic add intrinsics Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2018-08-22 20:31:32 -07:00
Ian Romanick	a390158d10	glsl: Add support for lowering shared-variable float atomics Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2018-08-22 20:31:32 -07:00
Ian Romanick	39bf3100ac	glsl: Add support for lowering SSBO float atomics Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2018-08-22 20:31:32 -07:00
Ian Romanick	280ab4afa8	glsl: Add built-in functions for INTEL_shader_atomic_float_minmax Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2018-08-22 20:31:32 -07:00
Ian Romanick	c9d52c83a4	mesa: Extension boilerplate for INTEL_shader_atomic_float_minmax Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2018-08-22 20:31:32 -07:00
Ian Romanick	346321a836	docs: Initial version of INTEL_shader_atomic_float_minmax spec v2: Describe interactions with the capabilities added by SPV_INTEL_shader_atomic_float_minmax v3: Remove 64-bit float support. v4: Explain NaN issues. Explain issues with atomicMin(-0, +0) and atomicMax(-0, +0). v5: Fix whitespace issues noticed by Caio. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2018-08-22 20:31:32 -07:00
Ian Romanick	88b6c7bc14	glsl: Add built-in functions for NV_shader_atomic_float Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2018-08-22 20:31:32 -07:00
Ian Romanick	9527bb4e70	mesa: Extension boilerplate for NV_shader_atomic_float Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2018-08-22 20:31:32 -07:00
Gurchetan Singh	c731508b98	meson: fix egl build for android Haven't tested this, but we do include loader.h in platform_android.c Fixes: `c5ec155685` ("meson: wire up egl/android") Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2018-08-22 16:47:19 -07:00
Gurchetan Singh	ec6cb01e21	meson: fix egl build for surfaceless Without this, I get: > platform_surfaceless.c:38:10: fatal error: 'loader.h' file not found > #include "loader.h" > ^~~~~~~~~~ > 1 error generated. Fixes: `108d257a16` ("meson: build libEGL") Reviewed-by: Dylan Baker <dylan@pnwbakers.com> v2: Split up patches, modify commit message (Dylan)	2018-08-22 16:47:09 -07:00
Caio Marcelo de Oliveira Filho	410de0e3f1	nir: Give end_block its own index Since there's no particular reason for the index to be 0, choose an index that is not used by other block. This is convenient when we store "per-block" data in an array AND look for the successors data (e.g. any kind of backwards data-flow analysis). v2: Add a note about end_block's index. (Jason) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-08-22 14:41:26 -07:00
Caio Marcelo de Oliveira Filho	8364ec3fce	nir: Skip common instructions when comparing deref paths Deref paths may share the same deref instructions in their chains, e.g. ssa_100 = deref_var A ssa_101 = deref_struct "array_field" of ssa_100 ssa_102 = deref_array "[1]" of ssa_101 ssa_103 = deref_struct "field_a" of ssa_102 ssa_104 = deref_struct "field_a" of ssa_103 when comparing the two last deref instructions, their paths will share a common sequence ssa_100, ssa_101, ssa_102. This patch skips to next iteration if the deref instructions are the same. Path[0] (the var) is still handled specially, so in the case above, only ssa_101 and ssa_102 will be skipped. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-08-22 14:41:26 -07:00
Caio Marcelo de Oliveira Filho	5196041e93	nir: Export deref comparison functions Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-08-22 14:41:26 -07:00
Caio Marcelo de Oliveira Filho	7f8ecedced	util/dynarray: add a clone function v2: Fix mem_ctx parameter type. (Thomas) Reviewed-by: Thomas Helland <thomashelland90@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-08-22 14:41:26 -07:00
Mariusz Ceier	61b84b8c14	amd/addrlib: Fix include path for c99_compat.h Without this patch mesa doesn't compile: In file included from ../mesa-9999/src/amd/addrlib/addrinterface.cpp:39: ../mesa-9999/src/util/macros.h:29:10: fatal error: c99_compat.h: No such file or directory #include "c99_compat.h" ^~~~~~~~~~~~~~ compilation terminated. Fixes: `15ca5ce99a` ("amd/addrlib: mark returnCode as MAYBE_UNUSED in") Signed-off-by: Mariusz Ceier <mceier+mesa-dev@gmail.com> Acked-by: Kai Wasserbäch <kai@dev.carbon-project.org> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2018-08-22 14:39:02 -07:00
Grazvydas Ignotas	0076ea92a9	vulkan/wsi: fix pointer-integer conversion warnings For 32bit build. Trivial. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-08-23 00:34:32 +03:00
Grazvydas Ignotas	9177074524	radv: use different builtin shader cache for 32bit Currently if 64bit and 32bit programs are used interchangeably, radv will keep overwriting the cache. Use separate cache files to avoid that. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-08-23 00:34:32 +03:00
Grazvydas Ignotas	356f6673d6	radv: place pointer length into cache uuid Thanks to reproducible builds, binary file timestamps may be identical for both 32bit and 64bit packages when built from the same source. This means radv will use the same cache for both 32 and 64 bit processes, which leads to crashes. Conveniently there is a spare byte in cache_uuid, let's place the pointer size there. Fixes: `f4e499ec79` "radv: add initial non-conformant radv vulkan driver" CC: 18.1 18.2 <mesa-stable@lists.freedesktop.org> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107601 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105904 Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-08-23 00:34:32 +03:00
Grazvydas Ignotas	2edf47edf0	llvmpipe: add cc clobber to inline asm The bsr instruction modifies flags, so that needs to be indicated to the compiler. No effect on generated code, but still needed for correctness. Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2018-08-23 00:34:32 +03:00
Nanley Chery	6d80b0b4ba	intel/isl: Avoid tiling some 16K-wide render targets Fix rendering issues on BDW and SKL. Fixes: `0288fe8d04` ("i965/miptree: Use the correct BLT pitch") Fixes the following regressions seen exclusively on SKL: * KHR-GL46.texture_barrier_ARB.disjoint-texels * KHR-GL46.texture_barrier_ARB.overlapping-texels * KHR-GL46.texture_barrier.disjoint-texels * KHR-GL46.texture_barrier.overlapping-texels and both on BDW and SKL: * GTF-GL46.gtf21.GL2FixedTests.buffer_corners.buffer_corners * GTF-GL46.gtf21.GL2FixedTests.stencil_plane_corners.stencil_plane_corners v2: Note the fixed tests (Andres). Don't cause failures with multisampled buffers (Andres). Don't hamper SKL GT4 (Ken). v3: Fix the Fixes tag (Dylan). Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107359 Cc: <mesa-stable@lists.freedesktop.org> Tested-by: Andres Gomez <agomez@igalia.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-08-22 13:53:19 -07:00
Nanley Chery	b041fc0649	i965/miptree: Fix can_blit_slice() Check the destination's row pitch against the BLT engine's row pitch limitation as well. Fixes: `0288fe8d04` ("i965/miptree: Use the correct BLT pitch") v2: Fix the Fixes tag (Dylan). Check the destination row pitch (Chris). Reported-by: Dylan Baker <dylan@pnwbakers.com> Cc: <mesa-stable@lists.freedesktop.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-08-22 13:53:02 -07:00
Nanley Chery	030b6efcfd	i965/miptree: Use miptree_map in map_blit functions This struct contains all the data of interest. can_blit_slice() will use it in the next patch to calculate the correct pitch. Suggested-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: <mesa-stable@lists.freedesktop.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-08-22 13:23:17 -07:00
Rafael Antognolli	f8cfc77660	intel/tools/aubwrite: Always use physical addresses for traces. It looks like we can't rely on the simulator to always translate virtual addresses to physical ones correctly. So let's use physical everywhere. Since our current GGTT maps virtual to physical addresses in a 1:1 way, no further changes are required. Additionally, we have other address spaces not in use right now. So let's make it easier to switch which one we are using by putting the default one into the aub_file struct. Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-08-22 12:52:41 -07:00
Rafael Antognolli	e82d8fa964	intel/tools/aubwrite: Rename "legacy" to "Trace Block". Hopefully it's a little more descriptive, and more accurate. Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-08-22 12:52:41 -07:00
Jason Ekstrand	68ae66542a	nir/vars_to_ssa: Don't build deref nodes for non-local variables Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2018-08-22 14:17:38 -05:00
Marek Olšák	e80e8d7adc	ac: fix WAITCNT flags for GFX9 Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2018-08-22 14:34:43 -04:00
Kai Wasserbäch	c836a751bc	amd/addrlib: mark physicalSliceSize as MAYBE_UNUSED in Addr::V1::EgBasedLib::HwlGetSizeAdjustmentMicroTiled Only used, when asserts are enabled. Fixes an unused-but-set-variable warning with GCC 8: ../../../src/amd/addrlib/r800/egbaddrlib.cpp: In member function 'virtual long long unsigned int Addr::V1::EgBasedLib::HwlGetSizeAdjustmentMicroTiled(unsigned int, unsigned int, ADDR_SURFACE_FLAGS, unsigned int, unsigned int, unsigned int, unsigned int, unsigned int) const': ../../../src/amd/addrlib/r800/egbaddrlib.cpp:4111:13: warning: variable 'physicalSliceSize' set but not used [-Wunused-but-set-variable] UINT_64 physicalSliceSize; ^~~~~~~~~~~~~~~~~ Signed-off-by: Kai Wasserbäch <kai@dev.carbon-project.org> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-08-22 14:33:21 -04:00
Kai Wasserbäch	2e0586e379	amd/addrlib: mark numPipes as MAYBE_UNUSED in Addr::V1::EgBasedLib::SanityCheckMacroTiled (v2) Only used, when asserts are enabled. Fixes an unused-variable warning with GCC 8: ../../../src/amd/addrlib/r800/egbaddrlib.cpp: In member function 'int Addr::V1::EgBasedLib::SanityCheckMacroTiled(ADDR_TILEINFO*) const': ../../../src/amd/addrlib/r800/egbaddrlib.cpp:982:13: warning: unused variable 'numPipes' [-Wunused-variable] UINT_32 numPipes = HwlGetPipes(pTileInfo); ^~~~~~~~ v2: Don't realign other variable definitions, to keep in line with file style (Marek) Signed-off-by: Kai Wasserbäch <kai@dev.carbon-project.org> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-08-22 14:33:21 -04:00
Kai Wasserbäch	6a7ef7c7dc	amd/addrlib: mark pEqToCheck as MAYBE_UNUSED in Addr::V2::Gfx9Lib::ComputeStereoInfo (v2) Only used, when asserts are enabled. Fixes an unused-variable warning with GCC 8: ../../../src/amd/addrlib/gfx9/gfx9addrlib.cpp: In member function 'ADDR_E_RETURNCODE Addr::V2::Gfx9Lib::ComputeStereoInfo(const ADDR2_COMPUTE_SURFACE_INFO_INPUT*, ADDR2_COMPUTE_SURFACE_INFO_OUTPUT*, unsigned int) const': ../../../src/amd/addrlib/gfx9/gfx9addrlib.cpp:3879:34: warning: unused variable 'pEqToCheck' [-Wunused-variable] const ADDR_EQUATION *pEqToCheck = &m_equationTable[eqIndex]; ^~~~~~~~~~ v2: Don't realign other variable definitions, to keep in line with file style (Marek) Signed-off-by: Kai Wasserbäch <kai@dev.carbon-project.org> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-08-22 14:33:21 -04:00
Kai Wasserbäch	556f89a715	amd/addrlib: mark microBlockDim as MAYBE_UNUSED in Addr::V2::Gfx9Lib::HwlComputeBlock256Equation Only used, when asserts are enabled. Fixes an unused-but-set-variable warning with GCC 8: ../../../src/amd/addrlib/gfx9/gfx9addrlib.cpp: In member function 'virtual ADDR_E_RETURNCODE Addr::V2::Gfx9Lib::HwlComputeBlock256Equation(AddrResourceType, AddrSwizzleMode, unsigned int, ADDR_EQUATION*) const': ../../../src/amd/addrlib/gfx9/gfx9addrlib.cpp:2473:15: warning: variable 'microBlockDim' set but not used [-Wunused-but-set-variable] Dim2d microBlockDim = Block256_2d[elementBytesLog2]; ^~~~~~~~~~~~~ Signed-off-by: Kai Wasserbäch <kai@dev.carbon-project.org> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-08-22 14:33:21 -04:00
Kai Wasserbäch	15ca5ce99a	amd/addrlib: mark returnCode as MAYBE_UNUSED in ElemGetExportNorm Only used, when asserts are enabled. Fixes an unused-but-set-variable warning with GCC 8: ../../../src/amd/addrlib/addrinterface.cpp: In function 'int ElemGetExportNorm(ADDR_HANDLE, const ELEM_GETEXPORTNORM_INPUT*)': ../../../src/amd/addrlib/addrinterface.cpp:835:23: warning: variable 'returnCode' set but not used [-Wunused-but-set-variable] ADDR_E_RETURNCODE returnCode = ADDR_OK; ^~~~~~~~~~ Signed-off-by: Kai Wasserbäch <kai@dev.carbon-project.org> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-08-22 14:33:21 -04:00
Lionel Landwerlin	8b0e48887f	intel: aubinator_viewer: add urb view This is available through a "Show URB" button on the 3DPRIMITIVE instructions. v2: Fix urb allocation end value in tooltip (Rafael) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>	2018-08-22 18:02:11 +01:00
Lionel Landwerlin	d1c4a62bf8	intel: aubinator_viewer: store urb state during decoding Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>	2018-08-22 18:02:11 +01:00
Lionel Landwerlin	38f10d5a03	intel: tools: add aubinator viewer A graphical user interface version of aubinator. Allows you to : - simultaneously look at multiple points in the aub file (using all the goodness of the existing decoding in aubinator) - edit an aub file v2: Switch from GLFW to GTK+3 v3: Fix warning when exiting Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Acked-by: Rafael Antognolli <rafael.antognolli@intel.com> (v1)	2018-08-22 18:02:11 +01:00
Lionel Landwerlin	ea83a1d304	intel: tools: import ImGui We want to add a new UI tool to decode aub files. This will use the Dear ImGui library to render its interface. The build of this UI toolkit is conditional to -Dwith_tools=intel-ui which supersedes -Dwith_tools=intel. The main way to use ImGui is to embed its source code at a particular revision. Most embedding projects have to do a bit of integration which is really specific to one's project. In our case the only modification is to include libepoxy. We also choose to use Gtk+3 for the window system integration. As opposed to the previous version of this patch using GLFW, Gtk+ is able to handle X11/Wayland session as well as proper DPI scaling on retina monitors. The import was done at this commit (https://github.com/ocornut/imgui) : commit 6211f40f3d903dd9df961256e044029c49793aa3 Author: omar <omarcornut@gmail.com> Date: Fri Jul 27 12:29:33 2018 +0200 Internals: Drag and Drop: default drop preview use a narrower clipping rectangle (no effect here, but other branches uses a narrow clipping rectangle that was too small so this is a fix for it) + Comments v2: Switch from GLFW to GTK+ (Lionel) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Acked-by: Rafael Antognolli <rafael.antognolli@intel.com>	2018-08-22 18:02:11 +01:00
Lionel Landwerlin	4ba12e8c54	intel: tools: aub_mem: reuse already mapped ppgtt buffers When we map a PPGTT buffer into a continuous address space of aubinator to be able to inspect it, we currently add it to the list of BOs to unmap once we're finished. An optimization we can apply is to look up that list before trying to remap PPGTT buffers again (we already do this for GGTT buffers). We need to take some care before doing this because the list also contains GGTT BOs. As GGTT & PPGTT are 2 different address spaces, we can have matching addresses in both that point to different physical locations. This change adds a flag on the elements of the list of mapped BOs to differentiate between GGTT & PPGTT, which allows us to reuse that list when looking up both address spaces. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>	2018-08-22 18:02:11 +01:00
Lionel Landwerlin	8fd78b4eea	intel: tools: aubmem: map gtt data to aub file This will allow the aubinator viewer tool to modify the aub data that was loaded at a particular gtt address. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>	2018-08-22 18:02:11 +01:00
Lionel Landwerlin	ebb145ee12	intel: tools: create libaub Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-08-22 18:02:11 +01:00
Lionel Landwerlin	475d670ef7	intel: tools: aubwrite: wrap function declarations for c++ Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>	2018-08-22 18:02:11 +01:00
Lionel Landwerlin	ed21007a6a	intel: tools: split memory management out of aubinator Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>	2018-08-22 18:02:11 +01:00
Lionel Landwerlin	14a1cb37eb	util: rb_tree: add safe iterators v2: Add helper to make iterators more readable (Rafael) Fix rev iterator bug (Rafael) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>	2018-08-22 17:49:36 +01:00
Lionel Landwerlin	4616639b49	intel: tools: split aub parsing from aubinator v2: add parsing error callback (Lionel) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com> (v1)	2018-08-22 17:49:36 +01:00
Mathieu Bridon	e15686567c	meson: Run the test with Python 3 This is a patch from me and a patch from Mathieu Bridon squashed together. Signed-off-by: Dylan Baker <dylan@pnwbakers.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com> Reviewed-by: Mathieu Bridon <bochecha@daitauha.fr>	2018-08-22 08:41:01 -07:00
Mathieu Bridon	ff0ce31e2a	python: Disable universal newlines We are testing the behaviour of a tool, for different input files, each one using a different newline sequence. ('\n' on UNIX, '\r\n' on Windows, …) Unfortunately, when opening a file in text mode, Python 3 will by default enable the "universal newlines" mode, which means it replaces all the known newline sequences by '\n'. This (usually useful) behaviour breaks the tests, which are specifically trying to handle files with newline sequences different from '\n'. Disabling the universal newlines mode fixes the tests. However, to keep the script compatible with both Python 2 and 3, we must use the io.open() function instead of the open() builtin, as the latter only knows about the `newline` argument on Python 3. Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2018-08-22 08:41:01 -07:00
Mathieu Bridon	fc708069f7	python: difflib prefers unicode strings Python 3 does not automatically convert from bytes to unicode strings like Python 2 used to do. This commit makes sure we pass unicode strings to difflib.unified_diff, so that the script works on both Python 2 and 3. Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2018-08-22 08:41:01 -07:00
Dylan Baker	477d4b9960	compiler/glsl/tests: Make tests python3 safe v2: - explicitly decode the output of subprocesses - handle bytes and string types consistently rather than relying on python 2's coercion for bytes and ignoring them in python 3 v3: - explicitly set encode as well as decode - python 2.7 and 3.x `bytes` instead of defining an alias Reviewed-by: Mathieu Bridon <bochecha@daitauha.fr>	2018-08-22 08:41:01 -07:00
Juan A. Suarez Romero	6ea5718318	travis: SWR requires LLVM 6.0 v2: update clarification why ubuntu-toolchain-r-test is required (Emil) Fixes: `0cef0cccf5` ("swr: bump minimum supported LLVM version to 6.0") Cc: Dylan Baker <dylan@pnwbakers.com> Cc: Eric Engestrom <eric.engestrom@intel.com> Acked-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2018-08-22 17:29:20 +02:00
Samuel Pitoiset	4c43ec461d	ac/nir: fix getting GLSL type of array of samplers for TG4 This fixes a crash in build_tex_intrinsic() when trying to launch the Basemark GPU benchmark on GFX8. It looks like there is still something wrong because some frames are black. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106980 CC: 18.2 <mesa-stable@lists.freedesktop.org> Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-08-22 15:23:11 +02:00
Samuel Pitoiset	24ee53231d	radv: remove dead variables after splitting per member structs Otherwise, nir_lower_clip_cull_distance_arrays might report wrong number of output clips/culls because it relies on shader output variables and some of them might be dead. This fixes a rendering issue with Dolphin and Super Mario Sunshine. Fixes: `b0c643d8f5` ("spirv: Use NIR per-member splitting") Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107610 CC: 18.2 <mesa-stable@lists.freedesktop.org> Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-08-22 13:57:18 +02:00
Yunchao He	bea4d4c78c	anv: add VK_EXT_sampler_filter_minmax support This extension can be supported on SKL+. With this patch, all corresponding tests (6K+) in CTS can pass. No test fails. I verified CTS with the command below: deqp-vk --deqp-case=dEQP-VK.pipeline.sampler.view_type.reduce v2: 1) support all depth formats, not depth-only formats, 2) fix a wrong indentation (Jason). v3: fix a few nits (Lionel). v4: fix failures in CI: disable sampler reduction when sampler reduction mode is not specified via this extension (Lionel). Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-08-22 11:56:19 +01:00
Samuel Pitoiset	0608349232	radv: use ac_build_imad() Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-08-22 09:17:40 +02:00
Marek Olšák	d87fe1f0fd	ac,radeonsi: use ac_build_gather_values more Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2018-08-21 20:50:37 -04:00
Marek Olšák	60beac9efc	ac,radeonsi: use ac_build_fmad Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2018-08-21 20:50:37 -04:00
Marek Olšák	c401ead68a	radeonsi: use ac_build_imad Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2018-08-21 20:50:37 -04:00
Marek Olšák	659f2e0fcb	ac: add imad & fmad helpers Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2018-08-21 20:50:37 -04:00
Marek Olšák	2276f8f064	ac: add ac_build_s_barrier Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2018-08-21 20:50:37 -04:00
Marek Olšák	6224144b6d	radeonsi: print the shader stage name when printing LLVM IR Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2018-08-21 20:50:37 -04:00
Marek Olšák	5d20b9be90	radeonsi: use is_merged shader in si_prolog_get_rw_buffers needed to change the input type to si_shader_context Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2018-08-21 20:50:37 -04:00
Marek Olšák	a4a104fc81	ac: completely remove +auto-waitcnt-before-barrier it causes corruption on several different GPU generations. Cc: 18.2 <mesa-stable@lists.freedesktop.org> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2018-08-21 20:50:37 -04:00
Anuj Phogat	2383ddace1	anv/icl: Allow headerless sampler messages for pre-emptable contexts It fixes simulator warnings in vulkancts tests complaining about missing support for headerless sampler messages for pre-emptable contexts. Bit 5 in SAMPLER MODE register is newly introduced for ICLLP. Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-08-21 12:50:05 -07:00
Anuj Phogat	81b74b5d96	anv/icl: Disable binding table prefetching Gen 11 workarounds table #2056 WABTPPrefetchDisable suggests to disable prefetching of binding tables for ICLLP A0 and B0 steppings. We have a similar patch for i965 driver in Mesa commit `a5889d70`. Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-08-21 12:50:05 -07:00
Anuj Phogat	482f328f3b	i965/icl: Allow headerless sampler messages for pre-emptable contexts It fixes simulator warnings in piglit tests complaining about missing support for headerless sampler messages for pre-emptable contexts. Bit 5 in SAMPLER MODE register is newly introduced for ICLLP. Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-08-21 12:50:05 -07:00
Dave Airlie	32529e6084	r600/eg: rework atomic counter emission with flushes With the current code, we didn't do the space checks prior to atomic counter setup emission, but we also didn't add atomic counters to the space check so we could get a flush later as well. These flushes would be bad, and lead to problems with parallel tests. We have to ensure the atomic counter copy in, draw emits and counter copy out are kept in the same command submission unit. This reworks the code to drop some useless masks, make the counting separate to the emits, and make the space checker handle atomic counter space. [airlied: want this in 18.2] Fixes: `06993e4ee` (r600: add support for hw atomic counters. (v3))	2018-08-21 20:45:38 +01:00
Dave Airlie	41d58e2098	virgl: ARB_enhanced_layouts support We need to handle the gaps in the streamout bindings on the guest side and enable it if the host has the rest enabled. Reviewed-by: Jakob Bornecrantz <jakob@collabora.com>	2018-08-22 05:05:21 +10:00
Chad Versace	aa79cc2bc8	i965: Implement EGL_KHR_mutable_render_buffer Testing: - Manually tested a low-latency handwriting demo that toggles EGL_RENDER_BUFFER. Toggling changed the display latency as expected. Used Android on Chrome OS, Kabylake GT2. - No change in dEQP-EGL.functional.* on Fedora 27, Wayland, Skylake GT2. Used deqp at tag android-p-preview-5. - No regressions in dEQP-EGL.functional.*, ran on Android on Chrome OS, Kabylake GT2. Some dEQP-EGL.functional.mutable_render_buffer.* test change from NotSupported to Pass. Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2018-08-21 09:56:20 -07:00
Chad Versace	ed7c694688	egl/android: Implement EGL_KHR_mutable_render_buffer Specifically, implement the extension DRI_MutableRenderBufferLoader. However, the loader enables EGL_KHR_mutable_render_buffer only if the DRI driver implements its half of the extension, DRI_MutableRenderBufferDriver. Testing: - No change in dEQP-EGL.functional.* on Fedora 27, Wayland, Skylake GT2. Used deqp at tag android-p-preview-5. - No change in dEQP-EGL.functional.*, ran on Android on Chrome OS, Kabylake GT2. - Manually inspected Android apps on same Chrome OS device. Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2018-08-21 09:56:20 -07:00
Eric Engestrom	317c460a4d	util/xmlpool: make indentation coherent Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2018-08-21 17:36:13 +01:00
Eric Engestrom	2de9e841e7	egl: add helper to combine two u32 into one u64 Use a helper to avoid the common issues of upcasting after the right shift (losing the upper bits) and shifting signed values (sign gets shifted too). Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Daniel Stone <daniels@collabora.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2018-08-21 15:50:02 +01:00
Eric Engestrom	1ca23420c1	docs: trivial s/>/&gt;/ html fix Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2018-08-21 15:41:41 +01:00
Eric Engestrom	6ff1c47996	autotools: don't ship the git_sha1.h generated in git in the tarballs This file is regenerated at build time anyway, so this would just get overwritten anyway. No reason to ship it in the tarball. Fixes: `44df06211c` "autotools: include git_sha1.h in dist tarball" Fixes: `471f708ed6` "git_sha1: simplify logic" Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Juan A. Suarez <jasuarez@igalia.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2018-08-21 15:30:56 +01:00
Eric Engestrom	81fe9bdf6d	intel/genxml: minor python style fix Suggested-by: Dylan Baker <dylan@pnwbakers.com> Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>	2018-08-21 15:30:55 +01:00
Jose Fonseca	9e5e3a8ead	appveyor: Set git core.autocrlf setting to true. The git core.autocrlf setting defaults to true (ie, all text files get checked out as CRLF on Windows), except on Appveyor where it's set to "input" (ie, all text files get checked out with the upstream repository's line endings, which for us typically means LF.) This was masking, on Appveyor, a regression in gen_xmlpool.py processing t_options.h with CRLF line endings. This change sets core.autocrlf to true, which would have caught the issue immediately, as seen in https://ci.appveyor.com/project/jrfonseca/mesa/build/51 Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2018-08-21 09:46:19 +01:00
Timothy Arceri	797cd198ae	mesa: move legacy hyperz option from dri config Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-08-21 09:19:02 +10:00
Timothy Arceri	02062ab1e1	mesa: remove unused dri config option disable_shader_bit_encoding This was added as a workaround for Heaven 3.0 but was later removed by `5ead448719` to allow Heaven 4.0 to work correctly. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-08-21 09:19:02 +10:00
Timothy Arceri	c5f863f2fd	mesa: drop legacy no_rast dri option Add environment var overrides to legacy drivers instead. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-08-21 09:19:01 +10:00
Timothy Arceri	02e32c92a2	i965: remove unused no_rast bool Forcing software fallbacks for i965 hasn't been an option since `5e3c093ff8`. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-08-21 09:19:01 +10:00
Timothy Arceri	7867c1078a	i915: remove early_z dri option This driver is in maintenance mode so let's remove this hidden unsafe option. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-08-21 09:19:01 +10:00
Kevin Rogovin	7ec308d978	Add NV_fragment_shader_interlock support. The main purpose for having NV_fragment_shader_interlock extension is because that extension is also for GLES31 while the ARB extension is for GL only. Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>	2018-08-20 13:32:43 -07:00
Juan A. Suarez Romero	44df06211c	autotools: include git_sha1.h in dist tarball This fixes `make distcheck`. Fixes: `471f708ed6` ("git_sha1: simplify logic") CC: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2018-08-20 18:43:50 +02:00
Juan A. Suarez Romero	0cef0cccf5	swr: bump minimum supported LLVM version to 6.0 RADV now requires LLVM 6.0 or greater, and thus we can't build dist tarball because swr requires LLVM 5.0. Let's bump required LLVM to 6.0 in swr too. v2: bump also in meson.build (Eric) Fixes: `fd1121e839` ("amd: remove support for LLVM 5.0") Cc: Tim Rowley <timothy.o.rowley@intel.com> Cc: Emil Velikov <emil.velikov@collabora.com> Cc: Dylan Baker <dylan@pnwbakers.com> Cc: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>	2018-08-20 16:13:37 +02:00
Danylo Piliaiev	25ec806eb2	i965: Advertise 8 bits subpixel precision for viewport bounds on gen6+ We use floating-points for viewport bounds so VIEWPORT_SUBPIXEL_BITS should reflect this. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105975 Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-08-20 15:11:57 +01:00
Rob Clark	e11e9d6394	freedreno: fix context teardown race We could still have batches queued up to flush, so fd_context_destroy() (which will kill and sync on the flush_queue) before deleting buffers that might be referenced from fdN_gmem() from context of flush_queue. Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-08-20 10:03:05 -04:00
Kai Wasserbäch	5fab32ddad	intel/decoder: mark total_length as MAYBE_UNUSED in gen_spec_load Only used, when asserts are enabled. Fixes an unused-variable warning with GCC 8: ../../../src/intel/common/gen_decoder.c: In function 'gen_spec_load': ../../../src/intel/common/gen_decoder.c:535:47: warning: variable 'total_length' set but not used [-Wunused-but-set-variable] uint32_t text_offset = 0, text_length = 0, total_length; ^~~~~~~~~~~~ Signed-off-by: Kai Wasserbäch <kai@dev.carbon-project.org> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-08-20 11:08:52 +01:00
Kai Wasserbäch	4228e052b3	intel/tools: initialise bo_addr to 0 in main Suppresses a maybe-uninitialized warning with GCC 8. Signed-off-by: Kai Wasserbäch <kai@dev.carbon-project.org> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-08-20 11:08:52 +01:00
Kai Wasserbäch	ccdefbb559	intel: aubinator: mark ftruncate_res as MAYBE_UNUSED in ensure_phys_mem Only used, when asserts are enabled. Fixes an unused-variable warning with GCC 8: ../../../src/intel/tools/aubinator.c: In function 'ensure_phys_mem': ../../../src/intel/tools/aubinator.c:209:11: warning: unused variable 'ftruncate_res' [-Wunused-variable] int ftruncate_res = ftruncate(mem_fd, mem_fd_len += 4096); ^~~~~~~~~~~~~ Signed-off-by: Kai Wasserbäch <kai@dev.carbon-project.org> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-08-20 11:08:52 +01:00
Kai Wasserbäch	64c2bca59f	intel/aubinator_error_decode: mark ret as MAYBE_UNUSED in main Only used, when asserts are enabled. Fixes an unused-but-set-variable warning with GCC 8: ../../../src/intel/tools/aubinator_error_decode.c: In function 'main': ../../../src/intel/tools/aubinator_error_decode.c:759:11: warning: variable 'ret' set but not used [-Wunused-but-set-variable] int ret; ^~~ Signed-off-by: Kai Wasserbäch <kai@dev.carbon-project.org> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-08-20 11:08:52 +01:00
Samuel Pitoiset	0aacb5eab6	radv: do not use CP predication for DCC decompressions This fixes a regression with some Unity demos. Not sure what the root cause of the problem is, especially because the driver doesn't perform any fast color clears. So, it shouldn't be needed to decompress DCC. RadeonSI says that the decompression is relatively cheap if the surface has been decompressed already. One possible improvement is to use two predicates, one for DCC and one for FCE that could be cleared when DCC, FMASK or CMASK are performed by the driver. That might skip some unnecessary decompression passes (not DCC though). Fixes: `ff7daadca1` ("radv: enable/disable predication for the DCC decompression pass") CC: 18.2 <mesa-stable@lists.freedesktop.org> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107563 Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-08-20 11:54:37 +02:00
Tapani Pälli	799b3d16d4	egl: implement EXT_surface_SMPTE2086_metadata and EXT_surface_CTA861_3_metadata Patch implements common bits for EXT_surface_SMPTE2086_metadata and EXT_surface_CTA861_3_metadata extensions by adding new required attributes and eglQuerySurface + eglSurfaceAttrib changes. Currently none of the drivers are utilizing this data but this patch is an enabler in getting there. v2: don't enable extension globally, should be only enabled by EGL drivers that can transfer metadata to the window system (Jason) use EGLint instead of uint16_t (Eric) Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2018-08-20 09:44:53 +03:00
Timothy Arceri	5a0684d665	mesa: move legacy dri config option texture_units Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2018-08-20 13:53:59 +10:00
Timothy Arceri	8b4157d578	mesa: remove unused dri config option texture_heaps This seems to have only been used by DRI1 drivers which were removed with `e4344161bd`. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2018-08-20 13:53:59 +10:00
Timothy Arceri	fb277f504e	mesa: move legacy dri config option texture_blend_quality Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2018-08-20 13:53:59 +10:00
Timothy Arceri	c470db706a	util: remove unused S3TC translation for dri config Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2018-08-20 13:53:59 +10:00
Timothy Arceri	7d2474afb5	mesa: remove dri configs unused software-fallback options These seem to have only been used by DRI1 drivers which were removed with `e4344161bd`. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2018-08-20 13:53:58 +10:00
Timothy Arceri	24da2d162d	mesa: remove unused dri config option excess_mipmap This seems to have only been used by DRI1 drivers which were removed with `e4344161bd`. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2018-08-20 13:53:58 +10:00
Timothy Arceri	498831c7e6	mesa: remove unused dri config option performance_boxes This seems to have only been used by DRI1 drivers which were removed with `e4344161bd`. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2018-08-20 13:53:58 +10:00
Timothy Arceri	4a91d4ef0f	docs: update the default mesa shader cache dir We renamed the dir in commit `28b326238b`, this just updates the website to reflect the change.	2018-08-20 08:08:58 +10:00
Kai Wasserbäch	2c020dbf06	vulkan/wsi: initialise image_index to 0 in x11_manage_fifo_queues Suppresses a maybe-uninitialized warning with GCC 8. Note: image_index should always be initialised due to the result check, but the compiler doesn't see that. Signed-off-by: Kai Wasserbäch <kai@dev.carbon-project.org> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-08-18 10:34:19 +10:00
Kai Wasserbäch	6f0647c0b2	nir: mark prev_block as MAYBE_UNUSED in opt_peel_loop_initial_if Only used, when asserts are enabled. Fixes an unused-variable warning with gcc-8: ../../../src/compiler/nir/nir_opt_if.c: In function 'opt_peel_loop_initial_if': ../../../src/compiler/nir/nir_opt_if.c:109:15: warning: unused variable 'prev_block' [-Wunused-variable] nir_block prev_block = ^~~~~~~~~~ Signed-off-by: Kai Wasserbäch <kai@dev.carbon-project.org> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-08-18 10:34:15 +10:00
Kai Wasserbäch	9387ca29ae	util: mark s as MAYBE_UNUSED in _mesa_half_to_unorm8 Only used, when asserts are enabled. Fixes an unused-variable warning with gcc-8: ../../../src/util/half_float.c: In function '_mesa_half_to_unorm8': ../../../src/util/half_float.c:189:14: warning: unused variable 's' [-Wunused-variable] const int s = (val >> 15) & 0x1; ^ Signed-off-by: Kai Wasserbäch <kai@dev.carbon-project.org> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-08-18 10:34:12 +10:00
Timothy Arceri	0da93de9c8	util: add drirc workarounds for RAGE This allows the game to run on wine (tested on radeonsi where we have compat profile support).	2018-08-18 09:26:51 +10:00
Timothy Arceri	3f9d8e9c88	util: better handle program names from wine For some reason wine will sometimes give us a windows style path for an application. For example when running the 64bit version of Rage wine gives a Unix style path, but when running the 32bit version it gives a windows style path. If we detect no '/' in the path at all it should be safe to assume we have a wine application and instead look for a '\'. Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2018-08-18 09:20:39 +10:00
Timothy Arceri	d0803dea11	nir: allow more nested loops to be unrolled The innermost check was added to stop us from unrolling multiple loops in a single pass, and to stop outer loops from unrolling. When we successfully unroll a loop we need to run the analysis pass again before deciding if we want to go ahead and unroll a second loop. However the logic was flawed because it never tried to unroll any nested loops other than the first innermost loop it found. If this innermost loop is not unrolled we end up skipping all other nested loops. This unrolls a loop in a Deus Ex: MD shader on ultra settings and also unrolls a loop in a shader from the game Prey when running on DXVK. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-08-18 09:03:13 +10:00
Ray Strode	9baff597ce	gallium/winsys/kms: don't unmap what wasn't mapped At the moment, depending on pipe transfer flags, the dumb buffer map address can end up at either kms_sw_dt->ro_mapped or kms_sw_dt->mapped. When it's time to unmap the dumb buffer, both locations get unmapped, even though one is probably initialized to 0. That leads to the code segment getting unmapped at runtime and crashes when trying to call into unrelated code. This commit addresses the problem by using MAP_FAILED instead of NULL for ro_mapped and mapped when the dumb buffer is unmapped, and only unmapping mapped addresses at unmap time. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107098 Signed-off-by: Ray Strode <rstrode@redhat.com> Fixes: `d891f28df9` ("gallium/winsys/kms: Fix possible leak in map/unmap.") Cc: Lepton Wu <lepton@chromium.org> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2018-08-17 17:16:32 +01:00
Qiang Yu	0aa80abf25	loader: add dri_driver option to override dri driver to load drirc implementation of MESA_LOADER_DRIVER_OVERRIDE which can be used to override dri driver to load. Usage: override dri driver for device with spec kernel driver name: <device kernel_driver="kernel_driver_name"> <option name="dri_driver" value="new_dri_driver" /> </device> or <device driver="loader" kernel_driver="kernel_driver_name"> <option name="dri_driver" value="new_dri_driver" /> </device> v2: add kernel_driver device attribute to specify kernel driver name instead of reuse driver attribute v3: separate loader_get_kernel_driver_name into another patch separate add kernel_driver attribute into another patch Suggested-by: Michel Dänzer <michel@daenzer.net> Signed-off-by: Qiang Yu <Qiang.Yu@amd.com> Acked-by: Michel Dänzer <michel.daenzer@amd.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com> [v4 Emil: add HAVE_LIBDRM guard around __driConfigOptionsLoader and loader_get_dri_config_driver] Signed-off-by: Emil Velikov <emil.velikov@collabora.com>	2018-08-17 17:16:32 +01:00
Qiang Yu	3bbe180b98	xmlconfig: add kernel_driver device attribute This attribute can be used by the loader to apply different options to devices that use a specific kernel driver. Signed-off-by: Qiang Yu <Qiang.Yu@amd.com> Acked-by: Michel Dänzer <michel.daenzer@amd.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2018-08-17 17:16:32 +01:00
Qiang Yu	e8b91e99e9	loader: abstract loader_get_kernel_driver_name for reuse This function can be shared by the following kernel_driver drirc patch. Signed-off-by: Qiang Yu <Qiang.Yu@amd.com> Acked-by: Michel Dänzer <michel.daenzer@amd.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2018-08-17 17:16:32 +01:00
Qiang Yu	30b10dbb7c	driconf: move ${sysconfdir}/drirc to ${datadir}/drirc.d/00-mesa-defaults.conf ${sysconfdir} is for storing admin config files, so move this mesa default config file to ${datadir}/drirc.d. Signed-off-by: Qiang Yu <Qiang.Yu@amd.com> Acked-by: Michel Dänzer <michel.daenzer@amd.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2018-08-17 17:16:32 +01:00
Qiang Yu	04bdbbcab3	xmlconfig: read more config files from drirc.d/ Driver and application can put their drirc files in ${datadir}/drirc.d/ with name xxx.conf. Config files will be read and applied in file name alphabetic order. So there are three places for drirc listed in order: 1. /usr/share/drirc.d/ 2. /etc/drirc 3. ~/.drirc v4: fix meson build v3: 1. separate driParseConfigFiles refine into another patch 2. fix entries[i] mem leak v2: drop /etc/drirc.d Signed-off-by: Qiang Yu <Qiang.Yu@amd.com> Acked-by: Michel Dänzer <michel.daenzer@amd.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2018-08-17 17:16:32 +01:00
Emil Velikov	0da417129e	xmlconfig: refine driParseConfigFiles to use parseOneConfigFile Also prepare for the usage of following parseConfigDir patch. Signed-off-by: Qiang Yu <Qiang.Yu@amd.com> Acked-by: Michel Dänzer <michel.daenzer@amd.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com> [Emil: add #include <limits.h>] Signed-off-by: Emil Velikov <emil.velikov@collabora.com>	2018-08-17 17:16:32 +01:00
Jason Ekstrand	d9ea015ced	anv/pipeline: Lower pipeline layouts etc. after linking This allows us to use the link-optimized shader for determining binding table layouts and, more importantly, URB layouts. For apps running on DXVK, this is extremely important as DXVK likes to declare max-size inputs and outputs and this lets us massively shrink our URB space requirements. VkPipeline-db results (Batman pipelines only) on KBL: total instructions in shared programs: 820403 -> 790008 (-3.70%) instructions in affected programs: 273759 -> 243364 (-11.10%) helped: 622 HURT: 42 total spills in shared programs: 8449 -> 5212 (-38.31%) spills in affected programs: 3427 -> 190 (-94.46%) helped: 607 HURT: 2 total fills in shared programs: 11638 -> 6067 (-47.87%) fills in affected programs: 5879 -> 308 (-94.76%) helped: 606 HURT: 3 Looking at shaders by hand, it makes the URB between TCS and TES go from containing 32 per-vertex varyings per tessellation shader pair to a more reasonable 8-12. For a 3-vertex patch, that's at least half the URB space no matter how big the patch section is. Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-08-17 10:50:28 -05:00
Jason Ekstrand	f210a5f4bb	anv/pipeline: Set tess IO read/written key fields in compile_* We want these to be set as close to the final compile as possible so that they are guaranteed to happen after nir_shader_gather_info is called. The next commit is going to move nir_shader_gather_info to after the linking step which makes this necessary. Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-08-17 10:50:28 -05:00
Jason Ekstrand	2e4094cd8f	anv/pipeline: Use more fields from stage in compile_cs Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-08-17 10:50:28 -05:00
Jason Ekstrand	4af1a8c9e4	anv/apply_pipeline_layout: Add to the bind map instead of replacing it This commit makes three changes. One is to only walk the descriptors once and set bind map sizes at the same time as filling out the entries. The second is to make the pass additive so that we can put stuff in the bind map before applying the pipeline layout. Third, we switch to using designated initializers. Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-08-17 10:50:28 -05:00
Jason Ekstrand	320dacb0a0	anv/lower_ycbcr: Use the binding array size for bounds checks Because lower_ycbcr gets called before apply_pipeline_layout, the indices are all logical and the binding layout HW size is actually too big for the bounds check. We should just use the regular logical array size instead. Fixes: `f3e91e78a3` "anv: add nir lowering pass for ycbcr textures" Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-08-17 10:50:28 -05:00
Mathieu Bridon	459ec5265c	python: Open the template as text, with an explicit encoding In commit `bd27203f4d` we changed this to open in binary mode, to then explicitly decode the lines with the right encoding. Unfortunately, that broke the build on Windows, where the template file can have '\r\n' as line terminators: opening in binary mode would keep those terminators and break the regexp. We need to go back to text mode, where the "universal newlines" mode takes care of this. However, to fix the initial issue, let's specify the encoding explicitly when opening the file, and make sure it is open in text mode, so we only get unicode strings. Reviewed-by: Jose Fonseca <jfonseca@vmware>	2018-08-17 09:34:49 -06:00
Mathieu Bridon	f9415d760a	python: Help Python 2 print the line Reviewed-by: Jose Fonseca <jfonseca@vmware>	2018-08-17 09:33:16 -06:00
Rob Clark	a8ef7f5e02	freedreno/a6xx: streamout Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-08-17 11:04:21 -04:00
Rob Clark	7fa2a8c3c4	freedreno/a6xx: fragz fixes Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-08-17 11:04:21 -04:00
Rob Clark	7c73d41160	freedreno/a6xx: scissor fixes Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-08-17 11:04:21 -04:00
Rob Clark	b7f18e49b7	freedreno: update generated headers Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-08-17 11:04:21 -04:00
Rob Clark	a4754c245b	freedreno/a6xx: fix srgb Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-08-17 11:04:21 -04:00
Rob Clark	2658f63701	freedreno: fix dEQP-GLES3.functional.fence_sync.* Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-08-17 11:04:21 -04:00
Samuel Pitoiset	d27e1584ce	radv/winsys: fix creating the BO list for virtual buffers When the number of unique BO is 0, we optimize the list creation by copying all buffers of the current CS directly into it. But this is only valid if the CS doesn't have virtual buffers, otherwise they are not added and hw might report VM faults. This fixes VM faults with: dEQP-VK.sparse_resources.image_sparse_binding.2d.rgba8ui.1024_128_1 CC: <mesa-stable@lists.freedesktop.org> Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-08-17 15:00:21 +02:00
Kristian H. Kristensen	de3b34df97	freedreno: Add a6xx backend This adds a freedreno backend for the a6xx generation GPUs, which at the time of this commit is about 98% GLES2 conformant. Much remains to be done - both performance work and feature work towards more recent GLES versions, but this is a good start. Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org> Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-08-16 19:13:36 -04:00
Rob Clark	6ee58e8257	freedreno: update generated headers pull in a6xx registers Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-08-16 19:11:08 -04:00
Kristian H. Kristensen	e89683d5a2	freedreno: Fix warnings Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org> Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-08-16 19:11:08 -04:00
Dylan Baker	c782168751	scons: Check for mako 0.8.0 v2: - Use distutils to do the version checking Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107565 Acked-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2018-08-16 13:53:10 -07:00
Dylan Baker	64e4638130	scons: Require python 2.7 less than 2.7 is not supported. v2: - Remove check for python >= 2.0, since we've already enforced 2.7 Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2018-08-16 13:52:56 -07:00
Dylan Baker	5a8f824d8c	meson: use python3 module to find python3 This handy helper is nice for OSes that are not linux or BSD like (mac and windows) as it knows how to find python3 in odd places. Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2018-08-16 13:51:44 -07:00
Dylan Baker	52194ae4df	meson: Ensure that mako is >= 0.8.0 It's what autotools has required for a long time. v3: - Use distutils.version.StrictVersion instead of comparing strings Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2018-08-16 13:50:51 -07:00
Eric Engestrom	03ec672213	svga: simplify Mesa version string Suggested-by: Emil Velikov <emil.l.velikov@gmail.com> Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2018-08-16 17:38:31 +01:00
Eric Engestrom	bc8abc1adf	bin: always define MESA_GIT_SHA1 to make it directly usable in code Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2018-08-16 17:38:31 +01:00
Eric Engestrom	471f708ed6	git_sha1: simplify logic Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2018-08-16 17:38:31 +01:00
Eric Engestrom	9a6a631762	i965: drop unused assignment Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-08-16 17:38:31 +01:00
Eric Engestrom	7a1f4340b6	anv: drop cast-to-void of used variable `device` is used 2 lines below, even visible in the diff context printed. Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-08-16 17:38:31 +01:00
Eric Engestrom	6cf0d4f91f	anv: use safer snprintf() to ensure NULL string-terminator Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-08-16 17:38:31 +01:00
Eric Engestrom	d6aea40326	intel/batch-decoder: replace local ARRAY_LENGTH() macro with global ARRAY_SIZE() Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-08-16 17:38:31 +01:00
Eric Engestrom	81c1989e4f	intel: various python cleanups Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2018-08-16 17:38:25 +01:00
Eric Engestrom	aa78b29eba	egl: check for buffer overflow before corrupting our memory Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2018-08-16 17:38:22 +01:00
Eric Engestrom	eb6b41749b	egl/wayland: remove sign from bitfield `formats` Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2018-08-16 17:38:18 +01:00
Eric Engestrom	c5d9b48a71	mailmap: add various typos of Emil's address from the log Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2018-08-16 17:38:04 +01:00
Eric Engestrom	882ed53946	egl: some spelling fixes Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>	2018-08-16 14:15:18 +01:00
Samuel Pitoiset	f9e8456c39	radv: initialize the DCC predicate correctly when it's compressed We have to do a fast-clear eliminate when clearing DCC metadata with 0x20202020. I don't know if that fixes anything but that seems correct to me. CC: 18.2 <mesa-stable@lists.freedesktop.org> Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-08-16 14:11:51 +02:00
Samuel Pitoiset	f3a78a9da0	radv: fix missing initialization of the conditional rendering state This was missing when VK_EXT_conditional_rendering has been implemented. The predication type should be -1 to avoid restoring previous state when performing a decompression pass with DCC enabled. Note that we don't have to handle secondary command buffers because we don't support this feature currently. CC: 18.2 <mesa-stable@lists.freedesktop.org> Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-08-16 14:11:48 +02:00
Eric Engestrom	c5dd02287f	bin: split `write_if_different()` out Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2018-08-16 12:33:35 +01:00
Eric Engestrom	c2e00f9eee	bin: whitespace cleanup Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2018-08-16 12:30:30 +01:00
Bas Nieuwenhuizen	011a811652	radv: Revert divisor = 0 case for vertex attribute extension. Seems like DXVK depends on that and it might get reverted upstream. Since apps are not supposed to use 0 in v2 anyway, we should be safe implementing the old behavior there. Fixes: `66e12451ac` "radv: Update to new VK_EXT_vertex_attribute_divisor to version 2." CC: 18.2 <mesa-stable@lists.freedesktop.org> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2018-08-16 11:13:19 +02:00
Bas Nieuwenhuizen	3308db2dd7	radv: Possible on-demand compilation fix. Seems that in a single case we use the renderpass before checking the pipeline, so check the renderpass before we use it. Fixes: `fbcd167314` "radv: Add on-demand compilation of built-in shaders." Tested-by: Timothy Arceri <tarceri@itsqueeze.com> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2018-08-16 11:13:19 +02:00
Gert Wollny	1560c58b12	mesa/st: fix array indices off-by-one error in remapping When moving the array sizes from the old list to the new one it was not taken into account that the array indices start with one, but the array_size array started at index zero, which resulted in incorrect array sizes when arrays were merged. Correct this by copying the array_size values of the retained arrays with an offset of -1. Also fix whitespaces for the replaced lines. Fixes: `d8c2119f9b` mesa/st/glsl_to_tgsi: Expose array live range tracking and merging Signed-off-by: Gert Wollny <gw.fossdev@gmail.com> Reviewed-by: Dave Airlie <airlied@redhat.com>	2018-08-16 08:52:26 +02:00
Alexander Tsoy	9a96bf0ecd	meson: fix build for egl platform_x11 without dri3 and gbm Compiling EGL's platform_x11 without dri3 and gbm yields this compile failure: platform_x11 needs inc_loader: ../mesa-18.2.0-rc2/src/egl/drivers/dri2/platform_x11.c:48:10: fatal error: loader.h: No such file or directory #include "loader.h" ^~~~~~~~~~ Fixes: `108d257a16` ("meson: build libEGL") Bugzilla: https://bugs.gentoo.org/663534 Reviewed-by: Matt Turner <mattst88@gmail.com>	2018-08-15 16:37:16 -07:00
Jason Ekstrand	10f44da775	Revert "intel/nir: Call nir_lower_io_to_scalar_early" Commit `4434591bf5` caused substantially more URB messages in geometry and tessellation shaders. Before we can really enable this sort of optimization, we either need some way of combining them back together into vectors or we need to do cross-stage vector element elimination without splitting everything into scalars. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107510 Fixes: `4434591bf5` "intel/nir: Call nir_lower_io_to_scalar_early" Acked-by: Kenneth Graunke <kenneth@whitecape.org> Tested-by: Mark Janes <mark.a.janes@intel.com>	2018-08-15 17:56:50 -05:00
Erik Faye-Lund	da1f7c56da	i965: do not emit empty surface state If called with an empty size, brw_emit_buffer_surface_state asserts. We already have a dedicated helper for uploading nothing, so let's use that instead. Avoids an assert in dEQP-GLES31.functional.shaders.opaque_type_indexing.ssbo.const_literal_vertex when running a debug build of i965. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-08-15 23:23:16 +01:00
Sergii Romantsov	743dff1cca	intel/ppgtt: 4096 replaced by PAGE_SIZE Usage of number 4096 replaced by PAGE_SIZE. Signed-off-by: Sergii Romantsov <sergii.romantsov@globallogic.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-08-15 23:23:16 +01:00
Sergii Romantsov	24839663a4	intel/ppgtt: memory address alignment Kernel (for ppgtt) requires memory address to be aligned to page size (4096). -v2: added marking that also fixes initial commit `01058a5522`. -v3: numbers replaced by PAGE_SIZE; buffer-object size is aligned instead of alignment of offsets (Chris Wilson). -v4: changes related to PAGE_SIZE moved to separate commit -v5: restored alignment to page-size for 0-size. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106997 Fixes: `a363bb2cd0` (i965: Allocate VMA in userspace for full-PPGTT systems.) Fixes: `01058a5522` (i965: Add virtual memory allocator infrastructure to brw_bufmgr.) Signed-off-by: Sergii Romantsov <sergii.romantsov@globallogic.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-08-15 23:23:16 +01:00
Timothy Arceri	f0a8accb0d	radv: add Doom workaround Cc: <mesa-stable@lists.freedesktop.org> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2018-08-16 07:53:38 +10:00
Sergii Romantsov	efb28aa970	i965: Emitting 3DSTATE_SO_BUFFER of 0-size. Avoided filling of whole structure and bo-allocation if size of surface is 0. Signed-off-by: Sergii Romantsov <sergii.romantsov@globallogic.com>	2018-08-15 13:15:28 -07:00
Erik Faye-Lund	98b3b6367a	virgl: report actual max-texture sizes Instead of doing conservative guesses, we should report the max levels based on the max sizes we get from GL on the host. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Jakob Bornecrantz <jakob@collabora.com>	2018-08-15 18:48:16 +02:00
Erik Faye-Lund	825aaeae39	virgl: do not use SP_MAX_TEXTURE_*_LEVELS defines These macro-names are also used for softpipe, so let's avoid confusion by avoiding them. Besides, they are just used in one place in virgl, so let's just inline them into the place they are used instead. While we're at it, fixup an error in the comment for the 3D version. Mesa computes max-size as 2^(n-1), which means this should be 256 cubed, not 512 cubed. The other comments are correct. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Jakob Bornecrantz <jakob@collabora.com>	2018-08-15 18:48:08 +02:00
Dylan Baker	ef7ae84daf	docs: Add news item for 18.1.6	2018-08-15 09:09:59 -07:00
Samuel Pitoiset	71d5b2fbf8	radv: disable the auto-waitcnt-before-barrier LLVM option This option allows us to remove additional s_waitcnt instructions because s_barrier internally does s_waitcnt 0. Though, apparently there is a problem with LDS accesses that causes rendering issues with FFXV and DXVK. Disable this optimization for now (RadeonSI still uses it). Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107460 CC: 18.2 <mesa-stable@lists.freedesktop.org> Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-08-15 16:21:50 +02:00
Samuel Pitoiset	85113c4d05	radv: fix memory leaks in radv_load_meta_pipeline() Reported by Coverity. Fixes: `fbcd167314` ("radv: Add on-demand compilation of built-in shaders.") Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-08-15 16:20:58 +02:00
Samuel Pitoiset	17e79865cf	radv: drop wrong initialization of COMPUTE_RESOURCE_LIMITS The last parameter of radeon_set_sh_reg_seq() is the number of dwords to emit. We were lucky because WAVES_PER_SH(0x3) is 3 but it was initialized to 0. COMPUTE_RESOURCE_LIMITS is correctly set when generating compute pipelines, so we don't need to initialize it. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-08-15 16:20:38 +02:00
Andres Gomez	53b4701cb0	docs: update calendar 18.2.0-rc3 is out Signed-off-by: Andres Gomez <agomez@igalia.com>	2018-08-15 15:48:18 +03:00
Mauro Rossi	43318d5857	radv/meta_decompress: fix pointer to integer conversion VK_NULL_HANDLE replaces NULL to avoid following building error: external/mesa/src/amd/vulkan/radv_meta_decompress.c:365:54: error: incompatible pointer to integer conversion passing 'void *' to parameter of type 'VkShaderModule' (aka 'unsigned long long') [-Werror,-Wint-conversion] VkResult ret = create_pipeline(cmd_buffer->device, NULL, samples, ^~~~ prebuilts/clang/host/linux-x86/clang-4053586/lib64/clang/5.0.300080/include/stddef.h:105:16: note: expanded from macro 'NULL' # define NULL ((void*)0) ^~~~~~~~~~ external/mesa/src/amd/vulkan/radv_meta_decompress.c:97:32: note: passing argument to parameter 'vs_module_h' here VkShaderModule vs_module_h, ^ 1 error generated. Fixes: `fbcd167314` ("radv: Add on-demand compilation of built-in shaders.") Signed-off-by: Mauro Rossi <issor.oruam@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>	2018-08-15 14:34:50 +02:00
Mauro Rossi	73b342c7a5	egl/android: fix regression in drm_gralloc path (v2) This patch fixes a regression in mesa 18.2 and mesa-dev branches for HAVE_DRM_GRALLOC code path which is causing black screen on Android and prevents boot due to SIGSEGV MAPERR crash related to improper handling of drm_gralloc drm FD in new droid_open_device() path. Problem is due to `c7bb82136b` ("egl/android: Add DRM node probing and filtering") To avoid the crash the former existing working droid_open_device() is restored, renamed droid_open_device_drm_gralloc() and kept within HAVE_DRM_GRALLOC braces. Tested with mesa-dev and mesa 18.2 branch and oreo-x86 bootanimation and Android GUI booting is fixed with i965, nouveau, radeon. The changes are compatible with gbm_gralloc, I've tested build with hwc too. (v2) remove indentation from HAVE_DRM_GRALLOC pre-processor directive NOTE: Definition of enum{} for GRALLOC_MODULE_PERFORM_GET_DRM_FD is not necessary and it's actually causing a redefinition building error, because in HAVE_DRM_GRALLOC path gralloc_drm.h is already exported by libgralloc_drm which is currently still a dependency. Fixes: `c7bb82136b` ("egl/android: Add DRM node probing and filtering") Cc: "18.2" <mesa-stable@lists.freedesktop.org> Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>	2018-08-15 14:07:49 +02:00
Tapani Pälli	656ccf4ef8	mesa: shader dump/read support for ARB programs Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106283 Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>	2018-08-15 11:03:35 +03:00
Danylo Piliaiev	479a849ad6	glsl: Avoid calling get_array_element for scalar constants Accessing scalar constant as an array in function call or initializer list triggered assert in get_array_element. Examples: func(0[0]); vec2 t = { 0[0], 0 }; Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107550 Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2018-08-15 10:01:43 +03:00
Marek Olšák	bffa025ada	radeonsi: enable 1 missing PS_SU perf counter on Polaris	2018-08-14 21:20:31 -04:00
Marek Olšák	df50099834	radeonsi: use radeon_info::name Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-08-14 21:20:31 -04:00
Marek Olšák	84652721b9	ac: add radeon_info::name Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-08-14 21:20:31 -04:00
Marek Olšák	de8d5edbc4	radeonsi: split si_clear_buffer to remove enum si_method Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2018-08-14 21:21:12 -04:00
Marek Olšák	4de92f2abb	radeonsi: replace CP_DMA_USE_L2 with enum si_cache_policy Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2018-08-14 21:21:10 -04:00
Marek Olšák	bc132d62f9	radeonsi: declare coher in si_copy_buffer Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2018-08-14 21:21:09 -04:00
Marek Olšák	cddd7ce325	radeonsi: make PFP_SYNC_ME an explicit CP DMA flag Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2018-08-14 21:21:07 -04:00
Marek Olšák	277295962c	radeonsi: don't use emit_data->args in load_emit Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2018-08-14 21:21:06 -04:00
Marek Olšák	8fb34050b5	radeonsi: don't use emit_data->args in store_emit Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2018-08-14 21:21:04 -04:00
Marek Olšák	a2c18bfbe3	radeonsi: don't use emit_data->args in atomic_emit Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2018-08-14 21:21:03 -04:00
Marek Olšák	297fb213b3	radeonsi: don't use emit_data->args in build_interp_intrinsic Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2018-08-14 21:21:01 -04:00
Marek Olšák	99ae440d4e	radeonsi: inline atomic_fetch_args Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2018-08-14 21:20:59 -04:00
Marek Olšák	267e92893c	radeonsi: inline store_fetch_args Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2018-08-14 21:20:58 -04:00
Marek Olšák	f15e55aa8a	radeonsi: inline load_fetch_args Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2018-08-14 21:20:56 -04:00
Marek Olšák	2c94f321eb	radeonsi: merge txq_emit and resq_emit Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2018-08-14 21:20:55 -04:00
Marek Olšák	a14c803166	radeonsi: inline resq_fetch_args Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2018-08-14 21:20:54 -04:00
Marek Olšák	347e52adcd	radeonsi: inline txq_fetch_args Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2018-08-14 21:20:52 -04:00
Marek Olšák	c9b2ce2672	radeonsi: use get_resinfo directly in lower_gather4_integer Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2018-08-14 21:20:36 -04:00
Marek Olšák	7804ddaf87	radeonsi: inline tex_fetch_args into build_tex_intrinsic The diff looks like it moves code that I didn't touch. Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2018-08-14 21:20:34 -04:00
Marek Olšák	da1d8adc29	radeonsi: remove fetch_args callbacks for ALU instructions Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2018-08-14 21:20:33 -04:00
Marek Olšák	ac72a6bd0b	radeonsi: move internal TGSI shaders into si_shaderlib_tgsi.c Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2018-08-14 21:20:31 -04:00
Marek Olšák	0ca8294ece	radeonsi: implement EXT_window_rectangles Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2018-08-14 21:19:02 -04:00
Marek Olšák	465e929d6a	gallium/u_blitter: save/restore window rectangles Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2018-08-14 21:19:01 -04:00
Marek Olšák	15fc0f8d4a	noop: implement set_window_rectangles Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2018-08-14 21:18:59 -04:00
Marek Olšák	7c8716e4fb	ddebug: implement set_window_rectangles Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2018-08-14 21:18:51 -04:00
Rodrigo Vivi	44f1dcf9b3	i965: Add a new CFL PCI ID. One more CFL ID added to spec. Align with kernel commit d0e062ebb3a4 ("drm/i915/cfl: Add a new CFL PCI ID.") Cc: José Roberto de Souza <jose.souza@intel.com> Cc: Anuj Phogat <anuj.phogat@gmail.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-08-14 15:46:56 -07:00
Rob Clark	70bf639328	freedreno/ir3: add support for a6xx 'merged' register set Starting with a6xx, half and full precision registers conflict. Which makes things a bit more efficient, ie. if some parts of the shader are heavy on half-precision and others on full precision, you don't have to allocate the worst case for both. But it means we need to setup some additional conflicts. Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-08-14 17:59:02 -04:00
Rob Clark	4813060ed4	freedreno/ir3: small RA cleanup Collapse is_temp() into its only callsite, and pass compiler object as struct rather than void. Just cleanups to reduce noise in next patch. Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-08-14 17:59:02 -04:00
Rob Clark	fdd35f497b	freedreno/ir3: stop hard-coding FS input regs We originally did this because at the time we didn't know all the bitfields to configure where various frag shader sysval's went. But we do. So switch to using sysvals for all the frag shader inputs. Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-08-14 17:59:02 -04:00
Rob Clark	e97b56172c	freedreno/ir3: use r63.x for unused inputs This way, unused sysval inputs, like frag_vcoord, get the correct regid value to disable the input. Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-08-14 17:59:02 -04:00
Rob Clark	066930e54d	freedreno/ir3: create all inputs in first block create_input()/create_input_compmask() should take the ctx as arg, rather than block, to enforce that all inputs are created in the first block, so that RA sees them as live at the start of the shader. Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-08-14 17:59:02 -04:00
Rob Clark	62da068fd3	freedreno/ir3: rename s/frag_pos/frag_vcoord/g Make it more clear that this is varying fetch related. Also fixup some comments. Just cleanup for next patches. Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-08-14 17:59:02 -04:00
Rob Clark	4a7f9feada	compiler: add SYSTEM_VALUE_VARYING_COORD Used internally in freedreno/ir3 for the vec2 value that hw passes to shader to use as coordinate for bary.f (varying fetch) instruction. This is not the same as SYSTEM_VALUE_FRAG_COORD. Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-08-14 17:59:02 -04:00
Rob Clark	b5a098b202	freedreno/ir3: move per-generation compiler config Move it from the compile ctx to the compiler object, before adding new things for a6xx. Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-08-14 17:59:02 -04:00
Bas Nieuwenhuizen	66e12451ac	radv: Update to new VK_EXT_vertex_attribute_divisor to version 2. Behavior wrt firstInstance got changed, and a divisor of 0 has been disallowed. The new version of the ext got published in specification 1.1.81. Sending to stable since the only known user is DXVK, which needs this for correctness. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> CC: 18.2 <mesa-stable@lists.freedesktop.org>	2018-08-14 22:13:09 +02:00
Bas Nieuwenhuizen	4bb6c49375	radv: Allow ETC2 on RAVEN and VEGA10 instead of all GFX9. Follow radeonsi. Fixes: `3665f66ef2` "radv: Add support for ETC2 textures." Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2018-08-14 22:11:04 +02:00
Bas Nieuwenhuizen	bf33ca7512	radv: Fix missing Android platform define. CC: <mesa-stable@lists.freedesktop.org> Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2018-08-14 22:11:04 +02:00
Rob Clark	13b9d32fb1	freedreno: move free() into fdN_context_destroy() Following patches will be doing further cleanup after calling fd_context_destroy() so it is easier if we move the free() into the per-gen backend code. Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-08-14 15:46:34 -04:00
Jonathan Marek	dc9705f30d	freedreno: a2xx: ir2 update this patch brings a number of changes to ir2: -ir2 now generates CF clauses as necessary during assembly. this simplifies fd2_program/fd2_compiler and is necessary to implement optimization passes -ir2 now has separate vector/scalar instructions. this will make it easier to implement scheduling of scalar+vector instructions together. dst_reg is also now separate from src registers instead of a single list -ir2 now implements register allocation. this makes it possible to compile shaders which have more than 64 TGSI registers -ir2 now implements the following optimizations: removal of IN/OUT MOV instructions generated by TGSI and removal of unused instructions when some exports are disabled -ir2 now allows full 8-bit index for constants -ir2_alloc no longer allocates 4 times too many bytes Signed-off-by: Jonathan Marek <jonathan@marek.ca> Signed-off-by: Rob Clark <robdclark@gmail.com>	2018-08-14 12:46:25 -04:00
Andres Gomez	5406eb5513	docs: update calendar 18.2.0-rc1 and 18.2.0-rc2 are out Signed-off-by: Andres Gomez <agomez@igalia.com>	2018-08-14 17:07:09 +03:00
Bas Nieuwenhuizen	fbcd167314	radv: Add on-demand compilation of built-in shaders. In environments where we cannot cache, e.g. Android (no homedir), ChromeOS (readonly rootfs) or sandboxes (cannot open cache), the startup cost of creating a device in radv is rather high, due to compiling all possible built-in pipelines up front. This meant depending on the CPU a 1-4 sec cost of creating a Device. For CTS this cost is unacceptable, and likely for starting random apps too. So if there is no cache, with this patch radv will compile shaders on demand. Once there is a cache from the first run, even if incomplete, the driver knows that it can likely write the cache and precompiles everything. Note that I did not switch the buffer and itob/btoi compute pipelines to on-demand, since you cannot really do anything in Vulkan without them and there are only a few. This reduces the CTS runtime for the no caches scenario on my threadripper from 32 minutes to 8 minutes. Reviewed-by: Dave Airlie <airlied@redhat.com>	2018-08-14 10:26:24 +02:00
Bas Nieuwenhuizen	24a9033d6f	radv: Refactor blit pipeline creation. Reviewed-by: Dave Airlie <airlied@redhat.com>	2018-08-14 10:26:11 +02:00
Bas Nieuwenhuizen	806a792b43	radv: Make fs key exemplars ordered to be a reverse fs_key lookup. While at it, share the exemplars and account for a non-occurring fs key. Reviewed-by: Dave Airlie <airlied@redhat.com>	2018-08-14 10:26:06 +02:00
Dave Airlie	0be5e9f5a1	virgl: ARB_texture_barrier support Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>	2018-08-14 16:55:56 +10:00
Dylan Baker	6d61aed231	docs: update calendar, add news item and link release notes for 18.1.6 Signed-off-by: Dylan Baker <dylan@pnwbakers.com>	2018-08-13 10:06:45 -07:00
Dylan Baker	973ae7a06b	docs: Add sha256 sums for 18.1.6	2018-08-13 10:05:44 -07:00
Dylan Baker	66c8a64e67	docs: Add release notes for 18.1.6	2018-08-13 10:05:42 -07:00
Alejandro Piñeiro	668ab8aeb1	mesa/glspirv: fix compilation with MSVC From AppVeyor #8582, it seems that MSVC doesn't like uint, so this patch replaces it with unsigned. Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2018-08-13 18:57:18 +02:00
Eric Engestrom	f976d22759	travis: install correct version of mako for each build system Meson now uses python3, so let's add a block for Autotools, move that line into the buildsys-specific blocks, and set the correct version for Meson. Fixes: `2ee1c86d71` "meson: Build with Python 3" Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>	2018-08-13 17:29:42 +01:00
Erik Faye-Lund	ae5770171c	mesa/st/glsl_to_tgsi: fixup copy-paste mistake This is clearly a copy-paste error; if we validate the reladdr2-pointer, we don't want to traverse to the reladdr-pointer. Especially since the check above shows that reladdr could be NULL here. Noticed by Coverity. CID: 1438389, 1438390 Fixes: `568bda2f2d` ("mesa/st/glsl_to_tgsi: Split arrays whose elements are only accessed directly") Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Gert Wollny <gw.fossdev@gmail.com>	2018-08-13 18:15:36 +02:00
Neil Roberts	c91a5f70fb	i965/nir: Use the nir copy of shader_info to handle gl_PatchVerticesIn Instead of using the copy of shader_info stored in gl_program, it now uses the one in nir_shader. This is needed for SPIR-V because the info.tess.tcs_vertices_out is filled in via _mesa_spirv_to_nir which happens much later than with a GLSL shader. The copy of shader_data in gl_program is only updated later via brw_shader_gather_info but that is too late. For GLSL this shouldn't create any problems because the nir copy of the shader_info is immediately copied from the gl_program in glsl_to_nir. v2: updated after commit "i965: Combine both gl_PatchVerticesIn lowering passes." (488972) (Alejandro Piñeiro) Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-08-13 16:28:27 +02:00
Neil Roberts	a105c1e6e5	mesa/glspirv: Set separate_shader on shader_info The value is copied from the gl_program. If we don’t do this then it will get reset back to zero in brw_shader_gather_info. This isn’t a problem for GLSL because in that case the nir_shader is initialised with a copy of the shader_info from the gl_program. Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-08-13 16:28:27 +02:00
Iago Toral Quiroga	40947d4744	mesa/glspirv: pick off the only entry point we need This is the same we do for vulkan drivers This is needed to pass the following CTS test: KHR-GL45.gl_spirv.spirv_modules_shader_binary_multiple_shader_objects_test Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-08-13 16:28:27 +02:00
Alejandro Piñeiro	32e1d4c34b	mesa/glspirv: compute double inputs and remap attributes input locations used by input attributes are not handled in the same way in OpenGL vs Vulkan. There is a detailed explanation of such differences on the following commit: `c2acf97fcc` So with this commit, the same adjustment that is done after glsl_to_nir, is being done after spirv_to_nir, when it is used on OpenGL (ARB_gl_spirv). Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-08-13 16:28:27 +02:00
Alejandro Piñeiro	d6c8066663	nir/glsl: make nir_remap_attributes public As we plan to reuse it for ARB_gl_spirv implementation. Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-08-13 16:28:27 +02:00
Alejandro Piñeiro	af194bd38e	nir/lower_samplers: don't assume a deref for both texture and sampler srcs Commit "nir: Use derefs in nir_lower_samplers" (`75286c2d08`) assumes one deref for both the texture and the sampler. However there are cases (on OpenGL, using ARB_gl_spirv) where SPIR-V is not providing a sampler, like for texture query levels ops. Although we could make spirv_to_nir provide a sampler deref for those cases, it is not really needed, and wrong from the Vulkan point of view. This patch fixes the following (borrowed) tests run on SPIR-V mode: arb_compute_shader/execution/basic-texelFetch.shader_test arb_gpu_shader5/execution/sampler_array_indexing/fs-simple-texture-size.shader_test arb_texture_query_levels/execution/fs-baselevel.shader_test arb_texture_query_levels/execution/fs-maxlevel.shader_test arb_texture_query_levels/execution/fs-miptree.shader_test arb_texture_query_levels/execution/fs-nomips.shader_test arb_texture_query_levels/execution/vs-baselevel.shader_test arb_texture_query_levels/execution/vs-maxlevel.shader_test arb_texture_query_levels/execution/vs-miptree.shader_test arb_texture_query_levels/execution/vs-nomips.shader_test glsl-1.30/execution/fs-textureSize-compare.shader_test v2: merge lower_tex_src_to_offset and calc_sampler_offsets together, update texture/sampler index and texture_array_size directly on lower_tex_src_to_offset (Jason) v3: clarify one comment (Jason) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-08-13 16:28:27 +02:00
Alejandro Piñeiro	fe2de39fb2	nir/linker: take into account hidden uniforms So they are not exposed through the introspection API. It is worth noting that the number of hidden uniforms of GLSL linking vs SPIR-V linking would be somewhat different due to the different order of the nir lowerings/optimizations. For example: gl_FbWposYTransform. This is introduced as part of nir_lower_wpos_ytransform. On GLSL that is executed after the IR-based linking. So that means that on GLSL the UniformStorage will not include this uniform. With the SPIR-V linking, that uniform is already present, but marked as hidden. So it will be included on the UniformStorage, but as hidden. One alternative would be to create a special how_declared for that case, but that seemed overkill. Using hidden should be ok as long as it is used properly. Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-08-13 16:28:27 +02:00
Alejandro Piñeiro	5332d7582d	nir: add how_declared to nir_variable.data Equivalent to the already existing how_declared at GLSL IR. The only difference is that we are not adding all the declaration_type available on GLSL, only the one that we will use in the short term. We can add more modes if needed in the future. Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-08-13 16:28:26 +02:00
Neil Roberts	be6f472b23	spirv: Make VertexIndex and VertexId both non-zero-based GLSL has gl_VertexID which is supposed to be non-zero-based. SPIR-V has both VertexIndex and VertexId builtins whose meanings are defined by the APIs. Vulkan defines VertexIndex as being non-zero-based. In Vulkan VertexId and InstanceId have no meaning and are pretty much just reserved for OpenGL at this point. GL_ARB_spirv removes VertexIndex and defines VertexId to be the same as gl_VertexId (which is also non-zero-based). Previously in Mesa it was treating VertexIndex as non-zero-based and VertexId as zero-based, so it was breaking for GL. This behaviour was apparently based on Khronos bug 14255. However that bug doesn’t seem to have made a final decision for VertexId. Assuming there really is no other definition for VertexId for Vulkan it seems better to just make them both have the same value. v2: update comment and commit descriptions, based on Jason Ekstrand explanation of the meaning/rationale behind all those builtins (Jason) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-08-13 16:23:36 +02:00
Alejandro Piñeiro	624c00f1a6	spirv: fill info.gs.input_primitive too info.gs.output_primitive was already being filled. Not sure why this is not needed on Vulkan, but we found to be needed for ARB_gl_spirv. Specifically, this is needed to get the following test passing: KHR-GL45.gl_spirv.spirv_validation_builtin_variable_decorations_test Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-08-13 12:56:51 +02:00
Tapani Pälli	ed94a5799d	docs/features: mark GL_EXT_render_snorm as done for i965 Signed-off-by: Tapani Pälli <tapani.palli@intel.com>	2018-08-13 13:08:22 +03:00
Tapani Pälli	fa9e6c235d	i965: enable EXT_render_snorm Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>	2018-08-13 12:03:17 +03:00
Tapani Pälli	0d356cf478	mesa: enable EXT_render_snorm extension Patch sets additional formats renderable and enables the extension when OpenGL ES 3.1 is supported. v2: instead of dummy_true, have a separate toggle for extension (Eric Anholt) v3: add missing checks, simplify some existing checks and fix glCopyTexImage2D check (Nanley Chery) add SHORT and BYTE support in read_pixels_es3_error_check Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>	2018-08-13 12:03:17 +03:00
Kenneth Graunke	de57926dc9	blorp: Properly handle Z24X8 blits. One of the reasons we didn't notice that R24_UNORM_X8_TYPELESS destinations were broken was that an earlier layer was swapping it out for B8G8R8A8_UNORM. That made Z24X8 -> Z24X8 blits work. However, R32_FLOAT -> R24_UNORM_X8_TYPELESS was still totally broken. The old code only considered one format at a time, without thinking that format conversion may need to occur. This patch moves the translation out to a place where it can consider both formats. If both are Z24X8, we continue using B8G8R8A8_UNORM to avoid having to do shader math workarounds. If we have a Z24X8 destination, but a non-matching source, we use our shader hacks to actually render to it properly. Fixes: `804856fa57` (intel/blorp: Handle more exotic destination formats) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-08-11 12:34:01 -07:00
Kenneth Graunke	8a29086285	blorp: Don't try to use R32_UNORM for R24_UNORM_X8_TYPELESS rendering. The hardware doesn't support rendering to R24_UNORM_X8_TYPELESS, so Jason decided to fake it with a bit of shader math and R32_UNORM RTs. The only problem is that R32_UNORM isn't renderable either...so we've just traded one bad format for another. This patch makes us use R32_UINT instead. Fixes: `804856fa57` (intel/blorp: Handle more exotic destination formats) Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-08-11 12:33:27 -07:00
Jason Ekstrand	a9f7bcfdf9	intel: Switch the order of the 2x MSAA sample positions The Vulkan 1.1.82 spec flipped the order to better match D3D. Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>	2018-08-11 10:58:12 -05:00
Gert Wollny	8a87138885	mesa/st/tests: Add array life range estimation and renumbering tests Signed-off-by: Gert Wollny <gw.fossdev@gmail.com> Acked-by: Dave Airlie <airlied@redhat.com>	2018-08-11 12:32:42 +02:00
Gert Wollny	0981fc84df	mesa/st/tests: Add array life range tests infrastructure to common test class Signed-off-by: Gert Wollny <gw.fossdev@gmail.com> Acked-by: Dave Airlie <airlied@redhat.com>	2018-08-11 12:32:42 +02:00
Gert Wollny	d8c2119f9b	mesa/st/glsl_to_tgsi: Expose array live range tracking and merging This patch ties in the array split, merge, and interleave code. shader-db changes in the TGSI code are: original code \| array-merge \| change mean max \| mean max \| best mean % worst ----------------------------------------------------------- arrays 0.05 2 \| 0.00 0 \| -2 -100 0 total temps 5.05 21 \| 4.92 20 \| -15 -2.59 1 instr 55.33 988 \| 55.20 988 \| -15 -0.24 0 Evaluation: Run shader-db in single thread mode (otherwise the output is not ordered and the best and worst column don't make sense) to get results pre-stats.txt and post-stats.txt. Then using python pandas: import pandas as pd old_stats = pd.read_csv('pre-stats.txt') new_stats = pd.read_csv('post-stats.txt') omean = old_stats.mean() omax = old_stats.max() nmean = new_stats.mean() nmax = new_stats.max() delta = new_stats - old_stats pd.concat([omean, omax, nmean, nmax, delta.min(), delta.mean()/old_stats.mean()*100, delta.max()], axis=1, keys=['mean', 'max', 'mean', 'max', 'best', 'avg change %', 'worst']) v4: - Correct typo and add bugs that are fixed by this series. - Update stats and describe stats evaluation Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105371 https://bugs.freedesktop.org/show_bug.cgi?id=100200 Signed-off-by: Gert Wollny <gw.fossdev@gmail.com> Acked-by: Dave Airlie <airlied@redhat.com>	2018-08-11 12:32:42 +02:00
Gert Wollny	c317d0ab54	mesa/st/glsl_to_tgsi: add array life range evaluation into tracking code v4: Also track the register given in inst->resource. (thanks: Benedikt Schemmer for testing the patches on radeonsi, which revealed that I was missing tracking this) Signed-off-by: Gert Wollny <gw.fossdev@gmail.com> Acked-by: Dave Airlie <airlied@redhat.com>	2018-08-11 12:32:42 +02:00
Gert Wollny	5e58eb37f1	mesa/st/glsl_to_tgsi: add class for array access tracking Because of the indirect access it is impossible to obtain an accurate per component and array element tracking. Therefore, the tracking is simplified to only track whether any element was accessed, and whether this happened conditionally in a loop. In addition, while tracking of temporaries requires a per-component tracking that is later fused, for arrays only the component access mask is needed. The resulting tracking code and evaluation of the array live range is sufficiently different from the evaluation of the live range of temporaries to justify implementing this in a different class instead of adding more complexity to the already existing code for temporary life range evaluation. v4: Update commit message to make it clearer why this class is separate from the tracking of temporaries. Signed-off-by: Gert Wollny <gw.fossdev@gmail.com> Acked-by: Dave Airlie <airlied@redhat.com>	2018-08-11 12:32:42 +02:00
Gert Wollny	7d55d01b53	mesa/st/glsl_to_tgsi: move evaluation of read mask up in the call hierarchy In preparation of the array live range tracking the evaluation of the read mask is moved out the register live range tracking to the enclosing call of the generalized read access tracking. Signed-off-by: Gert Wollny <gw.fossdev@gmail.com> Acked-by: Dave Airlie <airlied@redhat.com>	2018-08-11 12:32:42 +02:00
Gert Wollny	f2a4636339	mesa/st/glsl_to_tgsi: rename access_record to register_merge_record and some more renames In preparation of adding the tracking of the live range the classes that refer to temporary registers are renamed. Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Signed-off-by: Gert Wollny <gw.fossdev@gmail.com> Acked-by: Dave Airlie <airlied@redhat.com>	2018-08-11 12:32:42 +02:00
Gert Wollny	8c89728889	mesa/st/tests: Add tests for array merge helper classes. v2: - Define tests also in the meson.build file. v4: - Check no-op mapping of all bits. - Convert tests to the new class layout used in the merge evaluation. - remove dependency on llvm in meson build (Thanks Dylan Baker for pointing out that this might not be needed) Signed-off-by: Gert Wollny <gw.fossdev@gmail.com> Acked-by: Dave Airlie <airlied@redhat.com>	2018-08-11 12:32:42 +02:00
Gert Wollny	12316aa217	mesa/st/glsl_to_tgsi: Add array merge logic v4: - Update the code to use the new merge logic. - Use a cleaner, class-based approach for the evaluation of merges. Signed-off-by: Gert Wollny <gw.fossdev@gmail.com> Acked-by: Dave Airlie <airlied@redhat.com>	2018-08-11 12:32:42 +02:00
Gert Wollny	d097ef4204	mesa/st/glsl_to_tgsi: Add helper classes to apply array merging and interleaving v4: - Remove logic for evaluation of swizzles and merges since this was moved to array_live_range. This class now only handles the actual remapping. Signed-off-by: Gert Wollny <gw.fossdev@gmail.com> Acked-by: Dave Airlie <airlied@redhat.com>	2018-08-11 12:32:42 +02:00
Gert Wollny	d54c2f92f9	mesa/st/glsl_to_tgsi: Add helper class for array live range merging and interleaving This class holds the array length, live range, and accessed components, and it implements the logic for evaluating how arrays are merged and interleaved. v4: - Add logic to evaluate merge and interleave of a pair of arrays to the class array_live_range. - document class - update commit message Thanks Nicolai Hähnle for the pointers given. Signed-off-by: Gert Wollny <gw.fossdev@gmail.com> Acked-by: Dave Airlie <airlied@redhat.com>	2018-08-11 12:32:42 +02:00
Gert Wollny	331ae3cde5	mesa/st/glsl_to_tgsi:rename lifetime to register_live_range On one hand "live range" is the term used in the literature, and on the other hand a distinction is needed from the array live ranges. v4: Fix indentions and white spaces Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> (v3) Signed-off-by: Gert Wollny <gw.fossdev@gmail.com> Acked-by: Dave Airlie <airlied@redhat.com>	2018-08-11 12:32:42 +02:00
Gert Wollny	f40c9d0225	mesa/st/glsl_to_tgsi: Properly resolve life times simple if/else + use constructs In constructs like the one below, the live range estimation currently extends the live range of t unnecessarily to the whole loop because it was not detected that t is unconditionally written and later read only in the "if (a)" scope. while (foo) { ... if (a) { ... if (b) t = ... else t = ... x = t; ... } ... } This patch adds a unit test for this case and corrects the minimal live range estimation accordingly. v4: update comments Signed-off-by: Gert Wollny <gw.fossdev@gmail.com> Acked-by: Dave Airlie <airlied@redhat.com>	2018-08-11 12:32:42 +02:00
Gert Wollny	568bda2f2d	mesa/st/glsl_to_tgsi: Split arrays whose elements are only accessed directly Array whose elements are only accessed directly are replaced by the according number of temporary registers. By doing so the otherwise reserved register range becomes subject to further optimizations like copy propagation and register merging. Thanks to the resulting reduced register pressure this patch makes the piglits spec/glsl-1.50/execution - variable-indexing/vs-output-array-vec3-index-wr-before-gs geometry/max-input-components pass on r600 (barts) where they would fail before with a "GPR limit exceeded" error (even with the spilling that was recently added). v2: * rename method dissolve_arrays to split_arrays * unify the tracking and remapping methods for src and dst registers * also track access to arrays via reladdr* v3: * enable this optimization only if the driver requests register merge v4: * Correct comments * Also update inst->resource if it is an array element (thanks: Benedikt Schemmer for testing the patches on radeonsi, which revealed that I was missing tracking this) Signed-off-by: Gert Wollny <gw.fossdev@gmail.com> Acked-by: Dave Airlie <airlied@redhat.com>	2018-08-11 12:32:42 +02:00
Gert Wollny	b1cead3add	mesa/st/glsl_to_tgsi: Add method to collect some TGSI statistics When mesa is compiled in debug mode then this adds the possibility to print out some statistics about the translated and optimized TGSI shaders to a file. The functionality is enabled by setting the environment variable GLSL_TO_TGSI_PRINT_STATS to the file name where the statistics should be collected. The file is opened in append mode so that statistics from various runs will be accumulated. v4: Make access to log file thread safe (thanks for pointing this out Nicolai Hähnle) Signed-off-by: Gert Wollny <gw.fossdev@gmail.com> Acked-by: Dave Airlie <airlied@redhat.com>	2018-08-11 12:32:42 +02:00
Gert Wollny	be95ca9be7	Gallium/tgsi: Correct signedness of return value of bit operations The GLSL operations findLSB, findMSB, and countBits always return a signed integer type. Let TGSI reflect this. v2: Properly set values in infer_(src\|dst)_type (Thanks Roland Scheidegger for pointing out problems with my 1st approach) v2: Set values in the common infer_type code path, and only add the correct source type for UMSB (Roland Scheidegger) Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Reviewed-by: Roland Scheidegger <sroland@vmware.com>	2018-08-11 11:14:29 +02:00
Mathieu Bridon	2ee1c86d71	meson: Build with Python 3 Now that all the build scripts are compatible with both Python 2 and 3, we can flip the switch and tell Meson to use the latter. Since Meson already depends on Python 3 anyway, this means we don't need two different Python stacks to build Mesa. Signed-off-by: Mathieu Bridon <bochecha@daitauha.fr> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2018-08-10 15:15:09 -07:00
Mathieu Bridon	bd27203f4d	python: Rework bytes/unicode string handling In both Python 2 and 3, opening a file without specifying the mode will open it for reading in text mode ('r'). On Python 2, the read() method of a file object opened in mode 'r' will return byte strings, while on Python 3 it will return unicode strings. Explicitly specifying the binary mode ('rb') then decoding the byte string means we always handle unicode strings on both Python 2 and 3. Which in turn means all re.match(line) will return unicode strings as well. If we also make expandCString return unicode strings, we don't need the call to the unicode() constructor any more. We were using the ugettext() method because it always returns unicode strings in Python 2, contrarily to the gettext() one which returns byte strings. The ugettext() method doesn't exist on Python 3, so we must use the right method on each version of Python. The last hurdles are that Python 3 doesn't let us concatenate unicode and byte strings directly, and that Python 2's stdout wants encoded byte strings while Python 3's wants unicode strings. With these changes, the script gives the same output on both Python 2 and 3. Signed-off-by: Mathieu Bridon <bochecha@daitauha.fr> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2018-08-10 15:14:48 -07:00
Mathieu Bridon	15ac05fd45	python: Fix inequality comparisons On Python 3, executing `foo != bar` will first try to call foo.__ne__(bar), and fall back on the opposite result of foo.__eq__(bar). Python 2 does not do that. As a result, those __eq__ methods were never called when we were testing for inequality. Explicitly adding the __ne__ methods fixes this issue, in a way that is compatible with both Python 2 and 3. However, this means the __eq__ methods are now called when testing for `foo != None`, so they need to be guarded correctly. Signed-off-by: Mathieu Bridon <bochecha@daitauha.fr> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2018-08-10 08:45:59 -07:00
Gert Wollny	e94095ec30	mesa/st: ETC2 now uses R8G8B8A8_SRGB as fallback The check for ETC2 compatibility was not updated when the fallback format was changed. Fixes: `71867a0a61` st/mesa: Fall back to R8G8B8A8_SRGB for ETC2 Signed-off-by: Gert Wollny <gert.wollny@collabora.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-08-10 10:09:22 +02:00
Mathieu Bridon	08fe9b3e3a	python: Simplify list sorting Instead of copying the list, then sorting the copy in-place, we can just get a new sorted copy directly. Signed-off-by: Mathieu Bridon <bochecha@daitauha.fr> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2018-08-09 16:49:19 -07:00
Mathieu Bridon	8d3ff6244c	python: Use key-functions when sorting containers In Python 2, the traditional way to sort containers was to use a comparison function (which returned either -1, 0 or 1 when passed two objects) and pass that as the "cmp" argument to the container's sort() method. Python 2.4 introduced key-functions, which instead only operate on a given item, and return a sorting key for this item. In general, this runs faster, because the cmp-function has to get run multiple times for each item of the container. Python 3 removed the cmp-function, enforcing usage of key-functions instead. This change makes the script compatible with Python 2 and Python 3. Signed-off-by: Mathieu Bridon <bochecha@daitauha.fr> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2018-08-09 16:49:19 -07:00
Mathieu Bridon	1e668ca111	python: Better check for integer types Python 3 lost the long type: now everything is an int, with the right size. This commit makes the script compatible with Python 2 (where we check for both int and long) and Python 3 (where we only check for int). Signed-off-by: Mathieu Bridon <bochecha@daitauha.fr> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2018-08-09 16:49:19 -07:00
Mathieu Bridon	14f1ab998f	python: Do not mix bytes and unicode strings Mixing the two is a long-standing recipe for errors in Python 2, so much so that Python 3 now completely separates them. This commit stops treating both as if they were the same, and in the process makes the script compatible with both Python 2 and 3. Signed-off-by: Mathieu Bridon <bochecha@daitauha.fr> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2018-08-09 16:49:19 -07:00
Mathieu Bridon	c644b2d7a7	python: Explicitly use a list On Python 2, the builtin function filter() returns a list. On Python 3, it returns an iterator. Since we want to use those objects in contexts where we need lists, we need to explicitly turn them into lists. This makes the code compatible with both Python 2 and Python 3. Signed-off-by: Mathieu Bridon <bochecha@daitauha.fr> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2018-08-09 16:49:18 -07:00
Mathieu Bridon	d9ca4a172e	python: Use the right function for the job The code was just reimplementing itertools.combinations_with_replacement in a less efficient way. This does change the order of the results slightly, but it should be ok. Signed-off-by: Mathieu Bridon <bochecha@daitauha.fr> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2018-08-09 16:49:18 -07:00
Eric Anholt	b618d7ea59	egl: Fix leak of X11 pixmaps backing pbuffers in DRI3. This is basically copied from the DRI2 destroy path. Without this, Raspberry Pi would quickly run out of CMA during the EGL tests in the CTS due to all the pixmaps laying around. Fixes: `f35198bade` ("egl/x11: Implement dri3 support with loader's dri3 helper") Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2018-08-09 13:12:13 -07:00
Kenneth Graunke	08a5c395ab	intel: Fix SIMD16 unaligned payload GRF reads on Gen4-5. When the SIMD16 Gen4-5 fragment shader payload contains source depth (g2-3), destination stencil (g4), and destination depth (g5-6), the single register of stencil makes the destination depth unaligned. We were generating this instruction in the RT write payload setup: mov(16) m14<1>F g5<8,8,1>F { align1 compr }; which is illegal, instructions with a source region spanning more than one register need to be aligned to even registers. This is because the hardware implicitly does (nr \| 1) instead of (nr + 1) when splitting the compressed instruction into two mov(8)'s. I believe this would cause the hardware to load g5 twice, replicating subspan 0-1's destination depth to subspan 2-3. This showed up as 2x2 artifact blocks in both TIS-100 and Reicast. Normally, we rely on the register allocator to even-align our virtual GRFs. But we don't control the payload, so we need to lower SIMD widths to make it work. To fix this, we teach lower_simd_width about the restriction, and then call it again after lower_load_payload (which is what generates the offending MOV). Fixes: `8aee87fe4c` (i965: Use SIMD16 instead of SIMD8 on Gen4 when possible.) Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107212 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=13728 Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Tested-by: Diego Viola <diego.viola@gmail.com>	2018-08-09 12:33:41 -07:00
Kenneth Graunke	11b9f63a74	i965: Only enable depth IZ signals if there's an actual depthbuffer. According to the G45 PRM Volume 2 Page 265 we're supposed to only set these signals when there is an actual depth buffer. Note that we already do this for the stencil buffer by virtue of brw->stencil_enabled invoking _mesa_is_stencil_enabled(ctx) which checks whether the current drawbuffer's visual has stencil bits (which is updated based on what buffers are bound). We just need to do it for depth as well. Not observed to fix anything. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-08-09 12:33:38 -07:00
Adam Jackson	63a6b719d9	glx: GLX_MESA_multithread_makecurrent is direct-only This extension is not defined for indirect contexts. Marking it as "client only", as the old code did here, would make the extension available in indirect contexts, even though the server would certainly not have it in its extension list. Cc: <mesa-stable@lists.freedesktop.org> Signed-off-by: Adam Jackson <ajax@redhat.com> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2018-08-09 12:33:14 -04:00
Eric Engestrom	fcf259ef97	anv: set error in all failure paths Cc: Jason Ekstrand <jason.ekstrand@intel.com> Fixes: `5b196f39bd` "anv/pipeline: Compile to NIR in compile_graphics" Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-08-09 11:20:27 +01:00
Eric Engestrom	aac80f7597	intel/tools: add missing variable initialisation Fixes: `6a60beba40` "intel/tools: Add an error state to aub translator" Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2018-08-09 11:20:18 +01:00
vadym.shovkoplias	e0de26eacc	drirc: Allow extension midshader for Metro Redux This fixes both Metro 2033 Redux and Metro Last Light Redux Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99730 Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com> Signed-off-by: Vadym Shovkoplias <vadym.shovkoplias@globallogic.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2018-08-09 13:13:20 +03:00
Tapani Pälli	03a5acec68	glsl: handle error case with ast_post_inc, ast_post_dec Return ir_rvalue::error_value with ast_post_inc, ast_post_dec if parser error was emitted previously. This way process_array_size won't see bogus IR generated like with commit `9c676a6427`. Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98699 Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>	2018-08-09 13:07:16 +03:00
Eric Anholt	fdfb689a48	vc4: Implement texture_subdata() to directly upload tiled data. This avoids a memcpy into a temporary in the upload path. Improves x11perf -putimage100 performance by 12.1586% +/- 1.38155% (n=145)	2018-08-08 18:14:31 -07:00
Eric Anholt	25bee5ef9e	vc4: Handle partial loads/stores of tiled textures. Previously, we would load out the tile-aligned area, update the raster copy, and store it back. This was a huge cost for XPutImage calls to the screen under glamor. Instead, implement a general load/store path that walks over the source x/y writing into the corresponding pixel of the destination (using clever math from https://fgiesen.wordpress.com/2011/01/17/texture-tiling-and-swizzling/). If things are aligned, we go through the previous utile-at-a-time loop. Improves x11perf -putimage10 performance by 139.777% +/- 2.83464% (n=5) Improves x11perf -putimage100 performance by 383.908% +/- 22.6297% (n=11) Improves x11perf -getimage10 performance by 2.75731% +/- 0.585054% (n=145)	2018-08-08 16:45:44 -07:00
Eric Anholt	3e06b918aa	vc4: Compile the LT image helper per cpp we might load/store. For the partial load/store support I'm about to add, we want the memcpy to be compiled out to a single load/store. This should also eliminate the calls to vc4_utile_width/height(). Improves x11perf -putimage100 performance by 3.76344% +/- 1.16978% (n=15)	2018-08-08 15:53:25 -07:00
Eric Anholt	d6a174669f	vc4: Refactor to reuse the LT tile walking code.	2018-08-08 12:34:48 -07:00
Juan A. Suarez Romero	a9fb331ea7	wayland/egl: update surface size on window resize According to EGL 1.5 spec, section 3.10.1.1 ("Native Window Resizing"): "If the native window corresponding to _surface_ has been resized prior to the swap, _surface_ must be resized to match. _surface_ will normally be resized by the EGL implementation at the time the native window is resized. If the implementation cannot do this transparently to the client, then eglSwapBuffers must detect the change and resize surface prior to copying its pixels to the native window." So far, resizing a native window in Wayland/EGL was interpreted in Mesa as a request to resize, which is not executed until the first draw call. And hence, surface size is not updated until executing it. Thus, querying the surface size with eglQuerySurface() after a window resize still returns the old values. This commit updates the surface size values as soon as the resize is done, even when the real resize is done in the draw call. This gives the semantics that any native window resize request takes effect immediately, and if the user calls eglQuerySurface() it will return the new resized values. v2: update surface size if there isn't a back surface (Daniel) CC: Daniel Stone <daniel@fooishbar.org> CC: mesa-stable@lists.freedesktop.org Reviewed-by: Daniel Stone <daniels@collabora.com>	2018-08-08 18:29:58 +02:00
Juan A. Suarez Romero	1fe7cbdf05	wayland/egl: initialize window surface size to window size When creating a window surface with eglCreateWindowSurface(), the width and height returned by eglQuerySurface(EGL_{WIDTH,HEIGHT}) are invalid until buffers are updated (like calling glClear()). But according to EGL 1.5 spec, section 3.5.6 ("Surface Attributes"): "Querying EGL_WIDTH and EGL_HEIGHT returns respectively the width and height, in pixels, of the surface. For a window or pixmap surface, these values are initially equal to the width and height of the native window or pixmap with respect to which the surface was created" This fixes dEQP-EGL.functional.color_clears.* CTS tests v2: - Do not modify attached_{width,height} (Daniel) - Do not update size on resizing window (Brendan) CC: Daniel Stone <daniel@fooishbar.org> CC: Brendan King <brendan.king@imgtec.com> CC: mesa-stable@lists.freedesktop.org Tested-by: Eric Engestrom <eric@engestrom.ch> Tested-by: Chad Versace <chadversary@chromium.org> Reviewed-by: Chad Versace <chadversary@chromium.org> Reviewed-by: Daniel Stone <daniels@collabora.com>	2018-08-08 18:28:52 +02:00
Juan A. Suarez Romero	f9d0e7d3bc	travis: make drivers explicit in Meson targets Like in the autotools target, make the list of drivers to be built in each of the Meson targets explicit. This will help to identify missing dependencies and other issues more easily. CC: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2018-08-08 17:56:32 +02:00
Brian Paul	51e878cdb3	svga: use pipe_sampler_view::target in svga_set_sampler_views() instead of the underlying texture's target. This fixes an issue where the TGSI sampler type was not agreeing with the sampler view target/type. In particular, this fixes a Mint 19 XFCE desktop scaling issue because the TGSI code was using a RECT sampler but the sampler view's underlying texture was PIPE_TEXTURE_2D. We want to use the sampler view's type rather than the underlying resource, as we do for the view's surface format. No piglit regressions. VMware issue 2156696. Reviewed-by: Neha Bhende <bhenden@vmware.com> Reviewed-by: Charmaine Lee <charmainel@vmware.com>	2018-08-08 08:20:10 -06:00
Brian Paul	92e5dc94ac	svga: use SVGA3D_RS_FILLMODE for vgpu9 I'm not sure why we didn't support this in the past, but fillmode is supported by all renderers nowadays. Also fix the logic in svga_create_rasterizer_state() to avoid a few swtnl cases. No piglit regressions. Reviewed-by: Neha Bhende <bhenden@vmware.com> Reviewed-by: Charmaine Lee <charmainel@vmware.com>	2018-08-08 08:20:10 -06:00
Brian Paul	a45b495700	svga: add TGSI_SEMANTIC_FACE switch case in svga_swtnl_update_vdecl() Fixes failed assertion running Piglit polygon-mode-face test. Though, the test still does not pass. Reviewed-by: Neha Bhende <bhenden@vmware.com> Reviewed-by: Charmaine Lee <charmainel@vmware.com>	2018-08-08 08:20:10 -06:00
Brian Paul	92e7342a6f	xlib: remove unused Fake_glXGetAGPOffsetMESA() function To silence compiler warning. Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2018-08-08 08:20:09 -06:00
Brian Paul	6ff4795c62	gl.h: define GLeglImageOES depending on GL_EXT_EGL_image_storage To avoid duplicate typedef with the definition in glext.h V2: test for both GL_OES_EGL_image and GL_EXT_EGL_image_storage in case both the GL and GLES headers are included. Per Emil. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107488 Tested-by: Vinson Lee <vlee@freedesktop.org> Reviewed-by: Mathias Fröhlich <mathias.froehlich@web.de>	2018-08-08 08:20:01 -06:00
Emil Velikov	32aa7ff647	Android: copy -fnomath options from the autotools build Add -fno-math-errno and -fno-trapping-math to the build. Mesa does not depend on the functionality provided, thus this should result in slightly faster code and smaller binaries. Cc: Tapani Pälli <tapani.palli@intel.com> Cc: Rob Herring <robh@kernel.org> Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Acked-by: Tapani Pälli <tapani.palli@intel.com>	2018-08-08 13:45:55 +01:00
Emil Velikov	315c46cfdc	autotools: use correct gl.pc LIBS when using glvnd This is more of a hack, since glvnd itself should be providing the file. Until that happens, ensure the libs is correctly set to -lGL CC: <mesa-stable@lists.freedesktop.org> Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Adam Jackson <ajax@redhat.com>	2018-08-08 13:37:09 +01:00
Emil Velikov	8dc96416c9	glx: automake: add egl.pc/headers TODO when using glvnd Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Adam Jackson <ajax@redhat.com>	2018-08-08 13:37:09 +01:00
Emil Velikov	94ed4c4a16	egl: automake: add egl.pc/headers TODO when using glvnd Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Adam Jackson <ajax@redhat.com>	2018-08-08 13:37:09 +01:00
Emil Velikov	25a9450a44	autotools: error out when building with mangling and glvnd It's not a thing that can work, nor is a wise idea to attempt. v2: Tweak error message (Dylan) CC: <mesa-stable@lists.freedesktop.org> Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Adam Jackson <ajax@redhat.com> (v1)	2018-08-08 13:37:09 +01:00
Emil Velikov	d5ac236471	autotools: error out when using the broken --with-{gl, osmesa}-lib-name The toggles were broken with the introduction of --enable-mangling. Fixing that up might be possible, but it's not worth the complexity since one can rename the libraries at any point. CC: <mesa-stable@lists.freedesktop.org> Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Adam Jackson <ajax@redhat.com>	2018-08-08 13:37:09 +01:00
Emil Velikov	4f2b73d9fd	meson: recommend building the surfaceless platform It has no special requirements, size and build-time is effectively zero. v2: Rebase Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Acked-by: Dylan Baker <dylan@pnwbakers.com> Reviewed-by: Adam Jackson <ajax@redhat.com>	2018-08-08 13:37:09 +01:00
Emil Velikov	a7ea7511ba	automake: require shared glapi when using DRI based libGL This has been a requirement for ages, yet it seems like we never explicitly errored out during configure. CC: <mesa-stable@lists.freedesktop.org> Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Adam Jackson <ajax@redhat.com>	2018-08-08 13:37:09 +01:00
Emil Velikov	834036500c	ttn: remove {varying_slot, frag_result}_to_tgsi_semantic helpers The respective drivers have been updated and the helpers are no longer needed. Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2018-08-08 13:33:07 +01:00
Juan A. Suarez Romero	db432194a1	travis: remove libedit-dev dependency in LLVM 6.0 targets In LLVM <6.0 we added explicitly libedit-dev, as it was required to satisfy apt dependencies. In LLVM 6.0, this is not required anymore, so let's remove it. CC: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2018-08-08 13:00:33 +02:00
Erik Faye-Lund	0f450e0cbe	glsl_to_tgsi: plumb image writable through to driver The virgl driver cares about the writable-flag on image definitions, because it re-emits GLSL from the TGSI. However, so far it was hardcoded to true in glsl_to_tgsi, which cause problems when virglrenderer is running on top of GLES 3.1, where not all formats are supported for writable images. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com>	2018-08-08 09:35:09 +02:00
Eric Anholt	cfe69d0aaa	vc4: Fix vc4_fence_server_sync() on pre-syncobj kernels. We won't have an FD if we're just having the server wait on a fence created by eglCreateSyncKHR(). Our seqno fences will happen in order, so server-side waits are no-ops in that case. Fixes dEQP-EGL.functional.sharing.gles2.multithread.simple_egl_server_sync.buffers.gen_delete Fixes: `b0acc3a562` ("broadcom/vc4: Native fence fd support")	2018-08-07 17:00:49 -07:00
Eric Anholt	69158c452b	vc4: Ignore samplers for finding uniform offsets. Fixes: dEQP-GLES2.shaders.struct.uniform.sampler_array_fragment dEQP-GLES2.shaders.struct.uniform.sampler_array_vertex dEQP-GLES2.shaders.struct.uniform.sampler_nested_fragment dEQP-GLES2.shaders.struct.uniform.sampler_nested_vertex Cc: mesa-stable@lists.freedesktop.org	2018-08-07 17:00:22 -07:00
Eric Anholt	e24a8e5232	vc4: Extend dumping of uniforms in QIR and in the command stream. Similar to what I did for V3D, provide some description of the uniforms.	2018-08-07 17:00:22 -07:00
Eric Anholt	3954331aff	vc4: Pull uinfo->data[i] dereference out to the top of the loop. Reduces the size of vc4_uniforms.o by about 10%. We would basically always end up loading the cacheline of uinfo->data[i] anyway, so it should be good for performance as well as making the code a bit cleaner.	2018-08-07 17:00:22 -07:00
Eric Anholt	550e9c917c	vc4: Make sure to emit a tile coordinates between two MSAA loads. The HW only executes a load once the tile coordinates packet happens, and only tracks one at a time, so by emitting our two MSAA loads back to back we would end up with an undefined color or Z buffer. The simulator doesn't seem to care, but sync up the RCL generation with the kernel anyway. Fixes dEQP-EGL.functional.render.multi_context.gles2.rgb888_window	2018-08-07 17:00:22 -07:00
Eric Anholt	9ab6912a00	vc4: Respect a sampler view's first_layer field. Fixes texturing from EGL images created from cubemap faces, as in dEQP-EGL.functional.image.create.gles2_cubemap_negative_x_rgba_texture Cc: mesa-stable@lists.freedesktop.org	2018-08-07 17:00:22 -07:00
Dave Airlie	fe0a3a45bb	virgl: add ARB_shader_clock support Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>	2018-08-08 08:36:40 +10:00
Mathieu Bridon	ba1ebf2ee1	python: Specify the template output encoding We're trying to write a unicode string (i.e decoded) to a file opened in binary (i.e encoded) mode. In Python 2 this works, because of the automatic conversion between byte and unicode strings. In Python 3 this fails though, as no automatic conversion is attempted. This change makes the scripts compatible with both versions of Python. Signed-off-by: Mathieu Bridon <bochecha@daitauha.fr> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2018-08-07 13:28:35 -07:00
Mathieu Bridon	e1b88aee68	python: Fix rich comparisons Python 3 doesn't call objects' __cmp__() methods any more to compare them. Instead, it requires implementing the rich comparison methods explicitly: __eq__(), __ne__(), __lt__(), __le__(), __gt__() and __ge__(). Fortunately Python 2 also supports those. This commit only implements the comparison methods which are actually used by the build scripts. Signed-off-by: Mathieu Bridon <bochecha@daitauha.fr> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2018-08-07 13:10:34 -07:00
Mathieu Bridon	9b6746b7c0	python: Use explicit integer divisions In Python 2, divisions of integers return an integer: >>> 32 / 4 8 In Python 3 though, they return floats: >>> 32 / 4 8.0 However, Python 3 has an explicit integer division operator: >>> 32 // 4 8 That operator exists on Python >= 2.2, so let's use it everywhere to make the scripts compatible with both Python 2 and 3. In addition, using __future__.division tells Python 2 to behave the same way as Python 3, which helps ensure the scripts produce the same output in both versions of Python. Signed-off-by: Mathieu Bridon <bochecha@daitauha.fr> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> (v2) Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2018-08-07 13:07:44 -07:00
Chad Versace	3dc22381fa	egl/main: Add bits for EGL_KHR_mutable_render_buffer A follow-up patch enables EGL_KHR_mutable_render_buffer for Android. This patch is separate from the Android patch because I think it's easier to review the platform-independent bits separately. Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2018-08-07 11:11:05 -07:00
Chad Versace	5c6d6eedb3	dri: Add param driCreateConfigs(mutable_render_buffer) If set, then the config will have __DRI_ATTRIB_MUTABLE_RENDER_BUFFER, which translates to EGL_MUTABLE_RENDER_BUFFER_BIT_KHR. Not used yet. Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2018-08-07 11:11:05 -07:00
Chad Versace	bbe2d50b58	dri: Define DRI_MutableRenderBuffer extensions Define extensions DRI_MutableRenderBufferDriver and DRI_MutableRenderBufferLoader. These are the two halves for EGL_KHR_mutable_render_buffer. Outside the DRI code there is one additional change. Add gl_config::mutableRenderBuffer to match __DRI_ATTRIB_MUTABLE_RENDER_BUFFER. Neither are used yet. Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2018-08-07 11:11:05 -07:00
Chad Versace	eabf59791e	egl/dri2: In dri2_make_current, return early on failure This pulls an 'else' block into the function's main body, making the code easier to follow. Without this change, the upcoming EGL_KHR_mutable_render_buffer patch transforms dri2_make_current() into spaghetti. Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2018-08-07 11:11:05 -07:00
Chad Versace	f48f9a78da	egl: Simplify queries for EGL_RENDER_BUFFER There exist two queryable EGL_RENDER_BUFFER states in EGL: eglQuerySurface(EGL_RENDER_BUFFER) and eglQueryContext(EGL_RENDER_BUFFER). These changes eliminate potentially very fragile code in the upcoming EGL_KHR_mutable_render_buffer implementation. * eglQuerySurface(EGL_RENDER_BUFFER) The implementation of eglQuerySurface(EGL_RENDER_BUFFER) contained abstruse logic which required comprehending the specification complexities of how the two EGL_RENDER_BUFFER states interact. The function sometimes returned _EGLContext::WindowRenderBuffer, sometimes _EGLSurface::RenderBuffer. Why? The function tried to encode the actual logic from the EGL spec. When did the function return which variable? Go study the EGL spec, hope you understand it, then hope Mesa mutated the EGL_RENDER_BUFFER state in all the correct places. Have fun. To simplify eglQuerySurface(EGL_RENDER_BUFFER), and to improve confidence in its correctness, flatten its indirect logic. For pixmap and pbuffer surfaces, simply return a hard-coded literal value, as the spec suggests. For window surfaces, simply return _EGLSurface::RequestedRenderBuffer. Nothing difficult here. * eglQueryContext(EGL_RENDER_BUFFER) The implementation of this suffered from the same issues as eglQuerySurface, and the solution is the same. To improve confidence in its correctness, flatten its indirect logic. For pixmap and pbuffer surfaces, simply return a hard-coded literal value, as the spec suggests. For window surfaces, simply return _EGLSurface::ActiveRenderBuffer. Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2018-08-07 11:11:05 -07:00
Marek Olšák	d145e33e7c	radeonsi: set GLC=1 for all write-only shader resources	2018-08-07 13:52:34 -04:00
Marek Olšák	2ab8cf6de5	radeonsi: don't load block dimensions into SGPRs if they are not variable	2018-08-07 13:52:34 -04:00
Juan A. Suarez Romero	03cff7ecd8	travis: meson/Vulkan requires LLVM 6.0 RADV now requires LLVM 6.0. Fixes: `fd1121e839` ("amd: remove support for LLVM 5.0") CC: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Andres Gomez <agomez@igalia.com>	2018-08-07 19:29:29 +02:00
Juan A. Suarez Romero	80f937ea4d	travis: add ubuntu-toolchain-r-test LLVM 6.0 requires libstdc++4.9, which is not available in main Travis repository. v2: LLVM 6.0 requires libstdc++4.9, rather than GCC 4.9 (Jan Vesely) Fixes: `fd1121e839` ("amd: remove support for LLVM 5.0") CC: Marek Olšák <marek.olsak@amd.com> CC: Emil Velikov <emil.velikov@collabora.com> CC: Dylan Baker <dylan@pnwbakers.com> Reviewed-by: Andres Gomez <agomez@igalia.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2018-08-07 19:27:07 +02:00
Emil Velikov	85cad15298	egl: set EGL_BAD_NATIVE_PIXMAP in the copy_buffers fallback As the spec says: EGL_BAD_NATIVE_PIXMAP is generated if the implementation does not support native pixmaps. Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2018-08-07 17:59:24 +01:00
Emil Velikov	5463064f7a	egl/x11: use the no-op dri2_fallback_copy_buffers for swrast Currently dri2_copy_buffers is used for swrast, which depends on the DRI2_FLUSH extension. Since that's not a thing on software-based drivers we crash out. Do the slightly more graceful thing of returning EGL_FALSE. Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2018-08-07 17:59:09 +01:00
Emil Velikov	670cd4080b	egl: remove unneeded _eglGetNativePlatform check There's little point in calling _eglGetNativePlatform() in eglCopyBuffers. The platform returned should be identical to the one already stored in our _EGLDisplay. In the following corner case, the check is incorrect. The function _eglGetNativePlatform effectively invokes the old-style eglGetDisplay platform selection. Thus if the EGL_PLATFORM platform does not match with the EGL_EXT_platform_* used to create the display we'll error out. Addresses the egl-copy-buffers piglit test. Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2018-08-07 17:58:52 +01:00
Emil Velikov	b4b277f770	travis: use https for all the links Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2018-08-07 17:27:06 +01:00
Emil Velikov	6b8657aff0	autoconf: stop exporting internal wayland details With version v1.15 the "code" option was deprecated in favour of "private-code" or "public-code". Before the interface symbol generated was exported (which is a bad idea since it's internal implementation detail) and others may misuse it. That was the case with libva approx. 1 year ago. Since then libva was fixed, so we can finally hide it by using "private-code" Inspired by similar xserver patch by Adam Jackson. Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2018-08-07 17:23:17 +01:00
Emil Velikov	2f1d9e6cb8	meson: stop exporting internal wayland details With version v1.15 the "code" option was deprecated in favour of "private-code" or "public-code". Before the interface symbol generated was exported (which is a bad idea since it's internal implementation detail) and others may misuse it. That was the case with libva approx. 1 year ago. Since then libva was fixed, so we can finally hide it by using "private-code" Inspired by similar xserver patch by Adam Jackson. Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2018-08-07 17:23:17 +01:00
Emil Velikov	c077b74ee8	meson: use dependency()+find_program() for wayland-scanner Helps when the native wayland-scanner is located outside of PATH. Inspired by the xserver code ;-) Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2018-08-07 17:23:17 +01:00
Emil Velikov	54d844897f	swr: don't export swr_create_screen_internal With earlier rework the user and provider of the symbol are within the same binary. Thus there's no point in exporting the function. Spotted while reviewing a patch from Chuck that nearly added another unneeded PUBLIC function. Cc: Chuck Atkins <chuck.atkins@kitware.com> Cc: Tim Rowley <timothy.o.rowley@intel.com> Fixes: `f50aa21456` ("swr: build driver proper separate from rasterizer") Signed-off-by: Emil Velikov <emil.velikov@collabora.com> Tested-by: Chuck Atkins <chuck.atkins@kitware.com> Reviewed-By: George Kyriazis <george.kyriazis@intel.com>	2018-08-07 17:23:17 +01:00
Eric Engestrom	e02f061b69	meson: install KHR/khrplatform.h when needed Fixes: `f7d42ee7d3` "include: update GL & GLES headers (v2)" Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2018-08-07 15:57:32 +01:00
Eric Engestrom	ed07e831a8	i965: gen_shader_sha1() doesn't use the brw_context Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com>	2018-08-07 14:20:50 +01:00
Eric Engestrom	87c156183c	configure: install KHR/khrplatform.h when needed Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107511 Fixes: `f7d42ee7d3` "include: update GL & GLES headers (v2)" Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Tested-by: Brad King <brad.king@kitware.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com>	2018-08-07 14:20:50 +01:00
Lionel Landwerlin	303e7b39b5	intel: don't build tools without -Dtools=intel Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107487 Fixes: `4334196ab3` ("intel: tools: simplify meson build") Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2018-08-07 11:58:47 +01:00
Erik Faye-Lund	c4f183492d	virgl: update virgl_hw.h from virglrenderer This just makes sure we're currently up-to-date with what virglrenderer has. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Acked-by: Dave Airlie <airlied@redhat.com>	2018-08-07 09:38:41 +02:00
Erik Faye-Lund	0914e1464e	virgl: rename msaa_sample_positions -> sample_locations This matches what this field is called in virglrenderer's copy of this. This reduces the diff between the two different versions of virgl_hw.h, and should make it easier to upgrade the file in the future. Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Acked-by: Dave Airlie <airlied@redhat.com>	2018-08-07 09:38:27 +02:00
Eric Anholt	9507e03699	vc4: Fix a leak of the no-vertex-elements workaround BO. Fixes: `bd1925562a` ("vc4: Convert the driver to emitting the shader record using pack macros.")	2018-08-06 19:10:06 -07:00
Eric Anholt	86095e9bb1	vc4: Fix context creation when syncobjs aren't supported. Noticed when trying to run current Mesa on rpi's downstream kernel. Fixes: `b0acc3a562` ("broadcom/vc4: Native fence fd support")	2018-08-06 19:10:06 -07:00
Eric Anholt	1561e4984e	v3d: Emit the VCM_CACHE_SIZE packet. This is needed to ensure that we don't get blocked waiting for VPM space with bin/render overlapping. Cc: "18.2" <mesa-stable@lists.freedesktop.org>	2018-08-06 13:03:23 -07:00
Eric Anholt	5d49076990	v3d: Drop "VC5" from the renderer string. VC5 isn't a useful name any more, just stick to v3d.	2018-08-06 13:03:23 -07:00
Eric Anholt	50a8713d4f	v3d: Avoid spilling that breaks the r5 usage after a ldvary. Fixes bad rendering when forcing 2 spills in glxgears. Cc: "18.2" <mesa-stable@lists.freedesktop.org>	2018-08-06 13:03:23 -07:00
Eric Anholt	f2c0d310d6	v3d: Make sure that QPU instruction-has-a-dest matches VIR. Found when debugging register spilling -- we would try to spill the dest of a STVPMV, inserting spill code after entering the last segment. In fact, we were likely to choose to do this, given that the STVPMV "dest" temp was never read from, making it cheap to spill. Cc: "18.2" <mesa-stable@lists.freedesktop.org>	2018-08-06 13:03:23 -07:00
Eric Anholt	3f9cb2eb05	v3d: Wait for TMU writes to complete before continuing after a spill. The simulator complained that we had write responses outstanding at shader end. It seems that a TMU read does not guarantee that previous TMU writes by the thread have completed, which surprised me. Cc: "18.2" <mesa-stable@lists.freedesktop.org>	2018-08-06 13:03:23 -07:00
Eric Anholt	ccbe33af5b	v3d: Make sure we don't emit a thrsw before the last one finished. Found while forcing some spilling, which creates a lot of short tmua->thrsw->ldtmu sequences. Cc: "18.2" <mesa-stable@lists.freedesktop.org>	2018-08-06 13:03:23 -07:00
Eric Anholt	f9d54dc3cf	v3d: Add some debug code for forcing register spilling. This is useful for periodically testing out register spilling to see how it goes on simple shaders, rather than only failing on insanely complicated ones.	2018-08-06 13:03:23 -07:00
Chad Versace	aaa41cd297	drisw: Fix build on Android Nougat, which lacks shm (v2) In commit `cf54bd5e8`, dri_sw_winsys.c began using <sys/shm.h> to support the new functions putImageShm, getImageShm in DRI_SWRastLoader. But Android began supporting System V shared memory only in Oreo. Nougat has no shm headers. Fix the build by ifdef'ing out the shm code on Nougat. Fixes: `cf54bd5e8` "drisw: use shared memory when possible" Reviewed-by: Dave Airlie <airlied@redhat.com> Cc: Marc-André Lureau <marcandre.lureau@gmail.com>	2018-08-06 11:09:38 -07:00
Ian Romanick	6229ee87c7	mesa: fix make check for AMD_framebuffer_multisample_advanced Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107483 Fixes: `3d6900d76e` ("glapi: define AMD_framebuffer_multisample_advanced and add its functions") Reviewed-by: Marek Olšák <marek.olsak@amd.com> Cc: Vinson Lee <vlee@freedesktop.org>	2018-08-06 10:31:56 -07:00
Ian Romanick	b7946f6778	glapi: Fix GLES versioning for AMD_framebuffer_multisample_advanced functions The GL_AMD_framebuffer_multisample_advanced spec says: OpenGL ES dependencies: Requires OpenGL ES 3.0. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107483 Fixes: `3d6900d76e` ("glapi: define AMD_framebuffer_multisample_advanced and add its functions") Reviewed-by: Marek Olšák <marek.olsak@amd.com> Cc: Vinson Lee <vlee@freedesktop.org>	2018-08-06 10:30:06 -07:00
Gert Wollny	7a46b2d641	meson, install_megadrivers: Also remove stale symlinks os.path.exists doesn't return True for stale symlinks, but they are in the way later, when a link/file with the same name is to be created. For instance it is conceivable that the pointed-to file is replaced by a file with a new name, and then the symlink is dead. To handle this, check specifically for all existing symlinks to be removed. (This bugged me for some time with a link libXvMCr600.so always being in the way of installing this file) v2: use only os.path.lexists and replace all instances of os.path.exists (Dylan Baker) v3: handle directory check correctly (Eric Engestrom) Fixes: `f7f1b30f81` ("meson: extend install_megadrivers script to handle symmlinking") Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> (v2 minus dir check) Reviewed-by: Dylan Baker <dylan@pnwbakers.com> Signed-off-by: Gert Wollny <gert.wollny@collabora.com>	2018-08-06 18:42:01 +02:00
Tapani Pälli	5eb4b384d9	anv: add more swapchain formats This change helps with some of the dEQP-VK.wsi.android.* tests that try to create swapchain with using such formats. Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Chad Versace <chadversary@chromium.org>	2018-08-06 09:25:11 +03:00
Karol Herbst	c3325097be	nvc0/ir: return 0 in imageLoad on incomplete textures We already guarded all OP_SULDP against out of bound accesses, but we ended up just reusing whatever value was stored in the dest registers. Fixes CTS test shader_image_load_store.incomplete_textures v2: fix for loads not ending up with predicates (bindless_texture) v3: fix replacing the def Cc: <mesa-stable@lists.freedesktop.org> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu> Signed-off-by: Karol Herbst <kherbst@redhat.com>	2018-08-04 18:25:20 +02:00
Karol Herbst	0ca046d7e9	gm200/ir: optimize rcp(sqrt) to rsq mitigates hurt shaders after adding sqrt: total instructions in shared programs : 5456166 -> 5454825 (-0.02%) total gprs used in shared programs : 647522 -> 647551 (0.00%) total shared used in shared programs : 389120 -> 389120 (0.00%) total local used in shared programs : 21064 -> 21064 (0.00%) total bytes used in shared programs : 58288696 -> 58274448 (-0.02%) local shared gpr inst bytes helped 0 0 0 516 516 hurt 0 0 27 2 2 Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu> Signed-off-by: Karol Herbst <kherbst@redhat.com>	2018-08-04 15:24:08 +02:00
Karol Herbst	6f98a3065b	gm200/ir: add native OP_SQRT support ./GpuTest /test=pixmark_piano 1024x640 30sec: 301 -> 327 points shader-db: total instructions in shared programs : 5472103 -> 5456166 (-0.29%) total gprs used in shared programs : 647530 -> 647522 (-0.00%) total shared used in shared programs : 389120 -> 389120 (0.00%) total local used in shared programs : 21064 -> 21064 (0.00%) total bytes used in shared programs : 58459304 -> 58288696 (-0.29%) local shared gpr inst bytes helped 0 0 27 8281 8281 hurt 0 0 21 431 431 v2: use NVISA_GM200_CHIPSET Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu> Signed-off-by: Karol Herbst <kherbst@redhat.com>	2018-08-04 15:24:08 +02:00
Lionel Landwerlin	4334196ab3	intel: tools: simplify meson build Remove the if tools condition and just put it through the install: parameter. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2018-08-04 09:45:34 +01:00
Lionel Landwerlin	87a3c97781	intel: aubinator: simplify decoding Since we don't support streaming an aub file, we can drop the decoding status enum. v2: include stdbool (Eric) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2018-08-04 09:40:14 +01:00
Lionel Landwerlin	02ebc064ea	intel: common: add missing stdint include Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2018-08-04 09:39:01 +01:00
Lionel Landwerlin	db4770ee57	intel: decoder: remove unused variable Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2018-08-04 09:38:58 +01:00
Lionel Landwerlin	7471286bb0	intel: tools: aubwrite: reuse canonical address helper Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2018-08-04 09:38:44 +01:00
Lionel Landwerlin	35955afa7a	intel: aubinator: fix read the context/ring Up to now we've been lucky that the buffer returned was always exactly at the address we requested. Fixes: `144b40db54` ("intel: aubinator: drop the 1Tb GTT mapping") Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>	2018-08-04 09:38:34 +01:00
Ian Romanick	3b07d28f81	nir: Transform expressions of b2f(a) and b2f(b) to a == b All Gen7+ platforms had similar results. (Skylake shown) total instructions in shared programs: 14276886 -> 14276838 (<.01%) instructions in affected programs: 312 -> 264 (-15.38%) helped: 2 HURT: 0 total cycles in shared programs: 532578395 -> 532570985 (<.01%) cycles in affected programs: 682562 -> 675152 (-1.09%) helped: 374 HURT: 4 helped stats (abs) min: 2 max: 200 x̄: 20.39 x̃: 18 helped stats (rel) min: 0.07% max: 11.64% x̄: 1.25% x̃: 1.28% HURT stats (abs) min: 2 max: 114 x̄: 53.50 x̃: 49 HURT stats (rel) min: 0.06% max: 11.70% x̄: 5.02% x̃: 4.15% 95% mean confidence interval for cycles value: -21.30 -17.91 95% mean confidence interval for cycles %-change: -1.30% -1.06% Cycles are helped. Sandy Bridge total instructions in shared programs: 10488123 -> 10488075 (<.01%) instructions in affected programs: 336 -> 288 (-14.29%) helped: 2 HURT: 0 total cycles in shared programs: 150260379 -> 150260439 (<.01%) cycles in affected programs: 4726 -> 4786 (1.27%) helped: 0 HURT: 2 No changes on Iron Lake or GM45. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Thomas Helland <thomashelland90@gmail.com>	2018-08-04 01:12:03 -07:00
Ian Romanick	c658b6c4c8	nir: Transform expressions of b2f(a) and b2f(b) to a ^^ b All Gen platforms had pretty similar results. (Skylake shown) total instructions in shared programs: 14276892 -> 14276886 (<.01%) instructions in affected programs: 484 -> 478 (-1.24%) helped: 2 HURT: 0 total cycles in shared programs: 532578397 -> 532578395 (<.01%) cycles in affected programs: 3522 -> 3520 (-0.06%) helped: 1 HURT: 0 Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Thomas Helland <thomashelland90@gmail.com>	2018-08-04 01:12:03 -07:00
Ian Romanick	3aca80aabc	nir: Transform expressions of b2f(a) and b2f(b) to !(a && b) All Gen platforms had pretty similar results. (Skylake shown) total cycles in shared programs: 532578400 -> 532578397 (<.01%) cycles in affected programs: 2784 -> 2781 (-0.11%) helped: 1 HURT: 1 helped stats (abs) min: 4 max: 4 x̄: 4.00 x̃: 4 helped stats (rel) min: 0.26% max: 0.26% x̄: 0.26% x̃: 0.26% HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 HURT stats (rel) min: 0.08% max: 0.08% x̄: 0.08% x̃: 0.08% v2: s/fmax/fmin/. Noticed by Thomas Helland. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Thomas Helland <thomashelland90@gmail.com>	2018-08-04 01:12:03 -07:00
Ian Romanick	1713c97181	nir: Transform expressions of b2f(a) and b2f(b) to a && b No changes on any Gen platform. v2: s/fmax/fmin/. Noticed by Thomas Helland. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Thomas Helland <thomashelland90@gmail.com>	2018-08-04 01:12:03 -07:00
Ian Romanick	4425f4786a	nir: Transform expressions of b2f(a) and b2f(b) to !(a \|\| b) All Gen6+ platforms had similar results. (Skylake shown) total instructions in shared programs: 14276961 -> 14276892 (<.01%) instructions in affected programs: 3215 -> 3146 (-2.15%) helped: 28 HURT: 0 helped stats (abs) min: 1 max: 6 x̄: 2.46 x̃: 2 helped stats (rel) min: 0.47% max: 9.52% x̄: 4.34% x̃: 1.92% 95% mean confidence interval for instructions value: -2.87 -2.06 95% mean confidence interval for instructions %-change: -5.73% -2.95% Instructions are helped. total cycles in shared programs: 532577068 -> 532578400 (<.01%) cycles in affected programs: 121864 -> 123196 (1.09%) helped: 35 HURT: 30 helped stats (abs) min: 2 max: 268 x̄: 42.34 x̃: 22 helped stats (rel) min: 0.12% max: 12.14% x̄: 3.22% x̃: 1.86% HURT stats (abs) min: 2 max: 246 x̄: 93.80 x̃: 36 HURT stats (rel) min: 0.09% max: 13.63% x̄: 4.47% x̃: 2.58% 95% mean confidence interval for cycles value: -5.02 46.01 95% mean confidence interval for cycles %-change: -0.99% 1.65% Inconclusive result (value mean confidence interval includes 0). Iron Lake and GM45 had similar results. (Iron Lake shown) total instructions in shared programs: 7781299 -> 7781342 (<.01%) instructions in affected programs: 22300 -> 22343 (0.19%) helped: 13 HURT: 40 helped stats (abs) min: 2 max: 3 x̄: 2.85 x̃: 3 helped stats (rel) min: 1.15% max: 7.69% x̄: 3.72% x̃: 3.33% HURT stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2 HURT stats (rel) min: 0.26% max: 1.30% x̄: 0.47% x̃: 0.43% 95% mean confidence interval for instructions value: 0.23 1.39 95% mean confidence interval for instructions %-change: -1.18% 0.07% Inconclusive result (%-change mean confidence interval includes 0). 
total cycles in shared programs: 177878928 -> 177879332 (<.01%) cycles in affected programs: 383298 -> 383702 (0.11%) helped: 7 HURT: 43 helped stats (abs) min: 2 max: 18 x̄: 10.00 x̃: 10 helped stats (rel) min: 0.17% max: 4.81% x̄: 2.62% x̃: 3.40% HURT stats (abs) min: 2 max: 38 x̄: 11.02 x̃: 12 HURT stats (rel) min: 0.08% max: 1.54% x̄: 0.25% x̃: 0.09% 95% mean confidence interval for cycles value: 5.21 10.95 95% mean confidence interval for cycles %-change: -0.51% 0.21% Inconclusive result (%-change mean confidence interval includes 0). v2: s/fmin/fmax/. Noticed by Thomas Helland. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Thomas Helland <thomashelland90@gmail.com>	2018-08-04 01:12:03 -07:00
Ian Romanick	6b3670ae80	nir: Transform -fabs(a) >= 0 to a == 0 All Gen platforms had pretty similar results. (Skylake shown) total instructions in shared programs: 14276964 -> 14276961 (<.01%) instructions in affected programs: 411 -> 408 (-0.73%) helped: 3 HURT: 0 helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 helped stats (rel) min: 0.47% max: 1.96% x̄: 1.04% x̃: 0.68% total cycles in shared programs: 532577062 -> 532577068 (<.01%) cycles in affected programs: 1093 -> 1099 (0.55%) helped: 1 HURT: 1 helped stats (abs) min: 16 max: 16 x̄: 16.00 x̃: 16 helped stats (rel) min: 7.77% max: 7.77% x̄: 7.77% x̃: 7.77% HURT stats (abs) min: 22 max: 22 x̄: 22.00 x̃: 22 HURT stats (rel) min: 2.48% max: 2.48% x̄: 2.48% x̃: 2.48% Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Thomas Helland <thomashelland90@gmail.com>	2018-08-04 01:12:03 -07:00
Ian Romanick	46e7c340d4	nir: Transform expressions of b2f(a) and b2f(b) to a \|\| b All Gen6+ platforms had pretty similar results. (Skylake shown) total instructions in shared programs: 14277184 -> 14276964 (<.01%) instructions in affected programs: 10082 -> 9862 (-2.18%) helped: 37 HURT: 1 helped stats (abs) min: 1 max: 30 x̄: 5.97 x̃: 4 helped stats (rel) min: 0.14% max: 16.00% x̄: 5.23% x̃: 2.04% HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 HURT stats (rel) min: 0.70% max: 0.70% x̄: 0.70% x̃: 0.70% 95% mean confidence interval for instructions value: -7.87 -3.71 95% mean confidence interval for instructions %-change: -6.98% -3.16% Instructions are helped. total cycles in shared programs: 532577990 -> 532577062 (<.01%) cycles in affected programs: 170959 -> 170031 (-0.54%) helped: 33 HURT: 9 helped stats (abs) min: 2 max: 120 x̄: 30.91 x̃: 30 helped stats (rel) min: 0.02% max: 7.65% x̄: 2.66% x̃: 1.13% HURT stats (abs) min: 2 max: 24 x̄: 10.22 x̃: 8 HURT stats (rel) min: 0.09% max: 1.79% x̄: 0.61% x̃: 0.22% 95% mean confidence interval for cycles value: -31.23 -12.96 95% mean confidence interval for cycles %-change: -2.90% -1.02% Cycles are helped. Iron Lake and GM45 had similar results. (Iron Lake shown) total instructions in shared programs: 7781539 -> 7781301 (<.01%) instructions in affected programs: 10169 -> 9931 (-2.34%) helped: 32 HURT: 0 helped stats (abs) min: 2 max: 20 x̄: 7.44 x̃: 6 helped stats (rel) min: 0.47% max: 17.02% x̄: 4.03% x̃: 1.88% 95% mean confidence interval for instructions value: -9.53 -5.34 95% mean confidence interval for instructions %-change: -5.94% -2.12% Instructions are helped. 
total cycles in shared programs: 177878590 -> 177878932 (<.01%) cycles in affected programs: 78706 -> 79048 (0.43%) helped: 7 HURT: 21 helped stats (abs) min: 6 max: 34 x̄: 24.57 x̃: 28 helped stats (rel) min: 0.15% max: 8.33% x̄: 4.66% x̃: 6.37% HURT stats (abs) min: 2 max: 86 x̄: 24.48 x̃: 22 HURT stats (rel) min: 0.01% max: 4.28% x̄: 1.21% x̃: 0.70% 95% mean confidence interval for cycles value: 0.30 24.13 95% mean confidence interval for cycles %-change: -1.52% 1.01% Inconclusive result (%-change mean confidence interval includes 0). v2: s/fmin/fmax/. Noticed by Thomas Helland. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Thomas Helland <thomashelland90@gmail.com>	2018-08-04 01:12:03 -07:00
Ian Romanick	be7d3ba34a	nir: Transform -fabs(a) < 0 to a != 0 Unlike the much older -abs(a) >= 0.0 transformation, this is not precise. The behavior changes if a is NaN. All Gen platforms had pretty similar results. (Skylake shown) total instructions in shared programs: 14277216 -> 14277184 (<.01%) instructions in affected programs: 2300 -> 2268 (-1.39%) helped: 8 HURT: 0 helped stats (abs) min: 1 max: 8 x̄: 4.00 x̃: 3 helped stats (rel) min: 0.48% max: 15.15% x̄: 4.41% x̃: 1.01% 95% mean confidence interval for instructions value: -6.45 -1.55 95% mean confidence interval for instructions %-change: -9.96% 1.13% Inconclusive result (%-change mean confidence interval includes 0). total cycles in shared programs: 532577848 -> 532577990 (<.01%) cycles in affected programs: 17486 -> 17628 (0.81%) helped: 2 HURT: 5 helped stats (abs) min: 2 max: 6 x̄: 4.00 x̃: 4 helped stats (rel) min: 0.06% max: 1.81% x̄: 0.93% x̃: 0.93% HURT stats (abs) min: 6 max: 50 x̄: 30.00 x̃: 26 HURT stats (rel) min: 0.55% max: 2.17% x̄: 1.19% x̃: 1.02% 95% mean confidence interval for cycles value: -1.06 41.63 95% mean confidence interval for cycles %-change: -0.58% 1.74% Inconclusive result (value mean confidence interval includes 0). Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Thomas Helland <thomashelland90@gmail.com>	2018-08-04 01:12:03 -07:00
Ian Romanick	d49eab2757	nir: Rearrange bcsel with two bcsel sources All Gen platforms had pretty similar results. (Skylake shown) total instructions in shared programs: 14277220 -> 14277216 (<.01%) instructions in affected programs: 422 -> 418 (-0.95%) helped: 2 HURT: 0 total cycles in shared programs: 532577908 -> 532577848 (<.01%) cycles in affected programs: 2800 -> 2740 (-2.14%) helped: 2 HURT: 0 Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Thomas Helland <thomashelland90@gmail.com>	2018-08-04 01:12:03 -07:00
Ian Romanick	b92fded6eb	nir: Collapse more repeated bcsels on the same argument All Gen platforms had pretty similar results. (Skylake shown) total instructions in shared programs: 14277230 -> 14277220 (<.01%) instructions in affected programs: 751 -> 741 (-1.33%) helped: 4 HURT: 0 helped stats (abs) min: 2 max: 3 x̄: 2.50 x̃: 2 helped stats (rel) min: 1.23% max: 1.40% x̄: 1.32% x̃: 1.32% 95% mean confidence interval for instructions value: -3.42 -1.58 95% mean confidence interval for instructions %-change: -1.47% -1.17% Instructions are helped. total cycles in shared programs: 532577947 -> 532577908 (<.01%) cycles in affected programs: 10641 -> 10602 (-0.37%) helped: 4 HURT: 3 helped stats (abs) min: 1 max: 40 x̄: 13.75 x̃: 7 helped stats (rel) min: 0.11% max: 3.08% x̄: 1.10% x̃: 0.60% HURT stats (abs) min: 2 max: 8 x̄: 5.33 x̃: 6 HURT stats (rel) min: 0.13% max: 0.55% x̄: 0.30% x̃: 0.23% 95% mean confidence interval for cycles value: -20.69 9.55 95% mean confidence interval for cycles %-change: -1.63% 0.63% Inconclusive result (value mean confidence interval includes 0). Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Thomas Helland <thomashelland90@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-08-04 01:12:03 -07:00
Ian Romanick	408330ed48	nir: Don't compare i2f or u2i with zero Broadwell and Skylake had similar results. (Skylake shown) total instructions in shared programs: 14277620 -> 14277230 (<.01%) instructions in affected programs: 36905 -> 36515 (-1.06%) helped: 101 HURT: 6 helped stats (abs) min: 1 max: 6 x̄: 4.46 x̃: 6 helped stats (rel) min: 0.32% max: 7.69% x̄: 1.80% x̃: 1.51% HURT stats (abs) min: 1 max: 28 x̄: 10.00 x̃: 1 HURT stats (rel) min: 0.33% max: 1.74% x̄: 0.68% x̃: 0.47% 95% mean confidence interval for instructions value: -4.59 -2.70 95% mean confidence interval for instructions %-change: -1.90% -1.41% Instructions are helped. total cycles in shared programs: 532580716 -> 532577947 (<.01%) cycles in affected programs: 940575 -> 937806 (-0.29%) helped: 92 HURT: 12 helped stats (abs) min: 2 max: 158 x̄: 51.04 x̃: 62 helped stats (rel) min: 0.24% max: 3.99% x̄: 2.14% x̃: 2.41% HURT stats (abs) min: 10 max: 1112 x̄: 160.58 x̃: 63 HURT stats (rel) min: 0.06% max: 21.90% x̄: 4.22% x̃: 0.20% 95% mean confidence interval for cycles value: -50.66 -2.59 95% mean confidence interval for cycles %-change: -2.09% -0.73% Cycles are helped. total spills in shared programs: 8116 -> 8124 (0.10%) spills in affected programs: 200 -> 208 (4.00%) helped: 0 HURT: 2 total fills in shared programs: 11086 -> 11094 (0.07%) fills in affected programs: 436 -> 444 (1.83%) helped: 0 HURT: 2 Ivy Bridge and Haswell had similar results. (Haswell shown) total instructions in shared programs: 12979054 -> 12978067 (<.01%) instructions in affected programs: 33633 -> 32646 (-2.93%) helped: 120 HURT: 2 helped stats (abs) min: 1 max: 13 x̄: 8.53 x̃: 13 helped stats (rel) min: 0.30% max: 16.67% x̄: 4.55% x̃: 3.17% HURT stats (abs) min: 18 max: 18 x̄: 18.00 x̃: 18 HURT stats (rel) min: 1.15% max: 2.84% x̄: 2.00% x̃: 2.00% 95% mean confidence interval for instructions value: -9.19 -6.99 95% mean confidence interval for instructions %-change: -5.27% -3.62% Instructions are helped. 
total cycles in shared programs: 411212880 -> 411199636 (<.01%) cycles in affected programs: 696441 -> 683197 (-1.90%) helped: 107 HURT: 5 helped stats (abs) min: 2 max: 864 x̄: 124.90 x̃: 146 helped stats (rel) min: 0.03% max: 29.20% x̄: 8.58% x̃: 5.88% HURT stats (abs) min: 2 max: 50 x̄: 24.00 x̃: 22 HURT stats (rel) min: 0.01% max: 5.35% x̄: 1.29% x̃: 0.25% 95% mean confidence interval for cycles value: -136.96 -99.54 95% mean confidence interval for cycles %-change: -9.75% -6.53% Cycles are helped. total spills in shared programs: 78623 -> 78631 (0.01%) spills in affected programs: 66 -> 74 (12.12%) helped: 0 HURT: 2 total fills in shared programs: 80104 -> 80108 (<.01%) fills in affected programs: 133 -> 137 (3.01%) helped: 0 HURT: 2 No changes on Sandy Bridge, Iron Lake, or GM45. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Thomas Helland <thomashelland90@gmail.com>	2018-08-04 01:12:03 -07:00
Ian Romanick	a3845616a2	nir: Remove f2i(i2f(x)) conversions Broadwell and Skylake had similar results. (Skylake shown) total instructions in shared programs: 14277978 -> 14277620 (<.01%) instructions in affected programs: 36957 -> 36599 (-0.97%) helped: 76 HURT: 1 helped stats (abs) min: 2 max: 90 x̄: 4.89 x̃: 4 helped stats (rel) min: 0.44% max: 5.88% x̄: 1.04% x̃: 0.87% HURT stats (abs) min: 14 max: 14 x̄: 14.00 x̃: 14 HURT stats (rel) min: 0.36% max: 0.36% x̄: 0.36% x̃: 0.36% 95% mean confidence interval for instructions value: -7.06 -2.24 95% mean confidence interval for instructions %-change: -1.28% -0.77% Instructions are helped. total cycles in shared programs: 532584581 -> 532580716 (<.01%) cycles in affected programs: 973591 -> 969726 (-0.40%) helped: 76 HURT: 1 helped stats (abs) min: 2 max: 9940 x̄: 159.80 x̃: 32 helped stats (rel) min: <.01% max: 8.70% x̄: 1.15% x̃: 1.19% HURT stats (abs) min: 8280 max: 8280 x̄: 8280.00 x̃: 8280 HURT stats (rel) min: 2.10% max: 2.10% x̄: 2.10% x̃: 2.10% 95% mean confidence interval for cycles value: -386.98 286.59 95% mean confidence interval for cycles %-change: -1.41% -0.81% Inconclusive result (value mean confidence interval includes 0). total spills in shared programs: 8127 -> 8116 (-0.14%) spills in affected programs: 108 -> 97 (-10.19%) helped: 1 HURT: 0 total fills in shared programs: 11090 -> 11086 (-0.04%) fills in affected programs: 440 -> 436 (-0.91%) helped: 1 HURT: 1 Haswell total instructions in shared programs: 12979174 -> 12979054 (<.01%) instructions in affected programs: 9040 -> 8920 (-1.33%) helped: 14 HURT: 1 helped stats (abs) min: 2 max: 34 x̄: 8.79 x̃: 6 helped stats (rel) min: 0.41% max: 7.04% x̄: 2.66% x̃: 1.14% HURT stats (abs) min: 3 max: 3 x̄: 3.00 x̃: 3 HURT stats (rel) min: 0.19% max: 0.19% x̄: 0.19% x̃: 0.19% 95% mean confidence interval for instructions value: -13.58 -2.42 95% mean confidence interval for instructions %-change: -3.94% -1.01% Instructions are helped. 
total cycles in shared programs: 411227148 -> 411212880 (<.01%) cycles in affected programs: 630506 -> 616238 (-2.26%) helped: 15 HURT: 0 helped stats (abs) min: 2 max: 11192 x̄: 951.20 x̃: 38 helped stats (rel) min: <.01% max: 16.01% x̄: 3.92% x̃: 0.17% 95% mean confidence interval for cycles value: -2544.28 641.88 95% mean confidence interval for cycles %-change: -6.89% -0.94% Inconclusive result (value mean confidence interval includes 0). total spills in shared programs: 78626 -> 78623 (<.01%) spills in affected programs: 42 -> 39 (-7.14%) helped: 1 HURT: 0 total fills in shared programs: 80111 -> 80104 (<.01%) fills in affected programs: 140 -> 133 (-5.00%) helped: 1 HURT: 1 Ivy Bridge total instructions in shared programs: 11684101 -> 11684030 (<.01%) instructions in affected programs: 3080 -> 3009 (-2.31%) helped: 4 HURT: 1 helped stats (abs) min: 5 max: 59 x̄: 18.50 x̃: 5 helped stats (rel) min: 6.47% max: 7.04% x̄: 6.87% x̃: 6.99% HURT stats (abs) min: 3 max: 3 x̄: 3.00 x̃: 3 HURT stats (rel) min: 0.15% max: 0.15% x̄: 0.15% x̃: 0.15% 95% mean confidence interval for instructions value: -45.59 17.19 95% mean confidence interval for instructions %-change: -9.38% -1.56% Inconclusive result (value mean confidence interval includes 0). total cycles in shared programs: 258407697 -> 258389653 (<.01%) cycles in affected programs: 328323 -> 310279 (-5.50%) helped: 5 HURT: 0 helped stats (abs) min: 32 max: 14908 x̄: 3608.80 x̃: 32 helped stats (rel) min: 1.26% max: 17.22% x̄: 9.30% x̃: 10.60% 95% mean confidence interval for cycles value: -11616.71 4399.11 95% mean confidence interval for cycles %-change: -16.56% -2.03% Inconclusive result (value mean confidence interval includes 0). 
total spills in shared programs: 4537 -> 4528 (-0.20%) spills in affected programs: 64 -> 55 (-14.06%) helped: 1 HURT: 0 total fills in shared programs: 4823 -> 4815 (-0.17%) fills in affected programs: 189 -> 181 (-4.23%) helped: 1 HURT: 1 Sandy Bridge total instructions in shared programs: 10488464 -> 10488449 (<.01%) instructions in affected programs: 272 -> 257 (-5.51%) helped: 3 HURT: 0 helped stats (abs) min: 5 max: 5 x̄: 5.00 x̃: 5 helped stats (rel) min: 5.49% max: 5.56% x̄: 5.51% x̃: 5.49% total cycles in shared programs: 150263359 -> 150263263 (<.01%) cycles in affected programs: 7978 -> 7882 (-1.20%) helped: 3 HURT: 0 helped stats (abs) min: 32 max: 32 x̄: 32.00 x̃: 32 helped stats (rel) min: 1.15% max: 1.23% x̄: 1.20% x̃: 1.23% No changes on Iron Lake or GM45. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Thomas Helland <thomashelland90@gmail.com>	2018-08-04 01:12:03 -07:00
Ian Romanick	ea6c276436	nir: Mark the 0.0 < abs(a) transformation as imprecise Unlike the much older -abs(a) >= 0.0 transformation, this is not precise. The behavior changes if the source is NaN. No shader-db changes on any platform. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Thomas Helland <thomashelland90@gmail.com>	2018-08-04 01:12:03 -07:00
Marek Olšák	4bad50ded9	radeonsi: cosmetic changes	2018-08-04 03:10:30 -04:00
Marek Olšák	6508b93d78	st/mesa: expose & set limits for AMD_framebuffer_multisample_advanced Reviewed-by: Brian Paul <brianp@vmware.com>	2018-08-04 02:47:58 -04:00
Marek Olšák	7f587b57f7	st/mesa: add renderbuffer support for AMD_framebuffer_multisample_advanced Reviewed-by: Brian Paul <brianp@vmware.com>	2018-08-04 02:46:55 -04:00
Marek Olšák	8e3d0019e1	st/mesa: pass storage_sample_count parameter into st_choose_format Reviewed-by: Brian Paul <brianp@vmware.com>	2018-08-04 02:46:55 -04:00
Marek Olšák	459f05c7ec	mesa: add functional FBO changes for AMD_framebuffer_multisample_advanced - relax FBO completeness rules - validate sample counts Reviewed-by: Brian Paul <brianp@vmware.com>	2018-08-04 02:46:55 -04:00
Marek Olšák	328c1c8d99	mesa: add gl_renderbuffer::NumStorageSamples Reviewed-by: Brian Paul <brianp@vmware.com>	2018-08-04 02:46:55 -04:00
Marek Olšák	a96e946d25	mesa: implement glGet for AMD_framebuffer_multisample_advanced Reviewed-by: Brian Paul <brianp@vmware.com>	2018-08-04 02:46:55 -04:00
Marek Olšák	3d6900d76e	glapi: define AMD_framebuffer_multisample_advanced and add its functions Reviewed-by: Brian Paul <brianp@vmware.com>	2018-08-04 02:46:55 -04:00
Marek Olšák	2d115056d3	mesa: add storageSamples parameter to renderbuffer functions It's just passed to other functions but otherwise unused. It will be used in following commits. Reviewed-by: Brian Paul <brianp@vmware.com>	2018-08-04 02:46:55 -04:00
Marek Olšák	f7d42ee7d3	include: update GL & GLES headers (v2) v2: use correct files Acked-by: Ian Romanick <ian.d.romanick@intel.com>	2018-08-04 02:43:05 -04:00
Marek Olšák	fd1121e839	amd: remove support for LLVM 5.0 Users are encouraged to switch to LLVM 6.0 released in March 2018. Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-08-03 18:36:11 -04:00
Marek Olšák	461a864316	winsys/amdgpu: pass the BO list via the CS ioctl on DRM >= 3.27.0	2018-08-03 18:35:19 -04:00
Marek Olšák	0f79b2015b	gallium/u_vbuf: handle indirect multidraws correctly and efficiently (v3) v2: need to do MAX{start+count} instead of MAX{count} added piglit tests v3: use malloc Cc: 18.2 <mesa-stable@lists.freedesktop.org> Reviewed-by: Eric Anholt <eric@anholt.net>	2018-08-03 18:30:46 -04:00
Mauro Rossi	1c7a2433b2	android: radv: build vulkan.radv conditionally to radeonsi A problem was reported with arm,arm64 targets build due to missing libLLVM shared library dependency with AOSP; to avoid this issue vulkan.radv is built conditionally only when radeonsi is in BOARD_GPU_DRIVERS Fixes: `0ca153f869` ("android: radv: enable build of vulkan.radv HAL module") Reported-by: John Stultz <john.stultz@linaro.org> Signed-off-by: Mauro Rossi <issor.oruam@gmail.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Cc: "18.2" <mesa-stable@lists.freedesktop.org>	2018-08-03 20:09:16 +02:00
Roland Scheidegger	c72f91deba	util: return 0 for NaNs in float_to_ubyte d3d10 requires NaNs to get converted to 0 for float->unorm conversions (and float->int etc.). GL spec probably doesn't care in general, but it would make sense to have reasonable behavior in any case imho - the old code was converting negative NaNs to 0, and positive NaNs to 255. (Note that using float comparison isn't actually all that much more effort in any case, at least with sse2 it's just float comparison (ucommiss) instead of int one - I converted the second comparison to float too simply because it saves the probably somewhat expensive transfer of the float from simd to int domain (with sse2 via stack), so the generated code actually has 2 less instructions, although float comparisons are more expensive than int ones.) Reviewed-by: Brian Paul <brianp@vmware.com>	2018-08-03 17:07:38 +02:00
Jason Ekstrand	1d900e55fd	anv/pipeline: Disable FS dispatch for pointless fragment shaders Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2018-08-03 05:52:23 -07:00
Timothy Arceri	d5175d21c7	nir: add fall through comment to nir_gather_info This stops Coverity reporting a defect and helps make the code less error-prone. Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2018-08-03 09:30:57 +10:00
Dan Willemsen	12e3334f1e	CleanSpec.mk: Remove HOST_OUT_release This is a forward port of a patch from the AOSP/master tree: `bd633f11de`%5E%21/ Which replaces HOST_OUT_release with HOST_OUT As per Dan's explanation, the current code was incorrect to use $(HOST_OUT_release) as $(HOST_OUT) will be set properly for whether the current build that's being cleaned during incrementals is using host debug or release builds. Additionally Dan noted it was incredibly uncommon to use a debug host build, as there was never a shortcut and one had to set an environment variable manually. Thus it was rarely if ever tested. Change-Id: I7972c0a50fa3520dcfa962d6dd7e602bfe22368d Cc: Rob Herring <rob.herring@linaro.org> Cc: Alistair Strachan <astrachan@google.com> Cc: Marissa Wall <marissaw@google.com> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: Emil Velikov <emil.l.velikov@gmail.com> Cc: Rob Clark <robdclark@gmail.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Signed-off-by: John Stultz <john.stultz@linaro.org> Signed-off-by: Rob Herring <robh@kernel.org>	2018-08-02 15:42:40 -06:00
Sumit Semwal	d0b63b6583	Android.common.mk: define HAVE_TIMESPEC_GET This is a forward port of a patch from the AOSP/master tree: `bd30b663f5`%5E%21/ Since https://android-review.googlesource.com/c/718518 added timespec_get() to bionic, mesa3d doesn't build due to redefinition of timespec_get(). Avoid redefinition by defining HAVE_TIMESPEC_GET flag. Test: build and boot tested db820c to UI. Change-Id: I3dcc8034b48785e45cd3fa50e4d9cf2c684694a0 Cc: Rob Herring <rob.herring@linaro.org> Cc: Alistair Strachan <astrachan@google.com> Cc: Marissa Wall <marissaw@google.com> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: Emil Velikov <emil.l.velikov@gmail.com> Cc: Rob Clark <robdclark@gmail.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Signed-off-by: Sumit Semwal <sumit.semwal@linaro.org> Signed-off-by: John Stultz <john.stultz@linaro.org> Signed-off-by: Rob Herring <robh@kernel.org>	2018-08-02 15:42:27 -06:00
Dan Willemsen	dc030d1ec9	util: Android.mk: Convert implicit rules to static pattern rules This is a partial cherry-pick from AOSP's mesa3d tree: `a88dcf769e`%5E%21/ "We're deprecating make implicit rules, preferring static pattern rules, or just regular rules." Without this patch, the freedesktop/master branch won't build in the AOSP environment, and this patch corrects that, as tested on the Dragonboard 820c. The i965 portion of the patch this is based on collided badly, and I'm not sure how to best forward port it. However, so far we don't see build issues without that portion. Comments or feedback would be appreciated! Change-Id: Id6dfd0d018cbd665fa19d80c14abd5f75fa10b8a Cc: Rob Herring <rob.herring@linaro.org> Cc: Alistair Strachan <astrachan@google.com> Cc: Marissa Wall <marissaw@google.com> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: Emil Velikov <emil.l.velikov@gmail.com> Cc: Rob Clark <robdclark@gmail.com> Reviewed-by: Emil Velikov <emil.velikov@collabora.com> Signed-off-by: John Stultz <john.stultz@linaro.org> Signed-off-by: Rob Herring <robh@kernel.org>	2018-08-02 15:42:23 -06:00
Darren Powell	726a48c94f	radeonsi: add new R600_DEBUG test "testclearbufperf" Signed-off-by: Darren Powell <darren.powell@amd.com> Signed-off-by: Marek Olšák <marek.olsak@amd.com>	2018-08-02 16:09:22 -04:00
Brian Paul	977638006b	mesa: add switch case for GL 2.0 in _mesa_compute_version() Previously, I added a switch case for GL 2.1 (ed7a0770b881791dd697f3). I don't know of any driver which only supports GL 2.0, but adding this switch case avoids a failure if the app queries GL_SHADING_LANGUAGE_VERSION. Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2018-08-02 13:20:00 -06:00
Andres Gomez	2d4d139877	intel/tools: add error2aub creation into autotools Tarball distribution is done through "make distcheck". We include the meson targets also into autotools so they won't fail when building from the tarball. Fixes: `6a60beba40` ("intel/tools: Add an error state to aub translator") Cc: Jason Ekstrand <jason.ekstrand@intel.com> Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: Dylan Baker <dylan.c.baker@intel.com> Signed-off-by: Andres Gomez <agomez@igalia.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2018-08-02 21:15:57 +03:00
Jason Ekstrand	7ef6cd0ee8	anv/pipeline: Do cross-stage linking optimizations This appears to help the Aztec Ruins benchmark by about 2% on my Kaby Lake gt2 laptop. Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-08-02 10:29:20 -07:00
Jason Ekstrand	a5bffa061d	anv/pipeline: Pull most of the anv_pipeline_compile_* into common code This leaves us with a series of little anv_pipeline_compile_* functions which each take a compiler object, a mem_ctx, the stage to compile, and the previous stage for VUE linking purposes. Some of them do interesting things but most are little more than wrappers around brw_compile_*. Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-08-02 10:29:20 -07:00
Jason Ekstrand	5351339554	anv/pipeline: Add a separate "link" stage This breaks compilation up a bit into "link" and "compile". In the "link" stage, new anv_pipeline_link_* helpers are called which are responsible for setting up the binding table and doing anything needed to properly link with the next stage in the pipeline if one exists. They are called in reverse order starting with the fragment shader so you can assume linking in later stages is already done. Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-08-02 10:29:20 -07:00
Jason Ekstrand	5b196f39bd	anv/pipeline: Compile to NIR in compile_graphics This pulls the SPIR-V to NIR step out into common code. Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-08-02 10:29:20 -07:00
Jason Ekstrand	946fcd02a9	anv/pipeline: Recompile all shaders if any are missing from the cache Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-08-02 10:29:20 -07:00
Jason Ekstrand	f76d6d8a63	anv/pipeline: Drop anv_pipeline_add_compiled_stage We can set active_stages much more directly and then it's just candy around setting pipeline->stages[stage]. Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-08-02 10:29:20 -07:00
Jason Ekstrand	703a24932a	anv/pipeline: Pull shader compilation out into a helper. Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-08-02 10:29:20 -07:00
Jason Ekstrand	f3c59ca947	anv/pipeline: Call anv_pipeline_compile_* in a loop Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-08-02 10:29:20 -07:00
Jason Ekstrand	bdc3565c8c	anv/pipeline: Hash the entire pipeline in one go Instead of hashing each stage separately (and TES and TCS together), we hash the entire pipeline. This means we'll get fewer cache hits if they, for instance, re-use the same VS over and over again but it also means we can now safely do cross-stage optimizations. Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-08-02 10:29:20 -07:00
Jason Ekstrand	4a8236ae17	anv/pipeline: Populate keys up-front Instead of having each anv_pipeline_compile_* function populate the shader key, make it part of the anv_pipeline_stage struct and fill it out up-front. Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-08-02 10:29:20 -07:00
Jason Ekstrand	76503b319a	anv/pipline: Add a helper struct for per-stage info Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>	2018-08-02 10:29:20 -07:00
Jon Turney	a48c0659e1	meson: use correct keyword to fix a meson warning With a sufficently recent meson, the following warning is produced: WARNING: Passed invalid keyword argument "extra_args". WARNING: This will become a hard error in the future. It seems that compiler.links(args:) is meant here. Signed-off-by: Jon Turney <jon.turney@dronecode.org.uk> Reviewed-and-Tested-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Dylan Baker <dylan@pnwbakers.com>	2018-08-02 18:12:49 +01:00
Andres Gomez	3013e22717	docs: add 18.3.0-devel release notes template Signed-off-by: Andres Gomez <agomez@igalia.com>	2018-08-02 18:15:33 +03:00
Andres Gomez	873767cf42	mesa: bump version to 18.3.0-devel Signed-off-by: Andres Gomez <agomez@igalia.com>	2018-08-02 18:00:15 +03:00
Eric Engestrom	44265cc65e	egl/main: fix indentation Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Frank Binns <frank.binns@imgtec.com>	2018-08-02 12:54:05 +01:00
Eric Engestrom	dd007d1c2a	loader: fix indentation Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Frank Binns <frank.binns@imgtec.com>	2018-08-02 12:53:58 +01:00
Vlad Golovkin	9d3a2394e4	swr: Remove unnecessary memset call Zeroing memory after calloc is not necessary. This also allows to avoid possible crash when allocation fails, because memset is called before checking screen for NULL. Fixes: `a29d63ecf7` "swr: refactor swr_create_screen to allow for proper cleanup on error" Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>	2018-08-02 11:13:40 +01:00
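
Note on c72f91deba (util: return 0 for NaNs in float_to_ubyte): the commit message says the old code mapped negative NaNs to 0 and positive NaNs to 255, and that switching to float comparisons makes every NaN convert to 0. The source of the function is not part of this excerpt, so the following is only a rough Python sketch of that difference. Both function bodies, their names, and the rounding expression are assumptions made for illustration, not the Mesa implementation.

```python
import math
import struct


def float_bits(f):
    """Return the IEEE-754 single-precision bit pattern of f as a signed 32-bit int."""
    return struct.unpack("<i", struct.pack("<f", f))[0]


def float_to_ubyte_bitwise(f):
    """Hypothetical 'old style' conversion: compares the raw bits as integers."""
    bits = float_bits(f)
    if bits < 0:                 # sign bit set: negatives, -0.0 and negative NaNs
        return 0
    if bits >= 0x3F800000:       # bit pattern of 1.0f or above, which includes positive NaNs
        return 255
    return int(f * 255.0 + 0.5)  # assumed rounding, for illustration only


def float_to_ubyte_compare(f):
    """Hypothetical 'new style' conversion: float comparisons, so any NaN yields 0."""
    if not (f > 0.0):            # the comparison is false for NaN, so NaN lands here
        return 0
    if f >= 1.0:
        return 255
    return int(f * 255.0 + 0.5)  # assumed rounding, for illustration only


pos_nan = float("nan")
neg_nan = math.copysign(pos_nan, -1.0)
for name, value in [("+NaN", pos_nan), ("-NaN", neg_nan), ("0.5", 0.5), ("2.0", 2.0)]:
    print(name, float_to_ubyte_bitwise(value), float_to_ubyte_compare(value))
```

The key point is the `not (f > 0.0)` test: an ordered comparison with NaN is always false, so a single float comparison catches negatives, zero and both NaN signs at once.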

4221 changed files with 584560 additions and 177391 deletions

									
										3

.editorconfig
									
												
@@ -11,6 +11,7 @@ tab_width = 8
 [*.{c,h,cpp,hpp,cc,hh}]
 indent_style = space
 indent_size = 3
+max_line_length = 78
 
 [{Makefile*,*.mk}]
 indent_style = tab
@@ -34,6 +35,6 @@ indent_size = 2
 [*.patch]
 trim_trailing_whitespace = false
 
-[meson.build,meson_options.txt]
+[{meson.build,meson_options.txt}]
 indent_style = space
 indent_size = 2
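
Note on the .editorconfig change: the last hunk shows the meson section header in two forms, with and without braces. In EditorConfig globs a comma only separates alternatives inside `{...}`, so the unbraced form would presumably match only a file literally named `meson.build,meson_options.txt`. The sketch below illustrates that with a simplified matcher. `expand_braces` and `section_matches` are hypothetical helpers, and Python's `fnmatch` only approximates EditorConfig's glob rules (no escapes, no `**`, no numeric ranges).

```python
import fnmatch
import re


def expand_braces(pattern):
    """Expand {a,b,c} groups into a list of plain patterns (no escape handling)."""
    match = re.search(r"\{([^{}]*)\}", pattern)
    if match is None:
        return [pattern]
    head, tail = pattern[:match.start()], pattern[match.end():]
    expanded = []
    for alternative in match.group(1).split(","):
        expanded.extend(expand_braces(head + alternative + tail))
    return expanded


def section_matches(section_pattern, filename):
    """True if filename matches the section pattern under this simplified model."""
    return any(fnmatch.fnmatchcase(filename, p) for p in expand_braces(section_pattern))


for header in ["meson.build,meson_options.txt", "{meson.build,meson_options.txt}"]:
    for name in ["meson.build", "meson_options.txt"]:
        print(header, name, section_matches(header, name))
```

The same helper also handles the other headers in the file, such as `*.{c,h,cpp,hpp,cc,hh}` and `{Makefile*,*.mk}`.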

52

.gitignore vendored


@@ -1,54 +1,4 @@
 *.a
 *.dll
 *.exe
 *.ilk
 *.la
 *.lo
 *.log
 *.o
 *.obj
 *.orig
 *.os
 *.pc
 *.pdb
 *.pyc
 *.pyo
 *.rej
 *.so
 *.so.*
 *.sw[a-z]
 *.tar
 *.tar.bz2
 *.tar.gz
 *.tar.xz
 *.trs
 *.zip
 *~
 depend
 depend.bak
 bin/ltmain.sh
 lib
 lib64
 configure
 configure.lineno
 autom4te.cache
 aclocal.m4
 config.log
 config.status
 cscope*
 tags
 .scon*
 config.py
 *.out
 build
 libtool
 manifest.txt
 .dir-locals.el
 .deps/
 .dirstamp
 .libs/
 Makefile
 Makefile.in
 .install-mesa-links
 .install-gallium-links
 /src/git_sha1.h
 TAGS

									
										382

.gitlab-ci.yml
									
										Normal file
									
												
				@@ -0,0 +1,382 @@

				# This is the tag of the docker image used for the build jobs. If the

				# image doesn't exist yet, the containers-build stage generates it.

				#

				# In order to generate a new image, one should generally change the tag.

				# While removing the image from the registry would also work, that's not

				# recommended except for ephemeral images during development: Replacing

				# an image after a significant amount of time might pull in newer

				# versions of gcc/clang or other packages, which might break the build

				# with older commits using the same tag.

				#

				# After merging a change resulting in generating a new image to the

				# main repository, it's recommended to remove the image from the source

				# repository's container registry, so that the image from the main

				# repository's registry will be used there as well.

				variables:

				  UPSTREAM_REPO: mesa/mesa

				  DEBIAN_TAG: "2019-08-09"

				  DEBIAN_VERSION: stretch-slim

				  DEBIAN_IMAGE: "$CI_REGISTRY_IMAGE/debian/$DEBIAN_VERSION:$DEBIAN_TAG"

				include:

				  - project: 'wayland/ci-templates'

				    ref: c73dae8b84697ef18e2dbbf4fed7386d9652b0cd

				    file: '/templates/debian.yml'

				stages:

				  - containers-build

				  - build+test

				  - test

				# When to automatically run the CI

				.ci-run-policy: &ci-run-policy

				  only:

				    - branches@mesa/mesa

				    - merge_requests

				    - /^ci([-/].*)?$/

				  retry:

				    max: 2

				    when:

				      - runner_system_failure

				.ci-deqp-artifacts: &ci-deqp-artifacts

				  artifacts:

				    when: always

				    untracked: false

				    paths:

				      # Watch out!  Artifacts are relative to the build dir.

				      # https://gitlab.com/gitlab-org/gitlab-ce/commit/8788fb925706cad594adf6917a6c5f6587dd1521

				      - artifacts

				# CONTAINERS

				debian:

				  extends: .debian@container-ifnot-exists

				  stage: containers-build

				  <<: *ci-run-policy

				  variables:

				    GIT_STRATEGY: none # no need to pull the whole tree for rebuilding the image

				    DEBIAN_EXEC: 'bash .gitlab-ci/debian-install.sh'

				# BUILD

				.build:

				  <<: *ci-run-policy

				  image: $DEBIAN_IMAGE

				  stage: build+test

				  cache:

				    paths:

				      - ccache

				  artifacts:

				    when: always

				    paths:

				      - _build/meson-logs/*.txt

				      # scons:

				      - build/*/config.log

				      - shader-db

				  variables:

				    CCACHE_COMPILERCHECK: "content"

				  # Use ccache transparently, and print stats before/after

				  before_script:

				    - export PATH="/usr/lib/ccache:$PATH"

				    - export CCACHE_BASEDIR="$PWD"

				    - export CCACHE_DIR="$PWD/ccache"

				    - ccache --zero-stats || true

				    - ccache --show-stats || true

				  after_script:

				    # In case the install dir is being saved as artifacts, tar it up

				    # so that symlinks and hardlinks aren't each packed separately in

				    # the zip file.

				    - if [ -d install ]; then

				        tar -cf artifacts/install.tar install;

				      fi

				    - export CCACHE_DIR="$PWD/ccache"

				    - ccache --show-stats

				.meson-build:

				  extends: .build

				  script:

				    - .gitlab-ci/meson-build.sh

				.scons-build:

				  extends: .build

				  variables:

				    SCONSFLAGS: "-j4"

				  script:

				    - if test -n "$LLVM_VERSION"; then

				        export LLVM_CONFIG="llvm-config-${LLVM_VERSION}";

				      fi

				    - scons $SCONS_TARGET

				    - eval $SCONS_CHECK_COMMAND

				# NOTE: Building SWR is 2x (yes two) times slower than all the other

				# gallium drivers combined.

				# Start this early so that it doesn't limit the total run time.

				#

				# We also stick the glvnd build here, since we want non-glvnd in

				# meson-main for actual driver CI.

				meson-swr-glvnd:

				  extends: .meson-build

				  variables:

				    UNWIND: "true"

				    DRI_LOADERS: >

				      -D glvnd=true

				      -D egl=true

				    GALLIUM_ST: >

				      -D dri3=true

				      -D gallium-vdpau=false

				      -D gallium-xvmc=false

				      -D gallium-omx=disabled

				      -D gallium-va=false

				      -D gallium-xa=false

				      -D gallium-nine=false

				      -D gallium-opencl=disabled

				    GALLIUM_DRIVERS: "swr,iris"

				    LLVM_VERSION: "6.0"

				meson-clang:

				  extends: .meson-build

				  variables:

				    UNWIND: "true"

				    DRI_DRIVERS: "auto"

				    GALLIUM_DRIVERS: "auto"

				    VULKAN_DRIVERS: intel,amd,freedreno

				    CC: "ccache clang-8"

				    CXX: "ccache clang++-8"

				  before_script:

				    - export CCACHE_BASEDIR="$PWD" CCACHE_DIR="$PWD/ccache"

				    - ccache --zero-stats --show-stats || true

				     # clang++ breaks if it picks up the GCC 8 directory without libstdc++.so

				    - apt-get remove -y libgcc-8-dev

				scons-swr:

				  extends: .scons-build

				  variables:

				    SCONS_TARGET: "swr=1"

				    SCONS_CHECK_COMMAND: "true"

				    LLVM_VERSION: "6.0"

				scons-win64:

				  extends: .scons-build

				  variables:

				    SCONS_TARGET: platform=windows machine=x86_64

				    SCONS_CHECK_COMMAND: "true"

				meson-main:

				  extends: .meson-build

				  variables:

				    UNWIND: "true"

				    DRI_LOADERS: >

				      -D glx=dri

				      -D gbm=true

				      -D egl=true

				      -D platforms=x11,wayland,drm,surfaceless

				    DRI_DRIVERS: "i915,i965,r100,r200,nouveau"

				    GALLIUM_ST: >

				      -D dri3=true

				      -D gallium-extra-hud=true

				      -D gallium-vdpau=true

				      -D gallium-xvmc=true

				      -D gallium-omx=bellagio

				      -D gallium-va=true

				      -D gallium-xa=true

				      -D gallium-nine=true

				      -D gallium-opencl=disabled

				    GALLIUM_DRIVERS: "iris,nouveau,kmsro,r300,r600,freedreno,swrast,svga,v3d,vc4,virgl,etnaviv,panfrost,lima"

				    LLVM_VERSION: "7"

				    EXTRA_OPTION: >

				      -D osmesa=gallium

				      -D tools=all

				    MESON_SHADERDB: "true"

				    BUILDTYPE: "debugoptimized"

				  <<: *ci-deqp-artifacts

				meson-clover:

				  extends: .meson-build

				  variables:

				    UNWIND: "true"

				    DRI_LOADERS: >

				      -D glx=disabled

				      -D egl=false

				      -D gbm=false

				    GALLIUM_ST: >

				      -D dri3=false

				      -D gallium-vdpau=false

				      -D gallium-xvmc=false

				      -D gallium-omx=disabled

				      -D gallium-va=false

				      -D gallium-xa=false

				      -D gallium-nine=false

				      -D gallium-opencl=icd

				  script:

				    - export GALLIUM_DRIVERS="r600,radeonsi"

				    - .gitlab-ci/meson-build.sh

				    - LLVM_VERSION=7 .gitlab-ci/meson-build.sh

				    - export GALLIUM_DRIVERS="i915,r600"

				    - LLVM_VERSION=3.9 .gitlab-ci/meson-build.sh

				    - LLVM_VERSION=4.0 .gitlab-ci/meson-build.sh

				    - LLVM_VERSION=5.0 .gitlab-ci/meson-build.sh

				    - LLVM_VERSION=6.0 .gitlab-ci/meson-build.sh

				meson-vulkan:

				  extends: .meson-build

				  variables:

				    UNWIND: "false"

				    DRI_LOADERS: >

				      -D glx=disabled

				      -D gbm=false

				      -D egl=false

				      -D platforms=x11,wayland,drm

				      -D osmesa=none

				    GALLIUM_ST: >

				      -D dri3=true

				      -D gallium-vdpau=false

				      -D gallium-xvmc=false

				      -D gallium-omx=disabled

				      -D gallium-va=false

				      -D gallium-xa=false

				      -D gallium-nine=false

				      -D gallium-opencl=disabled

				    VULKAN_DRIVERS: intel,amd,freedreno

				    LLVM_VERSION: "7"

				    EXTRA_OPTION: >

				      -D vulkan-overlay-layer=true

				.meson-cross:

				  extends: .meson-build

				  variables:

				    UNWIND: "false"

				    DRI_LOADERS: >

				      -D glx=disabled

				      -D gbm=false

				      -D egl=false

				      -D platforms=surfaceless

				      -D osmesa=none

				    GALLIUM_ST: >

				      -D dri3=false

				      -D gallium-vdpau=false

				      -D gallium-xvmc=false

				      -D gallium-omx=disabled

				      -D gallium-va=false

				      -D gallium-xa=false

				      -D gallium-nine=false

				      -D llvm=false

				  <<: *ci-deqp-artifacts

				  script:

				    - .gitlab-ci/meson-build.sh

				meson-armhf:

				  extends: .meson-cross

				  variables:

				    CROSS: armhf

				    VULKAN_DRIVERS: freedreno

				    GALLIUM_DRIVERS: "etnaviv,freedreno,kmsro,lima,nouveau,panfrost,tegra,v3d,vc4"

				    # Disable the tests since we're cross compiling.

				    EXTRA_OPTION: >

				      -D build-tests=false

				      -D I-love-half-baked-turnips=true

				      -D vulkan-overlay-layer=true

				meson-arm64:

				  extends: .meson-cross

				  variables:

				    CROSS: arm64

				    VULKAN_DRIVERS: freedreno

				    GALLIUM_DRIVERS: "etnaviv,freedreno,kmsro,lima,nouveau,panfrost,tegra,v3d,vc4"

				    # Disable the tests since we're cross compiling.

				    EXTRA_OPTION: >

				      -D build-tests=false

				      -D I-love-half-baked-turnips=true

				      -D vulkan-overlay-layer=true

				# While the main point of this build is testing the i386 cross build,

				# we also use this one to test some other options that are exclusive

				# with meson-main's choices (classic swrast and osmesa)

				meson-i386:

				  extends: .meson-cross

				  variables:

				    CROSS: i386

				    VULKAN_DRIVERS: intel

				    DRI_DRIVERS: "swrast"

				    GALLIUM_DRIVERS: "iris"

				    # Disable i386 tests, because u_format_tests gets precision

				    # failures in dxtn unpacking

				    EXTRA_OPTION: >

				      -D build-tests=false

				      -D vulkan-overlay-layer=true

				      -D llvm=false

				      -D osmesa=classic

				scons-nollvm:

				  extends: .scons-build

				  variables:

				    SCONS_TARGET: "llvm=0"

				    SCONS_CHECK_COMMAND: "scons llvm=0 check"

				scons-llvm:

				  extends: .scons-build

				  variables:

				    SCONS_TARGET: "llvm=1"

				    SCONS_CHECK_COMMAND: "scons llvm=1 check"

				    LLVM_VERSION: "3.4"

				    # LLVM 3.4 packages were built with an old libstdc++ ABI

				    CXX: "g++ -D_GLIBCXX_USE_CXX11_ABI=0"

				.deqp-test:

				  <<: *ci-run-policy

				  stage: test

				  image: $DEBIAN_IMAGE

				  variables:

				    GIT_STRATEGY: none # testing doesn't build anything from source

				    DEQP_SKIPS: deqp-default-skips.txt

				  script:

				    # Note: Build dir (and thus install) may be dirty due to GIT_STRATEGY

				    - rm -rf install

				    - tar -xf artifacts/install.tar

				    - ./artifacts/deqp-runner.sh

				  artifacts:

				    when: on_failure

				    name: "$CI_JOB_NAME-$CI_COMMIT_REF_NAME"

				    paths:

				      - results/

				test-llvmpipe-gles2:

				  parallel: 4

				  variables:

				    DEQP_VER: gles2

				    DEQP_EXPECTED_FAILS: deqp-llvmpipe-fails.txt

				    LIBGL_ALWAYS_SOFTWARE: "true"

				    DEQP_RENDERER_MATCH: "llvmpipe"

				  extends: .deqp-test

				  dependencies:

				    - meson-main

				test-softpipe-gles2:

				  parallel: 4

				  variables:

				    DEQP_VER: gles2

				    DEQP_EXPECTED_FAILS: deqp-softpipe-fails.txt

				    LIBGL_ALWAYS_SOFTWARE: "true"

				    DEQP_RENDERER_MATCH: "softpipe"

				    GALLIUM_DRIVER: "softpipe"

				  extends: .deqp-test

				  dependencies:

				    - meson-main

				# The GLES2 CTS run takes about 8 minutes of CPU time, while GLES3 is

				# 25 minutes.  Until we can get its runtime down, just do a partial

				# (every 10 tests) run.

				test-softpipe-gles3-limited:

				  variables:

				    DEQP_VER: gles3

				    DEQP_EXPECTED_FAILS: deqp-softpipe-fails.txt

				    LIBGL_ALWAYS_SOFTWARE: "true"

				    DEQP_RENDERER_MATCH: "softpipe"

				    GALLIUM_DRIVER: "softpipe"

				    CI_NODE_INDEX: 1

				    CI_NODE_TOTAL: 10

				  extends: .deqp-test

				  dependencies:

				    - meson-main
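
The `test-softpipe-gles3-limited` job above sets `CI_NODE_INDEX: 1` and `CI_NODE_TOTAL: 10` by hand so that the runner script keeps only one line in ten of the case list. A minimal sketch of that selection follows. It assumes a 1-based index (as GitLab's `CI_NODE_INDEX` is) and swaps the GNU sed `first~step` address used in `deqp-runner.sh` for an equivalent awk filter so it also runs without GNU sed. The helper name `take_shard` and the test names are made up for illustration.

```shell
#!/bin/bash
# take_shard INDEX TOTAL: keep lines INDEX, INDEX+TOTAL, INDEX+2*TOTAL, ...
# INDEX is 1-based. Hypothetical helper, not part of the Mesa scripts.
take_shard() {
    awk -v idx="$1" -v total="$2" 'NR % total == idx % total'
}

# A made-up 25-entry case list stands in for the real mustpass file.
caselist=$(seq 1 25 | sed 's/^/dEQP-GLES3.example.test_/')

# Same values as the job above: node 1 of 10.
limited=$(echo "$caselist" | take_shard 1 10)
echo "$limited"   # prints test_1, test_11 and test_21, one per line
```

With `parallel: 4`, as in the GLES2 jobs, GitLab sets the two variables itself and each of the four jobs would take a different quarter of the list.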

									
285

.gitlab-ci/debian-install.sh Normal file

				@@ -0,0 +1,285 @@

				#!/bin/bash

				set -e

				set -o xtrace

				export DEBIAN_FRONTEND=noninteractive

				CROSS_ARCHITECTURES="armhf arm64 i386"

				for arch in $CROSS_ARCHITECTURES; do

				    dpkg --add-architecture $arch

				done

				apt-get install -y \

				      apt-transport-https \

				      ca-certificates \

				      curl \

				      wget \

				      unzip \

				      gnupg

				curl -fsSL https://apt.llvm.org/llvm-snapshot.gpg.key | apt-key add -

				echo "deb [trusted=yes] https://apt.llvm.org/stretch/ llvm-toolchain-stretch-7 main" >/etc/apt/sources.list.d/llvm7.list

				echo "deb [trusted=yes] https://apt.llvm.org/stretch/ llvm-toolchain-stretch-8 main" >/etc/apt/sources.list.d/llvm8.list

				sed -i -e 's/http:\/\/deb/https:\/\/deb/g' /etc/apt/sources.list

				echo 'deb https://deb.debian.org/debian stretch-backports main' >/etc/apt/sources.list.d/backports.list

				echo 'deb https://deb.debian.org/debian jessie main' >/etc/apt/sources.list.d/jessie.list

				apt-get update

				apt-get install -y -t stretch-backports \

				      llvm-3.4-dev \

				      llvm-3.9-dev \

				      libclang-3.9-dev \

				      llvm-4.0-dev \

				      libclang-4.0-dev \

				      llvm-5.0-dev \

				      libclang-5.0-dev \

				      llvm-6.0-dev \

				      libclang-6.0-dev \

				      llvm-7-dev \

				      libclang-7-dev \

				      llvm-8-dev \

				      libclang-8-dev \

				      g++ \

				      clang-8

				# Install remaining packages from Debian buster to get newer versions

				echo "deb https://deb.debian.org/debian/ buster main" >/etc/apt/sources.list.d/buster.list

				echo "deb https://deb.debian.org/debian/ buster-updates main" >/etc/apt/sources.list.d/buster-updates.list

				apt-get update

				apt-get install -y \

				      git \

				      bzip2 \

				      zlib1g-dev \

				      pkg-config \

				      libxrender-dev \

				      libxdamage-dev \

				      libxxf86vm-dev \

				      gcc \

				      git \

				      libepoxy-dev \

				      libegl1-mesa-dev \

				      libgbm-dev \

				      libclc-dev \

				      libxvmc-dev \

				      libomxil-bellagio-dev \

				      xz-utils \

				      libexpat1-dev \

				      libx11-xcb-dev \

				      libelf-dev \

				      libunwind-dev \

				      libglvnd-dev \

				      libgtk-3-dev \

				      libpng-dev \

				      libgbm-dev \

				      libgles2-mesa-dev \

				      python-mako \

				      python3-mako \

				      bison \

				      flex \

				      gettext \

				      cmake \

				      meson \

				      scons

				# Cross-build Mesa deps

				for arch in $CROSS_ARCHITECTURES; do

				    apt-get install -y \

				            libdrm-dev:${arch} \

				            libexpat1-dev:${arch} \

				            libelf-dev:${arch}

				done

				apt-get install -y \

				        dpkg-dev \

				        gcc-aarch64-linux-gnu \

				        g++-aarch64-linux-gnu \

				        gcc-arm-linux-gnueabihf \

				        g++-arm-linux-gnueabihf \

				        gcc-i686-linux-gnu \

				        g++-i686-linux-gnu

				# for 64bit windows cross-builds

				apt-get install -y mingw-w64

				# for the vulkan overlay layer

				wget https://github.com/KhronosGroup/glslang/releases/download/master-tot/glslang-master-linux-Release.zip

				unzip glslang-master-linux-Release.zip bin/glslangValidator

				install -m755 bin/glslangValidator /usr/local/bin/

				rm bin/glslangValidator glslang-master-linux-Release.zip

				# dependencies where we want a specific version

				export              XORG_RELEASES=https://xorg.freedesktop.org/releases/individual

				export               XCB_RELEASES=https://xcb.freedesktop.org/dist

				export           WAYLAND_RELEASES=https://wayland.freedesktop.org/releases

				export         XORGMACROS_VERSION=util-macros-1.19.0

				export            GLPROTO_VERSION=glproto-1.4.17

				export          DRI2PROTO_VERSION=dri2proto-2.8

				export       LIBPCIACCESS_VERSION=libpciaccess-0.13.4

				export             LIBDRM_VERSION=libdrm-2.4.99

				export           XCBPROTO_VERSION=xcb-proto-1.13

				export         RANDRPROTO_VERSION=randrproto-1.5.0

				export          LIBXRANDR_VERSION=libXrandr-1.5.0

				export             LIBXCB_VERSION=libxcb-1.13

				export       LIBXSHMFENCE_VERSION=libxshmfence-1.3

				export           LIBVDPAU_VERSION=libvdpau-1.1

				export              LIBVA_VERSION=libva-1.7.0

				export         LIBWAYLAND_VERSION=wayland-1.15.0

				export  WAYLAND_PROTOCOLS_VERSION=wayland-protocols-1.12

				wget $XORG_RELEASES/util/$XORGMACROS_VERSION.tar.bz2

				tar -xvf $XORGMACROS_VERSION.tar.bz2 && rm $XORGMACROS_VERSION.tar.bz2

				cd $XORGMACROS_VERSION; ./configure; make install; cd ..

				rm -rf $XORGMACROS_VERSION

				wget $XORG_RELEASES/proto/$GLPROTO_VERSION.tar.bz2

				tar -xvf $GLPROTO_VERSION.tar.bz2 && rm $GLPROTO_VERSION.tar.bz2

				cd $GLPROTO_VERSION; ./configure; make install; cd ..

				rm -rf $GLPROTO_VERSION

				wget $XORG_RELEASES/proto/$DRI2PROTO_VERSION.tar.bz2

				tar -xvf $DRI2PROTO_VERSION.tar.bz2 && rm $DRI2PROTO_VERSION.tar.bz2

				cd $DRI2PROTO_VERSION; ./configure; make install; cd ..

				rm -rf $DRI2PROTO_VERSION

				wget $XCB_RELEASES/$XCBPROTO_VERSION.tar.bz2

				tar -xvf $XCBPROTO_VERSION.tar.bz2 && rm $XCBPROTO_VERSION.tar.bz2

				cd $XCBPROTO_VERSION; ./configure; make install; cd ..

				rm -rf $XCBPROTO_VERSION

				wget $XCB_RELEASES/$LIBXCB_VERSION.tar.bz2

				tar -xvf $LIBXCB_VERSION.tar.bz2 && rm $LIBXCB_VERSION.tar.bz2

				cd $LIBXCB_VERSION; ./configure; make install; cd ..

				rm -rf $LIBXCB_VERSION

				wget $XORG_RELEASES/lib/$LIBPCIACCESS_VERSION.tar.bz2

				tar -xvf $LIBPCIACCESS_VERSION.tar.bz2 && rm $LIBPCIACCESS_VERSION.tar.bz2

				cd $LIBPCIACCESS_VERSION; ./configure; make install; cd ..

				rm -rf $LIBPCIACCESS_VERSION

				wget https://dri.freedesktop.org/libdrm/$LIBDRM_VERSION.tar.bz2

				tar -xvf $LIBDRM_VERSION.tar.bz2 && rm $LIBDRM_VERSION.tar.bz2

				cd $LIBDRM_VERSION; ./configure --enable-vc4 --enable-freedreno --enable-etnaviv-experimental-api; make install; cd ..

				rm -rf $LIBDRM_VERSION

				wget $XORG_RELEASES/proto/$RANDRPROTO_VERSION.tar.bz2

				tar -xvf $RANDRPROTO_VERSION.tar.bz2 && rm $RANDRPROTO_VERSION.tar.bz2

				cd $RANDRPROTO_VERSION; ./configure; make install; cd ..

				rm -rf $RANDRPROTO_VERSION

				wget $XORG_RELEASES/lib/$LIBXRANDR_VERSION.tar.bz2

				tar -xvf $LIBXRANDR_VERSION.tar.bz2 && rm $LIBXRANDR_VERSION.tar.bz2

				cd $LIBXRANDR_VERSION; ./configure; make install; cd ..

				rm -rf $LIBXRANDR_VERSION

				wget $XORG_RELEASES/lib/$LIBXSHMFENCE_VERSION.tar.bz2

				tar -xvf $LIBXSHMFENCE_VERSION.tar.bz2 && rm $LIBXSHMFENCE_VERSION.tar.bz2

				cd $LIBXSHMFENCE_VERSION; ./configure; make install; cd ..

				rm -rf $LIBXSHMFENCE_VERSION

				wget https://people.freedesktop.org/~aplattner/vdpau/$LIBVDPAU_VERSION.tar.bz2

				tar -xvf $LIBVDPAU_VERSION.tar.bz2 && rm $LIBVDPAU_VERSION.tar.bz2

				cd $LIBVDPAU_VERSION; ./configure; make install; cd ..

				rm -rf $LIBVDPAU_VERSION

				wget https://www.freedesktop.org/software/vaapi/releases/libva/$LIBVA_VERSION.tar.bz2

				tar -xvf $LIBVA_VERSION.tar.bz2 && rm $LIBVA_VERSION.tar.bz2

				cd $LIBVA_VERSION; ./configure --disable-wayland --disable-dummy-driver; make install; cd ..

				rm -rf $LIBVA_VERSION

				wget $WAYLAND_RELEASES/$LIBWAYLAND_VERSION.tar.xz

				tar -xvf $LIBWAYLAND_VERSION.tar.xz && rm $LIBWAYLAND_VERSION.tar.xz

				cd $LIBWAYLAND_VERSION; ./configure --enable-libraries --without-host-scanner --disable-documentation --disable-dtd-validation; make install; cd ..

				rm -rf $LIBWAYLAND_VERSION

				wget $WAYLAND_RELEASES/$WAYLAND_PROTOCOLS_VERSION.tar.xz

				tar -xvf $WAYLAND_PROTOCOLS_VERSION.tar.xz && rm $WAYLAND_PROTOCOLS_VERSION.tar.xz

				cd $WAYLAND_PROTOCOLS_VERSION; ./configure; make install; cd ..

				rm -rf $WAYLAND_PROTOCOLS_VERSION

				pushd /usr/local

				git clone https://gitlab.freedesktop.org/mesa/shader-db.git --depth 1

				rm -rf shader-db/.git

				cd shader-db

				make

				popd

				# Use ccache to speed up builds

				apt-get install -y ccache

				# We need xmllint to validate the XML files in Mesa

				apt-get install -y libxml2-utils

				# Generate cross build files for Meson

				for arch in $CROSS_ARCHITECTURES; do

				  cross_file="/cross_file-$arch.txt"

				  /usr/share/meson/debcrossgen --arch "$arch" -o "$cross_file"

				  # Work around a bug in debcrossgen that should be fixed in the next release

				  if [ "$arch" = "i386" ]; then

				    sed -i "s|cpu_family = 'i686'|cpu_family = 'x86'|g" "$cross_file"

				  fi

				done

				############### Build dEQP

				git config --global user.email "mesa@example.com"

				git config --global user.name "Mesa CI"

				# XXX: Use --depth 1 once we can drop the cherry-picks.

				git clone \

				    https://github.com/KhronosGroup/VK-GL-CTS.git \

				    -b opengl-es-cts-3.2.5.1 \

				    /VK-GL-CTS

				cd /VK-GL-CTS

				# Fix surfaceless build

				git cherry-pick -x 22f41e5e321c6dcd8569c4dad91bce89f06b3670

				git cherry-pick -x 1daa8dff73161ea60ead965bd6c9f2a0a2165648

				# surfaceless links against libkms and such despite not using it.

				sed -i '/gbm/d' targets/surfaceless/surfaceless.cmake

				sed -i '/libkms/d' targets/surfaceless/surfaceless.cmake

				sed -i '/libgbm/d' targets/surfaceless/surfaceless.cmake

				python3 external/fetch_sources.py

				mkdir -p /deqp

				cd /deqp

				cmake -G Ninja \

				      -DDEQP_TARGET=surfaceless               \

				      -DCMAKE_BUILD_TYPE=Release              \

				      /VK-GL-CTS

				ninja

				# Copy out the mustpass lists we want from a bunch of other junk.

				mkdir /deqp/mustpass

				for gles in gles2 gles3 gles31; do

				    cp \

				        /deqp/external/openglcts/modules/gl_cts/data/mustpass/gles/aosp_mustpass/3.2.5.x/$gles-master.txt \

				        /deqp/mustpass/$gles-master.txt

				done

				# Remove the rest of the build products that we don't need.

				rm -rf /deqp/external

				rm -rf /deqp/modules/internal

				rm -rf /deqp/executor

				rm -rf /deqp/execserver

				rm -rf /deqp/modules/egl

				rm -rf /deqp/framework

				du -sh *

				rm -rf /VK-GL-CTS

				############### Uninstall the build software

				apt-get purge -y \

				      git \

				      curl \

				      unzip \

				      gnupg \

				      cmake \

				      git \

				      libgles2-mesa-dev \

				      libgbm-dev

				apt-get autoremove -y --purge

10

.gitlab-ci/deqp-default-skips.txt Normal file

@@ -0,0 +1,10 @@
 # Note: skips lists for CI are just a list of lines that, when
 # non-zero-length and not starting with '#', will regex match to
 # delete lines from the test list.  Be careful.
 # Skip the perf/stress tests to keep runtime manageable
 dEQP-GLES[0-9]*.performance
 dEQP-GLES[0-9]*.stress
 # These are really slow on tiling architectures (including llvmpipe).
 dEQP-GLES[0-9]*.functional.flush_finish
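
The header comment of this skips file says each non-empty, non-comment line is used as a regex that deletes matching lines from the test list. A small sketch of that behaviour, assuming made-up test names and temporary file paths, may help show why the comment says "Be careful": the patterns are unanchored and `.` matches any character.

```shell
#!/bin/bash
set -e

# Temporary files and test names are illustrative only.
workdir=$(mktemp -d)

cat > "$workdir/case-list.txt" <<'EOF'
dEQP-GLES2.functional.clipping.point.wide_point_clip
dEQP-GLES2.performance.shaders.operator.add
dEQP-GLES3.stress.draw.random
dEQP-GLES2.functional.flush_finish.wait
EOF

cat > "$workdir/skips.txt" <<'EOF'
# comment lines and empty lines are ignored

dEQP-GLES[0-9]*.performance
dEQP-GLES[0-9]*.stress
dEQP-GLES[0-9]*.functional.flush_finish
EOF

# Every remaining line is a regex; any case-list line it matches is dropped.
while read -r pattern; do
    if echo "$pattern" | grep -q '^[^#]'; then
        sed "/$pattern/d" "$workdir/case-list.txt" > "$workdir/case-list.tmp"
        mv "$workdir/case-list.tmp" "$workdir/case-list.txt"
    fi
done < "$workdir/skips.txt"

remaining=$(cat "$workdir/case-list.txt")
rm -rf "$workdir"
echo "$remaining"   # only the clipping test is left
```

The sketch writes to a temporary file instead of using `sed -i` so that it does not depend on GNU sed; the filtering logic is otherwise the same idea.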

124

.gitlab-ci/deqp-llvmpipe-fails.txt Normal file

@@ -0,0 +1,124 @@
 dEQP-GLES2.functional.clipping.line.wide_line_clip_viewport_center
 dEQP-GLES2.functional.clipping.line.wide_line_clip_viewport_corner
 dEQP-GLES2.functional.clipping.point.wide_point_clip
 dEQP-GLES2.functional.clipping.point.wide_point_clip_viewport_center
 dEQP-GLES2.functional.clipping.point.wide_point_clip_viewport_corner
 dEQP-GLES2.functional.clipping.triangle_vertex.clip_two.clip_neg_y_neg_z_and_neg_x_neg_y_pos_z
 dEQP-GLES2.functional.clipping.triangle_vertex.clip_two.clip_pos_y_pos_z_and_neg_x_neg_y_neg_z
 dEQP-GLES2.functional.fbo.render.color_clear.rbo_rgba4
 dEQP-GLES2.functional.fbo.render.color_clear.rbo_rgba4_depth_component16
 dEQP-GLES2.functional.fbo.render.color_clear.rbo_rgba4_stencil_index8
 dEQP-GLES2.functional.fbo.render.depth.rbo_rgba4_depth_component16
 dEQP-GLES2.functional.fbo.render.recreate_colorbuffer.no_rebind_rbo_rgba4
 dEQP-GLES2.functional.fbo.render.recreate_colorbuffer.no_rebind_rbo_rgba4_stencil_index8
 dEQP-GLES2.functional.fbo.render.recreate_colorbuffer.rebind_rbo_rgba4
 dEQP-GLES2.functional.fbo.render.recreate_colorbuffer.rebind_rbo_rgba4_stencil_index8
 dEQP-GLES2.functional.fbo.render.recreate_depthbuffer.no_rebind_rbo_rgba4_depth_component16
 dEQP-GLES2.functional.fbo.render.recreate_depthbuffer.rebind_rbo_rgba4_depth_component16
 dEQP-GLES2.functional.fbo.render.recreate_stencilbuffer.no_rebind_rbo_rgba4_stencil_index8
 dEQP-GLES2.functional.fbo.render.recreate_stencilbuffer.rebind_rbo_rgba4_stencil_index8
 dEQP-GLES2.functional.fbo.render.shared_colorbuffer.rbo_rgba4
 dEQP-GLES2.functional.fbo.render.shared_colorbuffer.rbo_rgba4_depth_component16
 dEQP-GLES2.functional.fbo.render.shared_depthbuffer.rbo_rgba4_depth_component16
 dEQP-GLES2.functional.polygon_offset.default_displacement_with_units
 dEQP-GLES2.functional.polygon_offset.fixed16_displacement_with_units
 dEQP-GLES2.functional.rasterization.interpolation.basic.line_loop_wide
 dEQP-GLES2.functional.rasterization.interpolation.basic.line_strip_wide
 dEQP-GLES2.functional.rasterization.interpolation.basic.lines_wide
 dEQP-GLES2.functional.rasterization.interpolation.projected.line_loop_wide
 dEQP-GLES2.functional.rasterization.interpolation.projected.line_strip_wide
 dEQP-GLES2.functional.rasterization.interpolation.projected.lines_wide
 dEQP-GLES2.functional.rasterization.limits.points
 dEQP-GLES2.functional.shaders.texture_functions.fragment.texture2d_bias
 dEQP-GLES2.functional.shaders.texture_functions.fragment.texture2dproj_vec3_bias
 dEQP-GLES2.functional.shaders.texture_functions.fragment.texture2dproj_vec4_bias
 dEQP-GLES2.functional.texture.filtering.2d.linear_mipmap_linear_linear_clamp_rgba8888
 dEQP-GLES2.functional.texture.filtering.2d.linear_mipmap_linear_linear_mirror_etc1
 dEQP-GLES2.functional.texture.filtering.2d.linear_mipmap_linear_linear_mirror_rgba8888
 dEQP-GLES2.functional.texture.filtering.2d.linear_mipmap_linear_linear_repeat_etc1
 dEQP-GLES2.functional.texture.filtering.2d.linear_mipmap_linear_linear_repeat_rgba8888
 dEQP-GLES2.functional.texture.filtering.2d.linear_mipmap_linear_nearest_clamp_rgba8888
 dEQP-GLES2.functional.texture.filtering.2d.linear_mipmap_linear_nearest_mirror_etc1
 dEQP-GLES2.functional.texture.filtering.2d.linear_mipmap_linear_nearest_mirror_rgba8888
 dEQP-GLES2.functional.texture.filtering.2d.linear_mipmap_linear_nearest_repeat_etc1
 dEQP-GLES2.functional.texture.filtering.2d.linear_mipmap_linear_nearest_repeat_l8
 dEQP-GLES2.functional.texture.filtering.2d.linear_mipmap_linear_nearest_repeat_rgb888
 dEQP-GLES2.functional.texture.filtering.2d.linear_mipmap_linear_nearest_repeat_rgba4444
 dEQP-GLES2.functional.texture.filtering.2d.linear_mipmap_linear_nearest_repeat_rgba8888
 dEQP-GLES2.functional.texture.filtering.2d.nearest_mipmap_linear_linear_clamp_rgba8888
 dEQP-GLES2.functional.texture.filtering.2d.nearest_mipmap_linear_linear_mirror_etc1
 dEQP-GLES2.functional.texture.filtering.2d.nearest_mipmap_linear_linear_mirror_rgba8888
 dEQP-GLES2.functional.texture.filtering.2d.nearest_mipmap_linear_linear_repeat_etc1
 dEQP-GLES2.functional.texture.filtering.2d.nearest_mipmap_linear_linear_repeat_rgba8888
 dEQP-GLES2.functional.texture.filtering.2d.nearest_mipmap_linear_nearest_clamp_rgba8888
 dEQP-GLES2.functional.texture.filtering.2d.nearest_mipmap_linear_nearest_mirror_etc1
 dEQP-GLES2.functional.texture.filtering.2d.nearest_mipmap_linear_nearest_mirror_rgba8888
 dEQP-GLES2.functional.texture.filtering.2d.nearest_mipmap_linear_nearest_repeat_etc1
 dEQP-GLES2.functional.texture.filtering.2d.nearest_mipmap_linear_nearest_repeat_l8
 dEQP-GLES2.functional.texture.filtering.2d.nearest_mipmap_linear_nearest_repeat_rgb888
 dEQP-GLES2.functional.texture.filtering.2d.nearest_mipmap_linear_nearest_repeat_rgba4444
 dEQP-GLES2.functional.texture.filtering.2d.nearest_mipmap_linear_nearest_repeat_rgba8888
 dEQP-GLES2.functional.texture.mipmap.2d.affine.linear_linear_repeat
 dEQP-GLES2.functional.texture.mipmap.2d.affine.nearest_linear_clamp
 dEQP-GLES2.functional.texture.mipmap.2d.affine.nearest_linear_mirror
 dEQP-GLES2.functional.texture.mipmap.2d.affine.nearest_linear_repeat
 dEQP-GLES2.functional.texture.mipmap.2d.basic.linear_linear_repeat
 dEQP-GLES2.functional.texture.mipmap.2d.basic.linear_linear_repeat_non_square
 dEQP-GLES2.functional.texture.mipmap.2d.basic.nearest_linear_clamp
 dEQP-GLES2.functional.texture.mipmap.2d.basic.nearest_linear_clamp_non_square
 dEQP-GLES2.functional.texture.mipmap.2d.basic.nearest_linear_mirror
 dEQP-GLES2.functional.texture.mipmap.2d.basic.nearest_linear_mirror_non_square
 dEQP-GLES2.functional.texture.mipmap.2d.basic.nearest_linear_repeat
 dEQP-GLES2.functional.texture.mipmap.2d.basic.nearest_linear_repeat_non_square
 dEQP-GLES2.functional.texture.mipmap.2d.projected.linear_linear_repeat
 dEQP-GLES2.functional.texture.mipmap.2d.projected.nearest_linear_clamp
 dEQP-GLES2.functional.texture.mipmap.2d.projected.nearest_linear_mirror
 dEQP-GLES2.functional.texture.mipmap.2d.projected.nearest_linear_repeat
 dEQP-GLES2.functional.texture.mipmap.cube.basic.linear_linear
 dEQP-GLES2.functional.texture.mipmap.cube.basic.linear_nearest
 dEQP-GLES2.functional.texture.mipmap.cube.bias.linear_linear
 dEQP-GLES2.functional.texture.mipmap.cube.bias.linear_nearest
 dEQP-GLES2.functional.texture.mipmap.cube.projected.linear_linear
 dEQP-GLES2.functional.texture.mipmap.cube.projected.linear_nearest
 dEQP-GLES2.functional.texture.vertex.2d.filtering.linear_mipmap_linear_linear_clamp
 dEQP-GLES2.functional.texture.vertex.2d.filtering.linear_mipmap_linear_linear_mirror
 dEQP-GLES2.functional.texture.vertex.2d.filtering.linear_mipmap_linear_linear_repeat
 dEQP-GLES2.functional.texture.vertex.2d.filtering.linear_mipmap_linear_nearest_clamp
 dEQP-GLES2.functional.texture.vertex.2d.filtering.linear_mipmap_linear_nearest_mirror
 dEQP-GLES2.functional.texture.vertex.2d.filtering.linear_mipmap_linear_nearest_repeat
 dEQP-GLES2.functional.texture.vertex.2d.filtering.nearest_mipmap_linear_linear_clamp
 dEQP-GLES2.functional.texture.vertex.2d.filtering.nearest_mipmap_linear_linear_mirror
 dEQP-GLES2.functional.texture.vertex.2d.filtering.nearest_mipmap_linear_linear_repeat
 dEQP-GLES2.functional.texture.vertex.2d.filtering.nearest_mipmap_linear_nearest_clamp
 dEQP-GLES2.functional.texture.vertex.2d.filtering.nearest_mipmap_linear_nearest_mirror
 dEQP-GLES2.functional.texture.vertex.2d.filtering.nearest_mipmap_linear_nearest_repeat
 dEQP-GLES2.functional.texture.vertex.2d.wrap.clamp_clamp
 dEQP-GLES2.functional.texture.vertex.2d.wrap.clamp_mirror
 dEQP-GLES2.functional.texture.vertex.2d.wrap.clamp_repeat
 dEQP-GLES2.functional.texture.vertex.2d.wrap.mirror_clamp
 dEQP-GLES2.functional.texture.vertex.2d.wrap.mirror_mirror
 dEQP-GLES2.functional.texture.vertex.2d.wrap.mirror_repeat
 dEQP-GLES2.functional.texture.vertex.2d.wrap.repeat_clamp
 dEQP-GLES2.functional.texture.vertex.2d.wrap.repeat_mirror
 dEQP-GLES2.functional.texture.vertex.2d.wrap.repeat_repeat
 dEQP-GLES2.functional.texture.vertex.cube.filtering.linear_mipmap_linear_linear_clamp
 dEQP-GLES2.functional.texture.vertex.cube.filtering.linear_mipmap_linear_linear_mirror
 dEQP-GLES2.functional.texture.vertex.cube.filtering.linear_mipmap_linear_linear_repeat
 dEQP-GLES2.functional.texture.vertex.cube.filtering.linear_mipmap_linear_nearest_clamp
 dEQP-GLES2.functional.texture.vertex.cube.filtering.linear_mipmap_linear_nearest_mirror
 dEQP-GLES2.functional.texture.vertex.cube.filtering.linear_mipmap_linear_nearest_repeat
 dEQP-GLES2.functional.texture.vertex.cube.filtering.nearest_mipmap_linear_linear_clamp
 dEQP-GLES2.functional.texture.vertex.cube.filtering.nearest_mipmap_linear_linear_mirror
 dEQP-GLES2.functional.texture.vertex.cube.filtering.nearest_mipmap_linear_linear_repeat
 dEQP-GLES2.functional.texture.vertex.cube.filtering.nearest_mipmap_linear_nearest_clamp
 dEQP-GLES2.functional.texture.vertex.cube.filtering.nearest_mipmap_linear_nearest_mirror
 dEQP-GLES2.functional.texture.vertex.cube.filtering.nearest_mipmap_linear_nearest_repeat
 dEQP-GLES2.functional.texture.vertex.cube.wrap.clamp_clamp
 dEQP-GLES2.functional.texture.vertex.cube.wrap.clamp_mirror
 dEQP-GLES2.functional.texture.vertex.cube.wrap.clamp_repeat
 dEQP-GLES2.functional.texture.vertex.cube.wrap.mirror_clamp
 dEQP-GLES2.functional.texture.vertex.cube.wrap.mirror_mirror
 dEQP-GLES2.functional.texture.vertex.cube.wrap.mirror_repeat
 dEQP-GLES2.functional.texture.vertex.cube.wrap.repeat_clamp
 dEQP-GLES2.functional.texture.vertex.cube.wrap.repeat_mirror
 dEQP-GLES2.functional.texture.vertex.cube.wrap.repeat_repeat

									
112

.gitlab-ci/deqp-runner.sh Executable file

				@@ -0,0 +1,112 @@

				#!/bin/bash

				set -ex

				DEQP_OPTIONS=(--deqp-surface-width=256 --deqp-surface-height=256)

				DEQP_OPTIONS+=(--deqp-surface-type=pbuffer)

				DEQP_OPTIONS+=(--deqp-gl-config-name=rgba8888d24s8ms0)

				DEQP_OPTIONS+=(--deqp-visibility=hidden)

				DEQP_OPTIONS+=(--deqp-log-images=disable)

				DEQP_OPTIONS+=(--deqp-watchdog=enable)

				DEQP_OPTIONS+=(--deqp-crashhandler=enable)

				if [ -z "$DEQP_VER" ]; then

				   echo 'DEQP_VER must be set to something like "gles2" or "gles31" for the test run'

				   exit 1

				fi

				if [ -z "$DEQP_SKIPS" ]; then

				   echo 'DEQP_SKIPS must be set to something like "deqp-default-skips.txt"'

				   exit 1

				fi

				# Prep the expected failure list

				if [ -n "$DEQP_EXPECTED_FAILS" ]; then

				   export DEQP_EXPECTED_FAILS=`pwd`/artifacts/$DEQP_EXPECTED_FAILS

				else

				   export DEQP_EXPECTED_FAILS=/tmp/expect-no-failures.txt

				   touch $DEQP_EXPECTED_FAILS

				fi

				sort < $DEQP_EXPECTED_FAILS > /tmp/expected-fails.txt

				# Fix relative paths on inputs.

				export DEQP_SKIPS=`pwd`/artifacts/$DEQP_SKIPS

				# Be a good citizen on the shared runners.

				export LP_NUM_THREADS=4

				# Set up the driver environment.

				export LD_LIBRARY_PATH=`pwd`/install/lib/

				export EGL_PLATFORM=surfaceless

				# the runner was failing to look for libkms in /usr/local/lib for some reason

				# I never figured out.

				export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib

				RESULTS=`pwd`/results

				mkdir -p $RESULTS

				cd /deqp/modules/$DEQP_VER

				# Generate test case list file

				cp /deqp/mustpass/$DEQP_VER-master.txt /tmp/case-list.txt

				# Note: not using sorted input and comm, because I want to run the tests in

				# the same order that dEQP would.

				while read -r line; do

				   if echo "$line" | grep -q '^[^#]'; then

				       sed -i "/$line/d" /tmp/case-list.txt

				   fi

				done < $DEQP_SKIPS

				# If the job is parallel, take the corresponding fraction of the caselist.

				# Note: N~M is a gnu sed extension to match every nth line (first line is #1).

				if [ -n "$CI_NODE_INDEX" ]; then

				   sed -ni $CI_NODE_INDEX~$CI_NODE_TOTAL"p" /tmp/case-list.txt

				fi

				if [ ! -s /tmp/case-list.txt ]; then

				    echo "Caselist generation failed"

				    exit 1

				fi

				# Cannot use tee because dash doesn't have pipefail

				touch /tmp/result.txt

				tail -f /tmp/result.txt &

				./deqp-$DEQP_VER "${DEQP_OPTIONS[@]}" --deqp-log-filename=$RESULTS/results.qpa --deqp-caselist-file=/tmp/case-list.txt >> /tmp/result.txt

				DEQP_EXITCODE=$?

				sed -ne \

				    '/StatusCode="Fail"/{x;p}; s/#beginTestCaseResult //; T; h' \

				    $RESULTS/results.qpa \

				    > /tmp/unsorted-fails.txt

				# Scrape out the renderer that the test run used, so we can validate that the

				# right driver was used.

				if grep -q "dEQP-.*.info.renderer" /tmp/case-list.txt; then

				    # This is an ugly dependency on the .qpa format: Print the third line after

				    # the match, which happens to contain the result.

				    RENDERER=`sed -n '/#beginTestCaseResult dEQP-.*.info.renderer/{n;n;n;p}' $RESULTS/results.qpa | sed -n -E "s|<Text>(.*)</Text>|\1|p"`

				    echo "GL_RENDERER for this test run: $RENDERER"

				    if [ -n "$DEQP_RENDERER_MATCH" ]; then

				        echo $RENDERER | grep -q $DEQP_RENDERER_MATCH > /dev/null

				    fi

				fi

				if [ $DEQP_EXITCODE -ne 0 ]; then

				   exit $DEQP_EXITCODE

				fi

				sort < /tmp/unsorted-fails.txt > $RESULTS/fails.txt

				comm -23 $RESULTS/fails.txt /tmp/expected-fails.txt > /tmp/new-fails.txt

				if [ -s /tmp/new-fails.txt ]; then

				    echo "Unexpected failures:"

				    cat /tmp/new-fails.txt

				    exit 1

				else

				    echo "No new failures"

				fi
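
The last step of the script compares the sorted list of observed failures against the sorted expected-failures list with `comm -23`, which prints only the lines unique to the first file. The sketch below reproduces that comparison on made-up data. The test names and temporary paths are assumptions, `LC_ALL=C` is added here so that `sort` and `comm` agree on ordering, and the "now passing" report via `comm -13` is an extra illustration that the script above does not perform.

```shell
#!/bin/bash
set -e

# Temporary directory and test names are illustrative only.
workdir=$(mktemp -d)

printf '%s\n' \
    "dEQP-GLES2.functional.rasterization.limits.points" \
    "dEQP-GLES2.functional.clipping.point.wide_point_clip" \
    | LC_ALL=C sort > "$workdir/expected-fails.txt"

printf '%s\n' \
    "dEQP-GLES2.functional.clipping.point.wide_point_clip" \
    "dEQP-GLES2.functional.texture.hypothetical_new_failure" \
    | LC_ALL=C sort > "$workdir/fails.txt"

# -2 hides lines only in the second file, -3 hides lines common to both,
# leaving failures that are not on the expected list.
new_fails=$(LC_ALL=C comm -23 "$workdir/fails.txt" "$workdir/expected-fails.txt")

# -1 -3 leaves lines only in the second file: expected failures that passed.
now_passing=$(LC_ALL=C comm -13 "$workdir/fails.txt" "$workdir/expected-fails.txt")

rm -rf "$workdir"
echo "Unexpected failures: $new_fails"
echo "Expected failures that passed: $now_passing"
```

Both inputs must be sorted the same way, which is why the script sorts the expected-failures file and the scraped failures before calling `comm`.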

445

.gitlab-ci/deqp-softpipe-fails.txt Normal file

@@ -0,0 +1,445 @@
 dEQP-GLES2.functional.clipping.line.wide_line_clip_viewport_center
 dEQP-GLES2.functional.clipping.line.wide_line_clip_viewport_corner
 dEQP-GLES2.functional.clipping.point.wide_point_clip
 dEQP-GLES2.functional.clipping.point.wide_point_clip_viewport_center
 dEQP-GLES2.functional.clipping.point.wide_point_clip_viewport_corner
 dEQP-GLES2.functional.clipping.triangle_vertex.clip_two.clip_neg_y_neg_z_and_neg_x_neg_y_pos_z
 dEQP-GLES2.functional.clipping.triangle_vertex.clip_two.clip_pos_y_pos_z_and_neg_x_neg_y_neg_z
 dEQP-GLES2.functional.polygon_offset.default_displacement_with_units
 dEQP-GLES2.functional.polygon_offset.fixed16_displacement_with_units
 dEQP-GLES2.functional.rasterization.interpolation.basic.line_loop_wide
 dEQP-GLES2.functional.rasterization.interpolation.basic.line_strip_wide
 dEQP-GLES2.functional.rasterization.interpolation.basic.lines_wide
 dEQP-GLES2.functional.rasterization.interpolation.projected.line_loop_wide
 dEQP-GLES2.functional.rasterization.interpolation.projected.line_strip_wide
 dEQP-GLES2.functional.rasterization.interpolation.projected.lines_wide
 dEQP-GLES2.functional.rasterization.limits.points
 dEQP-GLES2.functional.rasterization.primitives.points
 dEQP-GLES3.functional.clipping.line.wide_line_clip_viewport_center
 dEQP-GLES3.functional.clipping.line.wide_line_clip_viewport_corner
 dEQP-GLES3.functional.clipping.point.wide_point_clip
 dEQP-GLES3.functional.clipping.point.wide_point_clip_viewport_center
 dEQP-GLES3.functional.clipping.point.wide_point_clip_viewport_corner
 dEQP-GLES3.functional.clipping.triangle_vertex.clip_two.clip_neg_y_neg_z_and_neg_x_neg_y_pos_z
 dEQP-GLES3.functional.clipping.triangle_vertex.clip_two.clip_pos_y_pos_z_and_neg_x_neg_y_neg_z
 dEQP-GLES3.functional.draw.random.124
 dEQP-GLES3.functional.fbo.depth.depth_test_clamp.depth24_stencil8
 dEQP-GLES3.functional.fbo.depth.depth_test_clamp.depth32f_stencil8
 dEQP-GLES3.functional.fbo.depth.depth_test_clamp.depth_component16
 dEQP-GLES3.functional.fbo.depth.depth_test_clamp.depth_component24
 dEQP-GLES3.functional.fbo.depth.depth_test_clamp.depth_component32f
 dEQP-GLES3.functional.fbo.depth.depth_write_clamp.depth32f_stencil8
 dEQP-GLES3.functional.fbo.depth.depth_write_clamp.depth_component32f
 dEQP-GLES3.functional.fbo.invalidate.sub.unbind_blit_msaa_color
 dEQP-GLES3.functional.fbo.invalidate.sub.unbind_blit_msaa_depth
 dEQP-GLES3.functional.fbo.invalidate.sub.unbind_blit_msaa_depth_stencil
 dEQP-GLES3.functional.fbo.invalidate.sub.unbind_blit_msaa_stencil
 dEQP-GLES3.functional.fbo.invalidate.whole.unbind_blit_msaa_color
 dEQP-GLES3.functional.fbo.invalidate.whole.unbind_blit_msaa_depth
 dEQP-GLES3.functional.fbo.invalidate.whole.unbind_blit_msaa_depth_stencil
 dEQP-GLES3.functional.fbo.invalidate.whole.unbind_blit_msaa_stencil
 dEQP-GLES3.functional.fbo.msaa.2_samples.depth24_stencil8
 dEQP-GLES3.functional.fbo.msaa.2_samples.depth32f_stencil8
 dEQP-GLES3.functional.fbo.msaa.2_samples.depth_component16
 dEQP-GLES3.functional.fbo.msaa.2_samples.depth_component24
 dEQP-GLES3.functional.fbo.msaa.2_samples.depth_component32f
 dEQP-GLES3.functional.fbo.msaa.2_samples.r11f_g11f_b10f
 dEQP-GLES3.functional.fbo.msaa.2_samples.r16f
 dEQP-GLES3.functional.fbo.msaa.2_samples.r8
 dEQP-GLES3.functional.fbo.msaa.2_samples.rg16f
 dEQP-GLES3.functional.fbo.msaa.2_samples.rg8
 dEQP-GLES3.functional.fbo.msaa.2_samples.rgb10_a2
 dEQP-GLES3.functional.fbo.msaa.2_samples.rgb565
 dEQP-GLES3.functional.fbo.msaa.2_samples.rgb5_a1
 dEQP-GLES3.functional.fbo.msaa.2_samples.rgb8
 dEQP-GLES3.functional.fbo.msaa.2_samples.rgba4
 dEQP-GLES3.functional.fbo.msaa.2_samples.rgba8
 dEQP-GLES3.functional.fbo.msaa.2_samples.srgb8_alpha8
 dEQP-GLES3.functional.fbo.msaa.2_samples.stencil_index8
 dEQP-GLES3.functional.fbo.msaa.4_samples.depth24_stencil8
 dEQP-GLES3.functional.fbo.msaa.4_samples.depth32f_stencil8
 dEQP-GLES3.functional.fbo.msaa.4_samples.depth_component16
 dEQP-GLES3.functional.fbo.msaa.4_samples.depth_component24
 dEQP-GLES3.functional.fbo.msaa.4_samples.depth_component32f
 dEQP-GLES3.functional.fbo.msaa.4_samples.r11f_g11f_b10f
 dEQP-GLES3.functional.fbo.msaa.4_samples.r16f
 dEQP-GLES3.functional.fbo.msaa.4_samples.r8
 dEQP-GLES3.functional.fbo.msaa.4_samples.rg16f
 dEQP-GLES3.functional.fbo.msaa.4_samples.rg8
 dEQP-GLES3.functional.fbo.msaa.4_samples.rgb10_a2
 dEQP-GLES3.functional.fbo.msaa.4_samples.rgb565
 dEQP-GLES3.functional.fbo.msaa.4_samples.rgb5_a1
 dEQP-GLES3.functional.fbo.msaa.4_samples.rgb8
 dEQP-GLES3.functional.fbo.msaa.4_samples.rgba4
 dEQP-GLES3.functional.fbo.msaa.4_samples.rgba8
 dEQP-GLES3.functional.fbo.msaa.4_samples.srgb8_alpha8
 dEQP-GLES3.functional.fbo.msaa.4_samples.stencil_index8
 dEQP-GLES3.functional.multisample.fbo_max_samples.proportionality_alpha_to_coverage
 dEQP-GLES3.functional.multisample.fbo_max_samples.proportionality_sample_coverage
 dEQP-GLES3.functional.multisample.fbo_max_samples.proportionality_sample_coverage_inverted
 dEQP-GLES3.functional.multisample.fbo_max_samples.sample_coverage_invert
 dEQP-GLES3.functional.negative_api.buffer.blit_framebuffer_multisample
 dEQP-GLES3.functional.negative_api.buffer.read_pixels_fbo_format_mismatch
 dEQP-GLES3.functional.polygon_offset.default_displacement_with_units
 dEQP-GLES3.functional.polygon_offset.fixed16_displacement_with_units
 dEQP-GLES3.functional.polygon_offset.fixed24_displacement_with_units
 dEQP-GLES3.functional.polygon_offset.float32_displacement_with_units
 dEQP-GLES3.functional.rasterization.fbo.rbo_multisample_max.interpolation.lines_wide
 dEQP-GLES3.functional.rasterization.fbo.rbo_multisample_max.primitives.lines_wide
 dEQP-GLES3.functional.rasterization.fbo.rbo_singlesample.interpolation.lines_wide
 dEQP-GLES3.functional.rasterization.fbo.rbo_singlesample.primitives.points
 dEQP-GLES3.functional.rasterization.fbo.texture_2d.interpolation.lines_wide
 dEQP-GLES3.functional.rasterization.fbo.texture_2d.primitives.points
 dEQP-GLES3.functional.rasterization.interpolation.basic.line_loop_wide
 dEQP-GLES3.functional.rasterization.interpolation.basic.line_strip_wide
 dEQP-GLES3.functional.rasterization.interpolation.basic.lines_wide
 dEQP-GLES3.functional.rasterization.interpolation.projected.line_loop_wide
 dEQP-GLES3.functional.rasterization.interpolation.projected.line_strip_wide
 dEQP-GLES3.functional.rasterization.interpolation.projected.lines_wide
 dEQP-GLES3.functional.rasterization.primitives.points
 dEQP-GLES3.functional.rasterizer_discard.basic.write_depth_points
 dEQP-GLES3.functional.rasterizer_discard.basic.write_stencil_points
 dEQP-GLES3.functional.rasterizer_discard.fbo.write_depth_points
 dEQP-GLES3.functional.rasterizer_discard.fbo.write_stencil_points
 dEQP-GLES3.functional.rasterizer_discard.scissor.write_depth_points
 dEQP-GLES3.functional.rasterizer_discard.scissor.write_stencil_points
 dEQP-GLES3.functional.shaders.derivate.dfdx.fastest.fbo_msaa4.float_highp
 dEQP-GLES3.functional.shaders.derivate.dfdx.fastest.fbo_msaa4.float_mediump
 dEQP-GLES3.functional.shaders.derivate.dfdx.fastest.fbo_msaa4.vec2_highp
 dEQP-GLES3.functional.shaders.derivate.dfdx.fastest.fbo_msaa4.vec2_mediump
 dEQP-GLES3.functional.shaders.derivate.dfdx.fastest.fbo_msaa4.vec3_highp
 dEQP-GLES3.functional.shaders.derivate.dfdx.fastest.fbo_msaa4.vec3_mediump
 dEQP-GLES3.functional.shaders.derivate.dfdx.fastest.fbo_msaa4.vec4_highp
 dEQP-GLES3.functional.shaders.derivate.dfdx.fastest.fbo_msaa4.vec4_mediump
 dEQP-GLES3.functional.shaders.derivate.dfdx.fbo_msaa2.float_highp
 dEQP-GLES3.functional.shaders.derivate.dfdx.fbo_msaa2.float_mediump
 dEQP-GLES3.functional.shaders.derivate.dfdx.fbo_msaa2.vec2_highp
 dEQP-GLES3.functional.shaders.derivate.dfdx.fbo_msaa2.vec2_mediump
 dEQP-GLES3.functional.shaders.derivate.dfdx.fbo_msaa2.vec3_highp
 dEQP-GLES3.functional.shaders.derivate.dfdx.fbo_msaa2.vec3_mediump
 dEQP-GLES3.functional.shaders.derivate.dfdx.fbo_msaa2.vec4_highp
 dEQP-GLES3.functional.shaders.derivate.dfdx.fbo_msaa2.vec4_mediump
 dEQP-GLES3.functional.shaders.derivate.dfdx.fbo_msaa4.float_highp
 dEQP-GLES3.functional.shaders.derivate.dfdx.fbo_msaa4.float_mediump
 dEQP-GLES3.functional.shaders.derivate.dfdx.fbo_msaa4.vec2_highp
 dEQP-GLES3.functional.shaders.derivate.dfdx.fbo_msaa4.vec2_mediump
 dEQP-GLES3.functional.shaders.derivate.dfdx.fbo_msaa4.vec3_highp
 dEQP-GLES3.functional.shaders.derivate.dfdx.fbo_msaa4.vec3_mediump
 dEQP-GLES3.functional.shaders.derivate.dfdx.fbo_msaa4.vec4_highp
 dEQP-GLES3.functional.shaders.derivate.dfdx.fbo_msaa4.vec4_mediump
 dEQP-GLES3.functional.shaders.derivate.dfdx.nicest.fbo_msaa4.float_highp
 dEQP-GLES3.functional.shaders.derivate.dfdx.nicest.fbo_msaa4.float_mediump
 dEQP-GLES3.functional.shaders.derivate.dfdx.nicest.fbo_msaa4.vec2_highp
 dEQP-GLES3.functional.shaders.derivate.dfdx.nicest.fbo_msaa4.vec2_mediump
 dEQP-GLES3.functional.shaders.derivate.dfdx.nicest.fbo_msaa4.vec3_highp
 dEQP-GLES3.functional.shaders.derivate.dfdx.nicest.fbo_msaa4.vec3_mediump
 dEQP-GLES3.functional.shaders.derivate.dfdx.nicest.fbo_msaa4.vec4_highp
 dEQP-GLES3.functional.shaders.derivate.dfdx.nicest.fbo_msaa4.vec4_mediump
 dEQP-GLES3.functional.shaders.derivate.dfdx.texture.msaa4.float_highp
 dEQP-GLES3.functional.shaders.derivate.dfdx.texture.msaa4.float_mediump
 dEQP-GLES3.functional.shaders.derivate.dfdx.texture.msaa4.vec2_highp
 dEQP-GLES3.functional.shaders.derivate.dfdx.texture.msaa4.vec2_mediump
 dEQP-GLES3.functional.shaders.derivate.dfdx.texture.msaa4.vec3_highp
 dEQP-GLES3.functional.shaders.derivate.dfdx.texture.msaa4.vec3_mediump
 dEQP-GLES3.functional.shaders.derivate.dfdx.texture.msaa4.vec4_highp
 dEQP-GLES3.functional.shaders.derivate.dfdx.texture.msaa4.vec4_mediump
 dEQP-GLES3.functional.shaders.derivate.dfdy.fastest.fbo_msaa4.float_highp
 dEQP-GLES3.functional.shaders.derivate.dfdy.fastest.fbo_msaa4.float_mediump
 dEQP-GLES3.functional.shaders.derivate.dfdy.fastest.fbo_msaa4.vec2_highp
 dEQP-GLES3.functional.shaders.derivate.dfdy.fastest.fbo_msaa4.vec2_mediump
 dEQP-GLES3.functional.shaders.derivate.dfdy.fastest.fbo_msaa4.vec3_highp
 dEQP-GLES3.functional.shaders.derivate.dfdy.fastest.fbo_msaa4.vec3_mediump
 dEQP-GLES3.functional.shaders.derivate.dfdy.fastest.fbo_msaa4.vec4_highp
 dEQP-GLES3.functional.shaders.derivate.dfdy.fastest.fbo_msaa4.vec4_mediump
 dEQP-GLES3.functional.shaders.derivate.dfdy.fbo_msaa2.float_highp
 dEQP-GLES3.functional.shaders.derivate.dfdy.fbo_msaa2.float_mediump
 dEQP-GLES3.functional.shaders.derivate.dfdy.fbo_msaa2.vec2_highp
 dEQP-GLES3.functional.shaders.derivate.dfdy.fbo_msaa2.vec2_mediump
 dEQP-GLES3.functional.shaders.derivate.dfdy.fbo_msaa2.vec3_highp
 dEQP-GLES3.functional.shaders.derivate.dfdy.fbo_msaa2.vec3_mediump
 dEQP-GLES3.functional.shaders.derivate.dfdy.fbo_msaa2.vec4_highp
 dEQP-GLES3.functional.shaders.derivate.dfdy.fbo_msaa2.vec4_mediump
 dEQP-GLES3.functional.shaders.derivate.dfdy.fbo_msaa4.float_highp
 dEQP-GLES3.functional.shaders.derivate.dfdy.fbo_msaa4.float_mediump
 dEQP-GLES3.functional.shaders.derivate.dfdy.fbo_msaa4.vec2_highp
 dEQP-GLES3.functional.shaders.derivate.dfdy.fbo_msaa4.vec2_mediump
 dEQP-GLES3.functional.shaders.derivate.dfdy.fbo_msaa4.vec3_highp
 dEQP-GLES3.functional.shaders.derivate.dfdy.fbo_msaa4.vec3_mediump
 dEQP-GLES3.functional.shaders.derivate.dfdy.fbo_msaa4.vec4_highp
 dEQP-GLES3.functional.shaders.derivate.dfdy.fbo_msaa4.vec4_mediump
 dEQP-GLES3.functional.shaders.derivate.dfdy.nicest.fbo_msaa4.float_highp
 dEQP-GLES3.functional.shaders.derivate.dfdy.nicest.fbo_msaa4.float_mediump
 dEQP-GLES3.functional.shaders.derivate.dfdy.nicest.fbo_msaa4.vec2_highp
 dEQP-GLES3.functional.shaders.derivate.dfdy.nicest.fbo_msaa4.vec2_mediump
 dEQP-GLES3.functional.shaders.derivate.dfdy.nicest.fbo_msaa4.vec3_highp
 dEQP-GLES3.functional.shaders.derivate.dfdy.nicest.fbo_msaa4.vec3_mediump
 dEQP-GLES3.functional.shaders.derivate.dfdy.nicest.fbo_msaa4.vec4_highp
 dEQP-GLES3.functional.shaders.derivate.dfdy.nicest.fbo_msaa4.vec4_mediump
 dEQP-GLES3.functional.shaders.derivate.dfdy.texture.msaa4.float_highp
 dEQP-GLES3.functional.shaders.derivate.dfdy.texture.msaa4.float_mediump
 dEQP-GLES3.functional.shaders.derivate.dfdy.texture.msaa4.vec2_highp
 dEQP-GLES3.functional.shaders.derivate.dfdy.texture.msaa4.vec2_mediump
 dEQP-GLES3.functional.shaders.derivate.dfdy.texture.msaa4.vec3_highp
 dEQP-GLES3.functional.shaders.derivate.dfdy.texture.msaa4.vec3_mediump
 dEQP-GLES3.functional.shaders.derivate.dfdy.texture.msaa4.vec4_highp
 dEQP-GLES3.functional.shaders.derivate.dfdy.texture.msaa4.vec4_mediump
 dEQP-GLES3.functional.shaders.derivate.fwidth.fastest.fbo_msaa4.float_highp
 dEQP-GLES3.functional.shaders.derivate.fwidth.fastest.fbo_msaa4.float_mediump
 dEQP-GLES3.functional.shaders.derivate.fwidth.fastest.fbo_msaa4.vec2_highp
 dEQP-GLES3.functional.shaders.derivate.fwidth.fastest.fbo_msaa4.vec2_mediump
 dEQP-GLES3.functional.shaders.derivate.fwidth.fastest.fbo_msaa4.vec3_highp
 dEQP-GLES3.functional.shaders.derivate.fwidth.fastest.fbo_msaa4.vec3_mediump
 dEQP-GLES3.functional.shaders.derivate.fwidth.fastest.fbo_msaa4.vec4_highp
 dEQP-GLES3.functional.shaders.derivate.fwidth.fastest.fbo_msaa4.vec4_mediump
 dEQP-GLES3.functional.shaders.derivate.fwidth.fbo_msaa2.float_highp
 dEQP-GLES3.functional.shaders.derivate.fwidth.fbo_msaa2.float_mediump
 dEQP-GLES3.functional.shaders.derivate.fwidth.fbo_msaa2.vec2_highp
 dEQP-GLES3.functional.shaders.derivate.fwidth.fbo_msaa2.vec2_mediump
 dEQP-GLES3.functional.shaders.derivate.fwidth.fbo_msaa2.vec3_highp
 dEQP-GLES3.functional.shaders.derivate.fwidth.fbo_msaa2.vec3_mediump
 dEQP-GLES3.functional.shaders.derivate.fwidth.fbo_msaa2.vec4_highp
 dEQP-GLES3.functional.shaders.derivate.fwidth.fbo_msaa2.vec4_mediump
 dEQP-GLES3.functional.shaders.derivate.fwidth.fbo_msaa4.float_highp
 dEQP-GLES3.functional.shaders.derivate.fwidth.fbo_msaa4.float_mediump
 dEQP-GLES3.functional.shaders.derivate.fwidth.fbo_msaa4.vec2_highp
 dEQP-GLES3.functional.shaders.derivate.fwidth.fbo_msaa4.vec2_mediump
 dEQP-GLES3.functional.shaders.derivate.fwidth.fbo_msaa4.vec3_highp
 dEQP-GLES3.functional.shaders.derivate.fwidth.fbo_msaa4.vec3_mediump
 dEQP-GLES3.functional.shaders.derivate.fwidth.fbo_msaa4.vec4_highp
 dEQP-GLES3.functional.shaders.derivate.fwidth.fbo_msaa4.vec4_mediump
 dEQP-GLES3.functional.shaders.derivate.fwidth.nicest.fbo_msaa4.float_highp
 dEQP-GLES3.functional.shaders.derivate.fwidth.nicest.fbo_msaa4.float_mediump
 dEQP-GLES3.functional.shaders.derivate.fwidth.nicest.fbo_msaa4.vec2_highp
 dEQP-GLES3.functional.shaders.derivate.fwidth.nicest.fbo_msaa4.vec2_mediump
 dEQP-GLES3.functional.shaders.derivate.fwidth.nicest.fbo_msaa4.vec3_highp
 dEQP-GLES3.functional.shaders.derivate.fwidth.nicest.fbo_msaa4.vec3_mediump
 dEQP-GLES3.functional.shaders.derivate.fwidth.nicest.fbo_msaa4.vec4_highp
 dEQP-GLES3.functional.shaders.derivate.fwidth.nicest.fbo_msaa4.vec4_mediump
 dEQP-GLES3.functional.shaders.derivate.fwidth.texture.msaa4.float_highp
 dEQP-GLES3.functional.shaders.derivate.fwidth.texture.msaa4.float_mediump
 dEQP-GLES3.functional.shaders.derivate.fwidth.texture.msaa4.vec2_highp
 dEQP-GLES3.functional.shaders.derivate.fwidth.texture.msaa4.vec2_mediump
 dEQP-GLES3.functional.shaders.derivate.fwidth.texture.msaa4.vec3_highp
 dEQP-GLES3.functional.shaders.derivate.fwidth.texture.msaa4.vec3_mediump
 dEQP-GLES3.functional.shaders.derivate.fwidth.texture.msaa4.vec4_highp
 dEQP-GLES3.functional.shaders.derivate.fwidth.texture.msaa4.vec4_mediump
 dEQP-GLES3.functional.state_query.integers.max_samples_getfloat
 dEQP-GLES3.functional.state_query.integers.max_samples_getinteger64
 dEQP-GLES3.functional.texture.filtering.3d.combinations.linear_linear_clamp_clamp_mirror
 dEQP-GLES3.functional.texture.filtering.3d.combinations.linear_linear_clamp_clamp_repeat
 dEQP-GLES3.functional.texture.filtering.3d.combinations.linear_linear_clamp_mirror_mirror
 dEQP-GLES3.functional.texture.filtering.3d.combinations.linear_linear_clamp_mirror_repeat
 dEQP-GLES3.functional.texture.filtering.3d.combinations.linear_linear_clamp_repeat_mirror
 dEQP-GLES3.functional.texture.filtering.3d.combinations.linear_linear_clamp_repeat_repeat
 dEQP-GLES3.functional.texture.filtering.3d.combinations.linear_linear_mirror_clamp_mirror
 dEQP-GLES3.functional.texture.filtering.3d.combinations.linear_linear_mirror_clamp_repeat
 dEQP-GLES3.functional.texture.filtering.3d.combinations.linear_linear_mirror_mirror_mirror
 dEQP-GLES3.functional.texture.filtering.3d.combinations.linear_linear_mirror_mirror_repeat
 dEQP-GLES3.functional.texture.filtering.3d.combinations.linear_linear_mirror_repeat_mirror
 dEQP-GLES3.functional.texture.filtering.3d.combinations.linear_linear_mirror_repeat_repeat
 dEQP-GLES3.functional.texture.filtering.3d.combinations.linear_linear_repeat_clamp_mirror
 dEQP-GLES3.functional.texture.filtering.3d.combinations.linear_linear_repeat_clamp_repeat
 dEQP-GLES3.functional.texture.filtering.3d.combinations.linear_linear_repeat_mirror_mirror
 dEQP-GLES3.functional.texture.filtering.3d.combinations.linear_linear_repeat_mirror_repeat
 dEQP-GLES3.functional.texture.filtering.3d.combinations.linear_linear_repeat_repeat_mirror
 dEQP-GLES3.functional.texture.filtering.3d.combinations.linear_linear_repeat_repeat_repeat
 dEQP-GLES3.functional.texture.filtering.3d.combinations.linear_mipmap_linear_linear_clamp_clamp_repeat
 dEQP-GLES3.functional.texture.filtering.3d.combinations.linear_mipmap_linear_linear_clamp_mirror_repeat
 dEQP-GLES3.functional.texture.filtering.3d.combinations.linear_mipmap_linear_linear_clamp_repeat_repeat
 dEQP-GLES3.functional.texture.filtering.3d.combinations.linear_mipmap_linear_linear_mirror_clamp_repeat
 dEQP-GLES3.functional.texture.filtering.3d.combinations.linear_mipmap_linear_linear_mirror_mirror_repeat
 dEQP-GLES3.functional.texture.filtering.3d.combinations.linear_mipmap_linear_linear_mirror_repeat_repeat
 dEQP-GLES3.functional.texture.filtering.3d.combinations.linear_mipmap_linear_linear_repeat_clamp_mirror
 dEQP-GLES3.functional.texture.filtering.3d.combinations.linear_mipmap_linear_linear_repeat_clamp_repeat
 dEQP-GLES3.functional.texture.filtering.3d.combinations.linear_mipmap_linear_linear_repeat_mirror_mirror
 dEQP-GLES3.functional.texture.filtering.3d.combinations.linear_mipmap_linear_linear_repeat_mirror_repeat
 dEQP-GLES3.functional.texture.filtering.3d.combinations.linear_mipmap_linear_linear_repeat_repeat_mirror
 dEQP-GLES3.functional.texture.filtering.3d.combinations.linear_mipmap_linear_linear_repeat_repeat_repeat
 dEQP-GLES3.functional.texture.filtering.3d.combinations.linear_mipmap_linear_nearest_repeat_clamp_mirror
 dEQP-GLES3.functional.texture.filtering.3d.combinations.linear_mipmap_linear_nearest_repeat_clamp_repeat
 dEQP-GLES3.functional.texture.filtering.3d.combinations.linear_mipmap_linear_nearest_repeat_mirror_mirror
 dEQP-GLES3.functional.texture.filtering.3d.combinations.linear_mipmap_linear_nearest_repeat_mirror_repeat
 dEQP-GLES3.functional.texture.filtering.3d.combinations.linear_mipmap_linear_nearest_repeat_repeat_mirror
 dEQP-GLES3.functional.texture.filtering.3d.combinations.linear_mipmap_linear_nearest_repeat_repeat_repeat
 dEQP-GLES3.functional.texture.filtering.3d.combinations.linear_mipmap_nearest_linear_clamp_clamp_repeat
 dEQP-GLES3.functional.texture.filtering.3d.combinations.linear_mipmap_nearest_linear_clamp_mirror_repeat
 dEQP-GLES3.functional.texture.filtering.3d.combinations.linear_mipmap_nearest_linear_clamp_repeat_repeat
 dEQP-GLES3.functional.texture.filtering.3d.combinations.linear_mipmap_nearest_linear_mirror_clamp_repeat
 dEQP-GLES3.functional.texture.filtering.3d.combinations.linear_mipmap_nearest_linear_mirror_mirror_repeat
 dEQP-GLES3.functional.texture.filtering.3d.combinations.linear_mipmap_nearest_linear_mirror_repeat_repeat
 dEQP-GLES3.functional.texture.filtering.3d.combinations.linear_mipmap_nearest_linear_repeat_clamp_mirror
 dEQP-GLES3.functional.texture.filtering.3d.combinations.linear_mipmap_nearest_linear_repeat_clamp_repeat
 dEQP-GLES3.functional.texture.filtering.3d.combinations.linear_mipmap_nearest_linear_repeat_mirror_mirror
 dEQP-GLES3.functional.texture.filtering.3d.combinations.linear_mipmap_nearest_linear_repeat_mirror_repeat
 dEQP-GLES3.functional.texture.filtering.3d.combinations.linear_mipmap_nearest_linear_repeat_repeat_mirror
 dEQP-GLES3.functional.texture.filtering.3d.combinations.linear_mipmap_nearest_linear_repeat_repeat_repeat
 dEQP-GLES3.functional.texture.filtering.3d.combinations.linear_mipmap_nearest_nearest_repeat_clamp_mirror
 dEQP-GLES3.functional.texture.filtering.3d.combinations.linear_mipmap_nearest_nearest_repeat_clamp_repeat
 dEQP-GLES3.functional.texture.filtering.3d.combinations.linear_mipmap_nearest_nearest_repeat_mirror_mirror
 dEQP-GLES3.functional.texture.filtering.3d.combinations.linear_mipmap_nearest_nearest_repeat_mirror_repeat
 dEQP-GLES3.functional.texture.filtering.3d.combinations.linear_mipmap_nearest_nearest_repeat_repeat_mirror
 dEQP-GLES3.functional.texture.filtering.3d.combinations.linear_mipmap_nearest_nearest_repeat_repeat_repeat
 dEQP-GLES3.functional.texture.filtering.3d.combinations.linear_nearest_clamp_clamp_mirror
 dEQP-GLES3.functional.texture.filtering.3d.combinations.linear_nearest_clamp_clamp_repeat
 dEQP-GLES3.functional.texture.filtering.3d.combinations.linear_nearest_clamp_mirror_mirror
 dEQP-GLES3.functional.texture.filtering.3d.combinations.linear_nearest_clamp_mirror_repeat
 dEQP-GLES3.functional.texture.filtering.3d.combinations.linear_nearest_clamp_repeat_mirror
 dEQP-GLES3.functional.texture.filtering.3d.combinations.linear_nearest_clamp_repeat_repeat
 dEQP-GLES3.functional.texture.filtering.3d.combinations.linear_nearest_mirror_clamp_mirror
 dEQP-GLES3.functional.texture.filtering.3d.combinations.linear_nearest_mirror_clamp_repeat
 dEQP-GLES3.functional.texture.filtering.3d.combinations.linear_nearest_mirror_mirror_mirror
 dEQP-GLES3.functional.texture.filtering.3d.combinations.linear_nearest_mirror_mirror_repeat
 dEQP-GLES3.functional.texture.filtering.3d.combinations.linear_nearest_mirror_repeat_mirror
 dEQP-GLES3.functional.texture.filtering.3d.combinations.linear_nearest_mirror_repeat_repeat
 dEQP-GLES3.functional.texture.filtering.3d.combinations.linear_nearest_repeat_clamp_mirror
 dEQP-GLES3.functional.texture.filtering.3d.combinations.linear_nearest_repeat_clamp_repeat
 dEQP-GLES3.functional.texture.filtering.3d.combinations.linear_nearest_repeat_mirror_mirror
 dEQP-GLES3.functional.texture.filtering.3d.combinations.linear_nearest_repeat_mirror_repeat
 dEQP-GLES3.functional.texture.filtering.3d.combinations.linear_nearest_repeat_repeat_mirror
 dEQP-GLES3.functional.texture.filtering.3d.combinations.linear_nearest_repeat_repeat_repeat
 dEQP-GLES3.functional.texture.filtering.3d.combinations.nearest_linear_clamp_clamp_repeat
 dEQP-GLES3.functional.texture.filtering.3d.combinations.nearest_linear_clamp_mirror_repeat
 dEQP-GLES3.functional.texture.filtering.3d.combinations.nearest_linear_clamp_repeat_repeat
 dEQP-GLES3.functional.texture.filtering.3d.combinations.nearest_linear_mirror_clamp_repeat
 dEQP-GLES3.functional.texture.filtering.3d.combinations.nearest_linear_mirror_mirror_repeat
 dEQP-GLES3.functional.texture.filtering.3d.combinations.nearest_linear_mirror_repeat_repeat
 dEQP-GLES3.functional.texture.filtering.3d.combinations.nearest_linear_repeat_clamp_mirror
 dEQP-GLES3.functional.texture.filtering.3d.combinations.nearest_linear_repeat_clamp_repeat
 dEQP-GLES3.functional.texture.filtering.3d.combinations.nearest_linear_repeat_mirror_mirror
 dEQP-GLES3.functional.texture.filtering.3d.combinations.nearest_linear_repeat_mirror_repeat
 dEQP-GLES3.functional.texture.filtering.3d.combinations.nearest_linear_repeat_repeat_mirror
 dEQP-GLES3.functional.texture.filtering.3d.combinations.nearest_linear_repeat_repeat_repeat
 dEQP-GLES3.functional.texture.filtering.3d.combinations.nearest_mipmap_linear_linear_clamp_clamp_repeat
 dEQP-GLES3.functional.texture.filtering.3d.combinations.nearest_mipmap_linear_linear_clamp_mirror_repeat
 dEQP-GLES3.functional.texture.filtering.3d.combinations.nearest_mipmap_linear_linear_clamp_repeat_repeat
 dEQP-GLES3.functional.texture.filtering.3d.combinations.nearest_mipmap_linear_linear_mirror_clamp_repeat
 dEQP-GLES3.functional.texture.filtering.3d.combinations.nearest_mipmap_linear_linear_mirror_mirror_repeat
 dEQP-GLES3.functional.texture.filtering.3d.combinations.nearest_mipmap_linear_linear_mirror_repeat_repeat
 dEQP-GLES3.functional.texture.filtering.3d.combinations.nearest_mipmap_linear_linear_repeat_clamp_mirror
 dEQP-GLES3.functional.texture.filtering.3d.combinations.nearest_mipmap_linear_linear_repeat_clamp_repeat
 dEQP-GLES3.functional.texture.filtering.3d.combinations.nearest_mipmap_linear_linear_repeat_mirror_mirror
 dEQP-GLES3.functional.texture.filtering.3d.combinations.nearest_mipmap_linear_linear_repeat_mirror_repeat
 dEQP-GLES3.functional.texture.filtering.3d.combinations.nearest_mipmap_linear_linear_repeat_repeat_mirror
 dEQP-GLES3.functional.texture.filtering.3d.combinations.nearest_mipmap_linear_linear_repeat_repeat_repeat
 dEQP-GLES3.functional.texture.filtering.3d.combinations.nearest_mipmap_nearest_linear_clamp_clamp_repeat
 dEQP-GLES3.functional.texture.filtering.3d.combinations.nearest_mipmap_nearest_linear_clamp_mirror_repeat
 dEQP-GLES3.functional.texture.filtering.3d.combinations.nearest_mipmap_nearest_linear_clamp_repeat_repeat
 dEQP-GLES3.functional.texture.filtering.3d.combinations.nearest_mipmap_nearest_linear_mirror_clamp_repeat
 dEQP-GLES3.functional.texture.filtering.3d.combinations.nearest_mipmap_nearest_linear_mirror_mirror_repeat
 dEQP-GLES3.functional.texture.filtering.3d.combinations.nearest_mipmap_nearest_linear_mirror_repeat_repeat
 dEQP-GLES3.functional.texture.filtering.3d.combinations.nearest_mipmap_nearest_linear_repeat_clamp_mirror
 dEQP-GLES3.functional.texture.filtering.3d.combinations.nearest_mipmap_nearest_linear_repeat_clamp_repeat
 dEQP-GLES3.functional.texture.filtering.3d.combinations.nearest_mipmap_nearest_linear_repeat_mirror_mirror
 dEQP-GLES3.functional.texture.filtering.3d.combinations.nearest_mipmap_nearest_linear_repeat_mirror_repeat
 dEQP-GLES3.functional.texture.filtering.3d.combinations.nearest_mipmap_nearest_linear_repeat_repeat_mirror
 dEQP-GLES3.functional.texture.filtering.3d.combinations.nearest_mipmap_nearest_linear_repeat_repeat_repeat
 dEQP-GLES3.functional.texture.filtering.3d.formats.r11f_g11f_b10f_linear
 dEQP-GLES3.functional.texture.filtering.3d.formats.r11f_g11f_b10f_linear_mipmap_linear
 dEQP-GLES3.functional.texture.filtering.3d.formats.r11f_g11f_b10f_linear_mipmap_nearest
 dEQP-GLES3.functional.texture.filtering.3d.formats.r11f_g11f_b10f_nearest_mipmap_linear
 dEQP-GLES3.functional.texture.filtering.3d.formats.r11f_g11f_b10f_nearest_mipmap_nearest
 dEQP-GLES3.functional.texture.filtering.3d.formats.rgb10_a2_linear
 dEQP-GLES3.functional.texture.filtering.3d.formats.rgb10_a2_linear_mipmap_linear
 dEQP-GLES3.functional.texture.filtering.3d.formats.rgb10_a2_linear_mipmap_nearest
 dEQP-GLES3.functional.texture.filtering.3d.formats.rgb10_a2_nearest_mipmap_linear
 dEQP-GLES3.functional.texture.filtering.3d.formats.rgb10_a2_nearest_mipmap_nearest
 dEQP-GLES3.functional.texture.filtering.3d.formats.rgb565_linear
 dEQP-GLES3.functional.texture.filtering.3d.formats.rgb565_linear_mipmap_linear
 dEQP-GLES3.functional.texture.filtering.3d.formats.rgb565_linear_mipmap_nearest
 dEQP-GLES3.functional.texture.filtering.3d.formats.rgb565_nearest_mipmap_linear
 dEQP-GLES3.functional.texture.filtering.3d.formats.rgb565_nearest_mipmap_nearest
 dEQP-GLES3.functional.texture.filtering.3d.formats.rgb5_a1_linear
 dEQP-GLES3.functional.texture.filtering.3d.formats.rgb5_a1_linear_mipmap_linear
 dEQP-GLES3.functional.texture.filtering.3d.formats.rgb5_a1_linear_mipmap_nearest
 dEQP-GLES3.functional.texture.filtering.3d.formats.rgb5_a1_nearest_mipmap_linear
 dEQP-GLES3.functional.texture.filtering.3d.formats.rgb5_a1_nearest_mipmap_nearest
 dEQP-GLES3.functional.texture.filtering.3d.formats.rgb9_e5_linear
 dEQP-GLES3.functional.texture.filtering.3d.formats.rgb9_e5_linear_mipmap_linear
 dEQP-GLES3.functional.texture.filtering.3d.formats.rgb9_e5_linear_mipmap_nearest
 dEQP-GLES3.functional.texture.filtering.3d.formats.rgb9_e5_nearest_mipmap_linear
 dEQP-GLES3.functional.texture.filtering.3d.formats.rgb9_e5_nearest_mipmap_nearest
 dEQP-GLES3.functional.texture.filtering.3d.formats.rgba16f_linear
 dEQP-GLES3.functional.texture.filtering.3d.formats.rgba16f_linear_mipmap_linear
 dEQP-GLES3.functional.texture.filtering.3d.formats.rgba16f_linear_mipmap_nearest
 dEQP-GLES3.functional.texture.filtering.3d.formats.rgba16f_nearest_mipmap_linear
 dEQP-GLES3.functional.texture.filtering.3d.formats.rgba16f_nearest_mipmap_nearest
 dEQP-GLES3.functional.texture.filtering.3d.formats.rgba4_linear
 dEQP-GLES3.functional.texture.filtering.3d.formats.rgba4_linear_mipmap_linear
 dEQP-GLES3.functional.texture.filtering.3d.formats.rgba4_linear_mipmap_nearest
 dEQP-GLES3.functional.texture.filtering.3d.formats.rgba4_nearest_mipmap_linear
 dEQP-GLES3.functional.texture.filtering.3d.formats.rgba4_nearest_mipmap_nearest
 dEQP-GLES3.functional.texture.filtering.3d.formats.rgba8_linear
 dEQP-GLES3.functional.texture.filtering.3d.formats.rgba8_linear_mipmap_linear
 dEQP-GLES3.functional.texture.filtering.3d.formats.rgba8_linear_mipmap_nearest
 dEQP-GLES3.functional.texture.filtering.3d.formats.rgba8_nearest_mipmap_linear
 dEQP-GLES3.functional.texture.filtering.3d.formats.rgba8_nearest_mipmap_nearest
 dEQP-GLES3.functional.texture.filtering.3d.formats.rgba8_snorm_linear
 dEQP-GLES3.functional.texture.filtering.3d.formats.rgba8_snorm_linear_mipmap_linear
 dEQP-GLES3.functional.texture.filtering.3d.formats.rgba8_snorm_linear_mipmap_nearest
 dEQP-GLES3.functional.texture.filtering.3d.formats.rgba8_snorm_nearest_mipmap_linear
 dEQP-GLES3.functional.texture.filtering.3d.formats.rgba8_snorm_nearest_mipmap_nearest
 dEQP-GLES3.functional.texture.filtering.3d.formats.srgb8_alpha8_linear
 dEQP-GLES3.functional.texture.filtering.3d.formats.srgb8_alpha8_linear_mipmap_linear
 dEQP-GLES3.functional.texture.filtering.3d.formats.srgb8_alpha8_linear_mipmap_nearest
 dEQP-GLES3.functional.texture.filtering.3d.formats.srgb8_alpha8_nearest_mipmap_linear
 dEQP-GLES3.functional.texture.filtering.3d.formats.srgb8_alpha8_nearest_mipmap_nearest
 dEQP-GLES3.functional.texture.filtering.3d.formats.srgb_r8_linear
 dEQP-GLES3.functional.texture.filtering.3d.formats.srgb_r8_linear_mipmap_linear
 dEQP-GLES3.functional.texture.filtering.3d.formats.srgb_r8_linear_mipmap_nearest
 dEQP-GLES3.functional.texture.filtering.3d.formats.srgb_r8_nearest_mipmap_linear
 dEQP-GLES3.functional.texture.filtering.3d.formats.srgb_r8_nearest_mipmap_nearest
 dEQP-GLES3.functional.texture.filtering.3d.sizes.128x32x64_linear
 dEQP-GLES3.functional.texture.filtering.3d.sizes.128x32x64_linear_mipmap_linear
 dEQP-GLES3.functional.texture.filtering.3d.sizes.128x32x64_linear_mipmap_nearest
 dEQP-GLES3.functional.texture.filtering.3d.sizes.128x32x64_nearest_mipmap_linear
 dEQP-GLES3.functional.texture.filtering.3d.sizes.128x32x64_nearest_mipmap_nearest
 dEQP-GLES3.functional.texture.filtering.3d.sizes.63x63x63_linear
 dEQP-GLES3.functional.texture.filtering.3d.sizes.63x63x63_linear_mipmap_linear
 dEQP-GLES3.functional.texture.filtering.3d.sizes.63x63x63_linear_mipmap_nearest
 dEQP-GLES3.functional.texture.filtering.3d.sizes.63x63x63_nearest_mipmap_linear
 dEQP-GLES3.functional.texture.filtering.3d.sizes.63x63x63_nearest_mipmap_nearest
 dEQP-GLES3.functional.texture.vertex.3d.filtering.linear_linear_clamp
 dEQP-GLES3.functional.texture.vertex.3d.filtering.linear_linear_mirror
 dEQP-GLES3.functional.texture.vertex.3d.filtering.linear_linear_repeat
 dEQP-GLES3.functional.texture.vertex.3d.filtering.linear_mipmap_linear_linear_clamp
 dEQP-GLES3.functional.texture.vertex.3d.filtering.linear_mipmap_linear_linear_mirror
 dEQP-GLES3.functional.texture.vertex.3d.filtering.linear_mipmap_linear_linear_repeat
 dEQP-GLES3.functional.texture.vertex.3d.filtering.linear_mipmap_linear_nearest_clamp
 dEQP-GLES3.functional.texture.vertex.3d.filtering.linear_mipmap_linear_nearest_mirror
 dEQP-GLES3.functional.texture.vertex.3d.filtering.linear_mipmap_linear_nearest_repeat
 dEQP-GLES3.functional.texture.vertex.3d.filtering.linear_mipmap_nearest_linear_repeat
 dEQP-GLES3.functional.texture.vertex.3d.filtering.linear_nearest_clamp
 dEQP-GLES3.functional.texture.vertex.3d.filtering.linear_nearest_mirror
 dEQP-GLES3.functional.texture.vertex.3d.filtering.linear_nearest_repeat
 dEQP-GLES3.functional.texture.vertex.3d.filtering.nearest_linear_repeat
 dEQP-GLES3.functional.texture.vertex.3d.filtering.nearest_mipmap_linear_linear_repeat
 dEQP-GLES3.functional.texture.vertex.3d.filtering.nearest_mipmap_nearest_linear_repeat
 dEQP-GLES3.functional.texture.vertex.3d.wrap.clamp_clamp_clamp
 dEQP-GLES3.functional.texture.vertex.3d.wrap.clamp_clamp_mirror
 dEQP-GLES3.functional.texture.vertex.3d.wrap.clamp_clamp_repeat
 dEQP-GLES3.functional.texture.vertex.3d.wrap.clamp_mirror_mirror
 dEQP-GLES3.functional.texture.vertex.3d.wrap.clamp_mirror_repeat
 dEQP-GLES3.functional.texture.vertex.3d.wrap.clamp_repeat_mirror
 dEQP-GLES3.functional.texture.vertex.3d.wrap.clamp_repeat_repeat
 dEQP-GLES3.functional.texture.vertex.3d.wrap.mirror_clamp_clamp
 dEQP-GLES3.functional.texture.vertex.3d.wrap.mirror_clamp_mirror
 dEQP-GLES3.functional.texture.vertex.3d.wrap.mirror_clamp_repeat
 dEQP-GLES3.functional.texture.vertex.3d.wrap.mirror_mirror_mirror
 dEQP-GLES3.functional.texture.vertex.3d.wrap.mirror_mirror_repeat
 dEQP-GLES3.functional.texture.vertex.3d.wrap.mirror_repeat_mirror
 dEQP-GLES3.functional.texture.vertex.3d.wrap.mirror_repeat_repeat
 dEQP-GLES3.functional.texture.vertex.3d.wrap.repeat_clamp_clamp
 dEQP-GLES3.functional.texture.vertex.3d.wrap.repeat_clamp_mirror
 dEQP-GLES3.functional.texture.vertex.3d.wrap.repeat_clamp_repeat
 dEQP-GLES3.functional.texture.vertex.3d.wrap.repeat_mirror_clamp
 dEQP-GLES3.functional.texture.vertex.3d.wrap.repeat_mirror_mirror
 dEQP-GLES3.functional.texture.vertex.3d.wrap.repeat_mirror_repeat
 dEQP-GLES3.functional.texture.vertex.3d.wrap.repeat_repeat_clamp
 dEQP-GLES3.functional.texture.vertex.3d.wrap.repeat_repeat_mirror
 dEQP-GLES3.functional.texture.vertex.3d.wrap.repeat_repeat_repeat
 dEQP-GLES3.functional.texture.wrap.astc_8x8.repeat_repeat_linear_divisible
 dEQP-GLES3.functional.texture.wrap.astc_8x8.repeat_repeat_linear_not_divisible
 dEQP-GLES3.functional.texture.wrap.astc_8x8_srgb.repeat_repeat_linear_divisible
 dEQP-GLES3.functional.texture.wrap.astc_8x8_srgb.repeat_repeat_linear_not_divisible
 dEQP-GLES3.functional.vertex_arrays.single_attribute.normalize.int2_10_10_10.components4_quads1
 dEQP-GLES3.functional.vertex_arrays.single_attribute.normalize.int2_10_10_10.components4_quads256

									
										62

.gitlab-ci/meson-build.sh
									
										Executable file
									
@@ -0,0 +1,62 @@
#!/bin/bash
set -e
set -o xtrace
# We need to control the version of llvm-config we're using, so we'll
# generate a native file to do so. This requires meson >=0.49
if test -n "$LLVM_VERSION"; then
    LLVM_CONFIG="llvm-config-${LLVM_VERSION}"
    echo -e "[binaries]\nllvm-config = '`which $LLVM_CONFIG`'" > native.file
    $LLVM_CONFIG --version
else
    rm -f native.file
    touch native.file
fi
rm -rf _build
meson _build --native-file=native.file \
      ${CROSS+--cross /cross_file-$CROSS.txt} \
      -D prefix=`pwd`/install \
      -D libdir=lib \
      -D buildtype=${BUILDTYPE:-debug} \
      -D build-tests=true \
      -D libunwind=${UNWIND} \
      ${DRI_LOADERS} \
      -D dri-drivers=${DRI_DRIVERS:-[]} \
      ${GALLIUM_ST} \
      -D gallium-drivers=${GALLIUM_DRIVERS:-[]} \
      -D vulkan-drivers=${VULKAN_DRIVERS:-[]} \
      -D I-love-half-baked-turnips=true \
      ${EXTRA_OPTION}
cd _build
meson configure
ninja -j4
LC_ALL=C.UTF-8 ninja test
ninja install
cd ..
if test -n "$MESON_SHADERDB"; then
    ./.gitlab-ci/run-shader-db.sh;
fi
# Delete 2MB of includes from artifacts.
rm -rf install/include
# Strip the drivers in the artifacts to cut 80% of the artifacts size.
if [ -n "$CROSS" ]; then
    STRIP=`sed -n -E "s/strip\s*=\s*'(.*)'/\1/p" /cross_file-$CROSS.txt`
    if [ -z "$STRIP" ]; then
        echo "Failed to find strip command in cross file"
        exit 1
    fi
else
    STRIP="strip"
fi
find install -name \*.so -exec $STRIP {} \;
# Test runs don't pull down the git tree, so put the dEQP helper
# script and associated bits there.
mkdir -p artifacts/
cp -Rp .gitlab-ci/deqp* artifacts/
# cp -Rp src/freedreno/ci/expected* artifacts/
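
Two steps of .gitlab-ci/meson-build.sh can be tried in isolation: writing the Meson native file that pins llvm-config, and reading the strip binary out of a cross file. The following is a minimal sketch, not part of the commit. The function names `write_native_file` and `strip_from_cross_file` are hypothetical, and the sketch assumes a cross file that carries a `strip = '...'` entry, as the script's sed expression expects.

```shell
# Sketch only: these helper names are illustrative and do not exist in Mesa.

# Write a Meson native file ($2) that pins llvm-config to the path in $1.
# An empty path produces an empty native file, like the script's fallback
# branch (rm -f native.file; touch native.file).
write_native_file() {
    if [ -n "$1" ]; then
        printf "[binaries]\nllvm-config = '%s'\n" "$1" > "$2"
    else
        : > "$2"
    fi
}

# Print the value of the "strip = '...'" entry of the cross file in $1.
# [[:space:]] is the POSIX spelling of the \s used in the script; \s is an
# extension that not every sed implementation accepts.
# Prints nothing when the entry is missing, which the caller can test with -z.
strip_from_cross_file() {
    sed -n -E "s/strip[[:space:]]*=[[:space:]]*'(.*)'/\1/p" "$1"
}
```

For example, `STRIP=$(strip_from_cross_file /cross_file-$CROSS.txt)` followed by a `-z "$STRIP"` check would mirror the error handling in the script.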

									
										17

.gitlab-ci/run-shader-db.sh
									
										Executable file
									
@@ -0,0 +1,17 @@
set -e
set -v
ARTIFACTSDIR=`pwd`/shader-db
mkdir -p $ARTIFACTSDIR
export DRM_SHIM_DEBUG=true
LIBDIR=`pwd`/install/lib
export LD_LIBRARY_PATH=$LIBDIR
cd /usr/local/shader-db
for driver in freedreno v3d; do
    env LD_PRELOAD=$LIBDIR/lib${driver}_noop_drm_shim.so \
        ./run -j 4 ./shaders \
            > $ARTIFACTSDIR/${driver}-shader-db.txt
done
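
The loop in .gitlab-ci/run-shader-db.sh preloads one no-op DRM shim per driver. On glibc, an `LD_PRELOAD` object that cannot be found is reported as an error and then ignored, so a missing shim would not by itself stop the run. The sketch below shows one way to fail early instead. It is not part of the commit, and the helper name `shim_for_driver` is hypothetical.

```shell
# Sketch only: shim_for_driver is an illustrative helper, not a Mesa script.

# Print the path of the no-op DRM shim for driver $2 under library dir $1.
# Returns non-zero (with a message on stderr) when the shim is not installed.
shim_for_driver() {
    shim="$1/lib${2}_noop_drm_shim.so"
    if [ ! -f "$shim" ]; then
        echo "missing DRM shim: $shim" >&2
        return 1
    fi
    printf '%s\n' "$shim"
}
```

Inside the loop this could be used as `shim=$(shim_for_driver "$LIBDIR" "$driver")` before the `env LD_PRELOAD=...` line. With `set -e` in effect, a missing shim then aborts the script.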

8

.mailmap


@@ -145,6 +145,11 @@ Edward O'Callaghan <funfunctor@folklore1984.net> <eocallaghan@alterapraxis.com>
 Emeric Grange <emeric.grange@gmail.com> Emeric <emeric.grange@gmail.com>
 Emil Velikov <emil.l.velikov@gmail.com> <emil.velikov@collabora.com>
 Emil Velikov <emil.l.velikov@gmail.com> <emil.veliko@collabora.com>
 Emil Velikov <emil.l.velikov@gmail.com> <emil.velikov@collabora.co.uk>
 Emil Velikov <emil.l.velikov@gmail.com> <emil.veliikov@collabora.com>
 Emil Velikov <emil.l.velikov@gmail.com> <emil.velikov@gmail.com>
 Emil Velikov <emil.l.velikov@gmail.com> <emmil.velikov@collabora.com>
 Eric Anholt <eric@anholt.net> Eric Anholt <anholt@FreeBSD.org>
@@ -260,6 +265,9 @@ Kristian Høgsberg <krh@bitplanet.net> <krh@hinata.boston.redhat.com>
 Kristian Høgsberg <krh@bitplanet.net> <krh@sasori.boston.redhat.com>
 Kristian Høgsberg <krh@bitplanet.net> <krh@temari.boston.redhat.com>
 Kristian Høgsberg <krh@bitplanet.net> <kristian.h.kristensen@intel.com>
 Kristian Høgsberg <krh@bitplanet.net> <hoegsberg@chromium.org>
 Kristian Høgsberg <krh@bitplanet.net> <hoegsberg@google.com>
 Kristian Høgsberg <krh@bitplanet.net> <hoegsberg@gmail.com>
 Krzesimir Nowak <qdlacz@gmail.com> <krzesimir@kinvolk.io>

									
										679

.travis.yml
									
				@@ -1,671 +1,62 @@

				language: c

				sudo: false

				dist: trusty

				os: osx

				cache:

				  apt: true

				  ccache: true

				env:

				  global:

				    - XORG_RELEASES=http://xorg.freedesktop.org/releases/individual

				    - XCB_RELEASES=http://xcb.freedesktop.org/dist

				    - WAYLAND_RELEASES=http://wayland.freedesktop.org/releases

				    - XORGMACROS_VERSION=util-macros-1.19.0

				    - GLPROTO_VERSION=glproto-1.4.17

				    - DRI2PROTO_VERSION=dri2proto-2.8

				    - LIBPCIACCESS_VERSION=libpciaccess-0.13.4

				    - LIBDRM_VERSION=libdrm-2.4.74

				    - XCBPROTO_VERSION=xcb-proto-1.13

				    - RANDRPROTO_VERSION=randrproto-1.3.0

				    - LIBXRANDR_VERSION=libXrandr-1.3.0

				    - LIBXCB_VERSION=libxcb-1.13

				    - LIBXSHMFENCE_VERSION=libxshmfence-1.2

				    - LIBVDPAU_VERSION=libvdpau-1.1

				    - LIBVA_VERSION=libva-1.7.0

				    - LIBWAYLAND_VERSION=wayland-1.15.0

				    - WAYLAND_PROTOCOLS_VERSION=wayland-protocols-1.8

				    - PKG_CONFIG_PATH=$HOME/prefix/lib/pkgconfig:$HOME/prefix/share/pkgconfig

				    - LD_LIBRARY_PATH="$HOME/prefix/lib:$LD_LIBRARY_PATH"

				    - PATH="$HOME/prefix/bin:$PATH"

				    - PKG_CONFIG_PATH=""

				matrix:

				  include:

				    - env:

				        - LABEL="meson Vulkan"

				        - BUILD=meson

				        - MESON_OPTIONS="-Ddri-drivers=[] -Dgallium-drivers=[]"

				        - LLVM_VERSION=5.0

				        - LLVM_CONFIG="llvm-config-${LLVM_VERSION}"

				      addons:

				        apt:

				          sources:

				            - llvm-toolchain-trusty-5.0

				          packages:

				            # LLVM packaging is broken and misses these dependencies

				            - libedit-dev

				            # From sources above

				            - llvm-5.0-dev

				            # Common

				            - xz-utils

				            - libexpat1-dev

				            - libelf-dev

				            - python3-pip

				      - BUILD=meson

				    - env:

				        - LABEL="meson loaders/classic DRI"

				        - BUILD=meson

				        - MESON_OPTIONS="-Dvulkan-drivers=[] -Dgallium-drivers=[]"

				      addons:

				        apt:

				          packages:

				            - xz-utils

				            - x11proto-xf86vidmode-dev

				            - libexpat1-dev

				            - libx11-xcb-dev

				            - libxdamage-dev

				            - libxfixes-dev

				            - python3-pip

				    - env:

				        - LABEL="make loaders/classic DRI"

				        - BUILD=make

				        - MAKEFLAGS="-j4"

				        - MAKE_CHECK_COMMAND="make check"

				        - DRI_LOADERS="--enable-glx --enable-gbm --enable-egl --with-platforms=x11,drm,surfaceless,wayland --enable-osmesa"

				        - DRI_DRIVERS="i915,i965,radeon,r200,swrast,nouveau"

				        - GALLIUM_ST="--enable-dri --disable-opencl --disable-xa --disable-nine --disable-xvmc --disable-vdpau --disable-va --disable-omx-bellagio --disable-gallium-osmesa"

				        - GALLIUM_DRIVERS=""

				        - VULKAN_DRIVERS=""

				        - LIBUNWIND_FLAGS="--disable-libunwind"

				      addons:

				        apt:

				          packages:

				            - xz-utils

				            - x11proto-xf86vidmode-dev

				            - libexpat1-dev

				            - libx11-xcb-dev

				            - libxdamage-dev

				            - libxfixes-dev

				    - env:

				        # NOTE: Building SWR is 2x (yes two) times slower than all the other

				        # gallium drivers combined.

				        # Start this early so that it doesn't hunder the run time.

				        - LABEL="make Gallium Drivers SWR"

				        - BUILD=make

				        - MAKEFLAGS="-j4"

				        - MAKE_CHECK_COMMAND="true"

				        - LLVM_VERSION=5.0

				        - LLVM_CONFIG="llvm-config-${LLVM_VERSION}"

				        - OVERRIDE_CC="gcc-4.8"

				        - OVERRIDE_CXX="g++-4.8"

				        - DRI_LOADERS="--disable-glx --disable-gbm --disable-egl"

				        - DRI_DRIVERS=""

				        - GALLIUM_ST="--enable-dri --disable-opencl --disable-xa --disable-nine --disable-xvmc --disable-vdpau --disable-va --disable-omx-bellagio --disable-gallium-osmesa"

				        - GALLIUM_DRIVERS="swr"

				        - VULKAN_DRIVERS=""

				        - LIBUNWIND_FLAGS="--enable-libunwind"

				      addons:

				        apt:

				          sources:

				            - llvm-toolchain-trusty-5.0

				          packages:

				            # LLVM packaging is broken and misses these dependencies

				            - libedit-dev

				            # From sources above

				            - llvm-5.0-dev

				            # Common

				            - xz-utils

				            - x11proto-xf86vidmode-dev

				            - libexpat1-dev

				            - libx11-xcb-dev

				            - libelf-dev

				            - libunwind8-dev

				    - env:

				        - LABEL="make Gallium Drivers RadeonSI"

				        - BUILD=make

				        - MAKEFLAGS="-j4"

				        - MAKE_CHECK_COMMAND="true"

				        - LLVM_VERSION=5.0

				        - LLVM_CONFIG="llvm-config-${LLVM_VERSION}"

				        - DRI_LOADERS="--disable-glx --disable-gbm --disable-egl"

				        - DRI_DRIVERS=""

				        - GALLIUM_ST="--enable-dri --disable-opencl --disable-xa --disable-nine --disable-xvmc --disable-vdpau --disable-va --disable-omx-bellagio --disable-gallium-osmesa"

				        - GALLIUM_DRIVERS="radeonsi"

				        - VULKAN_DRIVERS=""

				        - LIBUNWIND_FLAGS="--enable-libunwind"

				      addons:

				        apt:

				          sources:

				            - llvm-toolchain-trusty-5.0

				          packages:

				            # LLVM packaging is broken and misses these dependencies

				            - libedit-dev

				            # From sources above

				            - llvm-5.0-dev

				            # Common

				            - xz-utils

				            - x11proto-xf86vidmode-dev

				            - libexpat1-dev

				            - libx11-xcb-dev

				            - libelf-dev

				            - libunwind8-dev

				    - env:

				        - LABEL="make Gallium Drivers Other"

				        - BUILD=make

				        - MAKEFLAGS="-j4"

				        - MAKE_CHECK_COMMAND="true"

				        - LLVM_VERSION=3.9

				        - LLVM_CONFIG="llvm-config-${LLVM_VERSION}"

				        # New binutils linker is required for llvm-3.9

				        - OVERRIDE_PATH=/usr/lib/binutils-2.26/bin

				        - DRI_LOADERS="--disable-glx --disable-gbm --disable-egl"

				        - DRI_DRIVERS=""

				        - GALLIUM_ST="--enable-dri --disable-opencl --disable-xa --disable-nine --disable-xvmc --disable-vdpau --disable-va --disable-omx-bellagio --disable-gallium-osmesa"

				        - GALLIUM_DRIVERS="i915,nouveau,pl111,r300,r600,freedreno,svga,swrast,v3d,vc4,virgl,etnaviv,imx"

				        - VULKAN_DRIVERS=""

				        - LIBUNWIND_FLAGS="--enable-libunwind"

				      addons:

				        apt:

				          sources:

				            - llvm-toolchain-trusty-3.9

				          packages:

				            - binutils-2.26

				            # LLVM packaging is broken and misses these dependencies

				            - libedit-dev

				            # From sources above

				            - llvm-3.9-dev

				            # Common

				            - xz-utils

				            - x11proto-xf86vidmode-dev

				            - libexpat1-dev

				            - libx11-xcb-dev

				            - libelf-dev

				            - libunwind8-dev

				    - env:

				        # NOTE: Analogous to SWR above, building Clover is quite slow.

				        - LABEL="make Gallium ST Clover LLVM-3.9"

				        - BUILD=make

				        - MAKEFLAGS="-j4"

				        - MAKE_CHECK_COMMAND="true"

				        - LLVM_VERSION=3.9

				        - LLVM_CONFIG="llvm-config-${LLVM_VERSION}"

				        - OVERRIDE_CC=gcc-4.7

				        - OVERRIDE_CXX=g++-4.7

				        # New binutils linker is required for llvm-3.9

				        - OVERRIDE_PATH=/usr/lib/binutils-2.26/bin

				        - DRI_LOADERS="--disable-glx --disable-gbm --disable-egl"

				        - DRI_DRIVERS=""

				        - GALLIUM_ST="--disable-dri --enable-opencl --enable-opencl-icd --enable-llvm --disable-xa --disable-nine --disable-xvmc --disable-vdpau --disable-va --disable-omx-bellagio --disable-gallium-osmesa"

				        - GALLIUM_DRIVERS="r600"

				        - VULKAN_DRIVERS=""

				        - LIBUNWIND_FLAGS="--enable-libunwind"

				      addons:

				        apt:

				          sources:

				            - llvm-toolchain-trusty-3.9

				          packages:

				            - binutils-2.26

				            - libclc-dev

				            # LLVM packaging is broken and misses these dependencies

				            - libedit-dev

				            - g++-4.7

				            # From sources above

				            - llvm-3.9-dev

				            - clang-3.9

				            - libclang-3.9-dev

				            # Common

				            - xz-utils

				            - x11proto-xf86vidmode-dev

				            - libexpat1-dev

				            - libx11-xcb-dev

				            - libelf-dev

				            - libunwind8-dev

				    - env:

				        # NOTE: Analogous to SWR above, building Clover is quite slow.

				        - LABEL="make Gallium ST Clover LLVM-4.0"

				        - BUILD=make

				        - MAKEFLAGS="-j4"

				        - MAKE_CHECK_COMMAND="true"

				        - LLVM_VERSION=4.0

				        - LLVM_CONFIG="llvm-config-${LLVM_VERSION}"

				        - OVERRIDE_CC=gcc-4.8

				        - OVERRIDE_CXX=g++-4.8

				        - DRI_LOADERS="--disable-glx --disable-gbm --disable-egl"

				        - DRI_DRIVERS=""

				        - GALLIUM_ST="--disable-dri --enable-opencl --enable-opencl-icd --enable-llvm --disable-xa --disable-nine --disable-xvmc --disable-vdpau --disable-va --disable-omx-bellagio --disable-gallium-osmesa"

				        - GALLIUM_DRIVERS="r600"

				        - VULKAN_DRIVERS=""

				        - LIBUNWIND_FLAGS="--enable-libunwind"

				      addons:

				        apt:

				          sources:

				            - llvm-toolchain-trusty-4.0

				          packages:

				            - libclc-dev

				            # LLVM packaging is broken and misses these dependencies

				            - libedit-dev

				            - g++-4.8

				            # From sources above

				            - llvm-4.0-dev

				            - clang-4.0

				            - libclang-4.0-dev

				            # Common

				            - xz-utils

				            - x11proto-xf86vidmode-dev

				            - libexpat1-dev

				            - libx11-xcb-dev

				            - libelf-dev

				            - libunwind8-dev

				    - env:

				        # NOTE: Analogous to SWR above, building Clover is quite slow.

				        - LABEL="make Gallium ST Clover LLVM-5.0"

				        - BUILD=make

				        - MAKEFLAGS="-j4"

				        - MAKE_CHECK_COMMAND="true"

				        - LLVM_VERSION=5.0

				        - LLVM_CONFIG="llvm-config-${LLVM_VERSION}"

				        - OVERRIDE_CC=gcc-4.8

				        - OVERRIDE_CXX=g++-4.8

				        - DRI_LOADERS="--disable-glx --disable-gbm --disable-egl"

				        - DRI_DRIVERS=""

				        - GALLIUM_ST="--disable-dri --enable-opencl --enable-opencl-icd --enable-llvm --disable-xa --disable-nine --disable-xvmc --disable-vdpau --disable-va --disable-omx-bellagio --disable-gallium-osmesa"

				        - GALLIUM_DRIVERS="r600,radeonsi"

				        - VULKAN_DRIVERS=""

				        - LIBUNWIND_FLAGS="--enable-libunwind"

				      addons:

				        apt:

				          sources:

				            - llvm-toolchain-trusty-5.0

				          packages:

				            - libclc-dev

				            # LLVM packaging is broken and misses these dependencies

				            - libedit-dev

				            - g++-4.8

				            # From sources above

				            - llvm-5.0-dev

				            - clang-5.0

				            - libclang-5.0-dev

				            # Common

				            - xz-utils

				            - x11proto-xf86vidmode-dev

				            - libexpat1-dev

				            - libx11-xcb-dev

				            - libelf-dev

				            - libunwind8-dev

				    - env:

				        # NOTE: Analogous to SWR above, building Clover is quite slow.

				        - LABEL="make Gallium ST Clover LLVM-6.0"

				        - BUILD=make

				        - MAKEFLAGS="-j4"

				        - MAKE_CHECK_COMMAND="true"

				        - LLVM_VERSION=6.0

				        - LLVM_CONFIG="llvm-config-${LLVM_VERSION}"

				        - DRI_LOADERS="--disable-glx --disable-gbm --disable-egl"

				        - DRI_DRIVERS=""

				        - GALLIUM_ST="--disable-dri --enable-opencl --enable-opencl-icd --enable-llvm --disable-xa --disable-nine --disable-xvmc --disable-vdpau --disable-va --disable-omx-bellagio --disable-gallium-osmesa"

				        - GALLIUM_DRIVERS="r600,radeonsi"

				        - VULKAN_DRIVERS=""

				        - LIBUNWIND_FLAGS="--enable-libunwind"

				      addons:

				        apt:

				          sources:

				            - llvm-toolchain-trusty-6.0

				            # llvm-6 depends on gcc-4.9 which is not in main repo

				            - ubuntu-toolchain-r-test

				          packages:

				            - libclc-dev

				            # From sources above

				            - llvm-6.0-dev

				            - clang-6.0

				            - libclang-6.0-dev

				            # Common

				            - xz-utils

				            - x11proto-xf86vidmode-dev

				            - libexpat1-dev

				            - libx11-xcb-dev

				            - libelf-dev

				            - libunwind8-dev

				    - env:

				        - LABEL="make Gallium ST Other"

				        - BUILD=make

				        - MAKEFLAGS="-j4"

				        - MAKE_CHECK_COMMAND="true"

				        - LLVM_VERSION=3.3

				        - LLVM_CONFIG="llvm-config-${LLVM_VERSION}"

				        - DRI_LOADERS="--disable-glx --disable-gbm --disable-egl"

				        - DRI_DRIVERS=""

				        - GALLIUM_ST="--enable-dri --disable-opencl --enable-xa --enable-nine --enable-xvmc --enable-vdpau --enable-va --enable-omx-bellagio --enable-gallium-osmesa"

				        # We need swrast for osmesa and nine.

				        # i915 most likely doesn't work with most ST.

				        # Regardless - we're doing a quick build test here.

				        - GALLIUM_DRIVERS="i915,swrast"

				        - VULKAN_DRIVERS=""

				        - LIBUNWIND_FLAGS="--enable-libunwind"

				      addons:

				        apt:

				          packages:

				            # We actually want to test against llvm-3.3

				            - llvm-3.3-dev

				            # Nine requires gcc 4.6... which is the one we have right ?

				            - libxvmc-dev

				            # Build locally, for now.

				            #- libvdpau-dev

				            #- libva-dev

				            - libomxil-bellagio-dev

				            # LLVM packaging is broken and misses these dependencies

				            - libedit-dev

				            # Common

				            - xz-utils

				            - x11proto-xf86vidmode-dev

				            - libexpat1-dev

				            - libx11-xcb-dev

				            - libelf-dev

				            - libunwind8-dev

				    - env:

				        - LABEL="make Vulkan"

				        - BUILD=make

				        - MAKEFLAGS="-j4"

				        - MAKE_CHECK_COMMAND="make -C src/gtest check && make -C src/intel check"

				        - LLVM_VERSION=5.0

				        - LLVM_CONFIG="llvm-config-${LLVM_VERSION}"

				        - DRI_LOADERS="--disable-glx --disable-gbm --disable-egl --with-platforms=x11,wayland"

				        - DRI_DRIVERS=""

				        - GALLIUM_ST="--enable-dri --enable-dri3 --disable-opencl --disable-xa --disable-nine --disable-xvmc --disable-vdpau --disable-va --disable-omx-bellagio --disable-gallium-osmesa"

				        - GALLIUM_DRIVERS=""

				        - VULKAN_DRIVERS="intel,radeon"

				        - LIBUNWIND_FLAGS="--disable-libunwind"

				      addons:

				        apt:

				          sources:

				            - llvm-toolchain-trusty-5.0

				          packages:

				            # LLVM packaging is broken and misses these dependencies

				            - libedit-dev

				            # From sources above

				            - llvm-5.0-dev

				            # Common

				            - xz-utils

				            - x11proto-xf86vidmode-dev

				            - libexpat1-dev

				            - libx11-xcb-dev

				            - libelf-dev

				    - env:

				        - LABEL="scons"

				        - BUILD=scons

				        - SCONSFLAGS="-j4"

				        # Explicitly disable.

				        - SCONS_TARGET="llvm=0"

				        # Keep it symmetrical to the make build.

				        - SCONS_CHECK_COMMAND="scons llvm=0 check"

				      addons:

				        apt:

				          packages:

				            # Common

				            - xz-utils

				            - x11proto-xf86vidmode-dev

				            - libexpat1-dev

				            - libx11-xcb-dev

				            - libelf-dev

				    - env:

				        - LABEL="scons LLVM"

				        - BUILD=scons

				        - SCONSFLAGS="-j4"

				        - SCONS_TARGET="llvm=1"

				        # Keep it symmetrical to the make build.

				        - SCONS_CHECK_COMMAND="scons llvm=1 check"

				        - LLVM_VERSION=3.3

				        - LLVM_CONFIG="llvm-config-${LLVM_VERSION}"

				      addons:

				        apt:

				          packages:

				            # LLVM packaging is broken and misses these dependencies

				            - libedit-dev

				            - llvm-3.3-dev

				            # Common

				            - xz-utils

				            - x11proto-xf86vidmode-dev

				            - libexpat1-dev

				            - libx11-xcb-dev

				            - libelf-dev

				    - env:

				        - LABEL="scons SWR"

				        - BUILD=scons

				        - SCONSFLAGS="-j4"

				        - SCONS_TARGET="swr=1"

				        - LLVM_VERSION=5.0

				        - LLVM_CONFIG="llvm-config-${LLVM_VERSION}"

				        # Keep it symmetrical to the make build. There's no actual SWR, yet.

				        - SCONS_CHECK_COMMAND="true"

				        - OVERRIDE_CC="gcc-4.8"

				        - OVERRIDE_CXX="g++-4.8"

				      addons:

				        apt:

				          sources:

				            - llvm-toolchain-trusty-5.0

				          packages:

				            # LLVM packaging is broken and misses these dependencies

				            - libedit-dev

				            # From sources above

				            - llvm-5.0-dev

				            # Common

				            - xz-utils

				            - x11proto-xf86vidmode-dev

				            - libexpat1-dev

				            - libx11-xcb-dev

				            - libelf-dev

				    - env:

				        - LABEL="macOS make"

				        - BUILD=make

				        - MAKEFLAGS="-j4"

				        - MAKE_CHECK_COMMAND="make check"

				        - DRI_LOADERS="--with-platforms=x11 --disable-egl"

				      os: osx

				    - env:

				        - LABEL="macOS meson"

				        - BUILD=meson

				        - MESON_OPTIONS="-Degl=false"

				      os: osx

				      - BUILD=scons

				before_install:

				  - |

				    if [[ "$TRAVIS_OS_NAME" == "osx" ]]; then

				      HOMEBREW_NO_AUTO_UPDATE=1 brew install python3 ninja expat gettext

				      # Set PATH for homebrew pip3 installs

				      PATH="$HOME/Library/Python/3.6/bin:${PATH}"

				      # Set PKG_CONFIG_PATH for keg-only expat

				      PKG_CONFIG_PATH="/usr/local/opt/expat/lib/pkgconfig:${PKG_CONFIG_PATH}"

				      # Set PATH for keg-only gettext

				      PATH="/usr/local/opt/gettext/bin:${PATH}"

				      # Install xquartz for prereqs ...

				      XQUARTZ_VERSION="2.7.11"

				      wget -nv https://dl.bintray.com/xquartz/downloads/XQuartz-${XQUARTZ_VERSION}.dmg

				      hdiutil attach XQuartz-${XQUARTZ_VERSION}.dmg

				      sudo installer -pkg /Volumes/XQuartz-${XQUARTZ_VERSION}/XQuartz.pkg -target /

				      hdiutil detach /Volumes/XQuartz-${XQUARTZ_VERSION}

				      # ... and set paths

				      PATH="/opt/X11/bin:${PATH}"

				      PKG_CONFIG_PATH="/opt/X11/share/pkgconfig:/opt/X11/lib/pkgconfig:${PKG_CONFIG_PATH}"

				      ACLOCAL="aclocal -I /opt/X11/share/aclocal -I /usr/local/share/aclocal"

				  - HOMEBREW_NO_AUTO_UPDATE=1 brew install expat gettext

				  - if test "x$BUILD" = xmeson; then

				      HOMEBREW_NO_AUTO_UPDATE=1 brew install python3 ninja;

				    fi

				  - if test "x$BUILD" = xscons; then

				      HOMEBREW_NO_AUTO_UPDATE=1 brew install python2 scons;

				    fi

				  # Set PATH for homebrew pip3 installs

				  - PATH="$HOME/Library/Python/3.6/bin:${PATH}"

				  # Set PKG_CONFIG_PATH for keg-only expat

				  - PKG_CONFIG_PATH="/usr/local/opt/expat/lib/pkgconfig:${PKG_CONFIG_PATH}"

				  # Set PATH for keg-only gettext

				  - PATH="/usr/local/opt/gettext/bin:${PATH}"

				  # Install xquartz for prereqs ...

				  - XQUARTZ_VERSION="2.7.11"

				  - wget -nv https://dl.bintray.com/xquartz/downloads/XQuartz-${XQUARTZ_VERSION}.dmg

				  - hdiutil attach XQuartz-${XQUARTZ_VERSION}.dmg

				  - sudo installer -pkg /Volumes/XQuartz-${XQUARTZ_VERSION}/XQuartz.pkg -target /

				  - hdiutil detach /Volumes/XQuartz-${XQUARTZ_VERSION}

				  # ... and set paths

				  - PKG_CONFIG_PATH="/opt/X11/share/pkgconfig:/opt/X11/lib/pkgconfig:${PKG_CONFIG_PATH}"

				install:

				  - pip2 install --user mako

				  # Install a more modern meson from pip, since the version in the

				  # ubuntu repos is often quite old. Avoid >=0.45.0 as it needs python

				  # 3.5+

				  - if test "x$BUILD" = xmeson; then

				      pip3 install --user "meson<0.45.0";

				      pip3 install --user meson;

				      pip3 install --user mako;

				    fi

				  # Install a more modern scons from pip.

				  - if test "x$BUILD" = xscons; then

				      pip2 install --user "scons>=2.4";

				    fi

				  # Since libdrm gets updated in configure.ac regularly, try to pick up the

				  # latest version from there.

				  - for line in `grep "^LIBDRM.*_REQUIRED=" configure.ac`; do

				      old_ver=`echo $LIBDRM_VERSION | sed 's/libdrm-//'`;

				      new_ver=`echo $line | sed 's/.*REQUIRED=//'`;

				      if `echo "$old_ver,$new_ver" | tr ',' '\n' | sort -Vc 2> /dev/null`; then

				        export LIBDRM_VERSION="libdrm-$new_ver";

				      fi;

				    done

				  # Install dependencies where we require specific versions (or where

				  # disallowed by Travis CI's package whitelisting).

				  - |

				    if [[ "$TRAVIS_OS_NAME" == "linux" ]]; then

				      wget $XORG_RELEASES/util/$XORGMACROS_VERSION.tar.bz2

				      tar -jxvf $XORGMACROS_VERSION.tar.bz2

				      (cd $XORGMACROS_VERSION && ./configure --prefix=$HOME/prefix && make install)

				      wget $XORG_RELEASES/proto/$GLPROTO_VERSION.tar.bz2

				      tar -jxvf $GLPROTO_VERSION.tar.bz2

				      (cd $GLPROTO_VERSION && ./configure --prefix=$HOME/prefix && make install)

				      wget $XORG_RELEASES/proto/$DRI2PROTO_VERSION.tar.bz2

				      tar -jxvf $DRI2PROTO_VERSION.tar.bz2

				      (cd $DRI2PROTO_VERSION && ./configure --prefix=$HOME/prefix && make install)

				      wget $XCB_RELEASES/$XCBPROTO_VERSION.tar.bz2

				      tar -jxvf $XCBPROTO_VERSION.tar.bz2

				      (cd $XCBPROTO_VERSION && ./configure --prefix=$HOME/prefix && make install)

				      wget $XCB_RELEASES/$LIBXCB_VERSION.tar.bz2

				      tar -jxvf $LIBXCB_VERSION.tar.bz2

				      (cd $LIBXCB_VERSION && ./configure --prefix=$HOME/prefix && make install)

				      wget $XORG_RELEASES/lib/$LIBPCIACCESS_VERSION.tar.bz2

				      tar -jxvf $LIBPCIACCESS_VERSION.tar.bz2

				      (cd $LIBPCIACCESS_VERSION && ./configure --prefix=$HOME/prefix && make install)

				      wget http://dri.freedesktop.org/libdrm/$LIBDRM_VERSION.tar.bz2

				      tar -jxvf $LIBDRM_VERSION.tar.bz2

				      (cd $LIBDRM_VERSION && ./configure --prefix=$HOME/prefix --enable-vc4 --enable-freedreno --enable-etnaviv-experimental-api && make install)

				      wget $XORG_RELEASES/proto/$RANDRPROTO_VERSION.tar.bz2

				      tar -jxvf $RANDRPROTO_VERSION.tar.bz2

				      (cd $RANDRPROTO_VERSION && ./configure --prefix=$HOME/prefix && make install)

				      wget $XORG_RELEASES/lib/$LIBXRANDR_VERSION.tar.bz2

				      tar -jxvf $LIBXRANDR_VERSION.tar.bz2

				      (cd $LIBXRANDR_VERSION && ./configure --prefix=$HOME/prefix && make install)

				      wget $XORG_RELEASES/lib/$LIBXSHMFENCE_VERSION.tar.bz2

				      tar -jxvf $LIBXSHMFENCE_VERSION.tar.bz2

				      (cd $LIBXSHMFENCE_VERSION && ./configure --prefix=$HOME/prefix && make install)

				      wget http://people.freedesktop.org/~aplattner/vdpau/$LIBVDPAU_VERSION.tar.bz2

				      tar -jxvf $LIBVDPAU_VERSION.tar.bz2

				      (cd $LIBVDPAU_VERSION && ./configure --prefix=$HOME/prefix && make install)

				      wget http://www.freedesktop.org/software/vaapi/releases/libva/$LIBVA_VERSION.tar.bz2

				      tar -jxvf $LIBVA_VERSION.tar.bz2

				      (cd $LIBVA_VERSION && ./configure --prefix=$HOME/prefix --disable-wayland --disable-dummy-driver && make install)

				      wget $WAYLAND_RELEASES/$LIBWAYLAND_VERSION.tar.xz

				      tar -axvf $LIBWAYLAND_VERSION.tar.xz

				      (cd $LIBWAYLAND_VERSION && ./configure --prefix=$HOME/prefix --enable-libraries --without-host-scanner --disable-documentation --disable-dtd-validation && make install)

				      wget $WAYLAND_RELEASES/$WAYLAND_PROTOCOLS_VERSION.tar.xz

				      tar -axvf $WAYLAND_PROTOCOLS_VERSION.tar.xz

				      (cd $WAYLAND_PROTOCOLS_VERSION && ./configure --prefix=$HOME/prefix && make install)

				      # Meson requires ninja >= 1.6, but trusty has 1.3.x

				      wget https://github.com/ninja-build/ninja/releases/download/v1.6.0/ninja-linux.zip

				      unzip ninja-linux.zip

				      mv ninja $HOME/prefix/bin/

				      # Generate this header since one is missing on the Travis instance

				      mkdir -p linux

				      printf "%s\n" \

				           "#ifndef _LINUX_MEMFD_H" \

				           "#define _LINUX_MEMFD_H" \

				           "" \

				           "#define MFD_CLOEXEC             0x0001U" \

				           "#define MFD_ALLOW_SEALING       0x0002U" \

				           "" \

				           "#endif /* _LINUX_MEMFD_H */" > linux/memfd.h

				      # Generate this header, including the missing SYS_memfd_create

				      # macro, which is not provided by the header in the Travis

				      # instance

				      mkdir -p sys

				      printf "%s\n" \

				           "#ifndef _SYSCALL_H" \

				           "#define _SYSCALL_H      1" \

				           "" \

				           "#include <asm/unistd.h>" \

				           "" \

				           "#ifndef _LIBC" \

				           "# include <bits/syscall.h>" \

				           "#endif" \

				           "" \

				           "#ifndef __NR_memfd_create" \

				           "# define __NR_memfd_create 319 /* Taken from <asm/unistd_64.h> */" \

				           "#endif" \

				           "" \

				           "#ifndef SYS_memfd_create" \

				           "# define SYS_memfd_create __NR_memfd_create" \

				           "#endif" \

				           "" \

				           "#endif" > sys/syscall.h

				      pip2 install --user mako;

				    fi

				script:

				  - if test "x$BUILD" = xmake; then

				      test -n "$OVERRIDE_CC" && export CC="$OVERRIDE_CC";

				      test -n "$OVERRIDE_CXX" && export CXX="$OVERRIDE_CXX";

				      test -n "$OVERRIDE_PATH" && export PATH="$OVERRIDE_PATH:$PATH";

				      export CFLAGS="$CFLAGS -isystem`pwd`";

				      mkdir build &&

				      cd build &&

				      ../autogen.sh --enable-debug

				        $LIBUNWIND_FLAGS

				        $DRI_LOADERS

				        --with-dri-drivers=$DRI_DRIVERS

				        $GALLIUM_ST

				        --with-gallium-drivers=$GALLIUM_DRIVERS

				        --with-vulkan-drivers=$VULKAN_DRIVERS

				        --disable-llvm-shared-libs

				        &&

				      make && eval $MAKE_CHECK_COMMAND;

				  - if test "x$BUILD" = xmeson; then

				      meson _build -Dbuild-tests=true;

				      ninja -C _build;

				      ninja -C _build test;

				    fi

				  - if test "x$BUILD" = xscons; then

				      test -n "$OVERRIDE_CC" && export CC="$OVERRIDE_CC";

				      test -n "$OVERRIDE_CXX" && export CXX="$OVERRIDE_CXX";

				      scons $SCONS_TARGET && eval $SCONS_CHECK_COMMAND;

				    fi

				  - |

				    if test "x$BUILD" = xmeson; then

				      # Travis CI has moved to LLVM 5.0, and meson is detecting

				      # automatically the available version in /usr/local/bin based on

				      # the PATH env variable order preference.

				      #

				      # As for 0.44.x, Meson cannot receive the path to the

				      # llvm-config binary as a configuration parameter. See

				      # https://github.com/mesonbuild/meson/issues/2887 and

				      # https://github.com/dcbaker/meson/commit/7c8b6ee3fa42f43c9ac7dcacc61a77eca3f1bcef

				      #

				      # We want to use the custom (APT) installed version. Therefore,

				      # let's make Meson find our wanted version sooner than the one

				      # at /usr/local/bin

				      #

				      # Once this is corrected, we would still need a patch similar

				      # to:

				      # https://lists.freedesktop.org/archives/mesa-dev/2017-December/180217.html

				      test -f /usr/bin/$LLVM_CONFIG && ln -s /usr/bin/$LLVM_CONFIG $HOME/prefix/bin/llvm-config

				      export CFLAGS="$CFLAGS -isystem`pwd`"

				      meson _build $MESON_OPTIONS

				      ninja -C _build

				      scons;

				      scons check;

				    fi
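Review note: the install step in the .travis.yml hunk above compares the pinned libdrm version with the one required by configure.ac by piping both through `sort -Vc`. A minimal sketch of that comparison in isolation follows; `ver_le` and `newer_of` are hypothetical names, the version numbers are sample inputs, and the sketch assumes a `sort` that supports `-V` (GNU coreutils does).

```shell
#!/bin/sh
# ver_le A B: succeeds when A sorts at or before B in version order.
# Uses the same sort options as the snippet in the diff:
# -V selects version ordering, -c only checks whether input is sorted.
ver_le() {
    printf '%s\n%s\n' "$1" "$2" | sort -Vc 2>/dev/null
}

# newer_of A B: print whichever of the two versions is newer.
newer_of() {
    if ver_le "$1" "$2"; then
        printf '%s\n' "$2"
    else
        printf '%s\n' "$1"
    fi
}

# Sample values only; 2.4.100 is a made-up requirement.
newer_of 2.4.74 2.4.100
```

A plain string or numeric comparison would get multi-digit components wrong, which is why the check relies on version ordering.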

									

Android.common.mk
									
												
				@@ -32,12 +32,12 @@ LOCAL_C_INCLUDES += \

				MESA_VERSION := $(shell cat $(MESA_TOP)/VERSION)

				LOCAL_CFLAGS += \

					-Wno-error \

					-Werror=incompatible-pointer-types \

					-Wno-unused-parameter \

					-Wno-pointer-arith \

					-Wno-missing-field-initializers \

					-Wno-initializer-overrides \

					-Wno-mismatched-tags \

					-DVERSION=\"$(MESA_VERSION)\" \

					-DPACKAGE_VERSION=\"$(MESA_VERSION)\" \

					-DPACKAGE_BUGREPORT=\"https://bugs.freedesktop.org/enter_bug.cgi?product=Mesa\"

				@@ -76,6 +76,8 @@ LOCAL_CFLAGS += \

					-DMAJOR_IN_SYSMACROS \

					-DVK_USE_PLATFORM_ANDROID_KHR \

					-fvisibility=hidden \

					-fno-math-errno \

					-fno-trapping-math \

					-Wno-sign-compare

				LOCAL_CPPFLAGS += \

				@@ -89,12 +91,21 @@ LOCAL_CPPFLAGS += \

				LOCAL_CONLYFLAGS += \

					-std=c99

				ifeq ($(strip $(MESA_ENABLE_ASM)),true)

				# c11 timespec_get is part of bionic as well

				# https://android-review.googlesource.com/c/718518

				# This means releases from P and earlier won't need this

				ifeq ($(filter 5 6 7 8 9, $(MESA_ANDROID_MAJOR_VERSION)),)

				LOCAL_CFLAGS += -DHAVE_TIMESPEC_GET

				endif

				# Android's libc began supporting shm in Oreo

				ifeq ($(shell test $(PLATFORM_SDK_VERSION) -ge 26 && echo true),true)

				LOCAL_CFLAGS += -DHAVE_SYS_SHM_H

				endif

				ifeq ($(TARGET_ARCH),x86)

				LOCAL_CFLAGS += \

					-DUSE_X86_ASM

				endif

				endif

				ifeq ($(ARCH_ARM_HAVE_NEON),true)

				LOCAL_CFLAGS_arm += -DUSE_ARM_ASM
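Review note: the `HAVE_SYS_SHM_H` check in the Android.common.mk hunk above runs a shell test from make and compares its output with the string `true`. A minimal sketch of the shell side of that idiom follows; `sdk_at_least` is a hypothetical helper, and 25 and 28 are sample SDK levels.

```shell
#!/bin/sh
# sdk_at_least VERSION MINIMUM: print "true" when VERSION >= MINIMUM and
# print nothing otherwise. Make's ifeq then compares that text with "true".
sdk_at_least() {
    if test "$1" -ge "$2"; then
        echo true
    fi
}

# 26 is the SDK level the diff associates with Oreo.
for sdk in 25 26 28; do
    echo "SDK $sdk: HAVE_SYS_SHM_H enabled? [$(sdk_at_least "$sdk" 26)]"
done
```

Because make only sees the captured text, an SDK level below the minimum yields an empty string and the `ifeq` branch is skipped.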

									

Android.mk
									
												
				@@ -24,7 +24,7 @@

				# BOARD_GPU_DRIVERS should be defined.  The valid values are

				#

				#   classic drivers: i915 i965

				#   gallium drivers: swrast freedreno i915g nouveau pl111 r300g r600g radeonsi vc4 virgl vmwgfx etnaviv imx

				#   gallium drivers: swrast freedreno i915g nouveau kmsro r300g r600g radeonsi vc4 virgl vmwgfx etnaviv iris lima

				#

				# The main target is libGLES_mesa.  For each classic driver enabled, a DRI

				# module will also be built.  DRI modules will be loaded by libGLES_mesa.

				@@ -52,7 +52,7 @@ gallium_drivers := \

					freedreno.HAVE_GALLIUM_FREEDRENO \

					i915g.HAVE_GALLIUM_I915 \

					nouveau.HAVE_GALLIUM_NOUVEAU \

					pl111.HAVE_GALLIUM_PL111 \

					kmsro.HAVE_GALLIUM_KMSRO \

					r300g.HAVE_GALLIUM_R300 \

					r600g.HAVE_GALLIUM_R600 \

					radeonsi.HAVE_GALLIUM_RADEONSI \

				@@ -60,7 +60,8 @@ gallium_drivers := \

					vc4.HAVE_GALLIUM_VC4 \

					virgl.HAVE_GALLIUM_VIRGL \

					etnaviv.HAVE_GALLIUM_ETNAVIV \

					imx.HAVE_GALLIUM_IMX

					iris.HAVE_GALLIUM_IRIS \

					lima.HAVE_GALLIUM_LIMA

				ifeq ($(BOARD_GPU_DRIVERS),all)

				MESA_BUILD_CLASSIC := $(filter HAVE_%, $(subst ., , $(classic_drivers)))

				@@ -82,13 +83,6 @@ endif

				$(foreach d, $(MESA_BUILD_CLASSIC) $(MESA_BUILD_GALLIUM), $(eval $(d) := true))

				# host and target must be the same arch to generate matypes.h

				ifeq ($(TARGET_ARCH),$(HOST_ARCH))

				MESA_ENABLE_ASM := true

				else

				MESA_ENABLE_ASM := false

				endif

				ifneq ($(filter true, $(HAVE_GALLIUM_RADEONSI)),)

				MESA_ENABLE_LLVM := true

				endif

				@@ -97,18 +91,19 @@ define mesa-build-with-llvm

				  $(if $(filter $(MESA_ANDROID_MAJOR_VERSION), 4 5), \

				    $(warning Unsupported LLVM version in Android $(MESA_ANDROID_MAJOR_VERSION)),) \

				  $(if $(filter 6,$(MESA_ANDROID_MAJOR_VERSION)), \

				    $(eval LOCAL_CFLAGS += -DHAVE_LLVM=0x0307 -DMESA_LLVM_VERSION_PATCH=0)) \

				    $(eval LOCAL_CFLAGS += -DHAVE_LLVM=0x0307 -DMESA_LLVM_VERSION_STRING=\"3.7\")) \

				  $(if $(filter 7,$(MESA_ANDROID_MAJOR_VERSION)), \

				    $(eval LOCAL_CFLAGS += -DHAVE_LLVM=0x0308 -DMESA_LLVM_VERSION_PATCH=0)) \

				    $(eval LOCAL_CFLAGS += -DHAVE_LLVM=0x0308 -DMESA_LLVM_VERSION_STRING=\"3.8\")) \

				  $(if $(filter 8,$(MESA_ANDROID_MAJOR_VERSION)), \

				    $(eval LOCAL_CFLAGS += -DHAVE_LLVM=0x0309 -DMESA_LLVM_VERSION_PATCH=0)) \

				    $(eval LOCAL_CFLAGS += -DHAVE_LLVM=0x0309 -DMESA_LLVM_VERSION_STRING=\"3.9\")) \

				  $(if $(filter P,$(MESA_ANDROID_MAJOR_VERSION)), \

				    $(eval LOCAL_CFLAGS += -DHAVE_LLVM=0x0309 -DMESA_LLVM_VERSION_PATCH=0)) \

				    $(eval LOCAL_CFLAGS += -DHAVE_LLVM=0x0309 -DMESA_LLVM_VERSION_STRING=\"3.9\")) \

				  $(eval LOCAL_SHARED_LIBRARIES += libLLVM)

				endef

				# add subdirectories

				SUBDIRS := \

					src/freedreno \

					src/gbm \

					src/loader \

					src/mapi \

				@@ -120,7 +115,8 @@ SUBDIRS := \

					src/broadcom \

					src/intel \

					src/mesa/drivers/dri \

					src/vulkan

					src/vulkan \

					src/panfrost \

				INC_DIRS := $(call all-named-subdir-makefiles,$(SUBDIRS))

				INC_DIRS += $(call all-named-subdir-makefiles,src/gallium)
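
The `mesa-build-with-llvm` hunk above pairs each `HAVE_LLVM` hex value with a version string (`0x0307` with "3.7", `0x0308` with "3.8", `0x0309` with "3.9"). That pairing suggests the major version sits in the high byte and the minor version in the low byte. The following is a minimal Python sketch of that reading; both function names are hypothetical, and the encoding is inferred from these three pairs only, not taken from Mesa itself.

```python
def encode_llvm_version(major, minor):
    """Pack a version as (major << 8) | minor, e.g. 3.7 -> 0x0307.

    Assumption: this layout is inferred from the HAVE_LLVM values in the hunk.
    """
    return (major << 8) | minor


def llvm_version_string(have_llvm):
    """Turn a packed value back into a "major.minor" string."""
    return "{}.{}".format(have_llvm >> 8, have_llvm & 0xFF)


if __name__ == "__main__":
    for packed in (0x0307, 0x0308, 0x0309):
        print(hex(packed), "->", llvm_version_string(packed))
```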

									
										6

CleanSpec.mk
									
												
				@@ -10,7 +10,7 @@ $(call add-clean-step, rm -rf $(PRODUCT_OUT)/*/STATIC_LIBRARIES/libmesa_*_interm

				$(call add-clean-step, rm -rf $(PRODUCT_OUT)/*/SHARED_LIBRARIES/i9?5_dri_intermediates)

				$(call add-clean-step, rm -rf $(PRODUCT_OUT)/*/SHARED_LIBRARIES/libglapi_intermediates)

				$(call add-clean-step, rm -rf $(PRODUCT_OUT)/*/SHARED_LIBRARIES/libGLES_mesa_intermediates)

				$(call add-clean-step, rm -rf $(HOST_OUT_release)/*/EXECUTABLES/mesa_*_intermediates)

				$(call add-clean-step, rm -rf $(HOST_OUT_release)/*/EXECUTABLES/glsl_compiler_intermediates)

				$(call add-clean-step, rm -rf $(HOST_OUT_release)/*/STATIC_LIBRARIES/libmesa_*_intermediates)

				$(call add-clean-step, rm -rf $(HOST_OUT)/*/EXECUTABLES/mesa_*_intermediates)

				$(call add-clean-step, rm -rf $(HOST_OUT)/*/EXECUTABLES/glsl_compiler_intermediates)

				$(call add-clean-step, rm -rf $(HOST_OUT)/*/STATIC_LIBRARIES/libmesa_*_intermediates)

				$(call add-clean-step, rm -rf $(PRODUCT_OUT)/*/SHARED_LIBRARIES/*_dri_intermediates)

									
										92

Makefile.am
									
											
				@@ -1,92 +0,0 @@

				# Copyright © 2012 Intel Corporation

				#

				# Permission is hereby granted, free of charge, to any person obtaining a

				# copy of this software and associated documentation files (the "Software"),

				# to deal in the Software without restriction, including without limitation

				# the rights to use, copy, modify, merge, publish, distribute, sublicense,

				# and/or sell copies of the Software, and to permit persons to whom the

				# Software is furnished to do so, subject to the following conditions:

				#

				# The above copyright notice and this permission notice (including the next

				# paragraph) shall be included in all copies or substantial portions of the

				# Software.

				#

				# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR

				# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,

				# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL

				# THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER

				# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING

				# FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS

				# IN THE SOFTWARE.

				SUBDIRS = src

				AM_DISTCHECK_CONFIGURE_FLAGS = \

					--enable-dri \

					--enable-dri3 \

					--enable-egl \

					--enable-gallium-tests \

					--enable-gallium-osmesa \

					--enable-llvm \

					--enable-gbm \

					--enable-gles1 \

					--enable-gles2 \

					--enable-glx \

					--enable-glx-tls \

					--enable-nine \

					--enable-opencl \

					--enable-opencl-icd \

					--enable-opengl \

					--enable-va \

					--enable-vdpau \

					--enable-xa \

					--enable-xvmc \

					--enable-llvm-shared-libs \

					--enable-libunwind \

					--with-platforms=x11,wayland,drm,surfaceless \

					--with-dri-drivers=i915,i965,nouveau,radeon,r200,swrast \

					--with-gallium-drivers=i915,nouveau,r300,pl111,r600,radeonsi,freedreno,svga,swrast,vc4,tegra,virgl,swr,etnaviv,imx \

					--with-vulkan-drivers=intel,radeon

				ACLOCAL_AMFLAGS = -I m4

				EXTRA_DIST = \

					autogen.sh \

					common.py \

					docs \

					doxygen \

					bin/git_sha1_gen.py \

					scons \

					SConstruct \

					build-support/conftest.dyn \

					build-support/conftest.map \

					meson.build \

					meson_options.txt \

					bin/meson.build \

					include/meson.build \

					bin/install_megadrivers.py \

					bin/meson_get_version.py

				noinst_HEADERS = \

					include/c99_alloca.h \

					include/c99_compat.h \

					include/c99_math.h \

					include/c11 \

					include/drm-uapi/drm.h \

					include/drm-uapi/drm_fourcc.h \

					include/drm-uapi/drm_mode.h \

					include/drm-uapi/i915_drm.h \

					include/drm-uapi/tegra_drm.h \

					include/drm-uapi/v3d_drm.h \

					include/drm-uapi/vc4_drm.h \

					include/D3D9 \

					include/GL/wglext.h \

					include/HaikuGL \

					include/no_extern_c.h \

					include/pci_ids \

					include/vulkan

				# We list some directories in EXTRA_DIST, but don't actually want to include

				# the .gitignore files in the tarball.

				dist-hook:

					find $(distdir) -name .gitignore -exec $(RM) {} +

									
										19

README.rst
									
												
				@@ -9,25 +9,6 @@ This repository lives at https://gitlab.freedesktop.org/mesa/mesa.

				Other repositories are likely forks, and code found there is not supported.

				Build status

				------------

				Travis:

				.. image:: https://travis-ci.org/mesa3d/mesa.svg?branch=master

				    :target: https://travis-ci.org/mesa3d/mesa

				Appveyor:

				.. image:: https://img.shields.io/appveyor/ci/mesa3d/mesa.svg

				    :target: https://ci.appveyor.com/project/mesa3d/mesa

				Coverity:

				.. image:: https://scan.coverity.com/projects/139/badge.svg?flat=1

				    :target: https://scan.coverity.com/projects/mesa

				Build & install

				---------------

15

REVIEWERS


@@ -72,7 +72,9 @@ F: src/loader/
 EGL
 R: Eric Engestrom <eric@engestrom.ch>
 R: Emil Velikov <emil.l.velikov@gmail.com>
 F: src/egl/
 F: include/EGL/
 HAIKU
 R: Alexander von Gluck IV <kallisti5@unixzen.com>
@@ -92,14 +94,6 @@ GALLIUM TARGETS
 R: Emil Velikov <emil.l.velikov@gmail.com>
 F: src/gallium/targets/
 AUTOCONF BUILD
 R: Emil Velikov <emil.l.velikov@gmail.com>
 F: autogen.sh
 F: configure.ac
 F: */Automake.inc
 F: */Makefile.*am
 F: */Makefile.sources
 SCONS BUILD
 F: scons/
 F: */SConscript*
@@ -136,3 +130,8 @@ F:	src/gallium/drivers/freedreno/
 GLX
 R: Adam Jackson <ajax@redhat.com>
 F: src/glx/
 VULKAN
 R: Eric Engestrom <eric@engestrom.ch>
 F: src/vulkan/
 F: include/vulkan/

									
										1

SConstruct
									
												
				@@ -31,6 +31,7 @@ import common

				# Minimal scons version

				EnsureSConsVersion(2, 4)

				EnsurePythonVersion(2, 7)

				#######################################################################

2

VERSION


@@ -1 +1 @@
 .2.0-devel
 .2.0-devel

									
										30

appveyor.yml
									
												
				@@ -33,31 +33,41 @@ branches:

				# - https://www.appveyor.com/blog/2014/06/04/shallow-clone-for-git-repositories

				clone_depth: 100

				# https://www.appveyor.com/docs/build-cache/

				cache:

				- win_flex_bison-2.5.9.zip

				- llvm-5.0.1-msvc2015-mtd.7z

				- '%LOCALAPPDATA%\pip\Cache -> appveyor.yml'

				- win_flex_bison-2.5.15.zip

				- llvm-5.0.1-msvc2017-mtd.7z

				os: Visual Studio 2015

				os: Visual Studio 2017

				init:

				# Appveyor defaults core.autocrlf to input instead of the default (true), but

				# that can hide problems processing CRLF text on Windows

				- git config --global core.autocrlf true

				environment:

				  WINFLEXBISON_ARCHIVE: win_flex_bison-2.5.9.zip

				  LLVM_ARCHIVE: llvm-5.0.1-msvc2015-mtd.7z

				  WINFLEXBISON_VERSION: 2.5.15

				  LLVM_ARCHIVE: llvm-5.0.1-msvc2017-mtd.7z

				install:

				# Check git config

				- git config core.autocrlf

				# Check pip

				- python --version

				- python -m pip --version

				# Install Mako

				- python -m pip install Mako==1.0.6

				- python -m pip install Mako==1.0.7

				# Install pywin32 extensions, needed by SCons

				- python -m pip install pypiwin32

				# Install python wheels, necessary to install SCons via pip

				- python -m pip install wheel

				# Install SCons

				- python -m pip install scons==2.5.1

				- python -m pip install scons==3.0.1

				- scons --version

				# Install flex/bison

				- if not exist "%WINFLEXBISON_ARCHIVE%" appveyor DownloadFile "https://downloads.sourceforge.net/project/winflexbison/old_versions/%WINFLEXBISON_ARCHIVE%"

				- set WINFLEXBISON_ARCHIVE=win_flex_bison-%WINFLEXBISON_VERSION%.zip

				- if not exist "%WINFLEXBISON_ARCHIVE%" appveyor DownloadFile "https://github.com/lexxmark/winflexbison/releases/download/v%WINFLEXBISON_VERSION%/%WINFLEXBISON_ARCHIVE%"

				- 7z x -y -owinflexbison\ "%WINFLEXBISON_ARCHIVE%" > nul

				- set Path=%CD%\winflexbison;%Path%

				- win_flex --version

				@@ -69,10 +79,10 @@ install:

				- set LLVM=%CD%\llvm

				build_script:

				- scons -j%NUMBER_OF_PROCESSORS% MSVC_VERSION=14.0 llvm=1

				- scons -j%NUMBER_OF_PROCESSORS% MSVC_VERSION=14.1 llvm=1

				after_build:

				- scons -j%NUMBER_OF_PROCESSORS% MSVC_VERSION=14.0 llvm=1 check

				- scons -j%NUMBER_OF_PROCESSORS% MSVC_VERSION=14.1 llvm=1 check

				# It's possible to setup notification here, as described in

									
										14

autogen.sh
									
											
				@@ -1,14 +0,0 @@

				#! /bin/sh

				srcdir=`dirname "$0"`

				test -z "$srcdir" && srcdir=.

				ORIGDIR=`pwd`

				cd "$srcdir"

				autoreconf --force --verbose --install || exit 1

				cd "$ORIGDIR" || exit $?

				if test -z "$NOCONFIGURE"; then

				    "$srcdir"/configure "$@"

				fi

9

bin/.gitignore vendored


@@ -1,9 +0,0 @@
 config.guess
 config.sub
 install-sh
 /depcomp
 /missing
 ylwrap
 compile
 ar-lib
 /test-driver

									
										81

bin/get-fixes-pick-list.sh
									
											
				@@ -1,81 +0,0 @@

				#!/bin/sh

				# Script for generating a list of candidates [referenced by a Fixes tag] for

				# cherry-picking to a stable branch

				#

				# Usage examples:

				#

				# $ bin/get-fixes-pick-list.sh

				# $ bin/get-fixes-pick-list.sh > picklist

				# $ bin/get-fixes-pick-list.sh | tee picklist

				# Use the last branchpoint as our limit for the search

				latest_branchpoint=`git merge-base origin/master HEAD`

				# List all the commits between day 1 and the branch point...

				git log --reverse --pretty=%H $latest_branchpoint > already_landed

				# ... and the ones cherry-picked.

				git log --reverse --pretty=medium --grep="cherry picked from commit" $latest_branchpoint..HEAD |\

					grep "cherry picked from commit" |\

					sed -e 's/^[[:space:]]*(cherry picked from commit[[:space:]]*//' -e 's/)//'  > already_picked

				# Grep for commits with Fixes tag

				git log --reverse --pretty=%H -i --grep="fixes:" $latest_branchpoint..origin/master |\

				while read sha

				do

					# Check to see whether the patch is on the ignore list ...

					if [ -f bin/.cherry-ignore ] ; then

						if grep -q ^$sha bin/.cherry-ignore ; then

							continue

						fi

					fi

					# Skip if it has been already cherry-picked.

					if grep -q ^$sha already_picked ; then

						continue

					fi

					# Place every "fixes:" tag on its own line and join with the next word

					# on its line or a later one.

					fixes=`git show --pretty=medium -s $sha | tr -d "\n" | sed -e 's/fixes:[[:space:]]*/\nfixes:/Ig' | grep "fixes:" | sed -e 's/\(fixes:[a-zA-Z0-9]*\).*$/\1/'`

					# For each one try to extract the tag

					fixes_count=`echo "$fixes" | wc -l`

					warn=`(test $fixes_count -gt 1 && echo $fixes_count) || echo 0`

					while [ $fixes_count -gt 0 ] ; do

						# Treat only the current line

						id=`echo "$fixes" | tail -n $fixes_count | head -n 1 | cut -d : -f 2`

						fixes_count=$(($fixes_count-1))

						# Bail out if we cannot find suitable id.

						# Any specific validation the $id is valid and not some junk, is

						# implied with the follow up code

						if [ "x$id" = x ] ; then

							continue

						fi

						# Check if the offending commit is in branch.

						# Be that cherry-picked ...

						# ... or landed before the branchpoint.

						if grep -q ^$id already_picked ||

						   grep -q ^$id already_landed ; then

							printf "Commit \"%s\" fixes %s\n" \

							       "`git log -n1 --pretty=oneline $sha`" \

							       "$id"

							warn=$(($warn-1))

						fi

					done

					if [ $warn -gt 0 ] ; then

						printf "WARNING: Commit \"%s\" has more than one Fixes tag\n" \

						       "`git log -n1 --pretty=oneline $sha`"

					fi

				done

				rm -f already_picked

				rm -f already_landed

									
										122

bin/get-pick-list.sh
									
												
				@@ -7,21 +7,107 @@

				# $ bin/get-pick-list.sh

				# $ bin/get-pick-list.sh > picklist

				# $ bin/get-pick-list.sh | tee picklist

				#

				# The output is as follows:

				# [nomination_type] commit_sha commit summary

				is_stable_nomination()

				{

					git show --pretty=medium --summary "$1" | grep -q -i -o "CC:.*mesa-stable"

				}

				is_typod_nomination()

				{

					git show --pretty=medium --summary "$1" | grep -q -i -o "CC:.*mesa-dev"

				}

				fixes=

				# Helper to handle various mistypos of the fixes tag.

				# The tag string itself is passed as argument and normalised within.

				#

				# Resulting string in the global variable "fixes" and contains entries

				# in the form "fixes:$sha"

				is_sha_nomination()

				{

					fixes=`git show --pretty=medium -s $1 | tr -d "\n" | \

						sed -e 's/'"$2"'/\nfixes:/Ig' | \

						grep -Eo 'fixes:[a-f0-9]{8,40}'`

					fixes_count=`echo "$fixes" | grep "fixes:" | wc -l`

					if test $fixes_count -eq 0; then

						return 1

					fi

					# Throw a warning for each invalid sha

					while test $fixes_count -gt 0; do

						# Treat only the current line

						id=`echo "$fixes" | tail -n $fixes_count | head -n 1 | cut -d : -f 2`

						fixes_count=$(($fixes_count-1))

						if ! git show $id >/dev/null 2>&1; then

							echo WARNING: Commit $1 lists invalid sha $id

						fi

					done

					return 0

				}

				# Checks if at least one of offending commits, listed in the global

				# "fixes", is in branch.

				sha_in_range()

				{

					fixes_count=`echo "$fixes" | grep "fixes:" | wc -l`

					while test $fixes_count -gt 0; do

						# Treat only the current line

						id=`echo "$fixes" | tail -n $fixes_count | head -n 1 | cut -d : -f 2`

						fixes_count=$(($fixes_count-1))

						# Be that cherry-picked ...

						# ... or landed before the branchpoint.

						if grep -q ^$id already_picked ||

						   grep -q ^$id already_landed ; then

							return 0

						fi

					done

					return 1

				}

				is_fixes_nomination()

				{

					is_sha_nomination "$1" "fixes:[[:space:]]*"

					if test $? -eq 0; then

						return 0

					fi

					is_sha_nomination "$1" "fixes[[:space:]]\+"

				}

				is_brokenby_nomination()

				{

					is_sha_nomination "$1" "broken by"

				}

				is_revert_nomination()

				{

					is_sha_nomination "$1" "This reverts commit "

				}

				# Use the last branchpoint as our limit for the search

				latest_branchpoint=`git merge-base origin/master HEAD`

				# Grep for commits with "cherry picked from commit" in the commit message.

				# List all the commits between day 1 and the branch point...

				git log --reverse --pretty=%H $latest_branchpoint > already_landed

				# ... and the ones cherry-picked.

				git log --reverse --pretty=medium --grep="cherry picked from commit" $latest_branchpoint..HEAD |\

					grep "cherry picked from commit" |\

					sed -e 's/^[[:space:]]*(cherry picked from commit[[:space:]]*//' -e 's/)//' > already_picked

				# Grep for commits that were marked as a candidate for the stable tree.

				git log --reverse --pretty=%H -i --grep='^CC:.*mesa-stable' $latest_branchpoint..origin/master |\

				# Grep for potential candidates

				git log --reverse --pretty=%H -i --grep='^CC:.*mesa-stable\|^CC:.*mesa-dev\|\<fixes\>\|\<broken by\>\|This reverts commit' $latest_branchpoint..origin/master |\

				while read sha

				do

					# Check to see whether the patch is on the ignore list.

					if [ -f bin/.cherry-ignore ] ; then

					if test -f bin/.cherry-ignore; then

						if grep -q ^$sha bin/.cherry-ignore ; then

							continue

						fi

				@@ -32,7 +118,33 @@ do

						continue

					fi

					git log -n1 --pretty=oneline $sha | cat

					if is_fixes_nomination "$sha"; then

						tag=fixes

					elif is_brokenby_nomination "$sha"; then

						tag=brokenby

					elif is_revert_nomination "$sha"; then

						tag=revert

					elif is_stable_nomination "$sha"; then

						tag=stable

					elif is_typod_nomination "$sha"; then

						tag=typod

					else

						continue

					fi

					case "$tag" in

					fixes | brokenby | revert )

						if ! sha_in_range; then

							continue

						fi

						;;

					* )

						;;

					esac

					printf "[ %8s ] " "$tag"

					git --no-pager show --no-patch --oneline $sha

				done

				rm -f already_picked

				rm -f already_landed
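
The SHA extraction in `is_sha_nomination` and the prefix lookup in `sha_in_range` can be sketched in Python as below. This is an illustrative translation, not part of the change: the function names `extract_fixes_shas` and `sha_in_range` and the in-memory `known` list (standing in for the `already_picked` and `already_landed` files) are assumptions, and only the `fixes:` spelling of the tag is handled.

```python
import re
from typing import Iterable, List

# Tag is matched case-insensitively (like sed's "I" flag); the hex run is
# matched case-sensitively, like grep -Eo 'fixes:[a-f0-9]{8,40}'.
FIXES_TAG = re.compile(r"(?i:fixes:\s*)([a-f0-9]{8,40})")


def extract_fixes_shas(message: str) -> List[str]:
    """Return every SHA that follows a "fixes:" tag in a commit message."""
    # The shell version deletes newlines first (tr -d "\n"); do the same so
    # the behaviour matches, including its quirk of joining adjacent lines.
    flat = message.replace("\n", "")
    return [m.group(1) for m in FIXES_TAG.finditer(flat)]


def sha_in_range(shas: Iterable[str], known: Iterable[str]) -> bool:
    """True if any (possibly abbreviated) SHA is a prefix of a known commit."""
    known = list(known)
    return any(full.startswith(sha) for sha in shas for full in known)


if __name__ == "__main__":
    msg = (
        "mesa: reverse no_error on compressed_tex_sub_image\n\n"
        'Fixes: 7df233d68d ("mesa: refactor compressed_tex_sub_image function")\n'
        "Reviewed-by: Eric Anholt <eric@anholt.net>\n"
    )
    print(extract_fixes_shas(msg))
```

As in the script, a commit counts as relevant as soon as one of the referenced SHAs is found among the commits already on the branch.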

									
										42

bin/get-typod-pick-list.sh
									
											
				@@ -1,42 +0,0 @@

				#!/bin/sh

				# Script for generating a list of candidates which have typos in the nomination line

				#

				# Usage examples:

				#

				# $ bin/get-typod-pick-list.sh

				# $ bin/get-typod-pick-list.sh > picklist

				# $ bin/get-typod-pick-list.sh | tee picklist

				# NB:

				# This script intentionally _never_ checks for specific version tag

				# Should we consider folding it with the original get-pick-list.sh

				# Use the last branchpoint as our limit for the search

				latest_branchpoint=`git merge-base origin/master HEAD`

				# Grep for commits with "cherry picked from commit" in the commit message.

				git log --reverse --grep="cherry picked from commit" $latest_branchpoint..HEAD |\

					grep "cherry picked from commit" |\

					sed -e 's/^[[:space:]]*(cherry picked from commit[[:space:]]*//' -e 's/)//' > already_picked

				# Grep for commits that were marked as a candidate for the stable tree.

				git log --reverse --pretty=%H -i --grep='^CC:.*mesa-dev' $latest_branchpoint..origin/master |\

				while read sha

				do

					# Check to see whether the patch is on the ignore list.

					if [ -f bin/.cherry-ignore ] ; then

						if grep -q ^$sha bin/.cherry-ignore ; then

							continue

						fi

					fi

					# Check to see if it has already been picked over.

					if grep -q ^$sha already_picked ; then

						continue

					fi

					git log -n1 --pretty=oneline $sha | cat

				done

				rm -f already_picked

									
										29

bin/git_sha1_gen.py
									
										Executable file → Normal file
									
												
				@@ -1,5 +1,3 @@

				#!/usr/bin/env python

				"""

				Generate the contents of the git_sha1.h file.

				The output of this script goes to stdout.

				@@ -28,22 +26,25 @@ def get_git_sha1():

				        git_sha1 = ''

				    return git_sha1

				def write_if_different(contents):

				    """

				    Avoid touching the output file if it doesn't need modifications

				    Useful to avoid triggering rebuilds when nothing has changed.

				    """

				    if os.path.isfile(args.output):

				        with open(args.output, 'r') as file:

				            if file.read() == contents:

				                return

				    with open(args.output, 'w') as file:

				        file.write(contents)

				parser = argparse.ArgumentParser()

				parser.add_argument('--output', help='File to write the #define in',

				        required=True)

				                    required=True)

				args = parser.parse_args()

				git_sha1 = os.environ.get('MESA_GIT_SHA1_OVERRIDE', get_git_sha1())[:10]

				if git_sha1:

				    git_sha1_h_in_path = os.path.join(os.path.dirname(sys.argv[0]),

				            '..', 'src', 'git_sha1.h.in')

				    with open(git_sha1_h_in_path , 'r') as git_sha1_h_in:

				        new_sha1 = git_sha1_h_in.read().replace('@VCS_TAG@', git_sha1)

				        if os.path.isfile(args.output):

				            with open(args.output, 'r') as git_sha1_h:

				                if git_sha1_h.read() == new_sha1:

				                    quit()

				        with open(args.output, 'w') as git_sha1_h:

				            git_sha1_h.write(new_sha1)

				    write_if_different('#define MESA_GIT_SHA1 " (git-' + git_sha1 + ')"')

				else:

				    open(args.output, 'w').close()

				    write_if_different('#define MESA_GIT_SHA1 ""')
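
The effect of `write_if_different` can be tried in isolation with the sketch below. It is a standalone variant, not the code from the change: it takes the output path as a parameter instead of reading `args.output`, and it returns a boolean (an addition for illustration) so the caller can see whether the file was touched.

```python
import os
import tempfile


def write_if_different(path, contents):
    """Write contents to path only if that changes the file.

    Returns True when the file was (re)written, False when it was left alone.
    The return value is an addition for this sketch.
    """
    if os.path.isfile(path):
        with open(path, 'r') as f:
            if f.read() == contents:
                return False
    with open(path, 'w') as f:
        f.write(contents)
    return True


if __name__ == "__main__":
    out = os.path.join(tempfile.mkdtemp(), 'git_sha1.h')
    define = '#define MESA_GIT_SHA1 ""'
    print(write_if_different(out, define))  # file does not exist yet
    print(write_if_different(out, define))  # same contents, file untouched
```

Leaving an unchanged file alone keeps its modification time stable, which is what avoids the needless rebuilds mentioned in the docstring.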

									
										22

bin/install_megadrivers.py
									
										Executable file → Normal file
									
												
				@@ -1,4 +1,3 @@

				#!/usr/bin/env python

				# encoding=utf-8

				# Copyright © 2017-2018 Intel Corporation

				@@ -25,7 +24,6 @@

				from __future__ import print_function

				import argparse

				import os

				import shutil

				def main():

				@@ -36,20 +34,25 @@ def main():

				    args = parser.parse_args()

				    if os.path.isabs(args.libdir):

				        to = os.path.join(os.environ.get('DESTDIR', '/'), args.libdir[1:])

				        destdir = os.environ.get('DESTDIR')

				        if destdir:

				            to = os.path.join(destdir, args.libdir[1:])

				        else:

				            to = args.libdir

				    else:

				        to = os.path.join(os.environ['MESON_INSTALL_DESTDIR_PREFIX'], args.libdir)

				    master = os.path.join(to, os.path.basename(args.megadriver))

				    if not os.path.exists(to):

				        if os.path.lexists(to):

				            os.unlink(to)

				        os.makedirs(to)

				    shutil.copy(args.megadriver, master)

				    for driver in args.drivers:

				        abs_driver = os.path.join(to, driver)

				        if os.path.exists(abs_driver):

				        if os.path.lexists(abs_driver):

				            os.unlink(abs_driver)

				        print('installing {} to {}'.format(args.megadriver, abs_driver))

				        os.link(master, abs_driver)

				@@ -60,13 +63,20 @@ def main():

				            name, ext = os.path.splitext(driver)

				            while ext != '.so':

				                if os.path.exists(name):

				                if os.path.lexists(name):

				                    os.unlink(name)

				                os.symlink(driver, name)

				                name, ext = os.path.splitext(name)

				        finally:

				            os.chdir(ret)

				    # Remove meson-created master .so and symlinks

				    os.unlink(master)

				    name, ext = os.path.splitext(master)

				    while ext != '.so':

				        if os.path.lexists(name):

				            os.unlink(name)

				        name, ext = os.path.splitext(name)

				if __name__ == '__main__':
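
The new install-directory logic in `main()` can be read as a pure function. The sketch below is illustrative only: `resolve_install_dir` and its parameters are hypothetical names, the environment lookups are replaced by arguments, and `posixpath` is used instead of `os.path` so the result does not depend on the host platform.

```python
import posixpath


def resolve_install_dir(libdir, destdir=None, destdir_prefix=''):
    """Mirror the branch structure of the updated install_megadrivers.py.

    destdir stands in for $DESTDIR and destdir_prefix for
    $MESON_INSTALL_DESTDIR_PREFIX (both passed in explicitly here).
    """
    if posixpath.isabs(libdir):
        if destdir:
            # Drop the leading "/" so join() does not discard destdir.
            return posixpath.join(destdir, libdir[1:])
        # No DESTDIR set: install straight into the absolute libdir.
        return libdir
    return posixpath.join(destdir_prefix, libdir)


if __name__ == "__main__":
    print(resolve_install_dir('/usr/lib/dri'))
    print(resolve_install_dir('/usr/lib/dri', destdir='/tmp/stage'))
    print(resolve_install_dir('lib/dri', destdir_prefix='/tmp/stage/usr'))
```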

									
										88

bin/meson-cmd-extract.py
									
										Executable file
									
												
				@@ -0,0 +1,88 @@

				#!/usr/bin/env python3

				# Copyright © 2019 Intel Corporation

				# Permission is hereby granted, free of charge, to any person obtaining a copy

				# of this software and associated documentation files (the "Software"), to deal

				# in the Software without restriction, including without limitation the rights

				# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell

				# copies of the Software, and to permit persons to whom the Software is

				# furnished to do so, subject to the following conditions:

				# The above copyright notice and this permission notice shall be included in

				# all copies or substantial portions of the Software.

				# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR

				# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,

				# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE

				# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER

				# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,

				# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE

				# SOFTWARE.

				"""This script reads a meson build directory and gives back the command line it

				was configured with.

				This only works for meson 0.49.0 and newer.

				"""

				import argparse

				import ast

				import configparser

				import pathlib

				import sys

				def parse_args() -> argparse.Namespace:

				    """Parse arguments."""

				    parser = argparse.ArgumentParser()

				    parser.add_argument(

				        'build_dir',

				        help='Path the meson build directory')

				    args = parser.parse_args()

				    return args

				def load_config(path: pathlib.Path) -> configparser.ConfigParser:

				    """Load config file."""

				    conf = configparser.ConfigParser()

				    with path.open() as f:

				        conf.read_file(f)

				    return conf

				def build_cmd(conf: configparser.ConfigParser) -> str:

				    """Rebuild the command line."""

				    args = []

				    for k, v in conf['options'].items():

				        if ' ' in v:

				            args.append(f'-D{k}="{v}"')

				        else:

				            args.append(f'-D{k}={v}')

				    cf = conf['properties'].get('cross_file')

				    if cf:

				        args.append('--cross-file={}'.format(cf))

				    nf = conf['properties'].get('native_file')

				    if nf:

				        # this will be in the form "['str', 'str']", so use ast.literal_eval to

				        # convert it to a list of strings.

				        nf = ast.literal_eval(nf)

				        args.extend(['--native-file={}'.format(f) for f in nf])

				    return ' '.join(args)

				def main():

				    args = parse_args()

				    path = pathlib.Path(args.build_dir, 'meson-private', 'cmd_line.txt')

				    if not path.exists():

				        print('Cannot find the necessary file to rebuild command line. '

				              'Is your meson version >= 0.49.0?', file=sys.stderr)

				        sys.exit(1)

				    conf = load_config(path)

				    cmd = build_cmd(conf)

				    print(cmd)

				if __name__ == '__main__':

				    main()
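
To see what `build_cmd` produces without a real build directory, the function can be run against an in-memory config. In the sketch below `build_cmd` is copied from the script above (with the f-strings written as `format` calls), while the sample text is an assumption about what `meson-private/cmd_line.txt` might contain; the option names and file names in it are made up for illustration.

```python
import ast
import configparser

# Assumed sample; a real cmd_line.txt is written by meson (0.49.0 or newer).
SAMPLE = """\
[options]
buildtype = debug
prefix = /usr
c_args = -O2 -g

[properties]
cross_file = cross.ini
native_file = ['native.ini']
"""


def build_cmd(conf):
    """Rebuild the command line from the parsed config."""
    args = []
    for k, v in conf['options'].items():
        if ' ' in v:
            # Quote values containing spaces so they stay one argument.
            args.append('-D{}="{}"'.format(k, v))
        else:
            args.append('-D{}={}'.format(k, v))
    cf = conf['properties'].get('cross_file')
    if cf:
        args.append('--cross-file={}'.format(cf))
    nf = conf['properties'].get('native_file')
    if nf:
        # Stored as the repr of a list of strings.
        nf = ast.literal_eval(nf)
        args.extend(['--native-file={}'.format(f) for f in nf])
    return ' '.join(args)


if __name__ == "__main__":
    conf = configparser.ConfigParser()
    conf.read_string(SAMPLE)
    print(build_cmd(conf))
```

Note that `configparser.ConfigParser` lowercases option names by default, so an option written with capital letters in the file would come back in lowercase.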

									
										63

bin/meson-options.py
									
										Executable file
									
												
				@@ -0,0 +1,63 @@

				#!/usr/bin/env python3

				from os import get_terminal_size

				from textwrap import wrap

				from mesonbuild import coredata

				from mesonbuild import optinterpreter

				(COLUMNS, _) = get_terminal_size()

				def describe_option(option_name: str, option_default_value: str,

				                    option_type: str, option_message: str) -> None:

				    print('name:    ' + option_name)

				    print('default: ' + option_default_value)

				    print('type:    ' + option_type)

				    for line in wrap(option_message, width=COLUMNS - 9):

				        print('         ' + line)

				    print('---')

				oi = optinterpreter.OptionInterpreter('')

				oi.process('meson_options.txt')

				for (name, value) in oi.options.items():

				    if isinstance(value, coredata.UserStringOption):

				        describe_option(name,

				                        value.value,

				                        'string',

				                        "You can type what you want, but make sure it makes sense")

				    elif isinstance(value, coredata.UserBooleanOption):

				        describe_option(name,

				                        'true' if value.value else 'false',

				                        'boolean',

				                        "You can set it to 'true' or 'false'")

				    elif isinstance(value, coredata.UserIntegerOption):

				        describe_option(name,

				                        str(value.value),

				                        'integer',

				                        "You can set it to any integer value between '{}' and '{}'".format(value.min_value, value.max_value))

				    elif isinstance(value, coredata.UserUmaskOption):

				        describe_option(name,

				                        str(value.value),

				                        'umask',

				                        "You can set it to 'preserve' or a value between '0000' and '0777'")

				    elif isinstance(value, coredata.UserComboOption):

				        choices = '[' + ', '.join(["'" + v + "'" for v in value.choices]) + ']'

				        describe_option(name,

				                        value.value,

				                        'combo',

				                        "You can set it to any one of those values: " + choices)

				    elif isinstance(value, coredata.UserArrayOption):

				        choices = '[' + ', '.join(["'" + v + "'" for v in value.choices]) + ']'

				        value = '[' + ', '.join(["'" + v + "'" for v in value.value]) + ']'

				        describe_option(name,

				                        value,

				                        'array',

				                        "You can set it to one or more of those values: " + choices)

				    elif isinstance(value, coredata.UserFeatureOption):

				        describe_option(name,

				                        value.value,

				                        'feature',

				                        "You can set it to 'auto', 'enabled', or 'disabled'")

				    else:

				        print(name + ' is an option of a type unknown to this script')

				        print('---')

									
										1

bin/meson.build
									
												
				@@ -19,3 +19,4 @@

				# SOFTWARE.

				git_sha1_gen_py = files('git_sha1_gen.py')

				symbols_check = find_program('symbols-check.py')

									
										130

bin/symbols-check.py
									
										Normal file
									
												
				@@ -0,0 +1,130 @@

				#!/usr/bin/env python

				import argparse

				import os

				import platform

				import subprocess

				# This list contains symbols that _might_ be exported for some platforms

				PLATFORM_SYMBOLS = [

				    '__bss_end__',

				    '__bss_start__',

				    '__bss_start',

				    '__end__',

				    '_bss_end__',

				    '_edata',

				    '_end',

				    '_fini',

				    '_init',

				]

				def get_symbols(nm, lib):

				    '''

				    List all the (non platform-specific) symbols exported by the library

				    '''

				    symbols = []

				    platform_name = platform.system()

				    output = subprocess.check_output([nm, '-gP', lib],

				                                     stderr=open(os.devnull, 'w')).decode("ascii")

				    for line in output.splitlines():

				        fields = line.split()

				        if len(fields) == 2 or fields[1] == 'U':

				            continue

				        symbol_name = fields[0]

				        if platform_name == 'Linux':

				            if symbol_name in PLATFORM_SYMBOLS:

				                continue

				        elif platform_name == 'Darwin':

				            assert symbol_name[0] == '_'

				            symbol_name = symbol_name[1:]

				        symbols.append(symbol_name)

				    return symbols

				def main():

				    parser = argparse.ArgumentParser()

				    parser.add_argument('--symbols-file',

				                        action='store',

				                        required=True,

				                        help='path to file containing symbols')

				    parser.add_argument('--lib',

				                        action='store',

				                        required=True,

				                        help='path to library')

				    parser.add_argument('--nm',

				                        action='store',

				                        required=True,

				                        help='path to binary (or name in $PATH)')

				    args = parser.parse_args()

				    try:

				        lib_symbols = get_symbols(args.nm, args.lib)

				    except:

				        # We can't run this test, but we haven't technically failed it either

				        # Return the GNU "skip" error code

				        exit(77)

				    mandatory_symbols = []

				    optional_symbols = []

				    with open(args.symbols_file) as symbols_file:

				        qualifier_optional = '(optional)'

				        for line in symbols_file.readlines():

				            # Strip comments

				            line = line.split('#')[0]

				            line = line.strip()

				            if not line:

				                continue

				            # Line format:

				            # [qualifier] symbol

				            qualifier = None

				            symbol = None

				            fields = line.split()

				            if len(fields) == 1:

				                symbol = fields[0]

				            elif len(fields) == 2:

				                qualifier = fields[0]

				                symbol = fields[1]

				            else:

				                print(args.symbols_file + ': invalid format: ' + line)

				                exit(1)

				            # The only supported qualifier is 'optional', which means the

				            # symbol doesn't have to be exported by the library

				            if qualifier and not qualifier == qualifier_optional:

				                print(args.symbols_file + ': invalid qualifier: ' + qualifier)

				                exit(1)

				            if qualifier == qualifier_optional:

				                optional_symbols.append(symbol)

				            else:

				                mandatory_symbols.append(symbol)

				    unknown_symbols = []

				    for symbol in lib_symbols:

				        if symbol in mandatory_symbols:

				            continue

				        if symbol in optional_symbols:

				            continue

				        unknown_symbols.append(symbol)

				    missing_symbols = [

				        sym for sym in mandatory_symbols if sym not in lib_symbols

				    ]

				    for symbol in unknown_symbols:

				        print(args.lib + ': unknown symbol exported: ' + symbol)

				    for symbol in missing_symbols:

				        print(args.lib + ': missing symbol: ' + symbol)

				    if unknown_symbols or missing_symbols:

				        exit(1)

				    exit(0)

				if __name__ == '__main__':

				    main()
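
As a rough illustration of the symbols-file format that `symbols-check.py` reads (one symbol per line, `#` comments, and an optional `(optional)` qualifier), the standalone sketch below parses such a file from a string and computes the unknown and missing sets. The helper names `parse_symbols` and `compare` and the sample symbol names are hypothetical and not part of the commit. Unlike the script, which prints a message and exits with status 1 on a malformed line, this sketch raises `ValueError`.

```python
QUALIFIER_OPTIONAL = '(optional)'


def parse_symbols(text):
    """Return (mandatory, optional) symbol lists parsed from text."""
    mandatory, optional = [], []
    for raw in text.splitlines():
        # Strip comments and surrounding whitespace, skip empty lines
        line = raw.split('#')[0].strip()
        if not line:
            continue
        # Line format: [qualifier] symbol
        fields = line.split()
        if len(fields) == 1:
            mandatory.append(fields[0])
        elif len(fields) == 2 and fields[0] == QUALIFIER_OPTIONAL:
            optional.append(fields[1])
        else:
            raise ValueError('invalid line: ' + line)
    return mandatory, optional


def compare(lib_symbols, mandatory, optional):
    """Return (unknown, missing), mirroring the script's final check."""
    known = set(mandatory) | set(optional)
    unknown = [s for s in lib_symbols if s not in known]
    missing = [s for s in mandatory if s not in lib_symbols]
    return unknown, missing


# Hypothetical symbols file contents, for illustration only
SAMPLE = """
# hypothetical symbols file
eglInitialize
(optional) eglSomeOptionalEntry  # may be absent
"""

mandatory, optional = parse_symbols(SAMPLE)
unknown, missing = compare(['eglInitialize', 'stray_symbol'],
                           mandatory, optional)
print(unknown, missing)
```

An optional symbol that the library does not export is not reported as missing; only mandatory symbols are, which matches the comment in the script about the `optional` qualifier.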

									
										8

common.py
									
				@@ -86,7 +86,7 @@ def AddOptions(opts):

				        from SCons.Options.EnumOption import EnumOption

				    opts.Add(EnumOption('build', 'build type', 'debug',

				                        allowed_values=('debug', 'checked', 'profile',

				                                        'release', 'opt')))

				                                        'release')))

				    opts.Add(BoolOption('verbose', 'verbose output', 'no'))

				    opts.Add(EnumOption('machine', 'use machine-specific assembly code',

				                        default_machine,

				@@ -99,17 +99,13 @@ def AddOptions(opts):

				                        'enable static code analysis where available', 'no'))

				    opts.Add(BoolOption('asan', 'enable Address Sanitizer', 'no'))

				    opts.Add('toolchain', 'compiler toolchain', default_toolchain)

				    opts.Add(BoolOption('gles', 'EXPERIMENTAL: enable OpenGL ES support',

				                        'no'))

				    opts.Add(BoolOption('llvm', 'use LLVM', default_llvm))

				    opts.Add(BoolOption('openmp', 'EXPERIMENTAL: compile with openmp (swrast)',

				                        'no'))

				    opts.Add(BoolOption('debug', 'DEPRECATED: debug build', 'yes'))

				    opts.Add(BoolOption('profile', 'DEPRECATED: profile build', 'no'))

				    opts.Add(BoolOption('quiet', 'DEPRECATED: profile build', 'yes'))

				    opts.Add(BoolOption('texture_float',

				                        'enable floating-point textures and renderbuffers',

				                        'no'))

				    opts.Add(BoolOption('swr', 'Build OpenSWR', 'no'))

				    if host_platform == 'windows':

				        opts.Add('MSVC_VERSION', 'Microsoft Visual C/C++ version')

				        opts.Add('MSVC_USE_SCRIPT', 'Microsoft Visual C/C++ vcvarsall script', True)

3334

configure.ac

File diff suppressed because it is too large

									
										14

docs/application-issues.html
									
				@@ -8,7 +8,7 @@

				<body>

				<div class="header">

				  <h1>The Mesa 3D Graphics Library</h1>

				  The Mesa 3D Graphics Library

				</div>

				<iframe src="contents.html"></iframe>

				@@ -48,23 +48,25 @@ start-up because of an extension string buffer-overflow problem.

				<p>

				The problem is a modern OpenGL driver will return a very long string

				for the glGetString(GL_EXTENSIONS) query and if the application

				for the <code>glGetString(GL_EXTENSIONS)</code> query and if the application

				naively copies the string into a fixed-size buffer it can overflow the

				buffer and crash the application.

				</p>

				<p>

				The work-around is to set the MESA_EXTENSION_MAX_YEAR environment variable

				to the approximate release year of the game.

				This will cause the glGetString(GL_EXTENSIONS) query to only report extensions

				older than the given year.

				The work-around is to set the <code>MESA_EXTENSION_MAX_YEAR</code>

				environment variable to the approximate release year of the game.

				This will cause the <code>glGetString(GL_EXTENSIONS)</code> query to only report

				extensions older than the given year.

				</p>

				<p>

				For example, if the game was released in 2001, do

				</p>

				<pre>

				export MESA_EXTENSION_MAX_YEAR=2001

				</pre>

				<p>

				before running the game.

				</p>
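
The same work-around can be applied per process rather than with a shell-wide `export`. The sketch below is a minimal example, assuming a launcher written in Python; the helper name `env_with_max_year` and the binary name `./old-game` are hypothetical.

```python
import os


def env_with_max_year(year, base_env=None):
    """Return a copy of base_env with MESA_EXTENSION_MAX_YEAR set."""
    env = dict(os.environ if base_env is None else base_env)
    env['MESA_EXTENSION_MAX_YEAR'] = str(year)
    return env


# Build an environment from a small explicit base for demonstration
env = env_with_max_year(2001, base_env={'PATH': '/usr/bin'})
print(env['MESA_EXTENSION_MAX_YEAR'])

# To launch a program with it (hypothetical binary name):
#   import subprocess
#   subprocess.run(['./old-game'], env=env_with_max_year(2001))
```

Passing the modified copy through `env=` leaves the parent process environment untouched, so other applications started from the same session still see the full extension string.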

									
										257

docs/autoconf.html
									
				@@ -1,257 +0,0 @@

				<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">

				<html lang="en">

				<head>

				  <meta http-equiv="content-type" content="text/html; charset=utf-8">

				  <title>Compilation and Installation using Autoconf</title>

				  <link rel="stylesheet" type="text/css" href="mesa.css">

				</head>

				<body>

				<div class="header">

				  <h1>The Mesa 3D Graphics Library</h1>

				</div>

				<iframe src="contents.html"></iframe>

				<div class="content">

				<h1>Compilation and Installation using Autoconf</h1>

				<ol>

				<li><p><a href="#basic">Basic Usage</a></li>

				<li><p><a href="#driver">Driver Options</a>

				  <ul>

				  <li><a href="#xlib">Xlib Driver Options</a></li>

				  <li><a href="#dri">DRI Driver Options</a></li>

				  <li><a href="#osmesa">OSMesa Driver Options</a></li>

				  </ul>

				</ol>

				<h2 id="basic">1. Basic Usage</h2>

				<p>

				The autoconf generated configure script can be used to guess your

				platform and change various options for building Mesa. To use the

				configure script, type:

				</p>

				<pre>

				    ./configure

				</pre>

				<p>

				To see a short description of all the options, type <code>./configure

				--help</code>. If you are using a development snapshot and the configure

				script does not exist, type <code>./autogen.sh</code> to generate it

				first. If you know the options you want to pass to

				<code>configure</code>, you can pass them to <code>autogen.sh</code>. It

				will run <code>configure</code> with these options after it is

				generated. Once you have run <code>configure</code> and set the options

				to your preference, type:

				</p>

				<pre>

				    make

				</pre>

				<p>

				This will produce libGL.so and/or several other libraries depending on the

				options you have chosen. Later, if you want to rebuild for a different

				configuration run <code>make realclean</code> before rebuilding.

				</p>

				<p>

				Some of the generic autoconf options are used with Mesa:

				</p>

				<dl>

				<dt><code>--prefix=PREFIX</code></dt>

				<dd><p>This is the root directory where

				files will be installed by <code>make install</code>. The default is

				<code>/usr/local</code>.</p>

				</dd>

				<dt><code>--exec-prefix=EPREFIX</code></dt>

				<dd><p>This is the root directory

				where architecture-dependent files will be installed. In Mesa, this is

				only used to derive the directory for the libraries. The default is

				<code>${prefix}</code>.</p>

				</dd>

				<dt><code>--libdir=LIBDIR</code></dt>

				<dd><p>This option specifies the directory

				where the GL libraries will be installed. The default is

				<code>${exec_prefix}/lib</code>. It also serves as the name of the

				library staging area in the source tree. For instance, if the option

				<code>--libdir=/usr/local/lib64</code> is used, the libraries will be

				created in a <code>lib64</code> directory at the top of the Mesa source

				tree.</p>

				</dd>

				<dt><code>--sysconfdir=DIR</code></dt>

				<dd><p>This option specifies the directory where the configuration

				files will be installed. The default is <code>${prefix}/etc</code>.

				Currently there's only one config file provided when dri drivers are

				enabled - it's <code>drirc</code>.</p>

				</dd>

				<dt><code>--enable-static, --disable-shared</code></dt>

				<dd><p>By default, Mesa

				will build shared libraries. Either of these options will force static

				libraries to be built. It is not currently possible to build static and

				shared libraries in a single pass.</p>

				</dd>

				<dt><code>CC, CFLAGS, CXX, CXXFLAGS</code></dt>

				<dd><p>These environment variables

				control the C and C++ compilers used during the build. By default,

				<code>gcc</code> and <code>g++</code> are used and the debug/optimisation

				level is left unchanged.</p>

				</dd>

				<dt><code>LDFLAGS</code></dt>

				<dd><p>An environment variable specifying flags to

				pass when linking programs. These should be empty and

				<code>PKG_CONFIG_PATH</code> is recommended to be used instead. If needed

				it can be used to direct the linker to use libraries in nonstandard

				directories. For example, <code>LDFLAGS="-L/usr/X11R6/lib"</code>.</p>

				</dd>

				<dt><code>PKG_CONFIG_PATH</code></dt>

				<dd><p>The

				<code>pkg-config</code> utility is a hard requirement for configuring and

				building mesa. It is used to search for external libraries

				on the system. This environment variable is used to control the search

				path for <code>pkg-config</code>. For instance, setting

				<code>PKG_CONFIG_PATH=/usr/X11R6/lib/pkgconfig</code> will search for

				package metadata in <code>/usr/X11R6</code> before the standard

				directories.</p>

				</dd>

				</dl>

				<p>

				There are also a few general options for altering the Mesa build:

				</p>

				<dl>

				<dt><code>--enable-debug</code></dt>

				<dd><p>This option will set the compiler debug/optimisation levels (if the user

				hasn't already set them via the CFLAGS/CXXFLAGS) and macros to aid in

				debugging the Mesa libraries.</p>

				<p>Note that enabling this option can lead to noticeable loss of performance.</p>

				<dt><code>--disable-asm</code></dt>

				<dd><p>There are assembly routines

				available for a few architectures. These will be used by default if

				one of these architectures is detected. This option ensures that

				assembly will not be used.</p>

				</dd>

				<dt><code>--build=</code></dt>

				<dt><code>--host=</code></dt>

				<dd><p>By default, the build will compile code for the architecture that

				it's running on. In order to build cross-compile Mesa on a x86-64 machine

				that is to run on a i686, one would need to set the options to:</p>

				<p><code>--build=x86_64-pc-linux-gnu --host=i686-pc-linux-gnu</code></p>

				Note that these can vary from distribution to distribution. For more

				information check with the

				<a href="https://www.gnu.org/savannah-checkouts/gnu/autoconf/manual/autoconf-2.69/html_node/Specifying-Target-Triplets.html">

				autoconf manual</a>.

				Note that you will need to correctly set <code>PKG_CONFIG_PATH</code> as well.

				<p>In some cases a single compiler is capable of handling both architectures

				(multilib) in that case one would need to set the <code>CC,CXX</code> variables

				appending the correct machine options. Seek your compiler documentation for

				further information -

				<a href="https://gcc.gnu.org/onlinedocs/gcc/Submodel-Options.html"> gcc

				machine dependent options</a></p>

				<p>In addition to specifying correct <code>PKG_CONFIG_PATH</code> for the target

				architecture, the following should be sufficient to configure multilib Mesa</p>

				<code>./configure CC="gcc -m32" CXX="g++ -m32" --build=x86_64-pc-linux-gnu --host=i686-pc-linux-gnu ...</code>

				</dd>

				</dl>

				<h2 id="driver">2. GL Driver Options</h2>

				<p>

				There are several different driver modes that Mesa can use. These are

				described in more detail in the <a href="install.html">basic

				installation instructions</a>. The Mesa driver is controlled through the

				configure options <code>--enable-glx</code> and <code>--enable-osmesa</code>

				</p>

				<h3 id="xlib">Xlib</h3><p>

				It uses Xlib as a software renderer to do all rendering. It corresponds

				to the option <code>--enable-glx=xlib</code> or <code>--enable-glx=gallium-xlib</code>.

				<h3 id="dri">DRI</h3><p>This mode uses the DRI hardware drivers for

				accelerated OpenGL rendering. To enable use <code>--enable-glx=dri

				--enable-dri</code>.

				<!-- DRI specific options -->

				<dl>

				<dt><code>--with-dri-driverdir=DIR</code>

				<dd><p> This option specifies the

				location the DRI drivers will be installed to and the location libGL

				will search for DRI drivers. The default is <code>${libdir}/dri</code>.

				<dt><code>--with-dri-drivers=DRIVER,DRIVER,...</code>

				<dd><p> This option

				allows a specific set of DRI drivers to be built. For example,

				<code>--with-dri-drivers="swrast,i965,radeon,nouveau"</code>. By

				default, the drivers will be chosen depending on the target platform.

				See the directory <code>src/mesa/drivers/dri</code> in the source tree

				for available drivers. Beware that the swrast DRI driver is used by both

				libGL and the X.Org xserver GLX module to do software rendering, so you

				may run into problems if it is not available.

				<!-- This explanation might be totally bogus. Kristian? -->

				<dt><code>--disable-driglx-direct</code>

				<dd><p> Disable direct rendering in

				GLX. Normally, direct hardware rendering through the DRI drivers and

				indirect software rendering are enabled in GLX. This option disables

				direct rendering entirely. It can be useful on architectures where

				kernel DRM modules are not available.

				<dt><code>--enable-glx-tls</code> <dd><p>

				Enable Thread Local Storage (TLS) in

				GLX.

				<dt><code>--with-expat=DIR</code>

				<dd><p><strong>DEPRECATED</strong>, use <code>PKG_CONFIG_PATH</code> instead.</p>

				<p>The DRI-enabled libGL uses expat to

				parse the DRI configuration files in <code>${sysconfdir}/drirc</code> and

				<code>~/.drirc</code>. This option allows a specific expat installation

				to be used. For example, <code>--with-expat=/usr/local</code> will

				search for expat headers and libraries in <code>/usr/local/include</code>

				and <code>/usr/local/lib</code>, respectively.

				</dl>

				<h3 id="osmesa">OSMesa </h3><p> No libGL is built in this

				mode. Instead, the driver code is built into the Off-Screen Mesa

				(OSMesa) library. See the <a href="osmesa.html">Off-Screen Rendering</a>

				page for more details.  It corresponds to the option

				<code>--enable-osmesa</code>.

				<!-- OSMesa specific options -->

				<dl>

				<dt><code>--with-osmesa-bits=BITS</code>

				<dd><p> This option allows the size

				of the color channel in bits to be specified. By default, an 8-bit

				channel will be used, and the driver will be named libOSMesa. Other

				options are 16- and 32-bit color channels, which will add the bit size

				to the library name. For example, <code>--with-osmesa-bits=16</code>

				will create the libOSMesa16 library with a 16-bit color channel.

				</dl>

				<h2 id="library">3. Library Options</h2>

				<p>

				The configure script provides more fine grained control over the libraries

				that will be built.

				</div>

				</body>

				</html>

									
										6

docs/bugs.html
									
				@@ -2,19 +2,19 @@

				<html lang="en">

				<head>

				  <meta http-equiv="content-type" content="text/html; charset=utf-8">

				  <title>Mesa Bug Reporting</title>

				  <title>Report a Bug</title>

				  <link rel="stylesheet" type="text/css" href="mesa.css">

				</head>

				<body>

				<div class="header">

				  <h1>The Mesa 3D Graphics Library</h1>

				  The Mesa 3D Graphics Library

				</div>

				<iframe src="contents.html"></iframe>

				<div class="content">

				<h1>Bug Database</h1>

				<h1>Report a Bug</h1>

				<p>

				The Mesa bug database is hosted on

									
										37

docs/codingstyle.html
									
				@@ -8,7 +8,7 @@

				<body>

				<div class="header">

				  <h1>The Mesa 3D Graphics Library</h1>

				  The Mesa 3D Graphics Library

				</div>

				<iframe src="contents.html"></iframe>

				@@ -48,19 +48,19 @@ For example:

				   }

				</pre>

				<li>Put a space before/after operators.  For example, <tt>a = b + c;</tt>

				and not <tt>a=b+c;</tt>

				<li>Put a space before/after operators.  For example, <code>a = b + c;</code>

				and not <code>a=b+c;</code>

				<li>This GNU indent command generally does the right thing for formatting:

				<pre>

				   indent -br -i3 -npcs --no-tabs infile.c -o outfile.c

				</pre>

				<li>Use comments wherever you think it would be helpful for other developers.

				<li>

				<p>Use comments wherever you think it would be helpful for other developers.

				Several specific cases and style examples follow.  Note that we roughly

				follow <a href="https://www.stack.nl/~dimitri/doxygen/">Doxygen</a> conventions.

				<br>

				<br>

				follow <a href="http://www.doxygen.nl">Doxygen</a> conventions.

				</p>

				Single-line comments:

				<pre>

				   /* null-out pointer to prevent dangling reference below */

				@@ -120,22 +120,23 @@ the opening brace goes on the next line by itself (see above.)

				   _mesa_foo_bar()  - an internal non-static Mesa function

				</pre>

				<li>Constants, macros and enum names are ALL_UPPERCASE, with _ between

				words.

				<li>Mesa usually uses camel case for local variables (Ex: "localVarname")

				while gallium typically uses underscores (Ex: "local_var_name").

				<li>Constants, macros and enum names are <code>ALL_UPPERCASE</code>, with _

				between words.

				<li>Mesa usually uses camel case for local variables (Ex:

				<code>localVarname</code>) while gallium typically uses underscores (Ex:

				<code>local_var_name</code>).

				<li>Global variables are almost never used because Mesa should be thread-safe.

				<li>Booleans.  Places that are not directly visible to the GL API

				should prefer the use of <tt>bool</tt>, <tt>true</tt>, and

				<tt>false</tt> over <tt>GLboolean</tt>, <tt>GL_TRUE</tt>, and

				<tt>GL_FALSE</tt>.  In C code, this may mean that

				<tt>#include &lt;stdbool.h&gt;</tt> needs to be added.  The

				<tt>try_emit_</tt>* methods in src/mesa/program/ir_to_mesa.cpp and

				src/mesa/state_tracker/st_glsl_to_tgsi.cpp can serve as examples.

				should prefer the use of <code>bool</code>, <code>true</code>, and

				<code>false</code> over <code>GLboolean</code>, <code>GL_TRUE</code>, and

				<code>GL_FALSE</code>.  In C code, this may mean that

				<code>#include &lt;stdbool.h&gt;</code> needs to be added.  The

				<code>try_emit_*</code> methods in <code>src/mesa/program/ir_to_mesa.cpp</code>

				and <code>src/mesa/state_tracker/st_glsl_to_tgsi.cpp</code> can serve as

				examples.

				</ul>

				</p>

				</div>

				</body>

									
										6

docs/conform.html
									
				@@ -2,19 +2,19 @@

				<html lang="en">

				<head>

				  <meta http-equiv="content-type" content="text/html; charset=utf-8">

				  <title>Conformance</title>

				  <title>Conformance Testing</title>

				  <link rel="stylesheet" type="text/css" href="mesa.css">

				</head>

				<body>

				<div class="header">

				  <h1>The Mesa 3D Graphics Library</h1>

				  The Mesa 3D Graphics Library

				</div>

				<iframe src="contents.html"></iframe>

				<div class="content">

				<h1>Conformance</h1>

				<h1>Conformance Testing</h1>

				<p>

				The SGI OpenGL conformance tests verify correct operation of OpenGL

									
										58

docs/contents.html
									
				@@ -12,6 +12,10 @@

				      background-color: #cccccc;

				      color: black;

				    }

				    h2 {

				      font-size: inherit;

				      font-weight: bold;

				    }

				    a:link {

				      color: #000;

				    }

				@@ -23,59 +27,56 @@

				</head>

				<body>

				<b>Documentation</b>

				<h2>Documentation</h2>

				<ul>

				<li><a href="intro.html" target="_parent">Introduction</a>

				<li><a href="index.html" target="_parent">News</a>

				<li><a href="developers.html" target="_parent">Developers</a>

				<li><a href="systems.html" target="_parent">Platforms and Drivers</a>

				<li><a href="license.html" target="_parent">License &amp; Copyright</a>

				<li><a href="faq.html" target="_parent">FAQ</a>

				<li><a href="license.html" target="_parent">License and Copyright</a>

				<li><a href="faq.html" target="_parent">Frequently Asked Questions</a>

				<li><a href="relnotes.html" target="_parent">Release Notes</a>

				<li><a href="thanks.html" target="_parent">Acknowledgements</a>

				<li><a href="conform.html" target="_parent">Conformance Testing</a>

				<li>more docs below...

				</ul>

				<b>Download / Install</b>

				<h2>Download and Install</h2>

				<ul>

				<li><a href="download.html" target="_parent">Downloading / Unpacking</a>

				<li><a href="install.html" target="_parent">Compiling / Installing</a>

				<li><a href="download.html" target="_parent">Downloading and Unpacking</a>

				<li><a href="install.html" target="_parent">Compiling and Installing</a>

				  <ul>

				    <li><a href="autoconf.html" target="_parent">Autoconf</a></li>

				    <li><a href="meson.html" target="_parent">Meson</a></li>

				  </ul>

				</li>

				<li><a href="precompiled.html" target="_parent">Precompiled Libraries</a>

				</ul>

				<b>Resources</b>

				<h2>Need help?</h2>

				<ul>

				<li><a href="lists.html" target="_parent">Mailing Lists</a>

				<li><a href="bugs.html" target="_parent">Bug Database</a>

				<li><a href="bugs.html" target="_parent">Report a bug</a>

				<li><a href="webmaster.html" target="_parent">Webmaster</a>

				<li><a href="https://dri.freedesktop.org/" target="_parent">Mesa/DRI Wiki</a>

				</ul>

				<b>User Topics</b>

				<h2>User Topics</h2>

				<ul>

				<li><a href="shading.html" target="_parent">Shading Language</a>

				<li><a href="egl.html" target="_parent">EGL</a>

				<li><a href="opengles.html" target="_parent">OpenGL ES</a>

				<li><a href="envvars.html" target="_parent">Environment Variables</a>

				<li><a href="osmesa.html" target="_parent">Off-Screen Rendering</a>

				<li><a href="osmesa.html" target="_parent">Off-screen Rendering</a>

				<li><a href="debugging.html" target="_parent">Debugging Tips</a>

				<li><a href="perf.html" target="_parent">Performance Tips</a>

				<li><a href="extensions.html" target="_parent">Mesa Extensions</a>

				<li><a href="mangling.html" target="_parent">GL Function Name Mangling</a>

				<li><a href="llvmpipe.html" target="_parent">Gallium llvmpipe driver</a>

				<li><a href="vmware-guest.html" target="_parent">VMware SVGA3D guest driver</a>

				<li><a href="postprocess.html" target="_parent">Gallium post-processing</a>

				<li><a href="llvmpipe.html" target="_parent">Gallium LLVMpipe Driver</a>

				<li><a href="vmware-guest.html" target="_parent">VMware SVGA3D Guest Driver</a>

				<li><a href="postprocess.html" target="_parent">Gallium Post-processing</a>

				<li><a href="application-issues.html" target="_parent">Application Issues</a>

				<li><a href="viewperf.html" target="_parent">Viewperf Issues</a>

				</ul>

				<b>Developer Topics</b>

				<h2>Developer Topics</h2>

				<ul>

				<li><a href="repository.html" target="_parent">Source Code Repository</a>

				<li><a href="sourcetree.html" target="_parent">Source Code Tree</a>

				@@ -83,26 +84,25 @@

				<li><a href="helpwanted.html" target="_parent">Help Wanted</a>

				<li><a href="devinfo.html" target="_parent">Development Notes</a>

				<li><a href="codingstyle.html" target="_parent">Coding Style</a>

				<li><a href="submittingpatches.html" target="_parent">Submitting patches</a>

				<li><a href="releasing.html" target="_parent">Releasing process</a>

				<li><a href="release-calendar.html" target="_parent">Release calendar</a>

				<li><a href="submittingpatches.html" target="_parent">Submitting Patches</a>

				<li><a href="releasing.html" target="_parent">Releasing Process</a>

				<li><a href="release-calendar.html" target="_parent">Release Calendar</a>

				<li><a href="sourcedocs.html" target="_parent">Source Documentation</a>

				<li><a href="dispatch.html" target="_parent">GL Dispatch</a>

				</ul>

				<b>Links</b>

				<h2>Links</h2>

				<ul>

				<li><a href="https://www.opengl.org" target="_parent">OpenGL website</a>

				<li><a href="https://dri.freedesktop.org" target="_parent">DRI website</a>

				<li><a href="https://www.opengl.org" target="_parent">OpenGL Website</a>

				<li><a href="https://dri.freedesktop.org" target="_parent">DRI Website</a>

				<li><a href="https://www.freedesktop.org" target="_parent">freedesktop.org</a>

				<li><a href="https://planet.freedesktop.org" target="_parent">Developer blogs</a>

				<li><a href="https://planet.freedesktop.org" target="_parent">Developer Blogs</a>

				</ul>

				<b>Hosted by:</b>

				<br>

				<blockquote>

				<a href="https://freedesktop.org" target="_parent">freedesktop.org</a>

				</blockquote>

				<h2>Hosted by:</h2>

				<dl>

				<dd><a href="https://www.freedesktop.org" target="_parent">freedesktop.org</a>

				</dl>

				</body>

				</html>

									
										22

docs/debugging.html
									
				@@ -8,7 +8,7 @@

				<body>

				<div class="header">

				  <h1>The Mesa 3D Graphics Library</h1>

				  The Mesa 3D Graphics Library

				</div>

				<iframe src="contents.html"></iframe>

				@@ -20,26 +20,22 @@

				   Normally Mesa (and OpenGL) records but does not notify the user of

				   errors.  It is up to the application to call

				   <code>glGetError</code> to check for errors.  Mesa supports an

				   environment variable, MESA_DEBUG, to help with debugging.  If

				   MESA_DEBUG is defined, a message will be printed to stdout whenever

				   an error occurs.

				   environment variable, <code>MESA_DEBUG</code>, to help with debugging.  If

				   <code>MESA_DEBUG</code> is defined, a message will be printed to stdout

				   whenever an error occurs.

				</p>

				<p>

				   More extensive error checking is done when Mesa is compiled with the

				   DEBUG symbol defined.  You'll have to edit the Make-config file and

				   add -DDEBUG to the CFLAGS line for your system configuration.  You may

				   also want to replace any optimization flags with the -g flag so you can

				   use your debugger.  After you've edited Make-config type 'make clean'

				   before recompiling.

				   More extensive error checking is done in DEBUG builds

				   (<code>--buildtype debug</code> for meson, <code>build=debug</code> for scons).

				</p>

				<p>

				   In your debugger you can set a breakpoint in _mesa_error() to trap Mesa

				   errors.

				   In your debugger you can set a breakpoint in <code>_mesa_error()</code> to trap

				   Mesa errors.

				</p>

				<p>

				   There is a display list printing/debugging facility.  See the end of

				   src/dlist.c for details.

				   <code>src/dlist.c</code> for details.

				</p>

				</div>

									
										2

docs/developers.html
									
				@@ -8,7 +8,7 @@

				<body>

				<div class="header">

				  <h1>The Mesa 3D Graphics Library</h1>

				  The Mesa 3D Graphics Library

				</div>

				<iframe src="contents.html"></iframe>

									
										36

docs/devinfo.html
									
				@@ -8,7 +8,7 @@

				<body>

				<div class="header">

				  <h1>The Mesa 3D Graphics Library</h1>

				  The Mesa 3D Graphics Library

				</div>

				<iframe src="contents.html"></iframe>

				@@ -25,11 +25,12 @@

				<p>

				To add a new GL extension to Mesa you have to do at least the following.

				</p>

				<ul>

				<li>

				   If glext.h doesn't define the extension, edit include/GL/gl.h and add

				   code like this:

				   If <code>glext.h</code> doesn't define the extension, edit

				   <code>include/GL/gl.h</code> and add code like this:

				   <pre>

				     #ifndef GL_EXT_the_extension_name

				     #define GL_EXT_the_extension_name 1

				@@ -40,18 +41,18 @@ To add a new GL extension to Mesa you have to do at least the following.

				   </pre>

				</li>

				<li>

				   In the src/mapi/glapi/gen/ directory, add the new extension functions and

				   enums to the gl_API.xml file.

				   In the <code>src/mapi/glapi/gen/</code> directory, add the new extension

				   functions and enums to the <code>gl_API.xml</code> file.

				   Then, a bunch of source files must be regenerated by executing the

				   corresponding Python scripts.

				</li>

				<li>

				   Add a new entry to the <code>gl_extensions</code> struct in mtypes.h

				   if the extension requires driver capabilities not already exposed by

				   another extension.

				   Add a new entry to the <code>gl_extensions</code> struct in

				   <code>mtypes.h</code> if the extension requires driver capabilities not

				   already exposed by another extension.

				</li>

				<li>

				   Add a new entry to the src/mesa/main/extensions_table.h file.

				   Add a new entry to the <code>src/mesa/main/extensions_table.h</code> file.

				</li>

				<li>

				   From this point, the best way to proceed is to find another extension,

				@@ -59,21 +60,22 @@ To add a new GL extension to Mesa you have to do at least the following.

				   as an example.

				</li>

				<li>

				   If the new extension adds new GL state, the functions in get.c, enable.c

				   and attrib.c will most likely require new code.

				   If the new extension adds new GL state, the functions in

				   <code>get.c</code>, <code>enable.c</code> and <code>attrib.c</code>

				   will most likely require new code.

				</li>

				<li>

				   To determine if the new extension is active in the current context,

				   use the auto-generated _mesa_has_##name_str() function defined in

				   src/mesa/main/extensions.h.

				   use the auto-generated <code>_mesa_has_##name_str()</code> function

				   defined in <code>src/mesa/main/extensions.h</code>.

				</li>

				<li>

				   The dispatch tests check_table.cpp and dispatch_sanity.cpp

				   should be updated with details about the new extensions functions. These

				   tests are run using 'make check'

				   The dispatch tests <code>check_table.cpp</code> and

				   <code>dispatch_sanity.cpp</code> should be updated with details about

				   the new extensions functions. These tests are run using

				   <code>meson test</code>.

				</li>

				</ul>

				</p>

									
										100

docs/dispatch.html
									
												
				@@ -2,19 +2,19 @@

				<html lang="en">

				<head>

				  <meta http-equiv="content-type" content="text/html; charset=utf-8">

				  <title>GL Dispatch in Mesa</title>

				  <title>GL Dispatch</title>

				  <link rel="stylesheet" type="text/css" href="mesa.css">

				</head>

				<body>

				<div class="header">

				  <h1>The Mesa 3D Graphics Library</h1>

				  The Mesa 3D Graphics Library

				</div>

				<iframe src="contents.html"></iframe>

				<div class="content">

				<h1>GL Dispatch in Mesa</h1>

				<h1>GL Dispatch</h1>

				<p>Several factors combine to make efficient dispatch of OpenGL functions

				fairly complicated.  This document attempts to explain some of the issues

				@@ -30,28 +30,28 @@ of the GL related state for the application.  Every texture, every buffer

				object, every enable, and much, much more is stored in the context.  Since

				an application can have more than one context, the context to be used is

				selected by a window-system dependent function such as

				<tt>glXMakeContextCurrent</tt>.</p>

				<code>glXMakeContextCurrent</code>.</p>

				<p>In environments that implement OpenGL with X-Windows using GLX, every GL

				function, including the pointers returned by <tt>glXGetProcAddress</tt>, are

				function, including the pointers returned by <code>glXGetProcAddress</code>, are

				<em>context independent</em>.  This means that no matter what context is

				currently active, the same <tt>glVertex3fv</tt> function is used.</p>

				currently active, the same <code>glVertex3fv</code> function is used.</p>

				<p>This creates the first bit of dispatch complexity.  An application can

				have two GL contexts.  One context is a direct rendering context where

				function calls are routed directly to a driver loaded within the

				application's address space.  The other context is an indirect rendering

				context where function calls are converted to GLX protocol and sent to a

				server.  The same <tt>glVertex3fv</tt> has to do the right thing depending

				server.  The same <code>glVertex3fv</code> has to do the right thing depending

				on which context is current.</p>

				<p>Highly optimized drivers or GLX protocol implementations may want to

				change the behavior of GL functions depending on current state.  For

				example, <tt>glFogCoordf</tt> may operate differently depending on whether

				example, <code>glFogCoordf</code> may operate differently depending on whether

				or not fog is enabled.</p>

				<p>In multi-threaded environments, it is possible for each thread to have a

				different GL context current.  This means that poor old <tt>glVertex3fv</tt>

				different GL context current.  This means that poor old <code>glVertex3fv</code>

				has to know which GL context is current in the thread where it is being

				called.</p>

				@@ -64,18 +64,18 @@ dispatch table stores pointers to functions that actually implement

				specific GL functions.  Each time a new context is made current in a thread,

				these pointers are updated.</p>

				<p>The implementation of functions such as <tt>glVertex3fv</tt> becomes

				<p>The implementation of functions such as <code>glVertex3fv</code> becomes

				conceptually simple:</p>

				<ul>

				<li>Fetch the current dispatch table pointer.</li>

				<li>Fetch the pointer to the real <tt>glVertex3fv</tt> function from the

				<li>Fetch the pointer to the real <code>glVertex3fv</code> function from the

				table.</li>

				<li>Call the real function.</li>

				</ul>

				<p>This can be implemented in just a few lines of C code.  The file

				<tt>src/mesa/glapi/glapitemp.h</tt> contains code very similar to this.</p>

				<code>src/mesa/glapi/glapitemp.h</code> contains code very similar to this.</p>

				<blockquote>

				<table border="1">

				@@ -93,9 +93,9 @@ void glVertex3f(GLfloat x, GLfloat y, GLfloat z)

				overhead that it adds to every GL function call.</p>

				<p>In a multithreaded environment, a naive implementation of

				<tt>GET_DISPATCH</tt> involves a call to <tt>pthread_getspecific</tt> or a

				<code>GET_DISPATCH</code> involves a call to <code>pthread_getspecific</code> or a

				similar function.  Mesa provides a wrapper function called

				<tt>_glapi_get_dispatch</tt> that is used by default.</p>

				<code>_glapi_get_dispatch</code> that is used by default.</p>

				<h2>3. Optimizations</h2>

				@@ -109,7 +109,7 @@ each can or cannot be used are listed.</p>

				<p>The vast majority of OpenGL applications use the API in a single threaded

				manner.  That is, the application has only one thread that makes calls into

				the GL.  In these cases, not only do the calls to

				<tt>pthread_getspecific</tt> hurt performance, but they are completely

				<code>pthread_getspecific</code> hurt performance, but they are completely

				unnecessary!  It is possible to detect this common case and avoid these

				calls.</p>

				@@ -118,15 +118,15 @@ of the executing thread.  If the same thread ID is always seen, Mesa knows

				that the application is, from OpenGL's point of view, single threaded.</p>

				<p>As long as an application is single threaded, Mesa stores a pointer to

				the dispatch table in a global variable called <tt>_glapi_Dispatch</tt>.

				the dispatch table in a global variable called <code>_glapi_Dispatch</code>.

				The pointer is also stored in a per-thread location via

				<tt>pthread_setspecific</tt>.  When Mesa detects that an application has

				become multithreaded, <tt>NULL</tt> is stored in <tt>_glapi_Dispatch</tt>.</p>

				<code>pthread_setspecific</code>.  When Mesa detects that an application has

				become multithreaded, <code>NULL</code> is stored in <code>_glapi_Dispatch</code>.</p>

				<p>Using this simple mechanism the dispatch functions can detect the

				multithreaded case by comparing <tt>_glapi_Dispatch</tt> to <tt>NULL</tt>.

				The resulting implementation of <tt>GET_DISPATCH</tt> is slightly more

				complex, but it avoids the expensive <tt>pthread_getspecific</tt> call in

				multithreaded case by comparing <code>_glapi_Dispatch</code> to <code>NULL</code>.

				The resulting implementation of <code>GET_DISPATCH</code> is slightly more

				complex, but it avoids the expensive <code>pthread_getspecific</code> call in

				the common case.</p>

				<blockquote>

				@@ -134,9 +134,9 @@ the common case.</p>

				<tr><td><pre>

				#define GET_DISPATCH() \

				    (_glapi_Dispatch != NULL) \

				        ? _glapi_Dispatch : pthread_getspecific(&_glapi_Dispatch_key)

				        ? _glapi_Dispatch : pthread_getspecific(&amp;_glapi_Dispatch_key)

				</pre></td></tr>

				<tr><td>Improved <tt>GET_DISPATCH</tt> Implementation</td></tr></table>

				<tr><td>Improved <code>GET_DISPATCH</code> Implementation</td></tr></table>

				</blockquote>

				<h3>3.2. ELF TLS</h3>

				@@ -144,14 +144,14 @@ the common case.</p>

				<p>Starting with the 2.4.20 Linux kernel, each thread is allocated an area

				of per-thread, global storage.  Variables can be put in this area using some

				extensions to GCC.  By storing the dispatch table pointer in this area, the

				expensive call to <tt>pthread_getspecific</tt> and the test of

				<tt>_glapi_Dispatch</tt> can be avoided.</p>

				expensive call to <code>pthread_getspecific</code> and the test of

				<code>_glapi_Dispatch</code> can be avoided.</p>

				<p>The dispatch table pointer is stored in a new variable called

				<tt>_glapi_tls_Dispatch</tt>.  A new variable name is used so that a single

				<code>_glapi_tls_Dispatch</code>.  A new variable name is used so that a single

				libGL can implement both interfaces.  This allows the libGL to operate with

				direct rendering drivers that use either interface.  Once the pointer is

				properly declared, <tt>GET_DISPATCH</tt> becomes a simple variable

				properly declared, <code>GET_DISPATCH</code> becomes a simple variable

				reference.</p>

				<blockquote>

				@@ -162,12 +162,12 @@ extern __thread struct _glapi_table *_glapi_tls_Dispatch

				#define GET_DISPATCH() _glapi_tls_Dispatch

				</pre></td></tr>

				<tr><td>TLS <tt>GET_DISPATCH</tt> Implementation</td></tr></table>

				<tr><td>TLS <code>GET_DISPATCH</code> Implementation</td></tr></table>

				</blockquote>

				<p>Use of this path is controlled by the preprocessor define

				<tt>GLX_USE_TLS</tt>.  Any platform capable of using TLS should use this as

				the default dispatch method.</p>

				<code>USE_ELF_TLS</code>.  Any platform capable of using ELF TLS should use this

				as the default dispatch method.</p>

				<h3>3.3. Assembly Language Dispatch Stubs</h3>

				@@ -185,13 +185,13 @@ ways that the dispatch table pointer can be accessed.  There are four

				different methods that can be used:</p>

				<ol>

				<li>Using <tt>_glapi_Dispatch</tt> directly in builds for non-multithreaded

				<li>Using <code>_glapi_Dispatch</code> directly in builds for non-multithreaded

				environments.</li>

				<li>Using <tt>_glapi_Dispatch</tt> and <tt>_glapi_get_dispatch</tt> in

				<li>Using <code>_glapi_Dispatch</code> and <code>_glapi_get_dispatch</code> in

				multithreaded environments.</li>

				<li>Using <tt>_glapi_Dispatch</tt> and <tt>pthread_getspecific</tt> in

				<li>Using <code>_glapi_Dispatch</code> and <code>pthread_getspecific</code> in

				multithreaded environments.</li>

				<li>Using <tt>_glapi_tls_Dispatch</tt> directly in TLS enabled

				<li>Using <code>_glapi_tls_Dispatch</code> directly in TLS enabled

				multithreaded environments.</li>

				</ol>

				@@ -204,13 +204,13 @@ terribly relevant.</p>

				few preprocessor defines.</p>

				<ul>

				<li>If <tt>GLX_USE_TLS</tt> is defined, method #3 is used.</li>

				<li>If <tt>HAVE_PTHREAD</tt> is defined, method #2 is used.</li>

				<li>If <code>USE_ELF_TLS</code> is defined, method #3 is used.</li>

				<li>If <code>HAVE_PTHREAD</code> is defined, method #2 is used.</li>

				<li>If none of the preceding are defined, method #1 is used.</li>

				</ul>

				<p>Two different techniques are used to handle the various different cases.

				On x86 and SPARC, a macro called <tt>GL_STUB</tt> is used.  In the preamble

				On x86 and SPARC, a macro called <code>GL_STUB</code> is used.  In the preamble

				of the assembly source file different implementations of the macro are

				selected based on the defined preprocessor variables.  The assembly code

				then consists of a series of invocations of the macros such as:

				@@ -220,7 +220,7 @@ then consists of a series of invocations of the macros such as:

				<tr><td><pre>

				GL_STUB(Color3fv, _gloffset_Color3fv)

				</pre></td></tr>

				<tr><td>SPARC Assembly Implementation of <tt>glColor3fv</tt></td></tr></table>

				<tr><td>SPARC Assembly Implementation of <code>glColor3fv</code></td></tr></table>

				</blockquote>

				<p>The benefit of this technique is that changes to the calling pattern

				@@ -231,32 +231,32 @@ changed lines in the assembly code.</p>

				implementation does not change based on the parameters passed to the

				function.  For example, since x86 passes all parameters on the stack, no

				additional code is needed to save and restore function parameters around a

				call to <tt>pthread_getspecific</tt>.  Since x86-64 passes parameters in

				call to <code>pthread_getspecific</code>.  Since x86-64 passes parameters in

				registers, varying amounts of code needs to be inserted around the call to

				<tt>pthread_getspecific</tt> to save and restore the GL function's

				<code>pthread_getspecific</code> to save and restore the GL function's

				parameters.</p>

				<p>The other technique, used by platforms like x86-64 that cannot use the

				first technique, is to insert <tt>#ifdef</tt> within the assembly

				first technique, is to insert <code>#ifdef</code> within the assembly

				implementation of each function.  This makes the assembly file considerably

				larger (e.g., 29,332 lines for <tt>glapi_x86-64.S</tt> versus 1,155 lines for

				<tt>glapi_x86.S</tt>) and causes simple changes to the function

				larger (e.g., 29,332 lines for <code>glapi_x86-64.S</code> versus 1,155 lines for

				<code>glapi_x86.S</code>) and causes simple changes to the function

				implementation to generate many lines of diffs.  Since the assembly files

				are typically generated by scripts (see <a href="#autogen">below</a>), this

				isn't a significant problem.</p>

				<p>Once a new assembly file is created, it must be inserted in the build

				system.  There are two steps to this.  The file must first be added to

				<tt>src/mesa/sources</tt>.  That gets the file built and linked.  The second

				step is to add the correct <tt>#ifdef</tt> magic to

				<tt>src/mesa/glapi/glapi_dispatch.c</tt> to prevent the C version of the

				<code>src/mesa/sources</code>.  That gets the file built and linked.  The second

				step is to add the correct <code>#ifdef</code> magic to

				<code>src/mesa/glapi/glapi_dispatch.c</code> to prevent the C version of the

				dispatch functions from being built.</p>

				<h3 id="fixedsize">3.4. Fixed-Length Dispatch Stubs</h3>

				<p>To implement <tt>glXGetProcAddress</tt>, Mesa stores a table that

				<p>To implement <code>glXGetProcAddress</code>, Mesa stores a table that

				associates function names with pointers to those functions.  This table is

				stored in <tt>src/mesa/glapi/glprocs.h</tt>.  For different reasons on

				stored in <code>src/mesa/glapi/glprocs.h</code>.  For different reasons on

				different platforms, storing all of those pointers is inefficient.  On most

				platforms, including all known platforms that support TLS, we can avoid this

				added overhead.</p>

				@@ -267,8 +267,8 @@ calculated by multiplying the size of the dispatch stub by the offset of the

				function in the table.  This value is then added to the address of the first

				dispatch stub.</p>

				<p>This path is activated by adding the correct <tt>#ifdef</tt> magic to

				<tt>src/mesa/glapi/glapi.c</tt> just before <tt>glprocs.h</tt> is

				<p>This path is activated by adding the correct <code>#ifdef</code> magic to

				<code>src/mesa/glapi/glapi.c</code> just before <code>glprocs.h</code> is

				included.</p>

				<h2 id="autogen">4. Automatic Generation of Dispatch Stubs</h2>
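The three dispatch steps described in dispatch.html (fetch the table pointer, fetch the function pointer, call it) combined with the TLS form of GET_DISPATCH can be sketched roughly as follows. This is only an illustration, not Mesa's code: the sketch_ names, the one-slot table, and the int return value (used so a caller can tell which implementation ran; the real glVertex3f returns void) are assumptions made for the example. It relies on the GCC/Clang __thread extension that the documentation mentions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical one-slot dispatch table; a real table has one slot per GL
 * function. */
struct sketch_table {
    int (*Vertex3f)(float x, float y, float z);
};

/* Per-thread pointer to the current table. It plays the role that
 * _glapi_tls_Dispatch plays in the documentation. */
static __thread struct sketch_table *sketch_tls_dispatch = NULL;

/* With TLS, fetching the dispatch table is a plain variable reference. */
#define GET_DISPATCH() sketch_tls_dispatch

/* Called when a context is made current in the calling thread. */
static void sketch_make_current(struct sketch_table *table)
{
    sketch_tls_dispatch = table;
}

/* Context-independent entry point: fetch the table, fetch the pointer to
 * the real function, call it. */
static int sketch_Vertex3f(float x, float y, float z)
{
    struct sketch_table *table = GET_DISPATCH();

    assert(table != NULL); /* a context must be current in this thread */
    return table->Vertex3f(x, y, z);
}

/* Two stand-in implementations. Each returns a tag identifying itself. */
static int direct_Vertex3f(float x, float y, float z)
{
    (void) x; (void) y; (void) z;
    return 1; /* "direct rendering" path */
}

static int indirect_Vertex3f(float x, float y, float z)
{
    (void) x; (void) y; (void) z;
    return 2; /* "indirect rendering" path */
}

static struct sketch_table direct_table = { direct_Vertex3f };
static struct sketch_table indirect_table = { indirect_Vertex3f };
```

Calling sketch_make_current(&direct_table) and then sketch_Vertex3f(...) reaches direct_Vertex3f; switching to indirect_table reroutes the same entry point without the caller changing. A pthread-based variant would replace the TLS variable with pthread_getspecific, which takes the key by value.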

									
										50

docs/download.html
									
												
				@@ -2,19 +2,21 @@

				<html lang="en">

				<head>

				  <meta http-equiv="content-type" content="text/html; charset=utf-8">

				  <title>Getting Mesa</title>

				  <title>Downloading and Unpacking</title>

				  <link rel="stylesheet" type="text/css" href="mesa.css">

				</head>

				<body>

				<div class="header">

				  <h1>The Mesa 3D Graphics Library</h1>

				  The Mesa 3D Graphics Library

				</div>

				<iframe src="contents.html"></iframe>

				<div class="content">

				<h1>Downloading</h1>

				<h1>Downloading and Unpacking</h1>

				<h2>Downloading</h2>

				<p>

				Primary Mesa download site:

				@@ -25,54 +27,38 @@ or <a href="https://mesa.freedesktop.org/archive/">mesa.freedesktop.org</a>

				<p>

				Starting with the first release of 2017, Mesa's version scheme is

				year-based. Filenames are in the form <tt>mesa-Y.N.P.tar.gz</tt>, where

				<tt>Y</tt> is the year (two digits), <tt>N</tt> is an incremental number

				(starting at 0) and <tt>P</tt> is the patch number (0 for the first

				year-based. Filenames are in the form <code>mesa-Y.N.P.tar.gz</code>, where

				<code>Y</code> is the year (two digits), <code>N</code> is an incremental number

				(starting at 0) and <code>P</code> is the patch number (0 for the first

				release, 1 for the first patch after that).

				</p>

				<p>

				When a new release is coming, release candidates (betas) may be found

				in the same directory, and are recognisable by the

				<tt>mesa-Y.N.P-<b>rc</b>X.tar.gz</tt> filename.

				<code>mesa-Y.N.P-<b>rc</b>X.tar.gz</code> filename.

				</p>

				<h1>Unpacking</h1>

				<h2>Unpacking</h2>

				<p>

				Mesa releases are available in two formats: <tt>.tar.xz</tt> and <tt>.tar.gz</tt>.

				Mesa releases are available in two formats: <code>.tar.xz</code> and <code>.tar.gz</code>.

				</p>

				<p>

				To unpack the tarball:

				</p>

				<pre>

					tar xf mesa-Y.N.P.tar.xz

				</pre>

				or

				<p>or</p>

				<pre>

					tar xf mesa-Y.N.P.tar.gz

				</pre>

				</p>

				<h1>Contents</h1>

				<p>

				After unpacking you'll have these files and directories (among others):

				</p>

				<pre>

				autogen.sh	- Autoconf script for *nix systems

				scons/		- SCons script for Windows builds

				include/	- GL header (include) files

				bin/		- shell scripts for making shared libraries, etc

				docs/		- documentation

				src/		- source code for libraries

				src/mesa	- sources for the main Mesa library and device drivers

				src/gallium     - sources for Gallium and Gallium drivers

				src/glx		- sources for building libGL with full GLX and DRI support

				</pre>

				<h2>Contents</h2>

				<p>

				Proceed to the <a href="install.html">compilation and installation

				@@ -80,7 +66,7 @@ instructions</a>.

				</p>

				<h1>Demos, GLUT, and GLU</h1>

				<h2>Demos, GLUT, and GLU</h2>

				<p>

				A package of SGI's GLU library is available

				@@ -102,9 +88,9 @@ In the past, GLUT, GLU and the Mesa demos were released in conjunction with

				Mesa releases.  But since GLUT, GLU and the demos change infrequently, they

				were split off into their own git repositories:

				<a href="https://cgit.freedesktop.org/mesa/glut/">GLUT</a>,

				<a href="https://cgit.freedesktop.org/mesa/glu/">GLU</a> and

				<a href="https://cgit.freedesktop.org/mesa/demos/">Demos</a>,

				<a href="https://gitlab.freedesktop.org/mesa/glut">GLUT</a>,

				<a href="https://gitlab.freedesktop.org/mesa/glu">GLU</a> and

				<a href="https://gitlab.freedesktop.org/mesa/demos">Demos</a>,

				</p>

				</div>
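The release naming scheme described in download.html (mesa-Y.N.P.tar.gz or .tar.xz, with an -rcX suffix for release candidates) could be checked with a small parser along these lines. This is a minimal sketch: the struct and function names are hypothetical, and it does not enforce that Y has exactly two digits.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical holder for the fields of a release file name. */
struct mesa_release {
    int year;   /* Y: two-digit year */
    int number; /* N: incremental number, starting at 0 */
    int patch;  /* P: patch number, 0 for the first release */
    int rc;     /* X of -rcX, or 0 for a final release */
};

/* Returns 1 and fills *out if name looks like mesa-Y.N.P[-rcX].tar.{gz,xz},
 * otherwise returns 0. */
static int parse_mesa_tarball(const char *name, struct mesa_release *out)
{
    char ext[3] = "";
    int end = 0;

    assert(name != NULL && out != NULL);

    /* Try the release-candidate form first. */
    if (sscanf(name, "mesa-%d.%d.%d-rc%d.tar.%2[gxz]%n",
               &out->year, &out->number, &out->patch, &out->rc,
               ext, &end) != 5) {
        /* Fall back to the final-release form. */
        out->rc = 0;
        end = 0;
        if (sscanf(name, "mesa-%d.%d.%d.tar.%2[gxz]%n",
                   &out->year, &out->number, &out->patch, ext, &end) != 4)
            return 0;
    }

    /* Reject trailing characters and anything other than gz or xz. */
    if (name[end] != '\0')
        return 0;
    return strcmp(ext, "gz") == 0 || strcmp(ext, "xz") == 0;
}
```

For example, "mesa-19.2.0-rc1.tar.gz" would yield year 19, number 2, patch 0 and rc 1, while a name with another extension is rejected.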

									
										45

docs/egl.html
									
												
				@@ -2,19 +2,19 @@

				<html lang="en">

				<head>

				  <meta http-equiv="content-type" content="text/html; charset=utf-8">

				  <title>Mesa EGL</title>

				  <title>EGL</title>

				  <link rel="stylesheet" type="text/css" href="mesa.css">

				</head>

				<body>

				<div class="header">

				  <h1>The Mesa 3D Graphics Library</h1>

				  The Mesa 3D Graphics Library

				</div>

				<iframe src="contents.html"></iframe>

				<div class="content">

				<h1>Mesa EGL</h1>

				<h1>EGL</h1>

				<p>The current version of EGL in Mesa implements EGL 1.4.  More information

				about EGL can be found at

				@@ -33,13 +33,16 @@ directly dispatched to the drivers.</p>

				<ol>

				<li>

				<p>Run <code>configure</code> with the desired client APIs and enable

				the driver for your hardware.  For example</p>

				<p>Configure your build with the desired client APIs and enable

				the driver for your hardware.  For example:</p>

				<pre>

				  $ ./configure --enable-gles1 --enable-gles2 \

				                --with-dri-drivers=... \

				                --with-gallium-drivers=...

				$ meson configure \

				        -D egl=true \

				        -D gles1=true \

				        -D gles2=true \

				        -D dri-drivers=... \

				        -D gallium-drivers=...

				</pre>

				<p>The main library and OpenGL is enabled by default.  The first two options

				@@ -61,7 +64,7 @@ or more EGL drivers.</p>

				time</p>

				<dl>

				<dt><code>--enable-egl</code></dt>

				<dt><code>-D egl=true</code></dt>

				<dd>

				<p>By default, EGL is enabled.  When disabled, the main library and the drivers

				@@ -69,19 +72,11 @@ will not be built.</p>

				</dd>

				<dt><code>--with-egl-driver-dir</code></dt>

				<dd>

				<p>The directory EGL drivers should be installed to.  If not specified, EGL

				drivers will be installed to <code>${libdir}/egl</code>.</p>

				</dd>

				<dt><code>--with-platforms</code></dt>

				<dt><code>-D platforms=...</code></dt>

				<dd>

				<p>List the platforms (window systems) to support.  Its argument is a comma

				separated string such as <code>--with-platforms=x11,drm</code>.  It decides

				separated string such as <code>-D platforms=x11,drm</code>.  It decides

				the platforms a driver may support.  The first listed platform is also used by

				the main library to decide the native platform.</p>

				@@ -90,15 +85,13 @@ the main library to decide the native platform.</p>

				and <code>haiku</code>.

				The <code>android</code> platform can either be built as a system

				component, part of AOSP, using <code>Android.mk</code> files, or

				cross-compiled using appropriate <code>configure</code> options.

				The <code>haiku</code> platform can only be built with SCons.

				cross-compiled using appropriate options.

				Unless for special needs, the build system should

				select the right platforms automatically.</p>

				</dd>

				<dt><code>--enable-gles1</code></dt>

				<dt><code>--enable-gles2</code></dt>

				<dt><code>-D gles1=true</code> and <code>-D gles2=true</code></dt>

				<dd>

				<p>These options enable OpenGL ES support in OpenGL.  The result is one big

				@@ -106,7 +99,7 @@ internal library that supports multiple APIs.</p>

				</dd>

				<dt><code>--enable-shared-glapi</code></dt>

				<dt><code>-D shared-glapi=true</code></dt>

				<dd>

				<p>By default, <code>libGL</code> has its own copy of <code>libglapi</code>.

				@@ -134,9 +127,9 @@ runtime</p>

				<dd>

				<p>This variable specifies the native platform.  The valid values are the same

				as those for <code>--with-platforms</code>.  When the variable is not set,

				as those for <code>-D platforms=...</code>.  When the variable is not set,

				the main library uses the first platform listed in

				<code>--with-platforms</code> as the native platform.</p>

				<code>-D platforms=...</code> as the native platform.</p>

				<p>Extensions like <code>EGL_MESA_drm_display</code> define new functions to

				create displays for non-native platforms.  These extensions are usually used by

									
										717

docs/envvars.html
									
												
				@@ -8,7 +8,7 @@

				<body>

				<div class="header">

				  <h1>The Mesa 3D Graphics Library</h1>

				  The Mesa 3D Graphics Library

				</div>

				<iframe src="contents.html"></iframe>

				@@ -25,145 +25,206 @@ sometimes be useful for debugging end-user issues.

				<h2>LibGL environment variables</h2>

				<ul>

				<li>LIBGL_DEBUG - If defined debug information will be printed to stderr.

				   If set to 'verbose' additional information will be printed.

				<li>LIBGL_DRIVERS_PATH - colon-separated list of paths to search for DRI drivers

				<li>LIBGL_ALWAYS_INDIRECT - if set to `true`, forces an indirect rendering context/connection.

				<li>LIBGL_ALWAYS_SOFTWARE - if set to `true`, always use software rendering

				<li>LIBGL_NO_DRAWARRAYS - if set to `true`, do not use DrawArrays GLX protocol (for debugging)

				<li>LIBGL_SHOW_FPS - print framerate to stdout based on the number of glXSwapBuffers

				    calls per second.

				<li>LIBGL_DRI3_DISABLE - disable DRI3 if set to `true`.

				</ul>

				<dl>

				<dt><code>LIBGL_DEBUG</code></dt>

				<dd>If defined debug information will be printed to stderr.

				    If set to <code>verbose</code> additional information will be

				    printed.</dd>

				<dt><code>LIBGL_DRIVERS_PATH</code></dt>

				<dd>colon-separated list of paths to search for DRI drivers</dd>

				<dt><code>LIBGL_ALWAYS_INDIRECT</code></dt>

				<dd>if set to <code>true</code>, forces an indirect rendering

				    context/connection.</dd>

				<dt><code>LIBGL_ALWAYS_SOFTWARE</code></dt>

				<dd>if set to <code>true</code>, always use software rendering</dd>

				<dt><code>LIBGL_NO_DRAWARRAYS</code></dt>

				<dd>if set to <code>true</code>, do not use DrawArrays GLX protocol

				    (for debugging)</dd>

				<dt><code>LIBGL_SHOW_FPS</code></dt>

				<dd>print framerate to stdout based on the number of

				    <code>glXSwapBuffers</code> calls per second.</dd>

				<dt><code>LIBGL_DRI3_DISABLE</code></dt>

				<dd>disable DRI3 if set to <code>true</code>.</dd>

				</dl>

				<h2>Core Mesa environment variables</h2>

				<dl>

				<dt><code>MESA_NO_ASM</code></dt>

				<dd>if set, disables all assembly language optimizations</dd>

				<dt><code>MESA_NO_MMX</code></dt>

				<dd>if set, disables Intel MMX optimizations</dd>

				<dt><code>MESA_NO_3DNOW</code></dt>

				<dd>if set, disables AMD 3DNow! optimizations</dd>

				<dt><code>MESA_NO_SSE</code></dt>

				<dd>if set, disables Intel SSE optimizations</dd>

				<dt><code>MESA_NO_ERROR</code></dt>

				<dd>if set to 1, error checking is disabled as per <code>KHR_no_error</code>.

				    This will result in undefined behaviour for invalid use of the api, but

				    can reduce CPU use for apps that are known to be error free.</dd>

				<dt><code>MESA_DEBUG</code></dt>

				<dd>if set, error messages are printed to stderr.  For example,

				    if the application generates a <code>GL_INVALID_ENUM</code> error, a

				    corresponding error message indicating where the error occurred, and

				    possibly why, will be printed to stderr. For release builds,

				    <code>MESA_DEBUG</code> defaults to off (no debug output).

				    <code>MESA_DEBUG</code> accepts the following comma-separated list of

				    named flags, which adds extra behaviour to just set

				    <code>MESA_DEBUG=1</code>:

				    <dl>

				      <dt><code>silent</code></dt>

				      <dd>turn off debug messages. Only useful for debug builds.</dd>

				      <dt><code>flush</code></dt>

				      <dd>flush after each drawing command</dd>

				      <dt><code>incomplete_tex</code></dt>

				      <dd>extra debug messages when a texture is incomplete</dd>

				      <dt><code>incomplete_fbo</code></dt>

				      <dd>extra debug messages when a fbo is incomplete</dd>

				      <dt><code>context</code></dt>

				      <dd>create a debug context (see <code>GLX_CONTEXT_DEBUG_BIT_ARB</code>)

				          and print error and performance messages to stderr (or

				          <code>MESA_LOG_FILE</code>).</dd>

				    </dl>

				</dd>

				<dt><code>MESA_LOG_FILE</code></dt>

				<dd>specifies a file name for logging all errors, warnings,

				    etc., rather than stderr</dd>

				<dt><code>MESA_TEX_PROG</code></dt>

				<dd>if set, implement conventional texture env modes with

				    fragment programs (intended for developers only)</dd>

				<dt><code>MESA_TNL_PROG</code></dt>

				<dd>if set, implement conventional vertex transformation operations with

				    vertex programs (intended for developers only). Setting this variable

				    automatically sets the <code>MESA_TEX_PROG</code> variable as well.</dd>

				<dt><code>MESA_EXTENSION_OVERRIDE</code></dt>

				<dd>can be used to enable/disable extensions. A value such as

				    <code>GL_EXT_foo -GL_EXT_bar</code> will enable the

				    <code>GL_EXT_foo</code> extension and disable the

				    <code>GL_EXT_bar</code> extension.</dd>

				<dt><code>MESA_EXTENSION_MAX_YEAR</code></dt>

				<dd>The <code>GL_EXTENSIONS</code> string returned by Mesa is sorted by

				    extension year. If this variable is set to year X, only extensions

				    defined on or before year X will be reported. This is to work-around a

				    bug in some games where the extension string is copied into a fixed-size

				    buffer without truncating. If the extension string is too long, the

				    buffer overrun can cause the game to crash. This is a work-around for

				    that.</dd>

				<dt><code>MESA_GL_VERSION_OVERRIDE</code></dt>

				<dd>changes the value returned by

				<code>glGetString(GL_VERSION)</code> and possibly the GL API type.

				<ul>

				<li>MESA_NO_ASM - if set, disables all assembly language optimizations

				<li>MESA_NO_MMX - if set, disables Intel MMX optimizations

				<li>MESA_NO_3DNOW - if set, disables AMD 3DNow! optimizations

				<li>MESA_NO_SSE - if set, disables Intel SSE optimizations

				<li>MESA_NO_ERROR - if set to 1, error checking is disabled as per KHR_no_error.

				   This will result in undefined behaviour for invalid use of the api, but

				   can reduce CPU use for apps that are known to be error free.</li>

				<li>MESA_DEBUG - if set, error messages are printed to stderr.  For example,

				   if the application generates a GL_INVALID_ENUM error, a corresponding error

				   message indicating where the error occurred, and possibly why, will be

				   printed to stderr.<br>

				   For release builds, MESA_DEBUG defaults to off (no debug output).

				   MESA_DEBUG accepts the following comma-separated list of named

				   flags, which adds extra behaviour to just set MESA_DEBUG=1:

				   <ul>

				     <li>silent - turn off debug messages. Only useful for debug builds.</li>

				     <li>flush - flush after each drawing command</li>

				     <li>incomplete_tex - extra debug messages when a texture is incomplete</li>

				     <li>incomplete_fbo - extra debug messages when a fbo is incomplete</li>

				     <li>context - create a debug context (see GLX_CONTEXT_DEBUG_BIT_ARB) and

				         print error and performance messages to stderr (or MESA_LOG_FILE).</li>

				   </ul>

				<li>MESA_LOG_FILE - specifies a file name for logging all errors, warnings,

				etc., rather than stderr

				<li>MESA_TEX_PROG - if set, implement conventional texture env modes with

				fragment programs (intended for developers only)

				<li>MESA_TNL_PROG - if set, implement conventional vertex transformation

				operations with vertex programs (intended for developers only).

				Setting this variable automatically sets the MESA_TEX_PROG variable as well.

				<li>MESA_EXTENSION_OVERRIDE - can be used to enable/disable extensions.

				A value such as "GL_EXT_foo -GL_EXT_bar" will enable the GL_EXT_foo extension

				and disable the GL_EXT_bar extension.

				<li>MESA_EXTENSION_MAX_YEAR - The GL_EXTENSIONS string returned by Mesa is sorted

				by extension year.

				If this variable is set to year X, only extensions defined on or before year

				X will be reported.

				This is to work-around a bug in some games where the extension string is

				copied into a fixed-size buffer without truncating.

				If the extension string is too long, the buffer overrun can cause the game

				to crash.

				This is a work-around for that.

				<li>MESA_GL_VERSION_OVERRIDE - changes the value returned by

				glGetString(GL_VERSION) and possibly the GL API type.

				<ul>

				  <li>The format should be MAJOR.MINOR[FC|COMPAT]

				  <li>FC is an optional suffix that indicates a forward compatible

				      context. This is only valid for versions &gt;= 3.0.

				  <li>COMPAT is an optional suffix that indicates a compatibility

				      context or GL_ARB_compatibility support. This is only valid for

				      versions &gt;= 3.1.

				  <li>The format should be <code>MAJOR.MINOR[FC|COMPAT]</code>

				  <li><code>FC</code> is an optional suffix that indicates a forward

				      compatible context. This is only valid for versions &gt;= 3.0.

				  <li><code>COMPAT</code> is an optional suffix that indicates a

				      compatibility context or <code>GL_ARB_compatibility</code> support.

				      This is only valid for versions &gt;= 3.1.

				  <li>GL versions &lt;= 3.0 are set to a compatibility (non-Core)

				      profile

				  <li>GL versions = 3.1, depending on the driver, it may or may not

				      have the ARB_compatibility extension enabled.

				      have the <code>ARB_compatibility</code> extension enabled.

				  <li>GL versions &gt;= 3.2 are set to a Core profile

				  <li>Examples: 2.1, 3.0, 3.0FC, 3.1, 3.1FC, 3.1COMPAT, X.Y, X.YFC,

				      X.YCOMPAT.

				  <ul>

				    <li>2.1 - select a compatibility (non-Core) profile with GL

				        version 2.1.

				    <li>3.0 - select a compatibility (non-Core) profile with GL

				        version 3.0.

				    <li>3.0FC - select a Core+Forward Compatible profile with GL

				        version 3.0.

				    <li>3.1 - select GL version 3.1 with GL_ARB_compatibility enabled

				        per the driver default.

				    <li>3.1FC - select GL version 3.1 with forward compatibility and

				        GL_ARB_compatibility disabled.

				    <li>3.1COMPAT - select GL version 3.1 with GL_ARB_compatibility

				        enabled.

				    <li>X.Y - override GL version to X.Y without changing the profile.

				    <li>X.YFC - select a Core+Forward Compatible profile with GL

				        version X.Y.

				    <li>X.YCOMPAT - select a Compatibility profile with GL version

				        X.Y.

				  </ul>

				  <li>Examples:

				  <dl>

				    <dt><code>2.1</code></dt>

				    <dd>select a compatibility (non-Core) profile with GL version 2.1.</dd>

				    <dt><code>3.0</code></dt>

				    <dd>select a compatibility (non-Core) profile with GL version 3.0.</dd>

				    <dt><code>3.0FC</code></dt>

				    <dd>select a Core+Forward Compatible profile with GL version 3.0.</dd>

				    <dt><code>3.1</code></dt>

				    <dd>select GL version 3.1 with <code>GL_ARB_compatibility</code>

				        enabled per the driver default.</dd>

				    <dt><code>3.1FC</code></dt>

				    <dd>select GL version 3.1 with forward compatibility and

				        <code>GL_ARB_compatibility</code> disabled.</dd>

				    <dt><code>3.1COMPAT</code></dt>

				    <dd>select GL version 3.1 with <code>GL_ARB_compatibility</code>

				        enabled.</dd>

				    <dt><code>X.Y</code></dt>

				    <dd>override GL version to X.Y without changing the profile.</dd>

				    <dt><code>X.YFC</code></dt>

				    <dd>select a Core+Forward Compatible profile with GL version X.Y.</dd>

				    <dt><code>X.YCOMPAT</code></dt>

				    <dd>select a Compatibility profile with GL version X.Y.</dd>

				  </dl>

				  <li>Mesa may not really implement all the features of the given

				      version. (for developers only)

				</ul>

				<li>MESA_GLES_VERSION_OVERRIDE - changes the value returned by

				glGetString(GL_VERSION) for OpenGL ES.

				</dd>

				<dt><code>MESA_GLES_VERSION_OVERRIDE</code></dt>

				<dd>changes the value returned by <code>glGetString(GL_VERSION)</code>

				    for OpenGL ES.

				<ul>

				<li> The format should be MAJOR.MINOR

				<li> Examples: 2.0, 3.0, 3.1

				<li> The format should be <code>MAJOR.MINOR</code>

				<li> Examples: <code>2.0</code>, <code>3.0</code>, <code>3.1</code>

				<li> Mesa may not really implement all the features of the given version.

				(for developers only)

				</ul>

				<li>MESA_GLSL_VERSION_OVERRIDE - changes the value returned by

				glGetString(GL_SHADING_LANGUAGE_VERSION). Valid values are integers, such as

				"130".  Mesa will not really implement all the features of the given language version

				if it's higher than what's normally reported. (for developers only)

				<li>MESA_GLSL_CACHE_DISABLE - if set to `true`, disables the GLSL shader cache

				<li>MESA_GLSL_CACHE_MAX_SIZE - if set, determines the maximum size of

				the on-disk cache of compiled GLSL programs. Should be set to a number

				optionally followed by 'K', 'M', or 'G' to specify a size in

				kilobytes, megabytes, or gigabytes. By default, gigabytes will be

				assumed. And if unset, a maximum size of 1GB will be used. Note: A separate

				cache might be created for each architecture that Mesa is installed for on

				your system. For example under the default settings you may end up with a 1GB

				cache for x86_64 and another 1GB cache for i386.

				<li>MESA_GLSL_CACHE_DIR - if set, determines the directory to be used

				for the on-disk cache of compiled GLSL programs. If this variable is

				not set, then the cache will be stored in $XDG_CACHE_HOME/mesa (if

				that variable is set), or else within .cache/mesa within the user's

				home directory.

				<li>MESA_GLSL - <a href="shading.html#envvars">shading language compiler options</a>

				<li>MESA_NO_MINMAX_CACHE - when set, the minmax index cache is globally disabled.

				<li>MESA_SHADER_CAPTURE_PATH - see <a href="shading.html#capture">Capturing Shaders</a></li>

				<li>MESA_SHADER_DUMP_PATH and MESA_SHADER_READ_PATH - see <a href="shading.html#replacement">Experimenting with Shader Replacements</a></li>

				<li>MESA_VK_VERSION_OVERRIDE - changes the Vulkan physical device version

				    as returned in VkPhysicalDeviceProperties::apiVersion.

				</dd>

				<dt><code>MESA_GLSL_VERSION_OVERRIDE</code></dt>

				<dd>changes the value returned by

				    <code>glGetString(GL_SHADING_LANGUAGE_VERSION)</code>.

				    Valid values are integers, such as <code>130</code>.  Mesa will not

				    really implement all the features of the given language version if

				    it's higher than what's normally reported. (for developers only)

				</dd>

				<dt><code>MESA_GLSL_CACHE_DISABLE</code></dt>

				<dd>if set to <code>true</code>, disables the GLSL shader cache</dd>

				<dt><code>MESA_GLSL_CACHE_MAX_SIZE</code></dt>

				<dd>if set, determines the maximum size of the on-disk cache of compiled GLSL

				    programs. Should be set to a number optionally followed by <code>K</code>,

				    <code>M</code>, or <code>G</code> to specify a size in kilobytes,

				    megabytes, or gigabytes. By default, gigabytes will be assumed. And if

				    unset, a maximum size of 1GB will be used. Note: A separate cache might

				    be created for each architecture that Mesa is installed for on your

				    system. For example under the default settings you may end up with a 1GB

				    cache for x86_64 and another 1GB cache for i386.</dd>

				<dt><code>MESA_GLSL_CACHE_DIR</code></dt>

				<dd>if set, determines the directory to be used for the on-disk cache of

				    compiled GLSL programs. If this variable is not set, then the cache will

				    be stored in <code>$XDG_CACHE_HOME/mesa_shader_cache</code> (if that

				    variable is set), or else within <code>.cache/mesa_shader_cache</code>

				    within the user's home directory.

				</dd>

				<dt><code>MESA_GLSL</code></dt>

				<dd><a href="shading.html#envvars">shading language compiler options</a></dd>

				<dt><code>MESA_NO_MINMAX_CACHE</code></dt>

				<dd>when set, the minmax index cache is globally disabled.</dd>

				<dt><code>MESA_SHADER_CAPTURE_PATH</code></dt>

				<dd>see <a href="shading.html#capture">Capturing Shaders</a></dd>

				<dt><code>MESA_SHADER_DUMP_PATH</code> and <code>MESA_SHADER_READ_PATH</code></dt>

				<dd>see <a href="shading.html#replacement">Experimenting with Shader Replacements</a></dd>

				<dt><code>MESA_VK_VERSION_OVERRIDE</code></dt>

				<dd>changes the Vulkan physical device version

				    as returned in <code>VkPhysicalDeviceProperties::apiVersion</code>.

				  <ul>

				    <li>The format should be MAJOR.MINOR[.PATCH]</li>

				    <li>The format should be <code>MAJOR.MINOR[.PATCH]</code></li>

				    <li>This will not let you force a version higher than the driver's

				        instance versionas advertised by vkEnumerateInstanceVersion</li>

				        instance version as advertised by

				        <code>vkEnumerateInstanceVersion</code></li>

				    <li>This can be very useful for debugging but some features may not be

				        implemented correctly. (For developers only)</li>

				  </ul>

				</li>

				</ul>

				</dd>

				</dl>
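The value formats described above for MESA_GL_VERSION_OVERRIDE and MESA_GLSL_CACHE_MAX_SIZE can be checked before launching an application. The following is a minimal Python sketch, not Mesa's own parser: the function names are hypothetical, and treating K, M and G as powers of 1024 is an assumption that the text does not state.

```python
import re

# Hypothetical helpers; only the value formats come from the text above.
_GL_OVERRIDE = re.compile(r"(\d+)\.(\d+)(FC|COMPAT)?")
_CACHE_SIZE = re.compile(r"(\d+)([KMG]?)")


def parse_gl_version_override(value):
    """Split MAJOR.MINOR[FC|COMPAT] into (major, minor, suffix)."""
    match = _GL_OVERRIDE.fullmatch(value)
    if match is None:
        raise ValueError(f"expected MAJOR.MINOR[FC|COMPAT], got {value!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    suffix = match.group(3) or ""
    # FC is only valid for versions >= 3.0, COMPAT only for versions >= 3.1.
    if suffix == "FC" and (major, minor) < (3, 0):
        raise ValueError("FC requires GL version 3.0 or later")
    if suffix == "COMPAT" and (major, minor) < (3, 1):
        raise ValueError("COMPAT requires GL version 3.1 or later")
    return major, minor, suffix


def parse_cache_max_size(value=None):
    """Return the cache size limit in bytes (assumes 1K = 1024 bytes)."""
    if value is None:
        return 1024 ** 3  # unset: a maximum size of 1GB is used
    match = _CACHE_SIZE.fullmatch(value)
    if match is None:
        raise ValueError(f"expected a number with optional K/M/G, got {value!r}")
    # With no suffix, gigabytes are assumed.
    units = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3, "": 1024 ** 3}
    return int(match.group(1)) * units[match.group(2)]
```

For instance, `parse_gl_version_override("2.1FC")` raises `ValueError`, because the FC suffix is documented as valid only from version 3.0 onwards.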

				<h2>NIR passes environment variables</h2>

				<p>

				The following are only applicable to drivers that use NIR, as they

				modify the behaviour of the common NIR_PASS and NIR_PASS_V macros,

				which wrap calls to NIR lowering/optimizations.

				</p>

				<dl>

				  <dt><code>NIR_PRINT</code></dt>

				  <dd>If defined, the resulting NIR shader will be printed out at each successful NIR lowering/optimization call.</dd>

				  <dt><code>NIR_TEST_CLONE</code></dt>

				  <dd>If defined, cloning of the NIR shader will be tested at each successful NIR lowering/optimization call.</dd>

				  <dt><code>NIR_TEST_SERIALIZE</code></dt>

				  <dd>If defined, serializing and deserializing the NIR shader will be tested at each successful NIR lowering/optimization call.</dd>

				</dl>
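Since these variables take effect simply by being defined, a test harness can set them for a single child process without touching the parent environment. The sketch below assumes Python; the helper name `nir_debug_env` is hypothetical, and the value `"1"` is an assumption, because the text only requires the variable to be defined.

```python
import os

# The three variables documented above.
NIR_PASS_VARIABLES = ("NIR_PRINT", "NIR_TEST_CLONE", "NIR_TEST_SERIALIZE")


def nir_debug_env(enabled, base_env=None):
    """Return a copy of the environment with the chosen NIR variables defined."""
    env = dict(os.environ if base_env is None else base_env)
    for name in enabled:
        if name not in NIR_PASS_VARIABLES:
            raise ValueError(f"unknown NIR pass variable: {name}")
        env[name] = "1"  # assumed value; the text only says "if defined"
    return env
```

The resulting dictionary can be passed as the `env` argument of `subprocess.run` when starting the application under test.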

				<h2>Mesa Xlib driver environment variables</h2>

				@@ -172,80 +233,137 @@ home directory.

				The following are only applicable to the Mesa Xlib software driver.

				See the <a href="xlibdriver.html">Xlib software driver page</a> for details.

				</p>

				<ul>

				<li>MESA_RGB_VISUAL - specifies the X visual and depth for RGB mode

				<li>MESA_CI_VISUAL - specifies the X visual and depth for CI mode

				<li>MESA_BACK_BUFFER - specifies how to implement the back color buffer,

				    either "pixmap" or "ximage"

				<li>MESA_GAMMA - gamma correction coefficients for red, green, blue channels

				<li>MESA_XSYNC - enable synchronous X behavior (for debugging only)

				<li>MESA_GLX_FORCE_CI - if set, force GLX to treat 8bpp visuals as CI visuals

				<li>MESA_GLX_FORCE_ALPHA - if set, forces RGB windows to have an alpha channel.

				<li>MESA_GLX_DEPTH_BITS - specifies default number of bits for depth buffer.

				<li>MESA_GLX_ALPHA_BITS - specifies default number of bits for alpha channel.

				</ul>

				<dl>

				<dt><code>MESA_RGB_VISUAL</code></dt>

				<dd>specifies the X visual and depth for RGB mode</dd>

				<dt><code>MESA_CI_VISUAL</code></dt>

				<dd>specifies the X visual and depth for CI mode</dd>

				<dt><code>MESA_BACK_BUFFER</code></dt>

				<dd>specifies how to implement the back color buffer, either

				    <code>pixmap</code> or <code>ximage</code></dd>

				<dt><code>MESA_GAMMA</code></dt>

				<dd>gamma correction coefficients for red, green, blue channels</dd>

				<dt><code>MESA_XSYNC</code></dt>

				<dd>enable synchronous X behavior (for debugging only)</dd>

				<dt><code>MESA_GLX_FORCE_CI</code></dt>

				<dd>if set, force GLX to treat 8bpp visuals as CI visuals</dd>

				<dt><code>MESA_GLX_FORCE_ALPHA</code></dt>

				<dd>if set, forces RGB windows to have an alpha channel.</dd>

				<dt><code>MESA_GLX_DEPTH_BITS</code></dt>

				<dd>specifies default number of bits for depth buffer.</dd>

				<dt><code>MESA_GLX_ALPHA_BITS</code></dt>

				<dd>specifies default number of bits for alpha channel.</dd>

				</dl>

				<h2>i945/i965 driver environment variables (non-Gallium)</h2>

				<ul>

				<li>INTEL_NO_HW - if set to 1, prevents batches from being submitted to the hardware.

				   This is useful for debugging hangs, etc.</li>

				<li>INTEL_DEBUG - a comma-separated list of named flags, which do various things:

				<ul>

				   <li>ann - annotate IR in assembly dumps</li>

				   <li>aub - dump batches into an AUB trace for use with simulation tools</li>

				   <li>bat - emit batch information</li>

				   <li>blit - emit messages about blit operations</li>

				   <li>blorp - emit messages about the blorp operations (blits &amp; clears)</li>

				   <li>buf - emit messages about buffer objects</li>

				   <li>clip - emit messages about the clip unit (for old gens, includes the CLIP program)</li>

				   <li>color - use color in output</li>

				   <li>cs - dump shader assembly for compute shaders</li>

				   <li>do32 - generate compute shader SIMD32 programs even if workgroup size doesn't exceed the SIMD16 limit</li>

				   <li>dri - emit messages about the DRI interface</li>

				   <li>fbo - emit messages about framebuffers</li>

				   <li>fs - dump shader assembly for fragment shaders</li>

				   <li>gs - dump shader assembly for geometry shaders</li>

				   <li>hex - print instruction hex dump with the disassembly</li>

				   <li>l3 - emit messages about the new L3 state during transitions</li>

				   <li>miptree - emit messages about miptrees</li>

				   <li>no8 - don't generate SIMD8 fragment shader</li>

				   <li>no16 - suppress generation of 16-wide fragment shaders. useful for debugging broken shaders</li>

				   <li>nocompact - disable instruction compaction</li>

				   <li>nodualobj - suppress generation of dual-object geometry shader code</li>

				   <li>norbc - disable single sampled render buffer compression</li>

				   <li>optimizer - dump shader assembly to files at each optimization pass and iteration that make progress</li>

				   <li>perf - emit messages about performance issues</li>

				   <li>perfmon - emit messages about AMD_performance_monitor</li>

				   <li>pix - emit messages about pixel operations</li>

				   <li>prim - emit messages about drawing primitives</li>

				   <li>reemit - mark all state dirty on each draw call</li>

				   <li>sf - emit messages about the strips &amp; fans unit (for old gens, includes the SF program)</li>

				   <li>shader_time - record how much GPU time is spent in each shader</li>

				   <li>spill_fs - force spilling of all registers in the scalar backend (useful to debug spilling code)</li>

				   <li>spill_vec4 - force spilling of all registers in the vec4 backend (useful to debug spilling code)</li>

				   <li>state - emit messages about state flag tracking</li>

				   <li>submit - emit batchbuffer usage statistics</li>

				   <li>sync - after sending each batch, emit a message and wait for that batch to finish rendering</li>

				   <li>tcs - dump shader assembly for tessellation control shaders</li>

				   <li>tes - dump shader assembly for tessellation evaluation shaders</li>

				   <li>tex - emit messages about textures.</li>

				   <li>urb - emit messages about URB setup</li>

				   <li>vert - emit messages about vertex assembly</li>

				   <li>vs - dump shader assembly for vertex shaders</li>

				</ul>

				<li>INTEL_SCALAR_VS (or TCS, TES, GS) - force scalar/vec4 mode for a shader stage (Gen8-9 only)</li>

				<li>INTEL_PRECISE_TRIG - if set to 1, true or yes, then the driver prefers

				   accuracy over performance in trig functions.</li>

				</ul>

				<dl>

				<dt><code>INTEL_NO_HW</code></dt>

				<dd>if set to 1, prevents batches from being submitted to the hardware.

				    This is useful for debugging hangs, etc.</dd>

				<dt><code>INTEL_DEBUG</code></dt>

				<dd>a comma-separated list of named flags, which do various things:

				<dl>

				   <dt><code>ann</code></dt>

				   <dd>annotate IR in assembly dumps</dd>

				   <dt><code>aub</code></dt>

				   <dd>dump batches into an AUB trace for use with simulation tools</dd>

				   <dt><code>bat</code></dt>

				   <dd>emit batch information</dd>

				   <dt><code>blit</code></dt>

				   <dd>emit messages about blit operations</dd>

				   <dt><code>blorp</code></dt>

				   <dd>emit messages about the blorp operations (blits &amp; clears)</dd>

				   <dt><code>buf</code></dt>

				   <dd>emit messages about buffer objects</dd>

				   <dt><code>clip</code></dt>

				   <dd>emit messages about the clip unit (for old gens, includes the CLIP program)</dd>

				   <dt><code>color</code></dt>

				   <dd>use color in output</dd>

				   <dt><code>cs</code></dt>

				   <dd>dump shader assembly for compute shaders</dd>

				   <dt><code>do32</code></dt>

				   <dd>generate compute shader SIMD32 programs even if workgroup size doesn't exceed the SIMD16 limit</dd>

				   <dt><code>dri</code></dt>

				   <dd>emit messages about the DRI interface</dd>

				   <dt><code>fbo</code></dt>

				   <dd>emit messages about framebuffers</dd>

				   <dt><code>fs</code></dt>

				   <dd>dump shader assembly for fragment shaders</dd>

				   <dt><code>gs</code></dt>

				   <dd>dump shader assembly for geometry shaders</dd>

				   <dt><code>hex</code></dt>

				   <dd>print instruction hex dump with the disassembly</dd>

				   <dt><code>l3</code></dt>

				   <dd>emit messages about the new L3 state during transitions</dd>

				   <dt><code>miptree</code></dt>

				   <dd>emit messages about miptrees</dd>

				   <dt><code>no8</code></dt>

				   <dd>don't generate SIMD8 fragment shader</dd>

				   <dt><code>no16</code></dt>

				   <dd>suppress generation of 16-wide fragment shaders. useful for debugging broken shaders</dd>

				   <dt><code>nocompact</code></dt>

				   <dd>disable instruction compaction</dd>

				   <dt><code>nodualobj</code></dt>

				   <dd>suppress generation of dual-object geometry shader code</dd>

				   <dt><code>norbc</code></dt>

				   <dd>disable single sampled render buffer compression</dd>

				   <dt><code>optimizer</code></dt>

				   <dd>dump shader assembly to files at each optimization pass and iteration that make progress</dd>

				   <dt><code>perf</code></dt>

				   <dd>emit messages about performance issues</dd>

				   <dt><code>perfmon</code></dt>

				   <dd>emit messages about <code>AMD_performance_monitor</code></dd>

				   <dt><code>pix</code></dt>

				   <dd>emit messages about pixel operations</dd>

				   <dt><code>prim</code></dt>

				   <dd>emit messages about drawing primitives</dd>

				   <dt><code>reemit</code></dt>

				   <dd>mark all state dirty on each draw call</dd>

				   <dt><code>sf</code></dt>

				   <dd>emit messages about the strips &amp; fans unit (for old gens, includes the SF program)</dd>

				   <dt><code>shader_time</code></dt>

				   <dd>record how much GPU time is spent in each shader</dd>

				   <dt><code>spill_fs</code></dt>

				   <dd>force spilling of all registers in the scalar backend (useful to debug spilling code)</dd>

				   <dt><code>spill_vec4</code></dt>

				   <dd>force spilling of all registers in the vec4 backend (useful to debug spilling code)</dd>

				   <dt><code>state</code></dt>

				   <dd>emit messages about state flag tracking</dd>

				   <dt><code>submit</code></dt>

				   <dd>emit batchbuffer usage statistics</dd>

				   <dt><code>sync</code></dt>

				   <dd>after sending each batch, emit a message and wait for that batch to finish rendering</dd>

				   <dt><code>tcs</code></dt>

				   <dd>dump shader assembly for tessellation control shaders</dd>

				   <dt><code>tes</code></dt>

				   <dd>dump shader assembly for tessellation evaluation shaders</dd>

				   <dt><code>tex</code></dt>

				   <dd>emit messages about textures.</dd>

				   <dt><code>urb</code></dt>

				   <dd>emit messages about URB setup</dd>

				   <dt><code>vert</code></dt>

				   <dd>emit messages about vertex assembly</dd>

				   <dt><code>vs</code></dt>

				   <dd>dump shader assembly for vertex shaders</dd>

				</dl>

				</dd>

				<dt><code>INTEL_SCALAR_VS</code> (or <code>TCS</code>, <code>TES</code>,

				    <code>GS</code>)</dt>

				<dd>force scalar/vec4 mode for a shader stage (Gen8-9 only)</dd>

				<dt><code>INTEL_PRECISE_TRIG</code></dt>

				<dd>if set to 1, true or yes, then the driver prefers accuracy over

				    performance in trig functions.</dd>

				</dl>
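Because INTEL_DEBUG is a comma-separated list of named flags, a misspelled flag is easy to miss. The following Python sketch builds the value from a list and rejects names that are not in the list above. The helper name is hypothetical, and the flag set is copied from this page rather than from the driver source, so it may not match every Mesa version.

```python
# Flag names as listed in the documentation above.
INTEL_DEBUG_FLAGS = frozenset(
    "ann aub bat blit blorp buf clip color cs do32 dri fbo fs gs hex l3 "
    "miptree no8 no16 nocompact nodualobj norbc optimizer perf perfmon pix "
    "prim reemit sf shader_time spill_fs spill_vec4 state submit sync tcs "
    "tes tex urb vert vs".split()
)


def intel_debug_value(flags):
    """Join flags into an INTEL_DEBUG value, rejecting undocumented names."""
    unknown = [flag for flag in flags if flag not in INTEL_DEBUG_FLAGS]
    if unknown:
        raise ValueError(f"undocumented INTEL_DEBUG flags: {unknown}")
    return ",".join(flags)
```

For example, `intel_debug_value(["fs", "vs", "perf"])` yields the string `fs,vs,perf`, which would request fragment and vertex shader assembly dumps together with performance messages.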

				<h2>Radeon driver environment variables (radeon, r200, and r300g)</h2>

				<ul>

				<li>RADEON_NO_TCL - if set, disable hardware-accelerated Transform/Clip/Lighting.

				</ul>

				<dl>

				<dt><code>RADEON_NO_TCL</code></dt>

				<dd>if set, disable hardware-accelerated Transform/Clip/Lighting.</dd>

				</dl>

				<h2>EGL environment variables</h2>

				@@ -258,119 +376,170 @@ Mesa EGL supports different sets of environment variables.  See the

				<h2>Gallium environment variables</h2>

				<ul>

				<li>GALLIUM_HUD - draws various information on the screen, like framerate,

				<dl>

				<dt><code>GALLIUM_HUD</code></dt>

				<dd>draws various information on the screen, like framerate,

				    cpu load, driver statistics, performance counters, etc.

				    Set GALLIUM_HUD=help and run e.g. glxgears for more info.

				<li>GALLIUM_HUD_PERIOD - sets the hud update rate in seconds (float). Use zero

				    to update every frame. The default period is 1/2 second.

				<li>GALLIUM_HUD_VISIBLE - control default visibility, defaults to true.

				<li>GALLIUM_HUD_TOGGLE_SIGNAL - toggle visibility via user specified signal.

				    Set <code>GALLIUM_HUD=help</code> and run e.g.

				    <code>glxgears</code> for more info.</dd>

				<dt><code>GALLIUM_HUD_PERIOD</code></dt>

				<dd>sets the hud update rate in seconds (float). Use zero

				    to update every frame. The default period is 1/2 second.</dd>

				<dt><code>GALLIUM_HUD_VISIBLE</code></dt>

				<dd>control default visibility, defaults to true.</dd>

				<dt><code>GALLIUM_HUD_TOGGLE_SIGNAL</code></dt>

				<dd>toggle visibility via user specified signal.

				    Especially useful to toggle hud at specific points of application and

				    disable for unencumbered viewing the rest of the time. For example, set

				    GALLIUM_HUD_VISIBLE to false and GALLIUM_HUD_TOGGLE_SIGNAL to 10 (SIGUSR1).

				    Use kill -10 &lt;pid&gt; to toggle the hud as desired.

				<li>GALLIUM_HUD_DUMP_DIR - specifies a directory for writing the displayed

				    hud values into files.

				<li>GALLIUM_DRIVER - useful in combination with LIBGL_ALWAYS_SOFTWARE=true for

				    choosing one of the software renderers "softpipe", "llvmpipe" or "swr".

				<li>GALLIUM_LOG_FILE - specifies a file for logging all errors, warnings, etc.

				    rather than stderr.

				<li>GALLIUM_PRINT_OPTIONS - if non-zero, print all the Gallium environment

				    variables which are used, and their current values.

				<li>GALLIUM_DUMP_CPU - if non-zero, print information about the CPU on start-up

				<li>TGSI_PRINT_SANITY - if set, do extra sanity checking on TGSI shaders and

				    print any errors to stderr.

				<LI>DRAW_FSE - ???

				<LI>DRAW_NO_FSE - ???

				<li>DRAW_USE_LLVM - if set to zero, the draw module will not use LLVM to execute

				    shaders, vertex fetch, etc.

				<li>ST_DEBUG - controls debug output from the Mesa/Gallium state tracker.

				Setting to "tgsi", for example, will print all the TGSI shaders.

				See src/mesa/state_tracker/st_debug.c for other options.

				</ul>

				    <code>GALLIUM_HUD_VISIBLE</code> to <code>false</code> and

				    <code>GALLIUM_HUD_TOGGLE_SIGNAL</code> to <code>10</code>

				    (<code>SIGUSR1</code>).

				    Use <code>kill -10 &lt;pid&gt;</code> to toggle the hud as desired.</dd>

				<dt><code>GALLIUM_HUD_DUMP_DIR</code></dt>

				<dd>specifies a directory for writing the displayed hud values into files.</dd>

				<dt><code>GALLIUM_DRIVER</code></dt>

				<dd>useful in combination with <code>LIBGL_ALWAYS_SOFTWARE=true</code> for

				    choosing one of the software renderers <code>softpipe</code>,

				    <code>llvmpipe</code> or <code>swr</code>.</dd>

				<dt><code>GALLIUM_LOG_FILE</code></dt>

				<dd>specifies a file for logging all errors, warnings, etc.

				    rather than stderr.</dd>

				<dt><code>GALLIUM_PRINT_OPTIONS</code></dt>

				<dd>if non-zero, print all the Gallium environment variables which are

				    used, and their current values.</dd>

				<dt><code>GALLIUM_DUMP_CPU</code></dt>

				<dd>if non-zero, print information about the CPU on start-up</dd>

				<dt><code>TGSI_PRINT_SANITY</code></dt>

				<dd>if set, do extra sanity checking on TGSI shaders and

				    print any errors to stderr.</dd>

				<dt><code>DRAW_FSE</code></dt>

				<dd>???</dd>

				<dt><code>DRAW_NO_FSE</code></dt>

				<dd>???</dd>

				<dt><code>DRAW_USE_LLVM</code></dt>

				<dd>if set to zero, the draw module will not use LLVM to execute

				    shaders, vertex fetch, etc.</dd>

				<dt><code>ST_DEBUG</code></dt>

				<dd>controls debug output from the Mesa/Gallium state tracker.

				    Setting to <code>tgsi</code>, for example, will print all the TGSI

				    shaders. See <code>src/mesa/state_tracker/st_debug.c</code> for other

				    options.</dd>

				</dl>
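The signal-based HUD toggle described for GALLIUM_HUD_TOGGLE_SIGNAL can be scripted. This is a minimal sketch for POSIX systems, assuming Python; the function names are hypothetical, and the HUD specification string is left to the caller because this page does not list the accepted values (it points to `GALLIUM_HUD=help` instead).

```python
import os
import signal


def hud_toggle_env(hud_spec, base_env=None, toggle_signal=signal.SIGUSR1):
    """Environment that starts with the HUD hidden and toggles it on a signal."""
    env = dict(os.environ if base_env is None else base_env)
    env["GALLIUM_HUD"] = hud_spec  # caller-supplied; see GALLIUM_HUD=help
    env["GALLIUM_HUD_VISIBLE"] = "false"
    # The variable takes the signal number, e.g. 10 where SIGUSR1 is 10.
    env["GALLIUM_HUD_TOGGLE_SIGNAL"] = str(int(toggle_signal))
    return env


def toggle_hud(pid, toggle_signal=signal.SIGUSR1):
    """Equivalent of `kill -<signal> <pid>` for a running application."""
    os.kill(pid, toggle_signal)
```

Using `signal.SIGUSR1` instead of a literal 10 avoids hard-coding a number that differs between platforms.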

				<h3>Clover state tracker environment variables</h3>

				<ul>

				<li>CLOVER_EXTRA_BUILD_OPTIONS - allows specifying additional compiler and linker

				<dl>

				<dt><code>CLOVER_EXTRA_BUILD_OPTIONS</code></dt>

				<dd>allows specifying additional compiler and linker

				    options. Specified options are appended after the options set by the OpenCL

				    program in clBuildProgram.

				<li>CLOVER_EXTRA_COMPILE_OPTIONS - allows specifying additional compiler

				    program in <code>clBuildProgram</code>.</dd>

				<dt><code>CLOVER_EXTRA_COMPILE_OPTIONS</code></dt>

				<dd>allows specifying additional compiler

				    options. Specified options are appended after the options set by the OpenCL

				    program in clCompileProgram.

				<li>CLOVER_EXTRA_LINK_OPTIONS - allows specifying additional linker

				    program in <code>clCompileProgram</code>.</dd>

				<dt><code>CLOVER_EXTRA_LINK_OPTIONS</code></dt>

				<dd>allows specifying additional linker

				    options. Specified options are appended after the options set by the OpenCL

				    program in clLinkProgram.

				</ul>

				    program in <code>clLinkProgram</code>.</dd>

				</dl>

				<h3>Softpipe driver environment variables</h3>

				<ul>

				<li>SOFTPIPE_DUMP_FS - if set, the softpipe driver will print fragment shaders

				    to stderr

				<li>SOFTPIPE_DUMP_GS - if set, the softpipe driver will print geometry shaders

				    to stderr

				<li>SOFTPIPE_NO_RAST - if set, rasterization is no-op'd.  For profiling purposes.

				<li>SOFTPIPE_USE_LLVM - if set, the softpipe driver will try to use LLVM JIT for

				    vertex shading processing.

				</ul>

				<dl>

				<dt><code>SOFTPIPE_DUMP_FS</code></dt>

				<dd>if set, the softpipe driver will print fragment shaders to stderr</dd>

				<dt><code>SOFTPIPE_DUMP_GS</code></dt>

				<dd>if set, the softpipe driver will print geometry shaders to stderr</dd>

				<dt><code>SOFTPIPE_NO_RAST</code></dt>

				<dd>if set, rasterization is no-op'd.  For profiling purposes.</dd>

				<dt><code>SOFTPIPE_USE_LLVM</code></dt>

				<dd>if set, the softpipe driver will try to use LLVM JIT for

				    vertex shading processing.</dd>

				</dl>

				<h3>LLVMpipe driver environment variables</h3>

				<ul>

				<li>LP_NO_RAST - if set LLVMpipe will no-op rasterization

				<li>LP_DEBUG - a comma-separated list of debug options is accepted.  See the

				    source code for details.

				<li>LP_PERF - a comma-separated list of options to selectively no-op various

				    parts of the driver.  See the source code for details.

				<li>LP_NUM_THREADS - an integer indicating how many threads to use for rendering.

				<dl>

				<dt><code>LP_NO_RAST</code></dt>

				<dd>if set LLVMpipe will no-op rasterization</dd>

				<dt><code>LP_DEBUG</code></dt>

				<dd>a comma-separated list of debug options is accepted.  See the

				    source code for details.</dd>

				<dt><code>LP_PERF</code></dt>

				<dd>a comma-separated list of options to selectively no-op various

				    parts of the driver.  See the source code for details.</dd>

				<dt><code>LP_NUM_THREADS</code></dt>

				<dd>an integer indicating how many threads to use for rendering.

				    Zero turns off threading completely.  The default value is the number of CPU

				    cores present.

				</ul>

				    cores present.</dd>

				</dl>
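The LP_NUM_THREADS rule stated above (an integer, zero disables threading, default is the number of CPU cores) can be modelled as follows. This is only a Python sketch of the documented behaviour with a hypothetical function name. It is not the driver's code, and any additional limits the driver may apply are not modelled.

```python
import os


def llvmpipe_thread_count(env=None):
    """Thread count implied by LP_NUM_THREADS, following the text above."""
    env = os.environ if env is None else env
    raw = env.get("LP_NUM_THREADS")
    if raw is None:
        # Default: number of CPU cores (fall back to 1 if it is unknown).
        return os.cpu_count() or 1
    count = int(raw)
    if count < 0:
        raise ValueError("LP_NUM_THREADS must not be negative")
    return count  # 0 means threading is turned off completely
```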

				<h3>VMware SVGA driver environment variables</h3>

				<ul>

				<li>SVGA_FORCE_SWTNL - force use of software vertex transformation

				<li>SVGA_NO_SWTNL - don't allow software vertex transformation fallbacks

				(will often result in incorrect rendering).

				<li>SVGA_DEBUG - for dumping shaders, constant buffers, etc.  See the code

				for details.

				<li>SVGA_EXTRA_LOGGING - if set, enables extra logging to the vmware.log file,

				such as the OpenGL program's name and command line arguments.

				<li>See the driver code for other, lesser-used variables.

				</ul>

				<dl>

				<dt><code>SVGA_FORCE_SWTNL</code></dt>

				<dd>force use of software vertex transformation</dd>

				<dt><code>SVGA_NO_SWTNL</code></dt>

				<dd>don't allow software vertex transformation fallbacks (will often result

				    in incorrect rendering).</dd>

				<dt><code>SVGA_DEBUG</code></dt>

				<dd>for dumping shaders, constant buffers, etc.  See the code for

				    details.</dd>

				<dt><code>SVGA_EXTRA_LOGGING</code></dt>

				<dd>if set, enables extra logging to the <code>vmware.log</code> file,

				    such as the OpenGL program's name and command line arguments.</dd>

				<dt><code>SVGA_NO_LOGGING</code></dt>

				<dd>if set, disables logging to the <code>vmware.log</code> file. This is

				    useful when using Valgrind because it otherwise crashes when

				    initializing the host log feature.</dd>

				</dl>

				<p>See the driver code for other, lesser-used variables.</p>

				<h3>WGL environment variables</h3>

				<ul>

				<li>WGL_SWAP_INTERVAL - to set a swap interval, equivalent to calling

				wglSwapIntervalEXT() in an application.  If this environment variable

				is set, application calls to wglSwapIntervalEXT() will have no effect.

				</ul>

				<dl>

				<dt><code>WGL_SWAP_INTERVAL</code></dt>

				<dd>to set a swap interval, equivalent to calling

				    <code>wglSwapIntervalEXT()</code> in an application.  If this

				    environment variable is set, application calls to

				    <code>wglSwapIntervalEXT()</code> will have no effect.</dd>

				</dl>

				<h3>VA-API state tracker environment variables</h3>

				<ul>

				<li>VAAPI_MPEG4_ENABLED - enable MPEG4 for VA-API, disabled by default.

				</ul>

				<dl>

				<dt><code>VAAPI_MPEG4_ENABLED</code></dt>

				<dd>enable MPEG4 for VA-API, disabled by default.</dd>

				</dl>

				<h3>VC4 driver environment variables</h3>

				<ul>

				<li>VC4_DEBUG - a comma-separated list of named flags, which do various things:

				<ul>

				   <li>cl - dump command list during creation</li>

				   <li>qpu - dump generated QPU instructions</li>

				   <li>qir - dump QPU IR during program compile</li>

				   <li>nir - dump NIR during program compile</li>

				   <li>tgsi - dump TGSI during program compile</li>

				   <li>shaderdb - dump program compile information for shader-db analysis</li>

				   <li>perf - print during performance-related events</li>

				   <li>norast - skip actual hardware execution of commands</li>

				   <li>always_flush - flush after each draw call</li>

				   <li>always_sync - wait for finish after each flush</li>

				   <li>dump - write a GPU command stream trace file (VC4 simulator only)</li>

				</ul>

				</ul>

				<dl>

				<dt><code>VC4_DEBUG</code></dt>

				<dd>a comma-separated list of named flags, which do various things:

				<dl>

				   <dt><code>cl</code></dt>

				   <dd>dump command list during creation</dd>

				   <dt><code>qpu</code></dt>

				   <dd>dump generated QPU instructions</dd>

				   <dt><code>qir</code></dt>

				   <dd>dump QPU IR during program compile</dd>

				   <dt><code>nir</code></dt>

				   <dd>dump NIR during program compile</dd>

				   <dt><code>tgsi</code></dt>

				   <dd>dump TGSI during program compile</dd>

				   <dt><code>shaderdb</code></dt>

				   <dd>dump program compile information for shader-db analysis</dd>

				   <dt><code>perf</code></dt>

				   <dd>print during performance-related events</dd>

				   <dt><code>norast</code></dt>

				   <dd>skip actual hardware execution of commands</dd>

				   <dt><code>always_flush</code></dt>

				   <dd>flush after each draw call</dd>

				   <dt><code>always_sync</code></dt>

				   <dd>wait for finish after each flush</dd>

				   <dt><code>dump</code></dt>

				   <dd>write a GPU command stream trace file (VC4 simulator only)</dd>

				</dl>

				</dd>

				</dl>
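				<p>As a minimal sketch of the comma-separated syntax, the following sets two of the flags listed above for the current shell. The choice of <code>shaderdb</code> and <code>perf</code> is only illustrative, and <code>glxgears</code> is an assumed example program, not something this page prescribes.</p>

```shell
# Flags are joined with commas, as described in the list above.
export VC4_DEBUG=shaderdb,perf

# Programs started from this shell inherit the variable, for example
# (assuming glxgears is installed on a VC4 system):
#   glxgears
```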

				<p>

									
										2

docs/extensions.html
									
												
				@@ -8,7 +8,7 @@

				<body>

				<div class="header">

				  <h1>The Mesa 3D Graphics Library</h1>

				  The Mesa 3D Graphics Library

				</div>

				<iframe src="contents.html"></iframe>

									
										190

docs/faq.html
									
												
				@@ -2,42 +2,32 @@

				<html lang="en">

				<head>

				  <meta http-equiv="content-type" content="text/html; charset=utf-8">

				  <title>Mesa FAQ</title>

				  <title>Frequently Asked Questions</title>

				  <link rel="stylesheet" type="text/css" href="mesa.css">

				</head>

				<body>

				<div class="header">

				  <h1>The Mesa 3D Graphics Library</h1>

				  The Mesa 3D Graphics Library

				</div>

				<iframe src="contents.html"></iframe>

				<div class="content">

				<center>

				<h1>Mesa Frequently Asked Questions</h1>

				Last updated: 9 October 2012

				</center>

				<h1>Frequently Asked Questions</h1>

				Last updated: 19 September 2018

				<br>

				<br>

				<h2>Index</h2>

				<a href="#part1">1. High-level Questions and Answers</a>

				<br>

				<a href="#part2">2. Compilation and Installation Problems</a>

				<br>

				<a href="#part3">3. Runtime / Rendering Problems</a>

				<br>

				<a href="#part4">4. Developer Questions</a>

				<br>

				<br>

				<br>

				<ol>

				  <li><a href="#part1">High-level Questions and Answers</a></li>

				  <li><a href="#part2">Compilation and Installation Problems</a></li>

				  <li><a href="#part3">Runtime / Rendering Problems</a></li>

				  <li><a href="#part4">Developer Questions</a></li>

				</ol>

				<h2 id="part1">1. High-level Questions and Answers</h2>

				<h1 id="part1">1. High-level Questions and Answers</h1>

				<h2>1.1 What is Mesa?</h2>

				<h3>1.1 What is Mesa?</h3>

				<p>

				Mesa is an open-source implementation of the OpenGL specification.

				OpenGL is a programming library for writing interactive 3D applications.

				@@ -106,17 +96,17 @@ the Xlib API:

				<li>The GLX wire protocol is not supported and there's no OpenGL extension

				    loaded by the X server.

				<li>There is no hardware acceleration.

				<li>The OpenGL library, libGL.so, contains everything (the programming API,

				    the GLX functions and all the rendering code).

				<li>The OpenGL library, <code>libGL.so</code>, contains everything (the

				    programming API, the GLX functions and all the rendering code).

				</ul>

				<p>

				Alternately, Mesa acts as the core for a number of OpenGL hardware drivers

				within the DRI (Direct Rendering Infrastructure):

				<ul>

				<li>The libGL.so library provides the GL and GLX API functions, a GLX

				    protocol encoder, and a device driver loader.

				<li>The device driver modules (such as r200_dri.so) contain a built-in

				    copy of the core Mesa code.

				<li>The <code>libGL.so</code> library provides the GL and GLX API functions,

				    a GLX protocol encoder, and a device driver loader.

				<li>The device driver modules (such as <code>r200_dri.so</code>) contain

				    a built-in copy of the core Mesa code.

				<li>The X server loads the GLX module.

				    The GLX module decodes incoming GLX protocol and dispatches the commands

				    to a rendering module.

				@@ -136,7 +126,7 @@ Just follow the Mesa <a href="install.html">compilation instructions</a>.

				<h2>1.6 Are there other open-source implementations of OpenGL?</h2>

				<p>

				Yes, SGI's <a href="http://oss.sgi.com/projects/ogl-sample/index.html">

				Yes, SGI's <a href="http://web.archive.org/web/20171010115110_/http://oss.sgi.com/projects/ogl-sample/index.html">

				OpenGL Sample Implementation (SI)</a> is available.

				The SI was written during the time that OpenGL was originally designed.

				Unfortunately, development of the SI has stagnated.

				@@ -148,8 +138,9 @@ Mesa is much more up to date with modern features and extensions.

				an open-source implementation of OpenGL ES for mobile devices.

				<p>

				<a href="http://www.dsbox.com/minigl.html">miniGL</a>

				is a subset of OpenGL for PalmOS devices.

				<a href="http://web.archive.org/web/20130830162848/http://www.dsbox.com/minigl.html">miniGL</a>

				is a subset of OpenGL for PalmOS devices. The website is gone, but the source

				code can still be found on <a href="https://sourceforge.net/projects/minigl/">sourceforge.net</a>.

				<p>

				<a href="http://bellard.org/TinyGL/">TinyGL</a>

				@@ -179,22 +170,16 @@ popular and feature-complete.

				</p>

				<h2 id="part2">2. Compilation and Installation Problems</h2>

				<br>

				<br>

				<h1 id="part2">2. Compilation and Installation Problems</h1>

				<h2>2.1 What's the easiest way to install Mesa?</h2>

				<h3>2.1 What's the easiest way to install Mesa?</h3>

				<p>

				If you're using a Linux-based system, your distro CD most likely already

				has Mesa packages (like RPM or DEB) which you can easily install.

				</p>

				<h2>2.2 I get undefined symbols such as bgnpolygon, v3f, etc...</h2>

				<h3>2.2 I get undefined symbols such as bgnpolygon, v3f, etc...</h3>

				<p>

				Your application is written in IRIS GL, not OpenGL.

				IRIS GL was the predecessor to OpenGL and is a different thing (almost)

				@@ -203,63 +188,72 @@ Mesa's not the solution.

				</p>

				<h2>2.3 Where is the GLUT library?</h2>

				<h3>2.3 Where is the GLUT library?</h3>

				<p>

				GLUT (OpenGL Utility Toolkit) is no longer in the separate MesaGLUT-x.y.z.tar.gz file.

				GLUT (OpenGL Utility Toolkit) is no longer in the separate

				<code>MesaGLUT-x.y.z.tar.gz</code> file.

				If you don't already have GLUT installed, you should grab 

				<a href="http://freeglut.sourceforge.net/">freeglut</a>.

				</p>

				<h2>2.4 Where is the GLw library?</h2>

				<h3>2.4 Where is the GLw library?</h3>

				<p>

				GLw (OpenGL widget library) is now available from a separate <a href="https://cgit.freedesktop.org/mesa/glw/">git repository</a>.  Unless you're using very old Xt/Motif applications with OpenGL, you shouldn't need it.

				GLw (OpenGL widget library) is now available from a separate <a href="https://gitlab.freedesktop.org/mesa/glw">git repository</a>.  Unless you're using very old Xt/Motif applications with OpenGL, you shouldn't need it.

				</p>

				<h2>2.5 What's the proper place for the libraries and headers?</h2>

				<p>

				On Linux-based systems you'll want to follow the

				<a href="http://oss.sgi.com/projects/ogl-sample/ABI/index.html">Linux ABI</a> standard.

				<a href="https://www.khronos.org/registry/OpenGL/ABI/">Linux ABI</a> standard.

				Basically you'll want the following:

				</p>

				<ul>

				<li>/usr/include/GL/gl.h - the main OpenGL header

				</li><li>/usr/include/GL/glu.h - the OpenGL GLU (utility) header

				</li><li>/usr/include/GL/glx.h - the OpenGL GLX header

				</li><li>/usr/include/GL/glext.h - the OpenGL extensions header

				</li><li>/usr/include/GL/glxext.h - the OpenGL GLX extensions header

				</li><li>/usr/include/GL/osmesa.h - the Mesa off-screen rendering header

				</li><li>/usr/lib/libGL.so - a symlink to libGL.so.1

				</li><li>/usr/lib/libGL.so.1 - a symlink to libGL.so.1.xyz

				</li><li>/usr/lib/libGL.so.xyz - the actual OpenGL/Mesa library.  xyz denotes the

				<dl>

				<dt><code>/usr/include/GL/gl.h</code></dt>

				<dd>the main OpenGL header</dd>

				<dt><code>/usr/include/GL/glu.h</code></dt>

				<dd>the OpenGL GLU (utility) header</dd>

				<dt><code>/usr/include/GL/glx.h</code></dt>

				<dd>the OpenGL GLX header</dd>

				<dt><code>/usr/include/GL/glext.h</code></dt>

				<dd>the OpenGL extensions header</dd>

				<dt><code>/usr/include/GL/glxext.h</code></dt>

				<dd>the OpenGL GLX extensions header</dd>

				<dt><code>/usr/include/GL/osmesa.h</code></dt>

				<dd>the Mesa off-screen rendering header</dd>

				<dt><code>/usr/lib/libGL.so</code></dt>

				<dd>a symlink to <code>libGL.so.1</code></dd>

				<dt><code>/usr/lib/libGL.so.1</code></dt>

				<dd>a symlink to <code>libGL.so.1.xyz</code></dd>

				<dt><code>/usr/lib/libGL.so.xyz</code></dt>

				<dd>the actual OpenGL/Mesa library.  xyz denotes the

				Mesa version number.

				</li></ul>

				</dd>

				</dl>

				<p>

				When configuring Mesa, there are three autoconf options that affect the install

				When configuring Mesa, there are three meson options that affect the install

				location that you should take care with: <code>--prefix</code>,

				<code>--libdir</code>, and <code>--with-dri-driverdir</code>. To install Mesa

				<code>--libdir</code>, and <code>-D dri-drivers-path</code>. To install Mesa

				into the system location where it will be available for all programs to use, set

				<code>--prefix=/usr</code>. Set <code>--libdir</code> to where your Linux

				distribution installs system libraries, usually either <code>/usr/lib</code> or

				<code>/usr/lib64</code>. Set <code>--with-dri-driverdir</code> to the directory

				<code>/usr/lib64</code>. Set <code>-D dri-drivers-path</code> to the directory

				where your Linux distribution installs DRI drivers. To find your system's DRI

				driver directory, try executing <code>find /usr -type d -name dri</code>. For

				example, if the <code>find</code> command listed <code>/usr/lib64/dri</code>,

				then set <code>--with-dri-driverdir=/usr/lib64/dri</code>.

				then set <code>-D dri-drivers-path=/usr/lib64/dri</code>.

				</p>

				<p>

				After determining the correct values for the install location, configure Mesa

				with <code>./configure --prefix=/usr --libdir=xxx --with-dri-driverdir=xxx</code>

				and then install with <code>sudo make install</code>.

				with <code>meson configure --prefix=/usr --libdir=xxx -D dri-drivers-path=xxx</code>

				and then install with <code>sudo ninja install</code>.

				</p>
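				<p>The steps above can be sketched as a small shell script. Both function names are hypothetical helpers introduced here for illustration. The script assumes that <code>--libdir</code> is the parent of the DRI driver directory, which matches the <code>/usr/lib64/dri</code> example but may not hold on every distribution, and it only prints the configure command so that it can be reviewed before it is run.</p>

```shell
#!/bin/sh

# Hypothetical helper: print the first directory named "dri" under a root.
# If several are found, only the first is used; check the result by hand.
find_dri_dir() {
    find "$1" -type d -name dri 2>/dev/null | head -n 1
}

# Hypothetical helper: print (not run) the configure command for a given
# DRI driver directory. Assumption: libdir is the parent of that directory.
meson_configure_cmd() {
    dri_dir=$1
    lib_dir=$(dirname "$dri_dir")
    printf 'meson configure --prefix=/usr --libdir=%s -D dri-drivers-path=%s\n' \
        "$lib_dir" "$dri_dir"
}
```

				<p>For example, <code>meson_configure_cmd "$(find_dri_dir /usr)"</code> prints a command line that can be checked against your distribution's layout and then run by hand.</p>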

				<br>

				<br>

				<h1 id="part3">3. Runtime / Rendering Problems</h1>

				<h2 id="part3">3. Runtime / Rendering Problems</h2>

				<h2>3.1 Rendering is slow / why isn't my graphics hardware being used?</h2>

				<h3>3.1 Rendering is slow / why isn't my graphics hardware being used?</h3>

				<p>

				If Mesa can't use its hardware-accelerated drivers, it falls back on one of its software renderers.

				(e.g. classic swrast, softpipe or llvmpipe)

				@@ -280,60 +274,57 @@ If your DRI-based driver isn't working, go to the

				</p>

				<h2>3.2 I'm seeing errors in depth (Z) buffering.  Why?</h2>

				<h3>3.2 I'm seeing errors in depth (Z) buffering.  Why?</h3>

				<p>

				Make sure the ratio of the far to near clipping planes isn't too great.

				Look

				<a href="https://www.opengl.org/resources/faq/technical/depthbuffer.htm#0040">here</a>

				<a href="https://www.opengl.org/archives/resources/faq/technical/depthbuffer.htm#0040">here</a>

				for details.

				</p>

				<p>

				Mesa uses a 16-bit depth buffer by default which is smaller and faster

				to clear than a 32-bit buffer but not as accurate.

				If you need a deeper buffer you can modify the parameters to

				<code> glXChooseVisual</code> in your code.

				<code>glXChooseVisual</code> in your code.

				</p>

				<h2>3.3 Why isn't depth buffering working at all?</h2>

				<h3>3.3 Why isn't depth buffering working at all?</h3>

				<p>

				Be sure you're requesting a depth-buffered visual.  If you set the MESA_DEBUG

				environment variable it will warn you about trying to enable depth testing

				when you don't have a depth buffer.

				Be sure you're requesting a depth-buffered visual.  If you set the

				<code>MESA_DEBUG</code> environment variable it will warn you about trying

				to enable depth testing when you don't have a depth buffer.

				</p>

				<p>Specifically, make sure <code>glutInitDisplayMode</code> is being called

				with <code>GLUT_DEPTH</code> or <code>glXChooseVisual</code> is being

				called with a non-zero value for GLX_DEPTH_SIZE.

				called with a non-zero value for <code>GLX_DEPTH_SIZE</code>.

				</p>

				<p>This discussion applies to stencil buffers, accumulation buffers and

				alpha channels too.

				</p>

				<h2>3.4 Why does glGetString() always return NULL?</h2>

				<h3>3.4 Why does <code>glGetString()</code> always return <code>NULL</code>?</h3>

				<p>

				Be sure you have an active/current OpenGL rendering context before

				calling glGetString.

				calling <code>glGetString</code>.

				</p>

				<h2>3.5 GL_POINTS and GL_LINES don't touch the right pixels</h2>

				<h3>3.5 <code>GL_POINTS</code> and <code>GL_LINES</code> don't touch the

				right pixels</h3>

				<p>

				If you're trying to draw a filled region by using GL_POINTS or GL_LINES

				and seeing holes or gaps it's because of a float-to-int rounding problem.

				But this is not a bug.

				See Appendix H of the OpenGL Programming Guide - "OpenGL Correctness Tips".

				Basically, applying a translation of (0.375, 0.375, 0.0) to your coordinates

				will fix the problem.

				If you're trying to draw a filled region by using <code>GL_POINTS</code> or

				<code>GL_LINES</code> and seeing holes or gaps it's because of a float-to-int

				rounding problem. But this is not a bug. See Appendix H of the OpenGL

				Programming Guide - "OpenGL Correctness Tips". Basically, applying a

				translation of (0.375, 0.375, 0.0) to your coordinates will fix the problem.

				</p>

				<br>

				<br>

				<h2 id="part4">4. Developer Questions</h2>

				<h1 id="part4">4. Developer Questions</h1>

				<h2>4.1 How can I contribute?</h2>

				<h3>4.1 How can I contribute?</h3>

				<p>

				First, join the <a href="lists.html">mesa-dev mailing list</a>.

				That's where Mesa development is discussed.

				@@ -347,7 +338,7 @@ You should read it.

				extensions, writing hardware drivers (for the DRI), and code optimization.

				</p>

				<h2>4.2 How do I write a new device driver?</h2>

				<h3>4.2 How do I write a new device driver?</h3>

				<p>

				Unfortunately, writing a device driver isn't easy.

				It requires detailed understanding of OpenGL, the Mesa code, and your

				@@ -371,20 +362,19 @@ the archives) is a good way to get information.

				</p>

				<h2>4.3 Why isn't GL_EXT_texture_compression_s3tc implemented in Mesa?</h2>

				<h3>4.3 Why isn't <code>GL_EXT_texture_compression_s3tc</code> implemented in

				Mesa?</h3>

				<p>

				The <a href="http://oss.sgi.com/projects/ogl-sample/registry/EXT/texture_compression_s3tc.txt">specification for the extension</a>

				indicates that there are intellectual property (IP) and/or patent issues

				to be dealt with.

				</p>

				<p>We've been unsuccessful in getting a response from S3 (or whoever owns

				the IP nowadays) to indicate whether or not an open source project can

				implement the extension (specifically the compression/decompression

				algorithms).

				Oh but it is! Prior to 2nd October 2017, the Mesa project did not include s3tc

				support due to intellectual property (IP) and/or patent issues around the s3tc

				algorithm.

				</p>

				<p>

				In the meantime, a 3rd party <a href="https://dri.freedesktop.org/wiki/S3TC">

				plug-in library</a> is available.

				As of Mesa 17.3.0, Mesa now officially supports s3tc, as the patent has expired.

				</p>

				<p>

				In versions prior to this, a 3rd party <a href="https://dri.freedesktop.org/wiki/S3TC">

				plug-in library</a> was required.

				</p>

				</div>

182

docs/features.txt


@@ -63,7 +63,7 @@ GL 3.0, GLSL 1.30 --- all DONE: freedreno, i965, nv50, nvc0, r600, radeonsi, llv
   glVertexAttribI commands                              DONE
   Depth format cube textures                            DONE ()
   GLX_ARB_create_context (GLX 1.4 is required)          DONE
   Multisample anti-aliasing                             DONE (freedreno/a5xx, freedreno (*), llvmpipe (*), softpipe (*), swr (*))
   Multisample anti-aliasing                             DONE (freedreno/a5xx+, freedreno (*), llvmpipe (*), softpipe (*), swr (*))
 (*) freedreno (a2xx-a4xx), llvmpipe, softpipe, and swr have fake Multisample anti-aliasing support
@@ -90,7 +90,7 @@ GL 3.2, GLSL 1.50 --- all DONE: i965, nv50, nvc0, r600, radeonsi, llvmpipe, soft
   GL_ARB_fragment_coord_conventions (Frag shader coord) DONE (freedreno)
   GL_ARB_provoking_vertex (Provoking vertex)            DONE (freedreno)
   GL_ARB_seamless_cube_map (Seamless cubemaps)          DONE (freedreno)
   GL_ARB_texture_multisample (Multisample textures)     DONE (freedreno/a5xx)
   GL_ARB_texture_multisample (Multisample textures)     DONE (freedreno/a5xx+)
   GL_ARB_depth_clamp (Frag depth clamp)                 DONE (freedreno)
   GL_ARB_sync (Fence objects)                           DONE (freedreno)
   GLX_ARB_create_context_profile                        DONE
@@ -115,26 +115,26 @@ GL 4.0, GLSL 4.00 --- all DONE: i965/gen7+, nvc0, r600, radeonsi, virgl
   GL_ARB_draw_buffers_blend                             DONE (freedreno, i965/gen6+, nv50, llvmpipe, softpipe, swr)
   GL_ARB_draw_indirect                                  DONE (freedreno, i965/gen7+, llvmpipe, softpipe, swr)
   GL_ARB_gpu_shader5                                    DONE (i965/gen7+)
   - 'precise' qualifier                                 DONE
   - 'precise' qualifier                                 DONE (softpipe)
   - Dynamically uniform sampler array indices           DONE (softpipe)
   - Dynamically uniform UBO array indices               DONE (freedreno)
   - Implicit signed -> unsigned conversions             DONE
   - Fused multiply-add                                  DONE ()
   - Dynamically uniform UBO array indices               DONE (freedreno, softpipe)
   - Implicit signed -> unsigned conversions             DONE (softpipe)
   - Fused multiply-add                                  DONE (softpipe)
   - Packing/bitfield/conversion functions               DONE (freedreno, softpipe)
   - Enhanced textureGather                              DONE (freedreno, softpipe)
   - Geometry shader instancing                          DONE (llvmpipe, softpipe)
   - Geometry shader multiple streams                    DONE ()
   - Geometry shader multiple streams                    DONE (softpipe)
   - Enhanced per-sample shading                         DONE ()
   - Interpolation functions                             DONE ()
   - New overload resolution rules                       DONE
   GL_ARB_gpu_shader_fp64                                DONE (i965/gen7+, llvmpipe, softpipe)
   GL_ARB_sample_shading                                 DONE (i965/gen6+, nv50)
   - Interpolation functions                             DONE (softpipe)
   - New overload resolution rules                       DONE (softpipe)
   GL_ARB_gpu_shader_fp64                                DONE (i965/gen7+, llvmpipe, softpipe, swr)
   GL_ARB_sample_shading                                 DONE (freedreno/a6xx, i965/gen6+, nv50)
   GL_ARB_shader_subroutine                              DONE (freedreno, i965/gen6+, nv50, llvmpipe, softpipe, swr)
   GL_ARB_tessellation_shader                            DONE (i965/gen7+)
   GL_ARB_texture_buffer_object_rgb32                    DONE (freedreno, i965/gen6+, llvmpipe, softpipe, swr)
   GL_ARB_texture_cube_map_array                         DONE (i965/gen6+, nv50, llvmpipe, softpipe)
   GL_ARB_texture_cube_map_array                         DONE (i965/gen6+, nv50, llvmpipe, softpipe, swr)
   GL_ARB_texture_gather                                 DONE (freedreno, i965/gen6+, nv50, llvmpipe, softpipe, swr)
   GL_ARB_texture_query_lod                              DONE (freedreno, i965, nv50, llvmpipe, softpipe)
   GL_ARB_texture_query_lod                              DONE (freedreno, i965, nv50, llvmpipe, softpipe, swr)
   GL_ARB_transform_feedback2                            DONE (i965/gen6+, nv50, llvmpipe, softpipe, swr)
   GL_ARB_transform_feedback3                            DONE (i965/gen7+, llvmpipe, softpipe, swr)
@@ -145,19 +145,19 @@ GL 4.1, GLSL 4.10 --- all DONE: i965/gen7+, nvc0, r600, radeonsi, virgl
   GL_ARB_get_program_binary                             DONE (0 or 1 binary formats)
   GL_ARB_separate_shader_objects                        DONE (all drivers)
   GL_ARB_shader_precision                               DONE (i965/gen7+, all drivers that support GLSL 4.10)
   GL_ARB_vertex_attrib_64bit                            DONE (i965/gen7+, llvmpipe, softpipe)
   GL_ARB_viewport_array                                 DONE (i965, nv50, llvmpipe, softpipe)
   GL_ARB_vertex_attrib_64bit                            DONE (i965/gen7+, llvmpipe, softpipe, swr)
   GL_ARB_viewport_array                                 DONE (i965, nv50, llvmpipe, softpipe, swr)
 GL 4.2, GLSL 4.20 -- all DONE: i965/gen7+, nvc0, r600, radeonsi, virgl
   GL_ARB_texture_compression_bptc                       DONE (freedreno, i965)
   GL_ARB_compressed_texture_pixel_storage               DONE (all drivers)
   GL_ARB_shader_atomic_counters                         DONE (freedreno/a5xx, i965, softpipe)
   GL_ARB_shader_atomic_counters                         DONE (freedreno/a5xx+, i965, llvmpipe, softpipe)
   GL_ARB_texture_storage                                DONE (all drivers)
   GL_ARB_transform_feedback_instanced                   DONE (freedreno, i965, nv50, llvmpipe, softpipe, swr)
   GL_ARB_base_instance                                  DONE (freedreno, i965, nv50, llvmpipe, softpipe, swr)
   GL_ARB_shader_image_load_store                        DONE (freedreno/a5xx, i965, softpipe)
   GL_ARB_shader_image_load_store                        DONE (freedreno/a5xx+, i965, softpipe)
   GL_ARB_conservative_depth                             DONE (all drivers that support GLSL 1.30)
   GL_ARB_shading_language_420pack                       DONE (all drivers that support GLSL 1.30)
   GL_ARB_shading_language_packing                       DONE (all drivers)
@@ -170,8 +170,8 @@ GL 4.3, GLSL 4.30 -- all DONE: i965/gen8+, nvc0, r600, radeonsi, virgl
   GL_ARB_arrays_of_arrays                               DONE (all drivers that support GLSL 1.30)
   GL_ARB_ES3_compatibility                              DONE (all drivers that support GLSL 3.30)
   GL_ARB_clear_buffer_object                            DONE (all drivers)
   GL_ARB_compute_shader                                 DONE (freedreno/a5xx, i965, softpipe)
   GL_ARB_copy_image                                     DONE (i965, nv50, softpipe, llvmpipe)
   GL_ARB_compute_shader                                 DONE (freedreno/a5xx+, i965, softpipe)
   GL_ARB_copy_image                                     DONE (i965, nv50, softpipe, llvmpipe, swr)
   GL_KHR_debug                                          DONE (all drivers)
   GL_ARB_explicit_uniform_location                      DONE (all drivers that support GLSL)
   GL_ARB_fragment_layer_viewport                        DONE (i965, nv50, llvmpipe, softpipe)
@@ -181,10 +181,10 @@ GL 4.3, GLSL 4.30 -- all DONE: i965/gen8+, nvc0, r600, radeonsi, virgl
   GL_ARB_multi_draw_indirect                            DONE (freedreno, i965, llvmpipe, softpipe, swr)
   GL_ARB_program_interface_query                        DONE (all drivers)
   GL_ARB_robust_buffer_access_behavior                  DONE (i965)
   GL_ARB_shader_image_size                              DONE (freedreno/a5xx, i965, softpipe)
   GL_ARB_shader_storage_buffer_object                   DONE (freedreno/a5xx, i965, softpipe)
   GL_ARB_shader_image_size                              DONE (freedreno/a5xx+, i965, softpipe)
   GL_ARB_shader_storage_buffer_object                   DONE (freedreno/a5xx+, i965, llvmpipe, softpipe)
   GL_ARB_stencil_texturing                              DONE (freedreno, i965/hsw+, nv50, llvmpipe, softpipe, swr)
   GL_ARB_texture_buffer_range                           DONE (freedreno, nv50, i965, llvmpipe)
   GL_ARB_texture_buffer_range                           DONE (freedreno, nv50, i965, softpipe, llvmpipe, swr)
   GL_ARB_texture_query_levels                           DONE (all drivers that support GLSL 1.30)
   GL_ARB_texture_storage_multisample                    DONE (all drivers that support GL_ARB_texture_multisample)
   GL_ARB_texture_view                                   DONE (freedreno, i965, nv50, llvmpipe, softpipe, swr)
@@ -196,7 +196,7 @@ GL 4.4, GLSL 4.40 -- all DONE: i965/gen8+, nvc0, r600, radeonsi
   GL_MAX_VERTEX_ATTRIB_STRIDE                           DONE (all drivers)
   GL_ARB_buffer_storage                                 DONE (freedreno, i965, nv50, llvmpipe, swr)
   GL_ARB_clear_texture                                  DONE (i965, nv50, llvmpipe, softpipe, swr)
   GL_ARB_enhanced_layouts                               DONE (i965, nv50, llvmpipe, softpipe)
   GL_ARB_enhanced_layouts                               DONE (i965, nv50, llvmpipe, softpipe, virgl)
   - compile-time constant expressions                   DONE
   - explicit byte offsets for blocks                    DONE
   - forced alignment within blocks                      DONE
@@ -204,33 +204,33 @@ GL 4.4, GLSL 4.40 -- all DONE: i965/gen8+, nvc0, r600, radeonsi
   - specified transform/feedback layout                 DONE
   - input/output block locations                        DONE
   GL_ARB_multi_bind                                     DONE (all drivers)
   GL_ARB_query_buffer_object                            DONE (i965/hsw+)
   GL_ARB_query_buffer_object                            DONE (i965/hsw+, virgl)
   GL_ARB_texture_mirror_clamp_to_edge                   DONE (i965, nv50, llvmpipe, softpipe, swr, virgl)
   GL_ARB_texture_stencil8                               DONE (freedreno, i965/hsw+, nv50, llvmpipe, softpipe, swr, virgl)
   GL_ARB_vertex_type_10f_11f_11f_rev                    DONE (i965, nv50, llvmpipe, softpipe, swr, virgl)
 GL 4.5, GLSL 4.50 -- all DONE: nvc0, radeonsi
 GL 4.5, GLSL 4.50 -- all DONE: nvc0, radeonsi, r600
   GL_ARB_ES3_1_compatibility                            DONE (i965/hsw+, r600, virgl)
   GL_ARB_clip_control                                   DONE (freedreno, i965, nv50, r600, llvmpipe, softpipe, swr)
   GL_ARB_conditional_render_inverted                    DONE (freedreno, i965, nv50, r600, llvmpipe, softpipe, swr, virgl)
   GL_ARB_cull_distance                                  DONE (i965, nv50, r600, llvmpipe, softpipe, swr, virgl)
   GL_ARB_derivative_control                             DONE (i965, nv50, r600, virgl)
   GL_ARB_ES3_1_compatibility                            DONE (i965/hsw+, softpipe, virgl)
   GL_ARB_clip_control                                   DONE (freedreno, i965, nv50, llvmpipe, softpipe, swr)
   GL_ARB_conditional_render_inverted                    DONE (freedreno, i965, nv50, llvmpipe, softpipe, swr, virgl)
   GL_ARB_cull_distance                                  DONE (i965, nv50, llvmpipe, softpipe, swr, virgl)
   GL_ARB_derivative_control                             DONE (i965, nv50, softpipe, virgl)
   GL_ARB_direct_state_access                            DONE (all drivers)
   GL_ARB_get_texture_sub_image                          DONE (all drivers)
   GL_ARB_shader_texture_image_samples                   DONE (i965, nv50, r600, virgl)
   GL_ARB_texture_barrier                                DONE (freedreno, i965, nv50, r600)
   GL_ARB_shader_texture_image_samples                   DONE (i965, nv50, virgl)
   GL_ARB_texture_barrier                                DONE (freedreno, i965, nv50, virgl)
   GL_KHR_context_flush_control                          DONE (all - but needs GLX/EGL extension to be useful)
   GL_KHR_robustness                                     DONE (i965)
   GL_KHR_robustness                                     DONE (freedreno, i965)
   GL_EXT_shader_integer_mix                             DONE (all drivers that support GLSL)
 GL 4.6, GLSL 4.60
   GL_ARB_gl_spirv                                       in progress (Nicolai Hähnle, Ian Romanick)
   GL_ARB_indirect_parameters                            DONE (i965/gen7+, nvc0, radeonsi)
   GL_ARB_indirect_parameters                            DONE (i965/gen7+, nvc0, radeonsi, virgl)
   GL_ARB_pipeline_statistics_query                      DONE (i965, nvc0, r600, radeonsi, llvmpipe, softpipe, swr)
   GL_ARB_polygon_offset_clamp                           DONE (freedreno, i965, nv50, nvc0, r600, radeonsi, llvmpipe, swr, virgl)
   GL_ARB_shader_atomic_counter_ops                      DONE (freedreno/a5xx, i965/gen7+, nvc0, r600, radeonsi, softpipe, virgl)
   GL_ARB_shader_atomic_counter_ops                      DONE (freedreno/a5xx+, i965/gen7+, nvc0, r600, radeonsi, llvmpipe, softpipe, virgl)
   GL_ARB_shader_draw_parameters                         DONE (i965, nvc0, radeonsi)
   GL_ARB_shader_group_vote                              DONE (i965, nvc0, radeonsi)
   GL_ARB_spirv_extensions                               in progress (Nicolai Hähnle, Ian Romanick)
@@ -244,23 +244,23 @@ These are the extensions cherry-picked to make GLES 3.1
 GLES3.1, GLSL ES 3.1 -- all DONE: i965/hsw+, nvc0, r600, radeonsi, virgl
   GL_ARB_arrays_of_arrays                               DONE (all drivers that support GLSL 1.30)
   GL_ARB_compute_shader                                 DONE (freedreno/a5xx, i965/gen7+, softpipe)
   GL_ARB_compute_shader                                 DONE (freedreno/a5xx+, i965/gen7+, softpipe)
   GL_ARB_draw_indirect                                  DONE (freedreno, i965/gen7+, llvmpipe, softpipe, swr)
   GL_ARB_explicit_uniform_location                      DONE (all drivers that support GLSL)
   GL_ARB_framebuffer_no_attachments                     DONE (freedreno, i965/gen7+, softpipe)
   GL_ARB_program_interface_query                        DONE (all drivers)
   GL_ARB_shader_atomic_counters                         DONE (freedreno/a5xx, i965/gen7+, softpipe)
   GL_ARB_shader_image_load_store                        DONE (freedreno/a5xx, i965/gen7+, softpipe)
   GL_ARB_shader_image_size                              DONE (freedreno/a5xx, i965/gen7+, softpipe)
   GL_ARB_shader_storage_buffer_object                   DONE (freedreno/a5xx, i965/gen7+, softpipe)
   GL_ARB_shader_atomic_counters                         DONE (freedreno/a5xx+, i965/gen7+, llvmpipe, softpipe)
   GL_ARB_shader_image_load_store                        DONE (freedreno/a5xx+, i965/gen7+, softpipe)
   GL_ARB_shader_image_size                              DONE (freedreno/a5xx+, i965/gen7+, softpipe)
   GL_ARB_shader_storage_buffer_object                   DONE (freedreno/a5xx+, i965/gen7+, llvmpipe, softpipe)
   GL_ARB_shading_language_packing                       DONE (all drivers)
   GL_ARB_separate_shader_objects                        DONE (all drivers)
   GL_ARB_stencil_texturing                              DONE (freedreno, nv50, llvmpipe, softpipe, swr)
   GL_ARB_texture_multisample (Multisample textures)     DONE (freedreno/a5xx, i965/gen7+, nv50, llvmpipe, softpipe)
   GL_ARB_texture_multisample (Multisample textures)     DONE (freedreno/a5xx+, i965/gen7+, nv50, llvmpipe, softpipe)
   GL_ARB_texture_storage_multisample                    DONE (all drivers that support GL_ARB_texture_multisample)
   GL_ARB_vertex_attrib_binding                          DONE (all drivers)
   GS5 Enhanced textureGather                            DONE (freedreno, i965/gen7+)
   GS5 Packing/bitfield/conversion functions             DONE (freedreno/a5xx, i965/gen6+)
   GS5 Packing/bitfield/conversion functions             DONE (freedreno/a5xx+, i965/gen6+)
   GL_EXT_shader_integer_mix                             DONE (all drivers that support GLSL)
   Additional functionality not covered above:
@@ -272,25 +272,25 @@ GLES3.1, GLSL ES 3.1 -- all DONE: i965/hsw+, nvc0, r600, radeonsi, virgl
 GLES3.2, GLSL ES 3.2 -- all DONE: i965/gen9+, radeonsi, virgl
   GL_EXT_color_buffer_float                             DONE (all drivers)
   GL_KHR_blend_equation_advanced                        DONE (i965, nvc0)
   GL_KHR_blend_equation_advanced                        DONE (freedreno/a6xx, i965, nvc0)
   GL_KHR_debug                                          DONE (all drivers)
   GL_KHR_robustness                                     DONE (i965, nvc0)
   GL_KHR_robustness                                     DONE (freedreno, i965, nvc0)
   GL_KHR_texture_compression_astc_ldr                   DONE (freedreno, i965/gen9+)
   GL_OES_copy_image                                     DONE (all drivers)
   GL_OES_draw_buffers_indexed                           DONE (all drivers that support GL_ARB_draw_buffers_blend)
   GL_OES_draw_elements_base_vertex                      DONE (all drivers)
   GL_OES_geometry_shader                                DONE (i965/hsw+, nvc0)
   GL_OES_gpu_shader5                                    DONE (all drivers that support GL_ARB_gpu_shader5)
   GL_OES_primitive_bounding_box                         DONE (i965/gen7+, nvc0)
   GL_OES_sample_shading                                 DONE (i965, nvc0, r600)
   GL_OES_sample_variables                               DONE (i965, nvc0, r600)
   GL_OES_geometry_shader                                DONE (i965/hsw+, nvc0, softpipe)
   GL_OES_gpu_shader5                                    DONE (freedreno/a6xx, all drivers that support GL_ARB_gpu_shader5)
   GL_OES_primitive_bounding_box                         DONE (freedreno/a5xx+, i965/gen7+, nvc0, softpipe)
   GL_OES_sample_shading                                 DONE (freedreno/a6xx, i965, nvc0, r600)
   GL_OES_sample_variables                               DONE (freedreno/a6xx, i965, nvc0, r600)
   GL_OES_shader_image_atomic                            DONE (all drivers that support GL_ARB_shader_image_load_store)
   GL_OES_shader_io_blocks                               DONE (All drivers that support GLES 3.1)
   GL_OES_shader_multisample_interpolation               DONE (i965, nvc0, r600)
   GL_OES_shader_multisample_interpolation               DONE (freedreno/a6xx, i965, nvc0, r600)
   GL_OES_tessellation_shader                            DONE (all drivers that support GL_ARB_tessellation_shader)
   GL_OES_texture_border_clamp                           DONE (all drivers)
   GL_OES_texture_buffer                                 DONE (freedreno, i965, nvc0)
   GL_OES_texture_cube_map_array                         DONE (i965/hsw+, nvc0)
   GL_OES_texture_buffer                                 DONE (freedreno, i965, nvc0, softpipe)
   GL_OES_texture_cube_map_array                         DONE (i965/hsw+, nvc0, softpipe)
   GL_OES_texture_stencil8                               DONE (all drivers that support GL_ARB_texture_stencil8)
   GL_OES_texture_storage_multisample_2d_array           DONE (all drivers that support GL_ARB_texture_multisample)
@@ -302,13 +302,13 @@ Khronos, ARB, and OES extensions that are not part of any OpenGL or OpenGL ES ve
   GL_ARB_ES3_2_compatibility                            DONE (i965/gen8+, radeonsi, virgl)
   GL_ARB_fragment_shader_interlock                      DONE (i965)
   GL_ARB_gpu_shader_int64                               DONE (i965/gen8+, nvc0, radeonsi, softpipe, llvmpipe)
   GL_ARB_parallel_shader_compile                        not started, but Chia-I Wu did some related work in 2014
   GL_ARB_post_depth_coverage                            DONE (i965, nvc0)
   GL_ARB_parallel_shader_compile                        DONE (all drivers)
   GL_ARB_post_depth_coverage                            DONE (i965, nvc0, radeonsi)
   GL_ARB_robustness_isolation                           not started
   GL_ARB_sample_locations                               DONE (nvc0)
   GL_ARB_seamless_cubemap_per_texture                   DONE (freedreno, i965, nvc0, radeonsi, r600, softpipe, swr, virgl)
   GL_ARB_seamless_cubemap_per_texture                   DONE (etnaviv/SEAMLESS_CUBE_MAP, freedreno, i965, nvc0, radeonsi, r600, softpipe, swr, virgl)
   GL_ARB_shader_ballot                                  DONE (i965/gen8+, nvc0, radeonsi)
   GL_ARB_shader_clock                                   DONE (i965/gen7+, nv50, nvc0, r600, radeonsi)
   GL_ARB_shader_clock                                   DONE (i965/gen7+, nv50, nvc0, r600, radeonsi, virgl)
   GL_ARB_shader_stencil_export                          DONE (i965/gen9+, r600, radeonsi, softpipe, llvmpipe, swr, virgl)
   GL_ARB_shader_viewport_layer_array                    DONE (i965/gen6+, nvc0, radeonsi)
   GL_ARB_sparse_buffer                                  DONE (radeonsi/CIK+)
@@ -319,13 +319,16 @@ Khronos, ARB, and OES extensions that are not part of any OpenGL or OpenGL ES ve
   GL_EXT_memory_object                                  DONE (radeonsi)
   GL_EXT_memory_object_fd                               DONE (radeonsi)
   GL_EXT_memory_object_win32                            not started
   GL_EXT_render_snorm                                   DONE (i965, radeonsi)
   GL_EXT_semaphore                                      DONE (radeonsi)
   GL_EXT_semaphore_fd                                   DONE (radeonsi)
   GL_EXT_semaphore_win32                                not started
   GL_EXT_texture_norm16                                 DONE (i965, r600, radeonsi, nvc0)
   GL_EXT_sRGB_write_control                             DONE (all drivers that support GLES 3.0+)
   GL_EXT_texture_norm16                                 DONE (freedreno, i965, r600, radeonsi, nvc0)
   GL_EXT_texture_sRGB_R8                                DONE (all drivers that support GLES 3.0+)
   GL_KHR_blend_equation_advanced_coherent               DONE (i965/gen9+)
   GL_KHR_texture_compression_astc_hdr                   DONE (i965/bxt)
   GL_KHR_texture_compression_astc_sliced_3d             DONE (i965/gen9+)
   GL_KHR_texture_compression_astc_sliced_3d             DONE (i965/gen9+, radeonsi)
   GL_OES_depth_texture_cube_map                         DONE (all drivers that support GLSL 1.30+)
   GL_OES_EGL_image                                      DONE (all drivers)
   GL_OES_EGL_image_external                             DONE (all drivers)
@@ -337,12 +340,69 @@ Khronos, ARB, and OES extensions that are not part of any OpenGL or OpenGL ES ve
   GL_OES_texture_float_linear                           DONE (freedreno, i965, r300, r600, radeonsi, nv30, nv50, nvc0, softpipe, llvmpipe)
   GL_OES_texture_half_float                             DONE (freedreno, i965, r300, r600, radeonsi, nv30, nv50, nvc0, softpipe, llvmpipe)
   GL_OES_texture_half_float_linear                      DONE (freedreno, i965, r300, r600, radeonsi, nv30, nv50, nvc0, softpipe, llvmpipe)
   GL_OES_texture_view                                   DONE (i965/gen8+)
   GL_OES_viewport_array                                 DONE (i965, nvc0, radeonsi)
   GL_OES_texture_view                                   DONE (freedreno, i965/gen8+, r600, radeonsi, nv50, nvc0, softpipe, llvmpipe, swr)
   GL_OES_viewport_array                                 DONE (i965, nvc0, radeonsi, softpipe)
   GLX_ARB_context_flush_control                         not started
   GLX_ARB_robustness_application_isolation              not started
   GLX_ARB_robustness_share_group_isolation              not started
 GL_EXT_direct_state_access subfeatures (in the spec order):
   GL 1.1: Client commands                               not started
   GL 1.0-1.3: Matrix and transpose matrix commands      not started
   GL 1.1-1.2: Texture commands                          not started
   GL 1.2: 3D texture commands                           not started
   GL 1.2.1: Multitexture commands                       not started
   GL 1.2.1-3.0: Indexed texture commands                not started
   GL 1.2.1-3.0: Indexed generic queries                 not started
   GL 1.2.1: EnableIndexed.. Get*Indexed                 not started
   GL_ARB_vertex_program                                 not started
   GL 1.3: Compressed texture and multitexture commands  not started
   GL 1.5: Buffer commands                               not started
   GL 2.0-2.1: Uniform and uniform matrix commands       not started
   GL_EXT_texture_buffer_object                          not started
   GL_EXT_texture_integer                                not started
   GL_EXT_gpu_shader4                                    not started
   GL_EXT_gpu_program_parameters                         not started
   GL_NV_gpu_program4                                    n/a
   GL_NV_framebuffer_multisample_coverage                n/a
   GL 3.0: Renderbuffer/framebuffer commands, Gen*Mipmap not started
   GL 3.0: CopyBuffer command                            not started
   GL_EXT_geometry_shader4 commands (expose in GL 3.2)   not started
   GL_NV_explicit_multisample                            n/a
   GL 3.0: Vertex array/attrib/query/map commands        not started
   Matrix GL tokens                                      not started
 GL_EXT_direct_state_access additions from other extensions (complete list):
   GL_AMD_framebuffer_sample_positions                   n/a
   GL_AMD_gpu_shader_int64                               not started
   GL_ARB_bindless_texture                               not started
   GL_ARB_buffer_storage                                 not started
   GL_ARB_clear_buffer_object                            not started
   GL_ARB_framebuffer_no_attachments                     not started
   GL_ARB_gpu_shader_fp64                                not started
   GL_ARB_instanced_arrays                               not started
   GL_ARB_internalformat_query2                          not started
   GL_ARB_sparse_texture                                 n/a
   GL_ARB_sparse_buffer                                  not started
   GL_ARB_texture_buffer_range                           not started
   GL_ARB_texture_storage                                not started
   GL_ARB_texture_storage_multisample                    not started
   GL_ARB_vertex_attrib_64bit                            not started
   GL_ARB_vertex_attrib_binding                          not started
   GL_EXT_buffer_storage                                 not started
   GL_EXT_external_buffer                                not started
   GL_EXT_separate_shader_objects                        n/a
   GL_EXT_sparse_texture                                 n/a
   GL_EXT_texture_storage                                n/a
   GL_EXT_vertex_attrib_64bit                            not started
   GL_EXT_EGL_image_storage                              n/a
   GL_NV_bindless_texture                                n/a
   GL_NV_gpu_shader5                                     n/a
   GL_NV_texture_multisample                             n/a
   GL_NV_vertex_buffer_unified_memory                    n/a
   GL_NVX_linked_gpu_multicast                           n/a
   GLX_NV_copy_buffer                                    n/a
 The following extensions are not part of any OpenGL or OpenGL ES version, and
 we DO NOT WANT implementations of these extensions for Mesa.
@@ -381,7 +441,7 @@ Vulkan 1.1 -- all DONE: anv, radv
   VK_KHR_variable_pointers                              DONE (anv, radv)
 Khronos extensions that are not part of any Vulkan version:
   VK_KHR_8bit_storage                                   DONE (anv)
   VK_KHR_8bit_storage                                   DONE (anv, radv)
   VK_KHR_android_surface                                not started
   VK_KHR_create_renderpass2                             DONE (anv, radv)
   VK_KHR_display                                        DONE (anv, radv)

									
										17

docs/helpwanted.html
									
				@@ -8,13 +8,13 @@

				<body>

				<div class="header">

				  <h1>The Mesa 3D Graphics Library</h1>

				  The Mesa 3D Graphics Library

				</div>

				<iframe src="contents.html"></iframe>

				<div class="content">

				<h1>Help Wanted / To-Do List</h1>

				<h1>Help Wanted</h1>

				<p>

				We can always use more help with the Mesa project.

				@@ -32,8 +32,8 @@ Just applying patches, testing and reporting back is helpful.

				There are plenty of open bugs in the <a href="https://bugs.freedesktop.org/describecomponents.cgi?product=Mesa">bug database</a>.

				<li>

				<b>Remove aliasing warnings.</b>

				Enable gcc -Wstrict-aliasing=2 -fstrict-aliasing and track down aliasing

				issues in the code.

				Enable gcc's <code>-Wstrict-aliasing=2 -fstrict-aliasing</code> arguments, and

				track down aliasing issues in the code.

				<li>

				<b>Contribute more tests to

				<a href="https://piglit.freedesktop.org/">Piglit</a>.</b>
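Note on the aliasing item in the hunk above: the kind of code that `-Wstrict-aliasing=2 -fstrict-aliasing` tends to flag is type punning through a cast pointer. The sketch below is illustrative only and is not taken from the Mesa tree; the function names `float_bits_unsafe` and `float_bits` are hypothetical. It contrasts the flagged pattern (compiled out) with a `memcpy`-based version that is well defined in C, assuming `float` and `uint32_t` have the same size.

```c
#include <stdint.h>
#include <string.h>

#if 0
/* Flagged pattern: reading a float object through a uint32_t lvalue
 * violates the aliasing rules, so the compiler may miscompile it when
 * -fstrict-aliasing is in effect. Kept only for comparison. */
static uint32_t float_bits_unsafe(float f)
{
   return *(uint32_t *)&f;
}
#endif

/* Well-defined alternative: copy the object representation instead of
 * reinterpreting the pointer. Compilers typically turn this memcpy into
 * a single register move. */
static uint32_t float_bits(float f)
{
   uint32_t u;
   memcpy(&u, &f, sizeof(u));
   return u;
}
```

On a platform with IEEE 754 single-precision floats, `float_bits(1.0f)` yields `0x3f800000`. A union-based pun is another option that GCC documents as supported, but `memcpy` is the form that is portable across compilers.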

				@@ -47,8 +47,9 @@ You can find some further To-do lists here:

				<b>Common To-Do lists:</b>

				</p>

				<ul>

				  <li><a href="https://cgit.freedesktop.org/mesa/mesa/tree/docs/features.txt">

				    <b>features.txt</b></a> - Status of OpenGL 3.x / 4.x features in Mesa.</li>

				  <li><a href="https://gitlab.freedesktop.org/mesa/mesa/blob/master/docs/features.txt">

				    <code>features.txt</code></a> - Status of OpenGL 3.x / 4.x features in

				    Mesa.</li>

				</ul>

				<p>

				@@ -56,9 +57,9 @@ You can find some further To-do lists here:

				</p>

				<ul>

				  <li><a href="https://dri.freedesktop.org/wiki/R600ToDo">

				    <b>r600g</b></a> - Driver for ATI/AMD R600 - Northern Island.</li>

				    <code>r600g</code></a> - Driver for ATI/AMD R600 - Northern Island.</li>

				  <li><a href="https://dri.freedesktop.org/wiki/R300ToDo">

				    <b>r300g</b></a> - Driver for ATI R300 - R500.</li>

				    <code>r300g</code></a> - Driver for ATI R300 - R500.</li>

				</ul>

				<p>

									
										314

docs/index.html
									
				@@ -8,7 +8,7 @@

				<body>

				<div class="header">

				  <h1>The Mesa 3D Graphics Library</h1>

				  The Mesa 3D Graphics Library

				</div>

				<iframe src="contents.html"></iframe>

				@@ -16,6 +16,231 @@

				<h1>News</h1>

				<h2>August 7, 2019</h2>

				<p>

				<a href="relnotes/19.1.4.html">Mesa 19.1.4</a> is released.

				This is a bug-fix release.

				</p>

				<h2>July 23, 2019</h2>

				<p>

				<a href="relnotes/19.1.3.html">Mesa 19.1.3</a> is released.

				This is a bug-fix release.

				</p>

				<h2>July 9, 2019</h2>

				<p>

				<a href="relnotes/19.1.2.html">Mesa 19.1.2</a> is released.

				This is a bug-fix release.

				</p>

				<h2>June 26, 2019</h2>

				<p>

				<a href="relnotes/19.0.8.html">Mesa 19.0.8</a> is released.

				This is an emergency bug fix release. Users of 19.0.7 should update to 19.0.8

				or 19.1.1 immediately.

				</p>

				<h2>June 25, 2019</h2>

				<p>

				<a href="relnotes/19.1.1.html">Mesa 19.1.1</a> is released.

				This is a bug-fix release.

				</p>

				<h2>June 24, 2019</h2>

				<p>

				<a href="relnotes/19.0.7.html">Mesa 19.0.7</a> is released.

				This is a bug-fix release.

				</p>

				<p>

				NOTE: It is anticipated that 19.0.7 will be the final release in the

				19.0 series. Users of 19.0 are encouraged to migrate to the 19.1

				series in order to obtain future fixes.

				</p>

				<h2>June 11, 2019</h2>

				<p>

				<a href="relnotes/19.1.0.html">Mesa 19.1.0</a> is released.

				This is a new development release. See the release notes for more

				information about this release

				</p>

				<h2>June 5, 2019</h2>

				<p>

				<a href="relnotes/19.0.6.html">Mesa 19.0.6</a> is released.

				This is a bug-fix release.

				</p>

				<h2>May 21, 2019</h2>

				<p>

				<a href="relnotes/19.0.5.html">Mesa 19.0.5</a> is released.

				This is a bug-fix release.

				</p>

				<h2>May 9, 2019</h2>

				<p>

				<a href="relnotes/19.0.4.html">Mesa 19.0.4</a> is released.

				This is a bug-fix release.

				</p>

				<h2>April 24, 2019</h2>

				<p>

				<a href="relnotes/19.0.3.html">Mesa 19.0.3</a> is released.

				This is a bug-fix release.

				</p>

				<h2>April 10, 2019</h2>

				<p>

				<a href="relnotes/19.0.2.html">Mesa 19.0.2</a> is released.

				This is a bug-fix release.

				</p>

				<h2>April 5, 2019</h2>

				<p>

				<a href="relnotes/18.3.6.html">Mesa 18.3.6</a> is released.

				This is a bug-fix release.

				</p>

				<p>

				NOTE: It is anticipated that 18.3.6 will be the final release in the

				18.3 series. Users of 18.3 are encouraged to migrate to the 19.0

				series in order to obtain future fixes.

				</p>

				<h2>March 27, 2019</h2>

				<p>

				<a href="relnotes/19.0.1.html">Mesa 19.0.1</a> is released.

				This is a bug-fix release.

				</p>

				<h2>March 18, 2019</h2>

				<p>

				<a href="relnotes/18.3.5.html">Mesa 18.3.5</a> is released.

				This is a bug-fix release.

				</p>

				<h2>March 13, 2019</h2>

				<p>

				<a href="relnotes/19.0.0.html">Mesa 19.0.0</a> is released.

				This is a new development release. See the release notes for more

				information about this release

				</p>

				<h2>February 18, 2019</h2>

				<p>

				<a href="relnotes/18.3.4.html">Mesa 18.3.4</a> is released.

				This is a bug-fix release.

				</p>

				<h2>January 31, 2019</h2>

				<p>

				<a href="relnotes/18.3.3.html">Mesa 18.3.3</a> is released.

				This is a bug-fix release.

				</p>

				<h2>January 17, 2019</h2>

				<p>

				<a href="relnotes/18.3.2.html">Mesa 18.3.2</a> is released.

				This is a bug-fix release.

				</p>

				<h2>December 27, 2018</h2>

				<p>

				<a href="relnotes/18.2.8.html">Mesa 18.2.8</a> is released.

				This is a bug-fix release.

				</p>

				<p>

				NOTE: It is anticipated that 18.2.8 will be the final release in the

				18.2 series. Users of 18.2 are encouraged to migrate to the 18.3

				series in order to obtain future fixes.

				</p>

				<h2>December 13, 2018</h2>

				<p>

				<a href="relnotes/18.2.7.html">Mesa 18.2.7</a> is released.

				This is a bug-fix release.

				</p>

				<h2>December 11, 2018</h2>

				<p>

				<a href="relnotes/18.3.1.html">Mesa 18.3.1</a> is released.

				This is a bug-fix release.

				</p>

				<h2>December 7, 2018</h2>

				<p>

				<a href="relnotes/18.3.0.html">Mesa 18.3.0</a> is released.  This is a

				new development release.  See the release notes for more information

				about the release.

				</p>

				<h2>November 28, 2018</h2>

				<p>

				<a href="relnotes/18.2.6.html">Mesa 18.2.6</a> is released.

				This is a bug-fix release.

				</p>

				<h2>November 15, 2018</h2>

				<p>

				<a href="relnotes/18.2.5.html">Mesa 18.2.5</a> is released.

				This is a bug-fix release.

				</p>

				<h2>October 31, 2018</h2>

				<p>

				<a href="relnotes/18.2.4.html">Mesa 18.2.4</a> is released.

				This is a bug-fix release.

				</p>

				<h2>October 19, 2018</h2>

				<p>

				<a href="relnotes/18.2.3.html">Mesa 18.2.3</a> is released.

				This is a bug-fix release.

				</p>

				<h2>October 5, 2018</h2>

				<p>

				<a href="relnotes/18.2.2.html">Mesa 18.2.2</a> is released.

				This is a bug-fix release.

				</p>

				<h2>September 24, 2018</h2>

				<p>

				<a href="relnotes/18.1.9.html">Mesa 18.1.9</a> is released.

				This is a bug-fix release.

				</p>

				<p>

				NOTE: It is anticipated that 18.1.9 will be the final release in the

				18.1 series. Users of 18.1 are encouraged to migrate to the 18.2

				series in order to obtain future fixes.

				</p>

				<h2>September 21, 2018</h2>

				<p>

				<a href="relnotes/18.2.1.html">Mesa 18.2.1</a> is released.

				This is a bug-fix release.

				</p>

				<h2>September 7, 2018</h2>

				<p>

				<a href="relnotes/18.1.8.html">Mesa 18.1.8</a> and

				<a href="relnotes/18.2.0.html">Mesa 18.2.0</a> are released.

				These are, respectively, a bug-fix release from the 18.1 branch and a

				new development release.  See the release notes for more information

				about the releases.

				</p>

				<h2>August 24, 2018</h2>

				<p>

				<a href="relnotes/18.1.7.html">Mesa 18.1.7</a> is released.

				This is a bug-fix release.

				</p>

				<h2>August 13, 2018</h2>

				<p>

				<a href="relnotes/18.1.6.html">Mesa 18.1.6</a> is released.

				This is a bug-fix release.

				</p>

				<h2>July 27, 2018</h2>

				<p>

				<a href="relnotes/18.1.5.html">Mesa 18.1.5</a> is released.

				@@ -44,7 +269,8 @@ This is a bug-fix release.

				<p>

				<a href="relnotes/18.0.5.html">Mesa 18.0.5</a> is released.

				This is a bug-fix release.

				<br>

				</p>

				<p>

				NOTE: It is anticipated that 18.0.5 will be the final release in the

				18.0 series. Users of 18.0 are encouraged to migrate to the 18.1

				series in order to obtain future fixes.

				@@ -91,7 +317,8 @@ This is a bug-fix release.

				<p>

				<a href="relnotes/17.3.9.html">Mesa 17.3.9</a> is released.

				This is a bug-fix release.

				<br>

				</p>

				<p>

				NOTE: It is anticipated that 17.3.9 will be the final release in the

				17.3 series. Users of 17.3 are encouraged to migrate to the 18.0

				series in order to obtain future fixes.

				@@ -150,7 +377,8 @@ This is a bug-fix release.

				<p>

				<a href="relnotes/17.2.8.html">Mesa 17.2.8</a> is released.

				This is a bug-fix release.

				<br>

				</p>

				<p>

				NOTE: It is anticipated that 17.2.8 will be the final release in the

				17.2 series. Users of 17.2 are encouraged to migrate to the 17.3

				series in order to obtain future fixes.

				@@ -209,7 +437,8 @@ This is a bug-fix release.

				<p>

				<a href="relnotes/17.1.10.html">Mesa 17.1.10</a> is released.

				This is a bug-fix release.

				<br>

				</p>

				<p>

				NOTE: It is anticipated that 17.1.10 will be the final release in the

				17.1 series. Users of 17.1 are encouraged to migrate to the 17.2

				series in order to obtain future fixes.

				@@ -280,7 +509,8 @@ This is a bug-fix release.

				<p>

				<a href="relnotes/17.0.7.html">Mesa 17.0.7</a> is released.

				This is a bug-fix release.

				<br>

				</p>

				<p>

				NOTE: It is anticipated that 17.0.7 will be the final release in the 17.0

				series. Users of 17.0 are encouraged to migrate to the 17.1 series in order

				to obtain future fixes.

				@@ -329,7 +559,8 @@ This is a bug-fix release.

				<a href="relnotes/17.0.2.html">Mesa 17.0.2</a> are released.

				These are bug-fix releases from the 13.0 and 17.0 branches, respectively.

				<br>

				</p>

				<p>

				NOTE: It is anticipated that 13.0.6 will be the final release in the 13.0

				series. Users of 13.0 are encouraged to migrate to the 17.0 series in order

				to obtain future fixes.

				@@ -364,7 +595,8 @@ This is a bug-fix release.

				<p>

				<a href="relnotes/12.0.6.html">Mesa 12.0.6</a> is released.

				This is a bug-fix release.

				<br>

				</p>

				<p>

				NOTE: This is an extra release for the 12.0 stable branch, as per developers'

				feedback. It is anticipated that 12.0.6 will be the final release in the 12.0

				series. Users of 12.0 are encouraged to migrate to the 13.0 series in order

				@@ -381,7 +613,8 @@ This is a bug-fix release.

				<p>

				<a href="relnotes/12.0.5.html">Mesa 12.0.5</a> is released.

				This is a bug-fix release.

				<br>

				</p>

				<p>

				NOTE: It is anticipated that 12.0.5 will be the final release in the 12.0

				series. Users of 12.0 are encouraged to migrate to the 13.0 series in order

				to obtain future fixes.

				@@ -443,7 +676,8 @@ about the release.

				<a href="relnotes/11.2.2.html">Mesa 11.2.2</a> are released.

				These are bug-fix releases from the 11.1 and 11.2 branches, respectively.

				<br>

				</p>

				<p>

				NOTE: It is anticipated that 11.1.4 will be the final release in the 11.1.4

				series. Users of 11.1 are encouraged to migrate to the 11.2 series in order

				to obtain future fixes.

				@@ -474,7 +708,8 @@ This is a bug-fix release.

				<p>

				<a href="relnotes/11.0.9.html">Mesa 11.0.9</a> is released.

				This is a bug-fix release.

				<br>

				</p>

				<p>

				NOTE: It is anticipated that 11.0.9 will be the final release in the 11.0

				series. Users of 11.0 are encouraged to migrate to the 11.1 series in order

				to obtain future fixes.

				@@ -538,7 +773,8 @@ This is a bug-fix release.

				<p>

				<a href="relnotes/10.6.9.html">Mesa 10.6.9</a> is released.

				This is a bug-fix release.

				<br>

				</p>

				<p>

				NOTE: It is anticipated that 10.6.9 will be the final release in the 10.6

				series. Users of 10.6 are encouraged to migrate to the 11.0 series in order

				to obtain future fixes.

				@@ -609,7 +845,8 @@ This is a bug-fix release.

				<p>

				<a href="relnotes/10.5.9.html">Mesa 10.5.9</a> is released.

				This is a bug-fix release.

				<br>

				</p>

				<p>

				NOTE: It is anticipated that 10.5.9 will be the final release in the 10.5

				series. Users of 10.5 are encouraged to migrate to the 10.6 series in order

				to obtain future fixes.

				@@ -719,7 +956,8 @@ This is a bug-fix release.

				and <a href="relnotes/10.4.2.html">Mesa 10.4.2</a> are released.

				These are bug-fix releases from the 10.3 and 10.4 branches, respectively.

				<br>

				</p>

				<p>

				NOTE: It is anticipated that 10.3.7 will be the final release in the 10.3

				series. Users of 10.3 are encouraged to migrate to the 10.4 series in order

				to obtain future fixes.

				@@ -770,7 +1008,8 @@ This is a bug-fix release.

				and <a href="relnotes/10.3.1.html">Mesa 10.3.1</a> are released.

				These are bug-fix releases from the 10.2 and 10.3 branches, respectively.

				<br>

				</p>

				<p>

				NOTE: It is anticipated that 10.2.9 will be the final release in the 10.2

				series. Users of 10.2 are encouraged to migrate to the 10.3 series in order

				to obtain future fixes.

				@@ -882,7 +1121,8 @@ This is a bug-fix release.

				<p>

				<a href="relnotes/10.0.5.html">Mesa 10.0.5</a> is released.

				This is a bug-fix release.

				<br>

				</p>

				<p>

				NOTE: Since the 10.1.1 release is being released concurrently, it is

				anticipated that 10.0.5 will be the final release in the 10.0

				series. Users of 10.0 are encouraged to migrate to the 10.1 series in

				@@ -1361,7 +1601,7 @@ with a new test that does over 130 tests of the

				shading language and built-in functions.

				</p>

				<h2>April 2007</h2>

				<h2>April 4, 2007</h2>

				<p>

				Thomas Hellstr&ouml;m of Tungsten Graphics has written a whitepaper

				describing the new DRI memory management system.

				@@ -1814,7 +2054,7 @@ Mesa 5.0.2 has been released.  This is a stable, bug-fix release.

				</pre>

				<h2>June 2003</h2>

				<h2>June 8, 2003</h2>

				<p>

				Mesa's directory tree has been overhauled.

				@@ -2191,7 +2431,7 @@ Here's what's new:</p>

				<h2>April 29, 2001</h2>

				<p>New Mesa website</p>

				<p>Mark Manning produced the new website.<br>Thanks, Mark!</p>

				<p>Mark Manning produced the new website. Thanks, Mark!</p>

				<h2>February 14, 2001</h2>

				@@ -2310,8 +2550,9 @@ just bug fixes.</p>

				</pre>

				<p>Please report any problems with this release ASAP. Bugs should be filed on the

				Mesa3D website at sourceforge.<br>

				After 3.2 is wrapped up I hope to release 3.3 beta 1 soon afterward.</p>

				Mesa3D website at sourceforge.

				</p>

				<p>After 3.2 is wrapped up I hope to release 3.3 beta 1 soon afterward.</p>

				<p>-- Brian</p>

				<h2>December 17, 1999</h2>

				@@ -2356,21 +2597,27 @@ ftp, and CVS services aren't fully restored yet. Please be patient.</p>

				<p>-Brian</p>

				<h2>June 7, 1999</h2>

				<p>RPMS of the nVidia RIVA server can be found at <code>ftp://ftp.mesa3d.org/mesa/misc/nVidia/</code>.</p>

				<p>RPMS of the nVidia RIVA server can be found at

				<a href="ftp://ftp.mesa3d.org/mesa/misc/nVidia/">

				ftp://ftp.mesa3d.org/mesa/misc/nVidia/</a>.</p>

				<h2>June 2, 1999</h2>

				<p><a href="https://www.nvidia.com/">nVidia</a> has released some Linux binaries for

				xfree86 3.3.3.1, along with the <b>full source</b>, which includes GLX acceleration

				based on Mesa 3.0. They can be downloaded from <code>https://www.nvidia.com/Products.nsf/htmlmedia/software_drivers.html</code>.</p>

				based on Mesa 3.0. They can be downloaded from

				<a href="https://www.nvidia.com/Products.nsf/htmlmedia/software_drivers.html">

				https://www.nvidia.com/Products.nsf/htmlmedia/software_drivers.html</a>.</p>

				<h2>May 24, 1999</h2>

				<p>Beta 2 of Mesa 3.1 has been make available at <code>ftp://ftp.mesa3d.org/mesa/beta/</code>.

				If you are into the quake scene, you may want to try this out, as it contains some

				optimizations specifically in the Q3A rendering path.

				<p>Beta 2 of Mesa 3.1 has been make available at

				<a href="ftp://ftp.mesa3d.org/mesa/beta/">ftp://ftp.mesa3d.org/mesa/beta/</a>. If you are into the

				quake scene, you may want to try this out, as it contains some optimizations

				specifically in the Q3A rendering path.

				<h2>May 13, 1999</h2>

				<p>For those interested in the integration of Mesa into XFree86 4.0, Precision Insight

				has posted their lowlevel design documents at <code>http://www.precisioninsight.com</code>.</p>

				has posted their lowlevel design documents at

				<a href="http://www.precisioninsight.com">www.precisioninsight.com</a>.</p>

				<h2>May 13, 1999</h2>

				<pre>May 1999 - John Carmack of id Software, Inc. has made a donation of

				@@ -2396,11 +2643,11 @@ grateful.

				<h2>May 1, 1999</h2>

				<p>John Carmack made an interesting .plan update yesterday:</p>

				<blockquote>

				    <i>"I put together a document on optimizing OpenGL drivers for Q3 that

				    should be helpful to the various Linux 3D teams.</i><br>

				    http://www.quake3arena.com/news/glopt.html"

				</blockquote>

				<pre>

				I put together a document on optimizing OpenGL drivers for Q3 that should be helpful to the various Linux 3D teams.

				http://www.quake3arena.com/news/glopt.html

				</pre>

				<h2>April 7, 1999</h2>

				<p>Updated the Mesa contributors section and added links to RPM Mesa packages.</p>

				@@ -2410,7 +2657,8 @@ grateful.

				<h2>February 16, 1999</h2>

				<p><a href="https://www.sgi.com/">SGI</a> releases its

				<a href="https://www.sgi.com/software/opensource/glx/">GLX source code</a>.</p>

				<a href="http://web.archive.org/web/20040805154836/http://www.sgi.com/software/opensource/glx/download.html">GLX source code</a>.

				</p>

				<h2>January 22, 1999</h2>

				<p><a href="https://www.mesa3d.org">www.mesa3d.org</a> established</p>

									
										80

docs/install.html
									
				@@ -8,7 +8,7 @@

				<body>

				<div class="header">

				  <h1>The Mesa 3D Graphics Library</h1>

				  The Mesa 3D Graphics Library

				</div>

				<iframe src="contents.html"></iframe>

				@@ -22,6 +22,7 @@

				  <li><a href="#prereq-general">General prerequisites</a>

				  <li><a href="#prereq-dri">For DRI and hardware acceleration</a>

				  </ul>

				<li><a href="#meson">Building with meson</a>

				<li><a href="#autoconf">Building with autoconf (Linux/Unix/X11)</a>

				<li><a href="#scons">Building with SCons (Windows/Linux)</a>

				<li><a href="#android">Building with AOSP (Android)</a>

				@@ -30,18 +31,17 @@

				</ol>

				<h1 id="prereq-general">1. Prerequisites for building</h1>

				<h2 id="prereq-general">1. Prerequisites for building</h2>

				<h2>1.1 General</h2>

				<h3>1.1 General</h3>

				<p>

				Build system.

				</p>

				<h4>Build system</h4>

				<ul>

				<li>Autoconf is required when building on *nix platforms.

				<li><a href="https://mesonbuild.com">meson</a> is required when building on *nix platforms.

				<li>Autoconf was removed in 19.1.0, use meson instead

				<li><a href="http://www.scons.org/">SCons</a> is required for building on

				Windows and optional for Linux (it's an alternative to autoconf/automake.)

				Windows and optional for Linux (it's an alternative to meson.)

				</li>

				<li>Android Build system when building as native Android component. Autoconf

				is used when when building ARC.

				@@ -49,6 +49,7 @@ is used when when building ARC.

				</ul>

				<h4>Compiler</h4>

				<p>

				The following compilers are known to work, if you know of others or you're

				willing to maintain support for other compiler get in touch.

				@@ -57,13 +58,12 @@ willing to maintain support for other compiler get in touch.

				<ul>

				<li>GCC 4.2.0 or later (some parts of Mesa may require later versions)

				<li>clang - exact minimum requirement is currently unknown.

				<li>Microsoft Visual Studio 2013 Update 4 or later is required, for building on Windows.

				<li>Microsoft Visual Studio 2015 or later is required, for building on Windows.

				</ul>

				<h4>Third party/extra tools.</h4>

				<p>

				Third party/extra tools.

				<br>

				<strong>Note</strong>: These should not be required, when building from a release tarball. If

				you think you've spotted a bug let developers know by filing a

				<a href="bugs.html">bug report</a>.

				@@ -72,20 +72,21 @@ you think you've spotted a bug let developers know by filing a

				<ul>

				<li><a href="https://www.python.org/">Python</a> - Python is required.

				Version 2.6.4 or later should work.

				When building with scons 2.7 is required.

				When building with meson 3.5 or newer is required.

				</li>

				<li><a href="http://www.makotemplates.org/">Python Mako module</a> -

				Python Mako module is required. Version 0.3.4 or later should work.

				Python Mako module is required. Version 0.8.0 or later should work.

				</li>

				<li>lex / yacc - for building the Mesa IR and GLSL compiler.

				<div>

				<p>

				On Linux systems, flex and bison versions 2.5.35 and 2.4.1, respectively,

				(or later) should work.

				On Windows with MinGW, install flex and bison with:

				<pre>mingw-get install msys-flex msys-bison</pre>

				For MSVC on Windows, install

				<a href="http://winflexbison.sourceforge.net/">Win flex-bison</a>.

				</div>

				</p>

				</ul>

				<p><strong>Note</strong>: Some versions can be buggy (eg. flex 2.6.2) so do try others if things fail.</p>

				@@ -111,29 +112,35 @@ the packaging tool used by your distro.

				  ... # others

				</pre>

				<h1 id="autoconf">2. Building with autoconf (Linux/Unix/X11)</h1>

				<h2 id="meson">2. Building with meson</h2>

				<p>

				The primary method to build Mesa on Unix systems is with autoconf.

				Meson is the latest build system in mesa, it is currently able to build for

				*nix systems like Linux and BSD, and will be able to build for windows as well.

				</p>

				<p>

				The general approach is the standard:

				The general approach is:

				</p>

				<pre>

				  ./configure

				  make

				  sudo make install

				  meson builddir/

				  ninja -C builddir/

				  sudo ninja -C builddir/ install

				</pre>

				<p>

				But please read the <a href="autoconf.html">detailed autoconf instructions</a>

				for more details.

				Please read the <a href="meson.html">detailed meson instructions</a>

				for more information

				</p>

				<h2 id="autoconf">3. Building with autoconf (Linux/Unix/X11)</h2>

				<p>

				  Autoconf support was removed in Mesa 19.1.0. Please use meson instead.

				</p>

				<h1 id="scons">3. Building with SCons (Windows/Linux)</h1>

				<h2 id="scons">4. Building with SCons (Windows/Linux)</h2>

				<p>

				To build Mesa with SCons on Linux or Windows do

				@@ -169,7 +176,7 @@ Additional information is available in <a href="README.WIN32">README.WIN32</a>.

				<h1 id="android">4. Building with AOSP (Android)</h1>

				<h2 id="android">5. Building with AOSP (Android)</h2>

				<p>

				Currently one can build Mesa for Android as part of the AOSP project, yet

				@@ -188,7 +195,7 @@ Android-x86 and/or other resources.

				</p>

				<h1 id="libs">5. Library Information</h1>

				<h2 id="libs">6. Library Information</h2>

				<p>

				When compilation has finished, look in the top-level <code>lib/</code>

				@@ -196,18 +203,17 @@ When compilation has finished, look in the top-level <code>lib/</code>

				You'll see a set of library files similar to this:

				</p>

				<pre>

				lrwxrwxrwx    1 brian    users          10 Mar 26 07:53 libGL.so -> libGL.so.1*

				lrwxrwxrwx    1 brian    users          19 Mar 26 07:53 libGL.so.1 -> libGL.so.1.5.060100*

				lrwxrwxrwx    1 brian    users          10 Mar 26 07:53 libGL.so -&gt; libGL.so.1*

				lrwxrwxrwx    1 brian    users          19 Mar 26 07:53 libGL.so.1 -&gt; libGL.so.1.5.060100*

				-rwxr-xr-x    1 brian    users     3375861 Mar 26 07:53 libGL.so.1.5.060100*

				lrwxrwxrwx    1 brian    users          14 Mar 26 07:53 libOSMesa.so -> libOSMesa.so.6*

				lrwxrwxrwx    1 brian    users          23 Mar 26 07:53 libOSMesa.so.6 -> libOSMesa.so.6.1.060100*

				lrwxrwxrwx    1 brian    users          14 Mar 26 07:53 libOSMesa.so -&gt; libOSMesa.so.6*

				lrwxrwxrwx    1 brian    users          23 Mar 26 07:53 libOSMesa.so.6 -&gt; libOSMesa.so.6.1.060100*

				-rwxr-xr-x    1 brian    users       23871 Mar 26 07:53 libOSMesa.so.6.1.060100*

				</pre>

				<p>

				<b>libGL</b> is the main OpenGL library (i.e. Mesa).

				<br>

				<b>libOSMesa</b> is the OSMesa (Off-Screen) interface library.

				<b>libGL</b> is the main OpenGL library (i.e. Mesa), while <b>libOSMesa</b>

				is the OSMesa (Off-Screen) interface library.

				</p>

				<p>

				@@ -226,10 +232,10 @@ versions of libGL and device drivers.

				</p>

				<h1 id="pkg-config">6. Building OpenGL programs with pkg-config</h1>

				<h2 id="pkg-config">7. Building OpenGL programs with pkg-config</h2>

				<p>

				Running <code>make install</code> will install package configuration files

				Running <code>ninja install</code> will install package configuration files

				for the pkg-config utility.

				</p>

				@@ -245,8 +251,6 @@ For example, compiling and linking a GLUT application can be done with:

				   gcc `pkg-config --cflags --libs glut` mydemo.c -o mydemo

				</pre>

				<br>

				</div>

				</body>

				</html>

									
										30

docs/intro.html
									
												
				@@ -2,13 +2,13 @@

				<html lang="en">

				<head>

				  <meta http-equiv="content-type" content="text/html; charset=utf-8">

				  <title>Mesa Introduction</title>

				  <title>Introduction</title>

				  <link rel="stylesheet" type="text/css" href="mesa.css">

				</head>

				<body>

				<div class="header">

				  <h1>The Mesa 3D Graphics Library</h1>

				  The Mesa 3D Graphics Library

				</div>

				<iframe src="contents.html"></iframe>

				@@ -50,7 +50,7 @@ systems.

				<h1>Project History</h1>

				<h2>Project History</h2>

				<p>

				The Mesa project was originally started by Brian Paul.

				@@ -185,7 +185,7 @@ of the OpenGL, OpenGL ES and Vulkan specifications.

				<h1>Major Versions</h1>

				<h2>Major Versions</h2>

				<p>

				This is a summary of the major versions of Mesa.

				@@ -194,7 +194,7 @@ of the OpenGL specification is implemented.

				</p>

				<h2>Version 12.x features</h2>

				<h3>Version 12.x features</h3>

				<p>

				Version 12.x of Mesa implements the OpenGL 4.3 API, but not all drivers

				support OpenGL 4.3.

				@@ -204,21 +204,21 @@ Initial support for Vulkan is also included.

				</p>

				<h2>Version 11.x features</h2>

				<h3>Version 11.x features</h3>

				<p>

				Version 11.x of Mesa implements the OpenGL 4.1 API, but not all drivers

				support OpenGL 4.1.

				</p>

				<h2>Version 10.x features</h2>

				<h3>Version 10.x features</h3>

				<p>

				Version 10.x of Mesa implements the OpenGL 3.3 API, but not all drivers

				support OpenGL 3.3.

				</p>

				<h2>Version 9.x features</h2>

				<h3>Version 9.x features</h3>

				<p>

				Version 9.x of Mesa implements the OpenGL 3.1 API.

				While the driver for Intel Sandy Bridge and Ivy Bridge is the only

				@@ -233,7 +233,7 @@ tracker for OpenCL.

				</p>

				<h2>Version 8.x features</h2>

				<h3>Version 8.x features</h3>

				<p>

				Version 8.x of Mesa implements the OpenGL 3.0 API.

				The developers at Intel deserve a lot of credit for implementing most

				@@ -242,14 +242,14 @@ the i965 driver.

				</p>

				<h2>Version 7.x features</h2>

				<h3>Version 7.x features</h3>

				<p>

				Version 7.x of Mesa implements the OpenGL 2.1 API.  The main feature

				of OpenGL 2.x is the OpenGL Shading Language.

				</p>

				<h2>Version 6.x features</h2>

				<h3>Version 6.x features</h3>

				<p>

				Version 6.x of Mesa implements the OpenGL 1.5 API with the following

				extensions incorporated as standard features:

				@@ -289,7 +289,7 @@ OpenGL specification</a> for more details.

				<h2>Version 5.x features</h2>

				<h3>Version 5.x features</h3>

				<p>

				Version 5.x of Mesa implements the OpenGL 1.4 API with the following

				extensions incorporated as standard features:

				@@ -315,7 +315,7 @@ extensions incorporated as standard features:

				</ul>

				<h2>Version 4.x features</h2>

				<h3>Version 4.x features</h3>

				<p>

				Version 4.x of Mesa implements the OpenGL 1.3 API with the following

				@@ -334,7 +334,7 @@ extensions incorporated as standard features:

				<li>GL_ARB_transpose_matrix

				</ul>

				<h2>Version 3.x features</h2>

				<h3>Version 3.x features</h3>

				<p>

				Version 3.x of Mesa implements the OpenGL 1.2 API with the following

				@@ -350,7 +350,7 @@ features:

				</ul>

				<h2>Version 2.x features</h2>

				<h3>Version 2.x features</h3>

				<p>

				Version 2.x of Mesa implements the OpenGL 1.1 API with the following

				features.

									
										16

docs/license.html
									
												
				@@ -2,19 +2,21 @@

				<html lang="en">

				<head>

				  <meta http-equiv="content-type" content="text/html; charset=utf-8">

				  <title>License / Copyright Information</title>

				  <title>License and Copyright</title>

				  <link rel="stylesheet" type="text/css" href="mesa.css">

				</head>

				<body>

				<div class="header">

				  <h1>The Mesa 3D Graphics Library</h1>

				  The Mesa 3D Graphics Library

				</div>

				<iframe src="contents.html"></iframe>

				<div class="content">

				<h1>Disclaimer</h1>

				<h1>License and Copyright</h1>

				<h2>Disclaimer</h2>

				<p>

				Mesa is a 3-D graphics library with an API which is very similar to

				@@ -32,7 +34,7 @@ vendor.

				<p>

				Please do not refer to the library as <em>MesaGL</em> (for legal

				reasons). It's just <em>Mesa</em> or <em>The Mesa 3-D graphics

				library</em>. <br>

				library</em>.

				</p>

				<p>

				@@ -42,7 +44,7 @@ library</em>. <br>

				<h1>License / Copyright Information</h1>

				<h2>License / Copyright Information</h2>

				<p>

				The Mesa distribution consists of several components.  Different copyrights

				@@ -82,7 +84,7 @@ SOFTWARE.

				</pre>

				<h1>Attention, Contributors</h1>

				<h2>Attention, Contributors</h2>

				<p>

				When contributing to the Mesa project you must agree to the licensing terms

				@@ -92,7 +94,7 @@ and their respective licenses.

				</p>

				<h1>Mesa Component Licenses</h1>

				<h2>Mesa Component Licenses</h2>

				<pre>

				Component         Location               License

									
										8

docs/lists.html
									
												
				@@ -2,13 +2,13 @@

				<html lang="en">

				<head>

				  <meta http-equiv="content-type" content="text/html; charset=utf-8">

				  <title>Mesa Mailing Lists</title>

				  <title>Mailing Lists</title>

				  <link rel="stylesheet" type="text/css" href="mesa.css">

				</head>

				<body>

				<div class="header">

				  <h1>The Mesa 3D Graphics Library</h1>

				  The Mesa 3D Graphics Library

				</div>

				<iframe src="contents.html"></iframe>

				@@ -68,14 +68,14 @@ kernels, see the

				</p>

				<h1>IRC</h1>

				<h2>IRC</h2>

				<p>join <a href="irc://chat.freenode.net#dri-devel">#dri-devel channel</a>

				on <a href="https://webchat.freenode.net/">irc.freenode.net</a>

				</p>

				<h1>OpenGL Forums</h1>

				<h2>OpenGL Forums</h2>

				<p>

				Here are some other OpenGL-related forums you might find useful:

									
										93

docs/llvmpipe.html
									
												
				@@ -2,19 +2,21 @@

				<html lang="en">

				<head>

				  <meta http-equiv="content-type" content="text/html; charset=utf-8">

				  <title>llvmpipe</title>

				  <title>Gallium LLVMpipe Driver</title>

				  <link rel="stylesheet" type="text/css" href="mesa.css">

				</head>

				<body>

				<div class="header">

				  <h1>The Mesa 3D Graphics Library</h1>

				  The Mesa 3D Graphics Library

				</div>

				<iframe src="contents.html"></iframe>

				<div class="content">

				<h1>Introduction</h1>

				<h1>Gallium LLVMpipe Driver</h1>

				<h2>Introduction</h2>

				<p>

				The Gallium llvmpipe driver is a software rasterizer that uses LLVM to

				@@ -28,7 +30,7 @@ It's the fastest software rasterizer for Mesa.

				</p>

				<h1>Requirements</h1>

				<h2>Requirements</h2>

				<ul>

				<li>

				@@ -45,7 +47,7 @@ It's the fastest software rasterizer for Mesa.

				   built with LLVM version 4.0 or later.

				   </p>

				   <p>

				   See /proc/cpuinfo to know what your CPU supports.

				   See <code>/proc/cpuinfo</code> to know what your CPU supports.

				   </p>

				</li>

				<li>

				@@ -71,8 +73,9 @@ It's the fastest software rasterizer for Mesa.

				   <p>

				   For Windows you will need to build LLVM from source with MSVC or MINGW

				   (either natively or through cross compilers) and CMake, and set the LLVM

				   environment variable to the directory you installed it to.

				   (either natively or through cross compilers) and CMake, and set the

				   <code>LLVM</code> environment variable to the directory you installed

				   it to.

				   LLVM will be statically linked, so when building on MSVC it needs to be

				   built with a matching CRT as Mesa, and you'll need to pass

				@@ -101,8 +104,8 @@ It's the fastest software rasterizer for Mesa.

				   </table>

				   <p>

				   You can build only the x86 target by passing -DLLVM_TARGETS_TO_BUILD=X86

				   to cmake.

				   You can build only the x86 target by passing

				   <code>-DLLVM_TARGETS_TO_BUILD=X86</code> to cmake.

				   </p>

				</li>

				@@ -112,7 +115,7 @@ It's the fastest software rasterizer for Mesa.

				</ul>

				<h1>Building</h1>

				<h2>Building</h2>

				To build everything on Linux invoke scons as:

				@@ -120,10 +123,12 @@ To build everything on Linux invoke scons as:

				  scons build=debug libgl-xlib

				</pre>

				Alternatively, you can build it with autoconf/make with:

				Alternatively, you can build it with meson with:

				<pre>

				  ./configure --enable-glx=gallium-xlib --with-gallium-drivers=swrast --disable-dri --disable-gbm --disable-egl

				  make

				  mkdir build

				  cd build

				  meson -D glx=gallium-xlib -D gallium-drivers=swrast

				  ninja

				</pre>

				but the rest of these instructions assume that scons is used.

				@@ -135,11 +140,12 @@ For Windows the procedure is similar except the target:

				</pre>

				<h1>Using</h1>

				<h2>Using</h2>

				<h2>Linux</h2>

				<h3>Linux</h3>

				<p>On Linux, building will create a drop-in alternative for libGL.so into</p>

				<p>On Linux, building will create a drop-in alternative for

				<code>libGL.so</code> into</p>

				<pre>

				  build/foo/gallium/targets/libgl-xlib/libGL.so

				@@ -149,13 +155,15 @@ or

				  lib/gallium/libGL.so

				</pre>

				<p>To use it set the LD_LIBRARY_PATH environment variable accordingly.</p>

				<p>To use it set the <code>LD_LIBRARY_PATH</code> environment variable

				accordingly.</p>

				<p>For performance evaluation pass build=release to scons, and use the corresponding

				lib directory without the "-debug" suffix.</p>

				<p>For performance evaluation pass <code>build=release</code> to scons,

				and use the corresponding lib directory without the <code>-debug</code>

				suffix.</p>

				<h2>Windows</h2>

				<h3>Windows</h3>

				<p>

				On Windows, building will create

				@@ -173,7 +181,9 @@ any OpenGL drivers):

				</p>

				<ul>

				  <li><p>copy build/windows-x86-debug/gallium/targets/libgl-gdi/opengl32.dll to C:\Windows\SysWOW64\mesadrv.dll</p></li>

				  <li><p>copy <code>build/windows-x86-debug/gallium/targets/libgl-gdi/opengl32.dll</code>

				         to <code>C:\Windows\SysWOW64\mesadrv.dll</code>

				  </p></li>

				  <li><p>load this registry settings:</p>

				  <pre>REGEDIT4

				@@ -190,7 +200,7 @@ any OpenGL drivers):

				</ul>

				<h1>Profiling</h1>

				<h2>Profiling</h2>

				<p>

				To profile llvmpipe you should build as

				@@ -204,7 +214,7 @@ This will ensure that frame pointers are used both in C and JIT functions, and

				that no tail call optimizations are done by gcc.

				</p>

				<h2>Linux perf integration</h2>

				<h3>Linux perf integration</h3>

				<p>

				On Linux, it is possible to have symbol resolution of JIT code with <a href="https://perf.wiki.kernel.org/">Linux perf</a>:

				@@ -216,27 +226,28 @@ On Linux, it is possible to have symbol resolution of JIT code with <a href="htt

				</pre>

				<p>

				When run inside Linux perf, llvmpipe will create a /tmp/perf-XXXXX.map file with

				symbol address table.  It also dumps assembly code to /tmp/perf-XXXXX.map.asm,

				which can be used by the bin/perf-annotate-jit.py script to produce disassembly of

				the generated code annotated with the samples.

				When run inside Linux perf, llvmpipe will create a

				<code>/tmp/perf-XXXXX.map</code> file with symbol address table.  It also

				dumps assembly code to <code>/tmp/perf-XXXXX.map.asm</code>, which can be

				used by the <code>bin/perf-annotate-jit.py</code> script to produce

				disassembly of the generated code annotated with the samples.

				</p>

				<p>You can obtain a call graph via

				<a href="https://github.com/jrfonseca/gprof2dot#linux-perf">Gprof2Dot</a>.</p>

				<h1>Unit testing</h1>

				<h2>Unit testing</h2>

				<p>

				Building will also create several unit tests in

				build/linux-???-debug/gallium/drivers/llvmpipe:

				<code>build/linux-???-debug/gallium/drivers/llvmpipe</code>:

				</p>

				<ul>

				<li> lp_test_blend: blending

				<li> lp_test_conv: SIMD vector conversion

				<li> lp_test_format: pixel unpacking/packing

				<li> <code>lp_test_blend</code>: blending

				<li> <code>lp_test_conv</code>: SIMD vector conversion

				<li> <code>lp_test_format</code>: pixel unpacking/packing

				</ul>

				<p>

				@@ -248,29 +259,31 @@ for later analysis, e.g.:

				</pre>

				<h1>Development Notes</h1>

				<h2>Development Notes</h2>

				<ul>

				<li>

				  When looking at this code for the first time, start in lp_state_fs.c, and

				  then skim through the lp_bld_* functions called there, and the comments

				  at the top of the lp_bld_*.c functions.

				  then skim through the <code>lp_bld_*</code> functions called there, and

				  the comments at the top of the <code>lp_bld_*.c</code> functions.

				</li>

				<li>

				  The driver-independent parts of the LLVM / Gallium code are found in

				  src/gallium/auxiliary/gallivm/.  The filenames and function prefixes

				  need to be renamed from "lp_bld_" to something else though.

				  <code>src/gallium/auxiliary/gallivm/</code>.  The filenames and function

				  prefixes need to be renamed from <code>lp_bld_</code> to something else

				  though.

				</li>

				<li>

				  We use LLVM-C bindings for now. They are not documented, but follow the C++

				  interfaces very closely, and appear to be complete enough for code

				  generation. See 

				  <a href="https://npcontemplation.blogspot.com/2008/06/secret-of-llvm-c-bindings.html">

				  this stand-alone example</a>.  See the llvm-c/Core.h file for reference.

				  this stand-alone example</a>.  See the <code>llvm-c/Core.h</code> file for

				  reference.

				</li>

				</ul>

				<h1 id="recommended_reading">Recommended Reading</h1>

				<h2 id="recommended_reading">Recommended Reading</h2>

				<ul>

				  <li>

				@@ -306,7 +319,7 @@ for later analysis, e.g.:

				      <li><a href="http://www.drdobbs.com/optimizing-pixomatic-for-modern-x86-proc/184405807">Optimizing Pixomatic For Modern x86 Processors</a></li>

				      <li><a href="http://www.intel.com/content/www/us/en/architecture-and-technology/64-ia-32-architectures-optimization-manual.html">Intel 64 and IA-32 Architectures Optimization Reference Manual</a></li>

				      <li><a href="http://www.agner.org/optimize/">Software optimization resources</a></li>

				      <li><a href="https://software.intel.com/en-us/articles/intel-intrinsics-guide">Intel Intrinsics Guide</a><li>

				      <li><a href="https://software.intel.com/en-us/articles/intel-intrinsics-guide">Intel Intrinsics Guide</a></li>

				    </ul>

				  </li>

				  <li>

									
										37

docs/mangling.html
									
											
				@@ -1,37 +0,0 @@

				<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">

				<html lang="en">

				<head>

				  <meta http-equiv="content-type" content="text/html; charset=utf-8">

				  <title>GL Function Name Mangling</title>

				  <link rel="stylesheet" type="text/css" href="mesa.css">

				</head>

				<body>

				<div class="header">

				  <h1>The Mesa 3D Graphics Library</h1>

				</div>

				<iframe src="contents.html"></iframe>

				<div class="content">

				<h1>GL Function Name Mangling</h1>

				<p>

				If you want to use both Mesa and another OpenGL library in the same

				application at the same time you may find it useful to compile Mesa with

				<i>name mangling</i>.

				This results in all the Mesa functions being prefixed with

				<b>mgl</b> instead of <b>gl</b>.

				</p>

				<p>

				This option is supported only with the autoconf build. To use it add

				--enable-mangling to your configure line.

				</p>

				<pre>

				<code>./configure --enable-mangling ...</code>

				</pre>

				</div>

				</body>

				</html>

									
										49

docs/mesa.css
									
												
				@@ -3,61 +3,48 @@ body {

					background-color: #ffffff;

					font: 14px 'Lucida Grande', Geneva, Arial, Verdana, sans-serif;

					color: black;

				 	link: #111188;

				}

				h1 {

					font: 24px 'Lucida Grande', Geneva, Arial, Verdana, sans-serif;

					font-size: 24px;

					font-weight: bold;

					color: black;

				}

				h2 {

					font: 18px 'Lucida Grande', Geneva, Arial, Verdana, sans-serif, bold;

					font-size: 18px;

					font-weight: bold;

					color: black;

				}

				code {

					font-family: monospace;

					font-size: 10pt;

					color: black;

				}

				pre {

					/*font-family: monospace;*/

					font-size: 10pt;

					/*color: black;*/

					background-color: #eee;

					margin-left: 2em;

					padding: .5em;

				}

				iframe {

				  width: 19em;

				  height: 80em;

				  border: none;

				  float: left;

					width: 19em;

					height: 80em;

					border: none;

					float: left;

				}

				.content {

				  position: absolute;

				  left: 20em;

				  right: 10px;

				  overflow: hidden

					position: absolute;

					left: 20em;

					right: 10px;

					overflow: hidden;

				}

				.header {

				  background: black url('gears.png') 15px no-repeat;

				  margin:0;

				  padding: 5px;

				  clear:both;

				}

				.header h1 {

				  background: url('gears.png') right no-repeat;

				  color: white;

				  font: x-large sans-serif;

				  text-align: center;

				  height: 50px;

				  margin: 0;

				  padding-top: 30px;

					background: url('gears.png') 15px no-repeat, black url('gears.png') right no-repeat;

					padding: 1.75rem;

					text-align: center;

					color: white;

					font: x-large sans-serif;

				}

									
										377

docs/meson.html
									
												
				@@ -2,69 +2,119 @@

				<html lang="en">

				<head>

				  <meta http-equiv="content-type" content="text/html; charset=utf-8">

				  <title>Compilation and Installation using Meson</title>

				  <title>Compilation and Installation Using Meson</title>

				  <link rel="stylesheet" type="text/css" href="mesa.css">

				</head>

				<body>

				<div class="header">

				  <h1>The Mesa 3D Graphics Library</h1>

				  The Mesa 3D Graphics Library

				</div>

				<iframe src="contents.html"></iframe>

				<div class="content">

				<h1>Compilation and Installation using Meson</h1>

				<h1>Compilation and Installation Using Meson</h1>

				<h2 id="basic">1. Basic Usage</h2>

				<ul>

				  <li><a href="#intro">Introduction</a></li>

				  <li><a href="#basic">Basic Usage</a></li>

				  <li><a href="#advanced">Advanced Usage</a></li>

				  <li><a href="#cross-compilation">Cross-compilation and 32-bit builds</a></li>

				</ul>

				<p><strong>The Meson build system is generally considered stable and ready

				for production</strong></p>

				<h2 id="intro">1. Introduction</h2>

				<p>The meson build is tested on on Linux, macOS, Cygwin and Haiku, it should

				work on FreeBSD, DragonflyBSD, NetBSD, and OpenBSD.</p>

				<p>For general information about Meson see the

				<a href="http://mesonbuild.com/">Meson website</a>.</p>

				<p><strong>Mesa requires Meson >= 0.44.1 to build.</strong>

				<p><strong>Mesa's Meson build system is generally considered stable and ready

				for production.</strong></p>

				<p>The Meson build of Mesa is tested on Linux, macOS, Cygwin and Haiku, FreeBSD,

				DragonflyBSD, NetBSD, and should work on OpenBSD.</p>

				<p>If Meson is not already installed on your system, you can typically

				install it with your package installer.  For example:</p>

				<pre>

				sudo apt-get install meson   # Ubuntu

				</pre>

				or

				<pre>

				sudo dnf install meson   # Fedora

				</pre>

				<p><strong>Mesa requires Meson &gt;= 0.46.0 to build.</strong>

				Some older versions of meson do not check that they are too old and will error

				out in odd ways.

				</p>

				<p>You'll also need <a href="https://ninja-build.org/">Ninja</a>.

				If it's not already installed, use apt-get or dnf to install

				the <em>ninja-build</em> package.

				</p>

				<h2 id="basic">2. Basic Usage</h2>

				<p>

				The meson program is used to configure the source directory and generates

				either a ninja build file or Visual Studio® build files. The latter must

				be enabled via the <code>--backend</code> switch, as ninja is the default backend on all

				operating systems. Meson only supports out-of-tree builds, and must be passed a

				be enabled via the <code>--backend</code> switch, as ninja is the default

				backend on all

				operating systems.

				</p>

				<p>

				Meson only supports out-of-tree builds, and must be passed a

				directory to put built and generated sources into. We'll call that directory

				"build" for examples.

				"build" here.

				It's recommended to create a

				<a href="http://mesonbuild.com/Using-multiple-build-directories.html">

				separate build directory</a> for each configuration you might want to use.

				</p>

				<p>Basic configuration is done with:</p>

				<pre>

				    meson build/

				meson build/

				</pre>

				<p>

				To see a description of your options you can run <code>meson configure</code>

				along with a build directory to view the selected options for. This will show

				your meson global arguments and project arguments, along with their defaults

				and your local settings.

				Meson does not currently support listing options before configure a build

				directory, but this feature is being discussed upstream.

				This will create the build directory.

				If any dependencies are missing, you can install them, or try to remove

				the dependency with a Meson configuration option (see below).

				</p>

				<p>

				To review the options which Meson chose, run:

				</p>

				<pre>

				    meson configure build/

				meson configure build/

				</pre>

				<p>

				With additional arguments <code>meson configure</code> is used to change

				options on already configured build directory. All options passed to this

				command are in the form <code>-D "command"="value"</code>.

				Meson does not currently support listing configuration options before

				running "meson build/" but this feature is being discussed upstream.

				For now, we have a <code>bin/meson-options.py</code> script that prints

				the options for you.

				If that script doesn't work for some reason, you can always look in the

				<a href="https://gitlab.freedesktop.org/mesa/mesa/blob/master/meson_options.txt">

				meson_options.txt</a> file at the root of the project.

				</p>

				<p>

				With additional arguments <code>meson configure</code> can be used to change

				options for a previously configured build directory.

				All options passed to this command are in the form

				<code>-D "option"="value"</code>.

				For example:

				</p>

				<pre>

				    meson configure build/ -Dprefix=/tmp/install -Dglx=true

				meson configure build/ -Dprefix=/tmp/install -Dglx=true

				</pre>

				<p>

				@@ -77,64 +127,166 @@ and brackets to represent an empty list (<code>-D platforms=[]</code>).

				<p>

				Once you've run the initial <code>meson</code> command successfully you can use

				your configured backend to build the project. With ninja, the -C option can be

				be used to point at a directory to build.

				your configured backend to build the project in your build directory:

				</p>

				<pre>

				    ninja -C build/

				ninja -C build/

				</pre>

				<p>

				Without arguments, it will produce libGL.so and/or several other libraries

				depending on the options you have chosen. Later, if you want to rebuild for a

				different configuration, you should run <code>ninja clean</code> before

				changing the configuration, or create a new out of tree build directory for

				each configuration you want to build

				<a href="http://mesonbuild.com/Using-multiple-build-directories.html">as

				recommended in the documentation</a>

				</p>

				<dl>

				<dt><code>Environment Variables</code></dt>

				<dd><p>Meson supports the standard CC and CXX environment variables for

				changing the default compiler, and CFLAGS, CXXFLAGS, and LDFLAGS for setting

				options to the compiler and linker.

				The default compilers depends on your operating system. Meson supports most of

				the popular compilers, a complete list is available

				<a href="http://mesonbuild.com/Reference-tables.html#compiler-ids">here</a>.

				These arguments are consumed and stored by meson when it is initialized or

				re-initialized. Therefore passing them to meson configure will not do anything,

				and passing them to ninja will only do something if ninja decides to

				re-initialize meson, for example, if a meson.build file has been changed.

				Changing these variables will not cause all targets to be rebuilt, so running

				ninja clean is recommended when changing CFLAGS or CXXFLAGS. Meson will never

				change compiler in a configured build directory.

				The next step is to install the Mesa libraries, drivers, etc.

				This also finishes up some final steps of the build process (such as creating

				symbolic links for drivers).  To install:

				</p>

				<pre>

				    CC=clang CXX=clang++ meson build-clang

				    ninja -C build-clang

				    ninja -C build-clang clean

				    touch meson.build

				    CFLAGS=-Wno-typedef-redefinition ninja -C build-clang

				ninja -C build/ install

				</pre>

				<p>Meson also honors <code>DESTDIR</code> for installs</p>

				<p>

				Note: autotools automatically updated translation files (used by the DRI

				configuration tool) as part of the build process.

				Meson does not do this.  Instead, you will need to do this:

				</p>

				<pre>

				ninja -C build/ xmlpool-pot xmlpool-update-po xmlpool-gmo

				</pre>

				<h2 id="advanced">3. Advanced Usage</h2>

				<dl>

				<dt>Installation Location</dt>

				<dd>

				<p>

				Meson defaults to installing libGL.so in your system's main lib/ directory

				and DRI drivers to a dri/ subdirectory.

				</p>

				<p>

				Developers will often want to install Mesa to a testing directory rather

				than the system library directory.

				This can be done with the --prefix option.  For example:

				</p>

				<pre>

				meson --prefix="${PWD}/build/install" build/

				</pre>

				<p>

				will put the final libraries and drivers into the build/install/

				directory.

				Then you can set LD_LIBRARY_PATH and LIBGL_DRIVERS_PATH to that location

				to run/test the driver.

				</p>
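				<p>
				As a minimal sketch of that last step, the environment could be set up as
				follows. It assumes the libraries were installed under <code>lib/</code>
				inside the prefix (some distributions use <code>lib64/</code> or a multiarch
				directory instead) and that <code>glxinfo</code> is available; both are
				assumptions, not something this page guarantees.
				</p>

```shell
# Assumed install prefix; it matches the --prefix example above.
prefix="${PWD}/build/install"

# Assumption: libraries live in lib/ under the prefix. Adjust for lib64/ etc.
libdir="${prefix}/lib"

# Prepend the test install, keeping any existing search path.
export LD_LIBRARY_PATH="${libdir}${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}"
export LIBGL_DRIVERS_PATH="${libdir}/dri"

# Check which driver gets loaded, if glxinfo is installed.
if command -v glxinfo >/dev/null 2>&1; then
    glxinfo 2>&1 | grep -i "renderer" || true
fi
```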

				<p>

				Meson also honors <code>DESTDIR</code> for installs.

				</p>

				</dd>

				<dt>Compiler Options</dt>

				<dd>

				<p>Meson supports the common CFLAGS, CXXFLAGS, etc. environment

				variables but their use is discouraged because of the many caveats

				in using them.

				</p>

				<p>Instead, it is recommended to use <code>-D${lang}_args</code> and

				<code>-D${lang}_link_args</code>. Among the benefits of these options

				is that they are guaranteed to persist across rebuilds and reconfigurations.

				</p>

				<p>

				This example sets -fmax-errors for compiling C sources and -DMAGIC=123

				for C++ sources:

				</p>

				<pre>

				meson builddir/ -Dc_args=-fmax-errors=10 -Dcpp_args=-DMAGIC=123

				</pre>

				</dd>

				<dt><code>LLVM</code></dt>

				<dd><p>Meson includes upstream logic to wrap llvm-config using it's standard

				dependency interface. It will search <code>$PATH</code> (or <code>%PATH%</code> on windows) for

				llvm-config, so using an LLVM from a non-standard path is as easy as

				<code>PATH=/path/with/llvm-config:$PATH meson build</code>.

				</p></dd>

				</dl>

				<dt>Compiler Specification</dt>

				<dd>

				<p>

				Meson supports the standard CC and CXX environment variables for

				changing the default compiler.  Note that Meson does not allow

				changing the compilers in a configured builddir so you will need

				to create a new build dir for a different compiler.

				</p>

				<p>

				This is an example of specifying the clang compilers and cleaning

				the build directory before reconfiguring with an extra C option:

				</p>

				<pre>

				CC=clang CXX=clang++ meson build-clang

				ninja -C build-clang

				ninja -C build-clang clean

				meson configure build-clang -Dc_args="-Wno-typedef-redefinition"

				ninja -C build-clang

				</pre>

				<p>

				The default compilers depend on your operating system. Meson supports most of

				the popular compilers; a complete list is available

				<a href="http://mesonbuild.com/Reference-tables.html#compiler-ids">here</a>.

				</p>

				</dd>

				<dt>LLVM</dt>

				<dd><p>Meson includes upstream logic to wrap llvm-config using its standard

				dependency interface.

				</p></dd>

				<dd><p>

				As of meson 0.49.0, meson also has the concept of a

				<a href="https://mesonbuild.com/Native-environments.html">"native file"</a>.

				These files provide information about the native build environment (as opposed

				to a cross build environment). They are INI-formatted and can override where to

				find llvm-config:

				</p>

				<p>custom-llvm.ini</p>

				<pre>

				    [binaries]

				    llvm-config = '/usr/local/bin/llvm/llvm-config'

				</pre>

				<p>Then configure meson:</p>

				<pre>

				    meson builddir/ --native-file custom-llvm.ini

				</pre>

				</dd>

				<dd><p>

				Meson &lt; 0.49 doesn't support native files, so to specify a custom

				<code>llvm-config</code> you need to modify your <code>$PATH</code> (or

				<code>%PATH%</code> on Windows), which will be searched for

				<code>llvm-config</code>, <code>llvm-config<i>$version</i></code>,

				and <code>llvm-config-<i>$version</i></code>:

				</p>

				<pre>

				PATH=/path/to/folder/with/llvm-config:$PATH meson build

				</pre>

				</dd>

				<dd><p>

				For selecting llvm-config for cross compiling a

				<a href="https://mesonbuild.com/Cross-compilation.html#defining-the-environment">"cross file"</a>

				should be used. It uses the same format as the native file above:

				</p>

				<p>cross-llvm.ini</p>

				<pre>

				    [binaries]

				    ...

				    llvm-config = '/usr/lib/llvm-config-32'

				</pre>

				<p>Then configure meson:</p>

				<pre>

				    meson builddir/ --cross-file cross-llvm.ini

				</pre>

				See the <a href="#cross-compilation">Cross Compilation</a> section for more information.

				</dd>

				<dl>

				<dt><code>PKG_CONFIG_PATH</code></dt>

				<dd><p>The

				<code>pkg-config</code> utility is a hard requirement for configuring and

				@@ -170,9 +322,7 @@ with debugging as some code and validation will be optimized away.

				buildtype, which causes meson to inject no additional compiler arguments, only

				those in the C/CXXFLAGS and those that mesa itself defines.</p>

				</dd>

				</dl>

				<dl>

				<dt><code>-Db_ndebug</code></dt>

				<dd><p>This option controls assertions in meson projects. When set to <code>false</code>

				(the default) assertions are enabled, when set to true they are disabled. This

				@@ -182,6 +332,93 @@ is unrelated to the <code>buildtype</code>; setting the latter to

				</dd>

				</dl>

				<h2 id="cross-compilation">4. Cross-compilation and 32-bit builds</h2>

				<p><a href="https://mesonbuild.com/Cross-compilation.html">Meson supports

				cross-compilation</a> by specifying a number of binary paths and

				settings in a file and passing this file to <code>meson</code> or

				<code>meson configure</code> with the <code>--cross-file</code>

				parameter.</p>

				<p>This file can live at any location, but you can use the bare filename

				(without the folder path) if you put it in $XDG_DATA_HOME/meson/cross or

				~/.local/share/meson/cross.</p>
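				<p>
				A small sketch of that lookup rule: the helper below copies a cross file
				into the per-user directory so it can later be referenced by its bare name.
				The function name <code>install_cross_file</code> and the file name in the
				usage comment are hypothetical, and the fallback to
				<code>~/.local/share</code> assumes <code>XDG_DATA_HOME</code> is unset.
				</p>

```shell
# install_cross_file FILE
# Copy FILE into the per-user directory that Meson searches for cross files.
# (Hypothetical helper, not part of Meson or Mesa.)
install_cross_file() {
    src="$1"
    dest_dir="${XDG_DATA_HOME:-$HOME/.local/share}/meson/cross"
    mkdir -p "$dest_dir" || return 1
    cp "$src" "$dest_dir/"
}

# Usage (hypothetical file name):
#   install_cross_file ./cross-llvm.ini
#   meson builddir/ --cross-file cross-llvm.ini
```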

				<p>Below are a few examples of cross files, but keep in mind that you

				will likely have to alter them for your system.</p>

				<p>

				Those running on Arch Linux can use the AUR-maintained packages for some

				of those, as they'll have the right values for your system:

				</p>

				<ul>

				  <li><a href="https://aur.archlinux.org/packages/meson-cross-x86-linux-gnu">meson-cross-x86-linux-gnu</a></li>

				  <li><a href="https://aur.archlinux.org/packages/meson-cross-aarch64-linux-gnu">meson-cross-aarch64-linux-gnu</a></li>

				</ul>

				<p>

				32-bit build on x86 linux:

				</p>

				<pre>

				[binaries]

				c = '/usr/bin/gcc'

				cpp = '/usr/bin/g++'

				ar = '/usr/bin/gcc-ar'

				strip = '/usr/bin/strip'

				pkgconfig = '/usr/bin/pkg-config-32'

				llvm-config = '/usr/bin/llvm-config32'

				[properties]

				c_args = ['-m32']

				c_link_args = ['-m32']

				cpp_args = ['-m32']

				cpp_link_args = ['-m32']

				[host_machine]

				system = 'linux'

				cpu_family = 'x86'

				cpu = 'i686'

				endian = 'little'

				</pre>

				<p>

				64-bit build on ARM linux:

				</p>

				<pre>

				[binaries]

				c = '/usr/bin/aarch64-linux-gnu-gcc'

				cpp = '/usr/bin/aarch64-linux-gnu-g++'

				ar = '/usr/bin/aarch64-linux-gnu-gcc-ar'

				strip = '/usr/bin/aarch64-linux-gnu-strip'

				pkgconfig = '/usr/bin/aarch64-linux-gnu-pkg-config'

				exe_wrapper = '/usr/bin/qemu-aarch64-static'

				[host_machine]

				system = 'linux'

				cpu_family = 'aarch64'

				cpu = 'aarch64'

				endian = 'little'

				</pre>

				<p>

				64-bit build on x86 windows:

				</p>

				<pre>

				[binaries]

				c = '/usr/bin/x86_64-w64-mingw32-gcc'

				cpp = '/usr/bin/x86_64-w64-mingw32-g++'

				ar = '/usr/bin/x86_64-w64-mingw32-ar'

				strip = '/usr/bin/x86_64-w64-mingw32-strip'

				pkgconfig = '/usr/bin/x86_64-w64-mingw32-pkg-config'

				exe_wrapper = 'wine'

				[host_machine]

				system = 'windows'

				cpu_family = 'x86_64'

				cpu = 'x86_64'

				endian = 'little'

				</pre>

				</div>

				</body>

				</html>

									
										6

docs/opengles.html
									
												
				@@ -8,7 +8,7 @@

				<body>

				<div class="header">

				  <h1>The Mesa 3D Graphics Library</h1>

				  The Mesa 3D Graphics Library

				</div>

				<iframe src="contents.html"></iframe>

				@@ -25,7 +25,7 @@ https://www.khronos.org/opengles/</a>.</p>

				<h2>Build the Libraries</h2>

				<ol>

				<li>Run <code>configure</code> with <code>--enable-gles1 --enable-gles2</code> and enable the Gallium driver for your hardware.</li>

				<li>Run <code>meson configure</code> with <code>-D gles1=true -D gles2=true</code> and enable the Gallium driver for your hardware.</li>

				<li>Build and install Mesa as usual.</li>

				</ol>
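				<p>
				As a rough sketch of these steps, assuming a build directory named
				<code>build/</code> was configured earlier and using <code>swrast</code>
				purely as a placeholder Gallium driver, the commands could look like this:
				</p>

```shell
# Hypothetical build directory that was configured earlier with "meson build/".
builddir="build/"

if command -v meson >/dev/null 2>&1 && [ -d "$builddir" ]; then
    # swrast is only an example; pick the Gallium driver for your hardware.
    meson configure "$builddir" -Dgles1=true -Dgles2=true -Dgallium-drivers=swrast
    ninja -C "$builddir" install
else
    echo "meson or ${builddir} not found; configure a build directory first" >&2
fi
```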

				@@ -33,7 +33,7 @@ Alternatively, if XCB-DRI2 is installed on the system, one can use

				<code>egl_dri2</code> EGL driver with OpenGL|ES-enabled DRI drivers

				<ol>

				<li>Run <code>configure</code> with <code>--enable-gles1 --enable-gles2</code>.</li>

				<li>Run <code>meson configure</code> with <code>-D gles1=true -D gles2=true</code>.</li>

				<li>Build and install Mesa as usual.</li>

				</ol>

									
										15

docs/osmesa.html
									
												
				@@ -8,7 +8,7 @@

				<body>

				<div class="header">

				  <h1>The Mesa 3D Graphics Library</h1>

				  The Mesa 3D Graphics Library

				</div>

				<iframe src="contents.html"></iframe>

				@@ -45,14 +45,14 @@ The OSMesa interface may be used with any of three software renderers:

				There are several examples of OSMesa in the mesa/demos repository.

				</p>

				<h1>Building OSMesa</h1>

				<h2>Building OSMesa</h2>

				<p>

				Configure and build Mesa with something like:

				<pre>

				configure --enable-osmesa --disable-driglx-direct --disable-dri --with-gallium-drivers=swrast

				make

				meson builddir -Dosmesa=gallium -Dgallium-drivers=swrast -Ddri-drivers=[] -Dvulkan-drivers=[] -Dprefix=$PWD/builddir/install

				ninja -C builddir install

				</pre>

				<p>

				@@ -63,13 +63,12 @@ Make sure you have LLVM installed first if you want to use the llvmpipe driver.

				When the build is complete you should find:

				</p>

				<pre>

				lib/libOSMesa.so  (swrast-based OSMesa)

				lib/gallium/libOSMesa.so  (gallium-based OSMesa)

				$PWD/builddir/install/lib/libOSMesa.so  (swrast-based OSMesa)

				$PWD/builddir/install/lib/gallium/libOSMesa.so  (gallium-based OSMesa)

				</pre>

				<p>

				Set your LD_LIBRARY_PATH to point to one directory or the other to select

				the library you want to use.

				Set your LD_LIBRARY_PATH to point to $PWD/builddir/install/lib to use the libraries.

				</p>

				<p>

									
										2

docs/perf.html
									
												
				@@ -8,7 +8,7 @@

				<body>

				<div class="header">

				  <h1>The Mesa 3D Graphics Library</h1>

				  The Mesa 3D Graphics Library

				</div>

				<iframe src="contents.html"></iframe>

									
										6

docs/postprocess.html
									
												
				@@ -8,7 +8,7 @@

				<body>

				<div class="header">

				  <h1>The Mesa 3D Graphics Library</h1>

				  The Mesa 3D Graphics Library

				</div>

				<iframe src="contents.html"></iframe>

				@@ -55,10 +55,6 @@ Numbers higher than 8 see minimizing gains.

				<li>pp_celshade - set to 1 to enable cel shading (a more complex color filter).

				</ul>

				<br>

				<br>

				</div>

				</body>

				</html>

									
										7

docs/precompiled.html
									
												
				@@ -8,7 +8,7 @@

				<body>

				<div class="header">

				  <h1>The Mesa 3D Graphics Library</h1>

				  The Mesa 3D Graphics Library

				</div>

				<iframe src="contents.html"></iframe>

				@@ -22,15 +22,14 @@ In general, precompiled Mesa libraries are not available.

				<p>

				Some Linux distributions closely follow the latest Mesa releases. On others one

				has to use unofficial channels.

				<br>

				There are some general directions:

				</p>

				<p>There are some general directions:</p>

				<ul>

				<li>Debian/Ubuntu based distros - PPA: xorg-edgers, oibaf and padoka</li>

				<li>Fedora - Copr: erp and che</li>

				<li>openSUSE/SLES - OBS: X11:XOrg and pontostroy:X11</li>

				<li>Gentoo/Arch Linux - officially provided/supported</li>

				</ul>

				</p>

				</div>

				</body>

									
										120

docs/release-calendar.html
									
												
				@@ -2,32 +2,53 @@

				<html lang="en">

				<head>

				  <meta http-equiv="content-type" content="text/html; charset=utf-8">

				  <title>Release calendar</title>

				  <title>Release Calendar</title>

				  <link rel="stylesheet" type="text/css" href="mesa.css">

				</head>

				<body>

				<div class="header">

				  <h1>The Mesa 3D Graphics Library</h1>

				  The Mesa 3D Graphics Library

				</div>

				<iframe src="contents.html"></iframe>

				<div class="content">

				<h1>Overview</h1>

				<h1>Release Calendar</h1>

				<h2>Overview</h2>

				<p>

				Mesa provides feature/development and stable releases.

				</p>

				<p>

				The table below lists the date and release manager that is expected to do the

				specific release.

				<br>

				</p>

				<p>

				Regular updates will ensure that the schedule for the current and the next two

				feature releases are shown in the table.

				</p>

				<p>

				In order to keep the whole releasing team up to date with the tools used, best

				practices and other details, the member in charge of the next feature release

				will be in constant rotation.

				</p>

				<p>

				The way the release schedule works is explained

				<a href="releasing.html#schedule" target="_parent">here</a>.

				</p>

				<p>

				Take a look <a href="submittingpatches.html#criteria" target="_parent">here</a>

				if you'd like to nominate a patch in the next stable release.

				</p>

				<h1 id="calendar">Calendar</h1>

				<h2 id="calendar">Calendar</h2>

				<table border="1">

				@@ -39,47 +60,72 @@ if you'd like to nominate a patch in the next stable release.

				<th>Notes</th>

				</tr>

				<tr>

				<td rowspan="3">18.1</td>

				<td>2018-08-10</td>

				<td>18.1.6</td>

				<td rowspan="3">19.1</td>

				<td>2019-08-20</td>

				<td>19.1.5</td>

				<td>Juan A. Suarez</td>

				<td></td>

				</tr>

				<tr>

				<td>2019-09-03</td>

				<td>19.1.6</td>

				<td>Juan A. Suarez</td>

				<td></td>

				</tr>

				<tr>

				<td>2019-09-17</td>

				<td>19.1.7</td>

				<td>Juan A. Suarez</td>

				<td>Last planned 19.1.x release</td>

				</tr>

				<tr>

				<td rowspan="4">19.2</td>

				<td>2019-08-06</td>

				<td>19.2.0-rc1</td>

				<td>Emil Velikov</td>

				<td></td>

				</tr>

				<tr>

				<td>2019-08-13</td>

				<td>19.2.0-rc2</td>

				<td>Emil Velikov</td>

				<td></td>

				</tr>

				<tr>

				<td>2019-08-20</td>

				<td>19.2.0-rc3</td>

				<td>Emil Velikov</td>

				<td></td>

				</tr>

				<tr>

				<td>2019-08-27</td>

				<td>19.2.0-rc4</td>

				<td>Emil Velikov</td>

				<td>Last planned RC/Final release</td>

				</tr>

				<tr>

				<td rowspan="4">19.3</td>

				<td>2019-10-15</td>

				<td>19.3.0-rc1</td>

				<td>Dylan Baker</td>

				<td></td>

				<td>

				</tr>

				<tr>

				<td>2018-08-24</td>

				<td>18.1.7</td>

				<td>2019-10-22</td>

				<td>19.3.0-rc2</td>

				<td>Dylan Baker</td>

				<td></td>

				<td>

				</tr>

				<tr>

				<td>2018-09-07</td>

				<td>18.1.8</td>

				<td>2019-10-29</td>

				<td>19.3.0-rc3</td>

				<td>Dylan Baker</td>

				<td>Last planned 18.1.x release</td>

				<td>

				</tr>

				<tr>

				<td rowspan="4">18.2</td>

				<td>2018-08-01</td>

				<td>18.2.0rc1</td>

				<td>Andres Gomez</td>

				<td></td>

				</tr>

				<tr>

				<td>2018-08-08</td>

				<td>18.2.0rc2</td>

				<td>Andres Gomez</td>

				<td></td>

				</tr>

				<tr>

				<td>2018-08-15</td>

				<td>18.2.0rc3</td>

				<td>Andres Gomez</td>

				<td></td>

				</tr>

				<tr>

				<td>2018-08-22</td>

				<td>18.2.0rc4</td>

				<td>Andres Gomez</td>

				<td>2019-11-05</td>

				<td>19.3.0-rc4</td>

				<td>Dylan Baker</td>

				<td>Last planned RC/Final release</td>

				</tr>

				</table>

									
										246

docs/releasing.html
									
												
				@@ -2,25 +2,26 @@

				<html lang="en">

				<head>

				  <meta http-equiv="content-type" content="text/html; charset=utf-8">

				  <title>Releasing process</title>

				  <title>Releasing Process</title>

				  <link rel="stylesheet" type="text/css" href="mesa.css">

				</head>

				<body>

				<div class="header">

				  <h1>The Mesa 3D Graphics Library</h1>

				  The Mesa 3D Graphics Library

				</div>

				<iframe src="contents.html"></iframe>

				<div class="content">

				<h1>Releasing process</h1>

				<h1>Releasing Process</h1>

				<ul>

				<li><a href="#overview">Overview</a>

				<li><a href="#schedule">Release schedule</a>

				<li><a href="#pickntest">Cherry-pick and test</a>

				<li><a href="#stagingbranch">Staging branch</a>

				<li><a href="#branch">Making a branchpoint</a>

				<li><a href="#prerelease">Pre-release announcement</a>

				<li><a href="#release">Making a new release</a>

				@@ -30,12 +31,14 @@

				</ul>

				<h1 id="overview">Overview</h1>

				<h2 id="overview">Overview</h2>

				<p>

				This document uses the convention X.Y.Z for the release number with X.Y being

				the stable branch name.

				<br>

				</p>

				<p>

				Mesa provides feature and bugfix releases. The former use zero as the patch version (Z),

				while the latter have a non-zero one.

				</p>

				@@ -51,13 +54,16 @@ For example:

				</pre>
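				<p>
				The X.Y.Z convention can also be checked mechanically. The helper below is
				only an illustrative sketch: the name <code>release_kind</code> is
				hypothetical, and it handles plain X.Y.Z strings only (release candidate
				suffixes are reported as unknown).
				</p>

```shell
# release_kind X.Y.Z
# Print "feature" when the patch version Z is zero, "bugfix" otherwise.
# (Hypothetical helper; release-candidate suffixes are not handled.)
release_kind() {
    case "$1" in
        *[!0-9.]*) echo "unknown"; return 1 ;;  # anything but digits and dots
        *.*.*.*)   echo "unknown"; return 1 ;;  # more than three components
        *.*.0)     echo "feature" ;;            # Z is exactly zero
        *.*.*)     echo "bugfix" ;;             # Z is non-zero
        *)         echo "unknown"; return 1 ;;  # fewer than three components
    esac
}

release_kind 19.2.0
release_kind 19.1.5
```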

				<h1 id="schedule">Release schedule</h1>

				<h2 id="schedule">Release schedule</h2>

				<p>

				Releases should happen on Wednesdays. Delays can occur although those

				should be keep to a minimum.

				<br>

				See our <a href="release-calendar.html" target="_parent">calendar</a> for the

				should be kept to a minimum.

				</p>

				<p>

				See our <a href="release-calendar.html" target="_parent">calendar</a>

				for information about how the release schedule is planned, and the

				date and other details for individual releases.

				</p>

				@@ -66,6 +72,9 @@ date and other details for individual releases.

				<li>Available approximately every three months.

				<li>Initial timeplan available 2-4 weeks before the planned branchpoint (rc1)

				on the mesa-announce@ mailing list.

				<li>Typically, the final release will happen after 4

				candidates. Additional ones may be needed in order to resolve blocking

				regressions, though.

				<li>A <a href="#prerelease">pre-release</a> announcement should be available

				approximately 24 hours before the final (non-rc) release.

				</ul>

				@@ -80,13 +89,23 @@ approximately 48 hours before the actual release.

				<p>

				Note: There is a one- or two-release overlap when changing branches. For example:

				<br>

				</p>

				<p>

				The final release from the 12.0 series, Mesa 12.0.5, will be out around the same

				time as (or shortly after) 13.0.1 is out.

				</p>

				<p>

				This also means that, because a final release may be delayed by the

				need for additional candidates to resolve blocking regressions,

				the release manager might have to update

				the <a href="release-calendar.html" target="_parent">calendar</a> with

				additional bug fix releases of the current stable branch.

				</p>

				<h1 id="pickntest">Cherry-picking and testing</h1>

				<h2 id="pickntest">Cherry-picking and testing</h2>

				<p>

				Commits nominated for the active branch are picked as based on the

				@@ -103,7 +122,7 @@ a casual search for terms such as regression, fix, broken and similar.

				<p>

				The maintainer is also responsible for testing in various possible permutations of

				the autoconf and scons build.

				the meson and scons build.

				</p>

				<h2>Cherry-picking and build/check testing</h2>

				@@ -111,18 +130,21 @@ the autoconf and scons build.

				<p>Done continuously up-to the <a href="#prerelease">pre-release</a> announcement.</p>

				<p>

				As an exception, patches can be applied up-to the last ~1h before the actual

				release. This is made <strong>only</strong> with explicit permission/request,

				and the patch <strong>must</strong> be very well contained. Thus it cannot

				affect more than one driver/subsystem.

				</p>

				<p>

				Currently Ilia Mirkin and AMD devs have requested "permanent" exception.

				Developers can request, <em>as an exception</em>, patches to be applied up-to

				the last one hour before the actual release. This is made <strong>only</strong>

				with explicit permission/request, and the patch <strong>must</strong> be very

				well contained. Thus it cannot affect more than one driver/subsystem.

				</p>

				<p>The following developers have requested a permanent exception:</p>

				<ul>

				<li>make distcheck, scons and scons check must pass

				<li><em>Ilia Mirkin</em>

				<li><em>AMD team</em>

				</ul>

				<p>The following must pass:</p>

				<ul>

				<li>meson test, scons and scons check

				<li>Testing with different versions of system components (LLVM and others) is also

				performed where possible.

				<li>As a general rule, testing with various combinations of configure

				@@ -130,9 +152,9 @@ switches, depending on the specific patchset.

				</ul>

				<p>

				Achieved by combination of local ad-hoc scripts, mingw-w64 cross

				compilation and AppVeyor plus Travis-CI, the latter as part of their

				Github integration.

				These are achieved by a combination of <a href="#basictesting">local testing</a>,

				which includes mingw-w64 cross compilation, and AppVeyor plus Travis-CI, the

				latter two as part of their GitHub integration.

				</p>

				<p>

				@@ -153,9 +175,8 @@ good contact point.

				<p>

				<strong>Note:</strong> If a patch in the current queue needs any additional

				fix(es), then they should be squashed together.

				<br>

				The commit messages and the <code>cherry picked from</code> tags must be preserved.

				fix(es), then they should be squashed together. The commit messages and the

				&quot;<code>cherry picked from</code>&quot;-tags must be preserved.

				</p>

				<p>

				@@ -209,8 +230,27 @@ system and making some every day's use until the release may be a good

				idea too.

				</p>

				<h2 id="stagingbranch">Staging branch</h2>

				<h1 id="branch">Making a branchpoint</h1>

				<p>

				A live branch, which contains the currently merged/rejected patches, is available

				in the main repository under <code>staging/X.Y</code>. For example:

				</p>

				<pre>

					staging/18.1 - WIP branch for the 18.1 series

					staging/18.2 - WIP branch for the 18.2 series

				</pre>

				<p>

				Notes:

				</p>

				<ul>

				<li>People are encouraged to test the staging branch and report regressions.</li>

				<li>The branch history is not stable and it <strong>will</strong> be rebased.</li>

				</ul>
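				<p>
				Because the staging branch is rebased, a plain <code>git pull</code> on a
				local copy may try to merge diverged history. One possible way to follow the
				branch is sketched below. The function name <code>sync_staging</code> is
				hypothetical, and it assumes the main repository is configured as the
				<code>origin</code> remote. Note that it discards any local commits on the
				staging branch.
				</p>

```shell
# sync_staging X.Y [REMOTE]
# Fetch REMOTE and reset the local staging/X.Y branch to the remote tip.
# (Hypothetical helper; local commits on staging/X.Y are discarded.)
sync_staging() {
    series="$1"
    remote="${2:-origin}"   # assumption: the main repository is "origin"
    git fetch "$remote" || return 1
    # -B creates the branch, or resets it if it already exists.
    git checkout -B "staging/${series}" "${remote}/staging/${series}"
}

# Usage:
#   sync_staging 18.2
```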

				<h2 id="branch">Making a branchpoint</h2>

				<p>

				A branchpoint is made such that new development can continue in parallel to

				@@ -218,10 +258,9 @@ stabilisation and bugfixing.

				</p>

				<p>

				Note: Before doing a branch ensure that basic build and <code>make check</code>

				testing is done and there are little to-no issues.

				<br>

				Ideally all of those should be tackled already.

				Note: Before doing a branch ensure that basic build and <code>meson test</code>

				testing is done and there are little to-no issues. Ideally all of those should

				be tackled already.

				</p>

				<p>

				@@ -260,14 +299,15 @@ Proceed to <a href="#release">release</a> -rc1.

				</p>

				<h1 id="prerelease">Pre-release announcement</h1>

				<h2 id="prerelease">Pre-release announcement</h2>

				<p>

				It comes shortly after outstanding patches in the respective branch are pushed.

				Developers can check, in brief, what's the status of their patches. They,

				alongside very early testers, are strongly encouraged to test the branch and

				report any regressions.

				<br>

				</p>

				<p>

				It is followed by a brief period (normally 24 or 48 hours) before the actual

				release is made.

				</p>

				@@ -297,10 +337,8 @@ Barring reported regressions or objections from developers.

				<p>

				Patch does not fit the

				<a href="submittingpatches.html#criteria" target="_parent">criteria</a> and

				is followed by a brief information.

				<br>

				The release maintainer is human so if you believe you've spotted a mistake do

				let them know.

				is followed by brief information. The release maintainer is human, so if you

				believe you've spotted a mistake, do let them know.

				</p>

				<h2>Format/template</h2>

				@@ -412,7 +450,7 @@ Reason: The patch was reverted shortly after it was merged.

				</pre>

				<h1 id="release">Making a new release</h1>

				<h2 id="release">Making a new release</h2>

				<p>

				These are the instructions for making a new Mesa release.

				@@ -425,7 +463,7 @@ Ensure the latest code is available - both in your local master and the

				relevant branch.

				</p>

				<h3>Perform basic testing</h3>

				<h3 id="basictesting">Perform basic testing</h3>

				<p>

				Most of the testing should already be done during the

				@@ -435,96 +473,48 @@ So we do a quick 'touch test'

				</p>

				<ul>

				<li>make distcheck (you can omit this if you're not using --dist below)

				<li>meson dist

				<li>scons (from release tarball)

				<li>the produced binaries work

				</ul>

				<p>

				Here is one solution that I've been using.

				  Here is one solution:

				</p>

				<pre>

					# Set MAKEFLAGS if you haven't already

					git clean -fXd; git clean -nxd

					read # quick cross check any outstanding files

					export __version=`cat VERSION`

					export __mesa_root=../

					export __build_root=./foo

					chmod 755 -fR $__build_root; rm -rf $__build_root

					mkdir -p $__build_root &amp;&amp; cd $__build_root

					# For the native builds - such as distcheck, scons, sanity test, you

					# may want to specify which LLVM to use:

					# export LLVM_CONFIG=/usr/lib/llvm-3.9/bin/llvm-config

					# Do a full distcheck

					$__mesa_root/autogen.sh &amp;&amp; make distcheck

					# Build check the tarballs (scons, linux)

					tar -xaf mesa-$__version.tar.xz &amp;&amp; cd mesa-$__version

					scons

					cd .. &amp;&amp; rm -rf mesa-$__version

					# Build check the tarballs (scons, windows/mingw)

					# Temporary drop LLVM_CONFIG, unless you have a Windows/mingw one.

					# save_LLVM_CONFIG=`echo $LLVM_CONFIG`; unset LLVM_CONFIG

					tar -xaf mesa-$__version.tar.xz &amp;&amp; cd mesa-$__version

					scons platform=windows toolchain=crossmingw

					cd .. &amp;&amp; rm -rf mesa-$__version

					# Test the automake binaries

					# Restore LLVM_CONFIG, if applicable:

					# export LLVM_CONFIG=`echo $save_LLVM_CONFIG`; unset save_LLVM_CONFIG

					tar -xaf mesa-$__version.tar.xz &amp;&amp; cd mesa-$__version

					./configure \

						--with-dri-drivers=i965,swrast \

						--with-gallium-drivers=swrast \

						--with-vulkan-drivers=intel \

						--enable-llvm-shared-libs \

						--enable-llvm \

						--enable-glx-tls \

						--enable-gbm \

						--enable-egl \

						--with-platforms=x11,drm,wayland,surfaceless

					make &amp;&amp; DESTDIR=`pwd`/test make install

					# Drop LLVM_CONFIG, if applicable:

					# unset LLVM_CONFIG

					__glxinfo_cmd='glxinfo 2>&amp;1 | egrep -o "Mesa.*|Gallium.*|.*dri\.so"'

					__glxgears_cmd='glxgears 2>&amp;1 | grep -v "configuration file"'

					__es2info_cmd='es2_info 2>&amp;1 | egrep "GL_VERSION|GL_RENDERER|.*dri\.so"'

					__es2gears_cmd='es2gears_x11 2>&amp;1 | grep -v "configuration file"'

					test "x$LD_LIBRARY_PATH" != 'x' &amp;&amp; __old_ld="$LD_LIBRARY_PATH"

					export LD_LIBRARY_PATH=`pwd`/test/usr/local/lib/:"${__old_ld}"

					export LIBGL_DRIVERS_PATH=`pwd`/test/usr/local/lib/dri/

					export LIBGL_DEBUG=verbose

					eval $__glxinfo_cmd

					eval $__glxgears_cmd

					eval $__es2info_cmd

					eval $__es2gears_cmd

					export LIBGL_ALWAYS_SOFTWARE=true

					eval $__glxinfo_cmd

					eval $__glxgears_cmd

					eval $__es2info_cmd

					eval $__es2gears_cmd

					export LIBGL_ALWAYS_SOFTWARE=true

					export GALLIUM_DRIVER=softpipe

					eval $__glxinfo_cmd

					eval $__glxgears_cmd

					eval $__es2info_cmd

					eval $__es2gears_cmd

					# Smoke test DOTA2

					unset LD_LIBRARY_PATH

					test "x$__old_ld" != 'x' &amp;&amp; export LD_LIBRARY_PATH="$__old_ld" &amp;&amp; unset __old_ld

					unset LIBGL_DRIVERS_PATH

					unset LIBGL_DEBUG

					unset LIBGL_ALWAYS_SOFTWARE

					unset GALLIUM_DRIVER

					export VK_ICD_FILENAMES=`pwd`/src/intel/vulkan/dev_icd.json

					steam steam://rungameid/570  -vconsole -vulkan

					unset VK_ICD_FILENAMES

				    __glxgears_cmd='glxgears 2&gt;&amp;1 | grep -v "configuration file"'

				    __es2info_cmd='es2_info 2&gt;&amp;1 | egrep "GL_VERSION|GL_RENDERER|.*dri\.so"'

				    __es2gears_cmd='es2gears_x11 2&gt;&amp;1 | grep -v "configuration file"'

				    test "x$LD_LIBRARY_PATH" != 'x' &amp;&amp; __old_ld="$LD_LIBRARY_PATH"

				    export LD_LIBRARY_PATH=`pwd`/test/usr/local/lib/:"${__old_ld}"

				    export LIBGL_DRIVERS_PATH=`pwd`/test/usr/local/lib/dri/

				    export LIBGL_DEBUG=verbose

				    eval $__glxinfo_cmd

				    eval $__glxgears_cmd

				    eval $__es2info_cmd

				    eval $__es2gears_cmd

				    export LIBGL_ALWAYS_SOFTWARE=true

				    eval $__glxinfo_cmd

				    eval $__glxgears_cmd

				    eval $__es2info_cmd

				    eval $__es2gears_cmd

				    export LIBGL_ALWAYS_SOFTWARE=true

				    export GALLIUM_DRIVER=softpipe

				    eval $__glxinfo_cmd

				    eval $__glxgears_cmd

				    eval $__es2info_cmd

				    eval $__es2gears_cmd

				    # Smoke test DOTA2

				    unset LD_LIBRARY_PATH

				    test "x$__old_ld" != 'x' &amp;&amp; export LD_LIBRARY_PATH="$__old_ld" &amp;&amp; unset __old_ld

				    unset LIBGL_DRIVERS_PATH

				    unset LIBGL_DEBUG

				    unset LIBGL_ALWAYS_SOFTWARE

				    unset GALLIUM_DRIVER

				    export VK_ICD_FILENAMES=`pwd`/test/usr/local/share/vulkan/icd.d/intel_icd.x86_64.json

				    steam steam://rungameid/570  -vconsole -vulkan

				    unset VK_ICD_FILENAMES

				</pre>

				<h3>Update version in file VERSION</h3>

				@@ -614,7 +604,7 @@ docs/release-calendar.html. Then commit and push:

				</pre>

				<h1 id="announce">Announce the release</h1>

				<h2 id="announce">Announce the release</h2>

				<p>

				Use the generated template during the releasing process.

				@@ -626,7 +616,7 @@ series, if that is the case.

				</p>

				<h1 id="website">Update the mesa3d.org website</h1>

				<h2 id="website">Update the mesa3d.org website</h2>

				<p>

				As the hosting was moved to freedesktop, git hooks are deployed to update the

				@@ -634,14 +624,12 @@ website. Manually check that it is updated 5-10 minutes after the final <code>gi

				</p>

				<h1 id="bugzilla">Update Bugzilla</h1>

				<h2 id="bugzilla">Update Bugzilla</h2>

				<p>

				Parse through the bugreports as listed in the docs/relnotes/X.Y.Z.html

				document.

				<br>

				If there's outstanding action, close the bug referencing the commit ID which

				addresses the bug and mention the Mesa version that has the fix.

				document. If there's outstanding action, close the bug referencing the commit

				ID which addresses the bug and mention the Mesa version that has the fix.

				</p>

				<p>

									
										36

docs/relnotes.html
				@@ -8,7 +8,7 @@

				<body>

				<div class="header">

				  <h1>The Mesa 3D Graphics Library</h1>

				  The Mesa 3D Graphics Library

				</div>

				<iframe src="contents.html"></iframe>

				@@ -21,6 +21,40 @@ The release notes summarize what's new or changed in each Mesa release.

				</p>

				<ul>

				<li><a href="relnotes/19.1.4.html">19.1.4 release notes</a>

				<li><a href="relnotes/19.1.3.html">19.1.3 release notes</a>

				<li><a href="relnotes/19.1.2.html">19.1.2 release notes</a>

				<li><a href="relnotes/19.0.8.html">19.0.8 release notes</a>

				<li><a href="relnotes/19.1.1.html">19.1.1 release notes</a>

				<li><a href="relnotes/19.0.7.html">19.0.7 release notes</a>

				<li><a href="relnotes/19.1.0.html">19.1.0 release notes</a>

				<li><a href="relnotes/19.0.6.html">19.0.6 release notes</a>

				<li><a href="relnotes/19.0.5.html">19.0.5 release notes</a>

				<li><a href="relnotes/19.0.4.html">19.0.4 release notes</a>

				<li><a href="relnotes/19.0.3.html">19.0.3 release notes</a>

				<li><a href="relnotes/19.0.2.html">19.0.2 release notes</a>

				<li><a href="relnotes/18.3.6.html">18.3.6 release notes</a>

				<li><a href="relnotes/19.0.1.html">19.0.1 release notes</a>

				<li><a href="relnotes/18.3.5.html">18.3.5 release notes</a>

				<li><a href="relnotes/19.0.0.html">19.0.0 release notes</a>

				<li><a href="relnotes/18.3.4.html">18.3.4 release notes</a>

				<li><a href="relnotes/18.3.3.html">18.3.3 release notes</a>

				<li><a href="relnotes/18.3.2.html">18.3.2 release notes</a>

				<li><a href="relnotes/18.2.8.html">18.2.8 release notes</a>

				<li><a href="relnotes/18.2.7.html">18.2.7 release notes</a>

				<li><a href="relnotes/18.3.1.html">18.3.1 release notes</a>

				<li><a href="relnotes/18.3.0.html">18.3.0 release notes</a>

				<li><a href="relnotes/18.2.6.html">18.2.6 release notes</a>

				<li><a href="relnotes/18.2.5.html">18.2.5 release notes</a>

				<li><a href="relnotes/18.2.4.html">18.2.4 release notes</a>

				<li><a href="relnotes/18.2.3.html">18.2.3 release notes</a>

				<li><a href="relnotes/18.2.2.html">18.2.2 release notes</a>

				<li><a href="relnotes/18.1.9.html">18.1.9 release notes</a>

				<li><a href="relnotes/18.2.1.html">18.2.1 release notes</a>

				<li><a href="relnotes/18.2.0.html">18.2.0 release notes</a>

				<li><a href="relnotes/18.1.8.html">18.1.8 release notes</a>

				<li><a href="relnotes/18.1.7.html">18.1.7 release notes</a>

				<li><a href="relnotes/18.1.6.html">18.1.6 release notes</a>

				<li><a href="relnotes/18.1.5.html">18.1.5 release notes</a>

				<li><a href="relnotes/18.1.4.html">18.1.4 release notes</a>

				<li><a href="relnotes/18.1.3.html">18.1.3 release notes</a>

									
										2

docs/relnotes/10.0.1.html
				@@ -8,7 +8,7 @@

				<body>

				<div class="header">

				  <h1>The Mesa 3D Graphics Library</h1>

				  The Mesa 3D Graphics Library

				</div>

				<iframe src="../contents.html"></iframe>

									
										2

docs/relnotes/10.0.2.html
				@@ -8,7 +8,7 @@

				<body>

				<div class="header">

				  <h1>The Mesa 3D Graphics Library</h1>

				  The Mesa 3D Graphics Library

				</div>

				<iframe src="../contents.html"></iframe>

									
										2

docs/relnotes/10.0.3.html
				@@ -8,7 +8,7 @@

				<body>

				<div class="header">

				  <h1>The Mesa 3D Graphics Library</h1>

				  The Mesa 3D Graphics Library

				</div>

				<iframe src="../contents.html"></iframe>

									
										2

docs/relnotes/10.0.4.html
				@@ -8,7 +8,7 @@

				<body>

				<div class="header">

				  <h1>The Mesa 3D Graphics Library</h1>

				  The Mesa 3D Graphics Library

				</div>

				<iframe src="../contents.html"></iframe>

									
										2

docs/relnotes/10.0.5.html
				@@ -8,7 +8,7 @@

				<body>

				<div class="header">

				  <h1>The Mesa 3D Graphics Library</h1>

				  The Mesa 3D Graphics Library

				</div>

				<iframe src="../contents.html"></iframe>

									
										2

docs/relnotes/10.0.html
				@@ -8,7 +8,7 @@

				<body>

				<div class="header">

				  <h1>The Mesa 3D Graphics Library</h1>

				  The Mesa 3D Graphics Library

				</div>

				<iframe src="../contents.html"></iframe>

									
										2

docs/relnotes/10.1.1.html
				@@ -8,7 +8,7 @@

				<body>

				<div class="header">

				  <h1>The Mesa 3D Graphics Library</h1>

				  The Mesa 3D Graphics Library

				</div>

				<iframe src="../contents.html"></iframe>

									
										2

docs/relnotes/10.1.2.html
				@@ -8,7 +8,7 @@

				<body>

				<div class="header">

				  <h1>The Mesa 3D Graphics Library</h1>

				  The Mesa 3D Graphics Library

				</div>

				<iframe src="../contents.html"></iframe>

									
										2

docs/relnotes/10.1.3.html
				@@ -8,7 +8,7 @@

				<body>

				<div class="header">

				  <h1>The Mesa 3D Graphics Library</h1>

				  The Mesa 3D Graphics Library

				</div>

				<iframe src="../contents.html"></iframe>

									
										2

docs/relnotes/10.1.4.html
				@@ -8,7 +8,7 @@

				<body>

				<div class="header">

				  <h1>The Mesa 3D Graphics Library</h1>

				  The Mesa 3D Graphics Library

				</div>

				<iframe src="../contents.html"></iframe>

									
										2

docs/relnotes/10.1.5.html
				@@ -8,7 +8,7 @@

				<body>

				<div class="header">

				  <h1>The Mesa 3D Graphics Library</h1>

				  The Mesa 3D Graphics Library

				</div>

				<iframe src="../contents.html"></iframe>

									
										2

docs/relnotes/10.1.6.html
				@@ -8,7 +8,7 @@

				<body>

				<div class="header">

				  <h1>The Mesa 3D Graphics Library</h1>

				  The Mesa 3D Graphics Library

				</div>

				<iframe src="../contents.html"></iframe>

									
										2

docs/relnotes/10.1.html
				@@ -8,7 +8,7 @@

				<body>

				<div class="header">

				  <h1>The Mesa 3D Graphics Library</h1>

				  The Mesa 3D Graphics Library

				</div>

				<iframe src="../contents.html"></iframe>

									
										2

docs/relnotes/10.2.1.html
				@@ -8,7 +8,7 @@

				<body>

				<div class="header">

				  <h1>The Mesa 3D Graphics Library</h1>

				  The Mesa 3D Graphics Library

				</div>

				<iframe src="../contents.html"></iframe>

									
										2

docs/relnotes/10.2.2.html
				@@ -8,7 +8,7 @@

				<body>

				<div class="header">

				  <h1>The Mesa 3D Graphics Library</h1>

				  The Mesa 3D Graphics Library

				</div>

				<iframe src="../contents.html"></iframe>

									
										2

docs/relnotes/10.2.3.html
				@@ -8,7 +8,7 @@

				<body>

				<div class="header">

				  <h1>The Mesa 3D Graphics Library</h1>

				  The Mesa 3D Graphics Library

				</div>

				<iframe src="../contents.html"></iframe>

									
										2

docs/relnotes/10.2.4.html
				@@ -8,7 +8,7 @@

				<body>

				<div class="header">

				  <h1>The Mesa 3D Graphics Library</h1>

				  The Mesa 3D Graphics Library

				</div>

				<iframe src="../contents.html"></iframe>

									
										2

docs/relnotes/10.2.5.html
				@@ -8,7 +8,7 @@

				<body>

				<div class="header">

				  <h1>The Mesa 3D Graphics Library</h1>

				  The Mesa 3D Graphics Library

				</div>

				<iframe src="../contents.html"></iframe>

									
										2

docs/relnotes/10.2.6.html
				@@ -8,7 +8,7 @@

				<body>

				<div class="header">

				  <h1>The Mesa 3D Graphics Library</h1>

				  The Mesa 3D Graphics Library

				</div>

				<iframe src="../contents.html"></iframe>

									
										2

docs/relnotes/10.2.7.html
				@@ -8,7 +8,7 @@

				<body>

				<div class="header">

				  <h1>The Mesa 3D Graphics Library</h1>

				  The Mesa 3D Graphics Library

				</div>

				<iframe src="../contents.html"></iframe>

									
										2

docs/relnotes/10.2.8.html
				@@ -8,7 +8,7 @@

				<body>

				<div class="header">

				  <h1>The Mesa 3D Graphics Library</h1>

				  The Mesa 3D Graphics Library

				</div>

				<iframe src="../contents.html"></iframe>

									
										2

docs/relnotes/10.2.9.html
				@@ -8,7 +8,7 @@

				<body>

				<div class="header">

				  <h1>The Mesa 3D Graphics Library</h1>

				  The Mesa 3D Graphics Library

				</div>

				<iframe src="../contents.html"></iframe>

									
										8

docs/relnotes/10.2.html
				@@ -8,7 +8,7 @@

				<body>

				<div class="header">

				  <h1>The Mesa 3D Graphics Library</h1>

				  The Mesa 3D Graphics Library

				</div>

				<iframe src="../contents.html"></iframe>

				@@ -69,14 +69,15 @@ TBD.

				<h2>Changes</h2>

				<ul>

				<li>Renamed <i>--with-llvm-shared-libs</i> to <i>--enable-llvm-shared-libs</i></li>

				<li>Renamed <i>--with-llvm-shared-libs</i> to <i>--enable-llvm-shared-libs</i>

				<p>

				The option is used to control how mesa is linked against LLVM, and now

				defaults to enabled (shared linking).

				</p>

				</li>

				<li>Split <i>libxatracker.so</i> into a standalone library which can be used

				with any gallium driver.</li>

				with any gallium driver.

				<p>

				Previously the library was linked statically against vmware's virtual gpu

				driver(svga), whereas now it loads a shared pipe_*.so driver. Provide the

				@@ -88,6 +89,7 @@ following options during configure, if you would like support for svga driver

				Note: The files are installed in $(libdir)/gallium-pipe/ and the interface

				between them and libxatracker.so is <strong>not</strong> stable.

				</p>

				</li>

				<li>The environment variable GALLIUM_MSAA that forced a multisample GLX visual was removed.</li>

				</ul>

									
										2

docs/relnotes/10.3.1.html
				@@ -8,7 +8,7 @@

				<body>

				<div class="header">

				  <h1>The Mesa 3D Graphics Library</h1>

				  The Mesa 3D Graphics Library

				</div>

				<iframe src="../contents.html"></iframe>

									
										2

docs/relnotes/10.3.2.html
				@@ -8,7 +8,7 @@

				<body>

				<div class="header">

				  <h1>The Mesa 3D Graphics Library</h1>

				  The Mesa 3D Graphics Library

				</div>

				<iframe src="../contents.html"></iframe>

									
										2

docs/relnotes/10.3.3.html
				@@ -8,7 +8,7 @@

				<body>

				<div class="header">

				  <h1>The Mesa 3D Graphics Library</h1>

				  The Mesa 3D Graphics Library

				</div>

				<iframe src="../contents.html"></iframe>

									
										2

docs/relnotes/10.3.4.html
				@@ -8,7 +8,7 @@

				<body>

				<div class="header">

				  <h1>The Mesa 3D Graphics Library</h1>

				  The Mesa 3D Graphics Library

				</div>

				<iframe src="../contents.html"></iframe>

									
										2

docs/relnotes/10.3.5.html
				@@ -8,7 +8,7 @@

				<body>

				<div class="header">

				  <h1>The Mesa 3D Graphics Library</h1>

				  The Mesa 3D Graphics Library

				</div>

				<iframe src="../contents.html"></iframe>

									
										2

docs/relnotes/10.3.6.html
				@@ -8,7 +8,7 @@

				<body>

				<div class="header">

				  <h1>The Mesa 3D Graphics Library</h1>

				  The Mesa 3D Graphics Library

				</div>

				<iframe src="../contents.html"></iframe>

									
										2

docs/relnotes/10.3.7.html
				@@ -8,7 +8,7 @@

				<body>

				<div class="header">

				  <h1>The Mesa 3D Graphics Library</h1>

				  The Mesa 3D Graphics Library

				</div>

				<iframe src="../contents.html"></iframe>

									
										4

docs/relnotes/10.3.html
				@@ -8,7 +8,7 @@

				<body>

				<div class="header">

				  <h1>The Mesa 3D Graphics Library</h1>

				  The Mesa 3D Graphics Library

				</div>

				<iframe src="../contents.html"></iframe>

				@@ -327,7 +327,7 @@ DRM drivers that don't have a full-fledged GEM (such as qxl or simpledrm)</li>

				<li>Removed support for the GL_ATI_envmap_bumpmap extension</li>

				<li>The hacky --enable-32/64-bit is no longer available in configure. To build

				32/64 bit mesa refer to the default method recommended by your distribution</li>

				</li>The environment variable GALLIUM_MSAA that forced a multisample GLX visual was removed.</li>

				<li>The environment variable GALLIUM_MSAA that forced a multisample GLX visual was removed.</li>

				</ul>

				</div>

									
										2

docs/relnotes/10.4.1.html
				@@ -8,7 +8,7 @@

				<body>

				<div class="header">

				  <h1>The Mesa 3D Graphics Library</h1>

				  The Mesa 3D Graphics Library

				</div>

				<iframe src="../contents.html"></iframe>

Compare commits

10636 Commits 18.2 ... 19.2-branc

3 .editorconfig
52 .gitignore vendored
382 .gitlab-ci.yml Normal file
285 .gitlab-ci/debian-install.sh Normal file
10 .gitlab-ci/deqp-default-skips.txt Normal file
124 .gitlab-ci/deqp-llvmpipe-fails.txt Normal file
112 .gitlab-ci/deqp-runner.sh Executable file
445 .gitlab-ci/deqp-softpipe-fails.txt Normal file
62 .gitlab-ci/meson-build.sh Executable file
17 .gitlab-ci/run-shader-db.sh Executable file
8 .mailmap
679 .travis.yml
19 Android.common.mk
26 Android.mk
6 CleanSpec.mk
92 Makefile.am
19 README.rst
15 REVIEWERS
1 SConstruct
2 VERSION
30 appveyor.yml
14 autogen.sh
9 bin/.gitignore vendored
81 bin/get-fixes-pick-list.sh
122 bin/get-pick-list.sh
42 bin/get-typod-pick-list.sh
29 bin/git_sha1_gen.py Executable file → Normal file
22 bin/install_megadrivers.py Executable file → Normal file
88 bin/meson-cmd-extract.py Executable file
63 bin/meson-options.py Executable file
1 bin/meson.build
130 bin/symbols-check.py Normal file
8 common.py
3334 configure.ac
14 docs/application-issues.html
257 docs/autoconf.html
6 docs/bugs.html
37 docs/codingstyle.html
6 docs/conform.html
58 docs/contents.html
22 docs/debugging.html
2 docs/developers.html
36 docs/devinfo.html
100 docs/dispatch.html
50 docs/download.html
45 docs/egl.html
717 docs/envvars.html
2 docs/extensions.html
190 docs/faq.html
182 docs/features.txt
17 docs/helpwanted.html
314 docs/index.html
80 docs/install.html
30 docs/intro.html
16 docs/license.html
8 docs/lists.html
93 docs/llvmpipe.html
37 docs/mangling.html
49 docs/mesa.css
377 docs/meson.html
6 docs/opengles.html
15 docs/osmesa.html
2 docs/perf.html
6 docs/postprocess.html
7 docs/precompiled.html
120 docs/release-calendar.html
246 docs/releasing.html
36 docs/relnotes.html
2 docs/relnotes/10.0.1.html
2 docs/relnotes/10.0.2.html
2 docs/relnotes/10.0.3.html
2 docs/relnotes/10.0.4.html
2 docs/relnotes/10.0.5.html
2 docs/relnotes/10.0.html
2 docs/relnotes/10.1.1.html
2 docs/relnotes/10.1.2.html
2 docs/relnotes/10.1.3.html
2 docs/relnotes/10.1.4.html

2 docs/relnotes/10.1.5.html